Unit - I
Data Structure, Stack, Queues, Linked List, Trees, Application of Trees, Sets and Disjoint Set Union, Huffman Algorithm, Graphs, Depth First Search, Breadth First Search, Searching Techniques, Hashing
1.0 Introduction
This is the introductory unit on data structures. It gives an in-depth treatment of data structures and helps the learner design and develop computer programs of any kind. The fundamental nature of programming and data processing requires efficient algorithms for accessing data in main memory and on storage devices, and that efficiency is directly linked to the structure of the data being processed.
1.1 Objective
The objective of this lesson is to help the learner understand the fundamental concepts of data structures. It introduces stacks, queues, linked lists, trees and graphs, which are the most commonly used data structures.
1.2 Content
1.2.2 Stack
The most common form of data organization in computer programs is the ordered, or linear, list. The stack is a linear data structure. Stacks are well suited to backtracking, that is, to returning to a previous state.
Definition: - A stack is an ordered collection of items in which insertion and deletion are done at one end only, called the top. An item may be inserted or deleted only at the top of the stack. This means that the last element added to the stack is the first one removed, so stacks are called Last In First Out (LIFO) lists.
Example of stack
The most common example of a stack is a stack of dishes; another is a stack of folded towels.
Basic Stack Operations
Figure 1.1 Example of a Stack
Note: Each time a new element is inserted, the stack pointer is incremented by one before the element is placed on the stack. Similarly, the pointer is decremented by one each time an element is deleted from the stack. Because items are constantly being inserted and deleted, a stack is called a dynamic data structure, that is, a constantly changing object. From the definition, all operations take place at a single end, called the top: new items may be added to the top of the stack, and items at the top may be removed. The pointer keeps track of the elements in the stack. For example, if A, B, C, D and E are inserted into a stack in that order, the first element to be removed is E.
Stacks can also be represented using linked lists, in which each node holds data and link information. Each node has two fields, called data and link. The data field contains the actual element in the stack, and the link field points to the node containing the next item in the stack. The link field of the last node is zero (null). Figure 1.2 shows the five elements linked in a stack.
Figure 1.2 Linked stack: Stack -> E -> D -> C -> B -> A -> 0
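The linked representation described above can be sketched as follows. This is a minimal illustration, assuming Python's `None` stands in for the zero link; the class and method names (`Node`, `LinkedStack`, `items_from_top`) are hypothetical, not taken from the source.

```python
class Node:
    """One stack node: the element (data) and a pointer to the next node (link)."""

    def __init__(self, data, link=None):
        self.data = data
        self.link = link


class LinkedStack:
    """Stack kept as a chain of nodes; `top` points to the most recently pushed node."""

    def __init__(self):
        self.top = None  # None plays the role of the zero link: the stack is empty

    def push(self, item):
        # The new node links to the old top, then becomes the new top.
        self.top = Node(item, self.top)

    def pop(self):
        if self.top is None:
            raise IndexError("stack underflow")
        node = self.top
        self.top = node.link  # the next node becomes the top
        return node.data

    def items_from_top(self):
        """Walk the links from the top down to the zero link."""
        result, node = [], self.top
        while node is not None:
            result.append(node.data)
            node = node.link
        return result


stack = LinkedStack()
for letter in "ABCDE":
    stack.push(letter)
print(stack.items_from_top())  # ['E', 'D', 'C', 'B', 'A']
print(stack.pop())             # E
```

Because every push and pop touches only the top node, both run in constant time and the stack never overflows until memory runs out.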
Implementation of stacks
To specify the basic stack operations, we need the programming details of their implementation. We begin with the contiguous implementation, in which the stack entries are stored in an array.
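A minimal sketch of the contiguous (array) implementation follows, assuming a fixed capacity and a `top` index that starts at -1 for an empty stack. The names `ArrayStack`, `push`, `pop`, `is_full` and `is_empty` are illustrative, not from the source.

```python
class ArrayStack:
    """Contiguous stack: entries live in a fixed-size array, `top` indexes the top entry."""

    def __init__(self, capacity):
        self.entries = [None] * capacity
        self.top = -1  # -1 means the stack is empty

    def is_empty(self):
        return self.top == -1

    def is_full(self):
        return self.top == len(self.entries) - 1

    def push(self, item):
        if self.is_full():
            raise OverflowError("stack overflow")
        self.top += 1                  # increment the pointer first...
        self.entries[self.top] = item  # ...then store the item at the new top

    def pop(self):
        if self.is_empty():
            raise IndexError("stack underflow")
        item = self.entries[self.top]
        self.entries[self.top] = None  # clear the slot (not required, just tidy)
        self.top -= 1                  # decrement the pointer after the deletion
        return item


s = ArrayStack(5)
for letter in "ABCDE":
    s.push(letter)
print(s.is_full())  # True
print(s.pop())      # E
```

Unlike the linked version, the array version must check for overflow, because its capacity is fixed when the array is allocated.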
Figure 1.3 Queue with five elements
The above figure contains five elements, A1 to A5. A1 is at the front of the queue and A5 is at the rear. An element can be deleted only from the front of the queue: when A1 is removed, A2 becomes the front of the queue.
Types of Queues
The three types of queues are:
i. Circular Queue
ii. Deque
iii. Priority Queue
Figure 1.4 Circular queue of capacity n-1 containing four elements J1, J2, J3 and J4 (front = 0; rear = 3)
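A circular queue can be sketched as below. This assumes one common convention: `front` is the index of the first element and `rear` the index where the next element goes, and one slot is always left empty so that a full queue can be told apart from an empty one. That is why an array of n slots holds at most n-1 elements. Pointer conventions vary between textbooks, so the exact pointer values in Figure 1.4 may follow a different scheme; `CircularQueue`, `add` and `delete` are hypothetical names.

```python
class CircularQueue:
    """Circular queue in an array of n slots; one slot stays empty, so capacity is n - 1."""

    def __init__(self, n):
        self.n = n
        self.slots = [None] * n
        self.front = 0  # index of the first element
        self.rear = 0   # index where the next element will be stored

    def is_empty(self):
        return self.front == self.rear

    def is_full(self):
        # Advancing rear would make it equal front, which would look like "empty".
        return (self.rear + 1) % self.n == self.front

    def add(self, item):
        if self.is_full():
            raise OverflowError("queue full")
        self.slots[self.rear] = item
        self.rear = (self.rear + 1) % self.n  # wrap around past position n-1

    def delete(self):
        if self.is_empty():
            raise IndexError("queue empty")
        item = self.slots[self.front]
        self.slots[self.front] = None
        self.front = (self.front + 1) % self.n
        return item


q = CircularQueue(5)          # 5 slots, room for 4 elements
for job in ["J1", "J2", "J3", "J4"]:
    q.add(job)
print(q.is_full())            # True
print(q.delete())             # J1
q.add("J5")                   # stored in the last slot; rear wraps to 0
print(q.rear)                 # 0
```

The modulo arithmetic is what lets the slot freed by deleting J1 be reused, which a plain linear queue cannot do.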
Algorithm to insert into a queue
QINSERT(Q, F, R, N, Y)
Q: queue; F, R: front and rear pointers; N: number of elements in the vector Q (its size); Y: element to be inserted
1. [Check for overflow] If R >= N then write(Overflow) and return.
2. [Increment rear pointer] R = R + 1
3. [Insert element] Q[R] = Y
4. [Set the front pointer properly] If F = 0 then F = 1
Algorithm to delete an element from a queue
QDELETE(Q, F, R)
1. [Check for underflow] If F = 0 then write(Underflow)
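QINSERT can be transcribed almost step for step, as in the sketch below, which keeps the 1-based indexing and the convention that F = R = 0 means an empty queue. QDELETE is cut off after its underflow check, so the remaining steps used here (take Q[F], then reset both pointers to 0 if the queue has become empty, otherwise advance F) are an assumption following the usual textbook version. The class name `LinearQueue` is hypothetical.

```python
class LinearQueue:
    """Array queue following QINSERT/QDELETE: 1-based slots, F = R = 0 means empty."""

    def __init__(self, n):
        self.n = n                     # N: size of the queue
        self.q = [None] * (n + 1)      # slot 0 unused so indices run 1..N
        self.f = 0                     # F: front pointer
        self.r = 0                     # R: rear pointer

    def qinsert(self, y):
        if self.r >= self.n:           # step 1: check for overflow
            raise OverflowError("Overflow")
        self.r += 1                    # step 2: increment rear pointer
        self.q[self.r] = y             # step 3: insert element
        if self.f == 0:                # step 4: set the front pointer properly
            self.f = 1

    def qdelete(self):
        if self.f == 0:                # step 1: check for underflow
            raise IndexError("Underflow")
        y = self.q[self.f]
        if self.f == self.r:           # assumed step: queue now empty, reset pointers
            self.f = self.r = 0
        else:
            self.f += 1                # assumed step: advance the front pointer
        return y


q = LinearQueue(3)
for item in "ABC":
    q.qinsert(item)
print(q.qdelete())  # A
try:
    q.qinsert("D")
except OverflowError as err:
    print(err)      # Overflow
```

Note that the last insertion reports overflow even though slot 1 is free again: R has reached N and a linear queue never reuses the space at the front. This is the limitation the circular queue removes.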
Figure 1.5 Linear linked list
Basic operations of linked list
A list is a dynamic structure, that is, the number of nodes on a list may vary dramatically as list operations are performed. The operations are:
1. Adding an element to a linked list
2. Removing an element from a linked list
Adding an element to the linked list
There are three ways of adding an element to a list: adding it at the head of the list, adding it at the tail of the list, and adding it after the current node. To add a new node at the front of a linked list, memory is first allocated for the new node, which becomes the head of the list. Then, if the list was not empty, the pointers are changed accordingly so that the new node links to the old head.
Algorithm to insert into a list at the beginning of the list
if pHead = NULL then
    pHead = pNode
    pTail = pNode
end if
where pHead is the head node, pTail is the tail node and pNode is the node to be inserted.
Algorithm to insert node at the tail of the list
If pHead != NULL then
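The three ways of adding an element can be sketched with a singly linked list that keeps both a head and a tail pointer, mirroring pHead and pTail. This is an illustrative sketch; `LinkedList`, `add_at_head`, `add_at_tail`, `add_after` and `to_list` are hypothetical names, and the non-empty cases are filled in with the standard pointer updates.

```python
class Node:
    """A list node: the data and a pointer to the next node."""

    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None  # pHead
        self.tail = None  # pTail

    def add_at_head(self, data):
        node = Node(data)              # pNode
        if self.head is None:          # empty list: the node is both head and tail
            self.head = self.tail = node
        else:
            node.next = self.head      # new node points at the old head
            self.head = node

    def add_at_tail(self, data):
        node = Node(data)
        if self.head is None:
            self.head = self.tail = node
        else:
            self.tail.next = node      # old tail points at the new node
            self.tail = node

    def add_after(self, current, data):
        node = Node(data)
        node.next = current.next       # splice the node in after `current`
        current.next = node
        if current is self.tail:       # inserting after the tail moves the tail
            self.tail = node
        return node

    def to_list(self):
        values, node = [], self.head
        while node is not None:
            values.append(node.data)
            node = node.next
        return values


lst = LinkedList()
lst.add_at_tail("B")
lst.add_at_head("A")
lst.add_at_tail("D")
lst.add_after(lst.head.next, "C")   # after "B"
print(lst.to_list())                # ['A', 'B', 'C', 'D']
```

Keeping a tail pointer makes insertion at the tail a constant-time operation instead of a walk down the whole list.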
1.2.5 Trees
Tree is a non-linear data structure. It is mainly used to represent data containing a hierarchical relationship between elements, for example records and family trees. In this structure, data is represented in hierarchical form, that is, as relationships between individual data items.
Definition: - A Tree is a data structure used to represent data containing a hierarchical relation between its elements.
Applications of Tree
For example, a tree can be used to represent the Unix file system, in which files and subdirectories are stored under directories. Another example is representing the records in a file, in which elementary items are stored under group items.
Basic structure of tree
In a tree, each item is called a node. Each node contains data and pointers holding the addresses of its subtrees. The first node in the tree is called the root. Each part hanging below a node is called a subtree. A node under which a subtree exists is called a parent node. A node that has no subtree is called a terminal node, or leaf.
Level of root = 0. Level of any node = level of parent + 1.
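The level rule above (level of root = 0, level of any node = level of parent + 1) can be applied mechanically by walking the tree from the root. The sketch below assumes the tree is given as a dictionary mapping each node to its list of children; the sample tree and the names `node_levels` and `terminal_nodes` are hypothetical.

```python
from collections import deque


def node_levels(children, root):
    """Level of root = 0; level of any node = level of parent + 1."""
    levels = {root: 0}
    pending = deque([root])
    while pending:
        parent = pending.popleft()
        for child in children.get(parent, []):
            levels[child] = levels[parent] + 1
            pending.append(child)
    return levels


def terminal_nodes(children, levels):
    """Nodes with no subtree (no children)."""
    return sorted(node for node in levels if not children.get(node))


# Hypothetical tree: A has subtrees B, C, D; B has E, F; C has G; D has H.
children = {"A": ["B", "C", "D"], "B": ["E", "F"], "C": ["G"], "D": ["H"]}
levels = node_levels(children, "A")
print(levels["A"], levels["C"], levels["G"])  # 0 1 2
print(terminal_nodes(children, levels))       # ['E', 'F', 'G', 'H']
```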
Figure 1.6 Sample tree
The tree in the above figure has four levels and thirteen nodes. The node at the topmost level, A, is called the root node. The number of subtrees of a node is called its degree. The degree of A is 3, of C is 1 and of G is 0. Nodes having zero degree are called leaf or terminal nodes.
Figure 1.7 Two Sample Binary Trees
Definition: - A Binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint binary trees called the left and right subtrees.
Properties of binary Tree
1. A binary tree with N internal nodes has N+1 external nodes.
2. The external path length of any binary tree with N internal nodes is 2N greater than the internal path length.
3. The height of a full binary tree with N internal nodes is about log2 N.
A binary tree is a useful data structure when two-way decisions must be made at each point in a process, and it can reduce the number of comparisons. For example, to find duplicates in a list of numbers, the first number in the list is placed in a node that is established as the root of a binary tree. Each successive number in the list is then compared to the number in the root. If it matches, we have a duplicate. If it is smaller, the left subtree is examined; if it is larger, the right subtree is examined. If the subtree is empty, the number is not a duplicate and is placed into a new node at that position in the tree. If the subtree is nonempty, the number is compared with the contents of the root of the subtree, and the entire process is repeated with the subtree.
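The duplicate-finding procedure just described can be sketched as follows: each number walks down from the root, going left when smaller and right when larger, and either hits an equal key (a duplicate) or an empty subtree where it is inserted. The names `TreeNode`, `insert_or_flag` and `find_duplicates` are hypothetical.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert_or_flag(root, key):
    """Insert key into the tree rooted at root.

    Returns (root, True) if key was already present (a duplicate), else (root, False).
    """
    if root is None:
        return TreeNode(key), False      # first number becomes the root
    node = root
    while True:
        if key == node.key:
            return root, True            # match: duplicate
        if key < node.key:
            if node.left is None:        # empty left subtree: place the key here
                node.left = TreeNode(key)
                return root, False
            node = node.left             # otherwise repeat with the left subtree
        else:
            if node.right is None:       # empty right subtree: place the key here
                node.right = TreeNode(key)
                return root, False
            node = node.right            # otherwise repeat with the right subtree


def find_duplicates(numbers):
    root, duplicates = None, []
    for n in numbers:
        root, is_dup = insert_or_flag(root, n)
        if is_dup:
            duplicates.append(n)
    return duplicates


print(find_duplicates([8, 3, 10, 3, 6, 8]))  # [3, 8]
```

Each comparison discards one subtree, so for a reasonably balanced tree a number is checked against about log2 N others rather than against every earlier number.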
Figure 1.8 Expression tree
This is an example of an expression tree for (A+B*C)-(D*E).
Inorder Traversal
The general strategy is Left-Root-Right. If T is not empty:
1. Traverse the left subtree in inorder.
2. Visit the root.
3. Traverse the right subtree in inorder.
Postorder Traversal
In this traversal, the strategy is Left-Right-Root:
1. Traverse the left subtree in postorder.
2. Traverse the right subtree in postorder.
3. Visit the root.
Preorder Traversal
In this traversal, the strategy is Root-Left-Right. Preorder traversal is employed in depth first search. Visit the root first, then recursively traverse the subtrees:
1. Visit the root.
2. Traverse the left subtree in preorder.
3. Traverse the right subtree in preorder.
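The three traversals can be checked on the expression tree for (A+B*C)-(D*E). In this sketch a node is assumed to be a tuple `(value, left, right)` with `None` for an empty subtree; the helper `leaf` and the variable `expr` are hypothetical.

```python
def leaf(v):
    return (v, None, None)

# (A+B*C)-(D*E) as nested (value, left, right) tuples
expr = ("-",
        ("+", leaf("A"), ("*", leaf("B"), leaf("C"))),
        ("*", leaf("D"), leaf("E")))


def inorder(t):
    if t is None:
        return ""
    value, left, right = t
    return inorder(left) + value + inorder(right)      # Left-Root-Right


def preorder(t):
    if t is None:
        return ""
    value, left, right = t
    return value + preorder(left) + preorder(right)    # Root-Left-Right


def postorder(t):
    if t is None:
        return ""
    value, left, right = t
    return postorder(left) + postorder(right) + value  # Left-Right-Root


print(inorder(expr))    # A+B*C-D*E
print(preorder(expr))   # -+A*BC*DE
print(postorder(expr))  # ABC*+DE*-
```

Preorder and postorder give the prefix and postfix forms of the expression directly. The inorder result loses the original parentheses, so an infix printer usually wraps each subtree in parentheses.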
Many algorithms that use binary trees proceed in two phases: the first phase builds a binary tree and the second traverses it.
Applications of Trees
Trees have several applications. One important application is the representation of sets. The implementation of set manipulation using trees is given below.
Figure 1.9(a) Possible tree representation of sets
Let us now look at the operations that can be performed on these sets:
1. Disjoint set union: The union of two disjoint sets Si and Sj, written Si ∪ Sj, contains all elements x such that x is in either Si or Sj. Thus, in our example, S1 ∪ S2 = {a, g, h, i, b, e, j}. Since the sets are disjoint, we can assume that after the union of Si and Sj, the sets Si and Sj no longer exist independently; that is, they are replaced by Si ∪ Sj in the collection of sets.
2. Find(i): Find the set to which element i belongs. Thus, d is in set S3 and a is in set S1.
Union and Find Operations
The union of two sets can easily be obtained by making one tree a subtree of the other. Thus the two sets S1 and S2 of Figure 1.9(a) can be united, and S1 ∪ S2 could then have either of the representations in Figure 1.10(b).
Figure 1.10(b) Possible representations of S1 ∪ S2
Union can be accomplished easily if, with each set name, a pointer is kept to the root of the tree representing that set. If, in addition, each root has a pointer to the set name, then to determine which set an element is currently in, follow parent links to the root of its tree and use the pointer to the set name. To unite sets Si and Sj, unite the trees with roots FindPointer(Si) and FindPointer(Sj), where FindPointer is a function that determines the root of a set by examining the [set name, pointer] table. The operation Find(i) determines the root of the tree containing element i. The function Union(i, j) requires that the two trees with roots i and j be joined.
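A sketch of Find and Union with parent links and a [set name, pointer] table follows. The memberships of S1 and S2 come from the example above, but the roots chosen here (a for S1, b for S2, d for S3) are hypothetical, since the figure is not reproduced, and S3 contains only d because that is the only member named. Keeping the name S1 for the united set is also an illustrative choice. `find`, `union` and `unite_sets` are hypothetical names.

```python
# Parent links: each element points to its parent; a root points to itself.
# Hypothetical roots: a for S1, b for S2, d for S3.
parent = {
    "a": "a", "g": "a", "h": "a", "i": "a",   # S1
    "b": "b", "e": "b", "j": "b",             # S2
    "d": "d",                                 # S3 (only d is named in the text)
}
# [set name, pointer] table: set name -> root, plus the reverse pointer root -> set name.
root_of = {"S1": "a", "S2": "b", "S3": "d"}
name_of = {root: name for name, root in root_of.items()}


def find(i):
    """Follow parent links from i up to the root of its tree."""
    while parent[i] != i:
        i = parent[i]
    return i


def union(i, j):
    """Join the trees with roots i and j by making j's tree a subtree of i."""
    if parent[i] != i or parent[j] != j:
        raise ValueError("union expects two roots")
    parent[j] = i


def unite_sets(si, sj):
    """Replace sets si and sj by their union, which keeps the name si."""
    ri, rj = root_of.pop(si), root_of.pop(sj)   # FindPointer(Si), FindPointer(Sj)
    del name_of[rj]
    union(ri, rj)
    root_of[si] = ri
    name_of[ri] = si


print(name_of[find("d")])   # S3
unite_sets("S1", "S2")
print(name_of[find("j")])   # S1
print(sorted(x for x in parent if find(x) == "a"))  # ['a', 'b', 'e', 'g', 'h', 'i', 'j']
```

Union is a single pointer assignment; Find costs one step per level, so repeated unions that build tall trees make Find slower.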
Figure 1.11 Encoding Tree
An encoding for each character is found by following the tree from the root to the character in the leaf: the encoding is the string of symbols on the branches followed. For example:
String   Encoding
TEA      10 00 010
SEA      011 00 010
TEN      10 00 110
As desired, the highest-frequency letters, E and T, have two-digit encodings, whereas all the others have three-digit encodings. Encoding would be done with a lookup table.
A divide-and-conquer approach might start from the question of which characters should appear in the left and right subtrees, and try to build the tree from the top down. As with the optimal binary search tree, this leads to an exponential-time algorithm.
Operation of the Huffman Algorithm
The following diagrams show how a Huffman encoding tree is built using a straightforward greedy algorithm, which combines the two smallest-weight trees at every step.
Combine the two lowest frequencies, F and E, to form a sub-tree of weight 14. Move it into its correct place.
Again combine the two lowest frequencies, C and B, to form a sub-tree of weight 25. Move it into its correct place.
Now the sub-tree of weight 14 and D are combined to make a tree of weight 30. Move it to its correct place.
Now the two lowest weights are held by the "25" and "30" sub-trees, so combine them to make one of weight 55. Move it after the A.
Finally, combine the A and the "55" sub-tree to produce the final tree. The encoding table is:
A = 0, C = 100, B = 101, F = 1100, E = 1101, D = 111
A greedy approach places our n characters in n sub-trees and starts by combining the two least-weight nodes into a tree whose root is assigned the sum of the two leaf weights as its weight. The time complexity of the Huffman algorithm is O(n log n): using a heap to store the weight of each tree, each iteration requires O(log n) time to find the cheapest weight and insert the new one, and there are O(n) iterations, one for each item.
Decoding Huffman-Encoded Data
With these variable-length strings, it is not obvious how to break up an encoded string of bits into characters.
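The greedy construction and the decoding step can be sketched with Python's `heapq` module as the heap. The individual character frequencies are not given in the text, so the ones below are hypothetical, chosen only to agree with the merge weights in the walkthrough (F + E = 14, C + B = 25, D = 30 - 14 = 16, and A weighing more than 30). Putting the lighter tree on the 0 branch is also an assumption. Decoding works because no Huffman code is a prefix of another, so reading bits until they match a code never splits a character wrongly. The function names are hypothetical.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman tree by repeatedly merging the two lightest trees; return symbol -> code."""
    tiebreak = count()  # unique counter so the heap never has to compare trees
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                       # a single symbol still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)      # lightest tree -> left branch (0)
        w2, _, t2 = heapq.heappop(heap)      # next lightest -> right branch (1)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node: (left, right)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                # leaf: a symbol
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)   # lookup-table encoding


def decode(bits, codes):
    """Read bits until they match a code; the prefix property makes the first match correct."""
    lookup = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a complete code")
    return "".join(out)


# Hypothetical frequencies, chosen only to agree with the merge weights in the walkthrough.
freqs = {"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5}
codes = huffman_codes(freqs)
print(codes)  # {'A': '0', 'C': '100', 'B': '101', 'F': '1100', 'E': '1101', 'D': '111'}
print(decode(encode("BAD", codes), codes))  # BAD
```

With these weights the merges happen in the same order as in the diagrams, and the resulting codes match the encoding table.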
1.2.8 Graphs
A Graph G consists of a set of vertices (V) and a set of edges (E). V is finite and nonempty. E is a set of pairs of vertices called edges. V(G) and E(G) represent the vertices and edges of G, respectively.
[Figure: sample graphs G1, G2 and G3]
Definition: - A graph G consists of two sets, V and E. The set V is a finite, nonempty set of vertices. The set E is a set of pairs of vertices; these pairs are called edges. The notations V(G) and E(G) represent the sets of vertices and edges, respectively, of graph G.
Figure 1.14 An Undirected Graph
In the figure, 1, 2, 3, 4 and 5 are the nodes and A, B, C, D, E, F and G are the edges.
Undirected graph: - The pair of vertices representing any edge is unordered. Thus the pairs (u,v) and (v,u) represent the same edge.
Path: a sequence of distinct vertices, each adjacent to the next.
Cycle: a path containing at least three vertices such that the last vertex on the path is adjacent to the first.
Connected: a graph is connected if there is a path from any vertex to any other vertex.
Free tree: a connected undirected graph with no cycles.
Directed Graph
Each edge of a graph G(V,E) is represented by a directed pair <u,v>, where u is the tail and v is the head of the edge. Therefore <v,u> and <u,v> represent two different edges in a directed graph, or digraph; E is a set of ordered pairs of vertices. There are several kinds of directed graph.
Figure 1.15 A graph and its strongly and weakly connected components
Indegree of a node n is the number of edges having n as head. Outdegree of a node n is the number of edges having n as tail.
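Both degrees follow directly from the definitions: every directed pair <tail, head> adds one to the outdegree of its tail and one to the indegree of its head. The edge list below is a hypothetical digraph, since the edges of Figure 1.16 are not recoverable; `in_out_degrees` is a hypothetical name.

```python
def in_out_degrees(edges):
    """edges are directed pairs (tail, head); returns (indegree, outdegree) dicts."""
    indegree, outdegree = {}, {}
    for tail, head in edges:
        outdegree[tail] = outdegree.get(tail, 0) + 1   # edge leaves its tail
        indegree[head] = indegree.get(head, 0) + 1     # edge enters its head
    return indegree, outdegree


# Hypothetical digraph: <1,2>, <2,3>, <3,1>, <1,3>
indeg, outdeg = in_out_degrees([(1, 2), (2, 3), (3, 1), (1, 3)])
print(indeg[3], outdeg[1])  # 2 2
```

Vertices with no incoming (or outgoing) edges simply do not appear in the corresponding dictionary; `indeg.get(v, 0)` reads their degree as zero.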
Figure 1.16 A Directed Graph
Application of Graph
The most common example of a graph is a collection of cities and the roads between them. Graphs are frequently applied in areas such as electrical circuits, transportation networks, chemical structures of crystals and artificial intelligence.
Graph Representation
The three most commonly used graph representations are:
Adjacency Matrices
Adjacency Multi-lists
Adjacency Lists
Figure 1.17 Graph and Adjacency Matrix
Adjacency Matrix
The adjacency matrix of G is a two-dimensional n × n array, say A, with the property that A(i,j) = 1 if there is an edge between i and j (for a directed graph, an edge <i,j>) and A(i,j) = 0 if there is no such edge in G. Consider the adjacency matrices for the graphs G1, G2 and G3 in Figure 1.15. The adjacency matrix of an undirected graph is symmetric; that of a directed graph need not be. The space needed to represent a graph by its adjacency matrix is $n^2$ bits. For an undirected graph, the degree of any vertex i is its row sum $\sum_{j=1}^{n} A(i,j)$; for a directed graph, the row sum is the out-degree and the column sum is the in-degree. The table below shows the adjacency matrix for the graph of Figure 1.14.
    1 2 3 4 5
1   0 1 1 0 1
2   1 0 1 0 0
3   1 1 0 1 1
4   0 0 1 0 1
5   1 0 1 1 0
Figure 1.18 Adjacency matrix for the graph of Figure 1.14
Adjacency Multi-list
In the adjacency-list representation of an undirected graph, each edge (vi, vj) is represented by two entries, one on the list for vi and the other on the list for vj. In an adjacency multi-list, the lists are maintained so that for each edge there is exactly one node, but this node is on two lists, that is, the adjacency lists of both vertices it is incident to. The new node structure is:
| M | vertex1 | vertex2 | list1 | list2 |
where M is a one-bit mark field that may be used to indicate whether or not the edge has been examined. The storage requirements are the same as for normal adjacency lists except for the addition of the mark bit M.
Adjacency Lists
In this representation, the n rows of the adjacency matrix are represented as n linked lists, one list for each vertex in G. Each node has at least two fields: vertex, containing the index of a vertex adjacent to vertex i, and link.
The adjacency list for the above graph is given below.
Figure 1.18 Adjacency list structure for the graph
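Both representations can be built from an edge list. The sketch below uses the undirected graph of Figure 1.14, with its seven edges read off the 5 × 5 adjacency matrix given earlier (1-2, 1-3, 1-5, 2-3, 3-4, 3-5, 4-5); the graph drawn for the adjacency-list figure is not recoverable, so its lists may differ. Python lists stand in for linked lists, and the function names are hypothetical.

```python
# Edges of the undirected graph of Figure 1.14, read off its adjacency matrix.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (3, 5), (4, 5)]
N = 5


def adjacency_matrix(n, edges):
    a = [[0] * (n + 1) for _ in range(n + 1)]  # row/column 0 unused: vertices are 1..n
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1        # undirected, so the matrix is symmetric
    return a


def adjacency_lists(n, edges):
    lists = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        lists[u].append(v)  # each undirected edge appears on two lists
        lists[v].append(u)
    for v in lists:
        lists[v].sort()
    return lists


A = adjacency_matrix(N, EDGES)
print(A[1][1:])                                    # [0, 1, 1, 0, 1]
print([sum(A[i][1:]) for i in range(1, N + 1)])    # degrees (row sums): [3, 2, 4, 2, 3]
print(adjacency_lists(N, EDGES)[3])                # [1, 2, 4, 5]
```

The matrix always uses n × n entries, whereas the lists use one entry per edge end (2 × 7 = 14 here), which is why lists are preferred for sparse graphs.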
Pop 4: nodes 6 and 7 are added to the queue; visit 4.
Pop 5: node 8 is added to the queue; visit 5.
Pop 9: visit 9.
Algorithm BFS visits all vertices reachable from v.
Algorithm BFS(v)
// A breadth first search of G is carried out
// beginning at vertex v. For any node i,
// visited[i] = 1 if i has already been visited. The
// graph G and array visited[] are global;
// visited[] is initialized to zero.
{
    u := v; // q is a queue of unexplored vertices.
    visited[v] := 1;
    repeat
    {
        for all vertices w adjacent from u do
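The pseudocode breaks off inside its main loop, so the sketch below fills in the usual breadth-first pattern as an assumption: take the next vertex u from the queue, and mark and enqueue every unvisited vertex w adjacent from u. A Python set replaces the visited[] array and `collections.deque` serves as the queue q. The graph of the example trace is not recoverable, so a hypothetical 8-vertex graph is used.

```python
from collections import deque


def bfs(graph, v):
    """Breadth first search from v; returns vertices in the order they are visited."""
    visited = {v}             # plays the role of visited[v] := 1
    order = [v]
    q = deque([v])            # queue of unexplored vertices
    while q:
        u = q.popleft()
        for w in graph[u]:    # all vertices w adjacent from u
            if w not in visited:
                visited.add(w)
                order.append(w)
                q.append(w)
    return order


# Hypothetical 8-vertex graph (adjacency lists), not the graph of the trace above.
graph = {
    1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2, 8],
    5: [2, 8], 6: [3, 8], 7: [3, 8], 8: [4, 5, 6, 7],
}
print(bfs(graph, 1))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Vertices come out level by level: first v, then everything one edge away, then everything two edges away, and so on.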
Pop 2: node 2 has adjacent node 5; 2 and 5 (unprocessed) are pushed onto the stack; visit 5.
Pop 5: node 5 has adjacent node 8, so 5 and 8 are pushed.
Pop 4: push 5 and 3 onto the stack; visit 3.
Result: 1, 2, 4, 6, 9, 7, 5, 8, 3.
The algorithm for depth first search is given below.
Algorithm DFS(v)
// v and w are vertices of G.
Mark v as discovered.
For each vertex w such that (v, w) is an edge of G:
    if w is undiscovered then DFS(w)
    else check the edge vw without visiting w.
Mark v as finished.
A call DFS(w) explores the edge vw: it visits w, explores from there as far as possible, and then backtracks from w to v. The details of the depth first search technique are given in Chapter 5.
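A direct transcription of DFS(v) might look like this. It records both the order in which vertices are discovered and the order in which they are finished. A dictionary is used for the discovered marks because it keeps insertion order and gives constant-time membership tests. The 8-vertex graph is hypothetical, not the graph of the trace above, and `dfs`/`visit` are hypothetical names.

```python
def dfs(graph, start):
    """Recursive depth first search following DFS(v); returns (discovery order, finish order)."""
    discovered = {}   # dict keeps discovery order and gives O(1) membership tests
    finished = []

    def visit(v):
        discovered[v] = True             # mark v as discovered
        for w in graph[v]:
            if w not in discovered:      # w undiscovered: explore edge vw and go deeper
                visit(w)
            # otherwise the edge vw is only checked; w is not visited again
        finished.append(v)               # mark v as finished after backtracking

    visit(start)
    return list(discovered), finished


# Hypothetical 8-vertex graph (adjacency lists).
graph = {
    1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7], 4: [2, 8],
    5: [2, 8], 6: [3, 8], 7: [3, 8], 8: [4, 5, 6, 7],
}
order, finish = dfs(graph, 1)
print(order)   # [1, 2, 4, 8, 5, 6, 3, 7]
print(finish)  # [5, 7, 3, 6, 8, 4, 2, 1]
```

The recursion uses the call stack implicitly; the stack-based trace above makes the same bookkeeping explicit by pushing and popping vertices.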
1.2.10 Hashing
Searching Techniques
Hashing is a searching technique that is often used to implement dictionaries. It is a type of address-generating technique.
Definition: - Hashing is a search technique in which the elements are arranged with respect to some function of the key value. This function is called a hash function.
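The surviving text does not say which hash function or collision strategy this unit uses. The sketch below therefore assumes the division method, h(k) = k mod m, and resolves collisions by separate chaining (each bucket holds a list of key/value pairs). Both are common textbook choices, not necessarily the ones the unit goes on to describe. `ChainedHashTable` and its methods are hypothetical names.

```python
class ChainedHashTable:
    """Hash table with division-method hashing h(k) = k mod m and separate chaining."""

    def __init__(self, m=7):
        self.m = m                               # number of buckets (a prime is a common choice)
        self.buckets = [[] for _ in range(m)]

    def h(self, key):
        return key % self.m                      # hash function: division method

    def insert(self, key, value):
        bucket = self.buckets[self.h(key)]
        for pair in bucket:
            if pair[0] == key:                   # key already present: update it
                pair[1] = value
                return
        bucket.append([key, value])              # colliding keys simply share a bucket

    def search(self, key):
        for k, v in self.buckets[self.h(key)]:   # only one bucket is examined
            if k == key:
                return v
        return None


table = ChainedHashTable(7)
table.insert(10, "ten")
table.insert(17, "seventeen")   # 17 mod 7 == 10 mod 7 == 3: a collision
print(table.h(17), table.search(17), table.search(24))  # 3 seventeen None
```

The hash function turns the key directly into an address (a bucket number), so a search examines only the few keys in one bucket instead of the whole table.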
1.3 Revision Points
Data Structure: A data structure is a logical method of representing data in memory.
Stack: A stack is a data structure in which insertions and deletions are restricted to one end. In other words, a stack is called a LIFO (Last In First Out) list.
Queue: A queue is a data structure in which insertions are made at one end (called the rear) and deletions are made at the other end (called the front). It is also called a FIFO (First In First Out) list.
Tree: A tree is a data structure used to represent data containing a hierarchical relation between its elements.
Binary Tree: A binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint binary trees called the left and right subtrees.
Huffman Algorithm: The Huffman problem is that of finding the minimum-length bit string that can be used to encode a string of symbols.
Graph: A graph G consists of a set of vertices (V) and a set of edges (E). V is finite and nonempty. E is a set of pairs of vertices called edges. V(G) and E(G) represent the vertices and edges, respectively.
Hashing: Hashing is a type of address-generating technique. The elements are arranged with respect to some function of the key value. This function is called a hash function.
1.4 Intext Questions
1. Define data structure.
2. List the types of data structures.
3. Define a stack.
4. Describe the stack operations in detail.
5. Define a queue.
1.5 Summary
Data Structure is a way of organizing data that considers not only the items stored but also their relationship to each other.
A stack is an ordered collection of items into which new items may be inserted and from which items may be deleted at one end, called the top of the stack. A stack is a container that follows the LIFO principle.
A queue is an ordered list in which all insertions take place at one end, the rear, and all deletions take place at the other end, the front. A queue is a container that follows the FIFO principle.
A linked list is a chain of structures in which each structure consists of data as well as a pointer that stores the address of the next logical structure in the list. A linked list can grow and shrink dynamically.
A tree is a data structure used to represent data containing a hierarchical relation between its elements.
A binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint binary trees called the left and right subtrees.
In a directed graph, each edge is represented by a directed pair <u,v>, u being the tail and v the head of the edge.
Breadth first search uses a queue as an auxiliary structure to hold nodes for future processing.
Depth first search uses a stack, pushing all unvisited vertices adjacent to the one being visited onto the stack and popping the stack to find the next vertex to visit.
Hashing is a type of address-generating technique. The elements are arranged with respect to some function of the key value. This function is called a hash function.
1.6 Terminal Exercises
1. Define a linked list.
Supplementary Materials
1. Ellis Horowitz, Sartaj Sahni, Fundamentals of Computer Algorithms, Galgotia Publications, 1997.
2. Aho, Hopcroft, Ullman, Data Structures and Algorithms, Addison Wesley, 1987.
3. Sara Baase, Computer Algorithms: Introduction to Design and Analysis, Addison Wesley, 1989.
4. Jean-Paul Tremblay, Paul G. Sorenson, An Introduction to Data Structures with Applications, McGraw-Hill, 1984.
1.8 Assignments
1. Explain the array implementation of a stack.
2. Discuss the various hashing functions in detail.
1.9
1. Mark Allen Weiss, Data Structures and Algorithm Analysis in C++, Addison Wesley, 1999.
2. Yedidyah Langsam, Moshe J. Augenstein, Aaron M. Tenenbaum, Data Structures Using C and C++, Prentice-Hall, 1997.
1.11 Keywords
Data Structure, Tree, Binary Tree, Stack, Queue, Last In First Out (LIFO), First In First Out (FIFO)