
UNIT III

Tree Data Structure

A tree is a non-linear abstract data type with a hierarchy-based structure. It consists of nodes (where the
data is stored) that are connected via links. The tree data structure stems from a single node called a root
node and has subtrees connected to the root.

DEFINITION OF TREE:
A tree is a collection of nodes (or vertices) and the edges (or links) that connect them. In a tree
data structure, every individual element is called a node. A node stores the actual data of
that particular element and the links to the next elements in the hierarchical structure.

Note: 1. A tree with N nodes has exactly N-1 links or edges.
2. A tree has no cycles.

TREE TERMINOLOGIES:
1. Root Node: In a Tree data structure, the first node is called as Root Node. Every tree
must have a root node. We can say that the root node is the origin of the tree data structure.
In any tree, there must be only one root node. We never have multiple root nodes in a tree.

2. Edge: In a Tree, the connecting link between any two nodes is called as EDGE. In a tree
with 'N' number of nodes there will be exactly 'N-1' number of edges.

3. Parent Node: In a Tree, the node which is a predecessor of any node is called as
PARENT NODE. In simple words, the node which has a branch from it to any other node is
called a parent node. Parent node can also be defined as "The node which has child /
children".

Here, A is the parent of B and C, B is the parent of D, E and F, and so on.


4. Child Node: In a Tree data structure, the node which is descendant of any node is called
as CHILD Node. In simple words, the node which has a link from its parent node is called as
child node. In a tree, any parent node can have any number of child nodes. In a tree, all the
nodes except root are child nodes.

5. Siblings: In a Tree data structure, nodes which belong to same Parent are called
as SIBLINGS. In simple words, the nodes with the same parent are called Sibling nodes.

6. Leaf Node: In a Tree data structure, the node which does not have a child is called
as LEAF Node. In simple words, a leaf is a node with no child. In a tree data structure, the
leaf nodes are also called as External Nodes. External node is also a node with no child. In a
tree, leaf node is also called as 'Terminal' node.

7. Internal Nodes: In a Tree data structure, the node which has at least one child is called
as INTERNAL Node. In simple words, an internal node is a node with at least one
child.

In a Tree data structure, nodes other than leaf nodes are called as Internal Nodes. The root
node is also said to be Internal Node if the tree has more than one node. Internal nodes are
also called as 'Non-Terminal' nodes.

8. Degree: In a Tree data structure, the total number of children of a node is called as
DEGREE of that Node. In simple words, the Degree of a node is the total number of children it
has. The highest degree of a node among all the nodes in a tree is called as 'Degree of Tree'.

Degree of Tree is: 3

9. Level: In a Tree data structure, the root node is said to be at Level 0 and the children of
root node are at Level 1 and the children of the nodes which are at Level 1 will be at Level 2
and so on... In simple words, in a tree each step from top to bottom is called as a Level and
the Level count starts with '0' and incremented by one at each level (Step).

10. Height: In a Tree data structure, the total number of edges from leaf node to a particular
node in the longest path is called as HEIGHT of that Node. In a tree, height of the root node
is said to be height of the tree. In a tree, height of all leaf nodes is '0'.

11. Depth: In a Tree data structure, the total number of edges from root node to a particular
node is called as DEPTH of that Node. In a tree, the total number of edges from root node to
a leaf node in the longest path is said to be Depth of the tree. In simple words, the highest
depth of any leaf node in a tree is said to be depth of that tree. In a tree, depth of the root
node is '0'.

12. Path: In a Tree data structure, the sequence of Nodes and Edges from one node to
another node is called as PATH between those two Nodes. Length of a Path is taken here as the
total number of nodes in that path, so in the example below the path A - B - E - J has length 4.
(Many texts count edges instead, which would give 3 for the same path.)

13. Sub Tree: In a Tree data structure, each child from a node forms a subtree
recursively. Every child node will form a subtree on its parent node.
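
The terminology above can be checked on a small tree. The following is a minimal Python sketch, not a reference implementation. The tree shape (A with children B and C, B with D, E, F, E with J, C with G) is a hypothetical example chosen to agree with the relations mentioned in these notes, and the names `children`, `kids`, `degree`, `height`, `depth` and `all_nodes` are illustrative only.

```python
# Hypothetical tree stored as a parent -> list-of-children mapping.
children = {
    "A": ["B", "C"],
    "B": ["D", "E", "F"],
    "C": ["G"],
    "E": ["J"],
}
ROOT = "A"

def kids(node):
    """Children of a node (leaves are simply absent from the mapping)."""
    return children.get(node, [])

def degree(node):
    """Degree of a node = number of children it has."""
    return len(kids(node))

def height(node):
    """Edges on the longest downward path from node to a leaf."""
    if not kids(node):
        return 0
    return 1 + max(height(c) for c in kids(node))

def depth(target, node=ROOT, d=0):
    """Edges from the root to target, or None if target is not in the tree."""
    if node == target:
        return d
    for c in kids(node):
        found = depth(target, c, d + 1)
        if found is not None:
            return found
    return None

def all_nodes(node=ROOT):
    """Yield every node, parent before children."""
    yield node
    for c in kids(node):
        yield from all_nodes(c)

nodes = list(all_nodes())
edge_count = sum(degree(n) for n in nodes)      # one edge per child link
leaves = [n for n in nodes if degree(n) == 0]   # external / terminal nodes
tree_degree = max(degree(n) for n in nodes)     # highest degree of any node

print(len(nodes), edge_count)   # N nodes and N-1 edges
print(height(ROOT), depth("J"), tree_degree, leaves)
```

With this shape the tree has 8 nodes and 7 edges, the height of the root and the depth of J are both 3, and the degree of the tree is 3.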

Tree Traversal

Traversal is the process of visiting every node of a tree, optionally printing each value along the
way. Because all nodes are connected via edges (links), we always start from the root (head) node;
we cannot randomly access a node in a tree. There are three ways which we use to traverse a tree −

• In-order Traversal
• Pre-order Traversal
• Post-order Traversal

Generally, we traverse a tree to search or locate a given item or key in the tree or to print all the
values it contains.

In-order Traversal

In this traversal method, the left subtree is visited first, then the root and later the right sub-tree.
We should always remember that every node may represent a subtree itself.

If a binary search tree is traversed in-order, the output will produce the key values sorted in
ascending order.

We start from A, and following in-order traversal, we move to its left subtree B. B is also
traversed in-order. The process goes on until all the nodes are visited. The output of in-order
traversal of this tree will be −

D→B→E→A→F→C→G

Pre-order Traversal

In this traversal method, the root node is visited first, then the left subtree and finally the right
subtree.

We start from A, and following pre-order traversal, we first visit A itself and then move to its left
subtree B. B is also traversed pre-order. The process goes on until all the nodes are visited. The
output of pre-order traversal of this tree will be −

A→B→D→E→C→F→G

Post-order Traversal

In this traversal method, the root node is visited last, hence the name. First we traverse the left
subtree, then the right subtree and finally the root node.

We start from A, and following post-order traversal, we first visit the left subtree B. B is also
traversed post-order. The process goes on until all the nodes are visited. The output of post-order
traversal of this tree will be −

D→E→B→F→G→C→A
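
The three traversals can be sketched recursively in Python. This is a minimal illustration that assumes the example tree has A at the root, B and C as its children, D and E under B, and F and G under C, which is the shape implied by the three outputs listed above. The `Node` class and function names are illustrative only.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def inorder(node):
    """Left subtree, then the node itself, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

def preorder(node):
    """The node itself, then the left subtree, then the right subtree."""
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)

def postorder(node):
    """Left subtree, then the right subtree, then the node itself."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]

# Assumed shape of the example tree.
root = Node("A",
            Node("B", Node("D"), Node("E")),
            Node("C", Node("F"), Node("G")))

print("in-order:  ", " ".join(inorder(root)))
print("pre-order: ", " ".join(preorder(root)))
print("post-order:", " ".join(postorder(root)))
```

Each function returns a list rather than printing directly, so the visiting order can be compared against the sequences given in the text.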

Binary Tree ADT in Data Structure

Basic concept

A binary tree is defined as a tree in which no node can have more than two children. The highest
degree of any node is two, so every node in a binary tree has degree zero, one or two.

In the above fig., the binary tree consists of a root and two sub trees TreeLeft & TreeRight. All
nodes to the left of the binary tree are denoted as left subtrees and all nodes to the right of a
binary tree are referred to as right subtrees.

Implementation

Since a node in a binary tree has at most two children, we can keep a direct pointer to each of
them. The declaration of a tree node is similar in structure to that of a doubly linked list node:
a structure including the key information plus two pointers (left and right) to other nodes.

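
As a sketch, such a node could be declared in Python with a dataclass. The name `BinaryNode` and the helper `count_nodes` are assumptions made for illustration; in C the same idea would be a struct with two pointer fields.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class BinaryNode:
    key: Any                                # the key information
    left: Optional["BinaryNode"] = None     # pointer to the left child
    right: Optional["BinaryNode"] = None    # pointer to the right child

def count_nodes(node):
    """Count the nodes reachable from node (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)

# A root with two children; missing children are represented by None.
root = BinaryNode(1, BinaryNode(2), BinaryNode(3))
print(count_nodes(root))
```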
Types of Binary Tree

Strictly binary tree

Strictly binary tree is defined as a binary tree where all the nodes will have either zero or two
children. It does not include one child in any node.

Skew tree

A skew tree is defined as a binary tree in which every node except the leaf has only one child
node. There are two types of skew tree, i.e. left skewed binary tree and right skewed binary tree.

Left skewed binary tree

A left skew tree has nodes associated with only the left child. It is a binary tree that contains
only left subtrees.

Right skewed binary tree

A right skew tree has nodes associated with only the right child. It is a binary tree that contains
only right subtrees.

Full binary tree or proper binary tree

A binary tree is defined as a full binary tree if all leaves are at the same level and every non-leaf
node has exactly two children, so that every level holds the highest possible number of nodes.
A full binary tree of height h has 2^(h+1) − 1 nodes, the maximum for any binary tree of that height.
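
The node count can be verified with a short sketch. It assumes height is counted in edges (a single node has height 0), as in the terminology section, and represents a tree as nested tuples purely for brevity. `build_full` and `count` are illustrative names.

```python
def build_full(height):
    """Tree with every level filled: a leaf is (), an internal node is (left, right)."""
    if height == 0:
        return ()
    subtree = build_full(height - 1)
    return (subtree, subtree)

def count(tree):
    """Number of nodes: this node plus the nodes of each child."""
    return 1 + sum(count(child) for child in tree)

for h in range(5):
    print(h, count(build_full(h)), 2 ** (h + 1) - 1)
```

Each level i holds 2^i nodes, and summing 2^0 + 2^1 + ... + 2^h gives 2^(h+1) − 1, which is what the loop prints in both columns.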

Complete binary tree

A complete binary tree is defined as one where all levels have the highest number of nodes except
possibly the last level, whose elements are filled from left to right. The leaves therefore need
not all be at the same level, and at most one non-leaf node may have only a left child.

Almost complete binary tree

An almost complete binary tree is defined as a tree in which each node that has a right child also
has a left child. Having a left child does not require a node to have a right child.

Differences between General Tree and Binary Tree

General Tree

• General tree has no limit of number of children.
• Evaluating any expression is hard in general trees.

Binary Tree

• A binary tree has maximum two children.
• Evaluation of expression is simple in binary tree.

Application of trees

• Manipulation of arithmetic expression
• Construction of symbol table
• Analysis of Syntax
• Writing Grammar
• Creation of Expression Tree

Expression tree in data structure

The expression tree is a tree used to represent the various expressions. The tree data structure is
used to represent the expressional statements. In this tree, the internal node always denotes the
operators.

o The leaf nodes always denote the operands.
o The operations are always performed on these operands.
o An operator deeper in the tree has higher priority: it is evaluated before the operators above it.
o An operator closer to the root has lower priority than the operators lying deeper in the tree.
o The operands always sit at the leaves, the deepest positions of their branches, so their values
are available before any operator above them is applied.
o In short, whatever lies deeper in the tree is evaluated before the operators nearer the top of
the tree.
o The main use of these expression trees is that it is used to evaluate,
analyze and modify the various expressions.
o It is also used to find out the associativity of each operator in the expression.
o For example, the + and / operators are left-associative, while exponentiation (^) is usually right-associative.
o The dilemma of this associativity has been cleared by using the expression trees.
o These expression trees are formed by using a context-free grammar.
o We have associated a rule in context-free grammars in front of each grammar production.
o These rules are also known as semantic rules, and by using these semantic rules, we can
be easily able to construct the expression trees.
o It is one of the major parts of compiler design and belongs to the semantic analysis phase.
o In this semantic analysis, we will use the syntax-directed translations, and in the form of
output, we will produce the annotated parse tree.
o An annotated parse tree is nothing but the simple parse associated with the type attribute
and each production rule.
o The main objective of using expression trees is to represent complex expressions in a form that
can be evaluated easily.
o In some implementations an expression tree is immutable: once it has been created, it cannot
be changed or modified further.
o To make a modification in that case, a new expression tree must be constructed.
o It is also used to solve the postfix, prefix, and infix expression evaluation.

Expression trees play a very important role in representing the language-level code in the form
of the data, which is mainly stored in the tree-like structure. It is also used in the memory
representation of the lambda expression. Using the tree data structure, we can express the
lambda expression more transparently and explicitly. It is first created to convert the code
segment onto the data segment so that the expression can easily be evaluated.

The expression tree is a binary tree in which each external or leaf node corresponds to the
operand and each internal or parent node corresponds to the operators so for example expression
tree for 7 + ((1+8)*3) would be:
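
As a minimal sketch, that tree can be built and evaluated in Python. The `Expr` class, the `OPS` table and the function names are assumptions made for illustration. Internal nodes hold an operator symbol and leaves hold a number.

```python
import operator

# Map each operator symbol to the function that applies it.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

class Expr:
    def __init__(self, value, left=None, right=None):
        self.value = value      # operator symbol or operand
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None

def evaluate(node):
    """Evaluate bottom-up: operands first, then the operator above them."""
    if node.is_leaf():
        return node.value
    return OPS[node.value](evaluate(node.left), evaluate(node.right))

def postfix(node):
    """Post-order traversal of the tree gives the postfix form."""
    if node.is_leaf():
        return str(node.value)
    return f"{postfix(node.left)} {postfix(node.right)} {node.value}"

# 7 + ((1 + 8) * 3): '+' at the root, '*' below it, the inner '+' deepest.
tree = Expr("+", Expr(7), Expr("*", Expr("+", Expr(1), Expr(8)), Expr(3)))

print(evaluate(tree))   # 34
print(postfix(tree))    # 7 1 8 + 3 * +
```

The deepest operator (1 + 8) is applied first, then the multiplication by 3, and the root addition last, which matches the priority rule stated above.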

Applications of Tree:
• File Systems: The file system of a computer is often represented as a tree. Each folder or
directory is a node in the tree, and files are the leaves.
• XML Parsing: Trees are used to parse and process XML documents. An XML document
can be thought of as a tree, with elements as nodes and attributes as properties of the nodes.
• Database Indexing: Many databases use trees to index their data. The B-tree and its
variations are commonly used for this purpose.
• Compiler Design: A compiler represents the syntactic structure of a program as a tree
called a parse tree. The compiler uses it to understand the structure of the
code and generate machine code from it.
• Artificial Intelligence: Decision trees are often used in artificial intelligence to make
decisions based on a series of criteria.

Binary Search tree

What is a Binary Search tree?

A binary search tree follows some order to arrange the elements. In a Binary search tree, the
value of left node must be smaller than the parent node, and the value of right node must be
greater than the parent node. This rule is applied recursively to the left and right subtrees of the
root.

Let's understand the concept of Binary search tree with an example.

In the above figure, we can observe that the root node is 40, and all the nodes of the left subtree
are smaller than the root node, and all the nodes of the right subtree are greater than the root
node.
Similarly, we can see the left child of root node is greater than its left child and smaller than its
right child. So, it also satisfies the property of binary search tree. Therefore, we can say that the
tree in the above image is a binary search tree.

Suppose we change the value of node 35 to 55 in the above tree and check whether the tree is still
a binary search tree.

In the above tree, the root node is 40 and its left child is 30, but the right child of 30 is now
55, which is greater than the root even though it lies in the root's left subtree. So, the above
tree does not satisfy the property of Binary search tree. Therefore, the above tree is not a binary
search tree.
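
This check can be sketched in Python by passing the allowed key range down the tree. Only the values 40, 30, 35 and 55 come from the example above; the remaining keys (25, 50, 45, 60) and the names `Node` and `is_bst` are assumptions made for illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def is_bst(node, low=None, high=None):
    """True if every key in the subtree lies strictly between low and high
    and the same holds recursively for both subtrees."""
    if node is None:
        return True
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    return (is_bst(node.left, low, node.key) and
            is_bst(node.right, node.key, high))

def make_tree(right_child_of_30):
    # Hypothetical tree: only 40, 30 and the right child of 30 follow the text.
    return Node(40,
                Node(30, Node(25), Node(right_child_of_30)),
                Node(50, Node(45), Node(60)))

print(is_bst(make_tree(35)))   # True
print(is_bst(make_tree(55)))   # False: 55 sits in the left subtree of 40
```

Comparing a node only with its own parent would wrongly accept the second tree, because 55 is greater than 30. Carrying the bounds inherited from every ancestor is what catches the violation against 40.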

Advantages of Binary search tree


o Searching for an element in a binary search tree is easy, as at every node we have a hint about
which subtree holds the desired element.
o Compared to arrays and linked lists, insertion and deletion are faster in a BST: when the tree
is balanced, each takes O(log n) time and no elements need to be shifted.
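
The "hint" used during search can be seen in a minimal Python sketch of insertion and lookup. The keys are hypothetical, duplicates are simply ignored here as a design assumption, and the names `Node`, `insert` and `search` are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Return (found, nodes_visited), discarding one subtree at every step."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return True, visited
        node = node.left if key < node.key else node.right
    return False, visited

root = None
for k in [40, 30, 50, 25, 35, 45, 60]:   # hypothetical keys
    root = insert(root, k)

print(search(root, 35))   # (True, 3)
print(search(root, 55))   # (False, 3)
```

With seven keys arranged in a balanced shape, any lookup visits at most three nodes. On a skewed tree the same code degrades to visiting every node, which is the motivation for the self-balancing trees described later.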

Threaded Binary Tree

This section describes the threaded binary tree in detail.

What do you mean by Threaded Binary Tree?

In the linked representation of binary trees, more than one half of the link fields contain NULL
values which results in wastage of storage space. If a binary tree consists of n nodes
then n+1 link fields contain NULL values. So in order to effectively manage the space, a method
was devised by Perlis and Thornton in which the NULL links are replaced with special links
known as threads. Such binary trees with threads are known as threaded binary trees. Each
node in a threaded binary tree either contains a link to its child node or thread to other nodes in
the tree.
Types of Threaded Binary Tree

There are two types of threaded Binary Tree:

o One-way threaded Binary Tree
o Two-way threaded Binary Tree

One-way threaded Binary trees:

In one-way threaded binary trees, a thread will appear either in the right or left link field of a
node. If it appears in the right link field of a node then it will point to the next node that will
appear on performing inorder traversal. Such trees are called Right threaded binary trees. If the
thread appears in the left field of a node then it will point to the node's inorder predecessor. Such
trees are called Left threaded binary trees. Left threaded binary trees are used less often as they
don't yield the same advantages as right threaded binary trees. In one-way threaded binary trees,
the right link field of the last node and the left link field of the first node contain NULL. In
order to distinguish threads from normal links they are represented by dotted lines.

The above figure shows a binary tree whose inorder traversal yields D, B, E, A, C, F. When
this tree is represented as a right threaded binary tree, the right link field of leaf node D, which
contains a NULL value, is replaced with a thread that points to node B, the inorder
successor of node D. In the same way, every other node whose right link field contains NULL,
except the last node in inorder, receives a thread to its inorder successor.
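
A right threaded tree can be sketched in Python as follows. The tree shape (A at the root, B with children D and E on the left, C with only a right child F on the right) is an assumption chosen so that the inorder sequence is D, B, E, A, C, F. The boolean flag `right_is_thread` and the names `TNode`, `add_right_threads`, `leftmost` and `inorder_threaded` are illustrative.

```python
class TNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.right_is_thread = False   # True when right points to the successor

def add_right_threads(root):
    """Replace each NULL right link with a thread to the inorder successor."""
    prev = None
    def visit(node):
        nonlocal prev
        if node is None:
            return
        visit(node.left)
        if prev is not None and prev.right is None:
            prev.right = node          # node is the inorder successor of prev
            prev.right_is_thread = True
        prev = node
        visit(node.right)              # threads are only set on finished nodes
    visit(root)

def leftmost(node):
    while node.left is not None:
        node = node.left
    return node

def inorder_threaded(root):
    """Inorder traversal with no recursion and no stack."""
    out = []
    node = leftmost(root) if root is not None else None
    while node is not None:
        out.append(node.key)
        if node.right_is_thread:
            node = node.right              # follow the thread
        elif node.right is not None:
            node = leftmost(node.right)    # real child: go to its leftmost node
        else:
            node = None                    # last node: right link stays NULL
    return out

a, b, c, d, e, f = (TNode(k) for k in "ABCDEF")
a.left, a.right = b, c
b.left, b.right = d, e
c.right = f
add_right_threads(a)
print(inorder_threaded(a))
```

After threading, D points to B and E points to A through threads, while F, the last node in inorder, keeps a NULL right link.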

Two-way threaded Binary Trees:

In two-way threaded Binary trees, the right link field of a node containing a NULL value is
replaced by a thread that points to the node's inorder successor, and the left link field of a node
containing a NULL value is replaced by a thread that points to the node's inorder predecessor.

The above figure shows a binary tree whose inorder traversal yields D, B, E, G, A, C, F. In
the two-way threaded Binary tree, the NULL left field of node E is
replaced by a thread pointing to its inorder predecessor, i.e. node B. Similarly, for node G, whose
right and left link fields both contain NULL, the right link field is replaced by a thread
to its inorder successor and the left link field by a thread to its inorder predecessor. In the same
way, other nodes containing NULL values in their link fields are filled with threads.

In the above figure of two-way threaded Binary tree, we noticed that no left thread is possible for
the first node and no right thread is possible for the last node. This is because they don't have any
inorder predecessor and successor respectively. This is indicated by threads pointing nowhere.
So in order to maintain the uniformity of threads, we maintain a special node called the header
node. The header node does not contain any data part and its left link field points to the root
node and its right link field points to itself. If this header node is included in the two-way
threaded Binary tree then this node becomes the inorder predecessor of the first node and inorder
successor of the last node. Now threads of left link fields of the first node and right link fields of
the last node will point to the header node.

AVL Trees

The first type of self-balancing binary search tree to be invented is the AVL tree. The name AVL
tree is coined after its inventors' names − Adelson-Velsky and Landis.

In AVL trees, the difference between the heights of the left and right subtrees of every node,
known as the Balance Factor, must be at most one in absolute value. Once the difference exceeds
one, the tree executes the balancing algorithm until the difference is at most one again.

BALANCE FACTOR =
HEIGHT(LEFT SUBTREE) − HEIGHT(RIGHT SUBTREE)

There are usually four cases of rotation in the balancing algorithm of AVL trees: LL, RR, LR,
RL.

LL Rotations

LL rotation is performed when a node is inserted into the right subtree of the right child, leaving
the tree right-heavy. A single left rotation makes the tree balanced again. In these notes the name
describes the direction of the rotation; some textbooks name the case after the side of the
insertion instead.

The node where the imbalance occurs becomes the left child and the newly added node becomes
the right child with the middle node as the parent node.

RR Rotations

RR rotation is performed when a node is inserted into the left subtree of the left child, leaving
the tree left-heavy. A single right rotation makes the tree balanced again.

The node where the imbalance occurs becomes the right child and the newly added node
becomes the left child with the middle node as the parent node.

LR Rotations

LR rotation is the extended version of the previous single rotations, also called a double rotation.
It is performed when a node is inserted into the right subtree of the left subtree. The LR rotation
is a combination of the left rotation followed by the right rotation. There are multiple steps to be
followed to carry this out.

• Consider an example with “A” as the root node, “B” as the left child of “A” and “C” as
the right child of “B”.
• Since the imbalance occurs at A, a left rotation is applied on the child nodes of A, i.e. B
and C.
• After the rotation, the C node becomes the left child of A and B becomes the left child of
C.
• The imbalance still persists, therefore a right rotation is applied at the root node A and the
left child C.
• After the final right rotation, C becomes the root node, A becomes the right child and B is
the left child.

RL Rotations

RL rotation is also an extended version of the previous single rotations, hence it is called a
double rotation. It is performed if a node is inserted into the left subtree of the right subtree.
The RL rotation is a combination of the right rotation followed by the left rotation. There are
multiple steps to be followed to carry this out.

• Consider an example with “A” as the root node, “B” as the right child of “A” and “C” as
the left child of “B”.
• Since the imbalance occurs at A, a right rotation is applied on the child nodes of A, i.e. B
and C.
• After the rotation, the C node becomes the right child of A and B becomes the right child
of C.
• The imbalance still persists, therefore a left rotation is applied at the root node A and the
right child C.
• After the final left rotation, C becomes the root node, A becomes the left child and B is
the right child.
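
The four cases can be sketched in Python as an insertion routine that rebalances on the way back up. This is a minimal illustration rather than a complete AVL implementation (deletion is omitted). It follows the naming used in these notes, where LL means a single left rotation and RR a single right rotation. All names are illustrative, and a leaf is given height 0 as in the terminology section.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0            # a leaf has height 0

def height(node):
    return node.height if node is not None else -1

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    return height(node.left) - height(node.right)

def rotate_left(a):
    """Single left rotation: a's right child b becomes the parent of a."""
    b = a.right
    a.right = b.left
    b.left = a
    update(a)
    update(b)
    return b

def rotate_right(a):
    """Single right rotation: a's left child b becomes the parent of a."""
    b = a.left
    a.left = b.right
    b.right = a
    update(a)
    update(b)
    return b

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node                               # duplicate: nothing to do
    update(node)
    bf = balance_factor(node)
    if bf < -1:                                   # right-heavy
        if balance_factor(node.right) > 0:        # RL case: right, then left
            node.right = rotate_right(node.right)
        return rotate_left(node)                  # LL case in these notes
    if bf > 1:                                    # left-heavy
        if balance_factor(node.left) < 0:         # LR case: left, then right
            node.left = rotate_left(node.left)
        return rotate_right(node)                 # RR case in these notes
    return node

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

for keys in ([1, 2, 3], [3, 2, 1], [3, 1, 2], [1, 3, 2]):
    r = build(keys)
    print(keys, "->", r.left.key, r.key, r.right.key)
```

All four insertion orders end with 2 at the root, 1 on the left and 3 on the right. In the LR example, with A = 3, B = 1 and C = 2, C becomes the root, A the right child and B the left child, as described in the steps above.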

B Trees

B trees are extended binary search trees that are specialized in m-way searching, since the order
of B trees is 'm'. Order of a tree is defined as the maximum number of children a node can
accommodate. Therefore, the height of a B tree is relatively smaller than the height of an AVL
tree or a red-black (RB) tree.

They are a general form of a Binary Search Tree, as a node can hold more than one key and more
than two children. The various properties of B trees include −

• Every node in a B Tree will hold a maximum of m children and (m-1) keys, since the
order of the tree is m.
• Every node in a B tree, except the root and the leaves, must hold at least ⌈m/2⌉ children.
• The root node must have no less than two children, unless it is itself a leaf.
• All the paths in a B tree must end at the same level, i.e. the leaf nodes must be at the same
level.
• A B tree always maintains sorted data.

B trees are also widely used in disk access, minimizing the disk access time since the height of a
B tree is low.

Note − A disk access is the memory access to the computer disk where the information is stored
and disk access time is the time taken by the system to access the disk memory.

Basic Operations of B Trees

The operations supported in B trees are Insertion, deletion and searching with the time
complexity of O(log n) for every operation.

Insertion operation

The insertion operation for a B Tree is done similar to the Binary Search Tree but the elements
are inserted into the same node until the maximum number of keys is reached. The insertion is
done using the following procedure −

Step 1 − Calculate the maximum (m−1) and minimum (⌈m/2⌉−1) number of keys a node can hold,
where m is the order of the B Tree.

Step 2 − The data is inserted into the tree using binary search insertion, and once a node would
exceed the maximum number of keys, it is split in half: the median key moves up into the parent
(internal) node, while the keys to its left and right form its two children.

Step 3 − All the leaf nodes must be on the same level.
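
Steps 1 and 2 can be illustrated with a small Python sketch. It only computes the key limits and splits one overfull node; it is not a full B tree insertion. The order m = 5 and the keys are hypothetical, and `key_bounds` and `split` are illustrative names.

```python
import math

def key_bounds(m):
    """(minimum, maximum) number of keys for a non-root node of order m."""
    return math.ceil(m / 2) - 1, m - 1

def split(keys):
    """Split an overfull sorted key list into (left keys, median, right keys).
    The median is the key that moves up to the parent."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

m = 5
min_keys, max_keys = key_bounds(m)
print(min_keys, max_keys)              # 2 4

node = [10, 20, 30, 40]                # a full node: max_keys keys
node = sorted(node + [50])             # inserting 50 overflows it
if len(node) > max_keys:
    left, median, right = split(node)
    print(left, median, right)         # [10, 20] 30 [40, 50]
```

Both halves keep at least the minimum number of keys after the split, which is why the minimum is tied to ⌈m/2⌉. When the overfull node holds an even number of keys, which of the two middle keys moves up is a convention; this sketch takes the upper one.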

The keys 5, 3, 21, 9, 13 are all added into the node according to the binary search property, but
if we add the key 22, it will violate the maximum key property. Hence, the node is split in half,
the median key is shifted to the parent node and the insertion is then continued.

Another overflow occurs during the insertion of 11, so the node is split and the median is shifted
to the parent.

While inserting 16, even if the node is split in two parts, the parent node also overflows as it
has reached the maximum number of keys. Hence, the parent node is split first and its median key
becomes the root. Then, the leaf node is split in half and the median of the leaf node is shifted
to its parent.

The final B tree after inserting all the elements is achieved.


B+ Tree

B+ Tree is an extension of B Tree which allows efficient insertion, deletion and search
operations.

In a B Tree, both keys and records can be stored in the internal as well as the leaf nodes, whereas
in a B+ tree, records (data) can only be stored on the leaf nodes while internal nodes can only
store the key values.

The leaf nodes of a B+ tree are linked together in the form of a singly linked list to make
search queries more efficient.

B+ Trees are used to store large amounts of data which cannot be held in main memory.
Because the size of main memory is always limited, the internal nodes (keys to access
records) of the B+ tree are stored in main memory, whereas leaf nodes are stored in
secondary memory.

The internal nodes of B+ tree are often called index nodes. A B+ tree of order 3 is shown in the
following figure.
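
The benefit of linking the leaves can be seen in a range query, sketched below in Python. This is a simplified illustration: it models only the leaf level, starts the scan at the first leaf instead of descending through index nodes, and uses hypothetical keys and record names. `Leaf` and `range_scan` are illustrative names.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys, records):
        self.keys = keys          # sorted keys stored in this leaf
        self.records = records    # records[i] belongs to keys[i]
        self.next = None          # link to the next leaf in key order

def range_scan(first_leaf, lo, hi):
    """Collect the records with lo <= key <= hi by walking the leaf chain."""
    out = []
    leaf = first_leaf
    while leaf is not None:
        start = bisect_left(leaf.keys, lo)     # skip keys smaller than lo
        for key, record in zip(leaf.keys[start:], leaf.records[start:]):
            if key > hi:
                return out                     # past the range: stop early
            out.append(record)
        leaf = leaf.next                       # sequential access via the link
    return out

# Three linked leaves holding hypothetical keys and records.
l1 = Leaf([10, 20], ["r10", "r20"])
l2 = Leaf([30, 40], ["r30", "r40"])
l3 = Leaf([50, 60], ["r50", "r60"])
l1.next, l2.next = l2, l3

print(range_scan(l1, 20, 50))
```

Once the first matching leaf is reached, the remaining records of the range are read by following `next` links, without going back up through the index nodes.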

Advantages of B+ Tree

1. Records can be fetched in equal number of disk accesses.
2. The height of the tree remains balanced and is lower compared to a B tree.
3. We can access the data stored in a B+ tree sequentially as well as directly.
4. Keys are used for indexing.
5. Faster search queries as the data is stored only on the leaf nodes.

Insertion in B+ Tree

Step 1: Insert the new key into the appropriate leaf node.

Step 2: If the leaf doesn't have the required space, split the node and copy the middle key up to
the parent index node.

Step 3: If the index node doesn't have the required space, split the node and move the middle
element up to the next index page.

Example :

Insert the value 195 into the B+ tree of order 5 shown in the following figure.

195 will be inserted in the right sub-tree of 120 after 190. Insert it at the desired position.

The node now contains more than the maximum number of elements, i.e. 4, therefore split it and
place the median key up in the parent.

Now, the index node contains 6 children and 5 keys, which violates the B+ tree properties,
therefore we need to split it, shown as follows.

Deletion in B+ Tree

Step 1: Delete the key and data from the leaves.

Step 2: If the leaf node contains less than the minimum number of elements, merge the node
with its sibling and delete the key in between them.

Step 3: If the index node contains less than the minimum number of elements, merge the node with
its sibling and move down the key in between them.

Example

Delete the key 200 from the B+ Tree shown in the following figure.

200 is present in the right sub-tree of 190, after 195. Delete it.

Merge the two nodes by using 195, 190, 154 and 129.

Now, element 120 is the single element present in the node, which violates the B+ Tree
properties. Therefore, we need to merge it by using 60, 78, 108 and 120.

Now, the height of B+ tree will be decreased by 1.

B Tree VS B+ Tree

SN | B Tree | B+ Tree
1 | Search keys cannot be stored repeatedly. | Redundant search keys can be present.
2 | Data can be stored in leaf nodes as well as internal nodes. | Data can only be stored on the leaf nodes.
3 | Searching for some data is a slower process since data can be found on internal nodes as well as on the leaf nodes. | Searching is comparatively faster as data can only be found on the leaf nodes.
4 | Deletion of internal nodes is complicated and time consuming. | Deletion is never a complex process since an element is always deleted from the leaf nodes.
5 | Leaf nodes are not linked together. | Leaf nodes are linked together to make the search operations more efficient.

Heap Data Structure

Heap is a special case of a balanced binary tree data structure in which each node's key is
compared with the keys of its children and arranged accordingly. If α has child node β then −

key(α) ≥ key(β)

As the value of the parent is greater than or equal to that of the child, this property generates
a Max Heap. Based on this criterion, a heap can be of two types −

For Input → 35 33 42 10 14 19 27 44 26 31

Min-Heap − Where the value of each node is less than or equal to the values of its children, so
the root holds the smallest key.

Max-Heap − Where the value of each node is greater than or equal to the values of its children, so
the root holds the largest key.

Both trees are constructed using the same input and order of arrival.
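
Both heaps can be built from this input with Python's standard `heapq` module, which maintains a min-heap inside a list. The sketch below inserts the keys one at a time in order of arrival. Since `heapq` offers only a min-heap, the max-heap is simulated by negating the keys, which is a common workaround and an assumption of this example. `satisfies_min_heap` is an illustrative helper.

```python
import heapq

data = [35, 33, 42, 10, 14, 19, 27, 44, 26, 31]

# Min-heap: push the keys in order of arrival.
min_heap = []
for x in data:
    heapq.heappush(min_heap, x)

# Max-heap: heapq is a min-heap, so store negated keys.
neg_heap = []
for x in data:
    heapq.heappush(neg_heap, -x)

def satisfies_min_heap(a):
    """Array form of the heap property: the parent of index i is (i - 1) // 2."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

print("min-heap root:", min_heap[0])     # 10
print("max-heap root:", -neg_heap[0])    # 44
print(satisfies_min_heap(min_heap), satisfies_min_heap(neg_heap))
```

The root of the min-heap is the smallest input key and the root of the max-heap is the largest, whatever the order of arrival. Only the arrangement of the lower levels depends on that order.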

Application of Heap Data Structure:

• Priority queues: The heap data structure is commonly used to implement priority queues,
where elements are stored in a heap and ordered based on their priority. This allows
constant-time access to the highest-priority element, with insertion and removal in O(log n)
time, making it an efficient data structure for managing tasks or events that require prioritization.
• Heapsort algorithm: The heap data structure is the basis for the heapsort algorithm, which
is an efficient sorting algorithm with a worst-case time complexity of O(n log n). The
heapsort algorithm is used in various applications, including database indexing and
numerical analysis.
• Memory management: Some memory managers use heap-ordered structures to keep track of
free memory blocks so that a suitable block can be found quickly. Note that the region of
memory called "the heap" in dynamic allocation shares only the name and is not itself a
heap data structure.
• Graph algorithms: The heap data structure is used in various graph algorithms, including
Dijkstra’s algorithm, Prim’s algorithm, and Kruskal’s algorithm. These algorithms require
efficient priority queue implementation, which can be achieved using the heap data
structure.
• Job scheduling: The heap data structure is used in job scheduling algorithms, where tasks
are scheduled based on their priority or deadline. The heap data structure allows efficient
access to the highest-priority task, making it a useful data structure for job scheduling
applications.
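
The priority queue and job scheduling uses can be sketched with `heapq` as follows. The job names and priorities are hypothetical, a lower number is assumed to mean a more urgent job, and `run_in_priority_order` is an illustrative name.

```python
import heapq

def run_in_priority_order(jobs):
    """jobs: iterable of (priority, name); a lower number is more urgent.
    Returns the names in the order in which they would be run."""
    queue = []
    for arrival, (priority, name) in enumerate(jobs):
        # The arrival index breaks ties, so equal priorities run first-come first-served.
        heapq.heappush(queue, (priority, arrival, name))
    order = []
    while queue:
        _, _, name = heapq.heappop(queue)   # always removes the smallest entry
        order.append(name)
    return order

jobs = [(3, "backup"), (1, "alert"), (2, "deploy"), (1, "restart")]
print(run_in_priority_order(jobs))
```

Each push and pop costs O(log n), while the most urgent job is always available at index 0 of the list.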
