
Graduate Studies Program

Term: Fall 2022/2023

Computing

Lecture 3
Non-Linear Dynamic Data Structures (Trees)
(Draft)
These lecture notes have been compiled from different resources;
I’d like to thank all the authors who made them available for use.

Lecture Outline
◼ Trees Concepts and Implementation.
◼ Binary Search Tree (BST).
◼ Tree Traversal
◼ In-order Traversal
◼ Pre-order Traversal
◼ Post-order Traversal
◼ AVL Trees and Rotations Techniques
◼ Left rotation
◼ Right rotation
◼ Left-Right rotation
◼ Right-Left rotation
◼ Binary Trees.
◼ Heaps and Priority Queues.
◼ Encoding Messages and Huffman Coding Tree.
Tree Structures
◼ Tree structures permit both efficient access to and efficient updating of
large collections of data.
◼ A tree is a nonlinear, two-dimensional data structure.
◼ A tree is a finite nonempty set of elements.
◼ The elements are represented as nodes connected by edges.
◼ One of these elements is called the root.
◼ The remaining elements, if any, are partitioned into trees, which are
called the sub-trees.
◼ Trees are useful for hierarchically ordered data.
◼ Trees in particular are widely used and relatively easy to implement.
◼ Trees are useful for many things besides searching.
◼ Just a few examples of applications that trees can speed up include
prioritizing jobs, describing mathematical expressions and the syntactic
elements of computer programs, or organizing the information needed
to drive data compression algorithms.
Tree Properties
◼ Path − Path refers to the sequence of nodes along the edges of a tree.
◼ Root − The node at the top of the tree is called root. There is only one
root per tree and one path from the root node to any node.
◼ Child − A node directly below a given node, connected to it by an edge.
◼ A node with no children is called a leaf node.
◼ The children of a specific node are called siblings.
◼ Traversing−Traversing means passing through nodes in a specific order.
◼ Levels − The level of a node represents its generation. If the root
node is at level 0, then its children are at level 1, its grandchildren at level 2, …
◼ Depth or Height of a Tree = the number of levels.
◼ Degree of a Node = the number of its children.

Tree Implementation
◼ A tree data structure can be implemented using node
objects/structures connected to one another through references.
◼ Tree Node
◼ The code of a tree node has a data part and references to its left
and right child nodes.

struct node {
    int data;                 /* the value stored in this node */
    struct node *leftChild;   /* pointer to the left child (NULL if none) */
    struct node *rightChild;  /* pointer to the right child (NULL if none) */
};
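
◼ As a minimal sketch (the helper name newNode is our own, not from the slides), a new node can be allocated and initialized like this:

#include <stdlib.h>

/* Allocate a detached node holding the given value; both child
   pointers start out NULL. Returns NULL if allocation fails. */
struct node *newNode(int data) {
    struct node *n = malloc(sizeof(struct node));
    if (n != NULL) {
        n->data = data;
        n->leftChild = NULL;
        n->rightChild = NULL;
    }
    return n;
}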

Binary Search Tree (BST)
◼ A Binary Search Tree exhibits a special behavior:
◼ A node's left child must have a value less than its parent's value, and
the node's right child must have a value greater than its parent's
value.

◼ The basic operations that can be performed on a binary search
tree data structure are the following:
◼ Insert − Inserts an element in a tree/create a tree.
◼ Search − Searches an element in a tree.
◼ Pre-order Traversal − Traverses a tree in a pre-order manner.
◼ In-order Traversal − Traverses a tree in an in-order manner.
◼ Post-order Traversal − Traverses a tree in a post-order manner.

Insert Operation Algorithm
If root is NULL
    then create the root node
    return
// If the root exists, then
compare the data with node.data
do until the insertion position is located
    If data is greater than node.data
        go to the right subtree
    else
        go to the left subtree
enddo
insert data as a new leaf
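
◼ A sketch of this algorithm in C, reusing struct node and the newNode helper above (the recursive form is our own illustration; sending duplicates to the right is an assumption the slides do not specify):

/* Insert data into the BST rooted at root; returns the (possibly new) root. */
struct node *insert(struct node *root, int data) {
    if (root == NULL)
        return newNode(data);               /* empty position found */
    if (data < root->data)
        root->leftChild = insert(root->leftChild, data);
    else
        root->rightChild = insert(root->rightChild, data);
    return root;
}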

Search Operation
If root.data is equal to the search data
    return root
else
    while the current node is not NULL
        If data is greater than node.data
            go to the right subtree
        else if data is less than node.data
            go to the left subtree
        else            // data found
            return node
    endwhile
    return "data not found"
end if
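
◼ The same logic as an iterative C function, a sketch under the same assumptions as the insert above:

/* Return the node containing data, or NULL if it is not in the tree. */
struct node *search(struct node *root, int data) {
    struct node *cur = root;
    while (cur != NULL) {
        if (data > cur->data)
            cur = cur->rightChild;    /* larger keys are in the right subtree */
        else if (data < cur->data)
            cur = cur->leftChild;     /* smaller keys are in the left subtree */
        else
            return cur;               /* data found */
    }
    return NULL;                      /* data not found */
}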

Tree Traversal
◼ Traversal is a process to visit all the nodes of a tree.
◼ Generally, traversing a tree is performed to search or locate a
given item or key in the tree, or to print all the values it contains.
◼ Because all nodes are connected via edges (links), we always
start from the root (head) node; we cannot randomly access a
node in a tree.
◼ There are three ways in which we traverse a tree:
◼ In-order Traversal
◼ Pre-order Traversal
◼ Post-order Traversal

In-order Traversal
◼ In this traversal method, the left subtree is visited first, then the root,
and later the right subtree.
◼ If a binary tree is traversed in-order, the output will produce the stored
key values in ascending order.
◼ We start from A and, following in-order traversal, we move to its left
subtree B. B is also traversed in-order.
◼ The process goes on until all the nodes are visited.
◼ The output of in-order traversal of this tree will be:
◼ D → B → E → A → F → C → G

Pre-order Traversal
◼ In this traversal method, the root node is visited first, then the left
subtree, and finally the right subtree.
◼ We start from A and, following pre-order traversal, we first visit A itself
and then move to its left subtree B. B is also traversed pre-order.
◼ The process goes on until all the nodes are visited.
◼ The output of pre-order traversal of this tree will be:
◼ A → B → D → E → C → F → G

Post-order Traversal
◼ In this traversal method, the root node is visited last, hence the name.
First we traverse the left subtree, then the right subtree, and finally the
root node.
◼ We start from A and, following post-order traversal, we first visit the left
subtree B. B is also traversed post-order.
◼ The process goes on until all the nodes are visited.
◼ The output of post-order traversal of this tree will be:
◼ D → E → B → F → G → C → A

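◼ The three orders differ only in where the node itself is visited relative to its subtrees. A recursive C sketch, with printing standing in for “visiting”:

#include <stdio.h>

void inorder(struct node *n) {        /* left subtree, root, right subtree */
    if (n == NULL) return;
    inorder(n->leftChild);
    printf("%d ", n->data);
    inorder(n->rightChild);
}

void preorder(struct node *n) {       /* root, left subtree, right subtree */
    if (n == NULL) return;
    printf("%d ", n->data);
    preorder(n->leftChild);
    preorder(n->rightChild);
}

void postorder(struct node *n) {      /* left subtree, right subtree, root */
    if (n == NULL) return;
    postorder(n->leftChild);
    postorder(n->rightChild);
    printf("%d ", n->data);
}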
Balance out the existing BST
◼ What if the input to a binary search tree comes in a sorted (ascending or
descending) manner? The tree then degenerates into a linear chain of nodes.
◼ It is observed that a BST's worst-case performance is then the same as
that of linear search algorithms, that is, O(n).
◼ With real-world data we cannot predict the data pattern and its frequencies,
so a need arises to balance out the existing BST.

AVL Trees
◼ The AVL tree is named after its inventors, Adelson-Velsky and Landis.
◼ AVL tree checks the height of the left and the right sub-trees and
assures that the difference is not more than 1. This difference is called
the Balance Factor.
◼ Here we see that the first tree is balanced and the next two trees are
not balanced:

◼ In the second tree, the left subtree of C has height 2 and the right
subtree has height 0, so the difference is 2.
◼ In the third tree, the right subtree of A has height 2 and the left is
missing, so it is 0, and the difference is 2.
◼ An AVL tree permits the difference (balance factor) to be at most 1.
◼ BalanceFactor = height(left-subtree) − height(right-subtree)
AVL Rotations Techniques
◼ If the difference in the height of left and right sub-trees is more than 1,
the tree is balanced using some rotation techniques.
◼ Four kinds of rotations can be performed to balance a tree:
◼ Left rotation

◼ Right rotation

◼ Left-Right rotation

◼ Right-Left rotation

◼ The first two rotations are single rotations and the next two rotations
are double rotations.
◼ Left Rotation
◼ If a tree becomes unbalanced when a node is inserted into the right
subtree of the right subtree, then we perform a single left rotation:

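◼ A sketch of the two single rotations in C (height bookkeeping is omitted; a full AVL implementation would also update heights and balance factors after each rotation):

/* Left rotation: the right child y moves up; returns the new subtree root. */
struct node *rotateLeft(struct node *x) {
    struct node *y = x->rightChild;
    x->rightChild = y->leftChild;   /* y's left subtree is re-attached to x */
    y->leftChild = x;               /* x becomes y's left child */
    return y;
}

/* Right rotation: the mirror image of rotateLeft. */
struct node *rotateRight(struct node *y) {
    struct node *x = y->leftChild;
    y->leftChild = x->rightChild;
    x->rightChild = y;
    return x;
}

◼ The double rotations described below are compositions of these two: a left-right rotation is rotateLeft on the left child followed by rotateRight on the node itself, and a right-left rotation is the mirror image.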
Right Rotation
◼ In our example, node A became unbalanced when a node was inserted in
the right subtree of A's right subtree. We performed the left rotation by
making A the left subtree of B.
◼ AVL tree may become unbalanced, if a node is inserted in the left
subtree of the left subtree. The tree then needs a right rotation.

◼ Here, the unbalanced node becomes the right child of its left child by
performing a right rotation.

Left-Right Rotation
◼ Double rotations are slightly more complex versions of the rotations
already explained.
◼ A left-right rotation is a combination of a left rotation followed by a right
rotation.

◼ A node has been inserted into the right subtree of the left subtree. This
makes C an unbalanced node; such scenarios cause the AVL tree to perform a
left-right rotation.
◼ We first perform the left rotation on the left subtree of C,
this makes A, the left subtree of B.
◼ Node C is still unbalanced, however now,
it is because of the left-subtree of the left-subtree.
◼ We shall now right-rotate the tree,
making B the new root node of this subtree,
C now becomes the right subtree of its own left subtree.

Right-Left Rotation
◼ The second type of double rotation is Right-Left Rotation. It is a
combination of right rotation followed by left rotation.
◼ A node has been inserted into the left subtree of the right subtree,
this makes A, an unbalanced node with balance factor 2.

◼ First, we perform the right rotation around node C, making C the right
subtree of its own left subtree B. Now, B becomes the right subtree of A.

◼ Node A is still unbalanced because of the right subtree of its right subtree
and requires a left rotation.
◼ The tree is now balanced.

Binary Trees
◼ Has a root at the topmost level.
◼ Each node has zero, one, or two children.
◼ A node that has no child is called a leaf.
◼ For a node x, we denote the left child, right child and
the parent of x as left(x), right(x) and parent(x),
respectively.

Binary Trees

A complete binary tree is one where a node can have 0 (for the
leaves) or 2 children and all leaves are at the same depth.

[Figure: a complete tree (left) and an almost complete tree (right)]

Height (depth) of a binary tree
◼ The number of edges on the longest path from the
root to a leaf.

[Figure: an example binary tree with height = 4]

Height (depth) of a binary tree

◼ A complete binary tree of N nodes has height O(log N)

◼ Level d of the tree contains up to 2^d nodes, so a tree of depth d holds at
most 2^0 + 2^1 + … + 2^d = 2^(d+1) − 1 nodes in total; solving
N = 2^(d+1) − 1 for d gives the O(log N) height.
Heaps and Priority Queues
◼ There are many situations, both in real life and in computing
applications, where we wish to choose the next “most
important” from a collection of people, tasks, or objects.
◼ When scheduling programs for execution in a multitasking
operating system, at any given moment there might be several
programs (usually called jobs) ready to run. The next job
selected is the one with the highest priority.
◼ Priority is indicated by a particular value associated with the job
(and might change while the job remains in the waiting list).
◼ When a collection of objects is organized by importance or
priority, we call this a priority queue.
◼ A normal queue data structure will not implement a priority
queue efficiently, because searching for the element with the highest
priority takes Θ(n) time.

Heaps and Priority Queues
◼ A BST that organizes records by priority could be used, with a
total of n insert and n remove operations requiring Θ(n log n)
time in the average case.
◼ However, there is always the possibility that the BST will
become unbalanced, leading to bad performance.
◼ Instead, we would like to find a data structure that is
guaranteed to have good performance for this special
application such as heaps.
◼ A heap is a special case of a balanced binary tree data structure
where each node's key is compared with its children's keys and
arranged accordingly.

Heaps (memory pool)
Heaps are “almost complete binary trees” -- All levels are full
except possibly the lowest level.
A heap is a complete binary tree in which the entries obey an
ordering property between each node and its children.
◼ There are two variants of the heap
◼ A max-heap has the property that every node stores a value that
is greater than or equal to the value of either of its children. the
root stores the maximum of all values in the tree.
◼ A min-heap has the property that every node stores a value that is
less than or equal to that of its children. the root stores the
minimum of all values in the tree.
◼ To add an entry to a heap, place the new entry at the next
available spot, and perform a reheapification upward.
◼ To remove the biggest entry, move the last node onto the
root, and perform a reheapification downward.

Heaps and Priority Queues

Motivating Example
3 jobs have been submitted to a printer in the order A, B, C.
Sizes: Job A – 100 pages
Job B – 10 pages
Job C – 1 page

Average waiting time with FIFO service:
    (100 + 110 + 111) / 3 = 107 time units

Average waiting time with shortest-job-first service:
    (1 + 11 + 111) / 3 = 41 time units

Need to have a queue which does insert and deleteMin efficiently:
a Priority Queue
Common Implementations
1) Linked list
• Insert in O(1)
• Find the minimum element in O(n), thus deletion is O(n)

2) Search Tree (AVL tree)
• Insert in O(log n)
• Delete in O(log n)

Insertion
◼ Add the new element at the next available spot on the lowest level.
◼ Restore the min-heap property if violated:
◼ “bubble up”: if the parent of the element is larger than the element,
then interchange the parent and child, and repeat.

[Figure: insert 2.5 into the min-heap with root 1, children 2 and 5, and
leaves 4, 3, 6. The new element 2.5 is placed as a child of 5; percolating
up swaps 2.5 with 5, giving root 1, children 2 and 2.5, and leaves 4, 3, 6, 5.]
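
◼ A sketch of insertion with bubble-up on an array-based min-heap (the implicit layout, with the children of index i at 2i+1 and 2i+2 and its parent at (i−1)/2, is the standard representation; the names are our own):

/* Insert value into the min-heap stored in heap[0..*size-1]. */
void heapInsert(int heap[], int *size, int value) {
    int i = (*size)++;
    heap[i] = value;                          /* next available spot */
    while (i > 0 && heap[(i - 1) / 2] > heap[i]) {
        int tmp = heap[i];                    /* bubble up: swap with parent */
        heap[i] = heap[(i - 1) / 2];
        heap[(i - 1) / 2] = tmp;
        i = (i - 1) / 2;
    }
}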
28
DeleteMin: first attempt
1. Copy the last number to the root (i.e. overwrite the minimum
element stored there).
2. Restore the min-heap property by “bubble down”.

[Figures: deleteMin example. The last element is moved to the root and then
bubbled down, swapping with its smaller child, until the min-heap property is
restored.]
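
◼ A matching sketch of deleteMin with bubble-down, under the same array layout as the insertion sketch above:

/* Remove and return the minimum (the root) of the array-based min-heap. */
int heapDeleteMin(int heap[], int *size) {
    int min = heap[0];
    heap[0] = heap[--(*size)];        /* move the last element to the root */
    int i = 0;
    for (;;) {                        /* bubble down */
        int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
        if (left < *size && heap[left] < heap[smallest])
            smallest = left;
        if (right < *size && heap[right] < heap[smallest])
            smallest = right;
        if (smallest == i)
            break;                    /* heap property restored */
        int tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
    return min;
}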
Heap time complexity
◼ Heap supports the following operations efficiently
◼ Locate the current minimum in O(1) time
◼ Delete the current minimum in O(log N) time
◼ Insert node operation in O(height) = O(log n) time

Encoding Messages

◼ Encode a message composed of a string of characters


◼ Codes used by computer systems
◼ ASCII
◼ uses 8 bits per character
◼ can encode 256 characters
◼ Unicode
◼ 16 bits per character
◼ can encode 65536 characters
◼ includes all characters encoded by ASCII
◼ ASCII and Unicode are fixed-length codes
◼ all characters represented by same number of bits

Problems

◼ Suppose that we want to encode a message constructed from
the symbols A, B, C, D, and E using a fixed-length coding.
◼ How many bits are required to encode each symbol?
◼ At least 3 bits are required; 2 bits are not enough
(they can encode only four symbols).
◼ How many bits are required to encode the message
DEAACAAAAABA?
◼ There are twelve symbols, each requiring 3 bits, so
12 * 3 = 36 bits are required.

Drawbacks of Fixed-Length Coding
◼ Wasted space
◼ Same number of bits used to represent all characters
◼ ‘a’ and ‘e’ occur more frequently than ‘q’ and ‘z’

◼ Potential solution: use Variable-Length Coding


◼ variable number of bits to represent characters when
frequency of occurrence is known
◼ short codes for characters that occur frequently.

Advantages of Variable-Length Coding
◼ The advantage of variable-length coding over fixed-length coding is that
short codes can be given to characters that occur frequently:
◼ on average, the length of the encoded message is less than with
fixed-length encoding.
◼ Potential problem: how do we know where one character ends and
another begins?
◼ This is not a problem if the number of bits is fixed! With the fixed-length code

    A = 00    B = 01    C = 10    D = 11

the string 0010110111001111111111 decodes, two bits at a time, as
A C D B D A D D D D D.
Prefix Property Example

◼ A code has the prefix property if no character's code is the prefix
(start of the code) of another character's code.
◼ Example:

    Symbol   Code
    P        000
    Q        11
    R        01
    S        001
    T        10

◼ With this code, the bit string 01001101100010 decodes uniquely as R S T Q P T.
◼ 000 is not a prefix of 11, 01, 001, or 10.
◼ 11 is not a prefix of 000, 01, 001, or 10 …

Code without prefix property
◼ The following code does not have the prefix property; for example, P's
code ‘0’ is a prefix of R's code ‘01’:

    Symbol   Code
    P        0
    Q        1
    R        01
    S        10
    T        11

◼ The pattern 1110 can be decoded as QQQP, QTP, QQS, or TS, so decoding
is ambiguous.

Greedy Algorithms
◼ “Take what you can get now” strategy.
◼ Make locally optimal decision at each step.
◼ No regard for future consequences.
◼ Not always a correct thing to do.
◼ If a greedy algorithm can be proven to yield the
global optimum for a given problem class, it
typically becomes the method of choice.

Huffman Coding
◼ Huffman Coding is a greedy method of storing strings of data
as binary code in an efficient manner.
◼ Huffman coding is a form of lossless data compression, which means
no information is lost.
◼ Huffman coding uses variable-length coding, which means that the
symbols being encoded are converted to binary codes whose lengths
depend on how often each symbol is used.
◼ The structure used to assign the codes is a binary tree.

Building the Huffman tree
◼ The process of building the Huffman tree for n letters is quite simple.
◼ First, create a collection of n initial Huffman trees, each of which is a
single leaf node containing one of the letters.

◼ Put the n partial trees onto a min-heap (a priority queue)
organized by weight (frequency).
◼ Next, remove the first two trees (the ones with lowest weight) from
the heap.
◼ Join these two trees together to create a new tree whose root has the
two trees as children, and whose weight is the sum of the weights of the
two trees.
◼ Put this new tree back on the heap.
◼ This process is repeated until all of the partial Huffman trees have
been combined into one.

Building the Huffman tree
◼ Because the first two letters on the list are Z and K, they are selected
to be the first trees joined together.
◼ They become the children of a root node with weight 9.
◼ Thus, a tree whose root has weight 9 is placed back on the list, where it
takes up the first position.
◼ The next step is to take values 9 and 24 off the heap (corresponding
to the partial tree with two leaf nodes built in the last step, and the
partial tree storing the letter M, respectively) and join them together.
◼ The resulting root node has weight 33, and so this tree is placed
back into the heap.
◼ Its priority will be between the trees with values 32 (for letter C) and
37 (for letter U).
◼ This process continues until a tree whose root has weight 306 is built.

Building the Huffman tree

[Figures: the sequence of pairwise merges for this eight-letter example,
ending in a single Huffman tree whose root has weight 306.]
Assigning and Using Huffman Codes
◼ Once the Huffman tree has been constructed, it is easy
to assign codes to the individual letters.

◼ Each leaf contains a symbol (character). Beginning at the root, we
assign either a ‘0’ or a ‘1’ to each edge:
◼ label the edge from a node to its left child with 0
◼ label the edge from a node to its right child with 1

Assigning and Using Huffman Codes
◼ The Huffman code for a letter is simply a binary number
determined by the path from the root to the leaf
corresponding to that letter.
◼ Thus, the code for E is ‘0’ because the path from the root to the
leaf node for E takes a single left branch.
◼ The code for K is ‘111101’ because the path to the node for K
takes four right branches, then a left, and finally one last right.
◼ Given codes for the letters, it is a simple matter to use these
codes to encode a text message.
◼ We simply replace each letter in the string with its binary code. A
lookup table can be used for this purpose.

Assigning and Using Huffman Codes
◼ Decoding the message is done by looking at the bits in the coded
string from left to right until a letter is decoded.
◼ This can be done using the Huffman tree in a reverse process
from that used to generate the codes.
◼ Decoding a bit string begins at the root of the tree.
◼ We take branches depending on the bit value — left for ‘0’ and
right for ‘1’ — until reaching a leaf node.
◼ This leaf contains the first character in the message.
◼ We then process the next bit in the code restarting at the root to
begin the next character.

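◼ A sketch of this decoding loop in C, reusing struct node and assuming, as is common, that a leaf is recognized by having no children and that data holds the character:

#include <stdio.h>

/* Decode a string of '0'/'1' bits with the Huffman tree rooted at root. */
void decode(const struct node *root, const char *bits) {
    const struct node *cur = root;
    for (; *bits != '\0'; bits++) {
        cur = (*bits == '0') ? cur->leftChild : cur->rightChild;
        if (cur->leftChild == NULL && cur->rightChild == NULL) {
            putchar(cur->data);       /* reached a leaf: emit its character */
            cur = root;               /* restart at the root for the next one */
        }
    }
}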
Decode Example
◼ To decode the bit string “1011001110111101” we begin at the
root of the tree and take a right branch for the first bit which is
‘1’.
◼ Because the next bit is a ‘0’ we take a left branch.
◼ We then take another right branch (for the third bit ‘1’), arriving
at the leaf node corresponding to the letter D.
◼ Thus, the first letter of the coded word is D.
◼ We then begin again at the root of the tree to process the fourth
bit, which is a ‘1’.
◼ Taking a right branch, then two left branches (for the next two
bits which are ‘0’), we reach the leaf node corresponding to the
letter U. Thus, the second letter is U.
◼ In similar manner we complete the decoding process to find that
the last two letters are C and K, spelling the word “DUCK.”

Average Code word Length
◼ The average expected cost per character is simply the sum of the
cost for each character (c_i) times the probability of its occurring
(p_i), or

    average cost = c_1 p_1 + c_2 p_2 + … + c_n p_n
                 = (c_1 f_1 + c_2 f_2 + … + c_n f_n) / f_T

◼ where f_i is the (relative) frequency of letter i and f_T is the total
of all letter frequencies, so p_i = f_i / f_T. For this set of frequencies,
the expected cost per letter is about 2.57 bits.

◼ A fixed-length code for these eight characters would require log₂ 8
= 3 bits per letter, as opposed to about 2.57 bits per letter for
Huffman coding. Thus, Huffman coding is expected to save about
14% for this set of letters.

Building a Huffman tree
◼ Find frequencies of each symbol occurring in message
◼ Begin with a forest of single node trees
◼ each contain symbol and its frequency
◼ Repeat:
◼ select the two trees with the smallest frequencies at the root
◼ produce a new binary tree with the selected trees as children, and
store the sum of their frequencies in the root
◼ The repetition ends when only one tree remains
◼ this is the Huffman coding tree (a C sketch of the loop follows below)

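◼ A compact sketch of this procedure in C. The node type with its weight field, and the O(n²) scan for the two lightest roots, are our own simplifications; a real implementation would keep the forest in the min-heap described earlier:

#include <stdlib.h>

/* Huffman tree node: leaves hold a symbol, internal nodes hold 0. */
struct hnode {
    int weight;                 /* total frequency of this subtree */
    char symbol;
    struct hnode *left, *right;
};

static struct hnode *mkNode(int w, char s, struct hnode *l, struct hnode *r) {
    struct hnode *n = malloc(sizeof *n);
    n->weight = w; n->symbol = s; n->left = l; n->right = r;
    return n;
}

/* Build the Huffman tree from n (symbol, frequency) pairs; assumes n <= 256. */
struct hnode *buildHuffman(const char *syms, const int *freqs, int n) {
    struct hnode *forest[256];
    for (int i = 0; i < n; i++)
        forest[i] = mkNode(freqs[i], syms[i], NULL, NULL);
    while (n > 1) {
        /* find the indices a, b of the two lightest roots */
        int a = 0, b = 1;
        if (forest[b]->weight < forest[a]->weight) { a = 1; b = 0; }
        for (int i = 2; i < n; i++) {
            if (forest[i]->weight < forest[a]->weight) { b = a; a = i; }
            else if (forest[i]->weight < forest[b]->weight) { b = i; }
        }
        /* join them under a new root whose weight is the sum */
        struct hnode *merged = mkNode(forest[a]->weight + forest[b]->weight,
                                      0, forest[a], forest[b]);
        forest[a] = merged;           /* the merge replaces one tree ... */
        forest[b] = forest[--n];      /* ... and the other slot is filled in */
    }
    return forest[0];
}

◼ Running this on the frequencies of “This is his message” below would reproduce a tree equivalent to the one built in Steps 1–8 (individual 0/1 labels may differ, but the code lengths will not).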
Example
◼ Build the Huffman coding tree for the message
This is his message
◼ Character frequencies

A G M T E H _ I S

1 1 1 1 2 2 3 3 5

◼ Begin with forest of single trees

1 1 1 1 2 2 3 3 5
A G M T E H _ I S

Step 1
◼ Select the two trees with the smallest weights, A (1) and G (1).

Step 2
◼ A and G are joined under a new root of weight 2; likewise, M (1) and
T (1) are joined under a root of weight 2.

Step 3
◼ The two lightest trees are now E (2) and H (2); they are joined under a
root of weight 4.

Step 4
◼ The trees for (A, G) and (M, T), each of weight 2, are joined under a
root of weight 4.

Step 5
◼ _ (3) and I (3) are joined under a root of weight 6.

Step 6
◼ The forest now holds four trees, with root weights 4 (A, G, M, T),
4 (E, H), 6 (_, I), and 5 (S).

Step 7
◼ The two weight-4 trees are joined under a root of weight 8, and then the
weight-6 tree and S (5) are joined under a root of weight 11.

Step 8
◼ Finally, the weight-8 and weight-11 trees are joined; the root of the
completed Huffman tree has weight 19, the length of the message.

Label edges
◼ Label each edge from a node to its left child with 0 and each edge to its
right child with 1; the code for a symbol is read off along the path from
the root to its leaf.
Huffman code & encoded message

This is his message

    Code   Symbol
    11     S
    010    E
    011    H
    100    _
    101    I
    0000   A
    0001   G
    0010   M
    0011   T

Encoded message:
00110111011110010111100011101111000010010111100000001010
Huffman Encoding - Time Complexity
◼ Sort the keys (or build the initial heap): O(n log n)
◼ Repeat n − 1 times:
◼ form a new sub-tree from the two smallest trees: O(1)
◼ move the new sub-tree to its position in the ordered
collection: O(log n) (by binary search, or a heap insertion)
◼ the loop totals O(n log n)
◼ Overall: O(n log n)

Data Compression
◼ Huffman encoding is a simple example of data
compression representing data in fewer bits than it
would otherwise need.
◼ A more sophisticated method is GIF (Graphics
Interchange Format) compression, for .gif files
◼ Another is JPEG (Joint Photographic Experts Group),
for .jpg files
◼ Unlike the others, JPEG is lossy: it loses information
