Huffman's Algorithm Lecture 1
CS 102
The (Real) Basic Algorithm
1. Scan the text to be compressed and tally the occurrences of all characters.
2. Sort or prioritize characters based on their number of occurrences in the text.
3. Build the Huffman code tree based on the prioritized list.
4. Perform a traversal of the tree to determine all code words.
5. Scan the text again and create the new file using the Huffman codes.
INTRODUCTION TO HUFFMAN CODE:
• Huffman codes are an effective technique for 'lossless data compression', which means no information is lost.
• The algorithm builds a table of the frequencies of each character in a file.
• The table is then used to determine an optimal way of representing each character as a binary string.
HUFFMAN CODE:
• The algorithm as described by David Huffman assigns every symbol to a leaf node of a binary code tree. These nodes are weighted by the number of occurrences of the corresponding symbol, called the frequency or cost.
• The tree structure results from combining the nodes step by step until all of them are embedded under a single root.
• The algorithm always combines the two nodes with the lowest frequencies in a bottom-up procedure. Each new interior node gets the sum of the frequencies of both child nodes.
ALGORITHM
HUFFMAN CODING ALGORITHM:
• 1. Initialization: put all symbols on a list sorted according to their frequency counts.
• 2. Repeat until the list has only one symbol left:
   1. From the list, pick the two symbols with the lowest frequency counts.
   2. Form a Huffman sub-tree that has these two symbols as child nodes, and create a parent node for them.
   3. Assign the sum of the children's frequency counts to the parent, and insert it into the list such that the order is maintained.
   4. Delete the children from the list.
• 3. Assign a code word to each leaf based on the path from the root.
PSEUDOCODE
HUFFMAN(C)
C is the set of characters, each with a frequency attribute
1: n = |C|
2: Q = C                      (min-priority queue keyed on freq)
3: for i = 1 to n - 1
4:     allocate a new node z
5:     z.left = x = EXTRACT-MIN(Q)
6:     z.right = y = EXTRACT-MIN(Q)
7:     z.freq = x.freq + y.freq
8:     INSERT(Q, z)
9: return EXTRACT-MIN(Q)      (the root of the Huffman tree)
CODE:
struct tree_t
{
    struct tree_t *left;
    struct tree_t *right;
    char character;
};
• to store the tree elements (note that if left and right are NULL, then we would know that
the node is a leaf and that character stores a valid char of interest) and
struct list_t
{
    struct list_t *next;
    int total_frequency;
    struct tree_t *tree;
};
• to store the list of trees that still need to be merged together as a linked list.
EXAMPLE 1:
HUFFMAN TREE:
[Tree diagram: root of weight 100 with children a/45 and an internal node of weight 55; under 55, internal nodes of weights 25 and 30; the leaves shown include e/9 and f/5.]
CONSIDER THE TABLE (DESIGN HUFFMAN TREE):
Character   Frequency   Fixed-length code   Variable-length code
a           45          000                 ?
b           13          001                 ?
c           12          010                 ?
d           16          011                 ?
e           9           100                 ?
f           5           101                 ?
Encoding the File: Traverse Tree for Codes
[Tree diagram: the same tree with each left branch labeled 0 and each right branch labeled 1; the leaves shown include a/45, f/5, and e/9.]
CONSIDER THE TABLE (DESIGN HUFFMAN TREE):
Character   Frequency   Fixed-length code   Variable-length code
a           45          000                 0
b           13          001                 101
c           12          010                 100
d           16          011                 111
e           9           100                 1101
f           5           101                 1100
EXAMPLE 2:
Code Tree According to Huffman:
• The branches of the tree represent the binary values 0 and 1 according to the rules for common prefix-free code trees.
The following example is based on a data source using a set of five different symbols. The symbols' frequencies are:
Symbol   Frequency
A        24
B        12
C        10
D        8
E        8
The two rarest symbols, 'E' and 'D', are combined first, followed by 'C' and 'B'. The new parent nodes have the frequencies 16 and 22 respectively, and are brought together in the next step. The resulting node and the remaining symbol 'A' are subordinated to the root node, which is created in a final step.
Code Tree according to Huffman (figure: http://www.binaryessence.com/dct/en000080.htm)
EXAMPLE 3:
Eerie eyes seen near lake. 26 characters
Building a Tree
Scan the original text
Eerie eyes seen near lake.
• What characters are present?
E, e, r, i, space, y, s, n, a, l, k, .  (12 distinct characters)
Building a Tree
Scan the original text
Eerie eyes seen near lake.
• What is the frequency of each character in the text?
Building a Tree
Prioritize characters
• Create a binary tree node for each character, holding the character and its frequency.
• Place the nodes in a priority queue.
• The lower the occurrence count, the higher the priority in the queue.
Building a Huffman Tree:
• The queue after inserting all nodes:
E/1  i/1  y/1  l/1  k/1  ./1  r/2  s/2  n/2  a/2  sp/4  e/8
Building a Huffman Tree: Merge Steps
Repeatedly dequeue the two lowest-priority nodes, merge them under a new parent whose weight is the sum of theirs, and enqueue the parent:
1. Merge E/1 and i/1 into a node of weight 2.
2. Merge y/1 and l/1 into a node of weight 2.
3. Merge k/1 and ./1 into a node of weight 2.
4. Merge r/2 and s/2 into a node of weight 4.
5. Merge n/2 and a/2 into a node of weight 4.
6. Merge (E,i)/2 and (y,l)/2 into a node of weight 4.
7. Merge (k,.)/2 and sp/4 into a node of weight 6.
8. Merge (r,s)/4 and (n,a)/4 into a node of weight 8.
9. Merge (E,i,y,l)/4 and (k,.,sp)/6 into a node of weight 10.
10. Merge e/8 and (r,s,n,a)/8 into a node of weight 16.
11. Merge the nodes of weight 10 and 16 into the root, weight 26.
[Final tree diagram: root 26 with children 10 and 16; 10 = (E,i,y,l)/4 + ((k,.)/2 + sp/4)/6; 16 = e/8 + ((r,s)/4 + (n,a)/4)/8.]
Why is Huffman Coding Greedy? At each step the algorithm makes the locally optimal choice, merging the two nodes with the lowest frequencies, and never revisits that decision; this greedy choice can be shown to yield an optimal prefix code.
Building a Huffman Tree
• After enqueueing this final node (weight 26), there is only one node left in the priority queue.
CS 102
Building a Huffman Tree
• Dequeue the single node left in the queue.
• This tree contains the new code words for each character.
• The frequency of the root node should equal the number of characters in the text: "Eerie eyes seen near lake." is 26 characters.
Encoding the File: Traverse Tree for Codes
Decoding the File
• Once the receiver has the tree, it scans the incoming bit stream:
• 0 → go left
• 1 → go right
Bit stream: 10100011011110111101111110000110101
NEW CASE
(WORD BAGGAGE)
HUFFMAN CODING TYPES:
• The construction of a code tree for Huffman coding is based on a given probability distribution of the source symbols.
REFERENCES:
• https://www.slideshare.net/alitech123/huffman-coding-10173665?next_slideshow=1
• http://www2.hawaii.edu/~janst/211/lecture/Huffman%20Trees.htm
• https://www.cs.duke.edu/csed/poop/huff/info/
• http://homes.soic.indiana.edu/yye/lab/teaching/spring2014-C343/huffman.php
• https://www.youtube.com/watch?v=ZZX_KtWWPi8
• http://www.binaryessence.com/dct/en000093.htm