Huffman Code

You might also like

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 29

Huffman Codes

• Huffman coding is a lossless data compression


algorithm.
• The idea is to assign variable-length codes to input
characters, lengths of the assigned codes are based on
the frequencies of corresponding characters.
• The most frequent character gets the smallest code
and the least frequent character gets the largest code.
• The variable-length codes assigned to input characters
are Prefix Codes, means the codes (bit sequences) are
assigned in such a way that the code assigned to one
character is not prefix of code assigned to any other
character. This is how Huffman Coding makes sure
that there is no ambiguity when decoding the
generated bit stream.
• Let us understand prefix codes with a counter example.
• Let there be four characters a, b, c and d, and their
corresponding variable length codes be 00, 01, 0 and 1.
• This coding leads to ambiguity because code assigned
to c is prefix of codes assigned to a and b. If the
compressed bit stream is 0001, the de-compressed
output may be “cccd” or “ccb” or “acd” or “ab”.

• There are mainly two major parts in Huffman Coding

1) Build a Huffman Tree from input characters.


2) Traverse the Huffman Tree and assign codes to
characters.
Steps to build Huffman Tree :
Input is array of unique characters along with their frequency of
occurrences and output is Huffman Tree.

1. Create a leaf node for each unique character and build a min heap
of all leaf nodes (Min Heap is used as a priority queue. The value
of frequency field is used to compare two nodes in min heap.
Initially, the least frequent character is at root)
2. Extract two nodes with the minimum frequency from the min
heap.
3. Create a new internal node with frequency equal to the sum of the
two nodes frequencies. Make the first extracted node as its left
child and the other extracted node as its right child. Add this node
to the min heap.
4. Repeat steps#2 and #3 until the heap contains only one node. The
remaining node is the root node and the tree is complete.
Example
(character, Frequency) =
{(a, 5),
(b, 9),
(c, 12),
(d, 13),
(e, 16),
(f, 45)}
character code-word
f --- 0
c --- 100
d --- 101
a --- 1100
b --- 1101
e --- 111
Example
• Say we want to encode a text with the characters a,
b,…, g occurring with the following frequencies:

a b c d e f g
Frequency 37 18 29 13 30 17 6
Fixed-Length Code
a b c d e f g
Frequency 37 18 29 13 30 17 6
Fixed-length 000 001 010 011 100 101 110
code

• Total size is:


(37 + 18 + 29 + 13 + 30 + 17 + 6) x 3= 450 bits
Variable-Length Code
a b c d e f g
Frequency 37 18 29 13 30 17 6
Variable- 10 011 111 1101 00 010 1100
length code

• Total size is:


37x2 + 18x3 + 29x3 + 13x4 + 30x2 + 17x3 + 6x4 = 402 bits

• A savings of approximately 11%


Constructing a Huffman Code

g,6 d,13 f,17 b,18 c,29 e,30 a,37


Constructing a Huffman Code

g,6 d,13 f,17 b,18 c,29 e,30 a,37


Constructing a Huffman Code

19 f,17 b,18 c,29 e,30 a,37

g,6 d,13
Constructing a Huffman Code

f,17 b,18 19 c,29 e,30 a,37

g,6 d,13
Constructing a Huffman Code

f,17 b,18 19 c,29 e,30 a,37

g,6 d,13
Constructing a Huffman Code

35 19 c,29 e,30 a,37

f,17 b,18 g,6 d,13


Constructing a Huffman Code

19 c,29 e,30 35 a,37

g,6 d,13 f,17 b,18


Constructing a Huffman Code

19 c,29 e,30 35 a,37

g,6 d,13 f,17 b,18


Constructing a Huffman Code

48 e,30 35 a,37

19 c,29 f,17 b,18

g,6 d,13
Constructing a Huffman Code

e,30 35 a,37 48

f,17 b,18 19 c,29

g,6 d,13
Constructing a Huffman Code

e,30 35 a,37 48

f,17 b,18 19 c,29

g,6 d,13
Constructing a Huffman Code

65 a,37 48

e,30 35 19 c,29

f,17 b,18 g,6 d,13


Constructing a Huffman Code

a,37 48 65

19 c,29 e,30 35

g,6 d,13 f,17 b,18


Constructing a Huffman Code

a,37 48 65

19 c,29 e,30 35

g,6 d,13 f,17 b,18


Constructing a Huffman Code

85 65

e,30 35
a,37 48

f,17 b,18
19 c,29

g,6 d,13
Constructing a Huffman Code

65 85

e,30 35
a,37 48

f,17 b,18 19 c,29

g,6 d,13
Constructing a Huffman Code
150

65 85

e,30 35
a,37 48

f,17 b,18 19 c,29

g,6 d,13
Resulting Code

0 1
a 10
b 011
0 1 0 1
c 111
e a d 1101
0 1 0 1
e 00
f b c f 010
0 1 g 1100
g d

You might also like