
Greedy Algorithms

Part - 4 : Huffman Coding


Huffman Codes

• Widely used technique for data compression

• Assume the data to be a sequence of characters

• Looking for an effective way of storing the data

• Binary character code

– Uniquely represents a character by a binary string

Two Ways to Compress data

• Fixed-Length Codewords

• Variable-Length Codewords

Example

Suppose you have to compress a data file containing 100,000 characters.

The data file uses only 6 distinct characters (a, b, c, d, e, f).

                        a    b    c    d    e    f
Frequency (thousands)   45   13   12   16   9    5

Fixed-Length Codes
E.g.: Data file containing 100,000 characters

                        a    b    c    d    e    f
Frequency (thousands)   45   13   12   16   9    5

• 3 bits needed per character (2 bits give only 4 codewords; 3 bits give 8, enough for 6 characters)

• a = 000, b = 001, c = 010, d = 011, e = 100, f = 101

• Requires: 100,000 × 3 = 300,000 bits
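
The fixed-length calculation above can be checked with a short Python sketch. The helper name `fixed_length_code` is hypothetical (it does not come from the slides); it simply numbers the characters in alphabetical order and pads each number to the same width.

```python
# Frequencies (in thousands) taken from the example table.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

def fixed_length_code(symbols):
    """Give every symbol a codeword of the same width (hypothetical helper)."""
    # Smallest width w with 2**w >= number of symbols (at least 1 bit).
    width = max(1, (len(symbols) - 1).bit_length())
    return {s: format(i, f"0{width}b") for i, s in enumerate(symbols)}

codes = fixed_length_code(sorted(freq))
total_bits = sum(freq[s] * 1000 * len(codes[s]) for s in freq)

print(codes)       # a = 000, b = 001, ..., f = 101
print(total_bits)  # 300000
```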

Huffman Codes
• Idea:
– Variable-length codes

– Use the frequencies of occurrence of the characters to build an optimal way of representing each character

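Constructing a Huffman Code
• Idea:
– Arrange the characters in ascending order of their frequencies

f: 5 e: 9 c: 12 b: 13 d: 16 a: 45

– Merge the two minimum frequencies into a new node whose frequency is their sum, and place it at its correct position in the sorted order

– Repeat until a single node (the root) remains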

Example
Start:  f: 5  e: 9  c: 12  b: 13  d: 16  a: 45

Step 1: merge f: 5 and e: 9 into 14     ->  c: 12  b: 13  14  d: 16  a: 45
Step 2: merge c: 12 and b: 13 into 25   ->  14  d: 16  25  a: 45
Step 3: merge 14 and d: 16 into 30      ->  25  30  a: 45
Step 4: merge 25 and 30 into 55         ->  a: 45  55
Step 5: merge a: 45 and 55 into 100     ->  100 (root)

In every merged node, the left edge is labelled 0 and the right edge is labelled 1.
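
The sequence of sorted lists in this example can be reproduced with a small Python sketch. It tracks only the frequencies, not the tree, and the function name `trace_merges` is an illustrative assumption rather than anything from the slides.

```python
import bisect

def trace_merges(freq):
    """Return each sorted list of weights seen while merging the two minima."""
    weights = sorted(freq.values())
    steps = [list(weights)]
    while len(weights) > 1:
        # Remove the two smallest weights and add them together.
        merged = weights.pop(0) + weights.pop(0)
        # Put the sum back at its correct position in the sorted order.
        bisect.insort(weights, merged)
        steps.append(list(weights))
    return steps

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
steps = trace_merges(freq)
for step in steps:
    print(step)
```

A sorted list keeps the trace easy to read; it is not meant as an efficient implementation, since `pop(0)` and `insort` each take linear time.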
Variable-Length Codes
E.g.: Data file containing 100,000 characters

                            a    b     c     d     e      f
Frequency (thousands)       45   13    12    16    9      5
Variable-length codewords   0    101   100   111   1101   1100

• Assign short codewords to frequent characters and long codewords to infrequent characters

• ((45 × 1) + (13 × 3) + (12 × 3) + (16 × 3) + (9 × 4) + (5 × 4)) × 1,000 = 224,000 bits
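
As a quick check of this sum, the following Python sketch weights each codeword length by the character's frequency. The variable names are illustrative; the frequencies and codewords are the ones in the table.

```python
# Frequencies (in thousands) and codewords from the table.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

# Total cost = sum over characters of (occurrences x codeword length).
total_bits = sum(freq[ch] * 1000 * len(code[ch]) for ch in freq)

# Average number of bits spent per character of the file.
average_bits = total_bits / (sum(freq.values()) * 1000)

print(total_bits)    # 224000
print(average_bits)
```

The average works out to about 2.24 bits per character, compared with 3 bits for the fixed-length code.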
Comparison Ratio

                            a     b     c     d     e      f
Frequency (thousands)       45    13    12    16    9      5
Fixed-length codewords      000   001   010   011   100    101
Variable-length codewords   0     101   100   111   1101   1100

Saving = (300,000 - 224,000) / 300,000 × 100 = 25.33%

In general, Huffman coding typically saves 20% to 90%, depending on the data.
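
The percentage can be verified in Python with a one-line formula. The function name `percent_saving` is a hypothetical helper, not part of the slides.

```python
def percent_saving(original_bits, compressed_bits):
    """Relative saving of the compressed size over the original, in percent."""
    return (original_bits - compressed_bits) / original_bits * 100

saving = percent_saving(300_000, 224_000)
print(f"{saving:.2f}%")  # 25.33%
```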


Building a Huffman Code
Alg.: HUFFMAN(C)                          Running time: O(n lg n)
n ← |C|
Q ← C                                     (build the min-priority queue: O(n))
for i ← 1 to n – 1                        (whole loop: O(n lg n))
    do allocate a new node z
       left[z] ← x ← EXTRACT-MIN(Q)
       right[z] ← y ← EXTRACT-MIN(Q)
       f[z] ← f[x] + f[y]
       INSERT(Q, z)
return EXTRACT-MIN(Q)
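
One possible Python rendering of HUFFMAN(C) is sketched below, using the standard `heapq` module as the min-priority queue. The tree representation (a leaf is a character, an internal node is a `(left, right)` tuple), the tie-breaking counter, and the helper `codewords` are assumptions made for this sketch; the pseudocode does not prescribe them. It assumes at least one character in `freq`.

```python
import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman tree from a {character: frequency} mapping."""
    tie = count()  # tie-breaker so heapq never has to compare tree nodes
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                       # Q <- C, O(n)
    for _ in range(len(freq) - 1):            # n - 1 merges
        fx, _tx, x = heapq.heappop(heap)      # x <- EXTRACT-MIN(Q)
        fy, _ty, y = heapq.heappop(heap)      # y <- EXTRACT-MIN(Q)
        # New node z with left[z] = x, right[z] = y, f[z] = f[x] + f[y].
        heapq.heappush(heap, (fx + fy, next(tie), (x, y)))
    return heap[0][2]                         # root of the tree

def codewords(node, prefix=""):
    """Walk the tree: left edge adds '0', right edge adds '1'."""
    if isinstance(node, str):                 # leaf: a single character
        return {node: prefix or "0"}          # lone-character alphabet gets "0"
    left, right = node
    table = codewords(left, prefix + "0")
    table.update(codewords(right, prefix + "1"))
    return table

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
table = codewords(huffman(freq))
print(table)
```

For the example frequencies this reproduces the codewords used in these slides. With tied frequencies, a different but equally cheap code may result, depending on how ties are broken.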
Encoding and Decoding

Character   Codeword
a           0
b           101
c           100
d           111
e           1101
f           1100

Code tree (left edge = 0, right edge = 1):

100
├─0─ a: 45
└─1─ 55
     ├─0─ 25
     │    ├─0─ c: 12
     │    └─1─ b: 13
     └─1─ 30
          ├─0─ 14
          │    ├─0─ f: 5
          │    └─1─ e: 9
          └─1─ d: 16

Encode: abcde

Decode: 101100011111011100
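
Both exercises can be worked through with a minimal Python sketch. The function names `encode` and `decode` are illustrative. Decoding reads bits left to right and emits a character as soon as the bits read so far match a codeword; this works because no codeword in the table is a prefix of another.

```python
# Codeword table from the slide.
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

def encode(text, code=CODE):
    """Concatenate the codeword of every character."""
    return "".join(code[ch] for ch in text)

def decode(bits, code=CODE):
    """Read bits left to right, emitting a character at each codeword match."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

print(encode("abcde"))               # 01011001111101
print(decode("101100011111011100"))  # bcadef
```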