Greedy Algorithms: Part - 4: Huffman Coding

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 22

Greedy Algorithms

Part - 4 : Huffman Coding


Huffman Codes

• Widely used technique for data compression

• Assume the data to be a sequence of characters

• Looking for an effective way of storing the data

• Binary character code

– Uniquely represents a character by a binary string

2
Two Ways to Compress data

• Fixed-Length Codewords

• Variable-Length Codewords

3
Example
E.g.:

Suppose you have to compress a data file containing


100,000 characters

The data file only has 6 characters (a, b, c, d, e, f)

a b c d e f
Frequency (thousands) 45 13 12 16 9 5

4
Fixed-Length Codes
E.g.: Data file containing 100,000 characters
a b c d e f
Frequency (thousands) 45 13 12 16 9 5

• 3 bits needed

• a = 000, b = 001, c = 010, d = 011, e = 100, f = 101

• Requires: 100,000 X 3 = 300,000 bits

5
Huffman Codes
• Idea:
– Variable Length Codes

– Use the frequencies of occurrence of characters to


build a optimal way of representing each character

6
Constructing a Huffman Code
• Idea:
– Arranging the characters in Ascending order of their
frequencies

f: 5 e: 9 c: 12 b: 13 d: 16 a: 45

– Add Two minimum frequencies and place them to the


right position

7
Example
f: 5 e: 9 c: 12 b: 13 d: 16 a: 45 c: 12 b: 13 14 d: 16 a: 45
0 1
f: 5 e: 9

14 d: 16 25 a: 45 25 30 a: 45
0 1 0 1 0 1 0 1
f: 5 e: 9 c: 12 b: 13 c: 12 b: 13 14 d: 16
0 1
f: 5 e: 9

a: 45 55 0 100 1
0 1
a: 45 55
25 30 0 1
0 1 0 1
25 30
c: 12 b: 13 14 d: 16 0 1 0 1
0 1 c: 12 b: 13 d: 16
14
f: 5 e: 9 0 1
f: 5 e: 9 8
Variable-Length Codes
E.g.: Data file containing 100,000 characters

a b c d e f
Frequency (thousands) 45 13 12 16 9 5
Variable length
0 101 100 111 1101 1100
Codewords

• Assign short codewords to frequent characters and


long codewords to infrequent characters

• ((45 * 1) + (13 * 3) + (12 * 3) + (16 * 3) + (9 * 4) + (5 * 4)) X 1,000


= 224,000 bits 9
Comparison Ration

a b c d e f
Frequency (thousands) 45 13 12 16 9 5
Fixed Length
000 001 010 011 100 101
Codewords
Variable length
0 101 100 111 1101 1100
Codewords
300000 – 224000
X 100
300000

= 25.33%

Saves 20% to 90% cost. 10


Building a Huffman Code
Alg.: HUFFMAN(C) Running time: O(nlgn)
1. n  C 
2. QC O(n)
3. for i  1 to n – 1
4. do allocate a new node z
5. left[z]  x  EXTRACT-MIN(Q)
O(nlgn)
6. right[z]  y  EXTRACT-MIN(Q)
7. f[z]  f[x] + f[y]
8. INSERT (Q, z)
9. return EXTRACT-MIN(Q)
11
Encoding and Decoding
0 100 1 Character Codeword
a: 45 55
0 1 a 0
25 30 b 101
0 1 0 1
c: 12 b: 13 14 d: 16 c 100
0 1
d 111
f: 5 e: 9
e 1101
f 1100

Encode : abcde

Decode: 101100011111011100
12
Dynamic Programming

Introduction
Dynamic Programming
• An algorithm design technique for optimization problems
(similar to divide and conquer)

• Dynamic Programming solves every subproblems just once


and store the answer in a table; next time when the result is
needed , it simply use the data from the table.

• Examines the previously solved subproblems and combines


their solutions to give the best solution for the whole problem.

14
Dynamic Programming

Part – 1: Longest increasing subsequence(LIS)


Longest increasing subsequence(LIS)

The Longest Increasing Subsequence (LIS) problem
is to find the length of the longest subsequence of a
given sequence such that all elements of the
subsequence are sorted in increasing order.

• For example,
{10, 22, 9, 33, 21, 50, 41, 60, 80}

LIS is {10, 22, 33, 50, 60, 80}


Length of LIS = 6
16
Increasing subsequences

• e.g. You have given the following array

9 2 5 3 7 11 8 10 13 6

Now Find The increasing subsequences.

17
Increasing subsequences
• e.g. 9 2 5 3 7 11 8 10 13 6
2 3 7
5 7 10 13 are increasing subsequences.
2 5 7 8 10 13
2 3 7 8 10 13
We want to find a longest one.
So LIS = 2 3 7 8 10 13 OR 2 5 7 8 10 13
And Length of LIS = 6

9 7 11 are not increasing subsequences.


5 3 11 13
18
More Example
• Input : arr[] = {3, 10, 2, 1, 20}
• Output : Length of LIS = 3
• The longest increasing subsequence is 3, 10, 20

• Input : arr[] = {3, 2}


• Output : Length of LIS = 1
• The longest increasing subsequences are {3} and {2}

• Input : arr[] = {50, 3, 10, 7, 40, 80}


• Output : Length of LIS = 4
• The longest increasing subsequence is {3, 7, 40, 80}

19
A naive approach for LIS
• Let LIS[i] be the length of a longest increasing
subsequence ending at position i.
LIS[i] = 1 + max j = 0..i-1{L[j] | aj < ai}
(use a dummy a0 = minimum, and L[0]=0)
i = 1; j = 0 to i-1

Index 0 1 2 3 4 5 6 7 8 9

Input 9 2 5 3 7 11 8 10 13 6

Length

Prev

20
A naive approach for LIS

Index 0 1 2 3 4 5 6 7 8 9

Input 9 2 5 3 7 11 8 10 13 6

Length 1 1 2 2 3 4 4 5 6 3

Prev -1 -1 1 1 2 4 4 6 7 2

The longest increasing subsequence 2, 5, 7, 8, 10, 13


Length of LIS = 6

21
Psudocode or LIS
LIS( int arr[], int n )
• for (i = 0; i < n; i++ )
• lis[i] = 1;

• /* Compute optimized LIS values in bottom up manner */
• for (i = 1; i < n; i++ )
• for (j = 0; j < i; j++ )
• if ( arr[i] > arr[j] && lis[i] < lis[j] + 1)
• lis[i] = lis[j] + 1;

• /* Pick maximum of all LIS values */
• for (i = 0; i < n; i++ )
• if (max < lis[i])
• max = lis[i];

• return max;
22

You might also like