
Huffman Coding

Eric Dubois

School of Electrical Engineering and Computer Science


University of Ottawa

September 2012



The optimal prefix code problem
Given a finite alphabet with a given set of probabilities, we want to
find a prefix code with the shortest average codeword length.
To simplify notation, denote p_i = P(a_i) and ℓ_i = ℓ(a_i), for i = 1, …, M.
Without loss of generality, we arrange the symbols in the alphabet so that p_1 ≥ p_2 ≥ ⋯ ≥ p_M.
Problem: Find a set of positive integers ℓ_1, ℓ_2, …, ℓ_M such that

ℓ̄ = ∑_{i=1}^{M} p_i ℓ_i

is minimized, subject to the Kraft inequality constraint

∑_{i=1}^{M} 2^(−ℓ_i) ≤ 1

The solution may not be unique.
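As a quick numerical check of these definitions (a sketch, not part of the original slides), the objective and the Kraft sum are one-liners in Python; the probabilities and lengths used here are those of the example developed on the following slides:

```python
# A minimal sketch: evaluate the objective and the Kraft constraint
# for a candidate length assignment (values from the example below).
probs = [0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01]
lengths = [2, 2, 2, 3, 4, 5, 6, 6]  # the Huffman lengths found later

avg_len = sum(p * l for p, l in zip(probs, lengths))  # objective: 2.63
kraft = sum(2.0 ** -l for l in lengths)               # constraint: 1.0 <= 1
print(avg_len, kraft)
```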


Preview of the Huffman algorithm
The Huffman algorithm was originally devised by David Huffman, apparently as part of a course assignment at MIT in 1951, and published in 1952.
Consider the following example: M = 8,
{p_i} = {0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01}.
The Huffman procedure constructs the prefix code starting with the
last bits of the least probable symbols.
List the probabilities in decreasing order in a column on the left.
Assign the final bits of the last two codewords.
Add the two probabilities to replace the previous two.
Select the two lowest probabilities in the reduced list, and assign two
bits.
Continue until two symbols remain.
Read the codewords from right to left. (A minimal Python sketch of the procedure follows below.)
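
The sketch below is an illustration, not code from the slides; ties may be broken differently than in the worked example, so individual codewords can differ, but the codeword lengths, and hence the average length of 2.63 bits/symbol, come out the same.

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a binary Huffman code; probs maps symbol -> probability."""
    tiebreak = itertools.count()  # so the heap never compares trees directly
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)  # the two lowest probabilities
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))

    code = {}
    def assign(tree, prefix):
        if isinstance(tree, tuple):  # internal node: 0 to left, 1 to right
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:                        # leaf: record the finished codeword
            code[tree] = prefix
    assign(heap[0][2], "")
    return code

probs = {"a%d" % (i + 1): p for i, p in
         enumerate([0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01])}
code = huffman_code(probs)
print(code)
print(sum(probs[s] * len(code[s]) for s in probs))  # 2.63
```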



Huffman coding example

[Figure: Huffman code construction. The probabilities are listed in decreasing order on the left; the two smallest are repeatedly merged (0.01 + 0.02 = 0.03, 0.03 + 0.05 = 0.08, 0.08 + 0.09 = 0.17, 0.17 + 0.18 = 0.35, 0.2 + 0.2 = 0.4, 0.25 + 0.35 = 0.6, 0.4 + 0.6 = 1.0), assigning bits 0 and 1 at each merge.]

Resulting codewords:

p_1 = 0.25  →  10
p_2 = 0.20  →  00
p_3 = 0.20  →  01
p_4 = 0.18  →  110
p_5 = 0.09  →  1110
p_6 = 0.05  →  11110
p_7 = 0.02  →  111110
p_8 = 0.01  →  111111


Huffman coding example (2)

H_1 = −∑_{i=1}^{8} p_i log_2(p_i) = 2.5821

ℓ̄_Huff = ∑_{i=1}^{8} p_i ℓ_i = 2.63

ℓ̄_Shann = 3.04

ℓ̄_fixed = 3
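
These values are easy to reproduce. A short sketch, using the Shannon code lengths ℓ_i = ⌈−log_2 p_i⌉ as tabulated on the next slide:

```python
import math

probs = [0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01]

H1 = -sum(p * math.log2(p) for p in probs)            # 2.5821
shannon = [math.ceil(-math.log2(p)) for p in probs]   # [2, 3, 3, 3, 4, 5, 6, 7]
l_shann = sum(p * l for p, l in zip(probs, shannon))  # 3.04
kraft = sum(2.0 ** -l for l in shannon)               # 0.7422 <= 1
print(H1, l_shann, kraft)
```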



Huffman coding example – spreadsheet

Example of Huffman and Shannon code

                                     Shannon code              Huffman code
p_i    -log2(p_i)  -p_i*log2(p_i)    l_i  p_i*l_i  2^(-l_i)    l_i  p_i*l_i  2^(-l_i)
0.25   2.0000      0.5000            2    0.50     0.25        2    0.50     0.25
0.20   2.3219      0.4644            3    0.60     0.125       2    0.40     0.25
0.20   2.3219      0.4644            3    0.60     0.125       2    0.40     0.25
0.18   2.4739      0.4453            3    0.54     0.125       3    0.54     0.125
0.09   3.4739      0.3127            4    0.36     0.0625      4    0.36     0.0625
0.05   4.3219      0.2161            5    0.25     0.03125     5    0.25     0.03125
0.02   5.6439      0.1129            6    0.12     0.015625    6    0.12     0.015625
0.01   6.6439      0.0664            7    0.07     0.0078125   6    0.06     0.015625

Sums: ∑ p_i = 1.00; entropy H_1 = 2.5821; ℓ̄_Shann = 3.0400 (Kraft sum 0.7422); ℓ̄_Huff = 2.6300 (Kraft sum 1.0000).



Huffman coding example – binary tree

[Figure: the code drawn as a binary tree. Each left branch is labelled 0 and each right branch 1; reading the labels from the root to a leaf gives the codeword. The leaves are c(a2) = 00, c(a3) = 01, c(a1) = 10, c(a4) = 110, c(a5) = 1110, c(a6) = 11110, c(a7) = 111110, c(a8) = 111111.]



Theorem
For any admissible set of probabilities, there exists an optimal prefix code
satisfying the following properties:
1. If p_j > p_k, then ℓ_j ≤ ℓ_k, so that ℓ_1 ≤ ℓ_2 ≤ ⋯ ≤ ℓ_M.
2. The two longest codewords have the same length: ℓ_{M−1} = ℓ_M.
3. The two longest codewords differ only in their last bit, and correspond to the two source symbols of lowest probability.

Note that not all optimal codes need satisfy these properties, but at least
one does.



Proof (1)

Let C be an optimal code with codeword lengths ℓ_1, …, ℓ_M, and suppose that, contrary to the theorem statement, p_j > p_k but ℓ_j > ℓ_k.
Let C′ be a new code with ℓ′_j = ℓ_k, ℓ′_k = ℓ_j, and ℓ′_i = ℓ_i for i ≠ j, k.
Then

ℓ̄(C′) − ℓ̄(C) = ∑_{i=1}^{M} p_i ℓ′_i − ∑_{i=1}^{M} p_i ℓ_i
             = p_j ℓ_k + p_k ℓ_j − p_j ℓ_j − p_k ℓ_k
             = (p_j − p_k)(ℓ_k − ℓ_j) < 0,

which contradicts the assumption that C is an optimal code. Thus ℓ_j ≤ ℓ_k.





Proof (2)

Suppose, to the contrary, that ℓ_M > ℓ_{M−1}. Then no other codeword has length ℓ_M.
Since C is a prefix code, we can remove the last bit of c(a_M) and the new code will still be a prefix code, but with lower average codeword length (ℓ̄(C) − p_M).
Again, this contradicts the assumption that C is an optimal code, so ℓ_{M−1} = ℓ_M.





Proof (3)

For every codeword of length ℓ_M, there must be another codeword that differs from it only in the last bit; otherwise, we could remove the last bit as in (2) and reduce the average codeword length.
If the codeword that differs from c(a_M) in the last bit is not c(a_{M−1}) but rather c(a_j) for some other j, we can exchange the codewords for a_{M−1} and a_j without changing the average codeword length, and the code remains optimal.

The Huffman algorithm is a recursive procedure for finding a code satisfying the properties of the theorem.



Huffman code scenario

[Figure: optimal code tree, in which the two longest codewords, c_M(a_{M−1}) and c_M(a_M), are siblings differing only in their last bit.]



Recursive Algorithm
Assume that we have an optimal code C_M for the alphabet A = {a_1, …, a_M} with probabilities P(a_i) satisfying the properties of the theorem.
Form the reduced alphabet A′ = {a′_1, …, a′_{M−1}} with probabilities P(a′_i) = P(a_i) for i = 1, …, M − 2, and P(a′_{M−1}) = P(a_{M−1}) + P(a_M).
Suppose that we have a prefix code C_{M−1} for the reduced alphabet satisfying c_M(a_i) = c_{M−1}(a′_i) for i = 1, …, M − 2, c_M(a_{M−1}) = c_{M−1}(a′_{M−1}) ∗ 0, and c_M(a_M) = c_{M−1}(a′_{M−1}) ∗ 1, where ∗ denotes concatenation.
Then ℓ_i = ℓ′_i for i = 1, …, M − 2, and ℓ_{M−1} = ℓ_M = ℓ′_{M−1} + 1.

ℓ̄(C_M) = ∑_{i=1}^{M} P(a_i) ℓ_i
        = ∑_{i=1}^{M−2} P(a′_i) ℓ′_i + (P(a_{M−1}) + P(a_M))(ℓ′_{M−1} + 1)
        = ∑_{i=1}^{M−1} P(a′_i) ℓ′_i + P(a_{M−1}) + P(a_M)
        = ℓ̄(C_{M−1}) + P(a_{M−1}) + P(a_M)
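
Unwinding this recurrence down to C_2 shows that ℓ̄(C_M) is simply the sum of the probabilities of all merged nodes. A minimal sketch of this computation (an illustration, not code from the slides):

```python
import heapq

def huffman_avg_length(probs):
    """Average Huffman codeword length via the recurrence
    l(C_M) = l(C_{M-1}) + P(a_{M-1}) + P(a_M):
    sum the probability of every merge."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # each merge contributes its probability once
        heapq.heappush(heap, merged)
    return total

print(huffman_avg_length([0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01]))
# 2.63
```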



Huffman code scenario (2)

[Figure: reduced code tree, in which the siblings c_M(a_{M−1}) and c_M(a_M) are replaced by their parent c_{M−1}(a′_{M−1}).]



Recursive Algorithm (2)

Conclusion: C_M is an optimal code for {A, P(a_i)} if and only if C_{M−1} is an optimal code for {A′, P(a′_i)}, since the two average lengths differ by the constant P(a_{M−1}) + P(a_M).
Similarly, we can obtain C_{M−2} from C_{M−1}.
We continue until we reach C_2 for an alphabet with two symbols, where the only possible code has codewords 0 and 1.
This yields the Huffman procedure illustrated by the earlier example.
Note that H_1 ≤ ℓ̄_Huff ≤ ℓ̄_Shann < H_1 + 1.

