
Huffman Coding

Eric Dubois

School of Electrical Engineering and Computer Science


University of Ottawa

September 2012



The optimal prefix code problem
Given a finite alphabet with a given set of probabilities, we want to
find a prefix code with the shortest average codeword length.
To simplify notation, denote p_i = P(a_i) and ℓ_i = ℓ(a_i), for i = 1, …, M.
Without loss of generality, we arrange the symbols in the alphabet so that p_1 ≥ p_2 ≥ ⋯ ≥ p_M.
Problem: Find a set of positive integers ℓ_1, ℓ_2, …, ℓ_M such that

ℓ̄ = ∑_{i=1}^{M} p_i ℓ_i

is minimized, subject to the Kraft inequality constraint

∑_{i=1}^{M} 2^(−ℓ_i) ≤ 1

The solution may not be unique.
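As a quick numerical check of these definitions (a sketch, not part of the original slides), the objective and the Kraft sum are one-liners in Python; the probabilities and lengths used here are those of the example developed on the following slides:

```python
# A minimal sketch: evaluate the objective and the Kraft constraint
# for a candidate length assignment (values from the example below).
probs = [0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01]
lengths = [2, 2, 2, 3, 4, 5, 6, 6]  # the Huffman lengths found later

avg_len = sum(p * l for p, l in zip(probs, lengths))  # objective: 2.63
kraft = sum(2.0 ** -l for l in lengths)               # constraint: 1.0 <= 1
print(avg_len, kraft)
```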


Preview of the Huffman algorithm
The Huffman algorithm was originally devised by David Huffman, apparently as part of a course assignment at MIT in 1951, and published in 1952.
Consider the following example: M = 8,
{p_i} = {0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01}.
The Huffman procedure constructs the prefix code starting with the
last bits of the least probable symbols.
List the probabilities in decreasing order in a column on the left.
Assign the final bits of the last two codewords.
Add the two probabilities to replace the previous two.
Select the two lowest probabilities in the reduced list, and assign two
bits.
Continue until two symbols remain.
Read the codewords from right to left. (A minimal Python sketch of the procedure follows below.)
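
The sketch below is an illustration, not code from the slides; ties may be broken differently than in the worked example, so individual codewords can differ, but the codeword lengths, and hence the average length of 2.63 bits/symbol, come out the same.

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a binary Huffman code; probs maps symbol -> probability."""
    tiebreak = itertools.count()  # so the heap never compares trees directly
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)  # the two lowest probabilities
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))

    code = {}
    def assign(tree, prefix):
        if isinstance(tree, tuple):  # internal node: 0 to left, 1 to right
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:                        # leaf: record the finished codeword
            code[tree] = prefix
    assign(heap[0][2], "")
    return code

probs = {"a%d" % (i + 1): p for i, p in
         enumerate([0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01])}
code = huffman_code(probs)
print(code)
print(sum(probs[s] * len(code[s]) for s in probs))  # 2.63
```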



Huffman coding example

[Figure: Huffman code construction. The probabilities are listed in decreasing order on the left; the two smallest are repeatedly merged (0.01 + 0.02 = 0.03, 0.03 + 0.05 = 0.08, 0.08 + 0.09 = 0.17, 0.17 + 0.18 = 0.35, 0.2 + 0.2 = 0.4, 0.25 + 0.35 = 0.6, 0.4 + 0.6 = 1.0), assigning bits 0 and 1 at each merge.]

Resulting codewords:

p_1 = 0.25  →  10
p_2 = 0.20  →  00
p_3 = 0.20  →  01
p_4 = 0.18  →  110
p_5 = 0.09  →  1110
p_6 = 0.05  →  11110
p_7 = 0.02  →  111110
p_8 = 0.01  →  111111


Huffman coding example (2)

H_1 = −∑_{i=1}^{8} p_i log_2(p_i) = 2.5821

ℓ̄_Huff = ∑_{i=1}^{8} p_i ℓ_i = 2.63

ℓ̄_Shann = 3.04

ℓ̄_fixed = 3
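
These values are easy to reproduce. A short sketch, using the Shannon code lengths ℓ_i = ⌈−log_2 p_i⌉ as tabulated on the next slide:

```python
import math

probs = [0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01]

H1 = -sum(p * math.log2(p) for p in probs)            # 2.5821
shannon = [math.ceil(-math.log2(p)) for p in probs]   # [2, 3, 3, 3, 4, 5, 6, 7]
l_shann = sum(p * l for p, l in zip(probs, shannon))  # 3.04
kraft = sum(2.0 ** -l for l in shannon)               # 0.7422 <= 1
print(H1, l_shann, kraft)
```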



Huffman coding example – spreadsheet

Example of Huffman and Shannon code

                                     Shannon code              Huffman code
p_i    -log2(p_i)  -p_i*log2(p_i)    l_i  p_i*l_i  2^(-l_i)    l_i  p_i*l_i  2^(-l_i)
0.25   2.0000      0.5000            2    0.50     0.25        2    0.50     0.25
0.20   2.3219      0.4644            3    0.60     0.125       2    0.40     0.25
0.20   2.3219      0.4644            3    0.60     0.125       2    0.40     0.25
0.18   2.4739      0.4453            3    0.54     0.125       3    0.54     0.125
0.09   3.4739      0.3127            4    0.36     0.0625      4    0.36     0.0625
0.05   4.3219      0.2161            5    0.25     0.03125     5    0.25     0.03125
0.02   5.6439      0.1129            6    0.12     0.015625    6    0.12     0.015625
0.01   6.6439      0.0664            7    0.07     0.0078125   6    0.06     0.015625

Sums: ∑ p_i = 1.00; entropy H_1 = 2.5821; ℓ̄_Shann = 3.0400 (Kraft sum 0.7422); ℓ̄_Huff = 2.6300 (Kraft sum 1.0000).



Huffman coding example – binary tree

[Figure: the code drawn as a binary tree. Each left branch is labelled 0 and each right branch 1; reading the labels from the root to a leaf gives the codeword. The leaves are c(a2) = 00, c(a3) = 01, c(a1) = 10, c(a4) = 110, c(a5) = 1110, c(a6) = 11110, c(a7) = 111110, c(a8) = 111111.]



Theorem
For any admissible set of probabilities, there exists an optimal prefix code
satisfying the following properties:
1. If p_j > p_k, then ℓ_j ≤ ℓ_k, so that ℓ_1 ≤ ℓ_2 ≤ ⋯ ≤ ℓ_M.
2. The two longest codewords have the same length: ℓ_{M−1} = ℓ_M.
3. The two longest codewords differ only in their last bit, and correspond to the two source symbols of lowest probability.

Note that not all optimal codes need satisfy these properties, but at least
one does.



Proof (1)

Let C be an optimal code with codeword lengths ℓ_1, …, ℓ_M, and suppose that, contrary to the theorem statement, p_j > p_k but ℓ_j > ℓ_k.
Let C′ be a new code with ℓ′_j = ℓ_k, ℓ′_k = ℓ_j, and ℓ′_i = ℓ_i for i ≠ j, k.
Then

ℓ̄(C′) − ℓ̄(C) = ∑_{i=1}^{M} p_i ℓ′_i − ∑_{i=1}^{M} p_i ℓ_i
             = p_j ℓ_k + p_k ℓ_j − p_j ℓ_j − p_k ℓ_k
             = (p_j − p_k)(ℓ_k − ℓ_j) < 0,

which contradicts the assumption that C is an optimal code. Thus ℓ_j ≤ ℓ_k.





Proof (2)

Suppose, to the contrary, that ℓ_M > ℓ_{M−1}. Then no other codeword has length ℓ_M.
Since C is a prefix code, we can remove the last bit of c(a_M) and the new code will still be a prefix code, but with lower average codeword length (ℓ̄(C) − p_M).
Again, this contradicts the assumption that C is an optimal code, so ℓ_{M−1} = ℓ_M.





Proof (3)

For every codeword of length ℓ_M, there must be another codeword that differs from it only in the last bit; otherwise, we could remove the last bit as in (2) and reduce the average codeword length.
If the codeword that differs from c(a_M) in the last bit is not c(a_{M−1}) but rather c(a_j) for some other j, we can exchange the codewords for a_{M−1} and a_j without changing the average codeword length, and the code remains optimal.

The Huffman algorithm is a recursive procedure for finding a code satisfying the properties of the theorem.



Huffman code scenario

[Figure: optimal code tree, in which the two longest codewords, c_M(a_{M−1}) and c_M(a_M), are siblings differing only in their last bit.]



Recursive Algorithm
Assume that we have an optimal code C_M for the alphabet A = {a_1, …, a_M} with probabilities P(a_i) satisfying the properties of the theorem.
Form the reduced alphabet A′ = {a′_1, …, a′_{M−1}} with probabilities P(a′_i) = P(a_i) for i = 1, …, M − 2, and P(a′_{M−1}) = P(a_{M−1}) + P(a_M).
Suppose that we have a prefix code C_{M−1} for the reduced alphabet satisfying c_M(a_i) = c_{M−1}(a′_i) for i = 1, …, M − 2, c_M(a_{M−1}) = c_{M−1}(a′_{M−1}) ∗ 0, and c_M(a_M) = c_{M−1}(a′_{M−1}) ∗ 1, where ∗ denotes concatenation.
Then ℓ_i = ℓ′_i for i = 1, …, M − 2, and ℓ_{M−1} = ℓ_M = ℓ′_{M−1} + 1.

ℓ̄(C_M) = ∑_{i=1}^{M} P(a_i) ℓ_i
        = ∑_{i=1}^{M−2} P(a′_i) ℓ′_i + (P(a_{M−1}) + P(a_M))(ℓ′_{M−1} + 1)
        = ∑_{i=1}^{M−1} P(a′_i) ℓ′_i + P(a_{M−1}) + P(a_M)
        = ℓ̄(C_{M−1}) + P(a_{M−1}) + P(a_M)
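
Unwinding this recurrence down to C_2 shows that ℓ̄(C_M) is simply the sum of the probabilities of all merged nodes. A minimal sketch of this computation (an illustration, not code from the slides):

```python
import heapq

def huffman_avg_length(probs):
    """Average Huffman codeword length via the recurrence
    l(C_M) = l(C_{M-1}) + P(a_{M-1}) + P(a_M):
    sum the probability of every merge."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # each merge contributes its probability once
        heapq.heappush(heap, merged)
    return total

print(huffman_avg_length([0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01]))
# 2.63
```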



Huffman code scenario (2)

[Figure: reduced code tree, in which the siblings c_M(a_{M−1}) and c_M(a_M) are replaced by their parent c_{M−1}(a′_{M−1}).]



Recursive Algorithm (2)

Conclusion: C_M is an optimal code for {A, P(a_i)} if and only if C_{M−1} is an optimal code for {A′, P(a′_i)}, since the two average lengths differ by the constant P(a_{M−1}) + P(a_M).
Similarly, we can obtain C_{M−2} from C_{M−1}.
We continue until we reach C_2 for an alphabet with two symbols, where the only possible code has codewords 0 and 1.
This yields the Huffman procedure illustrated by the earlier example.
Note that H_1 ≤ ℓ̄_Huff ≤ ℓ̄_Shann < H_1 + 1.

