
Digital Communication & Systems

V. Praksh Singh

Department of Electronics & Communication Engineering

National Institute of Technology Hamirpur
Hamirpur, Himachal Pradesh

Jan 9, 2020
Coding for Discrete Sources Mathematical Modeling of a Discrete Sources Coding of a discrete random variable Source Coding Algorithms

Lecture #2: Coding for Discrete Sources


V. Praksh Singh Department of Electronics & Communication Engineering National Institute of Technology Hamirpur Hamirpur, Himachal Pradesh India
Digital Communication & Systems

Outline of the lecture

Coding of a discrete information source

Mathematical modeling of discrete sources

Source Coding

Source coding theorem

Source coding algorithms
Shannon-Fano coding
Huffman coding
Lempel-Ziv coding



1 Coding for Discrete Sources

2 Mathematical Modeling of a Discrete Source

3 Coding of a discrete random variable

4 Source Coding Algorithms


Coding for Discrete Sources

In digital communication systems, all information sources, e.g.,
speech waveforms (analog source) or text files (discrete source),
must be represented by a sequence of bits.

Discrete Source: The source output is a sequence of symbols from a
given alphabet A of finite size. E.g., a text file may consist of
symbols from an alphabet of English letters and alphanumeric characters.

The source encoder converts the sequence of symbols from the
source into a sequence of bits, using as few bits per symbol as
possible. This is also called source compression.

In this lecture, we consider lossless encoding of discrete sources,
such that the source output can be uniquely recovered from the encoded
string of bits.

Fixed length codes for Discrete Sources

Suppose the size of the source alphabet A is M. The simplest
method to encode a discrete source is to map each symbol $a \in A$
into a fixed-length codeword C(a).

For example, if the source alphabet consists of the 26 capital English
letters, then the following binary code of block length L = 5 can be used:

Table: Fixed length coding

Symbol    Codeword
A         00000
B         00001
...       ...
Y         11000
Z         11001
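The fixed-length mapping above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code, assuming the symbols are numbered in alphabetical order and each index is written as an L-bit binary string with $L = \lceil \log_2 M \rceil$; the function name `fixed_length_code` is hypothetical.

```python
import math
import string

def fixed_length_code(alphabet):
    """Map the i-th symbol of `alphabet` to the L-bit binary form of i,
    where L = ceil(log2 M) and M is the alphabet size."""
    M = len(alphabet)
    L = max(1, math.ceil(math.log2(M)))   # at least 1 bit, even if M == 1
    code = {a: format(i, f"0{L}b") for i, a in enumerate(alphabet)}
    return code, L

code, L = fixed_length_code(string.ascii_uppercase)   # 26 capital letters
print(L)                                               # 5
print(code["A"], code["B"], code["Y"], code["Z"])      # 00000 00001 11000 11001
```

Every codeword has the same length regardless of how often the letter occurs, which is exactly the limitation discussed next.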

Fixed length codes for Discrete Sources

A binary fixed-length code for a source alphabet A of size M
requires $L = \lceil \log_2 M \rceil$ bits to encode each symbol (L bits/symbol),
where $\lceil x \rceil$ denotes the smallest integer greater than or equal to
the real number x.

However, assigning codewords of the same length to every source
symbol does not take into consideration the probabilities of
occurrence of the source symbols.

We can reduce the average length per symbol $\bar{L}$ by assigning more
bits to less probable symbols and fewer bits to more probable
symbols, i.e., variable-length coding.


Mathematical Modeling of a Discrete Source

We mathematically model a discrete source as a sequence of discrete
random variables $U_1, U_2, \cdots$, one for each emitted symbol.

Figure: Discrete Information Source

We further assume that the discrete source is memoryless, i.e., the
symbols $U_1, U_2, \cdots$ are generated independently by the
source, all with the same probability mass function.

So, the output of a discrete memoryless source (DMS) is a sequence
of independent and identically distributed (i.i.d.) random variables.

Mathematical Modeling of a Discrete Source

Let U be a K-ary discrete random variable which takes values from
the set $A = \{u_1, u_2, \cdots, u_K\}$.

The probability mass function (PMF) of U is

$\Pr(U = u_i) = p_i$ for $i = 1, 2, \cdots, K$.

Each source output $U_1, U_2, \cdots$ of a DMS is selected from A
with the same PMF.

Each source output $U_i$ of a DMS is statistically independent of the
previous outputs $U_1, U_2, \cdots, U_{i-1}$.

A DMS is completely described by the source alphabet A and the
set of probabilities $\{p_i\}_{i=1}^{K}$.
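To make the DMS model concrete, here is a minimal Python sketch (an illustration, not part of the lecture) that draws i.i.d. symbols from a given PMF with the standard-library `random.choices`. The function name `dms_output` and the 3-ary example source are hypothetical.

```python
import random
from collections import Counter

def dms_output(alphabet, pmf, n, seed=None):
    """Return n i.i.d. symbols drawn from `alphabet` with probabilities `pmf`.

    Every draw uses the same PMF and ignores all previous draws, which is
    the memoryless (i.i.d.) assumption of a DMS.
    """
    if len(alphabet) != len(pmf) or abs(sum(pmf) - 1.0) > 1e-9:
        raise ValueError("pmf must match the alphabet and sum to 1")
    rng = random.Random(seed)                 # local generator, reproducible
    return rng.choices(alphabet, weights=pmf, k=n)

# Hypothetical 3-ary source
symbols = dms_output(["u1", "u2", "u3"], [0.5, 0.3, 0.2], n=10_000, seed=1)
freq = Counter(symbols)
print({s: freq[s] / len(symbols) for s in sorted(freq)})  # close to the PMF
```

For a long output, the relative frequencies of the symbols approach the PMF $\{p_i\}$, which is all the model needs to describe the source.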


Information Measure

Suppose $u_i$ is a particular realization of a K-ary discrete random
variable U. How can we quantify the amount of information $I(u_i)$
provided by the observation $u_i$?

The information measure of the outcome $u_i$ should depend only on its
probability $p_i$. The more probable the outcome, the less information
its observation conveys, i.e., the measure should be a decreasing
function of $p_i$.

The information measure should be a continuous function of the
probability, i.e., a small change in the probability of an
observation should not drastically change its information.

If the observation is divided into two (or more) independent parts,
$u_i = \{u_{i,1}, u_{i,2}\}$, the information measure of $u_i$ should be
the sum of the information provided by each independent part.

Information Measure (Entropy)

The only function for the information measure that satisfies the above
properties is (up to the choice of the base) the logarithmic function.
The self-information of $u_i$ is

$$I(u_i) = \log\left(\frac{1}{p_i}\right)$$

We define the average information content of the random variable U
as the entropy of the discrete random variable U:

$$H(U) = \sum_{i=1}^{K} p_i I(u_i) = \sum_{i=1}^{K} p_i \log\left(\frac{1}{p_i}\right)$$

where, by convention, $0 \log 0 = 0$.
If the base of the logarithm is 2, entropy (information) is expressed in bits.
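The entropy formula translates directly into code. The following is a minimal Python sketch, assuming the PMF is given as a list of probabilities; the name `entropy` and the `base` parameter are illustrative choices, and zero-probability terms are skipped to implement the convention $0 \log 0 = 0$.

```python
import math

def entropy(pmf, base=2.0):
    """Entropy H(U) = sum_i p_i * log(1/p_i) in the given base.

    Terms with p_i == 0 are skipped (convention 0 log 0 = 0).
    """
    if any(p < 0 for p in pmf) or abs(sum(pmf) - 1.0) > 1e-9:
        raise ValueError("pmf must be non-negative and sum to 1")
    return sum(p * math.log(1.0 / p, base) for p in pmf if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # about 2 bits: uniform over 4 symbols
print(entropy([1.0, 0.0]))                 # 0.0: no uncertainty
```

Passing `base=3` (or any other D) gives $H_D(U)$, the form used later for D-ary codes.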

Properties of Entropy

The entropy is always non-negative, i.e., $H(U) \ge 0$.

Entropy can be converted from one base to another as

$$H_b(U) = (\log_b a)\, H_a(U).$$

Let a binary memoryless source (2-ary discrete source) emit $u_1 = 0$
or $u_2 = 1$ with $\Pr(U = u_1) = p$ and $\Pr(U = u_2) = 1 - p$. Then

$$H(U) = -p \log_2 p - (1-p) \log_2 (1-p) = H(p)$$

is called the binary entropy function.

We can observe that $H(U) = 0$ for $p = 0$ or $p = 1$, i.e., the source
generates only ones or only zeros and there is no uncertainty. The
uncertainty is maximum when $p = 1/2$, where the entropy attains its
maximum value $H(U) = 1$ bit.
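A quick numerical check of these observations: the minimal Python sketch below (an illustration, with the hypothetical name `binary_entropy`) evaluates $H(p)$ on a grid of $p$ values and locates its maximum.

```python
import math

def binary_entropy(p):
    """Binary entropy function H(p) in bits, with H(0) = H(1) = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Evaluate on a grid of p values and locate the maximum
grid = [k / 100 for k in range(101)]
p_star = max(grid, key=binary_entropy)
print(p_star, binary_entropy(p_star))   # 0.5 1.0
```

The function is symmetric about $p = 1/2$ and vanishes at both endpoints, matching the discussion above.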


Coding of a discrete random variable

We are given a K-ary random variable U with the source alphabet
$A = \{u_1, u_2, \cdots, u_K\}$ and the set of probabilities $\{p_i\}_{i=1}^{K}$.

Figure: Coding of a discrete random variable

The code symbols $X_i$ take values from a D-ary alphabet
$\mathcal{D} = \{0, 1, \cdots, D-1\}$. For binary codes, D = 2.

A variable-length code C maps each source symbol $u_i$ in A to a
D-ary string $[x_1, x_2, \cdots, x_{l_i}]$ called a codeword.

Coding of a discrete random variable

The number of code symbols in the codeword $C(u_i)$ is called the length
$l_i$ of the codeword.

The set of K codewords $\{C(u_1), C(u_2), \cdots, C(u_K)\}$ is called a D-ary
code for a K-ary random variable.

The average (expected) length of the code C is defined as

$$\bar{L} = \sum_{i=1}^{K} p_i l_i$$

We see that we can reduce the average length of the code by
assigning shorter codewords to more probable source symbols and
longer codewords to less probable ones.
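As a small worked illustration of $\bar{L} = \sum_i p_i l_i$, the Python sketch below compares a fixed-length code with a variable-length code on a hypothetical 4-symbol source; the PMF, the codes, and the name `average_length` are assumptions made for the example.

```python
def average_length(pmf, code):
    """Average codeword length L = sum_i p_i * l_i, with l_i = len(code[i])."""
    return sum(p * len(cw) for p, cw in zip(pmf, code))

pmf = [0.5, 0.25, 0.125, 0.125]          # hypothetical source probabilities
fixed = ["00", "01", "10", "11"]         # fixed-length code, 2 bits/symbol
variable = ["0", "10", "110", "111"]     # shorter words for likelier symbols

print(average_length(pmf, fixed))        # 2.0
print(average_length(pmf, variable))     # 1.75
```

Giving the most probable symbol a 1-bit codeword saves 0.25 bit/symbol on average, even though two symbols now need 3 bits.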


Class of codes

However, we cannot just arbitrarily assign short codewords to every
source symbol. A code C must satisfy the following conditions to be
suitable for encoding discrete sources.

Non-singular codes: A code C is non-singular if every symbol from
the source alphabet is assigned a distinct codeword:

$$u_i \neq u_j \implies C(u_i) \neq C(u_j)$$

Uniquely decodable codes: A code C is uniquely decodable if every
sequence of codewords can be decoded into only one possible
sequence of source symbols.


Class of codes
Instantaneous code: A code is called a prefix-free code or an
instantaneous code if no codeword in the code is a prefix of another
codeword.

In an instantaneous code, a symbol $u_i$ can be decoded as soon as
the last symbol of the corresponding codeword arrives, without waiting
for future codewords.

Figure: Example of codes
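The prefix condition and instantaneous decoding can be sketched in a few lines of Python. This is a minimal illustration under the assumption that codewords are strings over {0, 1}; the function names are hypothetical. The code {0, 01, 011} used in the test is a standard example that is uniquely decodable (it is suffix-free) but not prefix-free.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of another codeword.

    After lexicographic sorting, a word that is a prefix of any other word
    is also a prefix of its immediate successor, so adjacent pairs suffice.
    Duplicate codewords also fail (a word is a prefix of itself).
    """
    words = sorted(code)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def decode_instantaneous(bits, code, symbols):
    """Decode a bit string with a prefix-free code, emitting each symbol
    as soon as the last bit of its codeword arrives."""
    lookup = dict(zip(code, symbols))
    out, current = [], ""
    for b in bits:
        current += b
        if current in lookup:            # complete codeword: emit at once
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return out

print(decode_instantaneous("0101100", ["0", "10", "110", "111"], ["a", "b", "c", "d"]))
```

The decoder never looks ahead: the greedy match is safe only because no codeword can be the beginning of another.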


Kraft Inequality for prefix-free codes

The objective in encoding a discrete source is to construct a code C
which is instantaneous (prefix-free) and has minimum average length.

The set of codeword lengths possible for an instantaneous code is
limited by the Kraft inequality.

There exists a D-ary prefix-free (instantaneous) code whose
codeword lengths are the positive integers $l_1, l_2, l_3, \cdots, l_K$ if and only if

$$\sum_{i=1}^{K} D^{-l_i} \le 1$$

In particular, whenever a set of positive integers $l_1, l_2, l_3, \cdots, l_K$
satisfies the Kraft inequality, we can construct a D-ary prefix-free code
with codewords of these lengths.
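One way to realise the "if" direction of the Kraft inequality is a canonical construction: sort the lengths, and give each symbol the next unused D-ary string of its length. The Python sketch below is a minimal illustration of that idea (the function names are hypothetical, and digits are written with `str()`, so it assumes $D \le 10$); it uses exact fractions for the Kraft sum.

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    """Exact value of sum_i D**(-l_i), computed with fractions."""
    return sum(Fraction(1, D ** l) for l in lengths)

def prefix_code_from_lengths(lengths, D=2):
    """Return a D-ary prefix-free code (list of strings, same order as
    `lengths`) for a non-empty list of positive integer lengths satisfying
    the Kraft inequality. Digits are written with str(), so D <= 10."""
    if kraft_sum(lengths, D) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    code = [""] * len(lengths)
    value, prev_len = 0, lengths[order[0]]
    for i in order:                             # shortest codewords first
        value *= D ** (lengths[i] - prev_len)   # move to the new length
        prev_len = lengths[i]
        digits, v = [], value
        for _ in range(lengths[i]):             # write `value` with l_i base-D digits
            digits.append(str(v % D))
            v //= D
        code[i] = "".join(reversed(digits))
        value += 1                              # next unused node at this depth
    return code

print(kraft_sum([1, 2, 3, 3]))                  # 1
print(prefix_code_from_lengths([1, 2, 3, 3]))   # ['0', '10', '110', '111']
```

The Kraft inequality guarantees that `value` never exceeds $D^{l_i} - 1$, so each codeword fits in its allotted number of digits and none is a prefix of a later one.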


Optimal Prefix-free Codes

Given a discrete memoryless source, i.e., the source alphabet A and the
set of probabilities $\{p_i\}_{i=1}^{K}$, we wish to create a prefix-free D-ary
code with the minimum possible average length $\bar{L}$.

In other words, we wish to determine a set of codeword lengths
$l_1, l_2, l_3, \cdots, l_K$ that satisfy Kraft's inequality and minimize the
average length $\sum_{i=1}^{K} p_i l_i$ of the code.

The optimization problem is formulated as

$$\begin{aligned}
\min_{l_1, l_2, \cdots, l_K} \quad & \sum_{i=1}^{K} p_i l_i \\
\text{subject to} \quad & \sum_{i=1}^{K} D^{-l_i} \le 1, \\
& l_1, l_2, \cdots, l_K \text{ are positive integers.}
\end{aligned}$$


Optimal Prefix-free Codes

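Initially, we simplify the optimization problem by dropping the integer
constraint on the codeword lengths. We also impose the Kraft inequality
with equality, since at the optimum any slack could be used to shorten
some codeword.

This simplified problem can be solved using the method of Lagrange
multipliers.

The Lagrangian of the simplified problem is formed as

$$J = \sum_{i=1}^{K} p_i l_i + \lambda \left( \sum_{i=1}^{K} D^{-l_i} - 1 \right)$$

where $\lambda$ is the Lagrange multiplier.

Setting $\partial J / \partial l_i = p_i - \lambda D^{-l_i} \log_e D = 0$, we get $D^{-l_i} = p_i / (\lambda \log_e D)$.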


Optimal Prefix-free Codes

Substituting into the constraint $\sum_{i=1}^{K} D^{-l_i} = 1$ and using
$\sum_{i=1}^{K} p_i = 1$, we obtain $\lambda = 1/\log_e D$, so that $D^{-l_i} = p_i$.

The optimal solution for the codeword lengths is

$$l_i^* = \log_D\left(\frac{1}{p_i}\right)$$

Substituting these lengths, we obtain the average length of the
optimal code

$$\bar{L} = \sum_{i=1}^{K} p_i \log_D\left(\frac{1}{p_i}\right) = H_D(U)$$

In summary, the entropy $H_D(U)$ is a lower bound on the average length
$\bar{L}$ of prefix-free codes, and this lower bound is achieved when
$l_i = -\log_D p_i$ for each i.
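A small numerical check of the relaxed solution, written as a minimal Python sketch (the name `ideal_lengths` is hypothetical): for a dyadic PMF the optimal lengths $l_i^* = \log_2(1/p_i)$ are integers, while for $\{0.4, 0.6\}$ they are not. In both cases the Kraft sum equals 1 and the average length equals $H_2(U)$, up to floating-point rounding.

```python
import math

def ideal_lengths(pmf, D=2):
    """Real-valued optimal lengths l_i* = log_D(1/p_i) of the relaxed problem."""
    return [math.log(1.0 / p, D) for p in pmf]

for pmf in ([0.5, 0.25, 0.125, 0.125],   # dyadic PMF: l_i* are integers
            [0.4, 0.6]):                 # non-dyadic PMF: l_i* are not
    l_star = ideal_lengths(pmf)
    kraft = sum(2.0 ** -l for l in l_star)             # 1 up to rounding
    avg = sum(p * l for p, l in zip(pmf, l_star))      # equals H_2(U)
    print([round(l, 4) for l in l_star], round(kraft, 6), round(avg, 4))
```

Only in the dyadic case can these lengths be used directly; otherwise they must be rounded to integers, which is what the coding theorem below accounts for.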

Entropy bounds for prefix-free codes

In solving the optimization problem for the optimal (minimum
average length) code, we relaxed the integer constraint on the
lengths of the codewords.

Therefore, $H_D(U)$ provides a lower bound on the average length of
the optimal code. The following theorem provides both a lower bound and
an upper bound on the average length of the optimal code.

Coding Theorem: Let $l_1, l_2, \cdots, l_K$ be the codeword lengths of an
optimal D-ary prefix-free code for a discrete random variable U with K-ary
alphabet A and probability mass function p, and let $\bar{L}^*$ be the average
length of this code. Then

$$H_D(U) \le \bar{L}^* < H_D(U) + 1$$


Proof of Coding Theorem

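First we prove the left side, i.e., we show that $H_D(U) \le \bar{L}^*$.

For any prefix-free code,

$$\begin{aligned}
H_D(U) - \bar{L} &= \sum_{i=1}^{K} p_i \log_D\left(\frac{1}{p_i}\right) - \sum_{i=1}^{K} p_i l_i \\
&= \sum_{i=1}^{K} p_i \log_D\left(\frac{1}{p_i}\right) + \sum_{i=1}^{K} p_i \log_D\left(D^{-l_i}\right) \\
&= \sum_{i=1}^{K} p_i \log_D\left(\frac{D^{-l_i}}{p_i}\right)
\end{aligned}$$

where we have used $\log_D(D^{-l_i}) = -l_i$.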


Proof of Coding Theorem

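Using the inequality $\log_D u \le (u - 1)\log_D e$ for $u > 0$ (which follows
from $\log_e u \le u - 1$), we get

$$\begin{aligned}
H_D(U) - \bar{L} &\le \log_D e \sum_{i=1}^{K} p_i \left(\frac{D^{-l_i}}{p_i} - 1\right) \\
&= \log_D e \left(\sum_{i=1}^{K} D^{-l_i} - \sum_{i=1}^{K} p_i\right) \\
&\le 0
\end{aligned}$$

where we have used the Kraft inequality and $\sum_{i=1}^{K} p_i = 1$.

Equality holds only if $l_i = -\log_D p_i$ for every i, which is possible only
when every $p_i$ is an integer power of $1/D$ (because the codeword lengths
are integers). Since the bound holds for every prefix-free code, it holds
in particular for the optimal one, so $H_D(U) \le \bar{L}^*$.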


Proof of Coding Theorem

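Now we prove the right side, i.e., we show that $\bar{L}^* < H_D(U) + 1$.

Let us assign the codeword lengths of a code as $l_i = \lceil -\log_D p_i \rceil$.

These lengths satisfy the Kraft inequality: since $l_i \ge -\log_D p_i$, we
have $D^{-l_i} \le p_i$, so $\sum_{i=1}^{K} D^{-l_i} \le \sum_{i=1}^{K} p_i = 1$.
Hence a prefix-free code with these lengths exists.

This assignment of codeword lengths implies

$$-\log_D p_i \le l_i < -\log_D p_i + 1$$

Multiplying by $p_i$ and summing over i,

$$-\sum_{i=1}^{K} p_i \log_D p_i \le \sum_{i=1}^{K} p_i l_i < -\sum_{i=1}^{K} p_i \log_D p_i + 1$$

$$H_D(U) \le \bar{L} < H_D(U) + 1$$

Since the optimal code has an average length no larger than that of this
code, $\bar{L}^* \le \bar{L} < H_D(U) + 1$.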
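The code lengths used in this proof can be checked numerically. The Python sketch below is an illustration (the name `shannon_lengths` and the `eps` guard are assumptions) applied to the six-symbol source used later in this lecture: it computes $l_i = \lceil \log_2(1/p_i) \rceil$, the Kraft sum, and verifies $H_2(U) \le \bar{L} < H_2(U) + 1$.

```python
import math

def shannon_lengths(pmf, D=2, eps=1e-12):
    """Codeword lengths l_i = ceil(log_D(1/p_i)) used in the proof.

    eps guards against floating-point error when log_D(1/p_i) is an exact
    integer (e.g. p_i = 1/9 with D = 3)."""
    return [math.ceil(math.log(1.0 / p, D) - eps) for p in pmf]

pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
lengths = shannon_lengths(pmf)
kraft = sum(2.0 ** -l for l in lengths)
H = -sum(p * math.log2(p) for p in pmf)
L_bar = sum(p * l for p, l in zip(pmf, lengths))
print(lengths, kraft)        # [2, 2, 3, 4, 4, 4] 0.8125
print(H <= L_bar < H + 1)    # True
```

Here $\bar{L} = 2.8$ bits, inside the bound but noticeably above the 2.5 bits achieved by the algorithms later in this lecture, which is why the upper bound only constrains the optimal code and does not describe it.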


Extension of Source

We can see from the bounds on the average length of the optimal code
that we can construct codes for a discrete memoryless source whose
average length is within one code symbol (one bit, for binary codes)
of the source entropy.

The average length per source symbol can be made arbitrarily close to the
source entropy by encoding blocks of n source symbols together.

Suppose a block of n symbols from the source is considered as one
super symbol $U^n = [U_1, U_2, \cdots, U_n]$.

This super symbol is a random variable which takes values from the
alphabet $A^n$ of size $K^n$. A prefix-free code can be constructed for
$U^n$ in the same way as for U. This is called the nth-order extension
of the source.


Extension of Source

A block of n i.i.d. source symbols has entropy

$$H_D(U^n) = H_D(U_1, U_2, \cdots, U_n) = n H_D(U)$$

Let $\bar{L}_n^*$ be the average length per input symbol of the optimal
prefix-free code for $U^n$ (so the average length per block is $n\bar{L}_n^*$).
Applying the coding theorem to $U^n$, we get

$$\begin{aligned}
H_D(U^n) &\le n\bar{L}_n^* < H_D(U^n) + 1 \\
n H_D(U) &\le n\bar{L}_n^* < n H_D(U) + 1 \\
H_D(U) &\le \bar{L}_n^* < H_D(U) + \frac{1}{n}
\end{aligned}$$

This result shows that by jointly encoding long n-tuples of source
symbols, we can approach the entropy bound arbitrarily closely (since
$1/n \to 0$ as $n \to \infty$).
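The $1/n$ effect can be observed numerically. The minimal Python sketch below assumes the binary source $\{0.4, 0.6\}$ from the next example and, as a simplification, encodes each n-block with the lengths $\lceil \log_2(1/P(\text{block})) \rceil$ from the proof of the coding theorem rather than with an optimal code. These lengths satisfy the same bound, so the per-symbol length is squeezed between $H$ and $H + 1/n$. The function name `per_symbol_length` is hypothetical.

```python
import itertools
import math

def per_symbol_length(pmf, n):
    """Average length per source symbol when each n-block of an i.i.d.
    source is encoded with lengths ceil(log2(1/P(block)))."""
    total = 0.0
    for block in itertools.product(pmf, repeat=n):
        prob = math.prod(block)                 # i.i.d.: product of the p_i
        total += prob * math.ceil(math.log2(1.0 / prob) - 1e-12)
    return total / n

pmf = [0.4, 0.6]
H = -sum(p * math.log2(p) for p in pmf)         # about 0.9710 bits
for n in (1, 2, 4, 8):
    Ln = per_symbol_length(pmf, n)
    print(n, round(Ln, 4), H <= Ln < H + 1 / n)
```

The number of blocks grows as $K^n$, so this brute-force enumeration is only practical for small n.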


Extension of Source (Example)

Example: Let a discrete memoryless source have the source alphabet
$A = \{u_1, u_2\}$ with corresponding probabilities $\{p_1 = 0.4, p_2 = 0.6\}$.
Find the entropy of the source. Suppose now we wish to jointly encode
blocks of two symbols from the source, i.e., use the second-order
extension of the source. Construct the source alphabet and probability
distribution for the extended source and find its entropy.
Solution: The entropy of the source is computed as

$$H_2(U) = -\sum_{i=1}^{2} p_i \log_2 p_i$$

which gives $H_2(U) = 0.9710$ bits.

Now, we encode a pair of source symbols together. The second-order
extension of the source is given as follows.

Extension of Source (Example)

The second-order extension of the source

Table: Second order extension of the source

Symbols    Probability    Value
u1 u1      $p_1^2$        0.16
u1 u2      $p_1 p_2$      0.24
u2 u1      $p_2 p_1$      0.24
u2 u2      $p_2^2$        0.36

The entropy of this source is

$$H_2(U^2) = -\left(0.16\log_2 0.16 + 0.24\log_2 0.24 + 0.24\log_2 0.24 + 0.36\log_2 0.36\right) = 1.9419 \text{ bits}$$

We can see that the entropy of the second-order extension of the source
is twice the entropy of the original source (because the symbols emitted
by the source are assumed to be i.i.d.).
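The extension table and its entropy can be generated mechanically. The following minimal Python sketch (function names hypothetical) forms every n-tuple of symbols, assigns it the product of the individual probabilities (valid because the source is memoryless), and reproduces the numbers of this example.

```python
import itertools
import math

def extension(symbols, pmf, n):
    """nth-order extension of a DMS: every n-tuple of symbols with its
    probability (product of the individual probabilities, by independence)."""
    table = {}
    for combo in itertools.product(range(len(symbols)), repeat=n):
        name = " ".join(symbols[i] for i in combo)
        table[name] = math.prod(pmf[i] for i in combo)
    return table

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

ext = extension(["u1", "u2"], [0.4, 0.6], n=2)
for name, p in ext.items():
    print(name, round(p, 2))                    # the four rows of the table
print(round(entropy_bits([0.4, 0.6]), 4))       # 0.971
print(round(entropy_bits(ext.values()), 4))     # 1.9419
```

Changing `n` builds the higher-order extensions, whose entropy grows as $n H_2(U)$.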

Source Coding Algorithms

We have seen that the entropy of the source gives a lower bound on
the average length of any prefix-free code for the source.

In this section, we study some specific algorithms for source
coding and compare the average code length they achieve with the
source entropy.

Shannon-Fano Source Coding Algorithm: This is a suboptimal
procedure for designing a prefix-free code for a given discrete
memoryless source.

This algorithm achieves an average code length $\bar{L} \le H(U) + 2$.


Shannon-Fano Source Coding Algorithm

Shannon-Fano coding: Algorithm for constructing a binary
prefix-free code for a K-ary random variable U.

Initialization: Given a K-ary random variable with source alphabet
$A = \{u_1, u_2, \cdots, u_K\}$ and corresponding probabilities
$\Pr(U = u_i) = p_i$, order these symbols in decreasing order of probability.

Step 1: Divide the (ordered) symbols of a group into two subgroups such
that the sums of the symbol probabilities in the two subgroups are as
close as possible.
Step 2: Assign 0 and 1 (in any order) as the next most significant bit of
the symbols in these two subgroups.
Step 3: If every subgroup contains only one symbol, stop; else go to
Step 1 for each subgroup with more than one symbol.
Extract the Shannon-Fano code starting from the MSB.
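The steps above can be sketched recursively in Python. This is a minimal illustration, not the lecture's reference implementation: the upper subgroup always gets bit 0, and when two split points are equally balanced (within a small tolerance) the earlier one is chosen. Both are assumptions, since the algorithm allows either choice. The function name `shannon_fano` is hypothetical.

```python
def shannon_fano(symbols, pmf):
    """Binary Shannon-Fano code as a dict symbol -> codeword.

    Symbols are sorted by decreasing probability; each group is split where
    the two subgroup sums are closest (ties go to the earlier split), the
    upper subgroup gets bit 0, the lower subgroup bit 1, and both recurse.
    """
    items = sorted(zip(symbols, pmf), key=lambda sp: sp[1], reverse=True)
    code = {s: "" for s in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        best_k, best_diff, running = 1, float("inf"), 0.0
        for k in range(1, len(group)):          # candidate split group[:k] | group[k:]
            running += group[k - 1][1]
            diff = abs(total - 2 * running)     # |sum(upper) - sum(lower)|
            if diff < best_diff - 1e-12:        # strict improvement only
                best_k, best_diff = k, diff
        for s, _ in group[:best_k]:
            code[s] += "0"
        for s, _ in group[best_k:]:
            code[s] += "1"
        split(group[:best_k])
        split(group[best_k:])

    split(items)
    return code

symbols = ["u1", "u2", "u3", "u4", "u5", "u6"]
pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
code = shannon_fano(symbols, pmf)
L_bar = sum(p * len(code[s]) for s, p in zip(symbols, pmf))
print(code, L_bar)
```

On the example source that follows, this tie-breaking rule splits {0.20 | 0.10, 0.10, 0.10} rather than {0.20, 0.10 | 0.10, 0.10}, giving lengths (2, 2, 2, 3, 4, 4); the alternative split gives (2, 2, 3, 3, 3, 3). Both have $\bar{L} = 2.5$ bits.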

Shannon-Fano Source Coding Algorithm

Example: Construct a Shannon-Fano code for the following source:
$P(u_1) = 0.25$, $P(u_2) = 0.25$, $P(u_3) = 0.20$, $P(u_4) = 0.10$,
$P(u_5) = 0.10$, $P(u_6) = 0.10$


Shannon-Fano Source Coding Algorithm

The average code length of the Shannon-Fano code for the given
source is

$$\bar{L} = \sum_i p_i l_i = 0.25\times 2 + 0.25\times 2 + 0.20\times 3 + 0.10\times 3 + 0.10\times 3 + 0.10\times 3 = 2.5 \text{ bits}$$

The entropy of the given source is

$$H_2(U) = -\sum_i p_i \log_2 p_i = 2.4610 \text{ bits}$$

The efficiency of the designed code is defined as

$$\text{Code efficiency} = \frac{\text{Source entropy}}{\text{Average code length}}$$

So the code efficiency in this case is $2.4610/2.5 = 0.984$ (i.e., 98.4%).
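The efficiency calculation is easy to automate. The Python sketch below is an illustration (the name `code_efficiency` is hypothetical) that takes a PMF and a list of codeword lengths and measures both entropy and length in D-ary units; with the Shannon-Fano lengths of this example it reproduces the 98.4% figure.

```python
import math

def code_efficiency(pmf, lengths, D=2):
    """Efficiency = H_D(U) / L_bar, with entropy and length in D-ary units."""
    H = -sum(p * math.log(p, D) for p in pmf if p > 0)
    L_bar = sum(p * l for p, l in zip(pmf, lengths))
    return H / L_bar

pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
print(round(code_efficiency(pmf, [2, 2, 3, 3, 3, 3]), 3))   # 0.984
```

An efficiency of 1 would mean the code meets the entropy bound exactly, which for binary codes is possible only when every $p_i$ is a power of 1/2.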

Huffman Source Coding Algorithm

Huffman coding: Huffman codes are the optimal prefix-free codes
(i.e., with minimum average code length) for a discrete memoryless
source with a given probability mass function.

The basic idea of Huffman codes is to assign short codewords to
more probable source symbols and longer codewords to less
probable source symbols.

The set of codeword lengths of a Huffman (optimal) code is not
unique, i.e., there may be different sets of codeword lengths with
the same (minimum) average length.

An individual codeword length $l_i$ of an optimal code may be larger
than $\lceil \log_D(1/p_i) \rceil$.


Huffman Codes (Binary)

Huffman coding: Algorithm for constructing a binary (D = 2) prefix-free
code for a K-ary discrete memoryless source (random variable U).
Initialization: Given a K-ary random variable, create K active nodes
$u_1, u_2, \cdots, u_K$ and assign the probabilities $\Pr(U = u_i) = p_i$ to them.
Step 1: Create a new node that combines the two least probable active
nodes, and assign labels 0 and 1 to the two branches in any order.
Assign the new node a probability equal to the sum of the
probabilities of these two nodes.
Deactivate the two nodes that are combined and make the new
node active.
Step 2: If only one active node is left, make it the root and stop; else go to
Step 1.
Extract the Huffman codewords by reading the branch labels from the root
to each leaf.
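The binary Huffman procedure maps naturally onto a min-heap. The following is a minimal Python sketch using the standard-library `heapq` module; the name `huffman_binary` and the representation of each node as a dict of partial codewords are illustrative choices, not the lecture's. Prepending 0 or 1 to every codeword of a merged subtree is equivalent to reading the branch labels from the root once the tree is complete.

```python
import heapq
import itertools

def huffman_binary(symbols, pmf):
    """Binary Huffman code as a dict symbol -> codeword.

    Each heap entry is (probability, tie_breaker, {symbol: partial codeword}).
    """
    counter = itertools.count()                 # avoids comparing dicts on ties
    heap = [(p, next(counter), {s: ""}) for s, p in zip(symbols, pmf)]
    heapq.heapify(heap)
    if len(heap) == 1:                          # single symbol: give it "0"
        return {symbols[0]: "0"}
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)         # least probable active node
        p1, _, c1 = heapq.heappop(heap)         # second least probable node
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

symbols = ["u1", "u2", "u3", "u4", "u5", "u6"]
pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
code = huffman_binary(symbols, pmf)
L_bar = sum(p * len(code[s]) for s, p in zip(symbols, pmf))
print(code, L_bar)
```

With this heap's tie-breaking, the example source that follows gets lengths (2, 2, 3, 3, 3, 3), which differ from the lengths (2, 2, 2, 3, 4, 4) in the lecture's figure; both give $\bar{L} = 2.5$ bits, illustrating that Huffman codeword lengths are not unique.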

Huffman Codes (D-ary)

Huffman coding: Algorithm for constructing a D-ary (D > 2)
prefix-free code for a K-ary random variable U.
Initialization: Given a K-ary random variable, create K active nodes
$u_1, u_2, \cdots, u_K$ and assign the probabilities $\Pr(U = u_i) = p_i$ to them.
Compute the remainder r when $(K - D)(D - 2)$ is divided by $D - 1$
(r is the number of unused leaves in the code tree).

Step 1: Create a new node that combines the $D - r$ least probable active
nodes using $D - r$ branches of a D-ary branching, and assign the labels
$0, 1, \cdots, D - r - 1$ to these $D - r$ branches in any order.
Assign the new node a probability equal to the sum of the
probabilities of these $D - r$ nodes.
Deactivate the $D - r$ nodes that are combined and make the
new node active.
Step 2: If only one active node is left, stop; else set r = 0 and go to
Step 1.
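The D-ary procedure differs from the binary one only in the size of the merges. In the minimal Python sketch below (the name `huffman_dary` is hypothetical, and digits are written with `str()`, so it assumes $D \le 10$), the first merge combines $D - r$ nodes with $r = (K - D)(D - 2) \bmod (D - 1)$ as in the initialization step, and every later merge combines D nodes.

```python
import heapq
import itertools

def huffman_dary(symbols, pmf, D=3):
    """D-ary Huffman code (2 <= D <= 10) as a dict symbol -> codeword."""
    K = len(symbols)
    if K < 2:
        return {s: "0" for s in symbols}
    counter = itertools.count()                 # tie-breaker for equal probabilities
    heap = [(p, next(counter), {s: ""}) for s, p in zip(symbols, pmf)]
    heapq.heapify(heap)
    r = ((K - D) * (D - 2)) % (D - 1)           # number of unused leaves
    group = D - r                               # size of the first merge
    while len(heap) > 1:
        merged, total = {}, 0.0
        for digit in range(group):              # combine the `group` least probable nodes
            p, _, sub = heapq.heappop(heap)
            total += p
            merged.update({s: str(digit) + w for s, w in sub.items()})
        heapq.heappush(heap, (total, next(counter), merged))
        group = D                               # r = 0 from now on
    return heap[0][2]

symbols = ["u1", "u2", "u3", "u4", "u5", "u6"]
pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
code3 = huffman_dary(symbols, pmf, D=3)
L3 = sum(p * len(code3[s]) for s, p in zip(symbols, pmf))
print(code3, L3)    # average length about 1.7 ternary digits
```

For the example source that follows, K = 6 and D = 3 give r = 1, so the first step merges only two nodes (two of the 0.10 symbols), which leaves exactly enough nodes for every later step to merge three.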

Huffman Codes
Example: Construct a binary Huffman code for the following source:
P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4) = 0.10, P(u5) =
0.10, P(u6) = 0.10

Figure: Binary Huffman Code


Huffman Codes
Example: Construct a ternary (3-ary) Huffman code for the
following source: P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4)
= 0.10, P(u5) = 0.10, P(u6) = 0.10

Figure: Ternary Huffman Code


Huffman Coding Algorithm

The average code length of the binary Huffman code for the given
source is

$$\bar{L} = \sum_i p_i l_i = 0.25\times 2 + 0.25\times 2 + 0.20\times 2 + 0.10\times 3 + 0.10\times 4 + 0.10\times 4 = 2.5 \text{ bits}$$

The average code length of the ternary Huffman code for the same
source is

$$\bar{L} = 0.25\times 1 + 0.25\times 1 + 0.20\times 2 + 0.10\times 2 + 0.10\times 3 + 0.10\times 3 = 1.7 \text{ ternary digits}$$

The code efficiency of the binary Huffman code is $2.4610/2.5 = 0.984$ (i.e., 98.4%).

The code efficiency of the ternary Huffman code is $1.5527/1.7 = 0.913$ (i.e., 91.3%).
Note that we have used the entropy in base 3, i.e., $H_3(U) = H_2(U)/\log_2 3 \approx 1.5527$, here.
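Both efficiency figures can be reproduced with a few lines of Python. This sketch takes the codeword lengths from the lecture's binary and ternary examples and uses the base-change rule $H_3(U) = H_2(U)/\log_2 3$ from the properties of entropy; it should print efficiencies of about 0.984 and 0.913.

```python
import math

pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
binary_lengths = [2, 2, 2, 3, 4, 4]    # binary Huffman lengths from the example
ternary_lengths = [1, 1, 2, 2, 3, 3]   # ternary Huffman lengths from the example

H2 = -sum(p * math.log2(p) for p in pmf)        # entropy in bits
H3 = H2 / math.log2(3)                          # same entropy in ternary units
L2 = sum(p * l for p, l in zip(pmf, binary_lengths))
L3 = sum(p * l for p, l in zip(pmf, ternary_lengths))

print(f"H2 = {H2:.4f}, L2 = {L2:.2f}, efficiency = {H2 / L2:.3f}")
print(f"H3 = {H3:.4f}, L3 = {L3:.2f}, efficiency = {H3 / L3:.3f}")
```

Comparing efficiencies across different code alphabets only makes sense when entropy and average length are measured in the same D-ary units, which is why $H_3(U)$ rather than $H_2(U)$ is divided by the ternary average length.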