
Digital Communication & Systems

V. Praksh Singh

Department of Electronics & Communication Engineering


National Institute of Technology Hamirpur
Hamirpur, Himachal Pradesh
India

Jan 9, 2020

Lecture #2: Coding for Discrete Sources

References:
'Modern Digital and Analog Communication Systems', by B. P. Lathi and Zhi Ding.
'Introduction to Analog and Digital Communications', by Simon Haykin and Michael Moher.
'Principles of Digital Communication', by Robert G. Gallager.
'Elements of Information Theory', by Thomas Cover and Joy Thomas.
'Lecture notes on Applied Digital Information Theory I', by James L. Massey.


Outline of the lecture

Coding of a discrete information source

Mathematical modeling of discrete sources

Source coding

Source coding theorem

Source coding algorithms
  Shannon-Fano coding
  Huffman coding
  Lempel-Ziv coding


Outline

1 Coding for Discrete Sources

2 Mathematical Modeling of a Discrete Source

3 Coding of a discrete random variable

4 Source Coding Algorithms


Coding for Discrete Sources


In digital communication systems, all information sources, e.g. speech waveforms (analog sources) or text files (discrete sources), must be represented by a sequence of bits.

Discrete source: the source output is a sequence of symbols from a given alphabet A of finite size, e.g. a text file may consist of symbols from an alphabet of English letters and alphanumeric characters.

The source encoder converts the sequence of symbols from the source to a sequence of bits, using as few bits per symbol as possible (also called source compression).

In this lecture, we consider lossless encoding of discrete sources, such that the source output can be uniquely recovered from the encoded string of bits.

Fixed length codes for Discrete Sources


Suppose the size of the source alphabet A is M. The simplest method to encode a discrete source is to map each symbol a ∈ A into a fixed-length codeword C(a).

For example, if the source alphabet consists of the 26 capital English letters, the following binary code of block length L = 5 can be used.

Table: Fixed-length coding

Symbol   Code
A        00000
B        00001
...      ...
Y        11010
Z        11011
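A minimal Python sketch (not part of the original slides) of such a fixed-length code for the 26 capital English letters. The particular assignment of 5-bit strings to letters is arbitrary; here it is chosen by alphabetical index, so it need not match the table above exactly.

import math
import string

# Fixed-length binary code: every symbol gets L = ceil(log2 M) bits.
alphabet = list(string.ascii_uppercase)   # M = 26 symbols A..Z
M = len(alphabet)
L = math.ceil(math.log2(M))               # L = 5 bits per symbol

# Assign codewords 00000, 00001, ... by alphabetical index; any one-to-one
# assignment of distinct 5-bit strings (such as the table above) works equally well.
code = {a: format(i, f"0{L}b") for i, a in enumerate(alphabet)}

print(L)                                   # 5
print(code["A"], code["B"], code["Z"])     # 00000 00001 11001 (with this assignment)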

Fixed length codes for Discrete Sources

A binary fixed-length code for a source alphabet A of size M requires $L = \lceil \log_2 M \rceil$ bits to encode each symbol (L bits/symbol), where $\lceil x \rceil$ denotes the smallest integer greater than or equal to the real number x.

However, this method of assigning fixed-length codewords to the source symbols does not take into account the probabilities of occurrence of the source symbols.

We can reduce the average length per symbol (L) by assigning more bits to less probable symbols and fewer bits to more probable symbols, i.e. variable-length coding.




Mathematical Modeling of a Discrete Source


We mathematically model a discrete source as a discrete random process.

Figure: Discrete Information Source

We further assume that the discrete source is memoryless, i.e. the sequence of symbols $U_1, U_2, \cdots$ is generated independently by the source, with the same probability mass function.

So, the output of a discrete memoryless source (DMS) is a sequence of independent and identically distributed (i.i.d.) random variables.

Mathematical Modeling of a Discrete Source

Let U be a K-ary discrete random variable which takes values from the set $A = \{u_1, u_2, \cdots, u_K\}$.

The probability mass function (PMF) of U is $\Pr(U = u_i) = p_i$ for $i = 1, 2, \cdots, K$.

Each source output of a DMS, i.e. $U_1, U_2, \cdots$, is selected from A with the same PMF.

Each source output $U_i$ of a DMS is statistically independent of the previous outputs $U_1, U_2, \cdots, U_{i-1}$.

A DMS is completely described by the source alphabet A and the set of probabilities $\{p_i\}_{i=1}^{K}$.
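A minimal Python sketch of this model, assuming an illustrative 4-ary alphabet and PMF (not taken from the slides): a DMS is simulated by drawing i.i.d. symbols with the standard-library random module.

import random

# Hypothetical K-ary alphabet and PMF {p_i}; any valid PMF works.
alphabet = ["u1", "u2", "u3", "u4"]
pmf      = [0.5, 0.25, 0.15, 0.10]

def dms_output(n, seed=0):
    """Return n symbols U1, U2, ..., Un drawn i.i.d. from the PMF (a DMS)."""
    rng = random.Random(seed)
    return rng.choices(alphabet, weights=pmf, k=n)

print(dms_output(10))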


Information Measure

Suppose $u_i$ is a particular realization of a K-ary discrete random variable U. How can we quantify the amount of information $I(u_i)$ provided by the observation $u_i$?

The information measure of the output $u_i$ should depend only on the probability of $u_i$, i.e. $p_i$. The more probable the outcome, the less information its observation conveys (i.e. the measure should be a decreasing function of p).

The information measure should be a continuous function of the probability, i.e. a small change in the probability of a certain observation should not drastically change the information.

If the observation is divided into two (or more) independent parts, $u_i = (u_{i,1}, u_{i,2})$, the information measure of $u_i$ should be the sum of the information provided by each independent part.

Information Measure (Entropy)


The only information measure that satisfies the above properties is the logarithmic function:
$$I(u_i) = \log\left(\frac{1}{p_i}\right)$$
is called the self-information of $u_i$.

We define the average information content of the random variable U as the entropy of the discrete random variable U:
$$H(U) = \sum_{i=1}^{K} p_i I(u_i) = \sum_{i=1}^{K} p_i \log\left(\frac{1}{p_i}\right)$$
where we use the convention $0 \log 0 = 0$.

If the base of the logarithm is 2, entropy (information) is expressed in bits/symbol.
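A small Python sketch of the entropy formula above; the example PMFs are illustrative.

from math import log2

def entropy(pmf):
    """H(U) = sum_i p_i * log2(1/p_i) in bits/symbol, with the convention 0*log(0) = 0."""
    return sum(p * log2(1.0 / p) for p in pmf if p > 0)

# Illustrative PMFs: a uniform 4-ary source has entropy log2(4) = 2 bits/symbol.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75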

Properties of Entropy

The entropy is always non-negative, i.e. $H(U) \geq 0$.

Entropy can be converted from one base to another as $H_b(U) = H_a(U)\log_b a$.

Let a binary memoryless source (2-ary discrete source) emit $u_1 = 0$ or $u_2 = 1$ with $\Pr(U = u_1) = p$ and $\Pr(U = u_2) = 1 - p$. Then
$$H(U) = -p\log p - (1-p)\log(1-p) \overset{\mathrm{def}}{=} H(p)$$
is called the binary entropy function.

We can observe that $H(U) = 0$ for p = 0 or p = 1, i.e. the source generates only zeros or only ones and there is no uncertainty. The uncertainty is maximum when p = 1/2, where the entropy attains its maximum value H(U) = 1 bit.



Coding of a discrete random variable


Given a K-ary random variable with source alphabet $A = \{u_1, u_2, \cdots, u_K\}$ and the set of probabilities $\{p_i\}_{i=1}^{K}$.

Figure: Coding of a discrete random variable

The code symbols $X_i$ take values from a D-ary alphabet $\{0, 1, \cdots, D-1\}$. For binary codes, D = 2.

A variable-length code C maps each source symbol $u_i$ in A to a D-ary string $[x_1, x_2, \cdots, x_{l_i}]$ called a codeword.

Coding of a discrete random variable

The number of symbols in the codeword $C(u_i)$ is called the length $l_i$ of the codeword.

The set of K codewords $\{C(u_1), C(u_2), \cdots, C(u_K)\}$ is called a D-ary code for a K-ary random variable.

The average (expected) length of the code C is defined as
$$\bar{L} = \sum_{i=1}^{K} p_i l_i$$

We see that we can reduce the average length of the code by assigning shorter codewords to more probable source symbols and vice versa.
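A small Python sketch of the average-length formula, using an illustrative 4-ary PMF (not from the slides) to show how shorter codewords for more probable symbols reduce $\bar{L}$.

def average_length(pmf, lengths):
    """Average (expected) codeword length L-bar = sum_i p_i * l_i."""
    return sum(p * l for p, l in zip(pmf, lengths))

# Illustrative 4-ary source: giving the most probable symbol the shortest
# codeword reduces L-bar compared with a fixed-length (2-bit) code.
pmf = [0.5, 0.25, 0.125, 0.125]
print(average_length(pmf, [2, 2, 2, 2]))  # fixed length:    2.0 bits/symbol
print(average_length(pmf, [1, 2, 3, 3]))  # variable length: 1.75 bits/symbol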


Class of codes

However, we cannot just arbitrarily assign short codewords to the source symbols. A code C must satisfy the following conditions to be suitable for encoding discrete sources in practice.

Non-singular codes: A code C is non-singular if every symbol from the source alphabet is assigned a distinct codeword:
$$u_i \neq u_j \implies C(u_i) \neq C(u_j)$$

Uniquely decodable codes: A code C is uniquely decodable if any sequence of codewords can be decoded into only one possible sequence of source symbols.


Class of codes
Instantaneous code: A code is called a prefix-free code or an instantaneous code if no codeword in the code is a prefix of another codeword.

In an instantaneous code, a symbol $u_i$ can be decoded as soon as the last symbol of the corresponding codeword arrives, without waiting for future codewords.

Figure: Example of codes
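A small Python sketch (with illustrative codes, not necessarily those of the figure) that checks the non-singular and prefix-free conditions directly from their definitions.

def is_nonsingular(code):
    """Non-singular: distinct symbols get distinct codewords."""
    codewords = list(code.values())
    return len(set(codewords)) == len(codewords)

def is_prefix_free(code):
    """Instantaneous (prefix-free): no codeword is a prefix of another."""
    cws = list(code.values())
    return all(not a.startswith(b)
               for i, a in enumerate(cws)
               for j, b in enumerate(cws) if i != j)

# Illustrative binary codes for a 4-ary source.
print(is_prefix_free({"u1": "0", "u2": "10", "u3": "110", "u4": "111"}))  # True
print(is_prefix_free({"u1": "0", "u2": "01", "u3": "011", "u4": "111"}))  # False: "0" is a prefix of "01"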



Kraft Inequality for prefix-free codes

The objective in encoding a discrete source is to construct a code C which is instantaneous (prefix-free) and of minimum average length.

The set of codeword lengths possible for an instantaneous code is limited by the Kraft inequality.

There exists a D-ary prefix-free (instantaneous) code whose codeword lengths are the positive integers $l_1, l_2, l_3, \cdots, l_K$ if and only if
$$\sum_{i=1}^{K} D^{-l_i} \leq 1$$

Conversely, if a set of positive integers $l_1, l_2, l_3, \cdots, l_K$ satisfies the Kraft inequality, we can construct a D-ary prefix-free code with codewords of these lengths.
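A one-line Python check of the Kraft inequality for a candidate set of codeword lengths; the length sets below are illustrative.

def kraft_sum(lengths, D=2):
    """Sum of D^(-l_i); a D-ary prefix-free code with these lengths exists iff the sum is <= 1."""
    return sum(D ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))          # 1.0  -> feasible (Kraft inequality met with equality)
print(kraft_sum([1, 1, 2]))             # 1.25 -> no binary prefix-free code with these lengths
print(kraft_sum([1, 1, 2, 3, 3], D=3))  # about 0.85 -> feasible for a ternary code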


Optimal Prefix-free Codes

Given a discrete memoryless source, i.e. a source alphabet A and the set of probabilities $\{p_i\}_{i=1}^{K}$, we wish to construct a prefix-free D-ary code with the minimum possible average length $\bar{L}$.

In other words, we wish to determine a set of codeword lengths $l_1, l_2, l_3, \cdots, l_K$ that satisfy the Kraft inequality and minimize the average length of the code $\sum_{i=1}^{K} p_i l_i$.

The optimization problem is formulated as
$$\min_{l_1, l_2, \cdots, l_K} \; \sum_{i=1}^{K} p_i l_i \quad \text{subject to} \quad \sum_{i=1}^{K} D^{-l_i} \leq 1, \quad l_1, l_2, \cdots, l_K \text{ positive integers}$$


Optimal Prefix-free Codes

Initially, we simplify the optimization problem by dropping the integer constraint on the codeword lengths.

This simplified problem can be solved using the Lagrange multiplier method.

The Lagrangian of the simplified problem is formed as
$$J = \sum_{i=1}^{K} p_i l_i + \lambda \left( \sum_{i=1}^{K} D^{-l_i} - 1 \right)$$
where $\lambda$ is the Lagrange multiplier.

Setting $\frac{\partial J}{\partial l_i} = 0$, we get $D^{-l_i} = \frac{p_i}{\lambda \log_e D}$.


Optimal Prefix-free Codes


Substituting this into the Kraft constraint (taken with equality) and using $\sum_{i=1}^{K} p_i = 1$, we obtain $\lambda = \frac{1}{\log_e D}$.

The optimal (real-valued) codeword lengths are therefore
$$l_i^* = \log_D\left(\frac{1}{p_i}\right)$$

Substituting these lengths, we obtain the average length of the optimal code
$$\bar{L} = \sum_{i=1}^{K} p_i \log_D\left(\frac{1}{p_i}\right) = H_D(U)$$

In summary, the entropy $H_D(U)$ is a lower bound on the average length $\bar{L}$ of prefix-free codes, and this lower bound is achieved when $l_i = -\log_D(p_i)$ for each i.
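A small Python sketch of this unconstrained optimum: for a dyadic PMF (chosen for illustration) the ideal lengths $\log_D(1/p_i)$ are integers, so the average length equals the entropy exactly.

from math import log

def ideal_lengths(pmf, D=2):
    """Unconstrained optimum l_i* = log_D(1/p_i); in general these are not integers."""
    return [log(1.0 / p, D) for p in pmf]

def entropy_D(pmf, D=2):
    """H_D(U) = sum_i p_i log_D(1/p_i)."""
    return sum(p * log(1.0 / p, D) for p in pmf if p > 0)

# Illustrative dyadic PMF: every p_i is a power of 2, so l_i* is an integer.
pmf = [0.5, 0.25, 0.125, 0.125]
lengths = ideal_lengths(pmf)
print(lengths)                                    # [1.0, 2.0, 3.0, 3.0] (up to float rounding)
print(entropy_D(pmf))                             # 1.75
print(sum(p * l for p, l in zip(pmf, lengths)))   # 1.75 -> average length equals H_2(U)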

Entropy bounds for prefix-free codes

In the solution of the optimization problem for the optimal (minimum average length) code, we relaxed the integer constraint on the lengths of the codewords.

Therefore, $H_D(U)$ provides only a lower bound on the average length of the optimal code. The following theorem provides both a lower bound and an upper bound on the average length of the optimal code.

Coding theorem: Let $l_1, l_2, \cdots, l_K$ be the codeword lengths of an optimal D-ary code for a discrete random variable with K-ary alphabet A and probability mass function p, and let $\bar{L}^*$ be the average length of the code. Then
$$H_D(U) \leq \bar{L}^* < H_D(U) + 1$$


Proof of Coding Theorem

First we prove the left side, i.e. show that $H_D(U) \leq \bar{L}^*$.

For any prefix-free code,
$$H_D(U) - \bar{L} = \sum_{i=1}^{K} p_i \log_D\left(\frac{1}{p_i}\right) - \sum_{i=1}^{K} p_i l_i = \sum_{i=1}^{K} p_i \log_D\left(\frac{1}{p_i}\right) + \sum_{i=1}^{K} p_i \log_D\left(D^{-l_i}\right) = \sum_{i=1}^{K} p_i \log_D\left(\frac{D^{-l_i}}{p_i}\right)$$
where we have used $\log_D(D^{-l_i}) = -l_i$.


Proof of Coding Theorem

Using the inequality $\log_D u \leq (u - 1)\log_D e$, we get
$$H_D(U) - \bar{L} \leq \log_D e \sum_{i=1}^{K} p_i \left(\frac{D^{-l_i}}{p_i} - 1\right) = \log_D e \left(\sum_{i=1}^{K} D^{-l_i} - \sum_{i=1}^{K} p_i\right) \leq 0$$
where we have used the Kraft inequality and $\sum_{i=1}^{K} p_i = 1$.

The inequality is strict unless $l_i = -\log_D p_i$ for every i, which requires each $p_i$ to be an integer power of D (because the codeword lengths are integers).

Proof of Coding Theorem

Now we prove the right side, i.e. show that $\bar{L}^* < H_D(U) + 1$.

Let us assign the codeword lengths as $l_i = \lceil \log_D(1/p_i) \rceil$. These lengths satisfy the Kraft inequality (since $l_i \geq \log_D(1/p_i)$ implies $D^{-l_i} \leq p_i$, so $\sum_i D^{-l_i} \leq \sum_i p_i = 1$), hence a prefix-free code with these lengths exists.

This assignment of codeword lengths implies
$$-\log_D p_i \leq l_i < -\log_D p_i + 1$$
$$-\sum_{i=1}^{K} p_i \log_D p_i \leq \sum_{i=1}^{K} p_i l_i < -\sum_{i=1}^{K} p_i \log_D p_i + 1$$
$$H_D(U) \leq \bar{L} < H_D(U) + 1$$
Since the optimal code can do no worse than this particular code, $\bar{L}^* \leq \bar{L} < H_D(U) + 1$.
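A small Python sketch of the construction used in this direction of the proof: assign $l_i = \lceil \log_2(1/p_i) \rceil$ for an illustrative PMF (not from the slides) and check that $H(U) \leq \bar{L} < H(U) + 1$.

from math import log2, ceil

def shannon_lengths(pmf):
    """Codeword lengths l_i = ceil(log2(1/p_i)); these always satisfy the Kraft inequality."""
    return [ceil(log2(1.0 / p)) for p in pmf]

def entropy(pmf):
    return sum(p * log2(1.0 / p) for p in pmf if p > 0)

# Illustrative PMF: verify H(U) <= L-bar < H(U) + 1 for the ceiling-rule lengths.
pmf = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_lengths(pmf)
L_bar = sum(p * l for p, l in zip(pmf, lengths))
print(lengths)                                    # [2, 2, 3, 4]
print(entropy(pmf), L_bar)                        # H(U) ~ 1.846 bits, L-bar = 2.4 bits
print(entropy(pmf) <= L_bar < entropy(pmf) + 1)   # True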


Extension of Source

We can see from the bounds on the average length of the optimal code that we can construct codes for a discrete memoryless source within one bit of the source entropy.

The average length of the code can be made arbitrarily close to the source entropy by encoding a block of n source symbols at a time.

Suppose a block of n symbols from the source is considered as one super-symbol $U^n = [U_1, U_2, \cdots, U_n]$.

This super-symbol is a random variable which takes values from the alphabet $A^n$ of size $K^n$. A prefix-free code can be constructed for $U^n$ in the same way as for U. This is called the n-th order extension of the source.


Extension of Source

A block of n i.i.d. source symbols has entropy
$$H_D(U^n) = H_D(U_1, U_2, \cdots, U_n) = nH_D(U)$$

Let $\bar{L}_n^*$ be the average length per source symbol of the optimal prefix-free code for $U^n$ (so the expected codeword length of the block is $n\bar{L}_n^*$). Applying the theorem on the bounds on the average length of optimal codes to the block, we get
$$H_D(U^n) \leq n\bar{L}_n^* < H_D(U^n) + 1$$
$$nH_D(U) \leq n\bar{L}_n^* < nH_D(U) + 1$$
$$H_D(U) \leq \bar{L}_n^* < H_D(U) + \frac{1}{n}$$

This result shows that by jointly encoding long n-tuples of source symbols we can approach the entropy bound (as 1/n goes to zero).


Extension of Source (Example)


Example: A discrete memoryless source has the source alphabet $A = \{u_1, u_2\}$ with corresponding probabilities $\{p_1 = 0.4, p_2 = 0.6\}$. Find the entropy of the source. Suppose now we wish to jointly encode a block of two symbols from the source, i.e. the second-order extension of the source. Construct the source alphabet and probability distribution of the extended source and find its entropy.

Solution: The entropy of the source is computed as
$$H_2(U) = -\sum_{i=1}^{K} p_i \log_2 p_i$$
which gives $H_2(U) = 0.9710$ bits/symbol.

Now we encode a pair of source symbols together. The second-order extension of the source is given below.

Extension of Source (Example)


The second-order extension of the source:

Table: Second-order extension of the source

Symbol    Probability    Value
u1 u1     $p_1^2$        0.16
u1 u2     $p_1 p_2$      0.24
u2 u1     $p_2 p_1$      0.24
u2 u2     $p_2^2$        0.36

The entropy of this extended source is
$$H(U^2) = -(0.16\log_2 0.16 + 0.24\log_2 0.24 + 0.24\log_2 0.24 + 0.36\log_2 0.36) = 1.9420 \text{ bits}$$

We can see that the entropy of the second-order extension of the source is twice the entropy of the original source (because the symbols emitted by the source are assumed i.i.d.).
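A small Python sketch verifying this numerically for the example source (p1 = 0.4, p2 = 0.6).

from itertools import product
from math import log2

def entropy(pmf):
    return sum(p * log2(1.0 / p) for p in pmf if p > 0)

# Source from the example: p1 = 0.4, p2 = 0.6.
pmf = [0.4, 0.6]

# Second-order extension: the joint probability of an i.i.d. pair (u_i, u_j) is p_i * p_j.
pmf2 = [p * q for p, q in product(pmf, pmf)]

print(entropy(pmf))    # ~0.9710 bits per symbol
print(entropy(pmf2))   # ~1.9420 bits per pair = 2 * H(U)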



Source Coding Algorithms

We have seen that the entropy of the source gives a lower bound on the average length of any prefix-free code for the source.

In this section, we study some specific algorithms for source coding and compare the achieved average code length with the source entropy.

Shannon-Fano source coding algorithm: This is a suboptimal procedure for designing a prefix-free code for a given discrete memoryless source.

This algorithm achieves an average code length $\bar{L} \leq H(U) + 2$.


Shannon-Fano Source Coding Algorithm


Shannon-Fano coding: algorithm for constructing a binary prefix-free code for a K-ary random variable U.

Initialization: Given a K-ary random variable with source alphabet $A = \{u_1, u_2, \cdots, u_K\}$ and corresponding probabilities $\Pr(U = u_i) = p_i$, order the symbols in decreasing order of probability.

Step 1: Divide the symbols into two subgroups such that the sums of the symbol probabilities in the two subgroups are as close as possible.
Step 2: Assign the next most significant bit of the two subgroups as 0 and 1, in either order.
Step 3: If every subgroup contains only one symbol, stop; otherwise go to Step 1 for each subgroup containing more than one symbol.
Extract the Shannon-Fano codeword of each symbol by reading its bits starting from the MSB (a Python sketch follows below).
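A Python sketch of the procedure above, applied to the source of the example on the next slide. Ties in the splitting step can be broken in more than one way; the tie-break used here (take the later split point) reproduces the codeword lengths used in the average-length calculation on the later slide (2, 2, 3, 3, 3, 3). Other tie-breaks give different codes with the same average length for this source.

def shannon_fano(pmf):
    """Recursive Shannon-Fano coding for a PMF given in decreasing order.
    Returns one binary codeword per symbol (same order as pmf)."""
    codes = [""] * len(pmf)

    def split(indices):
        if len(indices) <= 1:
            return
        total = sum(pmf[i] for i in indices)
        # Find the split that makes the two subgroup probabilities as close as possible.
        best_k, best_diff, left_sum = 1, float("inf"), 0.0
        for k in range(1, len(indices)):
            left_sum += pmf[indices[k - 1]]
            diff = abs(2 * left_sum - total)
            if diff <= best_diff:          # on ties, keep the later split point
                best_k, best_diff = k, diff
        left, right = indices[:best_k], indices[best_k:]
        for i in left:
            codes[i] += "0"
        for i in right:
            codes[i] += "1"
        split(left)
        split(right)

    split(list(range(len(pmf))))
    return codes

# Source from the example on the next slide:
pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
cw = shannon_fano(pmf)
print(cw)                                        # ['00', '01', '100', '101', '110', '111']
print(sum(p * len(c) for p, c in zip(pmf, cw)))  # average length ~ 2.5 bits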

Shannon-Fano Source Coding Algorithm


Example: Construct a Shannon-Fano code for the following source:
P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4) = 0.10, P(u5) = 0.10, P(u6) = 0.10.


Shannon-Fano Source Coding Algorithm


The average code length of the Shannon-Fano code for the given source is
$$\bar{L} = \sum_{i=1}^{K} p_i l_i = 0.25\times 2 + 0.25\times 2 + 0.20\times 3 + 0.10\times 3 + 0.10\times 3 + 0.10\times 3 = 2.5 \text{ bits}$$

The entropy of the given source is
$$H_2(U) = -\sum_{i=1}^{K} p_i \log_2 p_i = 2.4610 \text{ bits}$$

The efficiency of the designed code is defined as
$$\text{Code efficiency} = \frac{\text{Source entropy}}{\text{Average code length}}$$

So the code efficiency in this case is $\frac{2.4610}{2.5} = 0.984$ (i.e. 98.4%).

Huffman Source Coding Algorithm

Huffman coding: Huffman codes are the optimal prefix-free codes (i.e. with minimum average code length) for a discrete memoryless source with a given probability mass function.

The basic idea of Huffman codes is to assign short code sequences to more probable source symbols and longer code sequences to less probable source symbols.

The set of codeword lengths for a Huffman (optimal) code is not unique, i.e. there may be different sets of codeword lengths with the same average length.

The codeword length assigned to a symbol by an optimal code is not always less than $\lceil \log_D(1/p_i) \rceil$.


Huffman Codes (Binary)


Huffman coding: algorithm for constructing a binary (D = 2) prefix-free code for a K-ary discrete memoryless source (random variable U).

Initialization: Given a K-ary random variable, create K active nodes $u_1, u_2, \cdots, u_K$ and assign them the probabilities $\Pr(U = u_i) = p_i$.
Step 1: Create a new node that combines the two least probable active nodes, and assign labels 0 and 1 to the two branches in either order. Assign the new node a probability equal to the sum of the probabilities of these two nodes. Deactivate the two nodes that were combined and make the new node active.
Step 2: If only one active node is left, make it the root and stop; otherwise go to Step 1.
Extract the Huffman codewords by reading the branch labels from the root to each leaf (a Python sketch follows below).
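A Python sketch of the procedure above using a heap of active nodes, applied to the source of the example on the next slide. The resulting set of codeword lengths depends on how ties between equally probable nodes are broken; the lengths used on the later average-length slide (2, 2, 2, 3, 4, 4) differ from the ones printed here, but both codes are optimal with the same average length of 2.5 bits.

import heapq
from itertools import count

def huffman(pmf):
    """Binary Huffman code: repeatedly merge the two least probable active nodes.
    Returns one binary codeword per symbol (same order as pmf)."""
    tiebreak = count()  # makes heap entries comparable when probabilities tie
    # Each active node: (probability, tiebreak, list of symbol indices it covers)
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(pmf)]
    heapq.heapify(heap)
    codes = [""] * len(pmf)
    while len(heap) > 1:
        p0, _, zero = heapq.heappop(heap)   # least probable node -> branch label 0
        p1, _, one  = heapq.heappop(heap)   # next least probable -> branch label 1
        for i in zero:
            codes[i] = "0" + codes[i]
        for i in one:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p0 + p1, next(tiebreak), zero + one))
    return codes

# Source from the example on the next slide:
pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
cw = huffman(pmf)
print([len(c) for c in cw])                      # [2, 2, 3, 3, 3, 3] with this tie-breaking
print(sum(p * len(c) for p, c in zip(pmf, cw)))  # average length ~ 2.5 bits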

Huffman Codes (D-ary)


Huffman coding: algorithm for constructing a D-ary (D > 2) prefix-free code for a K-ary random variable U.

Initialization: Given a K-ary random variable, create K nodes $u_1, u_2, \cdots, u_K$ and assign them the probabilities $\Pr(U = u_i) = p_i$. Compute the remainder p when $(K - D)(D - 2)$ is divided by $D - 1$.
Step 1: Create a new node that combines the $D - p$ least probable nodes using $D - p$ branches, and assign labels $0, 1, \cdots, D - p - 1$ to these branches in any order. Assign the new node a probability equal to the sum of the probabilities of these $D - p$ nodes. Deactivate the $D - p$ nodes that were combined and make the new node active.
Step 2: If only one node is left, stop; otherwise set p = 0 and go to Step 1.

Huffman Codes
Example: Construct a binary Huffman code for the following source:
P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4) = 0.10, P(u5) = 0.10, P(u6) = 0.10.

Figure: Binary Huffman Code



Huffman Codes
Example: Construct a ternary (3-ary) Huffman code for the following source:
P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4) = 0.10, P(u5) = 0.10, P(u6) = 0.10.

Figure: Ternary Huffman Code



Huffman Coding Algorithm


The average code length of the binary Huffman code for the given source is
$$\bar{L} = \sum_{i=1}^{K} p_i l_i = 0.25\times 2 + 0.25\times 2 + 0.20\times 2 + 0.10\times 3 + 0.10\times 4 + 0.10\times 4 = 2.5 \text{ bits}$$

The average code length of the ternary Huffman code for the same source is
$$\bar{L} = 0.25\times 1 + 0.25\times 1 + 0.20\times 2 + 0.10\times 2 + 0.10\times 3 + 0.10\times 3 = 1.7 \text{ ternary digits}$$

The code efficiency of the binary Huffman code is $\frac{2.4610}{2.5} = 0.984$ (i.e. 98.4%).

The code efficiency of the ternary Huffman code is $\frac{1.552}{1.7} = 0.913$ (i.e. 91.3%). Note that we have used the entropy in base 3, i.e. $H_3(U)$, here.