
Digital Communication & Systems

V. Praksh Singh

Department of Electronics & Communication Engineering


National Institute of Technology Hamirpur
Hamirpur, Himachal Pradesh
India

Jan 9, 2020

Lecture #2: Coding for Discrete Sources

References:
'Modern Digital and Analog Communication Systems', by B. P. Lathi and Zhi Ding.
'Introduction to Analog and Digital Communications', by Simon Haykin and Michael Moher.
'Principles of Digital Communication', by Robert G. Gallager.
'Elements of Information Theory', by Thomas Cover and Joy Thomas.
'Lecture notes on Applied Digital Information Theory I', by James L. Massey.


Outline of the lecture

Coding of a discrete information source

Mathematical modeling of discrete sources

Source coding

Source coding theorem

Source coding algorithms
  Shannon-Fano coding
  Huffman coding
  Lempel-Ziv coding


Outline

1 Coding for Discrete Sources

2 Mathematical Modeling of a Discrete Source

3 Coding of a discrete random variable

4 Source Coding Algorithms


Coding for Discrete Sources


In digital communication systems, all information sources, e.g. speech waveforms (analog sources) or text files (discrete sources), must be represented by a sequence of bits.

Discrete source: the source output is a sequence of symbols from a given alphabet A of finite size, e.g. a text file may consist of symbols from an alphabet of English letters and alphanumeric characters.

The source encoder converts the sequence of symbols from the source to a sequence of bits, using as few bits per symbol as possible (also called source compression).

In this lecture, we consider lossless encoding of discrete sources, such that the source output can be uniquely recovered from the encoded string of bits.

Fixed length codes for Discrete Sources


Suppose the size of the source alphabet A is M. The simplest method to encode a discrete source is to map each symbol a ∈ A into a fixed-length codeword C(a).

For example, if the source alphabet consists of the 26 capital English letters, the following binary code of block length L = 5 can be used.

Table: Fixed-length coding

Symbol   Code
A        00000
B        00001
...      ...
Y        11010
Z        11011
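A minimal Python sketch (not part of the original slides) of such a fixed-length code for the 26 capital English letters. The particular assignment of 5-bit strings to letters is arbitrary; here it is chosen by alphabetical index, so it need not match the table above exactly.

import math
import string

# Fixed-length binary code: every symbol gets L = ceil(log2 M) bits.
alphabet = list(string.ascii_uppercase)   # M = 26 symbols A..Z
M = len(alphabet)
L = math.ceil(math.log2(M))               # L = 5 bits per symbol

# Assign codewords 00000, 00001, ... by alphabetical index; any one-to-one
# assignment of distinct 5-bit strings (such as the table above) works equally well.
code = {a: format(i, f"0{L}b") for i, a in enumerate(alphabet)}

print(L)                                   # 5
print(code["A"], code["B"], code["Z"])     # 00000 00001 11001 (with this assignment)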

Fixed length codes for Discrete Sources

A binary fixed-length code for a source alphabet A of size M requires $L = \lceil \log_2 M \rceil$ bits to encode each symbol (L bits/symbol), where $\lceil x \rceil$ denotes the smallest integer greater than or equal to the real number x.

However, this method of assigning fixed-length codewords to the source symbols does not take into account the probabilities of occurrence of the source symbols.

We can reduce the average length per symbol (L) by assigning more bits to less probable symbols and fewer bits to more probable symbols, i.e. variable-length coding.




Mathematical Modeling of a Discrete Source


We mathematically model a discrete source as a discrete random process.

Figure: Discrete Information Source

We further assume that the discrete source is memoryless, i.e. the sequence of symbols $U_1, U_2, \cdots$ is generated independently by the source, with the same probability mass function.

So, the output of a discrete memoryless source (DMS) is a sequence of independent and identically distributed (i.i.d.) random variables.

Mathematical Modeling of a Discrete Source

Let U be a K-ary discrete random variable which takes values from the set $A = \{u_1, u_2, \cdots, u_K\}$.

The probability mass function (PMF) of U is $\Pr(U = u_i) = p_i$ for $i = 1, 2, \cdots, K$.

Each source output of a DMS, i.e. $U_1, U_2, \cdots$, is selected from A with the same PMF.

Each source output $U_i$ of a DMS is statistically independent of the previous outputs $U_1, U_2, \cdots, U_{i-1}$.

A DMS is completely described by the source alphabet A and the set of probabilities $\{p_i\}_{i=1}^{K}$.
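A minimal Python sketch of this model, assuming an illustrative 4-ary alphabet and PMF (not taken from the slides): a DMS is simulated by drawing i.i.d. symbols with the standard-library random module.

import random

# Hypothetical K-ary alphabet and PMF {p_i}; any valid PMF works.
alphabet = ["u1", "u2", "u3", "u4"]
pmf      = [0.5, 0.25, 0.15, 0.10]

def dms_output(n, seed=0):
    """Return n symbols U1, U2, ..., Un drawn i.i.d. from the PMF (a DMS)."""
    rng = random.Random(seed)
    return rng.choices(alphabet, weights=pmf, k=n)

print(dms_output(10))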


Information Measure

Suppose $u_i$ is a particular realization of a K-ary discrete random variable U. How can we quantify the amount of information $I(u_i)$ provided by the observation $u_i$?

The information measure of the output $u_i$ should depend only on the probability of $u_i$, i.e. $p_i$. The more probable the outcome, the less information its observation conveys (i.e. the measure should be a decreasing function of p).

The information measure should be a continuous function of the probability, i.e. a small change in the probability of a certain observation should not drastically change the information.

If the observation is divided into two (or more) independent parts, $u_i = (u_{i,1}, u_{i,2})$, the information measure of $u_i$ should be the sum of the information provided by each independent part.

Information Measure (Entropy)


The only information measure that satisfies the above properties is the logarithmic function:
$$I(u_i) = \log\left(\frac{1}{p_i}\right)$$
is called the self-information of $u_i$.

We define the average information content of the random variable U as the entropy of the discrete random variable U:
$$H(U) = \sum_{i=1}^{K} p_i I(u_i) = \sum_{i=1}^{K} p_i \log\left(\frac{1}{p_i}\right)$$
where we use the convention $0 \log 0 = 0$.

If the base of the logarithm is 2, entropy (information) is expressed in bits/symbol.
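A small Python sketch of the entropy formula above; the example PMFs are illustrative.

from math import log2

def entropy(pmf):
    """H(U) = sum_i p_i * log2(1/p_i) in bits/symbol, with the convention 0*log(0) = 0."""
    return sum(p * log2(1.0 / p) for p in pmf if p > 0)

# Illustrative PMFs: a uniform 4-ary source has entropy log2(4) = 2 bits/symbol.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75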

Properties of Entropy

The entropy is always non-negative, i.e. $H(U) \geq 0$.

Entropy can be converted from one base to another as $H_b(U) = H_a(U)\log_b a$.

Let a binary memoryless source (2-ary discrete source) emit $u_1 = 0$ or $u_2 = 1$ with $\Pr(U = u_1) = p$ and $\Pr(U = u_2) = 1 - p$. Then
$$H(U) = -p\log p - (1-p)\log(1-p) \overset{\mathrm{def}}{=} H(p)$$
is called the binary entropy function.

We can observe that $H(U) = 0$ for p = 0 or p = 1, i.e. the source generates only zeros or only ones and there is no uncertainty. The uncertainty is maximum when p = 1/2, where the entropy attains its maximum value H(U) = 1 bit.



Coding of a discrete random variable


Given a K-ary random variable with source alphabet $A = \{u_1, u_2, \cdots, u_K\}$ and the set of probabilities $\{p_i\}_{i=1}^{K}$.

Figure: Coding of a discrete random variable

The code symbols $X_i$ take values from a D-ary alphabet $\{0, 1, \cdots, D-1\}$. For binary codes, D = 2.

A variable-length code C maps each source symbol $u_i$ in A to a D-ary string $[x_1, x_2, \cdots, x_{l_i}]$ called a codeword.

Coding of a discrete random variable

The number of symbols in the codeword $C(u_i)$ is called the length $l_i$ of the codeword.

The set of K codewords $\{C(u_1), C(u_2), \cdots, C(u_K)\}$ is called a D-ary code for a K-ary random variable.

The average (expected) length of the code C is defined as
$$\bar{L} = \sum_{i=1}^{K} p_i l_i$$

We see that we can reduce the average length of the code by assigning shorter codewords to more probable source symbols and vice versa.
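A small Python sketch of the average-length formula, using an illustrative 4-ary PMF (not from the slides) to show how shorter codewords for more probable symbols reduce $\bar{L}$.

def average_length(pmf, lengths):
    """Average (expected) codeword length L-bar = sum_i p_i * l_i."""
    return sum(p * l for p, l in zip(pmf, lengths))

# Illustrative 4-ary source: giving the most probable symbol the shortest
# codeword reduces L-bar compared with a fixed-length (2-bit) code.
pmf = [0.5, 0.25, 0.125, 0.125]
print(average_length(pmf, [2, 2, 2, 2]))  # fixed length:    2.0 bits/symbol
print(average_length(pmf, [1, 2, 3, 3]))  # variable length: 1.75 bits/symbol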


Class of codes

However, we cannot just arbitrarily assign short codewords to the source symbols. A code C must satisfy the following conditions to be suitable for encoding discrete sources in practice.

Non-singular codes: A code C is non-singular if every symbol from the source alphabet is assigned a distinct codeword:
$$u_i \neq u_j \implies C(u_i) \neq C(u_j)$$

Uniquely decodable codes: A code C is uniquely decodable if any sequence of codewords can be decoded into only one possible sequence of source symbols.


Class of codes
Instantaneous code: A code is called a prefix-free code or an instantaneous code if no codeword in the code is a prefix of another codeword.

In an instantaneous code, a symbol $u_i$ can be decoded as soon as the last symbol of the corresponding codeword arrives, without waiting for future codewords.

Figure: Example of codes
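A small Python sketch (with illustrative codes, not necessarily those of the figure) that checks the non-singular and prefix-free conditions directly from their definitions.

def is_nonsingular(code):
    """Non-singular: distinct symbols get distinct codewords."""
    codewords = list(code.values())
    return len(set(codewords)) == len(codewords)

def is_prefix_free(code):
    """Instantaneous (prefix-free): no codeword is a prefix of another."""
    cws = list(code.values())
    return all(not a.startswith(b)
               for i, a in enumerate(cws)
               for j, b in enumerate(cws) if i != j)

# Illustrative binary codes for a 4-ary source.
print(is_prefix_free({"u1": "0", "u2": "10", "u3": "110", "u4": "111"}))  # True
print(is_prefix_free({"u1": "0", "u2": "01", "u3": "011", "u4": "111"}))  # False: "0" is a prefix of "01"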



Kraft Inequality for prefix-free codes

The objective in encoding a discrete source is to construct a code C which is instantaneous (prefix-free) and of minimum average length.

The set of codeword lengths possible for an instantaneous code is limited by the Kraft inequality.

There exists a D-ary prefix-free (instantaneous) code whose codeword lengths are the positive integers $l_1, l_2, l_3, \cdots, l_K$ if and only if
$$\sum_{i=1}^{K} D^{-l_i} \leq 1$$

Conversely, if a set of positive integers $l_1, l_2, l_3, \cdots, l_K$ satisfies the Kraft inequality, we can construct a D-ary prefix-free code with codewords of these lengths.
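A one-line Python check of the Kraft inequality for a candidate set of codeword lengths; the length sets below are illustrative.

def kraft_sum(lengths, D=2):
    """Sum of D^(-l_i); a D-ary prefix-free code with these lengths exists iff the sum is <= 1."""
    return sum(D ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))          # 1.0  -> feasible (Kraft inequality met with equality)
print(kraft_sum([1, 1, 2]))             # 1.25 -> no binary prefix-free code with these lengths
print(kraft_sum([1, 1, 2, 3, 3], D=3))  # about 0.85 -> feasible for a ternary code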


Optimal Prefix-free Codes

Given a discrete memoryless source, i.e. a source alphabet A and the set of probabilities $\{p_i\}_{i=1}^{K}$, we wish to construct a prefix-free D-ary code with the minimum possible average length $\bar{L}$.

In other words, we wish to determine a set of codeword lengths $l_1, l_2, l_3, \cdots, l_K$ that satisfy the Kraft inequality and minimize the average length of the code $\sum_{i=1}^{K} p_i l_i$.

The optimization problem is formulated as
$$\min_{l_1, l_2, \cdots, l_K} \; \sum_{i=1}^{K} p_i l_i \quad \text{subject to} \quad \sum_{i=1}^{K} D^{-l_i} \leq 1, \quad l_1, l_2, \cdots, l_K \text{ positive integers}$$


Optimal Prefix-free Codes

Initially, we simplify the optimization problem by dropping the integer constraint on the codeword lengths.

This simplified problem can be solved using the Lagrange multiplier method.

The Lagrangian of the simplified problem is formed as
$$J = \sum_{i=1}^{K} p_i l_i + \lambda \left( \sum_{i=1}^{K} D^{-l_i} - 1 \right)$$
where $\lambda$ is the Lagrange multiplier.

Setting $\frac{\partial J}{\partial l_i} = 0$, we get $D^{-l_i} = \frac{p_i}{\lambda \log_e D}$.


Optimal Prefix-free Codes


Substituting this into the Kraft constraint (taken with equality) and using $\sum_{i=1}^{K} p_i = 1$, we obtain $\lambda = \frac{1}{\log_e D}$.

The optimal (real-valued) codeword lengths are therefore
$$l_i^* = \log_D\left(\frac{1}{p_i}\right)$$

Substituting these lengths, we obtain the average length of the optimal code
$$\bar{L} = \sum_{i=1}^{K} p_i \log_D\left(\frac{1}{p_i}\right) = H_D(U)$$

In summary, the entropy $H_D(U)$ is a lower bound on the average length $\bar{L}$ of prefix-free codes, and this lower bound is achieved when $l_i = -\log_D(p_i)$ for each i.
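A small Python sketch of this unconstrained optimum: for a dyadic PMF (chosen for illustration) the ideal lengths $\log_D(1/p_i)$ are integers, so the average length equals the entropy exactly.

from math import log

def ideal_lengths(pmf, D=2):
    """Unconstrained optimum l_i* = log_D(1/p_i); in general these are not integers."""
    return [log(1.0 / p, D) for p in pmf]

def entropy_D(pmf, D=2):
    """H_D(U) = sum_i p_i log_D(1/p_i)."""
    return sum(p * log(1.0 / p, D) for p in pmf if p > 0)

# Illustrative dyadic PMF: every p_i is a power of 2, so l_i* is an integer.
pmf = [0.5, 0.25, 0.125, 0.125]
lengths = ideal_lengths(pmf)
print(lengths)                                    # [1.0, 2.0, 3.0, 3.0] (up to float rounding)
print(entropy_D(pmf))                             # 1.75
print(sum(p * l for p, l in zip(pmf, lengths)))   # 1.75 -> average length equals H_2(U)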

Entropy bounds for prefix-free codes

In the solution of the optimization problem for the optimal (minimum average length) code, we relaxed the integer constraint on the lengths of the codewords.

Therefore, $H_D(U)$ provides only a lower bound on the average length of the optimal code. The following theorem provides both a lower bound and an upper bound on the average length of the optimal code.

Coding theorem: Let $l_1, l_2, \cdots, l_K$ be the codeword lengths of an optimal D-ary code for a discrete random variable with K-ary alphabet A and probability mass function p, and let $\bar{L}^*$ be the average length of the code. Then
$$H_D(U) \leq \bar{L}^* < H_D(U) + 1$$


Proof of Coding Theorem

First we prove the left side, i.e. show that $H_D(U) \leq \bar{L}^*$.

For any prefix-free code,
$$H_D(U) - \bar{L} = \sum_{i=1}^{K} p_i \log_D\left(\frac{1}{p_i}\right) - \sum_{i=1}^{K} p_i l_i = \sum_{i=1}^{K} p_i \log_D\left(\frac{1}{p_i}\right) + \sum_{i=1}^{K} p_i \log_D\left(D^{-l_i}\right) = \sum_{i=1}^{K} p_i \log_D\left(\frac{D^{-l_i}}{p_i}\right)$$
where we have used $\log_D(D^{-l_i}) = -l_i$.


Proof of Coding Theorem

Using the inequality $\log_D u \leq (u - 1)\log_D e$, we get
$$H_D(U) - \bar{L} \leq \log_D e \sum_{i=1}^{K} p_i \left(\frac{D^{-l_i}}{p_i} - 1\right) = \log_D e \left(\sum_{i=1}^{K} D^{-l_i} - \sum_{i=1}^{K} p_i\right) \leq 0$$
where we have used the Kraft inequality and $\sum_{i=1}^{K} p_i = 1$.

The inequality is strict unless $l_i = -\log_D p_i$ for every i, which requires each $p_i$ to be an integer power of D (because the codeword lengths are integers).

Proof of Coding Theorem

Now we prove the right side, i.e. show that $\bar{L}^* < H_D(U) + 1$.

Let us assign the codeword lengths as $l_i = \lceil \log_D(1/p_i) \rceil$. These lengths satisfy the Kraft inequality (since $l_i \geq \log_D(1/p_i)$ implies $D^{-l_i} \leq p_i$, so $\sum_i D^{-l_i} \leq \sum_i p_i = 1$), hence a prefix-free code with these lengths exists.

This assignment of codeword lengths implies
$$-\log_D p_i \leq l_i < -\log_D p_i + 1$$
$$-\sum_{i=1}^{K} p_i \log_D p_i \leq \sum_{i=1}^{K} p_i l_i < -\sum_{i=1}^{K} p_i \log_D p_i + 1$$
$$H_D(U) \leq \bar{L} < H_D(U) + 1$$
Since the optimal code can do no worse than this particular code, $\bar{L}^* \leq \bar{L} < H_D(U) + 1$.
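A small Python sketch of the construction used in this direction of the proof: assign $l_i = \lceil \log_2(1/p_i) \rceil$ for an illustrative PMF (not from the slides) and check that $H(U) \leq \bar{L} < H(U) + 1$.

from math import log2, ceil

def shannon_lengths(pmf):
    """Codeword lengths l_i = ceil(log2(1/p_i)); these always satisfy the Kraft inequality."""
    return [ceil(log2(1.0 / p)) for p in pmf]

def entropy(pmf):
    return sum(p * log2(1.0 / p) for p in pmf if p > 0)

# Illustrative PMF: verify H(U) <= L-bar < H(U) + 1 for the ceiling-rule lengths.
pmf = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_lengths(pmf)
L_bar = sum(p * l for p, l in zip(pmf, lengths))
print(lengths)                                    # [2, 2, 3, 4]
print(entropy(pmf), L_bar)                        # H(U) ~ 1.846 bits, L-bar = 2.4 bits
print(entropy(pmf) <= L_bar < entropy(pmf) + 1)   # True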


Extension of Source

We can see from the bounds on the average length of the optimal code that we can construct codes for a discrete memoryless source within one bit of the source entropy.

The average length of the code can be made arbitrarily close to the source entropy by encoding a block of n source symbols at a time.

Suppose a block of n symbols from the source is considered as one super-symbol $U^n = [U_1, U_2, \cdots, U_n]$.

This super-symbol is a random variable which takes values from the alphabet $A^n$ of size $K^n$. A prefix-free code can be constructed for $U^n$ in the same way as for U. This is called the n-th order extension of the source.


Extension of Source

A block of n i.i.d. source symbols has entropy
$$H_D(U^n) = H_D(U_1, U_2, \cdots, U_n) = nH_D(U)$$

Let $\bar{L}_n^*$ be the average length per source symbol of the optimal prefix-free code for $U^n$ (so the expected codeword length of the block is $n\bar{L}_n^*$). Applying the theorem on the bounds on the average length of optimal codes to the block, we get
$$H_D(U^n) \leq n\bar{L}_n^* < H_D(U^n) + 1$$
$$nH_D(U) \leq n\bar{L}_n^* < nH_D(U) + 1$$
$$H_D(U) \leq \bar{L}_n^* < H_D(U) + \frac{1}{n}$$

This result shows that by jointly encoding long n-tuples of source symbols we can approach the entropy bound (as 1/n goes to zero).


Extension of Source (Example)


Example: A discrete memoryless source has the source alphabet $A = \{u_1, u_2\}$ with corresponding probabilities $\{p_1 = 0.4, p_2 = 0.6\}$. Find the entropy of the source. Suppose now we wish to jointly encode a block of two symbols from the source, i.e. the second-order extension of the source. Construct the source alphabet and probability distribution of the extended source and find its entropy.

Solution: The entropy of the source is computed as
$$H_2(U) = -\sum_{i=1}^{K} p_i \log_2 p_i$$
which gives $H_2(U) = 0.9710$ bits/symbol.

Now we encode a pair of source symbols together. The second-order extension of the source is given below.

Extension of Source (Example)


The second-order extension of the source:

Table: Second-order extension of the source

Symbol    Probability    Value
u1 u1     $p_1^2$        0.16
u1 u2     $p_1 p_2$      0.24
u2 u1     $p_2 p_1$      0.24
u2 u2     $p_2^2$        0.36

The entropy of this extended source is
$$H(U^2) = -(0.16\log_2 0.16 + 0.24\log_2 0.24 + 0.24\log_2 0.24 + 0.36\log_2 0.36) = 1.9420 \text{ bits}$$

We can see that the entropy of the second-order extension of the source is twice the entropy of the original source (because the symbols emitted by the source are assumed i.i.d.).
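A small Python sketch verifying this numerically for the example source (p1 = 0.4, p2 = 0.6).

from itertools import product
from math import log2

def entropy(pmf):
    return sum(p * log2(1.0 / p) for p in pmf if p > 0)

# Source from the example: p1 = 0.4, p2 = 0.6.
pmf = [0.4, 0.6]

# Second-order extension: the joint probability of an i.i.d. pair (u_i, u_j) is p_i * p_j.
pmf2 = [p * q for p, q in product(pmf, pmf)]

print(entropy(pmf))    # ~0.9710 bits per symbol
print(entropy(pmf2))   # ~1.9420 bits per pair = 2 * H(U)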



Source Coding Algorithms

We have seen that the entropy of the source gives a lower bound on the average length of any prefix-free code for the source.

In this section, we study some specific algorithms for source coding and compare the achieved average code length with the source entropy.

Shannon-Fano source coding algorithm: This is a suboptimal procedure for designing a prefix-free code for a given discrete memoryless source.

This algorithm achieves an average code length $\bar{L} \leq H(U) + 2$.


Shannon-Fano Source Coding Algorithm


Shannon-Fano coding: algorithm for constructing a binary prefix-free code for a K-ary random variable U.

Initialization: Given a K-ary random variable with source alphabet $A = \{u_1, u_2, \cdots, u_K\}$ and corresponding probabilities $\Pr(U = u_i) = p_i$, order the symbols in decreasing order of probability.

Step 1: Divide the symbols into two subgroups such that the sums of the symbol probabilities in the two subgroups are as close as possible.
Step 2: Assign the next most significant bit of the two subgroups as 0 and 1, in either order.
Step 3: If every subgroup contains only one symbol, stop; otherwise go to Step 1 for each subgroup containing more than one symbol.
Extract the Shannon-Fano codeword of each symbol by reading its bits starting from the MSB (a Python sketch follows below).
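A Python sketch of the procedure above, applied to the source of the example on the next slide. Ties in the splitting step can be broken in more than one way; the tie-break used here (take the later split point) reproduces the codeword lengths used in the average-length calculation on the later slide (2, 2, 3, 3, 3, 3). Other tie-breaks give different codes with the same average length for this source.

def shannon_fano(pmf):
    """Recursive Shannon-Fano coding for a PMF given in decreasing order.
    Returns one binary codeword per symbol (same order as pmf)."""
    codes = [""] * len(pmf)

    def split(indices):
        if len(indices) <= 1:
            return
        total = sum(pmf[i] for i in indices)
        # Find the split that makes the two subgroup probabilities as close as possible.
        best_k, best_diff, left_sum = 1, float("inf"), 0.0
        for k in range(1, len(indices)):
            left_sum += pmf[indices[k - 1]]
            diff = abs(2 * left_sum - total)
            if diff <= best_diff:          # on ties, keep the later split point
                best_k, best_diff = k, diff
        left, right = indices[:best_k], indices[best_k:]
        for i in left:
            codes[i] += "0"
        for i in right:
            codes[i] += "1"
        split(left)
        split(right)

    split(list(range(len(pmf))))
    return codes

# Source from the example on the next slide:
pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
cw = shannon_fano(pmf)
print(cw)                                        # ['00', '01', '100', '101', '110', '111']
print(sum(p * len(c) for p, c in zip(pmf, cw)))  # average length ~ 2.5 bits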

Shannon-Fano Source Coding Algorithm


Example: Construct a Shannon-Fano code for the following source:
P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4) = 0.10, P(u5) = 0.10, P(u6) = 0.10.


Shannon-Fano Source Coding Algorithm


The average code length of the Shannon-Fano code for the given source is
$$\bar{L} = \sum_{i=1}^{K} p_i l_i = 0.25\times 2 + 0.25\times 2 + 0.20\times 3 + 0.10\times 3 + 0.10\times 3 + 0.10\times 3 = 2.5 \text{ bits}$$

The entropy of the given source is
$$H_2(U) = -\sum_{i=1}^{K} p_i \log_2 p_i = 2.4610 \text{ bits}$$

The efficiency of the designed code is defined as
$$\text{Code efficiency} = \frac{\text{Source entropy}}{\text{Average code length}}$$

So the code efficiency in this case is $\frac{2.4610}{2.5} = 0.984$ (i.e. 98.4%).

Huffman Source Coding Algorithm

Huffman coding: Huffman codes are the optimal prefix-free codes (i.e. with minimum average code length) for a discrete memoryless source with a given probability mass function.

The basic idea of Huffman codes is to assign short code sequences to more probable source symbols and longer code sequences to less probable source symbols.

The set of codeword lengths for a Huffman (optimal) code is not unique, i.e. there may be different sets of codeword lengths with the same average length.

The codeword length assigned to a symbol by an optimal code is not always less than $\lceil \log_D(1/p_i) \rceil$.


Huffman Codes (Binary)


Huffman coding: algorithm for constructing a binary (D = 2) prefix-free code for a K-ary discrete memoryless source (random variable U).

Initialization: Given a K-ary random variable, create K active nodes $u_1, u_2, \cdots, u_K$ and assign them the probabilities $\Pr(U = u_i) = p_i$.
Step 1: Create a new node that combines the two least probable active nodes, and assign labels 0 and 1 to the two branches in either order. Assign the new node a probability equal to the sum of the probabilities of these two nodes. Deactivate the two nodes that were combined and make the new node active.
Step 2: If only one active node is left, make it the root and stop; otherwise go to Step 1.
Extract the Huffman codewords by reading the branch labels from the root to each leaf (a Python sketch follows below).
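A Python sketch of the procedure above using a heap of active nodes, applied to the source of the example on the next slide. The resulting set of codeword lengths depends on how ties between equally probable nodes are broken; the lengths used on the later average-length slide (2, 2, 2, 3, 4, 4) differ from the ones printed here, but both codes are optimal with the same average length of 2.5 bits.

import heapq
from itertools import count

def huffman(pmf):
    """Binary Huffman code: repeatedly merge the two least probable active nodes.
    Returns one binary codeword per symbol (same order as pmf)."""
    tiebreak = count()  # makes heap entries comparable when probabilities tie
    # Each active node: (probability, tiebreak, list of symbol indices it covers)
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(pmf)]
    heapq.heapify(heap)
    codes = [""] * len(pmf)
    while len(heap) > 1:
        p0, _, zero = heapq.heappop(heap)   # least probable node -> branch label 0
        p1, _, one  = heapq.heappop(heap)   # next least probable -> branch label 1
        for i in zero:
            codes[i] = "0" + codes[i]
        for i in one:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p0 + p1, next(tiebreak), zero + one))
    return codes

# Source from the example on the next slide:
pmf = [0.25, 0.25, 0.20, 0.10, 0.10, 0.10]
cw = huffman(pmf)
print([len(c) for c in cw])                      # [2, 2, 3, 3, 3, 3] with this tie-breaking
print(sum(p * len(c) for p, c in zip(pmf, cw)))  # average length ~ 2.5 bits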

Huffman Codes (D-ary)


Huffman coding: algorithm for constructing a D-ary (D > 2) prefix-free code for a K-ary random variable U.

Initialization: Given a K-ary random variable, create K nodes $u_1, u_2, \cdots, u_K$ and assign them the probabilities $\Pr(U = u_i) = p_i$. Compute the remainder p when $(K - D)(D - 2)$ is divided by $D - 1$.
Step 1: Create a new node that combines the $D - p$ least probable nodes using $D - p$ branches, and assign labels $0, 1, \cdots, D - p - 1$ to these branches in any order. Assign the new node a probability equal to the sum of the probabilities of these $D - p$ nodes. Deactivate the $D - p$ nodes that were combined and make the new node active.
Step 2: If only one node is left, stop; otherwise set p = 0 and go to Step 1.

Huffman Codes
Example: Construct a binary Huffman code for the following source:
P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4) = 0.10, P(u5) = 0.10, P(u6) = 0.10.

Figure: Binary Huffman Code



Huffman Codes
Example: Construct a ternary (3-ary) Huffman code for the following source:
P(u1) = 0.25, P(u2) = 0.25, P(u3) = 0.20, P(u4) = 0.10, P(u5) = 0.10, P(u6) = 0.10.

Figure: Ternary Huffman Code



Huffman Coding Algorithm


The average code length of the binary Huffman code for the given source is
$$\bar{L} = \sum_{i=1}^{K} p_i l_i = 0.25\times 2 + 0.25\times 2 + 0.20\times 2 + 0.10\times 3 + 0.10\times 4 + 0.10\times 4 = 2.5 \text{ bits}$$

The average code length of the ternary Huffman code for the same source is
$$\bar{L} = 0.25\times 1 + 0.25\times 1 + 0.20\times 2 + 0.10\times 2 + 0.10\times 3 + 0.10\times 3 = 1.7 \text{ ternary digits}$$

The code efficiency of the binary Huffman code is $\frac{2.4610}{2.5} = 0.984$ (i.e. 98.4%).

The code efficiency of the ternary Huffman code is $\frac{1.552}{1.7} = 0.913$ (i.e. 91.3%). Note that we have used the entropy in base 3, i.e. $H_3(U)$, here.