
EE462: AI Enabled

Digital Communications
Week # 3

Last time, we talked about:
Fundamentals of set theory, Bayes' law and its applications to the
binary communication channel, and the fundamentals of channel
coding in terms of repetition codes.
We will explore channel coding in more depth later.

Today, we are going to talk about:


Source Coding Theory
• What is the minimum number of bits that are required to transmit
a particular symbol?
• How can we encode symbols so that we achieve (or at least
come arbitrarily close to) this limit?
• Source encoding: concerned with minimizing the actual number
of source bits that are transmitted to the user.
Model of a Digital Communication System

What is Information Theory?
• C. E. Shannon, "A Mathematical Theory of
Communication," Bell System Technical
Journal, 1948.
• Two fundamental questions in
communication theory:
– What is the ultimate limit on data compression?
– What is the ultimate transmission rate of reliable
communication over noisy channels?
• Shannon showed reliable (i.e., error-free)
communication is possible for all rates below
channel capacity (using channel coding).
• Any source can be represented in bits at any
rate above entropy (using source coding).
– Rise of digital information technology
What is Information?
• Information: any new knowledge about
something
• How can we measure it?
• Messages about outcomes with a high probability of
occurrence → not very informative
• Messages about outcomes with a low probability of
occurrence → more informative
• A small change in the probability of a certain output
should not change the information delivered by that
output by a large amount.

Definition
• The amount of information of a symbol s with
probability p:
I(s) = log(1/p)
• Properties:
– p = 1 → I(s) = 0: a symbol that is certain to occur contains no
information.
– 0 ≤ p ≤ 1 → 0 ≤ I(s) ≤ ∞: the information measure is non-negative
and decreases monotonically with p.
– p = p1 × p2 → I(s) = I(p1) + I(p2): information is additive for
statistically independent events.

• In communications, log base 2 is commonly used,
resulting in the unit of bits.
Example
• Example: Two symbols with equal probability p1 = p2 = 0.5
• Each symbol represents I(s) = log2(1/0.5) = 1 bit of
information.

• In summary: I(s) = log2(1/p) bits

• Reminder: From logs of base 10 to logs of base 2:


log2 x = (log10 x) / (log10 2) = 3.32 log10 x
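As a quick numerical check, here is a minimal Python sketch (my own illustration, not part of the slides) that evaluates the information measure and the base-conversion identity:

```python
import math

def information_bits(p):
    """Information content I(s) = log2(1/p) of a symbol with probability p."""
    return math.log2(1.0 / p)

print(information_bits(0.5))                 # 1.0 bit, as in the example above

# Base-conversion check: log2(x) = log10(x) / log10(2) = 3.32 * log10(x)
x = 0.7
print(math.log10(x) / math.log10(2), math.log2(x))
```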

Discrete Memoryless Source
• Suppose we have an information source emitting a
sequence of symbols from a finite alphabet:
S = {s1, s2, . . . , sK}

• Discrete memoryless source: The successive symbols


are statistically independent (i.i.d.)

• Assume that each symbol has a probability of occurrence

pk, k = 1, ..., K, such that Σ_{k=1}^{K} pk = 1

Source Entropy
• If symbol sk has occurred, then, by definition, we have received

I(sk) = log2(1/pk) = −log2 pk

bits of information.

• The expected value of I(sk) over the source alphabet:

E{I(sk)} = Σ_{k=1}^{K} pk I(sk) = −Σ_{k=1}^{K} pk log2 pk

• Source entropy: the average amount of information per source
symbol:

H(S) = −Σ_{k=1}^{K} pk log2 pk

• Units: bits / symbol.
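The definition translates directly into code. Below is a minimal Python sketch (my own illustration; the helper name `entropy` is an assumption, not from the slides):

```python
import math

def entropy(probs):
    """H(S) = -sum(pk * log2 pk) for an iterable of symbol probabilities summing to 1."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))      # 1.0 bit/symbol for a fair binary source
```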

Meaning of Entropy
• What information about a source does its entropy give us?
• It is the amount of uncertainty about the source output before we
receive it.
• It tells us how many bits of information per symbol we
expect to get on average.

Example: Entropy of a Binary Source


• s1: occurs with probability p
• s0: occurs with probability 1 − p.
• Entropy of the source:

H(S) = −(1 − p) log2(1 − p) − p log2 p = H(p)

– H(p) is referred to as the entropy function.
Example: A Three-Symbol Alphabet
– A: occurs with probability 0.7
– B: occurs with probability 0.2
– C: occurs with probability 0.1

• Source entropy:
H(S) = −0.7 log2(0.7) − 0.2 log2(0.2) − 0.1 log2(0.1)
     = 0.7 × 0.515 + 0.2 × 2.322 + 0.1 × 3.322
     ≈ 1.157 bits/symbol
• How can we encode these symbols in order to
transmit them?
• We need 2 bits/symbol if encoded as
A = 00, B = 01, C = 10
• Entropy prediction: the average amount of information is
only 1.157 bits per symbol.
• We are wasting bits!
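Using the hypothetical `entropy` helper sketched earlier, the gap between the 2-bit fixed-length code and the entropy of this source is easy to verify:

```python
probs = {"A": 0.7, "B": 0.2, "C": 0.1}

H = entropy(probs.values())            # ≈ 1.157 bits/symbol
fixed_length = 2                       # A = 00, B = 01, C = 10
print(H, fixed_length - H)             # ≈ 0.843 bits/symbol of waste
```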
Average Codeword Length
• lk: the number of bits used to code the k-th symbol
• K: total number of symbols
• pk: the probability of occurrence of symbol k
• Define the average codeword length:
L = Σ_{k=1}^{K} pk lk
• It represents the average number of bits per symbol in the
source alphabet.
• An idea to reduce the average codeword length: symbols that
occur often should be encoded with short codewords, while
symbols that occur rarely may be encoded with longer
codewords, as the sketch below illustrates.
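As an illustration of this idea (a sketch with an assumed variable-length code for the A/B/C source above, not a code given in the slides):

```python
probs = {"A": 0.7, "B": 0.2, "C": 0.1}
code  = {"A": "0", "B": "10", "C": "11"}    # the frequent symbol gets the short codeword

avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(avg_len)                              # 1.3 bits/symbol, versus 2 for the fixed-length code
```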

Minimum Codeword Length
• What is the minimum codeword length for a particular
alphabet of source symbols?
• In a system with 2 symbols that are equally likely:
– Probability of each symbol to occur: p = 1/2
– Best one can do: encode each with 1 bit only: 0 or 1

• In a system with n symbols that are equally likely:
– Probability of each symbol to occur: p = 1/n
• One needs L = log2 n = log2(1/p) = −log2 p bits to represent the
symbols.

The General Case
• S = {s1, . . . , sK}: an alphabet
• pk: the probability of symbol sk to occur
• N : the number of symbols generated
• We expect Npk occurrences of sk
• Assume:
– The source is memoryless
– all symbols are independent
– the probability of occurrence of a typical sequence SN is:

p(SN) = p1^(N·p1) × p2^(N·p2) × ... × pK^(N·pK)

• All such typical sequences of N symbols are equally likely


• All other compositions are extremely unlikely to occur as N
→ ∞ (Shannon, 1948).
Source Coding Theorem
• The number of bits required to represent a typical
sequence SN is
LN = log2(1/p(SN)) = −log2( p1^(N·p1) · p2^(N·p2) · ... · pK^(N·pK) )
   = −N·p1 log2 p1 − N·p2 log2 p2 − ... − N·pK log2 pK
   = −N Σ_{k=1}^{K} pk log2 pk = N·H(S)

• Average length for one symbol: L = LN / N = H(S) bits/symbol

• Given a discrete memoryless source of entropy H(S), the average
codeword length L of any source coding scheme is bounded below
by H(S):

L ≥ H(S)
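The theorem can be sanity-checked numerically. The sketch below (my own illustration, reusing the three-symbol source from earlier) draws a long i.i.d. sequence and compares −(1/N)·log2 p(SN) with H(S):

```python
import math
import random

probs = {"A": 0.7, "B": 0.2, "C": 0.1}
N = 100_000
random.seed(0)
seq = random.choices(list(probs), weights=list(probs.values()), k=N)

# Information per symbol of the observed sequence: -(1/N) * log2 p(S_N)
bits_per_symbol = -sum(math.log2(probs[s]) for s in seq) / N
H = -sum(p * math.log2(p) for p in probs.values())
print(bits_per_symbol, H)       # both are close to 1.157 for large N
```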

Huffman Coding
• How can one design an efficient source coding
algorithm?
• Use Huffman Coding (among other algorithms)
• It yields the shortest average codeword length.
• Basic idea: choose codeword lengths so that more-
probable sequences have shorter codewords
• Huffman Code construction
– Sort the source symbols in order of decreasing probability.
– Take the two smallest p(xi) and assign each a different bit (i.e., 0 or 1).
Then merge into a single symbol.
– Repeat until only one symbol remains.
• It is straightforward to implement this algorithm on a
microprocessor or computer; a short Python sketch follows below.
• Used in JPEG, MP3…
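A minimal Python sketch of the construction described above (my own illustration; the function name `huffman_code` and the heap-based bookkeeping are assumptions, not taken from the slides):

```python
import heapq

def huffman_code(probs):
    """Binary Huffman code for a dict {symbol: probability}; returns {symbol: codeword}."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, grp1 = heapq.heappop(heap)       # the two least probable groups
        p2, _, grp2 = heapq.heappop(heap)
        # Prepend a distinguishing bit to each group, then merge the two groups
        # into a single "symbol" whose probability is the sum of the two.
        merged = {s: "0" + w for s, w in grp1.items()}
        merged.update({s: "1" + w for s, w in grp2.items()})
        count += 1
        heapq.heappush(heap, (p1 + p2, count, merged))
    return heap[0][2]

probs = {"s1": 0.3, "s2": 0.25, "s3": 0.25, "s4": 0.1, "s5": 0.1}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(code, avg_len)    # average length 2.2 bits/symbol, as on the next slides
```

The exact 0/1 labels depend on how ties are broken, but the codeword lengths, and hence the average length, come out the same.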
Optimal Codes
(Shortest Average Length)
The procedure for constructing this code is as follows:
1. At each stage, reduce to a shorter code by combining the two least
probable symbols of the source alphabet into a single symbol whose
probability is equal to the sum of the two corresponding probabilities.
We then have to encode a source alphabet with one fewer symbol.
2. Repeat the above step until you are left with the problem of encoding
just two symbols. At this stage, one symbol is encoded as 1 and the
other as 0.
3. Now work backwards: the symbol that was formed by the last merge is
split back into the two symbols it came from, and each of them receives
its parent's codeword followed by an extra 0 or 1.
4. Repeat step 3, splitting one merged symbol at each stage, until every
original symbol has a codeword.
The above procedure is called the Huffman coding procedure.
The following two tables show the source reduction and code construction
processes.
Original Source      Reduced Source       Reduced Source       Reduced Source
                     (stage 1)            (stage 2)            (stage 3)
sj    P(sj)          s1j    P(s1j)        s2j    P(s2j)        s3j    P(s3j)
s1    0.3            s11    0.3           s21    0.45          s31    0.55
s2    0.25           s12    0.25          s22    0.3           s32    0.45
s3    0.25           s13    0.25          s23    0.25
s4    0.1            s14    0.2
s5    0.1

Original Source      Reduced Source       Reduced Source       Reduced Source
                     (stage 1)            (stage 2)            (stage 3)
sj    C.W.           s1j    C.W.          s2j    C.W.          s3j    C.W.
s1    00             s11    00            s21    1             s31    0
s2    01             s12    01            s22    00            s32    1
s3    10             s13    10            s23    01
s4    110            s14    11
s5    111
The optimal code (uniquely decodable) is {00, 01, 10, 110, 111}. The
average length is

L = 0.3×2 + 0.25×2 + 0.25×2 + 0.1×3 + 0.1×3 = 2.2 bits/symbol

The efficiency of this code is

η = H(S)/the average length = 2.1855/2.2 = 99.34 %.

The above code is optimal in the sense that we cannot find a code with a
shorter average length. Note that the Huffman coding procedure is not
unique, but the average length is unique.
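A quick numerical check of the entropy, average length, and efficiency quoted above (a minimal sketch, not part of the slides):

```python
import math

probs   = [0.3, 0.25, 0.25, 0.1, 0.1]
lengths = [2, 2, 2, 3, 3]               # lengths of the codewords {00, 01, 10, 110, 111}

H = -sum(p * math.log2(p) for p in probs)        # ≈ 2.1855 bits/symbol
L = sum(p * l for p, l in zip(probs, lengths))   # 2.2 bits/symbol
print(H, L, H / L)                               # efficiency ≈ 0.9934
```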

Example: Consider the discrete memoryless information source with
alphabet S = {s1, s2, s3, s4, s5} and corresponding probabilities
P = {0.4, 0.2, 0.2, 0.1, 0.1}. Following the above procedure, the Huffman code
is {1, 01, 000, 0010, 0011}. The average length is

L = 0.4×1 + 0.2×2 + 0.2×3 + 0.1×4 + 0.1×4 = 2.2 bits/symbol

The efficiency of this code is

η = H(S)/the average length = 2.1219/2.2 = 96.45 %.

Extended Huffman Codes
To illustrate what we mean by extended Huffman codes, let us consider
the previous example by taking the second extension.
In this case the extended alphabet has 25 symbols:
S2 = {s1s1, s1s2, s1s3, s1s4, s1s5, s2s1, s2s2, s2s3, ..., s5s5}.
The corresponding probabilities
P = {0.16, .08, .08, .04, .04, .08, .04, .04, ……, .01}.
Again, if we follow the procedure of constructing the Huffman code, we
come up with the following codewords: ( See the table ).
The average length of this code is 4.32 bits per extended symbol, and the
entropy is H = 4.2439 bits per extended symbol.
The average length per original symbol is 4.32/2 = 2.16 bits/symbol, while H(S) = 2.1219 bits/symbol.
η = H(S)/the average length = 2.1219/2.16 = 98.24 %.
The efficiency can be further improved by encoding higher extensions of
the source. Of course, the price we pay is the complexity of encoding a
larger symbol set.
probability Codeword probability Codeword probability Codeword

0.16 11 0.02 011001 0.02 000101


0.08 0000 0.02 011000 0.02 000100
0.08 0011 0.08 101 0.01 0111101
0.04 01110 0.04 01010 0.01 0111100
0.04 01001 0.04 10001 0.04 1001
0.08 0010 0.02 011011 0.02 000111
0.04 01000 0.02 011010 0.02 000110
0.04 01011 0.04 10000 0.01 0111111
0.01 0111110

Notice here, the probabilities are not sorted in a descending order.
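As a check, the second extension can be fed straight into the hypothetical `huffman_code` sketch from the earlier slide (my own illustration, assuming the base source {0.4, 0.2, 0.2, 0.1, 0.1}):

```python
base = {"s1": 0.4, "s2": 0.2, "s3": 0.2, "s4": 0.1, "s5": 0.1}

# Second extension: all 25 ordered pairs with product probabilities.
ext = {a + b: pa * pb for a, pa in base.items() for b, pb in base.items()}

code = huffman_code(ext)                              # helper sketched earlier
L2 = sum(ext[s] * len(w) for s, w in code.items())    # ≈ 4.32 bits per extended symbol
print(L2, L2 / 2)                                     # ≈ 2.16 bits per original symbol
```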

Special Case of Huffman Code
If all the symbols are equally likely and there are 2^M of them, then the
binary Huffman code will be a fixed-length code with all codewords
having the same length, M. The following example shows this fact.
Original Source Reduced Source (stage 1) Reduced Source (stage 2)

sj P(sj) C.W. s1j P(s1j) C.W s2j P(s2j) C.W.


s1 0.25 10 s11 0.5 0 s21 0.5 0
s2 0.25 11 s12 0.25 10 s22 0.5 1
s3 0.25 00 s13 0.25 11
s4 0.25 01

Therefore, the Huffman code is a fixed-length code with an average
length of two bits/symbol. The efficiency is 100%.
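Feeding four equally likely symbols into the hypothetical `huffman_code` sketch confirms this fixed-length behaviour:

```python
code = huffman_code({"s1": 0.25, "s2": 0.25, "s3": 0.25, "s4": 0.25})
print(code)     # every codeword has length 2 (the exact 0/1 labels depend on tie-breaking)
```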
Huffman Codes with radix r

The procedure can be illustrated by the following example.


Example: Consider the discrete memoryless information source with alphabet S =
{s1, s2, s3, s4, s5, s6, s7, s8} and corresponding probabilities
P = {0.22, 0.20, 0.18, 0.15, 0.10, 0.08, 0.05, 0.02}. Find the Huffman code of radix four.

Original Source Reduced Source Reduced Source


(stage 1) (stage 2)
sj    P(sj)        s1j    P(s1j)       s2j    P(s2j)
s1    0.22         s11    0.22         s21    0.40
s2    0.20         s12    0.20         s22    0.22
s3    0.18         s13    0.18         s23    0.20
s4    0.15         s14    0.15         s24    0.18
s5    0.10         s15    0.10
s6    0.08         s16    0.08
s7    0.05         s17    0.07
s8    0.02
Original Source Reduced Source (stage 1) Reduced Source (stage 2)

sj    P(sj)    C.W.      s1j    P(s1j)    C.W.      s2j    P(s2j)    C.W.
s1    0.22     1         s11    0.22      1         s21    0.40      0
s2    0.20     2         s12    0.20      2         s22    0.22      1
s3    0.18     3         s13    0.18      3         s23    0.20      2
s4    0.15     00        s14    0.15      00        s24    0.18      3
s5    0.10     01        s15    0.10      01
s6    0.08     02        s16    0.08      02
s7    0.05     030       s17    0.07      03
s8    0.02     031

Again, this is not unique. There are other ways to encode these
symbols.
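A possible Python sketch of the radix-r construction (my own illustration: it pads the alphabet with zero-probability dummy symbols so that every merge combines exactly r groups, which is one standard way of organizing the reduction; the helper name `huffman_radix` is an assumption):

```python
import heapq

def huffman_radix(probs, r=4):
    """r-ary Huffman code for a dict {symbol: probability}; returns {symbol: digit string}."""
    items = list(probs.items())
    # Pad with dummy zero-probability symbols until (K - 1) is a multiple of (r - 1),
    # so that every reduction step can merge exactly r groups.
    dummies = 0
    while (len(items) - 1) % (r - 1) != 0:
        items.append((f"_dummy{dummies}", 0.0))
        dummies += 1
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(items)]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        merged, total = {}, 0.0
        for digit in range(r):                      # the r least probable groups
            p, _, grp = heapq.heappop(heap)
            total += p
            merged.update({s: str(digit) + w for s, w in grp.items()})
        count += 1
        heapq.heappush(heap, (total, count, merged))
    # Drop the dummy symbols before returning.
    return {s: w for s, w in heap[0][2].items() if not s.startswith("_dummy")}

probs = {"s1": 0.22, "s2": 0.20, "s3": 0.18, "s4": 0.15,
         "s5": 0.10, "s6": 0.08, "s7": 0.05, "s8": 0.02}
print(huffman_radix(probs, r=4))
# Codeword lengths 1,1,1,2,2,2,3,3 as in the table above (digit labels may differ),
# giving an average of 1.47 quaternary digits/symbol.
```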
Huffman code with maximum efficiency

When the symbol probabilities are negative powers of two, the efficiency of
the Huffman code is maximum. For example, the Huffman code for the source with probabilities
{1/2, 1/4, 1/8, 1/16, 1/16} is
{0, 10, 110, 1111, 1110}, and the average length is 1.875 bits/symbol. The entropy
is also 1.875 bits/symbol. Therefore, efficiency η = H(S)/the average length = 100 %.

Application: File Compression

• A drawback of Huffman coding is that it requires knowledge of a
probabilistic model, which is not always available a priori.
• Lempel-Ziv coding overcomes this practical limitation and has become
the standard algorithm for file compression.
– compress, gzip, GIF, TIFF, PDF, modem…
– A text file can typically be compressed to half
of its original size.
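For illustration, here is a minimal LZ78-style parser in Python (a sketch of the dictionary-building idea only; the formats listed above actually use related variants such as LZ77/DEFLATE in gzip and LZW in GIF):

```python
def lz78_parse(data):
    """LZ78-style parse: emits (dictionary index, next symbol) pairs while the
    dictionary of previously seen phrases is built on the fly."""
    dictionary = {"": 0}            # phrase -> index; index 0 is the empty phrase
    phrase, output = "", []
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch            # keep extending the current phrase
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                      # flush a trailing phrase, if any
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

print(lz78_parse("ABAABABAABAB"))   # [(0,'A'), (0,'B'), (1,'A'), (2,'A'), (4,'A'), (4,'B')]
```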
Project 1: CLO-1
(10 Marks)
Summary
• The entropy of a discrete memoryless information source:

H(S) = −Σ_{k=1}^{K} pk log2 pk

• Entropy function (entropy of a binary memoryless source):

H(S) = −(1 − p) log2(1 − p) − p log2 p = H(p)

• Source coding theorem: The minimum average codeword


length for any source coding scheme is H(S) for a discrete
memoryless source.

• Huffman coding: An efficient source coding algorithm.

