
PRINCIPLES OF COMMUNICATIONS

SOURCE CODING
(MÃ HÓA NGUỒN)
Instructor:
Name: Đoàn Bảo Sơn
Office: Faculty of Electrical - Electronics Engineering
Phone: 0913 706061
Email: sondb@vaa.edu.vn

1
1. Introduction to Information Theory
1.1. Introduction

 Two fundamental concepts of communication systems (Claude Shannon): source coding and channel coding
 The theoretical limits of lossless source coding (exact reconstruction of the source message)
 The achievable rates for a given transmission channel thanks to channel coding

2
1. Introduction to Information Theory
1.2. Review of probabilities

 Let X be an experiment or an observation that can be repeated several times under similar circumstances
 The result of this observation is an event denoted x, which can take several possible outcomes. The set of these values is denoted AX
 The result X = x of this observation is not known before it takes place; X is consequently called a random variable
 Two classes of random variables can be distinguished:
• discrete random variables, when the set of outcomes is discrete;
• continuous random variables, when their distribution functions are continuous.

3
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.1. Discrete random variables
X takes its values in a discrete set AX
AX may be infinite (for instance AX = N) or finite with size n: AX = {x1, x2, … , xn}
Each outcome is associated with a probability of occurrence PX = {p1, p2, … , pn}, with pi = Pr(X = xi) and Σi pi = 1
For discrete random variables, the probability density fX(x) can be written using the Dirac function δ(u):
fX(x) = Σi pi · δ(x − xi)
4
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.1. Discrete random variables
 Joint probability (Xác suất kết hợp)
X, Y: two discrete random variables
Sets of possible outcomes: AX = {x1, x2, …, xn}, AY = {y1, y2, …, ym}
Pr(X = xi, Y = yj): joint probability of the events X = xi and Y = yj

 Marginal probability (Xác suất biên)
Pr(X = xi) = Σj Pr(X = xi, Y = yj); Pr(Y = yj) = Σi Pr(X = xi, Y = yj)

5
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.1. Discrete random variables
 Conditional probability (Xác suất có điều kiện)
Pr(X = xi|Y = yj) = Pr(X = xi, Y = yj) / Pr(Y = yj), provided Pr(Y = yj) > 0
6
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.1. Discrete random variables
 Independence
Two discrete random variables X and Y are independent if and only if:
Pr(X = xi, Y = yj) = Pr(X = xi) · Pr(Y = yj)  ∀ i, j
7
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.2. Continuous random variables
• The random variable X is continuous if its cumulative distribution function FX(x) is continuous
• FX(x) is related to the probability density by fX(x) = dFX(x)/dx
• Mean of the random variable: mX = E[X] = ∫ x · fX(x) dx
• Nth-order moment: E[X^N] = ∫ x^N · fX(x) dx

8
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.3. Random signals
 Signal x(t) is deterministic if the function t ↦ x(t) is perfectly known
 If the values taken by x(t) are unknown, the signal follows a random process
 X(t): random variable; x(t): outcome (realization) of this random variable
 Probability density: fX(x, t)
 The random process is stationary (dừng) if its probability density is independent of time: fX(x, t) = fX(x) ∀t
 The mean of the random process X at time t:
mX(t) = E[X(t)] = ∫ x · fX(x, t) dx
9
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.3. Random signals
 Autocorrelation function of a random process:
RXX(t1, t2) = E[X(t1) · X(t2)]
 The random process X is second-order stationary (dừng bậc 2) or wide-sense stationary if, for any random signal x(t):
1) its mean mX(t) is independent of t;
2) its autocorrelation function verifies RXX(t1, t2) = RXX(t1 + t, t2 + t) ∀t; it then only depends on τ = t1 − t2 and is written RXX(τ).
 The power spectrum density γXX(f) is the Fourier transform (TF) of the autocorrelation function:
γXX(f) = TF[RXX(τ)]

10
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.3. Random signals
 Autocorrelation function of a stationary process: RXX(τ) = E[X(t) · X(t − τ)]
 Most random processes considered in digital communications are second-order stationary and ergodic
• When the time average tends to the statistical mean of the random process, the process is ergodic.
 For discrete signals, RXX(τ) is only defined at the discrete times τ = nTe
⇒ the power spectrum density is then obtained as the Fourier transform of the sampled autocorrelation function

11
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.1. A logarithmic measure of information
 Information associated with the event X = xi: h(xi)

12
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.1. A logarithmic measure of information
 The quantity of information h(xi) associated with the realization of the event X = xi is equal to the logarithm of the inverse of its probability Pr(X = xi):
h(xi) = log2 (1 / Pr(X = xi)) = −log2 Pr(X = xi)
 Unit of h(xi):
• binary logarithm: Shannon (Sh)
• natural logarithm: natural unit (Nat)

13
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.1. A logarithmic measure of information
 Example:
 A discrete source generating bits (0 or 1)
 Pr(X = 0) = Pr(X = 1) = 1/2
 Quantity of information of X = 0 or X = 1:
h(0) = h(1) = −log2(1/2) = 1 Sh
 If the source generates a sequence of n independent bits, there are 2^n different sequences
 Probability of each of these sequences: 1/2^n
 Quantity of information associated with the realization of a specific sequence:
h = −log2(2^(−n)) = n Sh
14
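These computations are easy to check numerically. The following minimal Python sketch (not part of the original slides) evaluates h(x) = −log2 Pr(X = x) in Shannons for the two cases above:

```python
import math

def self_information(p: float) -> float:
    """Quantity of information h(x) = -log2(p), in Shannons (Sh)."""
    return -math.log2(p)

# One equiprobable bit: Pr(X = 0) = Pr(X = 1) = 1/2  ->  1 Sh
print(self_information(0.5))        # 1.0

# A specific sequence of n independent equiprobable bits has probability 2**(-n),
# so its realization carries n Sh of information.
n = 8
print(self_information(2.0 ** -n))  # 8.0
```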
1. Introduction to Information Theory
1.3. Entropy and mutual information
 Example (cont.):
 Two events: X = xi and Y = yj
 Quantity of information of the joint event:
h(xi, yj) = −log2 Pr(X = xi, Y = yj)
Pr(X = xi, Y = yj): the joint probability of the two events
 Quantity of information associated with the realization of the event X = xi conditionally to the event Y = yj:
h(xi|yj) = −log2 Pr(X = xi|Y = yj)

15
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.2. Mutual information
 Mutual information: the quantity of information that the realization of the event Y = yj gives about the event X = xi
• It is the difference between the quantity of information associated with the realization of the event X = xi and the quantity of information associated with the realization of the event X = xi conditionally to the event Y = yj:
i(xi ; yj) = h(xi) − h(xi|yj) = log2 [ Pr(X = xi|Y = yj) / Pr(X = xi) ]
 If the two events are independent:
Pr(X = xi|Y = yj) = Pr(X = xi) ⇒ i(xi ; yj) = 0
 If the event X = xi is equivalent to the event Y = yj:
Pr(X = xi|Y = yj) = 1 ⇒ i(xi ; yj) = h(xi)
16
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.2. Mutual information

 Unlike h(xi), which is always positive, the mutual information i(xi ; yj) can be negative

17
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.3. Entropy and average mutual information
 Source:
 random variable: X
 sample space: AX = {x1, x2, … , xn}
 probabilities: PX = {p1, p2, … , pn}
 Entropy
 The entropy of the source is the average quantity of information associated with the possible realizations of the event X = xi:
H(X) = Σi pi · h(xi) = −Σi pi · log2 pi  [Sh/symbol]

18
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.3. Entropy and average mutual information
 Entropy (cont.)
 H(X) is a measure of the uncertainty on X
 Properties:
0 ≤ H(X) ≤ log2 n
 The entropy is maximum when all the pi are equal (pi = 1/n)
 Example: a source with 2 states x0 and x1, with probabilities p0 and p1 = 1 − p0:
H(X) = −p0 log2 p0 − (1 − p0) log2(1 − p0) = H2(p0)

19
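As a numerical illustration (a sketch added here, not taken from the slides), the entropy of the two-state source can be evaluated for several values of p0; it is maximum, equal to 1 Sh, when the two states are equiprobable:

```python
import math

def entropy(probabilities):
    """H(X) = -sum(pi * log2(pi)) in Sh/symbol; zero-probability outcomes are ignored."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Two-state source x0, x1 with probabilities p0 and p1 = 1 - p0
for p0 in (0.1, 0.3, 0.5, 0.9):
    print(p0, round(entropy([p0, 1 - p0]), 4))
# The entropy peaks at 1 Sh for p0 = 0.5 (equiprobable states).
```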
1. Introduction to Information Theory
1.3. Entropy and mutual information
 Entropy (cont.)
 Two random variables: X, Y
AX = {x1, x2, … , xn}; AY = {y1, y2, … , ym}
 Joint entropy:
H(X, Y) = −Σi Σj Pr(X = xi, Y = yj) · log2 Pr(X = xi, Y = yj)
 Conditional entropy:
H(X|Y) = −Σi Σj Pr(X = xi, Y = yj) · log2 Pr(X = xi|Y = yj)

20
1. Introduction to Information Theory
1.3. Entropy and mutual information
 Entropy (cont.)
 Average mutual information:
I(X; Y) = Σi Σj Pr(X = xi, Y = yj) · log2 [ Pr(X = xi, Y = yj) / (Pr(X = xi) · Pr(Y = yj)) ]
 Relations between the entropies and the average mutual information:
I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y)

21
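The relations above can be checked on a small example. The 2x2 joint distribution below is an arbitrary choice made for this sketch (it does not come from the slides):

```python
import math

def H(probs):
    """Entropy of a list of probabilities, in Sh."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Arbitrary joint distribution Pr(X = xi, Y = yj) for two binary random variables
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

H_xy = H(p_xy.values())                 # joint entropy H(X,Y)
H_x, H_y = H(p_x.values()), H(p_y.values())
H_x_given_y = H_xy - H_y                # H(X|Y) = H(X,Y) - H(Y)
I_xy = H_x - H_x_given_y                # I(X;Y) = H(X) - H(X|Y)

# Same mutual information through the symmetric relation:
assert abs(I_xy - (H_x + H_y - H_xy)) < 1e-12
print(round(H_x, 4), round(H_y, 4), round(H_xy, 4), round(I_xy, 4))
```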
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.4. Differential entropy
 Continuous random variable: X
 Probability density: p(x)
 Differential entropy:
HD(X) = −∫ p(x) · log2 p(x) dx
⇒ HD(X) measures the quantity of information of X.

22
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.1. Introduction
Source coding is divided into two families: lossless source coding and lossy source coding.
Lossless source coding is also called entropy coding: it represents the digital sequence delivered by the source with the shortest possible sequence of symbols, while keeping the ability to reconstruct it exactly with the source decoder.

23
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.2. Entropy and source redundancy
 Source: discrete, stationary
 Output symbols: Q-ary symbols
 Output: random variable X
 Entropy: H(X)
average quantity of information per symbol at the output of the source
 If the source is memoryless (de-correlated output symbols), its maximum entropy is:
HMAX = log2 Q
 If the source has memory, the entropy per symbol is obtained as the limit of HJ(X)/J when J → ∞, where
HJ(X): entropy per group of J symbols
24


1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.2. Entropy and source redundancy (cont.)
 Redundancy of the source: Rred
• difference between the quantity of information of the source and that of a source with equiprobable (xác suất bằng nhau) symbols
 Rred ranges from 0 to 1 (Rred = 1 when the entropy of the source is zero)

25
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.3. Fundamental theorem of source coding
THEOREM 1.1. (Shannon):
Let ε > 0. For any stationary source with entropy per symbol H(X), there is a binary source coding method that associates with each message x of length N a binary word of average length N·Rmoy such that:
Rmoy ≤ H(X) + ε + ε·log2|AX| + 2/N = H(X) + εr
Rmoy: rate, or average number of bits per realization
We can therefore associate, on average, N·H(X) bits with each message x.

26
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.4. Lossless source coding
 Introduction
 The output of the source coder consists of bits (q = 2)
 The source coding should satisfy the two following criteria:
– unique coding: each message should be coded with a different word;
– unique decoding: each word should be distinguishable without ambiguity. This criterion can be obtained using:
- coding with fixed-length words,
- coding using a distinguishable separating symbol (the Morse system, for example),
- coding with words of variable length.
27
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.4. Lossless source coding
 Variable length coding
Example: a discrete source generates 4 different messages a1, a2, a3, a4 with respective probabilities Pr(a1) = 1/2, Pr(a2) = 1/4, Pr(a3) = Pr(a4) = 1/8.
Variable length code

28
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.4. Lossless source coding
 Variable length coding
Example 1: a discrete source generates 4 different messages a1, a2, a3, a4 with respective probabilities Pr(a1) = 1/2, Pr(a2) = 1/4, Pr(a3) = Pr(a4) = 1/8.
Variable length code
- Satisfies the criterion of unique coding
- Does not allow a unique decoding
- Ex: a1, a2, a1 is encoded as 1001
- At the receiver, 1001 can be decoded as a1, a2, a1 or as a4, a3
⇒ unusable

29
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Variable length coding (cont.)
Example 2:
- Satisfies both unique coding and unique decoding
- Not instantaneous (tức thời)
- Ex: a3 is the beginning of a4
- After the sequence 11, the decoder must determine the parity of the number of zeros before it can decode
⇒ decoding is more complex

30
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Variable length coding (cont.)
Example 3:

- Satisfies both unique coding and unique decoding


- Instantaneous (tức thời)

Tree associated with the source code


31
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Kraft inequality
THEOREM 1.2.
An instantaneous code composed of Q binary words of lengths {n1, n2, … , nQ}, respectively, with n1 ≤ n2 ≤ … ≤ nQ, should satisfy the following inequality:
Σ (i = 1 to Q) 2^(−ni) ≤ 1
32
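A quick numerical check of the Kraft inequality and of the resulting average length. The code words below are an assumed instantaneous code for the four-message source of the earlier examples (probabilities 1/2, 1/4, 1/8, 1/8); they are not necessarily the exact words of the slides, whose table did not survive the conversion:

```python
import math

# Assumed instantaneous (prefix) code for the messages a1..a4
code = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}
prob = {"a1": 1/2, "a2": 1/4, "a3": 1/8, "a4": 1/8}

# Kraft inequality: the sum of 2**(-ni) over the code words must not exceed 1
kraft = sum(2.0 ** -len(word) for word in code.values())
print("Kraft sum:", kraft)               # 1.0 -> inequality satisfied

# Average length Rmoy equals the entropy H(X) here (all probabilities are powers of 2)
r_moy = sum(prob[m] * len(word) for m, word in code.items())
h_x = -sum(p * math.log2(p) for p in prob.values())
print("Rmoy =", r_moy, " H(X) =", h_x)   # both equal to 1.75
```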
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Fundamental theorem of source coding
 Memoryless source with entropy per symbol H(X)
 It is possible to build an instantaneous code whose average word length Rmoy satisfies the following inequality:
H(X) ≤ Rmoy < H(X) + 1
 Fundamental theorem of source coding
THEOREM 1.3: For any stationary source with entropy per symbol H(X), there is a source coder that can encode the messages into binary words with an average length Rmoy as close as possible to H(X).

33
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Entropy rate
 X: stationary and discrete source
 Entropy per symbol: H(X) [Sh/symbol]
 Symbol rate: DS symbols per second
 Entropy rate: DI = DS · H(X) [Sh/s]
 Binary data rate at the output of the source encoder: D'B = DS · Rmoy [bit/s]
 Entropy per bit at the output of the binary source encoder: H'(X)

34
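A small numerical illustration of these relations (the figures are assumed for this sketch, not taken from the slides): a source emitting DS symbols per second with entropy H(X) delivers DI = DS · H(X) Shannons per second, and the bit rate D'B = DS · Rmoy at the encoder output can never be below DI:

```python
import math

# Assumed 4-ary source with the probabilities of the earlier variable-length example
probs = [0.5, 0.25, 0.125, 0.125]
H_X = -sum(p * math.log2(p) for p in probs)   # 1.75 Sh/symbol

D_S = 1000                 # symbols per second (assumed)
D_I = D_S * H_X            # entropy rate in Sh/s
R_moy = 1.75               # average bits/symbol of an optimal code for this source
D_B = D_S * R_moy          # output bit rate of the source encoder in bit/s

print(D_I, D_B, D_B >= D_I)   # 1750.0 1750.0 True
```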
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Entropy rate (cont.)
 From the fundamental theorem of source coding:
Rmoy ≥ H(X) ⇒ DS · Rmoy ≥ DS · H(X) ⇒ D'B ≥ DI
• In case of equality, one bit carries a quantity of information of one Shannon
• If the redundancy of the output sequence is not zero, then one bit carries less than one Shannon.

35
1. Introduction to Information Theory
1.5. Theorem for lossy source coding
1.5.1. Definitions
Block diagram of the coder-decoder (figure): the encoder associates with the sequence x of N source samples a binary word of NR bits; the decoder delivers the reconstructed sequence x̂

 Binary word: NR bits, i.e. R bits per dimension
 The source decoder maps each of the 2^(NR) possible binary words to a sequence x̂ ∈ X̂^N
 x̂: quantized or estimated sequence

36
1. Introduction to Information Theory
1.5. Theorem for lossy source coding
1.5.1. Definitions
DEFINITION 1.1. The distortion per dimension between the sequences x and x̂ of dimension N:
(1/N) · ‖x − x̂‖²
DEFINITION 1.2. The average distortion per dimension of the coder-decoder:
DN = (1/N) ∫X ‖x − x̂‖² · f(x) dx

37
1. Introduction to Information Theory
1.5. Theorem for lossy source coding
1.5.1. Definitions (cont.)
DEFINITION 1.3. A pair (R, D) is said to be achievable if there is a coder-decoder such that:
lim (N→∞) DN ≤ D
DEFINITION 1.4. For a given memoryless source, the rate-distortion function R(D) is:
R(D) = min over p(x̂|x) of I(X; X̂) subject to E[(X − X̂)²] ≤ D

38
1. Introduction to Information Theory
1.5. Theorem for lossy source coding
1.5.2. Lossy source coding theorem
THEOREM 1.4. The minimum number of bits per dimension R needed to describe a sequence of real samples with a given average distortion D must be higher than or equal to R(D).

If the source is Gaussian, the distortion-rate function is:
D(R) = σx² · 2^(−2R)
σx²: source variance
D(R): distortion-rate function


39
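The Gaussian distortion-rate function can be tabulated with a few lines of Python; this is only a sketch, with an assumed unit-variance source:

```python
# Distortion-rate function of a memoryless Gaussian source: D(R) = sigma_x**2 * 2**(-2R)
sigma2 = 1.0                               # assumed source variance

for r in range(6):                         # rate R in bits per dimension
    d = sigma2 * 2.0 ** (-2 * r)
    print(f"R = {r} bit/dim -> D(R) = {d}")
# Each additional bit per dimension divides the minimum achievable distortion by 4,
# i.e. improves the signal-to-distortion ratio by about 6 dB.
```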
1. Introduction to Information Theory
1.6. Transmission channel models
1.6.1. Binary symmetric channel
 The input and the output of the channel are binary
 The channel is described by a single parameter p
 The labels on the branches are the conditional probabilities Pr(Y = y|X = x)
 Characteristics:

40
1. Introduction to Information Theory
1.6. Transmission channel models
1.6.1. Binary symmetric channel
 The input and the output of the channel are binary
 The channel is described by a single parameter:
- p: inversion probability
 The labels on the branches are the conditional probabilities Pr(Y = y|X = x)
 Characteristics:
- A priori probabilities (xác suất tiên nghiệm):
Pr(X = 0) = q
Pr(X = 1) = 1 − q
41
1. Introduction to Information Theory
1.6. Transmission channel models
1.6.1. Binary symmetric channel (cont.)
 The binary symmetric channel is memoryless: for x = [x0, x1, … , xn−1] and y = [y0, y1, … , yn−1],
Pr(Y = y|X = x) = Πi Pr(Y = yi|X = xi)
 Conditional entropy:
H(Y|X) = −p log2 p − (1 − p) log2(1 − p) = H2(p)
42
1. Introduction to Information Theory
1.6. Transmission channel models
1.6.2. Binary erasure channel
 Some bits can be lost or erased
 Compared to the binary symmetric channel, we add an event Y = ε corresponding to the case where a transmitted bit has been erased
 Characteristics:
 p: erasure probability
 Diagram (figure): each transmitted bit is received correctly with probability 1 − p and erased (Y = ε) with probability p

43
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
Problem:
 Transmit equiprobable bits: Pr(X = 0) = Pr(X = 1) = 1/2
 Binary rate: 1000 bits per second
 Binary symmetric channel with p = 0.1
What is the maximum information rate that can be transmitted? (A numerical answer is given after the capacity of the binary symmetric channel is derived below.)

44
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.1. Capacity of a transmission channel
DEFINITION: capacity of a transmission channel
C = max I(X; Y), the maximum being taken over the input probability distribution
 The capacity is the maximum of the average mutual information
 [C] = Shannon/symbol (or Shannon/second, see below)
 Capacity per unit of time:
C' = C × DS ; DS: symbol rate of the source
 Noiseless channel:
C = HMAX(X) = log2 Q
 Noisy channel:
C < HMAX(X)
To compute the capacity of a transmission channel, we calculate the average quantity of information that is lost in the channel

45
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.1. Capacity of a transmission channel (cont.)
 H(X|Y): measure of the residual uncertainty on X knowing Y
 Good transmission: H(X|Y) is zero or negligible (không đáng kể)
 H(X|Y): average quantity of information lost in the channel
 Noiseless channel:
H(X|Y) = H(X|X) = 0 ⇒ C = HMAX(X)
 Very noisy channel (X and Y independent):
H(X|Y) = H(X) ⇒ C = 0

Case C = HMAX(X)

Case C = 0
46
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.1. Capacity of a transmission channel (cont.)
 Communication system with channel coding:
 Average mutual information:
I(U; V) = H(U) − H(U|V)
 Channel coding makes H(U|V) as low as desired
 H(U) < H(X) (because of the redundancy added by the channel coding)

47
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.2. Fundamental theorem of channel coding
 Channel coding allows an error rate as low as desired
 The average quantity of information entering the channel coder – channel – channel decoder block must be less than the capacity C of the channel:
H(U) < C
 C = max I(X; Y): highest number of information bits that can be transmitted through the channel with an error rate as low as desired

48
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.3. Capacity of the binary symmetric channel
 Average mutual information:
I(X; Y) = H(Y) − H(Y|X) = H(Y) − H2(p)
 H2(p) is independent of q ⇒ I(X;Y) is maximum when H(Y) is maximum, i.e. Pr(Y = 0) = Pr(Y = 1) = 1/2, which is obtained for Pr(X = 0) = Pr(X = 1) = q = 1/2
 Capacity of the binary symmetric channel:
C = 1 − H2(p)
49
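With C = 1 − H2(p), the problem stated in section 1.7 (1000 equiprobable bits per second over a binary symmetric channel with p = 0.1) can be answered numerically. The figures printed below are my own evaluation (a sketch), not values given on the slides:

```python
import math

def h2(p: float) -> float:
    """Binary entropy function H2(p), in Sh."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.1                       # inversion probability of the binary symmetric channel
capacity = 1.0 - h2(p)        # capacity in Sh per channel use (per transmitted bit)
bit_rate = 1000               # transmitted bits per second

print("C =", round(capacity, 4), "Sh/bit")                              # about 0.531
print("Max information rate =", round(capacity * bit_rate, 1), "Sh/s")  # about 531
```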
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.4. Capacity of erasure channel
 Conditional entropy: H(X|Y) = p·H2(q)
 Average mutual information:
I(X; Y) = H(X) − H(X|Y) = H2(q) − p·H2(q) = (1 − p)·H2(q)
 q = 0.5 ⇒ I(X;Y) is maximum, with H2(q) = 1
 Capacity of the channel:
C = 1 − p

50
2. Source Coding
2.1. Introduction
 Source coding: lossless source coding and lossy source coding
 Lossless source coding (entropy coding): produce the shortest sequence of symbols (bits) that allows perfect reconstruction by the decoder
 Lossy source coding: minimize a fidelity criterion under a constraint on the binary rate
 Implementations of lossless source coding: Huffman's algorithm, Lempel–Ziv coding

51
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.1. Run length coding
 Run length coding (RLC): exploits the repetition of consecutive symbols
 Suited to source sequences containing many identical successive symbols
 The sequence is described by couples (number of identical consecutive symbols, symbol); see the sketch below
 Example:
 Refinement: add a prefix only when the number of identical consecutive symbols is higher than 1
 An additional symbol can then be added to indicate the position of the repeated symbols
52
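A minimal run-length coding sketch. The input string below is an assumed example chosen for illustration, since the slide's own example did not survive the conversion:

```python
from itertools import groupby

def run_length_encode(symbols: str):
    """Return the couples (number of identical consecutive symbols, symbol)."""
    return [(len(list(group)), symbol) for symbol, group in groupby(symbols)]

# Assumed example sequence containing many identical successive symbols
sequence = "aaaabbaaaaacccb"
print(run_length_encode(sequence))
# [(4, 'a'), (2, 'b'), (5, 'a'), (3, 'c'), (1, 'b')]
```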
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.2. Huffman’s algorithm
 Huffman (1952): variable length source coding algorithm
 The source delivers L messages
 It achieves the minimum average number of bits per word among uniquely decodable and instantaneous codes
 Algorithm: list the L messages from top to bottom in decreasing probability order (each message is associated with a node), then:
1) Choose the two nodes with the lowest probabilities.
2) Connect these two nodes together: the upper branch is labeled with 0, while the lower branch is labeled with 1.
3) Sum the two probabilities and associate the result with the new node.
4) Suppress the two nodes/messages previously selected and return to step 1.
53
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.2. Huffman’s algorithm (cont.)
 Example:
A discrete source composed of eight messages a1, a2, a3, a4, a5, a6, a7, a8, with the associated probabilities {0.16; 0.15; 0.01; 0.05; 0.26; 0.11; 0.14; 0.12}.
- Entropy of this source: H(X) = 2.7358 Sh.
- Huffman's coding (see the sketch below):

54
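A compact Python sketch of Huffman's algorithm applied to the eight messages of this example. Because ties between equal probabilities can be broken differently, the individual code words may differ from those of the slide's tree, but the average length is the same:

```python
import heapq, math

probs = {"a1": 0.16, "a2": 0.15, "a3": 0.01, "a4": 0.05,
         "a5": 0.26, "a6": 0.11, "a7": 0.14, "a8": 0.12}

# Heap items: (probability, tie-breaker, {message: partial code word})
heap = [(p, i, {m: ""}) for i, (m, p) in enumerate(probs.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    p_low, _, c_low = heapq.heappop(heap)   # lowest probability -> branch labeled 1
    p_up, _, c_up = heapq.heappop(heap)     # next lowest        -> branch labeled 0
    merged = {m: "1" + w for m, w in c_low.items()}
    merged.update({m: "0" + w for m, w in c_up.items()})
    heapq.heappush(heap, (p_low + p_up, counter, merged))
    counter += 1

code = heap[0][2]
r_moy = sum(probs[m] * len(w) for m, w in code.items())
h_x = -sum(p * math.log2(p) for p in probs.values())
print(code)
print("Rmoy =", round(r_moy, 2), "bits >= H(X) =", round(h_x, 4), "Sh")  # 2.8 >= 2.7358
```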
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.2. Huffman’s algorithm (cont.)
 Example:
- Huffman's encoding table:
- Notes:
 Huffman's algorithm is an optimal source coding; it reaches the entropy exactly when the message probabilities are of the form 2^(−m) (1/2, 1/4, …)
 When the successive symbols are correlated, many symbols can be grouped together to constitute the messages ⇒ higher complexity
- Used in image and audio compression (JPEG, MP3, …)

55
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.3. Arithmetic coding
 Rissanen (1976), Pasco (1976)
 Source coding without any a priori knowledge of the statistics of the source (memoryless or with memory)
 Principle: associate with each binary sequence an interval on the segment [0; 1[
 Example:
0111 → [0.0111; 0.1000[ in binary, i.e. [0.4375; 0.5[ in decimal
 The longer the sequence, the smaller the associated interval

56
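The interval associated with a binary expansion can be computed in a couple of lines; the sketch below reproduces the slide's example:

```python
def dyadic_interval(bits: str):
    """Interval [low, high[ of the reals whose binary expansion starts with `bits`."""
    low = int(bits, 2) / 2 ** len(bits)
    return low, low + 2 ** -len(bits)

print(dyadic_interval("0111"))   # (0.4375, 0.5), i.e. [0.0111; 0.1000[ in binary
print(dyadic_interval("01110"))  # a longer sequence -> a smaller interval
```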
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.4. Algorithm LZ78
 Lempel and Ziv (1978)
 The algorithm uses a dictionary
 Each dictionary entry is a pair composed of a pointer (index) to a previous element of the dictionary and a symbol
 Each element of the dictionary is thus related to a string of symbols

57
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.4. Algorithm LZ78 (cont.)
 Example:
• Binary sequence:
001000001100010010000010110001000001000011
• Starting from the left, we first find the shortest string that has not been found yet:
0,01000001100010010000010110001000001000011
• The second string different from 0 is 01:
0,01,000001100010010000010110001000001000011
• The third string different from 0 and 01 is 00:
0,01,00,0001100010010000010110001000001000011
• Finally, the sequence can be decomposed as follows:
0, 01, 00, 000, 1, 10, 001, 0010, 0000, 101, 100, 010, 00001, 000011
58
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.4. Algorithm LZ78 (cont.)
 Example:
• Finally, the sequence can be decomposed as follows:
0, 01, 00, 000, 1, 10, 001, 0010, 0000, 101, 100, 010, 00001, 000011
• Dictionary of strings

• Line 1: index of the strings


• Line 2: strings
• Line 3: pair index-symbol → encoding
• Ex: 0010 → 7-0 (001 → index 7)
• Encoded sequence:
0-0, 1-1, 1-0, 3-0, 0-1, 5-0, 3-1, 7-0, 4-0, 6-1, 6-0, 2-0, 9-1, 13-1
59
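A minimal LZ78 parsing sketch that reproduces the decomposition into strings and the index–symbol pairs above (the final binary re-encoding of the indices, discussed on the next slide, is left out):

```python
def lz78_encode(sequence: str):
    """Return the parsed strings and the (index, symbol) pairs of the LZ78 dictionary."""
    dictionary = {"": 0}            # index 0 stands for the empty string
    strings, pairs = [], []
    prefix = ""
    for symbol in sequence:
        if prefix + symbol in dictionary:
            prefix += symbol        # keep extending until the string is new
        else:
            dictionary[prefix + symbol] = len(dictionary)
            strings.append(prefix + symbol)
            pairs.append((dictionary[prefix], symbol))
            prefix = ""
    return strings, pairs

seq = "001000001100010010000010110001000001000011"
strings, pairs = lz78_encode(seq)
print(strings)  # ['0', '01', '00', '000', '1', '10', '001', '0010', '0000',
                #  '101', '100', '010', '00001', '000011']
print(pairs)    # [(0, '0'), (1, '1'), (1, '0'), (3, '0'), (0, '1'), (5, '0'), (3, '1'),
                #  (7, '0'), (4, '0'), (6, '1'), (6, '0'), (2, '0'), (9, '1'), (13, '1')]
```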
2. Source Coding
2.2. Algorithms for lossless source coding
 Example:
• Encoded sequence:
0-0, 1-1, 1-0, 3-0, 0-1, 5-0, 3-1, 7-0, 4-0, 6-1, 6-0, 2-0, 9-1, 13-1
 Tree associated with the strings memorized in the dictionary
• Node: string [adding a 0 or a 1 (label on the branch) to a previous
string]
 Binary encoded sequence:
0-0, 1-1, 01-0, 11-0, 000-1, 101-0, 011-1,
111-0, 0100-0, 0110-1, 0110-0, 0010-0,
1001-1, 1101-1
 Note:
- 2 strings: index of length 1,
- 2 other strings: index of length 2,
- 2² strings: index of length 3,
- 2³ strings: index of length 4, etc.
60
