
PRINCIPLES OF COMMUNICATIONS

SOURCE CODING
(MÃ HÓA NGUỒN)
Instructor:
Name: Đoàn Bảo Sơn
Office: Faculty of Electrical - Electronics Engineering
Phone: 0913 706061
Email: sondb@vaa.edu.vn

1
1. Introduction to Information Theory
1.1. Introduction

 Two fundamental concepts of communication systems (Claude Shannon): source coding and channel coding
 The theoretical limits of lossless source coding (exact reconstruction of the source message)
 The achievable rates for a given transmission channel thanks to channel coding

2
1. Introduction to Information Theory
1.2. Review of probabilities

 Let X be an experiment or an observation that can be repeated several times under similar circumstances
 The result of this observation is an event denoted x, which can take several possible outcomes. The set of these values is denoted AX
 The result X = x of this observation is not known before it takes place; X is consequently called a random variable
 Two classes of random variables can be distinguished:
• discrete random variables, when the set of outcomes is discrete;
• continuous random variables, when their distribution functions are continuous.

3
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.1. Discrete random variables
X takes its values in a discrete set AX
AX may be infinite (for instance AX = N) or finite with size n: AX = {x1, x2, … , xn}
Each outcome is associated with a probability of occurrence PX = {p1, p2, … , pn}, with pi = Pr(X = xi) and Σi pi = 1
For discrete random variables, the probability density fX(x) can be written using the Dirac function δ(u):
fX(x) = Σi pi · δ(x − xi)
4
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.1. Discrete random variables
 Joint probability (Xác suất kết hợp)
X, Y: two discrete random variables
Sets of possible outcomes: AX = {x1, x2, …, xn}, AY = {y1, y2, …, ym}
Pr(X = xi, Y = yj): joint probability of the events X = xi and Y = yj

 Marginal probability (Xác suất biên)
Pr(X = xi) = Σj Pr(X = xi, Y = yj); Pr(Y = yj) = Σi Pr(X = xi, Y = yj)

5
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.1. Discrete random variables
 Conditional probability (Xác suất có điều kiện)
Pr(X = xi|Y = yj) = Pr(X = xi, Y = yj) / Pr(Y = yj), provided Pr(Y = yj) > 0
6
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.1. Discrete random variables
 Independence
Two discrete random variables X and Y are independent if and only if:
Pr(X = xi, Y = yj) = Pr(X = xi) · Pr(Y = yj)  ∀ i, j
7
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.2. Continuous random variables
• The random variable X is continuous if its cumulative distribution function FX(x) is continuous
• FX(x) is related to the probability density by fX(x) = dFX(x)/dx
• Mean of the random variable: mX = E[X] = ∫ x · fX(x) dx
• Nth-order moment: E[X^N] = ∫ x^N · fX(x) dx

8
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.3. Random signals
 Signal x(t) is deterministic if the function t ↦ x(t) is perfectly known
 If the values taken by x(t) are unknown, the signal follows a random process
 X(t): random variable; x(t): outcome (realization) of this random variable
 Probability density: fX(x, t)
 The random process is stationary (dừng) if its probability density is independent of time: fX(x, t) = fX(x) ∀t
 The mean of the random process X at time t:
mX(t) = E[X(t)] = ∫ x · fX(x, t) dx
9
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.3. Random signals
 Autocorrelation function of a random process:
RXX(t1, t2) = E[X(t1) · X(t2)]
 The random process X is second-order stationary (dừng bậc 2) or wide-sense stationary if, for any random signal x(t):
1) its mean mX(t) is independent of t;
2) its autocorrelation function verifies RXX(t1, t2) = RXX(t1 + t, t2 + t) ∀t; it then only depends on τ = t1 − t2 and is written RXX(τ).
 The power spectrum density γXX(f) is the Fourier transform (TF) of the autocorrelation function:
γXX(f) = TF[RXX(τ)]

10
1. Introduction to Information Theory
1.2. Review of probabilities
1.2.3. Random signals
 Autocorrelation function of a stationary process: RXX(τ) = E[X(t) · X(t − τ)]
 Most random processes considered in digital communications are second-order stationary and ergodic
• When the time average tends to the statistical mean of the random process, the process is ergodic.
 For discrete signals, RXX(τ) is only defined at the discrete times τ = nTe
⇒ the power spectrum density is then obtained as the Fourier transform of the sampled autocorrelation function

11
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.1. A logarithmic measure of information
 Information associated with the event X = xi: h(xi)

12
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.1. A logarithmic measure of information
 The quantity of information h(xi) associated with the realization of the event X = xi is equal to the logarithm of the inverse of its probability Pr(X = xi):
h(xi) = log2 (1 / Pr(X = xi)) = −log2 Pr(X = xi)
 Unit of h(xi):
• binary logarithm: Shannon (Sh)
• natural logarithm: natural unit (Nat)

13
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.1. A logarithmic measure of information
 Example:
 A discrete source generating bits (0 or 1)
 Pr(X = 0) = Pr(X = 1) = 1/2
 Quantity of information of X = 0 or X = 1:
h(0) = h(1) = −log2(1/2) = 1 Sh
 If the source generates a sequence of n independent bits, there are 2^n different sequences
 Probability of each of these sequences: 1/2^n
 Quantity of information associated with the realization of a specific sequence:
h = −log2(2^(−n)) = n Sh
14
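These computations are easy to check numerically. The following minimal Python sketch (not part of the original slides) evaluates h(x) = −log2 Pr(X = x) in Shannons for the two cases above:

```python
import math

def self_information(p: float) -> float:
    """Quantity of information h(x) = -log2(p), in Shannons (Sh)."""
    return -math.log2(p)

# One equiprobable bit: Pr(X = 0) = Pr(X = 1) = 1/2  ->  1 Sh
print(self_information(0.5))        # 1.0

# A specific sequence of n independent equiprobable bits has probability 2**(-n),
# so its realization carries n Sh of information.
n = 8
print(self_information(2.0 ** -n))  # 8.0
```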
1. Introduction to Information Theory
1.3. Entropy and mutual information
 Example (cont.):
 Two events: X = xi and Y = yj
 Quantity of information of the joint event:
h(xi, yj) = −log2 Pr(X = xi, Y = yj)
Pr(X = xi, Y = yj): the joint probability of the two events
 Quantity of information associated with the realization of the event X = xi conditionally to the event Y = yj:
h(xi|yj) = −log2 Pr(X = xi|Y = yj)

15
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.2. Mutual information
 Mutual information: the quantity of information that the realization of the event Y = yj gives about the event X = xi
• It is the difference between the quantity of information associated with the realization of the event X = xi and the quantity of information associated with the realization of the event X = xi conditionally to the event Y = yj:
i(xi ; yj) = h(xi) − h(xi|yj) = log2 [ Pr(X = xi|Y = yj) / Pr(X = xi) ]
 If the two events are independent:
Pr(X = xi|Y = yj) = Pr(X = xi) ⇒ i(xi ; yj) = 0
 If the event X = xi is equivalent to the event Y = yj:
Pr(X = xi|Y = yj) = 1 ⇒ i(xi ; yj) = h(xi)
16
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.2. Mutual information

 Unlike h(xi), which is always positive, the mutual information i(xi ; yj) can be negative

17
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.3. Entropy and average mutual information
 Source:
 random variable: X
 sample space: AX = {x1, x2, … , xn}
 probabilities: PX = {p1, p2, … , pn}
 Entropy
 The entropy of the source is the average quantity of information associated with the possible realizations of the event X = xi:
H(X) = Σi pi · h(xi) = −Σi pi · log2 pi  [Sh/symbol]

18
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.3. Entropy and average mutual information
 Entropy (cont.)
 H(X) is a measure of the uncertainty on X
 Properties:
0 ≤ H(X) ≤ log2 n
 The entropy is maximum when all the pi are equal (pi = 1/n)
 Example: a source with 2 states x0 and x1, with probabilities p0 and p1 = 1 − p0:
H(X) = −p0 log2 p0 − (1 − p0) log2(1 − p0) = H2(p0)

19
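As a numerical illustration (a sketch added here, not taken from the slides), the entropy of the two-state source can be evaluated for several values of p0; it is maximum, equal to 1 Sh, when the two states are equiprobable:

```python
import math

def entropy(probabilities):
    """H(X) = -sum(pi * log2(pi)) in Sh/symbol; zero-probability outcomes are ignored."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Two-state source x0, x1 with probabilities p0 and p1 = 1 - p0
for p0 in (0.1, 0.3, 0.5, 0.9):
    print(p0, round(entropy([p0, 1 - p0]), 4))
# The entropy peaks at 1 Sh for p0 = 0.5 (equiprobable states).
```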
1. Introduction to Information Theory
1.3. Entropy and mutual information
 Entropy (cont.)
 Two random variables: X, Y
AX = {x1, x2, … , xn}; AY = {y1, y2, … , ym}
 Joint entropy:
H(X, Y) = −Σi Σj Pr(X = xi, Y = yj) · log2 Pr(X = xi, Y = yj)
 Conditional entropy:
H(X|Y) = −Σi Σj Pr(X = xi, Y = yj) · log2 Pr(X = xi|Y = yj)

20
1. Introduction to Information Theory
1.3. Entropy and mutual information
 Entropy (cont.)
 Average mutual information:
I(X; Y) = Σi Σj Pr(X = xi, Y = yj) · log2 [ Pr(X = xi, Y = yj) / (Pr(X = xi) · Pr(Y = yj)) ]
 Relations between the entropies and the average mutual information:
I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y)

21
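The relations above can be checked on a small example. The 2x2 joint distribution below is an arbitrary choice made for this sketch (it does not come from the slides):

```python
import math

def H(probs):
    """Entropy of a list of probabilities, in Sh."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Arbitrary joint distribution Pr(X = xi, Y = yj) for two binary random variables
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

H_xy = H(p_xy.values())                 # joint entropy H(X,Y)
H_x, H_y = H(p_x.values()), H(p_y.values())
H_x_given_y = H_xy - H_y                # H(X|Y) = H(X,Y) - H(Y)
I_xy = H_x - H_x_given_y                # I(X;Y) = H(X) - H(X|Y)

# Same mutual information through the symmetric relation:
assert abs(I_xy - (H_x + H_y - H_xy)) < 1e-12
print(round(H_x, 4), round(H_y, 4), round(H_xy, 4), round(I_xy, 4))
```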
1. Introduction to Information Theory
1.3. Entropy and mutual information
1.3.4. Differential entropy
 Continuous random variable: X
 Probability density: p(x)
 Differential entropy:
HD(X) = −∫ p(x) · log2 p(x) dx
⇒ HD(X) measures the quantity of information of X.

22
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.1. Introduction
Source coding is divided into two families: lossless source coding and lossy source coding.
Lossless source coding is also called entropy coding: it represents the digital sequence delivered by the source with the shortest possible sequence of symbols, while keeping the ability to reconstruct it exactly with the source decoder.

23
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.2. Entropy and source redundancy
 Source: discrete, stationary
 Output symbols: Q-ary symbols
 Output: random variable X
 Entropy: H(X)
average quantity of information per symbol at the output of the source
 If the source is memoryless (de-correlated output symbols), its maximum entropy is:
HMAX = log2 Q
 If the source has memory, the entropy per symbol is obtained as the limit of HJ(X)/J when J → ∞, where
HJ(X): entropy per group of J symbols
24


1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.2. Entropy and source redundancy (cont.)
 Redundancy of the source: Rred
• difference between the quantity of information of the source and that of a source with equiprobable (xác suất bằng nhau) symbols
 Rred ranges from 0 to 1 (Rred = 1 when the entropy of the source is zero)

25
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.3. Fundamental theorem of source coding
THEOREM 1.1. (Shannon):
Let ε > 0. For any stationary source with entropy per symbol H(X), there is a binary source coding method that associates with each message x of length N a binary word of average length N·Rmoy such that:
Rmoy ≤ H(X) + ε + ε·log2|AX| + 2/N = H(X) + εr
Rmoy: rate, or average number of bits per realization
We can therefore associate, on average, N·H(X) bits with each message x.

26
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.4. Lossless source coding
 Introduction
 The output of the source coder consists of bits (q = 2)
 The source coding should satisfy the two following criteria:
– unique coding: each message should be coded with a different word;
– unique decoding: each word should be distinguishable without ambiguity. This criterion can be obtained using:
- coding with fixed-length words,
- coding using a distinguishable separating symbol (the Morse system, for example),
- coding with words of variable length.
27
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.4. Lossless source coding
 Variable length coding
Example: a discrete source generates 4 different messages a1, a2, a3, a4 with respective probabilities Pr(a1) = 1/2, Pr(a2) = 1/4, Pr(a3) = Pr(a4) = 1/8.
Variable length code

28
1. Introduction to Information Theory
1.4. Lossless source coding theorems
1.4.4. Lossless source coding
 Variable length coding
Example 1: a discrete source generates 4 different messages a1, a2, a3, a4 with respective probabilities Pr(a1) = 1/2, Pr(a2) = 1/4, Pr(a3) = Pr(a4) = 1/8.
Variable length code
- Satisfies the criterion of unique coding
- Does not allow a unique decoding
- Ex: a1, a2, a1 is encoded as 1001
- At the receiver, 1001 can be decoded as a1, a2, a1 or as a4, a3
⇒ unusable

29
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Variable length coding (cont.)
Example 2:
- Satisfies both unique coding and unique decoding
- Not instantaneous (tức thời)
- Ex: a3 is the beginning of a4
- After the sequence 11, the decoder must determine the parity of the number of zeros before it can decode
⇒ decoding is more complex

30
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Variable length coding (cont.)
Example 3:

- Satisfies both unique coding and unique decoding


- Instantaneous (tức thời)

Tree associated with the source code


31
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Kraft inequality
THEOREM 1.2.
An instantaneous code composed of Q binary words of lengths {n1, n2, … , nQ}, respectively, with n1 ≤ n2 ≤ … ≤ nQ, should satisfy the following inequality:
Σ (i = 1 to Q) 2^(−ni) ≤ 1
32
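A quick numerical check of the Kraft inequality and of the resulting average length. The code words below are an assumed instantaneous code for the four-message source of the earlier examples (probabilities 1/2, 1/4, 1/8, 1/8); they are not necessarily the exact words of the slides, whose table did not survive the conversion:

```python
import math

# Assumed instantaneous (prefix) code for the messages a1..a4
code = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}
prob = {"a1": 1/2, "a2": 1/4, "a3": 1/8, "a4": 1/8}

# Kraft inequality: the sum of 2**(-ni) over the code words must not exceed 1
kraft = sum(2.0 ** -len(word) for word in code.values())
print("Kraft sum:", kraft)               # 1.0 -> inequality satisfied

# Average length Rmoy equals the entropy H(X) here (all probabilities are powers of 2)
r_moy = sum(prob[m] * len(word) for m, word in code.items())
h_x = -sum(p * math.log2(p) for p in prob.values())
print("Rmoy =", r_moy, " H(X) =", h_x)   # both equal to 1.75
```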
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Fundamental theorem of source coding
 Memoryless source with entropy per symbol H(X)
 It is possible to build an instantaneous code whose average word length Rmoy satisfies the following inequality:
H(X) ≤ Rmoy < H(X) + 1
 Fundamental theorem of source coding
THEOREM 1.3: For any stationary source with entropy per symbol H(X), there is a source coder that can encode the messages into binary words with an average length Rmoy as close as possible to H(X).

33
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Entropy rate
 X: stationary and discrete source
 Entropy per symbol: H(X) [Sh/symbol]
 Symbol rate: DS symbols per second
 Entropy rate: DI = DS · H(X) [Sh/s]
 Binary data rate at the output of the source encoder: D'B = DS · Rmoy [bit/s]
 Entropy per bit at the output of the binary source encoder: H'(X)

34
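A small numerical illustration of these relations (the figures are assumed for this sketch, not taken from the slides): a source emitting DS symbols per second with entropy H(X) delivers DI = DS · H(X) Shannons per second, and the bit rate D'B = DS · Rmoy at the encoder output can never be below DI:

```python
import math

# Assumed 4-ary source with the probabilities of the earlier variable-length example
probs = [0.5, 0.25, 0.125, 0.125]
H_X = -sum(p * math.log2(p) for p in probs)   # 1.75 Sh/symbol

D_S = 1000                 # symbols per second (assumed)
D_I = D_S * H_X            # entropy rate in Sh/s
R_moy = 1.75               # average bits/symbol of an optimal code for this source
D_B = D_S * R_moy          # output bit rate of the source encoder in bit/s

print(D_I, D_B, D_B >= D_I)   # 1750.0 1750.0 True
```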
1. Introduction to Information Theory
1.4. Lossless source coding theorems
 Entropy rate (cont.)
 From the fundamental theorem of source coding:
Rmoy ≥ H(X) ⇒ DS · Rmoy ≥ DS · H(X) ⇒ D'B ≥ DI
• In case of equality, one bit carries a quantity of information of one Shannon
• If the redundancy of the output sequence is not zero, then one bit carries less than one Shannon.

35
1. Introduction to Information Theory
1.5. Theorem for lossy source coding
1.5.1. Definitions
Block diagram of the coder-decoder (figure): the encoder associates with the sequence x of N source samples a binary word of NR bits; the decoder delivers the reconstructed sequence x̂

 Binary word: NR bits, i.e. R bits per dimension
 The source decoder maps each of the 2^(NR) possible binary words to a sequence x̂ ∈ X̂^N
 x̂: quantized or estimated sequence

36
1. Introduction to Information Theory
1.5. Theorem for lossy source coding
1.5.1. Definitions
DEFINITION 1.1. The distortion per dimension between the sequences x and x̂ of dimension N:
(1/N) · ‖x − x̂‖²
DEFINITION 1.2. The average distortion per dimension of the coder-decoder:
DN = (1/N) ∫X ‖x − x̂‖² · f(x) dx

37
1. Introduction to Information Theory
1.5. Theorem for lossy source coding
1.5.1. Definitions (cont.)
DEFINITION 1.3. A pair (R, D) is said to be achievable if there is a coder-decoder such that:
lim (N→∞) DN ≤ D
DEFINITION 1.4. For a given memoryless source, the rate-distortion function R(D) is:
R(D) = min over p(x̂|x) of I(X; X̂) subject to E[(X − X̂)²] ≤ D

38
1. Introduction to Information Theory
1.5. Theorem for lossy source coding
1.5.2. Lossy source coding theorem
THEOREM 1.4. The minimum number of bits per dimension R needed to describe a sequence of real samples with a given average distortion D must be higher than or equal to R(D).

If the source is Gaussian, the distortion-rate function is:
D(R) = σx² · 2^(−2R)
σx²: source variance
D(R): distortion-rate function


39
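The Gaussian distortion-rate function can be tabulated with a few lines of Python; this is only a sketch, with an assumed unit-variance source:

```python
# Distortion-rate function of a memoryless Gaussian source: D(R) = sigma_x**2 * 2**(-2R)
sigma2 = 1.0                               # assumed source variance

for r in range(6):                         # rate R in bits per dimension
    d = sigma2 * 2.0 ** (-2 * r)
    print(f"R = {r} bit/dim -> D(R) = {d}")
# Each additional bit per dimension divides the minimum achievable distortion by 4,
# i.e. improves the signal-to-distortion ratio by about 6 dB.
```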
1. Introduction to Information Theory
1.6. Transmission channel models
1.6.1. Binary symmetric channel
 The input and the output of the channel are binary
 The channel is described by a single parameter p
 The labels on the branches are the conditional probabilities Pr(Y = y|X = x)
 Characteristics:

40
1. Introduction to Information Theory
1.6. Transmission channel models
1.6.1. Binary symmetric channel
 The input and the output of the channel are binary
 The channel is described by a single parameter:
- p: inversion probability
 The labels on the branches are the conditional probabilities Pr(Y = y|X = x)
 Characteristics:
- A priori probabilities (xác suất tiên nghiệm):
Pr(X = 0) = q
Pr(X = 1) = 1 − q
41
1. Introduction to Information Theory
1.6. Transmission channel models
1.6.1. Binary symmetric channel (cont.)
 The binary symmetric channel is memoryless: for x = [x0, x1, … , xn−1] and y = [y0, y1, … , yn−1],
Pr(Y = y|X = x) = Πi Pr(Y = yi|X = xi)
 Conditional entropy:
H(Y|X) = −p log2 p − (1 − p) log2(1 − p) = H2(p)
42
1. Introduction to Information Theory
1.6. Transmission channel models
1.6.2. Binary erasure channel
 Some bits can be lost or erased
 Compared to the binary symmetric channel, we add an event Y = ε corresponding to the case where a transmitted bit has been erased
 Characteristics:
 p: erasure probability
 Diagram (figure): each transmitted bit is received correctly with probability 1 − p and erased (Y = ε) with probability p

43
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
Problem:
 Transmit equiprobable bits: Pr(X = 0) = Pr(X = 1) = 1/2
 Binary rate: 1000 bits per second
 Binary symmetric channel with p = 0.1
What is the maximum information rate that can be transmitted? (A numerical answer is given after the capacity of the binary symmetric channel is derived below.)

44
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.1. Capacity of a transmission channel
DEFINITION: capacity of a transmission channel
C = max I(X; Y), the maximum being taken over the input probability distribution
 The capacity is the maximum of the average mutual information
 [C] = Shannon/symbol (or Shannon/second, see below)
 Capacity per unit of time:
C' = C × DS ; DS: symbol rate of the source
 Noiseless channel:
C = HMAX(X) = log2 Q
 Noisy channel:
C < HMAX(X)
To compute the capacity of a transmission channel, we calculate the average quantity of information that is lost in the channel

45
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.1. Capacity of a transmission channel (cont.)
 H(X|Y): measure of the residual uncertainty on X knowing Y
 Good transmission: H(X|Y) is zero or negligible (không đáng kể)
 H(X|Y): average quantity of information lost in the channel
 Noiseless channel:
H(X|Y) = H(X|X) = 0 ⇒ C = HMAX(X)
 Very noisy channel (X and Y independent):
H(X|Y) = H(X) ⇒ C = 0

Case C = HMAX(X)

Case C = 0
46
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.1. Capacity of a transmission channel (cont.)
 Communication system with channel coding:
 Average mutual information:
I(U; V) = H(U) − H(U|V)
 Channel coding makes H(U|V) as low as desired
 H(U) < H(X) (because of the redundancy added by the channel coding)

47
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.2. Fundamental theorem of channel coding
 Channel coding allows an error rate as low as desired
 The average quantity of information entering the channel coder – channel – channel decoder block must be less than the capacity C of the channel:
H(U) < C
 C = max I(X; Y): highest number of information bits that can be transmitted through the channel with an error rate as low as desired

48
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.3. Capacity of the binary symmetric channel
 Average mutual information:
I(X; Y) = H(Y) − H(Y|X) = H(Y) − H2(p)
 H2(p) is independent of q ⇒ I(X;Y) is maximum when H(Y) is maximum, i.e. Pr(Y = 0) = Pr(Y = 1) = 1/2, which is obtained for Pr(X = 0) = Pr(X = 1) = q = 1/2
 Capacity of the binary symmetric channel:
C = 1 − H2(p)
49
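With C = 1 − H2(p), the problem stated in section 1.7 (1000 equiprobable bits per second over a binary symmetric channel with p = 0.1) can be answered numerically. The figures printed below are my own evaluation (a sketch), not values given on the slides:

```python
import math

def h2(p: float) -> float:
    """Binary entropy function H2(p), in Sh."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.1                       # inversion probability of the binary symmetric channel
capacity = 1.0 - h2(p)        # capacity in Sh per channel use (per transmitted bit)
bit_rate = 1000               # transmitted bits per second

print("C =", round(capacity, 4), "Sh/bit")                              # about 0.531
print("Max information rate =", round(capacity * bit_rate, 1), "Sh/s")  # about 531
```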
1. Introduction to Information Theory
1.7. Capacity of a transmission channel
1.7.4. Capacity of erasure channel
 Conditional entropy: H(X|Y) = p·H2(q)
 Average mutual information:
I(X; Y) = H(X) − H(X|Y) = H2(q) − p·H2(q) = (1 − p)·H2(q)
 q = 0.5 ⇒ I(X;Y) is maximum, with H2(q) = 1
 Capacity of the channel:
C = 1 − p

50
2. Source Coding
2.1. Introduction
 Source coding: lossless source coding and lossy source coding
 Lossless source coding (entropy coding): produce the shortest sequence of symbols (bits) that allows perfect reconstruction by the decoder
 Lossy source coding: minimize a fidelity criterion under a constraint on the binary rate
 Implementations of lossless source coding: Huffman's algorithm, Lempel–Ziv coding

51
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.1. Run length coding
 Run length coding (RLC): exploits the repetition of consecutive symbols
 Suited to source sequences containing many identical successive symbols
 The sequence is described by couples (number of identical consecutive symbols, symbol); see the sketch below
 Example:
 Refinement: add a prefix only when the number of identical consecutive symbols is higher than 1
 An additional symbol can then be added to indicate the position of the repeated symbols
52
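A minimal run-length coding sketch. The input string below is an assumed example chosen for illustration, since the slide's own example did not survive the conversion:

```python
from itertools import groupby

def run_length_encode(symbols: str):
    """Return the couples (number of identical consecutive symbols, symbol)."""
    return [(len(list(group)), symbol) for symbol, group in groupby(symbols)]

# Assumed example sequence containing many identical successive symbols
sequence = "aaaabbaaaaacccb"
print(run_length_encode(sequence))
# [(4, 'a'), (2, 'b'), (5, 'a'), (3, 'c'), (1, 'b')]
```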
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.2. Huffman’s algorithm
 Huffman (1952): variable length source coding algorithm
 The source delivers L messages
 It achieves the minimum average number of bits per word among uniquely decodable and instantaneous codes
 Algorithm: list the L messages from top to bottom in decreasing probability order (each message is associated with a node), then:
1) Choose the two nodes with the lowest probabilities.
2) Connect these two nodes together: the upper branch is labeled with 0, while the lower branch is labeled with 1.
3) Sum the two probabilities and associate the result with the new node.
4) Suppress the two nodes/messages previously selected and return to step 1.
53
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.2. Huffman’s algorithm (cont.)
 Example:
A discrete source composed of eight messages a1, a2, a3, a4, a5, a6, a7, a8, with the associated probabilities {0.16; 0.15; 0.01; 0.05; 0.26; 0.11; 0.14; 0.12}.
- Entropy of this source: H(X) = 2.7358 Sh.
- Huffman's coding (see the sketch below):

54
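A compact Python sketch of Huffman's algorithm applied to the eight messages of this example. Because ties between equal probabilities can be broken differently, the individual code words may differ from those of the slide's tree, but the average length is the same:

```python
import heapq, math

probs = {"a1": 0.16, "a2": 0.15, "a3": 0.01, "a4": 0.05,
         "a5": 0.26, "a6": 0.11, "a7": 0.14, "a8": 0.12}

# Heap items: (probability, tie-breaker, {message: partial code word})
heap = [(p, i, {m: ""}) for i, (m, p) in enumerate(probs.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    p_low, _, c_low = heapq.heappop(heap)   # lowest probability -> branch labeled 1
    p_up, _, c_up = heapq.heappop(heap)     # next lowest        -> branch labeled 0
    merged = {m: "1" + w for m, w in c_low.items()}
    merged.update({m: "0" + w for m, w in c_up.items()})
    heapq.heappush(heap, (p_low + p_up, counter, merged))
    counter += 1

code = heap[0][2]
r_moy = sum(probs[m] * len(w) for m, w in code.items())
h_x = -sum(p * math.log2(p) for p in probs.values())
print(code)
print("Rmoy =", round(r_moy, 2), "bits >= H(X) =", round(h_x, 4), "Sh")  # 2.8 >= 2.7358
```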
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.2. Huffman’s algorithm (cont.)
 Example:
- Huffman's encoding table:
- Notes:
 Huffman's algorithm is an optimal source coding; it reaches the entropy exactly when the message probabilities are of the form 2^(−m) (1/2, 1/4, …)
 When the successive symbols are correlated, many symbols can be grouped together to constitute the messages ⇒ higher complexity
- Used in image and audio compression (JPEG, MP3, …)

55
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.3. Arithmetic coding
 Rissanen (1976), Pasco (1976)
 Source coding without any a priori knowledge of the statistics of the source (memoryless or with memory)
 Principle: associate with each binary sequence an interval on the segment [0; 1[
 Example:
0111 → [0.0111; 0.1000[ in binary, i.e. [0.4375; 0.5[ in decimal
 The longer the sequence, the smaller the associated interval

56
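The interval associated with a binary expansion can be computed in a couple of lines; the sketch below reproduces the slide's example:

```python
def dyadic_interval(bits: str):
    """Interval [low, high[ of the reals whose binary expansion starts with `bits`."""
    low = int(bits, 2) / 2 ** len(bits)
    return low, low + 2 ** -len(bits)

print(dyadic_interval("0111"))   # (0.4375, 0.5), i.e. [0.0111; 0.1000[ in binary
print(dyadic_interval("01110"))  # a longer sequence -> a smaller interval
```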
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.4. Algorithm LZ78
 Lempel and Ziv (1978)
 The algorithm uses a dictionary
 Each dictionary entry is a pair composed of a pointer (index) to a previous element of the dictionary and a symbol
 Each element of the dictionary is thus related to a string of symbols

57
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.4. Algorithm LZ78 (cont.)
 Example:
• Binary sequence:
001000001100010010000010110001000001000011
• Starting from the left, we first find the shortest string that has not been found yet:
0,01000001100010010000010110001000001000011
• The second string different from 0 is 01:
0,01,000001100010010000010110001000001000011
• The third string different from 0 and 01 is 00:
0,01,00,0001100010010000010110001000001000011
• Finally, the sequence can be decomposed as follows:
0, 01, 00, 000, 1, 10, 001, 0010, 0000, 101, 100, 010, 00001, 000011
58
2. Source Coding
2.2. Algorithms for lossless source coding
2.2.4. Algorithm LZ78 (cont.)
 Example:
• Finally, the sequence can be decomposed as follows:
0, 01, 00, 000, 1, 10, 001, 0010, 0000, 101, 100, 010, 00001, 000011
• Dictionary of strings

• Line 1: index of the strings


• Line 2: strings
• Line 3: pair index-symbol → encoding
• Ex: 0010 → 7-0 (001 → index 7)
• Encoded sequence:
0-0, 1-1, 1-0, 3-0, 0-1, 5-0, 3-1, 7-0, 4-0, 6-1, 6-0, 2-0, 9-1, 13-1
59
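A minimal LZ78 parsing sketch that reproduces the decomposition into strings and the index–symbol pairs above (the final binary re-encoding of the indices, discussed on the next slide, is left out):

```python
def lz78_encode(sequence: str):
    """Return the parsed strings and the (index, symbol) pairs of the LZ78 dictionary."""
    dictionary = {"": 0}            # index 0 stands for the empty string
    strings, pairs = [], []
    prefix = ""
    for symbol in sequence:
        if prefix + symbol in dictionary:
            prefix += symbol        # keep extending until the string is new
        else:
            dictionary[prefix + symbol] = len(dictionary)
            strings.append(prefix + symbol)
            pairs.append((dictionary[prefix], symbol))
            prefix = ""
    return strings, pairs

seq = "001000001100010010000010110001000001000011"
strings, pairs = lz78_encode(seq)
print(strings)  # ['0', '01', '00', '000', '1', '10', '001', '0010', '0000',
                #  '101', '100', '010', '00001', '000011']
print(pairs)    # [(0, '0'), (1, '1'), (1, '0'), (3, '0'), (0, '1'), (5, '0'), (3, '1'),
                #  (7, '0'), (4, '0'), (6, '1'), (6, '0'), (2, '0'), (9, '1'), (13, '1')]
```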
2. Source Coding
2.2. Algorithms for lossless source coding
 Example:
• Encoded sequence:
0-0, 1-1, 1-0, 3-0, 0-1, 5-0, 3-1, 7-0, 4-0, 6-1, 6-0, 2-0, 9-1, 13-1
 Tree associated with the strings memorized in the dictionary
• Node: string [adding a 0 or a 1 (label on the branch) to a previous
string]
 Binary encoded sequence:
0-0, 1-1, 01-0, 11-0, 000-1, 101-0, 011-1,
111-0, 0100-0, 0110-1, 0110-0, 0010-0,
1001-1, 1101-1
 Note:
- 2 strings: index of length 1,
- 2 other strings: index of length 2,
- 2² strings: index of length 3,
- 2³ strings: index of length 4, etc.
60
