
Data Compression Basics

 Discrete source
 Information=uncertainty
 Quantification of uncertainty
 Source entropy
 Variable length codes
 Motivation
 Prefix condition
 Huffman coding algorithm

1
Information
 What do we mean by information?
 “A numerical measure of the uncertainty of an experimental outcome” – Webster’s Dictionary
 How can we quantitatively measure and represent information?
 Shannon proposed a probabilistic approach
 Let us first look at how we assess the amount of information in our daily lives using common sense

2
Information = Uncertainty
 Zero information
 Pittsburgh Steelers won Super Bowl XL (past news, no uncertainty)
 Afridi plays for Pakistan (celebrity fact, no uncertainty)
 Little information
 It will be very cold in Lahore tomorrow (not much uncertainty since this is winter time)
 It is going to rain in Malaysia next week (not much uncertainty since it rains nine months a year in South East Asia)
 Large information
 An earthquake is going to hit Indonesia in July 2006 (are you sure? an unlikely event)
 Someone has shown P=NP (Wow! Really? Who did it?)

3
Shannon’s Picture on Communication
(1948)

source → source encoder → channel encoder → channel → channel decoder → source decoder → destination
(the channel encoder, channel, and channel decoder together form the “super-channel”)

The goal of communication is to move information
from here to there and from now to then

Examples of source:
Human speeches, photos, text messages, computer programs …

Examples of channel:
storage media, telephone lines, wireless transmission …
4
Source-Channel Separation Principle

The role of channel coding:
 Fight against channel errors for reliable transmission of information
 We simply assume the super-channel achieves error-free transmission

The role of source coding (data compression):
 Facilitate storage and transmission by eliminating source redundancy
 Our goal is to maximally remove the source redundancy by intelligently designing the source encoder/decoder

5
Discrete Source
 A discrete source is characterized by a discrete
random variable X
 Examples
 Coin flipping: P(X=H)=P(X=T)=1/2
 Dice tossing: P(X=k)=1/6, k=1-6
 Playing-card drawing:
P(X=S)=P(X=H)=P(X=D)=P(X=C)=1/4
What is the redundancy with a discrete source?

6
Two Extreme Cases
Case 1: tossing a fair coin → source encoder → channel → source decoder
 P(X=H)=P(X=T)=1/2 (maximum uncertainty)
 Minimum (zero) redundancy, compression impossible

Case 2: tossing a coin with two identical sides (Head or Tail?) → HHHH… or TTTT…, trivially duplicated over the channel
 P(X=H)=1, P(X=T)=0 (minimum uncertainty)
 Maximum redundancy, compression trivial (1 bit is enough)

Redundancy is the opposite of uncertainty

7
Quantifying Uncertainty of an Event

Self-information

 I(p) = -log2(p), where p is the probability of the event x (e.g., x can be X=H or X=T)

 p      I(p)     notes
 1      0        must happen (no uncertainty)
 0      ∞        unlikely to happen (infinite amount of uncertainty)

Intuitively, I(p) measures the amount of uncertainty associated with the event x
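As a quick illustration (not part of the original slides; self_information is just a placeholder name), this formula is a one-liner in Python:

import math

def self_information(p):
    # Self-information I(p) = -log2(p) of an event with probability p, 0 < p <= 1.
    return -math.log2(p)

print(self_information(1.0))    # 0.0 bits   -> the event must happen, no uncertainty
print(self_information(0.5))    # 1.0 bit    -> e.g., X=H for a fair coin
print(self_information(0.001))  # ~9.97 bits -> an unlikely event carries a lot of information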

8
Weighted Self-information

 Iw(p) = p · I(p) = -p · log2(p)

 p      I(p)    Iw(p)
 0      ∞       0
 1/2    1       1/2
 1      0       0

As p goes from 0 to 1, the weighted self-information Iw(p) = -p·log2(p) first increases and then decreases

Question: Which value of p maximizes Iw(p)?

9
Maximum of Weighted Self-information*

The maximum is attained at p = 1/e, where

 Iw(1/e) = 1/(e·ln 2) ≈ 0.53 bits
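A quick derivation of this maximum (not shown on the slide): writing Iw(p) = -p·ln(p)/ln 2, the derivative is dIw/dp = -(ln(p) + 1)/ln 2, which vanishes at ln(p) = -1, i.e., p = 1/e; substituting back gives Iw(1/e) = 1/(e·ln 2) ≈ 0.53 bits.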

10
Quantification of Uncertainty of a Discrete Source

 A discrete source (random variable) is a collection (set) of individual events whose probabilities sum to 1

 X is a discrete random variable, x ∈ {1, 2, …, N}
 pi = prob(x = i), i = 1, 2, …, N, with Σ_{i=1..N} pi = 1

 To quantify the uncertainty of a discrete source, we simply take the summation of weighted self-information over the whole set
11
Shannon’s Source Entropy Formula
 H(X) = Σ_{i=1..N} Iw(pi) = -Σ_{i=1..N} pi · log2(pi)   (bits/sample, or bps)

The probabilities pi act as the weighting coefficients
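As a rough sketch (not from the original slides; entropy is a hypothetical helper name reused in later examples), the formula translates directly into Python:

import math

def entropy(probs):
    # Source entropy H(X) = -sum_i pi*log2(pi), in bits per sample (bps).
    # probs: event probabilities summing to 1; zero-probability events contribute nothing.
    return sum(-p * math.log2(p) for p in probs if p > 0)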

12
Source Entropy Examples

 Example 1: (binary Bernoulli source)
 Flipping a coin with probability of head being p (0<p<1)

 p = prob(x = 0), q = 1 - p = prob(x = 1)

 H(X) = -(p·log2(p) + q·log2(q))

 Check the two extreme cases:
 As p goes to zero, H(X) goes to 0 bps → compression gains the most
 As p goes to one half, H(X) goes to 1 bps → no compression can help
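These two extremes can be checked numerically with the hypothetical entropy() helper sketched earlier:

print(entropy([0.5, 0.5]))    # 1.0 bps   -> maximum uncertainty, no compression possible
print(entropy([0.99, 0.01]))  # ~0.08 bps -> highly skewed source, large compression gain
print(entropy([1.0, 0.0]))    # 0.0 bps   -> deterministic source, nothing to transmit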

13
Entropy of Binary Bernoulli Source

[Figure: plot of the binary entropy function H(p) versus p, rising from 0 at p = 0 to its peak of 1 bps at p = 1/2 and falling back to 0 at p = 1]
14
Source Entropy Examples
 Example 2: (4-way random walk over the directions N, S, E, W)

 prob(x = S) = 1/2, prob(x = N) = 1/4
 prob(x = E) = prob(x = W) = 1/8

 H(X) = -(1/2·log2(1/2) + 1/4·log2(1/4) + 1/8·log2(1/8) + 1/8·log2(1/8)) = 1.75 bps
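A quick numerical check with the same hypothetical entropy() helper:

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bps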

15
Source Entropy Examples (Con’t)

 Example 3: (source with geometric distribution)

 A jar contains the same number of balls of two different colors: blue and red. Each time, a ball is randomly picked out of the jar and then put back. Consider the event that the k-th pick is the first time a red ball is seen – what is the probability of such an event?

 p = prob(x = red) = 1/2, 1 - p = prob(x = blue) = 1/2

 Prob(event) = Prob(blue in the first k-1 picks) · Prob(red in the k-th pick)
             = (1/2)^(k-1) · (1/2) = (1/2)^k

16
Source Entropy Calculation
If we consider all possible events, the sum of their probabilities will be one.

 Check: Σ_{k=1..∞} (1/2)^k = 1

 Then we can define a discrete random variable X with P(x = k) = (1/2)^k

 Entropy:
 H(X) = -Σ_{k=1..∞} pk · log2(pk) = Σ_{k=1..∞} k · (1/2)^k = 2 bps

17
Properties of Source Entropy
 Nonnegative and concave
 Achieves its maximum when the source has a uniform distribution (i.e., P(x=k)=1/N, k=1-N)
 Goes to zero (its minimum) as the source becomes more and more skewed (i.e., P(x=k) → 1, P(x≠k) → 0)

18
What is the use of H(X)?

Shannon’s first theorem (noiseless coding theorem)

 For a memoryless discrete source X, its entropy H(X) defines the minimum average code length required to noiselessly code the source.

Notes:
1. Memoryless means that the events are independently generated (e.g., the outcomes of flipping a coin N times are independent events)
2. Source redundancy can then be understood as the difference between the raw data rate and the source entropy
19
Code Redundancy*
Practical performance vs. theoretical bound:

 r = l - H(X) ≥ 0

 Average code length: l = Σ_{i=1..N} pi · li, where li is the length of the codeword assigned to the i-th symbol

 H(X) = Σ_{i=1..N} pi · log2(1/pi)

Note: if we represent each symbol by q bits (fixed-length codes), then the redundancy is simply q - H(X) bps
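A minimal sketch of these quantities in Python, reusing the hypothetical entropy() helper from earlier and the 4-way random walk source as an example:

def average_length(probs, lengths):
    # Average code length l = sum_i pi*li, in bits per symbol.
    return sum(p * l for p, l in zip(probs, lengths))

probs   = [0.5, 0.25, 0.125, 0.125]  # 4-way random walk source
lengths = [2, 2, 2, 2]               # fixed-length (q = 2 bits) code
print(average_length(probs, lengths) - entropy(probs))  # redundancy r = q - H(X) = 0.25 bps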
20
How to achieve source entropy?

 discrete source X (with known P(X)) → entropy coding → binary bit stream

Note: The above entropy coding problem is based on the simplifying assumptions that the discrete source X is memoryless and that P(X) is completely known. These assumptions often do not hold for real-world data such as images, and we will revisit them later.

21
Data Compression Basics
 Discrete source
 Information=uncertainty
 Quantification of uncertainty
 Source entropy
 Variable length codes
 Motivation
 Prefix condition
 Huffman coding algorithm

22
Variable Length Codes (VLC)
Recall:
Self-information I(p) = -log2(p)

 It follows from the above formula that a small-probability event carries a lot of information and is therefore worth representing with many bits. Conversely, if some event occurs frequently, it is probably a good idea to use as few bits as possible to represent it. This observation leads to the idea of varying the code lengths based on the events’ probabilities.

 Assign a long codeword to an event with small probability
 Assign a short codeword to an event with large probability

23
4-way Random Walk Example
 symbol k   pk      fixed-length codeword   variable-length codeword
 S          0.5     00                      0
 N          0.25    01                      10
 E          0.125   10                      110
 W          0.125   11                      111

 symbol stream: SSNWSENNWSSSNESS (16 symbols)

 fixed length:    00 00 01 11 00 10 01 01 11 00 00 00 01 10 00 00  →  32 bits
 variable length: 0 0 10 111 0 110 10 10 111 0 0 0 10 110 0 0      →  28 bits

 4 bits of savings achieved by VLC (redundancy eliminated)
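A small Python sketch (the codebooks are taken from the table above) reproduces these bit counts:

fixed    = {'S': '00', 'N': '01', 'E': '10', 'W': '11'}
variable = {'S': '0',  'N': '10', 'E': '110', 'W': '111'}

stream = 'SSNWSENNWSSSNESS'
fixed_bits    = ''.join(fixed[s] for s in stream)
variable_bits = ''.join(variable[s] for s in stream)
print(len(fixed_bits), len(variable_bits))  # 32 28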

24
Toy Example (Con’t)
• source entropy:
 H(X) = -Σ_{k=1..4} pk · log2(pk) = 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3 = 1.75 bits/symbol

• average code length:
 l = Nb / Ns (bps), where Nb is the total number of bits and Ns the total number of symbols

 fixed-length: l = 2 bps > H(X)        variable-length: l = 1.75 bps = H(X)

25
Problems with VLC
 When codewords have fixed lengths, the boundary of codewords is always identifiable.
 For codewords with variable lengths, their boundary could become ambiguous.

 symbol   VLC
 S        0
 N        1
 E        10
 W        11

 encode:  S S N W S E …  →  0 0 1 11 0 10 …
 decode:  0 0 1 11 0 10 …  →  S S N W S E …   (intended)
          0 0 11 1 0 10 …  →  S S W N S E …   (same bits, different parse)
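The ambiguity can be demonstrated by exhaustively parsing the bit string against this (non-prefix-free) codebook; all_decodings is a made-up helper, a rough sketch rather than anything from the slides:

def all_decodings(bits, codebook):
    # Return every way 'bits' can be split into codewords of the codebook.
    if not bits:
        return [[]]
    results = []
    for symbol, code in codebook.items():
        if bits.startswith(code):
            for rest in all_decodings(bits[len(code):], codebook):
                results.append([symbol] + rest)
    return results

vlc = {'S': '0', 'N': '1', 'E': '10', 'W': '11'}
parses = {''.join(d) for d in all_decodings('00111010', vlc)}  # the bits for S S N W S E
print(len(parses) > 1)                         # True -> the code is not uniquely decodable
print('SSNWSE' in parses, 'SSWNSE' in parses)  # True True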
26
Uniquely Decodable Codes
 To avoid ambiguity in decoding, we need to enforce certain conditions on VLC to make them uniquely decodable
 Since ambiguity arises when some codeword becomes the prefix of another, it is natural to consider the prefix condition

Example: p → pr → pre → pref → prefi → prefix
 (a → b means a is a prefix of b; each string here is a prefix of the next)

27
Prefix condition

No codeword is allowed to be the prefix of any other codeword.

 We will graphically illustrate this condition with the aid of the binary codeword tree

28
Binary Codeword Tree
 root                              # of codewords
 Level 1:  0, 1                    2
 Level 2:  00, 01, 10, 11          2^2
 ……
 Level k:                          2^k

29
Prefix Condition Examples
 symbol x   codeword 1   codeword 2
 S          0            0
 N          1            10
 E          10           110
 W          11           111

 On the binary codeword tree, codeword set 1 uses node 1 as well as its descendants 10 and 11, so it violates the prefix condition; codeword set 2 (0, 10, 110, 111) uses only nodes that are not ancestors of one another, so it satisfies the prefix condition.
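A brute-force check of the prefix condition (is_prefix_free is a hypothetical helper, a sketch only):

def is_prefix_free(codewords):
    # True if no codeword is a prefix of any other codeword.
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

print(is_prefix_free(['0', '1', '10', '11']))     # False: 1 is a prefix of 10 and 11
print(is_prefix_free(['0', '10', '110', '111']))  # True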
30
How to satisfy prefix condition?
 Basic rule: If a node is used as a codeword,
then all its descendants cannot be used as
codeword.
 Example: in the tree for codeword set 2 (0, 10, 110, 111), once 0 is chosen as a codeword none of its descendants (00, 01, …) is used, and once 10 is chosen none of its descendants (100, 101, …) is used.

31
Property of Prefix Codes
 Kraft’s inequality:  Σ_{i=1..N} 2^(-li) ≤ 1

 li: length of the i-th codeword (proof skipped)

 Example:
 symbol x   VLC-1   VLC-2
 S          0       0
 N          1       10
 E          10      110
 W          11      111

 VLC-1: Σ_{i=1..4} 2^(-li) = 1/2 + 1/2 + 1/4 + 1/4 = 3/2 > 1   (violates the inequality, not a prefix code)
 VLC-2: Σ_{i=1..4} 2^(-li) = 1/2 + 1/4 + 1/8 + 1/8 = 1 ≤ 1     (satisfies the inequality)

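Both sums can be verified with a one-line Kraft check (a sketch with a made-up helper name):

def kraft_sum(lengths):
    # Kraft sum: sum_i 2**(-li); a prefix code must satisfy kraft_sum <= 1.
    return sum(2.0 ** -l for l in lengths)

print(kraft_sum([1, 1, 2, 2]))  # 1.5 -> VLC-1 violates Kraft's inequality
print(kraft_sum([1, 2, 3, 3]))  # 1.0 -> VLC-2 satisfies it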
32
Two Goals of VLC design
• achieve the optimal code length (i.e., minimal redundancy)
 For an event x with probability p(x), the optimal code length is ⌈-log2 p(x)⌉, where ⌈x⌉ denotes the smallest integer not less than x (e.g., ⌈3.4⌉ = 4)

 code redundancy: r = l - H(X) ≥ 0

 Unless the probabilities of the events are all powers of 2, we often have r > 0

• satisfy the prefix condition
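For the 4-way random walk source, these target lengths can be computed directly (a quick sketch, not from the slides):

import math

for p in [0.5, 0.25, 0.125, 0.125]:
    print(p, math.ceil(-math.log2(p)))  # optimal lengths 1, 2, 3, 3 -> exactly the VLC used earlier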

33
Golomb Codes for Geometric Distribution
Optimal VLC for the geometric source P(X=k) = (1/2)^k, k = 1, 2, …

 k    codeword
 1    0
 2    10
 3    110
 4    1110
 5    11110
 6    111110
 7    1111110
 8    11111110
 …    ……
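These codewords follow a simple pattern (k-1 ones terminated by a zero, i.e., the unary code, which is what the Golomb code reduces to for this source); a minimal sketch:

def unary_code(k):
    # Codeword for symbol k (k = 1, 2, ...): k-1 ones followed by a terminating zero.
    return '1' * (k - 1) + '0'

for k in range(1, 9):
    print(k, unary_code(k))  # 0, 10, 110, 1110, ...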
34
