
Data Compression Basics

 Discrete source
 Information=uncertainty
 Quantification of uncertainty
 Source entropy
 Variable length codes
 Motivation
 Prefix condition
 Huffman coding algorithm

1
Information
 What do we mean by information?
 “A numerical measure of the uncertainty of an experimental outcome” – Webster’s Dictionary
 How can we quantitatively measure and represent information?
 Shannon proposed a probabilistic approach
 Let us first look at how we assess the amount of information in our daily lives using common sense

2
Information = Uncertainty
 Zero information
 Pittsburgh Steelers won Super Bowl XL (past news, no uncertainty)
 Afridi plays for Pakistan (celebrity fact, no uncertainty)
 Little information
 It will be very cold in Lahore tomorrow (not much uncertainty since this is winter time)
 It is going to rain in Malaysia next week (not much uncertainty since it rains nine months a year in South East Asia)
 Large information
 An earthquake is going to hit Indonesia in July 2006 (are you sure? an unlikely event)
 Someone has shown P=NP (Wow! Really? Who did it?)

3
Shannon’s Picture on Communication
(1948)

source → source encoder → channel encoder → channel → channel decoder → source decoder → destination
(the channel encoder, channel, and channel decoder together form the “super-channel”)

The goal of communication is to move information
from here to there and from now to then

Examples of source:
Human speeches, photos, text messages, computer programs …

Examples of channel:
storage media, telephone lines, wireless transmission …
4
Source-Channel Separation Principle

The role of channel coding:
 Fight against channel errors for reliable transmission of information
 We simply assume the super-channel achieves error-free transmission

The role of source coding (data compression):
 Facilitate storage and transmission by eliminating source redundancy
 Our goal is to maximally remove the source redundancy by intelligently designing the source encoder/decoder

5
Discrete Source
 A discrete source is characterized by a discrete
random variable X
 Examples
 Coin flipping: P(X=H)=P(X=T)=1/2
 Dice tossing: P(X=k)=1/6, k=1-6
 Playing-card drawing:
P(X=S)=P(X=H)=P(X=D)=P(X=C)=1/4
What is the redundancy with a discrete source?

6
Two Extreme Cases
Case 1: tossing a fair coin → source encoder → channel → source decoder
 P(X=H)=P(X=T)=1/2 (maximum uncertainty)
 Minimum (zero) redundancy, compression impossible

Case 2: tossing a coin with two identical sides (Head or Tail?) → HHHH… or TTTT…, trivially duplicated over the channel
 P(X=H)=1, P(X=T)=0 (minimum uncertainty)
 Maximum redundancy, compression trivial (1 bit is enough)

Redundancy is the opposite of uncertainty

7
Quantifying Uncertainty of an Event

Self-information

 I(p) = -log2(p), where p is the probability of the event x (e.g., x can be X=H or X=T)

 p      I(p)     notes
 1      0        must happen (no uncertainty)
 0      ∞        unlikely to happen (infinite amount of uncertainty)

Intuitively, I(p) measures the amount of uncertainty associated with the event x
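As a quick illustration (not part of the original slides; self_information is just a placeholder name), this formula is a one-liner in Python:

import math

def self_information(p):
    # Self-information I(p) = -log2(p) of an event with probability p, 0 < p <= 1.
    return -math.log2(p)

print(self_information(1.0))    # 0.0 bits   -> the event must happen, no uncertainty
print(self_information(0.5))    # 1.0 bit    -> e.g., X=H for a fair coin
print(self_information(0.001))  # ~9.97 bits -> an unlikely event carries a lot of information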

8
Weighted Self-information

 Iw(p) = p · I(p) = -p · log2(p)

 p      I(p)    Iw(p)
 0      ∞       0
 1/2    1       1/2
 1      0       0

As p goes from 0 to 1, the weighted self-information Iw(p) = -p·log2(p) first increases and then decreases

Question: Which value of p maximizes Iw(p)?

9
Maximum of Weighted Self-information*

The maximum is attained at p = 1/e, where

 Iw(1/e) = 1/(e·ln 2) ≈ 0.53 bits
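A quick derivation of this maximum (not shown on the slide): writing Iw(p) = -p·ln(p)/ln 2, the derivative is dIw/dp = -(ln(p) + 1)/ln 2, which vanishes at ln(p) = -1, i.e., p = 1/e; substituting back gives Iw(1/e) = 1/(e·ln 2) ≈ 0.53 bits.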

10
Quantification of Uncertainty of a Discrete Source

 A discrete source (random variable) is a collection (set) of individual events whose probabilities sum to 1

 X is a discrete random variable, x ∈ {1, 2, …, N}
 pi = prob(x = i), i = 1, 2, …, N, with Σ_{i=1..N} pi = 1

 To quantify the uncertainty of a discrete source, we simply take the summation of weighted self-information over the whole set
11
Shannon’s Source Entropy Formula
 H(X) = Σ_{i=1..N} Iw(pi) = -Σ_{i=1..N} pi · log2(pi)   (bits/sample, or bps)

The probabilities pi act as the weighting coefficients
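As a rough sketch (not from the original slides; entropy is a hypothetical helper name reused in later examples), the formula translates directly into Python:

import math

def entropy(probs):
    # Source entropy H(X) = -sum_i pi*log2(pi), in bits per sample (bps).
    # probs: event probabilities summing to 1; zero-probability events contribute nothing.
    return sum(-p * math.log2(p) for p in probs if p > 0)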

12
Source Entropy Examples

 Example 1: (binary Bernoulli source)
 Flipping a coin with probability of head being p (0<p<1)

 p = prob(x = 0), q = 1 - p = prob(x = 1)

 H(X) = -(p·log2(p) + q·log2(q))

 Check the two extreme cases:
 As p goes to zero, H(X) goes to 0 bps → compression gains the most
 As p goes to one half, H(X) goes to 1 bps → no compression can help
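These two extremes can be checked numerically with the hypothetical entropy() helper sketched earlier:

print(entropy([0.5, 0.5]))    # 1.0 bps   -> maximum uncertainty, no compression possible
print(entropy([0.99, 0.01]))  # ~0.08 bps -> highly skewed source, large compression gain
print(entropy([1.0, 0.0]))    # 0.0 bps   -> deterministic source, nothing to transmit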

13
Entropy of Binary Bernoulli Source

[Figure: plot of the binary entropy function H(p) versus p, rising from 0 at p = 0 to its peak of 1 bps at p = 1/2 and falling back to 0 at p = 1]
14
Source Entropy Examples
 Example 2: (4-way random walk over the directions N, S, E, W)

 prob(x = S) = 1/2, prob(x = N) = 1/4
 prob(x = E) = prob(x = W) = 1/8

 H(X) = -(1/2·log2(1/2) + 1/4·log2(1/4) + 1/8·log2(1/8) + 1/8·log2(1/8)) = 1.75 bps
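A quick numerical check with the same hypothetical entropy() helper:

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bps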

15
Source Entropy Examples (Con’t)

 Example 3: (source with geometric distribution)

 A jar contains the same number of balls of two different colors: blue and red. Each time, a ball is randomly picked out of the jar and then put back. Consider the event that the k-th pick is the first time a red ball is seen – what is the probability of such an event?

 p = prob(x = red) = 1/2, 1 - p = prob(x = blue) = 1/2

 Prob(event) = Prob(blue in the first k-1 picks) · Prob(red in the k-th pick)
             = (1/2)^(k-1) · (1/2) = (1/2)^k

16
Source Entropy Calculation
If we consider all possible events, the sum of their probabilities will be one.

 Check: Σ_{k=1..∞} (1/2)^k = 1

 Then we can define a discrete random variable X with P(x = k) = (1/2)^k

 Entropy:
 H(X) = -Σ_{k=1..∞} pk · log2(pk) = Σ_{k=1..∞} k · (1/2)^k = 2 bps

17
Properties of Source Entropy
 Nonnegative and concave
 Achieves its maximum when the source has a uniform distribution (i.e., P(x=k)=1/N, k=1-N)
 Goes to zero (its minimum) as the source becomes more and more skewed (i.e., P(x=k) → 1, P(x≠k) → 0)

18
What is the use of H(X)?

Shannon’s first theorem (noiseless coding theorem)

 For a memoryless discrete source X, its entropy H(X) defines the minimum average code length required to noiselessly code the source.

Notes:
1. Memoryless means that the events are independently generated (e.g., the outcomes of flipping a coin N times are independent events)
2. Source redundancy can then be understood as the difference between the raw data rate and the source entropy
19
Code Redundancy*
Practical performance vs. theoretical bound:

 r = l - H(X) ≥ 0

 Average code length: l = Σ_{i=1..N} pi · li, where li is the length of the codeword assigned to the i-th symbol

 H(X) = Σ_{i=1..N} pi · log2(1/pi)

Note: if we represent each symbol by q bits (fixed-length codes), then the redundancy is simply q - H(X) bps
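A minimal sketch of these quantities in Python, reusing the hypothetical entropy() helper from earlier and the 4-way random walk source as an example:

def average_length(probs, lengths):
    # Average code length l = sum_i pi*li, in bits per symbol.
    return sum(p * l for p, l in zip(probs, lengths))

probs   = [0.5, 0.25, 0.125, 0.125]  # 4-way random walk source
lengths = [2, 2, 2, 2]               # fixed-length (q = 2 bits) code
print(average_length(probs, lengths) - entropy(probs))  # redundancy r = q - H(X) = 0.25 bps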
20
How to achieve source entropy?

 discrete source X (with known P(X)) → entropy coding → binary bit stream

Note: The above entropy coding problem is based on the simplifying assumptions that the discrete source X is memoryless and that P(X) is completely known. These assumptions often do not hold for real-world data such as images, and we will revisit them later.

21
Data Compression Basics
 Discrete source
 Information=uncertainty
 Quantification of uncertainty
 Source entropy
 Variable length codes
 Motivation
 Prefix condition
 Huffman coding algorithm

22
Variable Length Codes (VLC)
Recall:
Self-information I(p) = -log2(p)

 It follows from the above formula that a small-probability event carries a lot of information and is therefore worth representing with many bits. Conversely, if some event occurs frequently, it is probably a good idea to use as few bits as possible to represent it. This observation leads to the idea of varying the code lengths based on the events’ probabilities.

 Assign a long codeword to an event with small probability
 Assign a short codeword to an event with large probability

23
4-way Random Walk Example
 symbol k   pk      fixed-length codeword   variable-length codeword
 S          0.5     00                      0
 N          0.25    01                      10
 E          0.125   10                      110
 W          0.125   11                      111

 symbol stream: SSNWSENNWSSSNESS (16 symbols)

 fixed length:    00 00 01 11 00 10 01 01 11 00 00 00 01 10 00 00  →  32 bits
 variable length: 0 0 10 111 0 110 10 10 111 0 0 0 10 110 0 0      →  28 bits

 4 bits of savings achieved by VLC (redundancy eliminated)
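A small Python sketch (the codebooks are taken from the table above) reproduces these bit counts:

fixed    = {'S': '00', 'N': '01', 'E': '10', 'W': '11'}
variable = {'S': '0',  'N': '10', 'E': '110', 'W': '111'}

stream = 'SSNWSENNWSSSNESS'
fixed_bits    = ''.join(fixed[s] for s in stream)
variable_bits = ''.join(variable[s] for s in stream)
print(len(fixed_bits), len(variable_bits))  # 32 28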

24
Toy Example (Con’t)
• source entropy:
 H(X) = -Σ_{k=1..4} pk · log2(pk) = 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3 = 1.75 bits/symbol

• average code length:
 l = Nb / Ns (bps), where Nb is the total number of bits and Ns the total number of symbols

 fixed-length: l = 2 bps > H(X)        variable-length: l = 1.75 bps = H(X)

25
Problems with VLC
 When codewords have fixed lengths, the boundary of codewords is always identifiable.
 For codewords with variable lengths, their boundary could become ambiguous.

 symbol   VLC
 S        0
 N        1
 E        10
 W        11

 encode:  S S N W S E …  →  0 0 1 11 0 10 …
 decode:  0 0 1 11 0 10 …  →  S S N W S E …   (intended)
          0 0 11 1 0 10 …  →  S S W N S E …   (same bits, different parse)
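The ambiguity can be demonstrated by exhaustively parsing the bit string against this (non-prefix-free) codebook; all_decodings is a made-up helper, a rough sketch rather than anything from the slides:

def all_decodings(bits, codebook):
    # Return every way 'bits' can be split into codewords of the codebook.
    if not bits:
        return [[]]
    results = []
    for symbol, code in codebook.items():
        if bits.startswith(code):
            for rest in all_decodings(bits[len(code):], codebook):
                results.append([symbol] + rest)
    return results

vlc = {'S': '0', 'N': '1', 'E': '10', 'W': '11'}
parses = {''.join(d) for d in all_decodings('00111010', vlc)}  # the bits for S S N W S E
print(len(parses) > 1)                         # True -> the code is not uniquely decodable
print('SSNWSE' in parses, 'SSWNSE' in parses)  # True True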
26
Uniquely Decodable Codes
 To avoid ambiguity in decoding, we need to enforce certain conditions on VLC to make them uniquely decodable
 Since ambiguity arises when some codeword becomes the prefix of another, it is natural to consider the prefix condition

Example: p → pr → pre → pref → prefi → prefix
 (a → b means a is a prefix of b; each string here is a prefix of the next)

27
Prefix condition

No codeword is allowed to be the prefix of any other codeword.

 We will graphically illustrate this condition with the aid of the binary codeword tree

28
Binary Codeword Tree
 root                              # of codewords
 Level 1:  0, 1                    2
 Level 2:  00, 01, 10, 11          2^2
 ……
 Level k:                          2^k

29
Prefix Condition Examples
 symbol x   codeword 1   codeword 2
 S          0            0
 N          1            10
 E          10           110
 W          11           111

 On the binary codeword tree, codeword set 1 uses node 1 as well as its descendants 10 and 11, so it violates the prefix condition; codeword set 2 (0, 10, 110, 111) uses only nodes that are not ancestors of one another, so it satisfies the prefix condition.
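A brute-force check of the prefix condition (is_prefix_free is a hypothetical helper, a sketch only):

def is_prefix_free(codewords):
    # True if no codeword is a prefix of any other codeword.
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

print(is_prefix_free(['0', '1', '10', '11']))     # False: 1 is a prefix of 10 and 11
print(is_prefix_free(['0', '10', '110', '111']))  # True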
30
How to satisfy prefix condition?
 Basic rule: If a node is used as a codeword,
then all its descendants cannot be used as
codeword.
 Example: in the tree for codeword set 2 (0, 10, 110, 111), once 0 is chosen as a codeword none of its descendants (00, 01, …) is used, and once 10 is chosen none of its descendants (100, 101, …) is used.

31
Property of Prefix Codes
 Kraft’s inequality:  Σ_{i=1..N} 2^(-li) ≤ 1

 li: length of the i-th codeword (proof skipped)

 Example:
 symbol x   VLC-1   VLC-2
 S          0       0
 N          1       10
 E          10      110
 W          11      111

 VLC-1: Σ_{i=1..4} 2^(-li) = 1/2 + 1/2 + 1/4 + 1/4 = 3/2 > 1   (violates the inequality, not a prefix code)
 VLC-2: Σ_{i=1..4} 2^(-li) = 1/2 + 1/4 + 1/8 + 1/8 = 1 ≤ 1     (satisfies the inequality)

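Both sums can be verified with a one-line Kraft check (a sketch with a made-up helper name):

def kraft_sum(lengths):
    # Kraft sum: sum_i 2**(-li); a prefix code must satisfy kraft_sum <= 1.
    return sum(2.0 ** -l for l in lengths)

print(kraft_sum([1, 1, 2, 2]))  # 1.5 -> VLC-1 violates Kraft's inequality
print(kraft_sum([1, 2, 3, 3]))  # 1.0 -> VLC-2 satisfies it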
32
Two Goals of VLC design
• achieve the optimal code length (i.e., minimal redundancy)
 For an event x with probability p(x), the optimal code length is ⌈-log2 p(x)⌉, where ⌈x⌉ denotes the smallest integer not less than x (e.g., ⌈3.4⌉ = 4)

 code redundancy: r = l - H(X) ≥ 0

 Unless the probabilities of the events are all powers of 2, we often have r > 0

• satisfy the prefix condition
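For the 4-way random walk source, these target lengths can be computed directly (a quick sketch, not from the slides):

import math

for p in [0.5, 0.25, 0.125, 0.125]:
    print(p, math.ceil(-math.log2(p)))  # optimal lengths 1, 2, 3, 3 -> exactly the VLC used earlier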

33
Golomb Codes for Geometric Distribution
Optimal VLC for the geometric source P(X=k) = (1/2)^k, k = 1, 2, …

 k    codeword
 1    0
 2    10
 3    110
 4    1110
 5    11110
 6    111110
 7    1111110
 8    11111110
 …    ……
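These codewords follow a simple pattern (k-1 ones terminated by a zero, i.e., the unary code, which is what the Golomb code reduces to for this source); a minimal sketch:

def unary_code(k):
    # Codeword for symbol k (k = 1, 2, ...): k-1 ones followed by a terminating zero.
    return '1' * (k - 1) + '0'

for k in range(1, 9):
    print(k, unary_code(k))  # 0, 10, 110, 1110, ...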
34
