Digital Communications
Week 3
Last time, we talked about:
Fundamentals of set theory, Bayes' law and its applications to the binary communication channel, and the fundamentals of channel coding in terms of repetition codes.
We will return to channel coding later.
What is Information Theory?
• C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, 1948.
• Two fundamental questions in
communication theory:
– What is the ultimate limit on data compression?
– What is the ultimate transmission rate of reliable
communication over noisy channels?
• Shannon showed reliable (i.e., error-free)
communication is possible for all rates below
channel capacity (using channel coding).
• Any source can be represented in bits at any
rate above entropy (using source coding).
– These results paved the way for the rise of digital information technology.
What is Information?
• Information: any new knowledge about
something
• How can we measure it?
• Messages reporting outcomes with a high probability of occurrence → not very informative
• Messages reporting outcomes with a low probability of occurrence → more informative
• A small change in the probability of a certain output
should not change the information delivered by that
output by a large amount.
Definition
• The amount of information of a symbol s with
probability p:
I(s) = log(1/p)
• Properties:
– p = 1 → I(s) = 0: a symbol that is certain to occur contains no
information.
– 0 ≤ p ≤ 1 → 0 ≤ I(s) ≤ ∞: the information measure is non-negative and monotonically decreasing in p.
– p = p1 p2 → I(s) = I(p1) + I(p2): information is additive for
statistically independent events.
• In summary: I(s) = log2(1/p) bits
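A quick numerical check of these properties (a minimal sketch; the function name information is ours, not from the slides):

```python
import math

def information(p: float) -> float:
    """I(s) = log2(1/p): information of a symbol with probability p, in bits."""
    return math.log2(1 / p)

print(information(1.0))    # 0.0 -> a certain symbol carries no information
print(information(0.125))  # 3.0 bits -> rarer symbols carry more information
# Additivity for independent events: I(p1*p2) = I(p1) + I(p2)
print(information(0.5 * 0.25), information(0.5) + information(0.25))  # 3.0 3.0
```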
Discrete Memoryless Source
• Suppose we have an information source emitting a sequence of symbols from a finite alphabet:
S = {s1, s2, . . . , sK}
• Memoryless: each symbol is emitted independently of all previous symbols, with fixed probabilities p1, . . . , pK
Source Entropy
• If symbol sk has occurred, then, by definition, we have received
I(sk) = log2(1/pk) = −log2 pk
bits of information.
• Averaging over the alphabet gives the source entropy, the mean information per symbol:
H(S) = Σ pk log2(1/pk) = −Σ pk log2 pk bits/symbol (sum over k = 1, …, K)
Meaning of Entropy
• What information about a source does its entropy give us?
• It is the amount of uncertainty we have about the source output before we receive it.
• It tells us how many bits of information per symbol we
expect to get on the average.
• Source entropy of a source with symbols A, B, C of probabilities 0.7, 0.2, 0.1:
H(S) = −0.7 log2(0.7) − 0.2 log2(0.2) − 0.1 log2(0.1)
= 0.7 × 0.515 + 0.2 × 2.322 + 0.1 × 3.322
≈ 1.157 bits/symbol
• How can we encode these symbols in order to
transmit them?
• We need 2 bits/symbol if encoded as
A = 00, B = 01, C = 10
• Entropy prediction: the average amount of information is
only 1.157 bits per symbol.
• We are wasting bits!
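The slide's arithmetic can be reproduced with a short sketch (the entropy function name is ours):

```python
import math

def entropy(probs):
    """H(S) = -sum(p * log2 p), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs)

print(entropy([0.7, 0.2, 0.1]))  # ~1.157 bits/symbol for the A, B, C source
```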
Average Codeword Length
• lk: the number of bits used to code the k-th symbol
• K: total number of symbols
• pk: the probability of occurrence of symbol k
• Define the average codeword length:
L = Σ pk lk (sum over k = 1, …, K)
• It represents the average number of bits per symbol in the
source alphabet.
• An idea to reduce the average codeword length: symbols that occur often should be encoded with short codewords; symbols that occur rarely may be encoded with longer codewords.
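As an illustration, a hypothetical variable-length assignment for the A, B, C source (the code A=0, B=10, C=11 is our example, not from the slides):

```python
def average_length(probs, lengths):
    """Average codeword length L = sum(p_k * l_k), in bits per symbol."""
    return sum(p * l for p, l in zip(probs, lengths))

# A=0, B=10, C=11 -> lengths 1, 2, 2 for probabilities 0.7, 0.2, 0.1
print(average_length([0.7, 0.2, 0.1], [1, 2, 2]))  # 1.3 bits/symbol, vs 2 fixed-length
```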
Minimum Codeword Length
• What is the minimum codeword length for a particular
alphabet of source symbols?
• In a system with two equally likely symbols:
– Probability of each symbol occurring: p = 1/2
– The best one can do: encode each with 1 bit only, 0 or 1
The General Case
• S = {s1, . . . , sK}: an alphabet
• pk: the probability of symbol sk to occur
• N : the number of symbols generated
• We expect Npk occurrences of sk
• Assume:
– The source is memoryless
– all symbols are independent
– the probability of occurrence of a typical sequence SN (one containing about Npk occurrences of each sk) is:
P(SN) ≈ p1^(N p1) · p2^(N p2) · · · pK^(N pK)
• Taking logarithms, −log2 P(SN) ≈ N × H(S), so describing a typical sequence needs about H(S) bits per symbol
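A small sketch (names ours) illustrating that −(1/N) log2 P(SN) concentrates near H(S) for a long sampled sequence:

```python
import math
import random

def empirical_rate(probs, n=100_000):
    """-(1/N) log2 P(seq) for one random sequence drawn from the source."""
    seq = random.choices(range(len(probs)), weights=probs, k=n)
    return -sum(math.log2(probs[s]) for s in seq) / n

print(empirical_rate([0.7, 0.2, 0.1]))  # close to H(S) ~ 1.157 bits/symbol
```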
Huffman Coding
• How can one design an efficient source coding algorithm?
• Use Huffman coding (among other algorithms)
• It yields the shortest average codeword length among symbol-by-symbol prefix codes.
• Basic idea: choose codeword lengths so that more-probable symbols have shorter codewords
• Huffman code construction (see the code sketch below):
– Sort the source symbols in order of decreasing probability.
– Take the two symbols with the smallest p(xi) and assign each a different bit (i.e., 0 or 1). Then merge them into a single symbol.
– Repeat until only one symbol remains.
• It’s very easy to implement this algorithm in a
microprocessor or computer.
• Used in JPEG, MP3…
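A minimal Python sketch of this construction (the function name huffman_code and the dict-based merging are our choices, not from the slides):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Binary Huffman code for the given probabilities.

    Returns {symbol_index: codeword}. Repeatedly merges the two least
    probable nodes, prepending bit 0 to one branch and 1 to the other.
    """
    tiebreak = count()  # breaks ties so the heap never compares dicts
    heap = [(p, next(tiebreak), {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, branch0 = heapq.heappop(heap)  # least probable node
        p1, _, branch1 = heapq.heappop(heap)  # second least probable node
        merged = {s: "0" + c for s, c in branch0.items()}
        merged.update({s: "1" + c for s, c in branch1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

# Example: three symbols with probabilities 0.5, 0.3, 0.2
print(huffman_code([0.5, 0.3, 0.2]))  # e.g. {0: '0', 1: '11', 2: '10'}; bits may vary
```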
Optimal Codes
(Shortest Average Length)
The procedure of constructing this code is as follows:
1. At each stage, reduce the source by combining the two least probable symbols of the alphabet into a single symbol whose probability is the sum of the two corresponding probabilities. We then have to encode a source alphabet with one less symbol.
2. Repeat the above procedure step by step until you get to a problem
of encoding two symbols. At this stage, one symbol is encoded as 1
and the other as 0.
3. Go back one stage: one of these symbols (the 1 or the 0) came from a merge; split it back into its two constituents, appending 0 to one codeword and 1 to the other.
4. Repeat step 3, splitting one of the symbols into two at each stage, until you reach the original alphabet.
The above procedure is called the Huffman coding procedure.
The following table shows the source reduction process; code construction then assigns bits by walking back through the merges.
Original Source      Reduced (stage 1)    Reduced (stage 2)    Reduced (stage 3)
sj     P(sj)         P(s1j)               P(s2j)               P(s3j)
s1     0.3           0.3                  0.45                 0.55
s2     0.25          0.25                 0.3                  0.45
s3     0.25          0.25                 0.25
s4     0.1           0.2
s5     0.1
The above code is optimal in the sense that no code with a shorter average length exists. Notice that the Huffman coding procedure is not unique, but the average length is unique.
Example: Consider the discrete memoryless information source with alphabet S = {s1, s2, s3, s4, s5} and corresponding probabilities P = {0.4, 0.2, 0.2, 0.1, 0.1}. Following the above procedure, the Huffman code is {1, 01, 000, 0010, 0011}. The average length is
L = 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4) = 2.2 bits/symbol.
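As a check, running the huffman_code sketch from the earlier slide on this source reproduces the average length (individual codewords may differ under ties, but the average is unique):

```python
probs = [0.4, 0.2, 0.2, 0.1, 0.1]
code = huffman_code(probs)  # sketch defined on the Huffman Coding slide
avg = sum(p * len(code[i]) for i, p in enumerate(probs))
print(avg)  # 2.2 bits/symbol
```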
Extended Huffman Codes
To illustrate what we mean by extended Huffman codes, let us consider
the previous example by taking the second extension.
In this case the extended alphabet has 25 symbols:
S2 = {s1s1, s1s2, s1s3, s1s4, s1s5, s2s1, s2s2, s2s3, . . . , s5s5}
The corresponding probabilities are the products of the single-symbol probabilities:
P = {0.16, 0.08, 0.08, 0.04, 0.04, 0.08, 0.04, 0.04, . . . , 0.01}
Again, if we follow the procedure of constructing the Huffman code, we come up with the codewords shown in the table below.
The average length for this code is 4.32 bits per extended symbol, and the entropy is H(S2) = 2 H(S) ≈ 4.2437 bits per extended symbol.
The average length per original source symbol is 4.32/2 = 2.16 bits/symbol, while H(S) = 2.12185 bits/symbol.
Efficiency η = H(S)/L = 2.12185/2.16 ≈ 98.2 %.
The efficiency can be further improved by encoding higher extensions of the source. Of course, the price we pay is the complexity of encoding a larger symbol set.
(Table of probabilities and codewords for the 25 extended symbols omitted.)
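A sketch of this computation, reusing the huffman_code function from before (exact codewords depend on tie-breaking; the reported averages are the slides' values):

```python
from itertools import product

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
# Second extension: all 25 ordered symbol pairs; probabilities multiply
ext_probs = [p1 * p2 for p1, p2 in product(probs, repeat=2)]
code = huffman_code(ext_probs)
avg_ext = sum(p * len(code[i]) for i, p in enumerate(ext_probs))
print(avg_ext)      # ~4.32 bits per extended symbol
print(avg_ext / 2)  # ~2.16 bits per original source symbol
```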
Special Case of Huffman Code
If all the symbols are equally likely and there are 2^M of them, then the binary Huffman code will be a fixed-length code with all codewords having the same length, M. The following example shows this fact.
(Table of the source reduction for 2^M equally likely symbols omitted.)
Again, this is not unique. There are other ways to encode these
symbols.
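For instance, with 2^2 = 4 equally likely symbols the sketch from before returns four distinct 2-bit codewords:

```python
code = huffman_code([0.25] * 4)  # 2**2 equally likely symbols
print(sorted(code.values()))     # ['00', '01', '10', '11'] (bit assignment may vary)
```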
Huffman code with maximum efficiency
When the symbol probabilities are negative powers of two, the efficiency of the Huffman code is maximum. For example, the Huffman code for the source with probabilities
{1/2, 1/4, 1/8, 1/16, 1/16} is
{0, 10, 110, 1111, 1110}, and the average length is 1.875 bits/symbol. The entropy is also
1.875 bits/symbol. Therefore, efficiency η = H(S)/L = 100 %.
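Checking the dyadic case with the same huffman_code sketch:

```python
import math

probs = [1/2, 1/4, 1/8, 1/16, 1/16]
code = huffman_code(probs)  # sketch from the Huffman Coding slide
avg = sum(p * len(code[i]) for i, p in enumerate(probs))
H = -sum(p * math.log2(p) for p in probs)
print(avg, H)  # both 1.875 -> efficiency H(S)/L = 100 %
```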
Application: File Compression