ECM3701 Study Unit 8
1.2 INTRODUCTION
This study unit focuses mainly on the encoding and decoding of digital information. A good
analogy for encoding is to picture the encoder as a person preparing and sending a message.
The person must consider how the message will be received by the audience and make the
necessary preparations so that it reaches the receiver as intended. As human beings, when we
put our thoughts into words we are encoding. After encoding, the message (which
could be a phone call, an email, a text message, a face-to-face meeting, or any other
communication tool) is sent through a ‘medium’. As the message is being transmitted,
anything could happen. It may get hijacked along the way or corrupted. Therefore, the
encoder should be designed to counteract all possible disturbances that may occur during
transmission. Some of the methods to counteract possible disturbances are encryption and
error correction techniques, some of which will be dealt with in this study unit. Upon arrival
of the message, the audience then ‘decodes’, or interprets, the message. In the context of this
analogy, decoding is the process of turning what has been communicated into thoughts. Have
this analogy in mind as you study this unit.
ACTIVITY 1
Download Shannon’s paper from the Internet. The full citation of the paper is ‘Shannon, C.
E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27,
379-423.’ Take your time to read through this classic paper.
ACTIVITY 2
Watch the video clip to learn more about information basics, definition, uncertainty and the
properties of information:
https://www.youtube.com/watch?v=18E3NllObqg&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz (16:57)
1 Sketch the basic block diagram of a communication system.
2 What is the relationship between probability and uncertainty in information theory?
3 Show, by means of calculation, using the probabilities P1 = 1/8 and P2 = 1/4, that the
more probable a message is, the less information it carries.
4 If I1 is the information of message m1 and I2 is the information of message m2, show
that the combined information of two independent messages m1 and m2 is equal to I1 + I2.
5 Show that if there are M = 2^N equally likely messages, then the amount of information
carried by each message will be N bits (a brief derivation is sketched below).
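As a brief sketch of the result in question 5 (a standard derivation, offered here only as a guide): if the M = 2^N messages are equally likely, each has probability P(m) = 1/M, so

$$I_m = \log_2 \frac{1}{P(m)} = \log_2 M = \log_2 2^N = N \ \text{bits}.$$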
1.3.1 Information
It has already been established that information relates to probability and uncertainty about an
event. Let P(message) denote the probability of a message; likewise, P(m) denotes the
probability of m. Table 1 shows messages with their probabilities and information content.
The definition of information content for a symbol m should be such that it monotonically
decreases with increasing message probability, P(m), and it goes to zero for a probability of
unity. Another desirable property is that of additivity. If one were to communicate two
(independent) messages in sequence, the total information content should be equal to the sum
of the individual information contents of the two messages. We know that the total probability
of the composite message is the product of the two individual independent probabilities.
Therefore, the definition of information must be such that when probabilities are multiplied
information is added.
The required properties of an information measure are summarised in Table 1. The logarithm
operation clearly satisfies these requirements. We thus define the information content, Im, of
a message, m, as:
$$I_m = \log \frac{1}{P(m)} = -\log P(m) \qquad (8.1)$$
This definition satisfies the additivity requirement, the monotonicity requirement, and for P(m)
= 1, Im = 0. Note that this is true regardless of the base chosen for the logarithm. Base 2 is
usually chosen, with the resulting quantity of information being measured in bits:

$$I_m = -\log_2 P(m) \ \text{bits} \qquad (8.2)$$
________________________________________________________________
EXAMPLE 1
Consider, initially, vocabularies of equiprobable message symbols represented by fixed length
binary codewords. Thus, a vocabulary size of four symbols is represented by the binary digit
pairs 00, 01, 10, 11. The binary word length and symbol probabilities for other vocabulary sizes
are shown in Table 2. The ASCII vocabulary contains 128 symbols and therefore uses a log2
128 = 7 digit (fixed length) binary codeword to represent each symbol.
Table 2: Word length and symbol probabilities (Glover I. A. & Grant P. M., 2010).
ASCII symbols are not equiprobable, however, and for this particular code each symbol does
not, therefore, have a selection probability of 1/128.
What will be the information content, in bits, of a symbol drawn from equiprobable vocabularies of size 2, 4, 8 and 128?
Solution:
(a) I2 = - log2 P(2) = - log2 (1/2) = 1 bit
(b) I4 = - log2 P(4) = - log2 (1/4) = 2 bits
(c) I8 = - log2 P(8) = - log2 (1/8) = 3 bits
(d) I128 = - log2 P(128) = - log2 (1/128) = 7 bits
________________________________________________________________
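The figures in Example 1 can be checked with a short Python sketch (illustrative only; Python is not prescribed by this study unit, and the function name information_content is our own):

import math

def information_content(p: float) -> float:
    """Information content, in bits, of a symbol with selection probability p."""
    return -math.log2(p)

# In an equiprobable vocabulary of size M every symbol has P(m) = 1/M.
for M in (2, 4, 8, 128):
    print(f"M = {M:3d}: I = {information_content(1 / M):.0f} bits")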
ACTIVITY 3
A card is selected at random from a deck of playing cards. Suppose you have been told that it
is black in colour.
1 Calculate the amount of information you have received. [1 bit]
2 How much more information do you need to completely specify the card? [4.7 bits] (An outline is given below.)
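One possible line of reasoning (a sketch, assuming a standard 52-card deck whose two colours each contain 26 cards):

$$I_{\text{colour}} = -\log_2 \tfrac{1}{2} = 1 \ \text{bit}, \qquad I_{\text{remaining}} = \log_2 26 \approx 4.7 \ \text{bits}.$$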
The entropy, H, of a source is defined as the average information content per symbol, i.e. the expectation of Im over all messages:

$$H = \sum_{m} P(m) \log_2 \frac{1}{P(m)} \quad \text{bit/symbol} \qquad (8.3)$$
For the two-symbol alphabet (0, 1) if we let P(1) = p then P(0) = 1 − p and:
$$H = p \log_2 \frac{1}{p} + (1-p) \log_2 \frac{1}{1-p} \quad \text{bit/symbol} \qquad (8.4)$$
The entropy is maximised when the symbols are equiprobable as shown in Figure 1. The
entropy definition of equation (8.3) holds for all alphabet sizes. Note that, in the binary case,
as either of the two messages becomes more likely, the entropy decreases. When either message
has probability 1, the entropy goes to zero. This is reasonable since, at these points, the
outcome of the transmission is certain. Thus, if P(0) = 1, we know the symbol 0 will be sent
repeatedly. If P(0) = 0, we know the symbol 1 will be sent repeatedly. In these two cases, no
information is conveyed by transmitting the symbols.
Figure 1: Entropy for binary data transmission versus the selection probability, p, of a
digital 1 (Glover I. A. & Grant P. M., 2010).
For the ASCII alphabet, source entropy would be given by H = −log2 (1/128) =
7 bit/symbol if all the symbols were both equiprobable and statistically independent. In
practice H is less than this, i.e.:

$$H < \log_2 128 = 7 \ \text{bit/symbol} \qquad (8.5)$$

since the symbols are neither equiprobable nor statistically independent, making the code less
than 100% efficient. Entropy thus indicates
the minimum number of binary digits required per symbol (averaged over a long sequence of
symbols).
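The curve in Figure 1 can be reproduced with a short Python sketch (illustrative only) that evaluates the binary entropy of equation (8.4); it confirms that the maximum of 1 bit/symbol occurs at p = 0.5:

import math

def binary_entropy(p: float) -> float:
    """Entropy, in bit/symbol, of a binary source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome conveys no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:4.2f}: H = {binary_entropy(p):.3f} bit/symbol")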
ACTIVITY 4
Watch the following video clip to learn more about Entropy basics, Definition and Properties:
https://www.youtube.com/watch?v=BBdilQkH3gE&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz&index=3 (7:54)
After watching the video, answer the following questions:
1 Prove that entropy (H) is zero when the event is certain.
2 Prove that when Pk = 1/m for all m symbols, the symbols are equiprobable and the
entropy is maximised: Hmax = log2 m. (Proof sketches are given below.)
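Proof sketches for the two questions above (standard arguments, included only as a guide): for question 1, if one symbol is certain then its probability is 1 and all others are 0; since $1 \cdot \log_2 1 = 0$ and $p \log_2 (1/p) \to 0$ as $p \to 0$, every term of the entropy sum vanishes, so H = 0. For question 2, with $P_k = 1/m$ for all k,

$$H = \sum_{k=1}^{m} \frac{1}{m} \log_2 m = \log_2 m = H_{max}.$$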
For a source with memory, where the probability of selecting a symbol depends on the symbol selected previously, the entropy is given by:

$$H = \sum_{i} \sum_{j} P(j, i) \log_2 \frac{1}{P(j \mid i)} \quad \text{bit/symbol} \qquad (8.6)$$

where P(j, i) is the joint probability of the source selecting i and j, and P(j|i) is the probability that
the source will select j given that it has previously selected i. Using Bayes’s theorem, P(j, i) = P(i)P(j|i), equation (8.6)
can be re-expressed as:

$$H = \sum_{i} P(i) \sum_{j} P(j \mid i) \log_2 \frac{1}{P(j \mid i)} \quad \text{bit/symbol} \qquad (8.7)$$
For independent symbols P(j|i) = P(j) and equation (8.7) reduces to equation (8.3). The effect
of having dependency between symbols is to increase the probability of selecting some
symbols at the expense of others given a particular symbol history. This reduces the average
information conveyed by the symbols, which is reflected in a reduced entropy. The difference
between the actual entropy of a source and the (maximum) entropy, Hmax, the source could
have if its symbols were independent and equiprobable, expressed as a fraction of Hmax, is called
the redundancy of the source. For an M-symbol alphabet (Hmax = log2 M) the redundancy, R, is therefore given by:

$$R = \frac{H_{max} - H}{H_{max}} = 1 - \frac{H}{H_{max}} \qquad (8.8)$$
_______________________________________________________________
EXAMPLE 2
Find the entropy, redundancy and information rate of a four-symbol source (A, B, C, D) with
a baud rate of 1024 symbol/s and symbol selection probabilities of 0.5, 0.2, 0.2 and 0.1,
respectively under the following conditions:
(i) The source is memoryless (i.e. the symbols are statistically independent).
(ii) The source has a one-symbol memory such that no two consecutively selected
symbols can be the same. (The long-term relative frequencies of the symbols
remain unchanged, however.)
Solution:
(i) For a memoryless source, using equation (8.3):

$$H = 0.5 \log_2 \frac{1}{0.5} + 2 \times 0.2 \log_2 \frac{1}{0.2} + 0.1 \log_2 \frac{1}{0.1} \approx 1.76 \ \text{bit/symbol}$$

The redundancy is R = (2 − 1.76)/2 ≈ 0.12 and the information rate is Ri = Rs H = 1024 × 1.76 ≈ 1803 bit/s,
where Ri is the information rate and Rs is the symbol rate.
(ii) The appropriate formula to apply to find the entropy of a source with one-symbol
memory is equation (8.7). First, however, we must find the conditional probabilities
which the formula contains. If no two consecutive symbols can be the same, then:

$$P(A|A) = P(B|B) = P(C|C) = P(D|D) = 0$$

Since the (unconditional) probability of A is unchanged, P(A) = 0.5, every, and only every,
alternate symbol must be A, i.e. P(A|Ā) = 1.0, where Ā represents ‘not A’. Furthermore,
if every alternate symbol is A, then no two non-A symbols can occur consecutively, i.e. P(Ā|Ā) = 0.
Given that a symbol following A occurs in proportion to the symbols’ long-term relative frequencies, the
remaining conditional probabilities are:

$$P(B|A) = \frac{0.2}{0.5} = 0.4, \quad P(C|A) = \frac{0.2}{0.5} = 0.4, \quad P(D|A) = \frac{0.1}{0.5} = 0.2$$

and similarly P(A|B) = P(A|C) = P(A|D) = 1.0, while P(j|i) = 0 for any other pair of non-A symbols.
We now have numerical values for all the conditional probabilities, which can be substituted
into equation (8.7):

$$H = 0.5 \left( 0.4 \log_2 \frac{1}{0.4} + 0.4 \log_2 \frac{1}{0.4} + 0.2 \log_2 \frac{1}{0.2} \right) \approx 0.76 \ \text{bit/symbol}$$

The redundancy is therefore R = (2 − 0.76)/2 ≈ 0.62 and the information rate is Ri = 1024 × 0.76 ≈ 779 bit/s.
Note that here the symbol A carries no information at all since its occurrences are entirely
predictable.
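The figures in Example 2 can be verified with a short Python sketch (an illustrative check using the stated probabilities; the conditional probabilities in part (ii) follow the reasoning above):

import math

P = {"A": 0.5, "B": 0.2, "C": 0.2, "D": 0.1}
Rs = 1024  # symbol rate in symbol/s
Hmax = math.log2(len(P))

# (i) Memoryless source: H = sum of P(m) log2(1/P(m))
H1 = sum(p * math.log2(1 / p) for p in P.values())
print(f"(i)  H = {H1:.3f} bit/symbol, R = {(Hmax - H1) / Hmax:.2f}, Ri = {Rs * H1:.0f} bit/s")

# (ii) One-symbol memory: after A the next symbol is B, C or D with
# probabilities rescaled to 0.4, 0.4 and 0.2; after any non-A symbol the
# next symbol is A with probability 1, which contributes no information.
cond_after_A = {"B": 0.4, "C": 0.4, "D": 0.2}
H2 = P["A"] * sum(p * math.log2(1 / p) for p in cond_after_A.values())
print(f"(ii) H = {H2:.3f} bit/symbol, R = {(Hmax - H2) / Hmax:.2f}, Ri = {Rs * H2:.0f} bit/s")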
ACTIVITY 5
Watch the following video clip to learn more about Entropy, source efficiency, redundancy &
information rate.
https://www.youtube.com/watch?v=Wc0oB04JOkI&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz&index=5 (18:06)
Answer the following questions:
1 For a discrete, memoryless source there are three symbols with probabilities P1 = α and
P2 = P3 = (1 - α)/2. Calculate the entropy of the source.
2 Show that the entropy (H) of the source with the probability distribution shown in Table
1 is equal to [2 – (2 + n)/2^n]. Hint: Use Taylor series expansion.
Table 1
Symbol S:       S1    S2    S3    …    Sn
Probability P:  1/2   1/4   1/8   …    1/2^n
3 The source emits three messages with probabilities P1 = 0.7, P2 = 0.2 and P3 = 0.1.
Calculate,
(a) Source entropy [1.157 bits/message]
(b) Maximum entropy. [1.585 bits/message]
(c) Source efficiency [0.73]
(d) Redundancy [0.27]
4 A discrete source emits one of six symbols once every millisecond. The symbol
probabilities are 1/2, 1/4, 1/8, 1/16, 1/32 and 1/32. Find the source entropy and the
information rate. [R = 1937.5 bits/second]
(8.9)
The maximum possible entropy, Hmax, of this source would be realised if all symbols were
equiprobable, P(m) = 1/M, i.e.:
$$H_{max} = \log_2 M \quad \text{bit/symbol} \qquad (8.10)$$
The efficiency of the code used to represent the source symbols can then be defined as:

$$\eta_{code} = \frac{H}{H_{max}} \qquad (8.11)$$
If source symbols are coded into binary words, then there is a useful alternative interpretation
of ηcode. For a set of symbols represented by binary codewords with lengths lm (binary) digits,
an overall code length, L, can be defined as the average codeword length, i.e.:
$$L = \sum_{m} P(m)\, l_m \quad \text{digits} \qquad (8.12)$$

In terms of L, the code efficiency can be expressed as:

$$\eta_{code} = \frac{H}{L} \qquad (8.13)$$
Equations (8.12) and (8.13) are seen to be entirely consistent when it is remembered that the
maximum information conveyable per L digit binary codeword is given by:
$$I_{max} = \log_2 2^L = L \ \text{bits} \qquad (8.14)$$
________________________________________________________________
EXAMPLE 3
A scanner converts a black and white document, line by line, into binary data for transmission.
The scanner produces source data comprising symbols representing runs of up to six similar
image pixel elements with the probabilities as shown below:
Determine the average length of a run (in pixels) and the corresponding effective information
rate for this source when the scanner is traversing 1000 pixel/s.
Solution:
Average run length: 2.69 pixels per symbol (obtained from the table of run probabilities).
At a 1000 pixel/s scan rate we therefore generate 1000/2.69 = 372 symbol/s. With a source
entropy of 2.29 bit/symbol, the source information rate is 2.29 × 372 = 852 bit/s.
________________________________________________________________
In general, source coding is focused on finding a more efficient code which represents the
same information using fewer digits on average. This has led to the development of source
codes that use different lengths of codeword for different symbols. The problem with such
variable length codes is in recognising the start and end of the symbols.
Figure 2: Decision tree for instantaneous decoding of the code A = 0, B = 10, C = 110, D = 111
(Glover I. A. & Grant P. M., 2010).
(1) Unique decodability.
Consider an M = 4 symbol alphabet with the following binary representation: A = 0, B = 01,
C = 11, D = 00. If we receive the codeword 0011 it is not known whether the transmission was
D, C or A, A, C. This code is not, therefore, uniquely decodable.
(2) Instantaneous decoding.
Consider now an M = 4 symbol alphabet, with the following binary representation:
A=0
B = 10
C = 110
D = 111
This code can be instantaneously decoded using the decision tree shown in Figure 2 since no
complete codeword is a prefix of a larger codeword. This is in contrast to the previous example
where A is a prefix of both B and D. This code is also a “comma code”, since the symbol
zero indicates the end of a codeword, except for the all-ones word whose length is known.
Note that we are restricted in the number of available codewords with small numbers of bits
to ensure we achieve the desired decoding properties.
Using the representation:
A=0
B = 01
C = 011
D = 111
the code is identical to the example just given but the bits are time reversed. It is thus still
uniquely decodable but no longer instantaneous, since early codewords are now prefixes of
later ones.
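Instantaneous decoding of the prefix code A = 0, B = 10, C = 110, D = 111 can be sketched in a few lines of Python (illustrative only); because no codeword is a prefix of another, each symbol is emitted as soon as its last bit arrives:

CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
DECODE = {bits: symbol for symbol, bits in CODE.items()}

def decode(bitstream: str) -> str:
    """Decode a bit string symbol by symbol, without look-ahead."""
    symbols, buffer = [], ""
    for bit in bitstream:
        buffer += bit
        if buffer in DECODE:           # a complete codeword has been received
            symbols.append(DECODE[buffer])
            buffer = ""                # start the next codeword immediately
    if buffer:
        raise ValueError("bitstream ended mid-codeword")
    return "".join(symbols)

print(decode("01011011110"))  # -> "ABCDB"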
(8.15)
and, at a symbol rate of 1 symbol/s, the information rate is 2.55 bit/s. The maximum entropy
of an eight-symbol source is log2 8 = 3 bit/symbol and the source efficiency is therefore given
by:
$$\eta_{code} = \frac{H}{H_{max}} = \frac{2.55}{3} = 0.85 \ (85\%) \qquad (8.16)$$
If the symbols are each allocated 3 bits, comprising all the binary patterns between 000 and
111, the coding efficiency will remain unchanged at 85%.
Next, a famous variable length source coding algorithm called Huffman coding is discussed.
(8.17)
(8.18)
The 85% efficiency without coding would have been improved to 96.6% using Shannon–
Fano coding, but Huffman coding, at 97.7%, is even better. The maximum efficiency is
obtained when symbol probabilities are all negative, integer, powers of two, i.e. of the form 1/2^n. Note
that the Huffman codes are formulated to minimise the average codeword length. They do
not necessarily possess error detection properties but are uniquely, and instantaneously,
decodable. Error detection and correction codes are discussed next.
ACTIVITY 6
Take your time to watch the video clip from the following link:
https://www.youtube.com/watch?v=HmBH30NrM7c&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz&index=9 (15:16)
In this video, the Huffman Coding Algorithm is explained for binary coding (0,1).
To learn about the application of Huffman code for Ternary (0,1,2) and Quaternary (0,1,2,3)
coding, watch the following videos:
1
https://www.youtube.com/watch?v=0RXa1XpNufc&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz&index=10 (10:50)
2
https://www.youtube.com/watch?v=bfty_nOuWyA&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz&index=11 (11:21)
From all the videos on Huffman Coding Algorithms, pay attention to the calculation of
Entropy (H), Average length (L), Efficiency and Variance.
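A minimal Python sketch of the binary Huffman construction described in the videos (the probabilities used below are assumed for illustration and are not taken from a prescribed example):

import heapq
import math

def huffman(probabilities: dict) -> dict:
    """Build a binary Huffman code by repeatedly merging the two least probable nodes."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # least probable node
        p1, _, c1 = heapq.heappop(heap)   # next least probable node
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"A": 0.5, "B": 0.2, "C": 0.2, "D": 0.1}     # assumed example probabilities
code = huffman(probs)
L = sum(probs[s] * len(w) for s, w in code.items())  # average codeword length
H = sum(-p * math.log2(p) for p in probs.values())   # source entropy
print(code, f"L = {L:.2f} digits, efficiency = {H / L:.1%}")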
To illustrate the concept of FEC and clarify some of the terminology described above let us
use the following example:
Let us assume that we want to transmit the following sequence of information bits, I1 = [0 1
1 0]. Since I1 has four bits, then k = 4. Now, to protect the information bits I1 against channel
errors, let us add three (3) bits to it such that a new sequence is formed, which is C1 = [0 1 1
0 1 0 1]. The new sequence C1 is called the codeword and is of length n = 7. It is clear that
n-k = 3, which is the number of redundancy or parity bits (1 0 1) at the end of C1.
Note that for every information sequence Ix generated, different parity bits are generated and a
new corresponding codeword Cx is created. A collection of the Cx forms a codebook, as shown
below:
$$C = \begin{bmatrix} C_1 \\ C_2 \\ C_3 \\ C_4 \\ C_5 \end{bmatrix} = \begin{bmatrix} 0110101 \\ 1011110 \\ 1100100 \\ 0101010 \\ 1010111 \end{bmatrix} \qquad (8.19)$$
We call the collection of codewords, C in (8.19), a code of size 5 (or cardinality 5) because it
has five (5) codewords. From here onwards a codebook will be written in the row vector
format, as shown below by re-writing the codebook in Equation (8.19):
C = [0 1 1 0 1 0 1, 1 0 1 1 1 1 0, 1 1 0 0 1 0 0, 0 1 0 1 0 1 0, 1 0 1 0 1 1 1]. (8.19)
To clarify the terminology used above, watch the following video. In this video, the basics of
Block Codes and parameters are explained.
https://www.youtube.com/watch?v=EJpxAQ_DUJ4&list=PLgwJf8NK-2e4CIG385dyc8-IIgFPG1NwY
EXAMPLE 4
Calculate the code rate of the code given by the codebook in Equation (8.19).
Solution:
The code rate is the ratio of information bits to codeword bits, R = k/n = 4/7 ≈ 0.57.
In the next subsection, linear block coding is introduced. Watch the following video clip to
have a better understanding of linear block coding.
https://www.youtube.com/watch?v=UVpRJa02Ys0&list=PLgwJf8NK-2e4CIG385dyc8-IIgFPG1NwY&index=8 (16:31)
C = [0 0 0 0 0 0 0, 0 0 0 1 1 1 1, 0 0 1 0 1 1 0, 0 0 1 1 0 0 1,
0 1 0 0 1 0 1, 0 1 0 1 0 1 0, 0 1 1 0 0 1 1, 0 1 1 1 1 0 0,
1 0 0 0 0 1 1, 1 0 0 1 1 0 0, 1 0 1 0 1 0 1, 1 0 1 1 0 1 0,
1 1 0 0 1 1 0, 1 1 0 1 0 0 1, 1 1 1 0 0 0 0, 1 1 1 1 1 1 1],
(8.20)
Taking any two codewords from the codebook in (8.20) and performing modulo-2 addition (⊕) will
produce another codeword in the codebook; hence this code is linear.
ACTIVITY 7
Show that the codebook in (8.20) is a linear code, by performing an addition between
codewords.
The Hamming distance, dH, between two codewords is the number of bit positions in which they
differ; for example, the codewords 1 0 0 0 0 1 1 and 0 1 0 0 1 0 1 in (8.20) differ in four positions,
so the Hamming distance between these two sequences is 4. The minimum Hamming distance,
dmin, of a code is found by calculating the Hamming distances, dH, between all pairs of
codewords in the codebook and then picking the smallest as dmin.
ACTIVITY 8
Calculate the Hamming distances between all the codewords of the code in (8.20). Then find
the minimum Hamming distance of the code.
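A short Python check for Activity 8 (an illustrative sketch, not the only way to do it), using the sixteen codewords of the codebook in (8.20):

from itertools import combinations

codebook = [
    "0000000", "0001111", "0010110", "0011001",
    "0100101", "0101010", "0110011", "0111100",
    "1000011", "1001100", "1010101", "1011010",
    "1100110", "1101001", "1110000", "1111111",
]

def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

d_min = min(hamming_distance(a, b) for a, b in combinations(codebook, 2))
print(f"minimum Hamming distance d_min = {d_min}")   # expected: 3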
It is clear that a code must satisfy some criteria; an important question therefore arises: how
were the codewords in the codebook in (8.20) selected or generated? The answer to this
question leads to the procedure for generating the codewords of a code. Linear block codes are
generated, and completely described, using a generator matrix, which we shall denote by G. Because
the generator of a code completely describes the code, it is also referred to as the code.
To illustrate how the generator of a code is used, the codebook in (8.20) is again used as an
example.
The (7, 4) Hamming code in (8.20) was generated using the generator below:
$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix} \qquad (8.21)$$
Recall that given an (n, k) linear block code, k represents the length of the information
vector/sequence Ix, and n represents the length of the codeword sequence Cx. It is not a
coincidence that a (7, 4) Hamming code has a generator matrix that has 4 rows and 7 columns.
For each (n, k) linear block code, there exists a generator matrix with k rows and n columns.
Now returning to the encoding process, since the elements of the information sequences Ix
are taken from a binary alphabet/set (0, 1), then the total number of information
vectors/sequences is L = 2^k, i.e. there are L = 2^k possible binary sequences of length k; hence
the information sequences are I1, I2, I3, …, IL. To encode the L information sequences using
the generator matrix, G in (8.21), we simply multiply each Ix with G to produce a
corresponding codeword Cx:
Ix × G = Cx , (8.22)
where addition is done modulo 2 (i.e. using the exclusive OR operator “⊕”).
I1 = [0 0 0 0]
I2 = [0 0 0 1]
I3 = [0 0 1 0]
I4 = [0 0 1 1] (8.23)
For example, encoding I4 = [0 0 1 1] using Ix × G = Cx:

$$[0\ 0\ 1\ 1] \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix} = [0\ 0\ 1\ 1\ 0\ 0\ 1] \qquad (8.24)$$
ACTIVITY 9
Generate all the 16 possible information sequences, by following the illustration in (8.23), and
encode all the information sequences as demonstrated in (8.24). Check if all the sequences
generated are similar to the codewords in (8.20).
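The encoding of equation (8.22) can also be sketched in Python (illustrative only): each information sequence is multiplied by the generator matrix of (8.21) with modulo-2 arithmetic, which should reproduce the codewords of (8.20):

from itertools import product

# Generator matrix of the (7, 4) Hamming code, equation (8.21)
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(info):
    """Codeword c = i x G, with the additions performed modulo 2."""
    return [sum(info[row] * G[row][col] for row in range(4)) % 2 for col in range(7)]

for info in product((0, 1), repeat=4):   # all 16 information sequences
    print(info, "->", encode(info))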
When a codeword is received, the decoder will go through all the codewords in the table
(“look up the table”) for a codeword that is closest to the received codeword. A table like
Table 3, used for decoding, is sometimes referred to as a “lookup table”. Recall how the
Hamming distance is calculated: every received codeword is compared to all the codewords
in the lookup table using the Hamming distance. The codeword in the lookup table that has
the smallest Hamming distance to the received codeword is taken as the correct codeword,
and its corresponding information sequence is the correct information that was sent by the
transmitter.
For example, assume that C2 = [0 0 0 1 1 1 1] was transmitted and, as it went through the
channel, it got corrupted by noise and was received as C̃2 = [1 0 0 1 1 1 1]. For illustration
purposes, the following Hamming distances, dH are obtained with the three codewords in
Table 3:
dH(C1, C̃2) = 5
dH(C2, C̃2) = 1
dH(C16, C̃2) = 2                (8.25)
where dH(Ci, Cj) denotes the Hamming distance between codewords Ci and Cj. It is clear
from the Hamming distances in (8.25) that the codeword from the lookup table that results
in the smallest Hamming distance from C̃2 is C2; therefore, the decoder will select information
sequence I2 as the transmitted information. Thus, the one-bit error has been corrected and
the correct information sequence that was transmitted has been identified.
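Lookup-table (minimum-distance) decoding as described above can be sketched in Python (illustrative only); because the code generated by (8.21) is systematic, the information bits are simply the first four bits of the closest codeword:

codebook = [
    "0000000", "0001111", "0010110", "0011001",
    "0100101", "0101010", "0110011", "0111100",
    "1000011", "1001100", "1010101", "1011010",
    "1100110", "1101001", "1110000", "1111111",
]

def hamming_distance(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def decode(received: str) -> str:
    """Return the information bits of the codeword closest to the received word."""
    best = min(codebook, key=lambda c: hamming_distance(c, received))
    return best[:4]

received = "1001111"     # C2 = 0001111 corrupted by a single bit error
print(decode(received))  # expected: "0001", i.e. the information sequence I2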
Not all errors can be corrected; there is a limit to the number of errors any FEC code
can correct. A block code’s error correction capability is directly related to its minimum
Hamming distance. The number of correctable errors for a block code, Ne, is given by
$$N_e = \left\lfloor \frac{d_{min} - 1}{2} \right\rfloor \qquad (8.26)$$
where dmin is the minimum Hamming distance.
The lookup-table decoding method is not efficient, especially for codes with large cardinalities.
There is another procedure for decoding received codewords without storing all possible
codewords in the memory of the receiver; such a procedure will not be discussed in this study unit.
1.6 SUMMARY
This study unit discussed the concepts of information theory, entropy, and redundancy as
they are applied in digital communications. Information theory gives bounds on the amount
of information that can be transmitted over a channel; this information is measured in bits.
We used the concept of entropy to measure the average amount of information, in bits,
carried by source symbols generated with given probabilities. The
discussion went on to source coding which is mainly about compression of information
transmitted by a source. The idea of source coding is to remove redundant information before
transmission. Lastly, the fundamentals of error control (correction) coding, using block codes
as an example, were discussed.
(b) Define the efficiency of a code and determine the efficiency of the
code devised in part (a). [100%]
(c) Construct another code for the source of part (a), assigning equal-length
binary words irrespective of the occurrence probabilities of
the symbols. Calculate the efficiency of this code. [87.5%]
1.9 REFERENCES
Haykin, S. (2016). Communication systems. John Wiley & Sons.
Glover, I. A. & Grant, P. M. (2010). Digital communications. Pearson Education.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical
Journal, 27, 379-423.