ECM3701 Study Unit 8
1.2 INTRODUCTION
This study unit focuses mainly on the encoding and decoding of digital information. A good
analogy for encoding is to picture the encoder as a person preparing and sending a message.
The person must consider how the message will be received by the audience and make the
necessary preparations so that it reaches the receiver as intended. As human beings, when we
put our thoughts into words we are encoding. After encoding, the message (which
could be a phone call, an email, a text message, a face-to-face meeting, or any other
communication tool) is sent through a ‘medium’. As the message is being transmitted,
anything could happen. It may get hijacked along the way or corrupted. Therefore, the
encoder should be designed to counteract all possible disturbances that may occur during
transmission. Some of the methods to counteract possible disturbances are encryption and
error correction techniques, some of which will be dealt with in this study unit. Upon arrival
of the message, the audience then ‘decodes’, or interprets, the message. In the context of this
analogy, decoding is the process of turning what has been communicated into thoughts. Have
this analogy in mind as you study this unit.
ACTIVITY 1
Download Shannon’s paper from the Internet. The full citation of the paper is ‘Shannon, C.
E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27,
379-423.’ Take your time to read through this classic paper.
ACTIVITY 2
Watch the video clip to learn more about information basics, definition, uncertainty and the
properties of information:
https://www.youtube.com/watch?v=18E3NllObqg&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz (16:57)
1 Sketch the basic block diagram of a communication system.
2 What is the relationship between probability and uncertainty in information theory?
3 Show, by means of calculation, using the probabilities P1 = 1/8 and P2 = 1/4, that the
more probable a message is, the less information it carries.
4 If I1 is the information of message m1 and I2 is the information of message m2, show
that the combined information of two independent messages m1 and m2 is equal to I1 + I2.
5 Show that if there are M = 2^N equally likely messages, then the amount of information
carried by each message will be N bits (a brief derivation is sketched below).
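As a brief sketch of the result in question 5 (a standard derivation, offered here only as a guide): if the M = 2^N messages are equally likely, each has probability P(m) = 1/M, so

$$I_m = \log_2 \frac{1}{P(m)} = \log_2 M = \log_2 2^N = N \ \text{bits}.$$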
1.3.1 Information
It has already been established that information relates to probability and uncertainty about an
event. Let P(message) denote the probability of a message; likewise, P(m) denotes the
probability of m. Table 1 shows messages with their probabilities and information content.
The definition of information content for a symbol m should be such that it monotonically
decreases with increasing message probability, P(m), and it goes to zero for a probability of
unity. Another desirable property is that of additivity. If one were to communicate two
(independent) messages in sequence, the total information content should be equal to the sum
of the individual information contents of the two messages. We know that the total probability
of the composite message is the product of the two individual independent probabilities.
Therefore, the definition of information must be such that when probabilities are multiplied
information is added.
The required properties of an information measure are summarised in Table 1. The logarithm
operation clearly satisfies these requirements. We thus define the information content, Im, of
a message, m, as:
$$I_m = \log \frac{1}{P(m)} = -\log P(m) \qquad (8.1)$$
This definition satisfies the additivity requirement, the monotonicity requirement, and for P(m)
= 1, Im = 0. Note that this is true regardless of the base chosen for the logarithm. Base 2 is
usually chosen, with the resulting quantity of information being measured in bits:

$$I_m = -\log_2 P(m) \ \text{bits} \qquad (8.2)$$
________________________________________________________________
EXAMPLE 1
Consider, initially, vocabularies of equiprobable message symbols represented by fixed length
binary codewords. Thus, a vocabulary size of four symbols is represented by the binary digit
pairs 00, 01, 10, 11. The binary word length and symbol probabilities for other vocabulary sizes
are shown in Table 2. The ASCII vocabulary contains 128 symbols and therefore uses a log2
128 = 7 digit (fixed length) binary codeword to represent each symbol.
Table 2: Word length and symbol probabilities (Glover I. A. & Grant P. M., 2010).
ASCII symbols are not equiprobable, however, and for this particular code each symbol does
not, therefore, have a selection probability of 1/128.
What will be the information content, in bits, of a symbol drawn from equiprobable vocabularies of size 2, 4, 8 and 128?
Solution:
(a) I2 = - log2 P(2) = - log2 (1/2) = 1 bit
(b) I4 = - log2 P(4) = - log2 (1/4) = 2 bits
(c) I8 = - log2 P(8) = - log2 (1/8) = 3 bits
(d) I128 = - log2 P(128) = - log2 (1/128) = 7 bits
________________________________________________________________
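The figures in Example 1 can be checked with a short Python sketch (illustrative only; Python is not prescribed by this study unit, and the function name information_content is our own):

import math

def information_content(p: float) -> float:
    """Information content, in bits, of a symbol with selection probability p."""
    return -math.log2(p)

# In an equiprobable vocabulary of size M every symbol has P(m) = 1/M.
for M in (2, 4, 8, 128):
    print(f"M = {M:3d}: I = {information_content(1 / M):.0f} bits")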
ACTIVITY 3
A card is selected at random from a deck of playing cards. Suppose you have been told that it
is black in colour.
1 Calculate the amount of information you have received. [1 bit]
2 How much more information do you need to completely specify the card? [4.7 bits] (An outline is given below.)
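One possible line of reasoning (a sketch, assuming a standard 52-card deck whose two colours each contain 26 cards):

$$I_{\text{colour}} = -\log_2 \tfrac{1}{2} = 1 \ \text{bit}, \qquad I_{\text{remaining}} = \log_2 26 \approx 4.7 \ \text{bits}.$$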
The entropy, H, of a source is defined as the average information content per symbol, i.e. the expectation of Im over all messages:

$$H = \sum_{m} P(m) \log_2 \frac{1}{P(m)} \quad \text{bit/symbol} \qquad (8.3)$$
For the two-symbol alphabet (0, 1) if we let P(1) = p then P(0) = 1 − p and:
$$H = p \log_2 \frac{1}{p} + (1-p) \log_2 \frac{1}{1-p} \quad \text{bit/symbol} \qquad (8.4)$$
The entropy is maximised when the symbols are equiprobable as shown in Figure 1. The
entropy definition of equation (8.3) holds for all alphabet sizes. Note that, in the binary case,
as either of the two messages becomes more likely, the entropy decreases. When either message
has probability 1, the entropy goes to zero. This is reasonable since, at these points, the
outcome of the transmission is certain. Thus, if P(0) = 1, we know the symbol 0 will be sent
repeatedly. If P(0) = 0, we know the symbol 1 will be sent repeatedly. In these two cases, no
information is conveyed by transmitting the symbols.
Figure 1: Entropy for binary data transmission versus the selection probability, p, of a
digital 1 (Glover I. A. & Grant P. M., 2010).
For the ASCII alphabet, source entropy would be given by H = −log2 (1/128) =
7 bit/symbol if all the symbols were both equiprobable and statistically independent. In
practice H is less than this, i.e.:

$$H < \log_2 128 = 7 \ \text{bit/symbol} \qquad (8.5)$$

since the symbols are neither equiprobable nor statistically independent, making the code less
than 100% efficient. Entropy thus indicates
the minimum number of binary digits required per symbol (averaged over a long sequence of
symbols).
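The curve in Figure 1 can be reproduced with a short Python sketch (illustrative only) that evaluates the binary entropy of equation (8.4); it confirms that the maximum of 1 bit/symbol occurs at p = 0.5:

import math

def binary_entropy(p: float) -> float:
    """Entropy, in bit/symbol, of a binary source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome conveys no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:4.2f}: H = {binary_entropy(p):.3f} bit/symbol")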
ACTIVITY 4
Watch the following video clip to learn more about Entropy basics, Definition and Properties:
https://www.youtube.com/watch?v=BBdilQkH3gE&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz&index=3 (7:54)
After watching the video, answer the following questions:
1 Prove that entropy (H) is zero when the event is certain.
2 Prove that when Pk = 1/m for all m symbols, the symbols are equiprobable and the
entropy is maximised: Hmax = log2 m. (Proof sketches are given below.)
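Proof sketches for the two questions above (standard arguments, included only as a guide): for question 1, if one symbol is certain then its probability is 1 and all others are 0; since $1 \cdot \log_2 1 = 0$ and $p \log_2 (1/p) \to 0$ as $p \to 0$, every term of the entropy sum vanishes, so H = 0. For question 2, with $P_k = 1/m$ for all k,

$$H = \sum_{k=1}^{m} \frac{1}{m} \log_2 m = \log_2 m = H_{max}.$$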
For a source with memory, where the probability of selecting a symbol depends on the symbol selected previously, the entropy is given by:

$$H = \sum_{i} \sum_{j} P(j, i) \log_2 \frac{1}{P(j \mid i)} \quad \text{bit/symbol} \qquad (8.6)$$

where P(j, i) is the joint probability of the source selecting i and j, and P(j|i) is the probability that
the source will select j given that it has previously selected i. Using Bayes’s theorem, P(j, i) = P(i)P(j|i), equation (8.6)
can be re-expressed as:

$$H = \sum_{i} P(i) \sum_{j} P(j \mid i) \log_2 \frac{1}{P(j \mid i)} \quad \text{bit/symbol} \qquad (8.7)$$
For independent symbols P(j|i) = P(j) and equation (8.7) reduces to equation (8.3). The effect
of having dependency between symbols is to increase the probability of selecting some
symbols at the expense of others given a particular symbol history. This reduces the average
information conveyed by the symbols, which is reflected in a reduced entropy. The difference
between the actual entropy of a source and the (maximum) entropy, Hmax, the source could
have if its symbols were independent and equiprobable, expressed as a fraction of Hmax, is called
the redundancy of the source. For an M-symbol alphabet (Hmax = log2 M) the redundancy, R, is therefore given by:

$$R = \frac{H_{max} - H}{H_{max}} = 1 - \frac{H}{H_{max}} \qquad (8.8)$$
_______________________________________________________________
EXAMPLE 2
Find the entropy, redundancy and information rate of a four-symbol source (A, B, C, D) with
a baud rate of 1024 symbol/s and symbol selection probabilities of 0.5, 0.2, 0.2 and 0.1,
respectively under the following conditions:
(i) The source is memoryless (i.e. the symbols are statistically independent).
(ii) The source has a one-symbol memory such that no two consecutively selected
symbols can be the same. (The long-term relative frequencies of the symbols
remain unchanged, however.)
Solution:
(i) For a memoryless source, using equation (8.3):

$$H = 0.5 \log_2 \frac{1}{0.5} + 2 \times 0.2 \log_2 \frac{1}{0.2} + 0.1 \log_2 \frac{1}{0.1} \approx 1.76 \ \text{bit/symbol}$$

The redundancy is R = (2 − 1.76)/2 ≈ 0.12 and the information rate is Ri = Rs H = 1024 × 1.76 ≈ 1803 bit/s,
where Ri is the information rate and Rs is the symbol rate.
(ii) The appropriate formula to apply to find the entropy of a source with one-symbol
memory is equation (8.7). First, however, we must find the conditional probabilities
which the formula contains. If no two consecutive symbols can be the same, then:

$$P(A|A) = P(B|B) = P(C|C) = P(D|D) = 0$$

Since the (unconditional) probability of A is unchanged, P(A) = 0.5, every, and only every,
alternate symbol must be A, i.e. P(A|Ā) = 1.0, where Ā represents ‘not A’. Furthermore,
if every alternate symbol is A, then no two non-A symbols can occur consecutively, i.e. P(Ā|Ā) = 0.
Given that a symbol following A occurs in proportion to the symbols’ long-term relative frequencies, the
remaining conditional probabilities are:

$$P(B|A) = \frac{0.2}{0.5} = 0.4, \quad P(C|A) = \frac{0.2}{0.5} = 0.4, \quad P(D|A) = \frac{0.1}{0.5} = 0.2$$

and similarly P(A|B) = P(A|C) = P(A|D) = 1.0, while P(j|i) = 0 for any other pair of non-A symbols.
We now have numerical values for all the conditional probabilities, which can be substituted
into equation (8.7):

$$H = 0.5 \left( 0.4 \log_2 \frac{1}{0.4} + 0.4 \log_2 \frac{1}{0.4} + 0.2 \log_2 \frac{1}{0.2} \right) \approx 0.76 \ \text{bit/symbol}$$

The redundancy is therefore R = (2 − 0.76)/2 ≈ 0.62 and the information rate is Ri = 1024 × 0.76 ≈ 779 bit/s.
Note that here the symbol A carries no information at all since its occurrences are entirely
predictable.
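The figures in Example 2 can be verified with a short Python sketch (an illustrative check using the stated probabilities; the conditional probabilities in part (ii) follow the reasoning above):

import math

P = {"A": 0.5, "B": 0.2, "C": 0.2, "D": 0.1}
Rs = 1024  # symbol rate in symbol/s
Hmax = math.log2(len(P))

# (i) Memoryless source: H = sum of P(m) log2(1/P(m))
H1 = sum(p * math.log2(1 / p) for p in P.values())
print(f"(i)  H = {H1:.3f} bit/symbol, R = {(Hmax - H1) / Hmax:.2f}, Ri = {Rs * H1:.0f} bit/s")

# (ii) One-symbol memory: after A the next symbol is B, C or D with
# probabilities rescaled to 0.4, 0.4 and 0.2; after any non-A symbol the
# next symbol is A with probability 1, which contributes no information.
cond_after_A = {"B": 0.4, "C": 0.4, "D": 0.2}
H2 = P["A"] * sum(p * math.log2(1 / p) for p in cond_after_A.values())
print(f"(ii) H = {H2:.3f} bit/symbol, R = {(Hmax - H2) / Hmax:.2f}, Ri = {Rs * H2:.0f} bit/s")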
ACTIVITY 5
Watch the following video clip to learn more about Entropy, source efficiency, redundancy &
information rate.
https://www.youtube.com/watch?v=Wc0oB04JOkI&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz&index=5 (18:06)
Answer the following questions:
1 For a discrete, memoryless source there are three symbols with probabilities P1 = α and
P2 = P3 = (1 - α)/2. Calculate the entropy of the source.
2 Show that the entropy (H) of the source with the probability distribution shown in Table
1 is equal to [2 – (2 + n)/2^n]. Hint: Use Taylor series expansion.
Table 1
Symbol S:       S1    S2    S3    …    Sn
Probability P:  1/2   1/4   1/8   …    1/2^n
3 The source emits three messages with probabilities P1 = 0.7, P2 = 0.2 and P3 = 0.1.
Calculate,
(a) Source entropy [1.157 bits/message]
(b) Maximum entropy. [1.585 bits/message]
(c) Source efficiency [0.73]
(d) Redundancy [0.27]
4 A discrete source emits one of six symbols once every millisecond. The symbol
probabilities are 1/2, 1/4, 1/8, 1/16, 1/32 and 1/32. Find the source entropy and the
information rate. [R = 1937.5 bits/second]
(8.9)
The maximum possible entropy, Hmax, of this source would be realised if all symbols were
equiprobable, P(m) = 1/M, i.e.:
$$H_{max} = \log_2 M \quad \text{bit/symbol} \qquad (8.10)$$
The efficiency of the code used to represent the source symbols can then be defined as:

$$\eta_{code} = \frac{H}{H_{max}} \qquad (8.11)$$
If source symbols are coded into binary words, then there is a useful alternative interpretation
of ηcode. For a set of symbols represented by binary codewords with lengths lm (binary) digits,
an overall code length, L, can be defined as the average codeword length, i.e.:
$$L = \sum_{m} P(m)\, l_m \quad \text{digits} \qquad (8.12)$$

In terms of L, the code efficiency can be expressed as:

$$\eta_{code} = \frac{H}{L} \qquad (8.13)$$
Equations (8.12) and (8.13) are seen to be entirely consistent when it is remembered that the
maximum information conveyable per L digit binary codeword is given by:
$$I_{max} = \log_2 2^L = L \ \text{bits} \qquad (8.14)$$
________________________________________________________________
EXAMPLE 3
A scanner converts a black and white document, line by line, into binary data for transmission.
The scanner produces source data comprising symbols representing runs of up to six similar
image pixel elements with the probabilities as shown below:
Determine the average length of a run (in pixels) and the corresponding effective information
rate for this source when the scanner is traversing 1000 pixel/s.
Solution:
Average run length: 2.69 pixels per symbol (obtained from the table of run probabilities).
At a 1000 pixel/s scan rate we therefore generate 1000/2.69 = 372 symbol/s. With a source
entropy of 2.29 bit/symbol, the source information rate is 2.29 × 372 = 852 bit/s.
________________________________________________________________
In general, source coding is focused on finding a more efficient code which represents the
same information using fewer digits on average. This has led to the development of source
codes that use different lengths of codeword for different symbols. The problem with such
variable length codes is in recognising the start and end of the symbols.
Figure 2: Decision tree for instantaneous decoding of the code A = 0, B = 10, C = 110, D = 111
(Glover I. A. & Grant P. M., 2010).
(1) Unique decodability.
Consider an M = 4 symbol alphabet with the following binary representation: A = 0, B = 01,
C = 11, D = 00. If we receive the codeword 0011 it is not known whether the transmission was
D, C or A, A, C. This code is not, therefore, uniquely decodable.
(2) Instantaneous decoding.
Consider now an M = 4 symbol alphabet, with the following binary representation:
A=0
B = 10
C = 110
D = 111
This code can be instantaneously decoded using the decision tree shown in Figure 2 since no
complete codeword is a prefix of a larger codeword. This is in contrast to the previous example
where A is a prefix of both B and D. This code is also a “comma code”, since the symbol
zero indicates the end of a codeword, except for the all-ones word whose length is known.
Note that we are restricted in the number of available codewords with small numbers of bits
to ensure we achieve the desired decoding properties.
Using the representation:
A=0
B = 01
C = 011
D = 111
the code is identical to the example just given but the bits are time reversed. It is thus still
uniquely decodable but no longer instantaneous, since early codewords are now prefixes of
later ones.
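Instantaneous decoding of the prefix code A = 0, B = 10, C = 110, D = 111 can be sketched in a few lines of Python (illustrative only); because no codeword is a prefix of another, each symbol is emitted as soon as its last bit arrives:

CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
DECODE = {bits: symbol for symbol, bits in CODE.items()}

def decode(bitstream: str) -> str:
    """Decode a bit string symbol by symbol, without look-ahead."""
    symbols, buffer = [], ""
    for bit in bitstream:
        buffer += bit
        if buffer in DECODE:           # a complete codeword has been received
            symbols.append(DECODE[buffer])
            buffer = ""                # start the next codeword immediately
    if buffer:
        raise ValueError("bitstream ended mid-codeword")
    return "".join(symbols)

print(decode("01011011110"))  # -> "ABCDB"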
(8.15)
and, at a symbol rate of 1 symbol/s, the information rate is 2.55 bit/s. The maximum entropy
of an eight-symbol source is log2 8 = 3 bit/symbol and the source efficiency is therefore given
by:
$$\eta_{code} = \frac{H}{H_{max}} = \frac{2.55}{3} = 0.85 \ (85\%) \qquad (8.16)$$
If the symbols are each allocated 3 bits, comprising all the binary patterns between 000 and
111, the coding efficiency will remain unchanged at 85%.
Next, a famous variable length source coding algorithm called Huffman coding is discussed.
(8.17)
(8.18)
The 85% efficiency without coding would have been improved to 96.6% using Shannon–
Fano coding, but Huffman coding, at 97.7%, is even better. The maximum efficiency is
obtained when symbol probabilities are all negative, integer, powers of two, i.e. of the form 1/2^n. Note
that the Huffman codes are formulated to minimise the average codeword length. They do
not necessarily possess error detection properties but are uniquely, and instantaneously,
decodable. Error detection and correction codes are discussed next.
ACTIVITY 6
Take your time to watch the video clip from the following link:
https://www.youtube.com/watch?v=HmBH30NrM7c&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz&index=9 (15:16)
In this video, the Huffman Coding Algorithm is explained for binary coding (0,1).
To learn about the application of Huffman code for Ternary (0,1,2) and Quaternary (0,1,2,3)
coding, watch the following videos:
1
https://www.youtube.com/watch?v=0RXa1XpNufc&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz&index=10 (10:50)
2
https://www.youtube.com/watch?v=bfty_nOuWyA&list=PLgwJf8NK-2e5oBBXubqVMiPQPSNMF4Zgz&index=11 (11:21)
From all the videos on Huffman Coding Algorithms, pay attention to the calculation of
Entropy (H), Average length (L), Efficiency and Variance.
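A minimal Python sketch of the binary Huffman construction described in the videos (the probabilities used below are assumed for illustration and are not taken from a prescribed example):

import heapq
import math

def huffman(probabilities: dict) -> dict:
    """Build a binary Huffman code by repeatedly merging the two least probable nodes."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # least probable node
        p1, _, c1 = heapq.heappop(heap)   # next least probable node
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"A": 0.5, "B": 0.2, "C": 0.2, "D": 0.1}     # assumed example probabilities
code = huffman(probs)
L = sum(probs[s] * len(w) for s, w in code.items())  # average codeword length
H = sum(-p * math.log2(p) for p in probs.values())   # source entropy
print(code, f"L = {L:.2f} digits, efficiency = {H / L:.1%}")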
To illustrate the concept of FEC and clarify some of the terminology described above let us
use the following example:
Let us assume that we want to transmit the following sequence of information bits, I1 = [0 1
1 0]. Since I1 has four bits, then k = 4. Now, to protect the information bits I1 against channel
errors, let us add three (3) bits to it such that a new sequence is formed, which is C1 = [0 1 1
0 1 0 1]. The new sequence C1 is called the codeword and is of length n = 7. It is clear that
n-k = 3, which is the number of redundancy or parity bits (1 0 1) at the end of C1.
Note that for every information sequence Ix generated, different parity bits are generated and a
new corresponding codeword Cx is created. A collection of the Cx forms a codebook, as shown
below:
$$C = \begin{bmatrix} C_1 \\ C_2 \\ C_3 \\ C_4 \\ C_5 \end{bmatrix} = \begin{bmatrix} 0110101 \\ 1011110 \\ 1100100 \\ 0101010 \\ 1010111 \end{bmatrix} \qquad (8.19)$$
We call the collection of codewords, C in (8.19), a code of size 5 (or cardinality 5) because it
has five (5) codewords. From here onwards a codebook will be written in the row vector
format, as shown below by re-writing the codebook in Equation (8.19):
C = [0 1 1 0 1 0 1, 1 0 1 1 1 1 0, 1 1 0 0 1 0 0, 0 1 0 1 0 1 0, 1 0 1 0 1 1 1]. (8.19)
To clarify the terminology used above, watch the following video. In this video, the basics of
Block Codes and parameters are explained.
https://www.youtube.com/watch?v=EJpxAQ_DUJ4&list=PLgwJf8NK-2e4CIG385dyc8-IIgFPG1NwY
EXAMPLE 4
Calculate the code rate of the code given by the codebook in Equation (8.19).
Solution:
The code rate is the ratio of information bits to codeword bits, R = k/n = 4/7 ≈ 0.57.
In the next subsection, linear block coding is introduced. Watch the following video clip to
have a better understanding of linear block coding.
https://www.youtube.com/watch?v=UVpRJa02Ys0&list=PLgwJf8NK-2e4CIG385dyc8-IIgFPG1NwY&index=8 (16:31)
C = [0 0 0 0 0 0 0, 0 0 0 1 1 1 1, 0 0 1 0 1 1 0, 0 0 1 1 0 0 1,
0 1 0 0 1 0 1, 0 1 0 1 0 1 0, 0 1 1 0 0 1 1, 0 1 1 1 1 0 0,
1 0 0 0 0 1 1, 1 0 0 1 1 0 0, 1 0 1 0 1 0 1, 1 0 1 1 0 1 0,
1 1 0 0 1 1 0, 1 1 0 1 0 0 1, 1 1 1 0 0 0 0, 1 1 1 1 1 1 1],
(8.20)
Taking any two codewords from the codebook in (8.20) and performing modulo-2 addition (⊕) will
produce another codeword in the codebook; hence this code is linear.
ACTIVITY 7
Show that the codebook in (8.20) is a linear code, by performing an addition between
codewords.
The Hamming distance, dH, between two codewords is the number of bit positions in which they
differ; for example, the codewords 1 0 0 0 0 1 1 and 0 1 0 0 1 0 1 in (8.20) differ in four positions,
so the Hamming distance between these two sequences is 4. The minimum Hamming distance,
dmin, of a code is found by calculating the Hamming distances, dH, between all pairs of
codewords in the codebook and then picking the smallest as dmin.
ACTIVITY 8
Calculate the Hamming distances between all the codewords of the code in (8.20). Then find
the minimum Hamming distance of the code.
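A short Python check for Activity 8 (an illustrative sketch, not the only way to do it), using the sixteen codewords of the codebook in (8.20):

from itertools import combinations

codebook = [
    "0000000", "0001111", "0010110", "0011001",
    "0100101", "0101010", "0110011", "0111100",
    "1000011", "1001100", "1010101", "1011010",
    "1100110", "1101001", "1110000", "1111111",
]

def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

d_min = min(hamming_distance(a, b) for a, b in combinations(codebook, 2))
print(f"minimum Hamming distance d_min = {d_min}")   # expected: 3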
It is clear that a code must satisfy some criteria; an important question therefore arises: how
were the codewords in the codebook in (8.20) selected or generated? The answer to this
question leads to the procedure for generating the codewords of a code. Linear block codes are
generated, and completely described, using a generator matrix, which we shall denote by G. Because
the generator of a code completely describes the code, it is also referred to as the code.
To illustrate how the generator of a code is used, the codebook in (8.20) is again used as an
example.
The (7, 4) Hamming code in (8.20) was generated using the generator below:
$$G = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix} \qquad (8.21)$$
Recall that given an (n, k) linear block code, k represents the length of the information
vector/sequence Ix, and n represents the length of the codeword sequence Cx. It is not a
coincidence that a (7, 4) Hamming code has a generator matrix that has 4 rows and 7 columns.
For each (n, k) linear block code, there exists a generator matrix with k rows and n columns.
Now returning to the encoding process, since the elements of the information sequences Ix
are taken from a binary alphabet/set (0, 1), then the total number of information
vectors/sequences is L = 2^k, i.e. there are L = 2^k possible binary sequences of length k; hence
the information sequences are I1, I2, I3, …, IL. To encode the L information sequences using
the generator matrix, G in (8.21), we simply multiply each Ix with G to produce a
corresponding codeword Cx:
Ix × G = Cx , (8.22)
where addition is done modulo 2 (i.e. using the exclusive OR operator “⊕”).
I1 = [0 0 0 0]
I2 = [0 0 0 1]
I3 = [0 0 1 0]
I4 = [0 0 1 1] (8.23)
For example, encoding I4 = [0 0 1 1] using Ix × G = Cx:

$$[0\ 0\ 1\ 1] \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix} = [0\ 0\ 1\ 1\ 0\ 0\ 1] \qquad (8.24)$$
ACTIVITY 9
Generate all the 16 possible information sequences, by following the illustration in (8.23), and
encode all the information sequences as demonstrated in (8.24). Check if all the sequences
generated are similar to the codewords in (8.20).
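The encoding of equation (8.22) can also be sketched in Python (illustrative only): each information sequence is multiplied by the generator matrix of (8.21) with modulo-2 arithmetic, which should reproduce the codewords of (8.20):

from itertools import product

# Generator matrix of the (7, 4) Hamming code, equation (8.21)
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(info):
    """Codeword c = i x G, with the additions performed modulo 2."""
    return [sum(info[row] * G[row][col] for row in range(4)) % 2 for col in range(7)]

for info in product((0, 1), repeat=4):   # all 16 information sequences
    print(info, "->", encode(info))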
When a codeword is received, the decoder will go through all the codewords in the table
(“look up the table”) for a codeword that is closest to the received codeword. A table like
Table 3, used for decoding, is sometimes referred to as a “lookup table”. Recall how the
Hamming distance is calculated: every received codeword is compared to all the codewords
in the lookup table using the Hamming distance. The codeword in the lookup table that has
the smallest Hamming distance to the received codeword is taken as the correct codeword,
and its corresponding information sequence is the correct information that was sent by the
transmitter.
For example, assume that C2 = [0 0 0 1 1 1 1] was transmitted and, as it went through the
channel, it got corrupted by noise and was received as C̃2 = [1 0 0 1 1 1 1]. For illustration
purposes, the following Hamming distances, dH are obtained with the three codewords in
Table 3:
dH(C1, C̃2) = 5
dH(C2, C̃2) = 1
dH(C16, C̃2) = 2                (8.25)
where dH(Ci, Cj) denotes the Hamming distance between codewords Ci and Cj. It is clear
from the Hamming distances in (8.25) that the codeword from the lookup table that results
in the smallest Hamming distance from C̃2 is C2; therefore, the decoder will select information
sequence I2 as the transmitted information. Thus, the one-bit error has been corrected and
the correct information sequence that was transmitted has been identified.
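Lookup-table (minimum-distance) decoding as described above can be sketched in Python (illustrative only); because the code generated by (8.21) is systematic, the information bits are simply the first four bits of the closest codeword:

codebook = [
    "0000000", "0001111", "0010110", "0011001",
    "0100101", "0101010", "0110011", "0111100",
    "1000011", "1001100", "1010101", "1011010",
    "1100110", "1101001", "1110000", "1111111",
]

def hamming_distance(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def decode(received: str) -> str:
    """Return the information bits of the codeword closest to the received word."""
    best = min(codebook, key=lambda c: hamming_distance(c, received))
    return best[:4]

received = "1001111"     # C2 = 0001111 corrupted by a single bit error
print(decode(received))  # expected: "0001", i.e. the information sequence I2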
Not all errors can be corrected; there is a limit to the number of errors any FEC code
can correct. A block code’s error correction capability is directly related to its minimum
Hamming distance. The number of correctable errors for a block code, Ne, is given by
$$N_e = \left\lfloor \frac{d_{min} - 1}{2} \right\rfloor \qquad (8.26)$$
where dmin is the minimum Hamming distance.
The lookup-table decoding method is not efficient, especially for codes with large cardinalities.
There is another procedure for decoding received codewords without storing all possible
codewords in the memory of the receiver; such a procedure will not be discussed in this study unit.
1.6 SUMMARY
This study unit discussed the concepts of information theory, entropy, and redundancy as
they are applied in digital communications. Information theory gives bounds on the amount
of information that can be transmitted over a channel; this information is measured in bits.
We used the concept of entropy to measure the average amount of information, in bits,
carried by source symbols generated with given probabilities. The
discussion went on to source coding which is mainly about compression of information
transmitted by a source. The idea of source coding is to remove redundant information before
transmission. Lastly, the fundamentals of error control (correction) coding, using block codes
as an example, were discussed.
(b) Define the efficiency of a code and determine the efficiency of the
code devised in part (a). [100%]
(c) Construct another code for the source of part (a), assigning equal-length
binary words irrespective of the occurrence probabilities of
the symbols. Calculate the efficiency of this code. [87.5%]
1.9 REFERENCES
Haykin, S. (2016). Communication systems. John Wiley & Sons.
Glover, I. A. & Grant, P. M. (2010). Digital communications. Pearson Education.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical
Journal, 27, 379-423.