
IT18303

INFORMATION AND
CODING THEORY
COMMUNICATION SYSTEM
OBJECTIVES

• To understand encoding and decoding of digital


data streams.
• To have a complete understanding of error–
control coding.
• To have a detailed knowledge of compression and
decompression techniques.
Syllabus
OUTCOMES

• To generate code words for different media elements.


• To derive entropy, mutual information and channel capacity
for all types of channels.
• To analyze the performance of digital communication
system by evaluating the probability of error for different
error correcting codes.
Unit-1
INFORMATION ENTROPY
FUNDAMENTALS
Information Theory
• Branch of probability theory applied to communication systems
• It resolves the following questions:
1) At what rate does a source generate information?
2) How complex is the signal if it is not compressed?
3) Over a noisy channel, what transmission rate allows reliable communication?

The answer is
Entropy (average information) < Capacity of the channel
Uncertainty
• Randomness
• Lack of predictability in a message
• e.g., probability = 1 or 0: the event is certain or impossible
• probability = 0.5: the event is most uncertain

The more uncertain the event, the more information is needed for certainty.


INFORMATION
• Consider a communication system with messages m1, m2, m3, …… having probabilities p1, p2, p3, ……; then the amount of information carried by message mk with probability pk is
I(mk) = log2(1/pk) bits
ENTROPY
• The entropy of a source is defined as the average information per individual message or symbol produced by the source in a particular interval.
• In a sequence of L messages, message m1 (probability p1) occurs approximately p1·L times.
• The amount of information in message m1 occurring once is log2(1/p1).
• The total amount of information due to the occurrences of m1 is therefore p1·L·log2(1/p1).

Cont…
• Thus the total amount of information due to all L messages is
p1·L·log2(1/p1) + p2·L·log2(1/p2) + ……
• Dividing by L gives the average information per message, i.e. the entropy
H = Σk pk·log2(1/pk) bits/message
PROPERTIES OF ENTROPY
• The entropy of a discrete memoryless source is
H = Σk pk·log2(1/pk)

• Property 1
1. Entropy is zero if the event is sure or impossible (pk = 1 or pk = 0)
Cont…
Property 2
2. Entropy H = log2 K when the K symbols are equally likely, i.e., pk = 1/K
Cont…
Property 3
3. The maximum upper bound on entropy is Hmax = log2 K
RATE OF INFORMATION
• RATE OF INFORMATION
* SOURCE CODING THEOREM
* CHANNEL CODING THEOREM
SOURCE CODING
• An important problem in communication system is the
efficient representation of data generated by a source, which
can be achieved by source encoding (or) source coding
process
• The device which performs source encoding is called source
encoder
EFFICIENCY OF SOURCE ENCODER

• What are the requirements that a source encoder should satisfy for it to be efficient?
1) Codewords should be binary in nature
2) The source code should be unique for every single message
NOISELESS CODING THEOREM
Shannon's information theory for discrete memoryless sources and channels involves three theorems:

(i) Shannon's first theorem (or) source coding theorem
(ii) Shannon's second theorem (or) channel coding theorem (or) Shannon's theorem on channel capacity
(iii) Shannon's third theorem (or) information capacity theorem (or) Shannon-Hartley theorem.
SHANNON'S FIRST THEOREM
• This theorem is also called the source coding theorem. Consider a discrete memoryless source.
Encoding process

The average codeword length is L = Σk pk·lk, where lk is the number of bits assigned to symbol k.

The coding efficiency of the source encoder is η = Lmin / L, where Lmin is the minimum possible value of L.
STATEMENT
• Shannon's first theorem is stated as: "Given a discrete memoryless source of entropy H, the average codeword length L for any distortionless source encoding is bounded as
L >= H"
• According to the source coding theorem, the entropy H represents the fundamental limit on the average number of bits per source symbol necessary to represent a discrete memoryless source.
• L can be made as close to H as desired, but not smaller than the entropy H; thus, with Lmin = H, the efficiency η is represented as

η = H / L
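As a quick illustration (not from the slides), the sketch below computes H, L and η for an assumed source distribution and an assumed set of codeword lengths; with the dyadic distribution chosen here the efficiency reaches 1.

import math

probs   = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}   # assumed source probabilities
lengths = {"A": 1,   "B": 2,    "C": 3,     "D": 3}        # assumed codeword lengths

H   = sum(p * math.log2(1.0 / p) for p in probs.values())  # entropy, bits/symbol
L   = sum(probs[s] * lengths[s] for s in probs)             # average codeword length
eta = H / L                                                  # efficiency (<= 1, since L >= H)

print(f"H = {H:.3f} bits, L = {L:.3f} bits, efficiency = {eta:.3f}")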
SHANNON'S SECOND THEOREM
• Why channel coding?
• Goal of channel coding
• Process
Cont…
• Approach
The code rate is given as r = k/n.

Suppose a discrete memoryless source with alphabet S has entropy H(S) bits per source symbol and emits one symbol every Ts seconds.
STATEMENT
Two parts
SHANNON'S THIRD THEOREM
The channel capacity of a band-limited white Gaussian channel is
C = B log2(1 + S/N)
where B is the channel bandwidth, S is the signal power and N is the noise power.
* SHANNON-FANO CODING
* HUFFMAN CODING
SOURCE CODING
The various source coding algorithms for data
compaction are
• Prefix coding
• Shannon-Fano coding
• Huffman coding
• Lempel-Ziv coding
Prefix coding
.
SHANON FANO CODING
• If the probablity of occurance of all the messages
are not equally likely, then average information or
entropy is reduced which in turn reduces
information rate (R)
• This can be solved by coding the message with
different number of bits according to their
occurance
• High probable-less no of bits and less probable-
more no of bits
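The recursive splitting idea can be sketched as below; the symbol probabilities are hypothetical, and ties in the split point may be resolved differently in other treatments.

def shannon_fano(symbols):
    """symbols: list of (symbol, probability); returns dict symbol -> code string."""
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        best_cut, best_diff = 1, float("inf")
        for i in range(1, len(group)):               # find the most balanced split
            upper_sum = sum(p for _, p in group[:i])
            diff = abs(total - 2 * upper_sum)
            if diff < best_diff:
                best_cut, best_diff = i, diff
        upper, lower = group[:best_cut], group[best_cut:]
        for s, _ in upper:
            codes[s] += "0"                           # more probable half gets a 0
        for s, _ in lower:
            codes[s] += "1"                           # less probable half gets a 1
        split(upper)
        split(lower)

    split(sorted(symbols, key=lambda sp: sp[1], reverse=True))
    return codes

if __name__ == "__main__":
    probs = [("A", 0.4), ("B", 0.2), ("C", 0.2), ("D", 0.1), ("E", 0.1)]
    print(shannon_fano(probs))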
Problem - Shannon-Fano coding
.
Cont…
For calculating entropy
Cont…
• For calculating number of messages and
efficiency
HUFFMAN CODING
• Huffman coding uses the same principle as Shannon-Fano coding, i.e., assigning different numbers of binary digits to the messages according to their probabilities of occurrence.

• This coding method leads to the lowest possible value of L for a given message set M, resulting in maximum efficiency.

• It is also known as the minimum redundancy code or optimum code.
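A minimal Huffman coding sketch using a priority queue; the probabilities are hypothetical and the 0/1 labelling of branches is an arbitrary convention, so the exact codewords (but not their lengths) may differ from a hand-worked solution.

import heapq
from itertools import count

def huffman(symbols):
    """symbols: list of (symbol, probability); returns dict symbol -> code string."""
    tiebreak = count()                               # keeps heap comparisons well defined
    heap = [(p, next(tiebreak), s) for s, p in symbols]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)            # merge the two least probable nodes
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix=""):
        if isinstance(node, tuple):                  # internal node of the code tree
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                        # leaf: a source symbol
            codes[node] = prefix or "0"
    walk(heap[0][2])
    return codes

if __name__ == "__main__":
    probs = [("A", 0.4), ("B", 0.2), ("C", 0.2), ("D", 0.1), ("E", 0.1)]
    print(huffman(probs))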
Huffman coding- Problem
.
Cont…
Cont…
.
DISADVANTAGES OF HUFFMAN CODING

• It requires the probabilities of the source symbols to be known.

• It assigns variable-length codes to symbols, hence it is not suitable for synchronous transmission.
MUTUAL INFORMATION
• The difference between the entropy of the channel input, H(X), and the conditional entropy remaining after the output is observed, H(X|Y), is called the mutual information:
I(X;Y) = H(X) - H(X|Y)
PROPERTIES OF MUTUAL INFORMATION
1. The mutual information of a channel is symmetric: I(X;Y) = I(Y;X)

2. The mutual information is always non-negative: I(X;Y) >= 0

3. The mutual information of a channel is related to the joint entropy of the channel input and the channel output by
I(X;Y) = H(X) + H(Y) - H(X,Y)
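A small numerical sketch, assuming a hypothetical joint distribution p(x,y) for a binary channel, showing how property 3 can be used to evaluate I(X;Y).

import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# assumed joint pmf p(x, y) for a noisy binary channel
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}   # marginal of X
py = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}   # marginal of Y

I = entropy(px.values()) + entropy(py.values()) - entropy(joint.values())
print(f"I(X;Y) = {I:.4f} bits")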
CHANNEL CAPACITY
• What is channel capacity?

• It must satisfy two constraints


Cont…
• Transmission efficiency

• Redundancy
TYPES OF CHANNEL

• NOISELESS CHANNEL
• NOISY CHANNEL WITH NON-OVERLAPPING OUTPUT
• SYMMETRIC CHANNEL
• NOISY CHANNEL (OR) USELESS CHANNEL
Binary Symmetric Channel
Calculation of P(Y/X)
H(Y/X)
To Find H(y)
Channel capacity with rs=1 sec
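For the binary symmetric channel, the capacity per channel use is the standard result C = 1 - H(p), where H(p) is the binary entropy of the crossover probability; with rs = 1 symbol/sec this is also the capacity in bits/sec. A small sketch:

import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    return 1.0 - binary_entropy(p)          # bits per channel use

if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5):
        print(f"p = {p}: C = {bsc_capacity(p):.4f} bits/use")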
Binary Erasure Channel(BEC)
Error Control Coding
Introduction
• Error Control Coding (ECC)
– Extra bits are added to the data at the transmitter
(redundancy) to permit error detection or
correction at the receiver
– Done to prevent the output of erroneous bits
despite noise and other imperfections in the
channel
– The positions of the error control coding and
decoding are shown in the transmission model
Transmission Model

[Block diagram]
Transmitter: Digital Source -> Source Encoder -> Error Control Coding -> Line Coding -> Modulator (Transmit Filter, etc.), output X(w) -> Channel Hc(w)
Noise N(w) is added in the channel.
Receiver: Y(w) -> Demodulator (Receive Filter, etc.) -> Line Decoding -> Error Control Decoding -> Source Decoder -> Digital Sink
Error Models
• Binary Symmetric Memoryless Channel
– Assumes transmitted symbols are binary
– Errors affect ‘0’s and ‘1’s with equal probability
(i.e., symmetric)
– Errors occur randomly and are independent
from bit to bit (memoryless)
[Transition diagram]
0 -> 0 with probability 1-p, 0 -> 1 with probability p
1 -> 1 with probability 1-p, 1 -> 0 with probability p
p is the probability of bit error, or the Bit Error Rate (BER), of the channel
Error Models
• Many other types
• Burst errors, i.e., contiguous bursts of bit
errors
– output from DFE (error propagation)
– common in radio channels
– Insertion, deletion and transposition errors
• We will consider mainly random errors
Error Control Techniques
• Error detection in a block of data
– Can then request a retransmission, known as
automatic repeat request (ARQ) for sensitive
data
– Appropriate for
• Low delay channels
• Channels with a return path
– Not appropriate for delay sensitive data, e.g.,
real time speech and data
Error Control Techniques
• Forward Error Correction (FEC)
– Coding designed so that errors can be corrected at
the receiver
– Appropriate for delay sensitive and one-way
transmission (e.g., broadcast TV) of data
– Two main types, namely block codes and
convolutional codes. We will only look at block
codes
Block Codes
• We will consider only binary data
• Data is grouped into blocks of length k bits
(dataword)
• Each dataword is coded into blocks of length n
bits (codeword), where in general n>k
• This is known as an (n,k) block code
Block Codes

• A vector notation is used for the datawords


and codewords,
– Dataword d = (d1 d2….dk)
– Codeword c = (c1 c2……..cn)
• The redundancy introduced by the code is
quantified by the code rate,
– Code rate = k/n
– i.e., the higher the redundancy, the lower the
code rate
Block Code - Example

• Dataword length k = 4
• Codeword length n = 7
• This is a (7,4) block code with code rate = 4/7
• For example, d = (1101), c = (1101001)
Error Control Process
[Block diagram]
Source data (e.g., ...101101 1000...) is chopped into blocks; each dataword (k bits) enters the channel coder, which outputs a codeword (n bits).
The channel delivers the codeword plus possible errors (n bits) to the channel decoder, which outputs the recovered dataword (k bits) together with error flags.
Error Control Process
• Decoder gives corrected data
• May also give error flags to
– Indicate reliability of decoded data
– Helps with schemes employing multiple layers of
error correction
Hamming Distance
• Error control capability is determined by
the Hamming distance
• The Hamming distance between two
codewords is equal to the number of
differences between them, e.g.,
10011011
11010010 have a Hamming distance = 3
• Alternatively, can compute by adding
codewords (mod 2)
=01001001 (now count up the ones)
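A one-line check of this example (illustrative sketch):

def hamming_distance(a: str, b: str) -> int:
    assert len(a) == len(b)
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))   # count differing positions

print(hamming_distance("10011011", "11010010"))  # -> 3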
Hamming Distance
• The Hamming distance of a code is equal to
the minimum Hamming distance between
two codewords
• If Hamming distance is:
1 – no error control capability; i.e., a single error
in a received codeword yields another valid
codeword
XXXXXXX X is a valid codeword
Note that this representation is
diagrammatic only.
In reality each codeword is surrounded by n
codewords. That is, one for every bit that
could be changed
Hamming Distance
• If Hamming distance is:
2 – can detect single errors (SED); i.e., a single error
will yield an invalid codeword
XOXOXO X is a valid codeword
O is not a valid codeword
See that 2 errors will yield a valid (but
incorrect) codeword
Hamming Distance
• If Hamming distance is:
3 – can correct single errors (SEC) or can detect
double errors (DED)
XOOXOOX X is a valid codeword
O is not a valid codeword
See that 3 errors will yield a valid but incorrect
codeword
Hamming Distance - Example

• Hamming distance 3 code, i.e., SEC/DED


– Or can perform single error correction (SEC)
10011011 X
This code corrected this way
11011011 O
11010011 O This code corrected this way
11010010 X

X is a valid codeword
O is an invalid codeword
Hamming Distance
• The maximum number of detectable errors is
dmin - 1
• The maximum number of correctable errors is given by
t = ⌊ (dmin - 1) / 2 ⌋
where dmin is the minimum Hamming distance between 2 codewords and ⌊.⌋ means rounding down to the nearest integer
Linear Block Codes

• As seen from the second parity code example, it is possible to use a table to hold all the codewords for a code and to look up the appropriate codeword based on the supplied dataword
• Alternatively, it is possible to create codewords by addition of other codewords. This has the advantage that there is no longer a need to hold every possible codeword in the table.
Linear Block Codes
• If there are k data bits, all that is required is to hold k
linearly independent codewords, i.e., a set of k
codewords none of which can be produced by linear
combinations of 2 or more codewords in the set.
• The easiest way to find k linearly independent
codewords is to choose those which have ‘1’ in just
one of the first k positions and ‘0’ in the other k-1 of
the first k positions.
Linear Block Codes
• For example for a (7,4) code, only four
codewords are required, e.g.,
1 0 0 0 1 1 0
0 1 0 0 1 0 1
0 0 1 0 0 1 1
0 0 0 1 1 1 1
• So, to obtain the codeword for dataword 1011,
the first, third and fourth codewords in the list
are added together, giving 1011010
• This process will now be described in more detail
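A short sketch of this encoding step, using the four generator rows listed above; encoding the dataword 1011 reproduces the codeword 1011010.

# (7,4) generator rows from the slide above
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(dataword, G):
    codeword = [0] * len(G[0])
    for bit, row in zip(dataword, G):
        if bit:                                      # add the row (mod 2) when the data bit is 1
            codeword = [c ^ r for c, r in zip(codeword, row)]
    return codeword

print(encode([1, 0, 1, 1], G))   # -> [1, 0, 1, 1, 0, 1, 0], i.e. 1011010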
Linear Block Codes
• An (n,k) block code has code vectors
d=(d1 d2….dk) and
c=(c1 c2……..cn)
• The block coding process can be written as
c=dG
where G is the Generator Matrix

G = [ a11 a12 ... a1n ]   [ a1 ]
    [ a21 a22 ... a2n ] = [ a2 ]
    [ .   .   ...  .  ]   [ .  ]
    [ ak1 ak2 ... akn ]   [ ak ]
Linear Block Codes
• Thus,
c = Σ (i = 1 to k) di ai

• The ai must be linearly independent, i.e.,
Since codewords are given by summations of the ai vectors, then to avoid 2 datawords having the same codeword the ai vectors must be linearly independent
Linear Block Codes
• Sum (mod 2) of any 2 codewords is also a codeword, i.e.,
Since for datawords d1 and d2 we have
d3 = d1 + d2
So,
c3 = Σ d3i ai = Σ (d1i + d2i) ai = Σ d1i ai + Σ d2i ai   (all sums i = 1 to k)

c3 = c1 + c2
Linear Block Codes

• 0 is always a codeword, i.e.,


Since all zeros is a dataword then,
k
c   0 ai  0
i 1
Error Correcting Power of LBC
• The Hamming distance of a linear block code (LBC) is
simply the minimum Hamming weight (number of
1’s or equivalently the distance from the all 0
codeword) of the non-zero codewords
• Note d(c1,c2) = w(c1+ c2) as shown previously
• For an LBC, c1+ c2=c3
• So min (d(c1,c2)) = min (w(c1+ c2)) = min (w(c3))
• Therefore to find the min Hamming distance we just need to search among the 2^k codewords for the min Hamming weight – far simpler than doing a pairwise check over all possible codeword pairs.
Linear Block Codes – example 1

• For example, a (4,2) code; suppose

G = [ 1 0 1 1 ]      a1 = [1011]
    [ 0 1 0 1 ]      a2 = [0101]

• For d = [1 1], then

c = a1 + a2 = [1 0 1 1] + [0 1 0 1] = [1 1 1 0]
Linear Block Codes – example 2

• A (6,5) code with

G = [ 1 0 0 0 0 1 ]
    [ 0 1 0 0 0 1 ]
    [ 0 0 1 0 0 1 ]
    [ 0 0 0 1 0 1 ]
    [ 0 0 0 0 1 1 ]

is an even single-parity-check code
Systematic Codes
• For a systematic block code the dataword
appears unaltered in the codeword –
usually at the start
• The generator matrix has the structure (k columns then R columns, R = n-k):

G = [ 1 0 .. 0 | p11 p12 .. p1R ]
    [ 0 1 .. 0 | p21 p22 .. p2R ]   =  [ I | P ]
    [ .. .. .. | ..  ..  ..  .. ]
    [ 0 0 .. 1 | pk1 pk2 .. pkR ]

• P is often referred to as the parity bits


Systematic Codes
• I is k*k identity matrix. Ensures dataword
appears as beginning of codeword
• P is k*R matrix.
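A small sketch of systematic encoding, c = [dataword | parity], using the parity part P of the (7,4) generator rows shown earlier; this is illustrative rather than a general library routine.

# Parity part P of the (7,4) generator matrix used above
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

def encode_systematic(dataword, P):
    parity = [0] * len(P[0])
    for bit, row in zip(dataword, P):
        if bit:                                      # accumulate rows of P mod 2
            parity = [p ^ r for p, r in zip(parity, row)]
    return list(dataword) + parity                   # dataword followed by parity bits

print(encode_systematic([1, 0, 1, 1], P))            # -> [1, 0, 1, 1, 0, 1, 0]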
Decoding Linear Codes
• One possibility is a ROM look-up table
• In this case received codeword is used as an address
• Example – Even single parity check code;
Address Data
000000 0
000001 1
000010 1
000011 0
……… .
• Data output is the error flag, i.e., 0 – codeword OK, 1 – error detected
• If no error, dataword is first k bits of codeword
• For an error correcting code the ROM can also store
datawords
Decoding Linear Codes
• Another possibility is algebraic decoding, i.e.,
the error flag is computed from the received
codeword (as in the case of simple parity
codes)
• How can this method be extended to more
complex error detection and correction
codes?
CYCLIC CODES
Motivation & Properties of cyclic code

• Cyclic codes are a class of linear block codes. Thus, we can find a generator matrix (G) and a parity-check matrix (H).
• The reason for their practical interest is that they can be easily implemented with extremely cost-effective electronic circuits.
Definition
• An (n,k) linear code C is cyclic if every cyclic
shift of a codeword in C is also a codeword in
C.
If c0 c1 c2 …. cn-2 cn-1 is a codeword, then
cn-1 c0 c1 …. cn-3 cn-2
cn-2 cn-1 c0 …. cn-4 cn-3
: : : : :
c1 c2 c3 …. cn-1 c0 are all codewords.
Example: (6,2) repetition code

C = { 000000, 010101, 101010, 111111 }

is a cyclic code.
Example 2: (5,2) linear block code

G = [ 1 0 1 1 1 ]
    [ 0 1 1 0 1 ]

is a single-error-correcting code; the set of codewords is

C = { 00000, 10111, 01101, 11010 }

Thus, it is not a cyclic code since, for example, the cyclic shift of [10111] is [11011], which is not in C.
Example 3
• The (7,4) Hamming code discussed before is cyclic:

1010001 1110010 0000000 1111111


1101000 0111001
0110100 1011100
0011010 0101110
0001101 0010111
1000110 1001011
0100011 1100101
Generator matrix of a non-systematic (n,k) cyclic code
• The generator matrix will be in this form:

G = [ g0 g1 .. g(n-k-1) g(n-k)  0       0      ..  0 ]
    [ 0  g0 g1 ..       g(n-k-1) g(n-k) 0      ..  0 ]
    [ 0  0  g0 g1 ..    g(n-k-1) g(n-k)        ..  0 ]
    [ ..                                             ]
    [ 0  0  0  ..  0    g0  ..   g(n-k-1) g(n-k)     ]

Notice that the k rows are merely cyclic shifts of the basis vector
g = [ g0 g1 .. g(n-k-1) g(n-k) 0 0 .. 0 ]

• The code vectors are:
c = m G, where m = [m0 m1 .. m(k-1)]

c0 = m0 g0
c1 = m0 g1 + m1 g0
c2 = m0 g2 + m1 g1 + m2 g0
....

Notice that
c_i = Σ (j = 0 to k-1) m_j g_(i-j), with terms taken as 0 when the subscripts fall outside their ranges.

This summation is a convolution between m and g.

• It would be much easier if we dealt with multiplication; this transformation is done using the polynomial representation.
Code Polynomial
• Let c = c0 c1 c2 …. cn-1. The code polynomial of
c: c(X) = c0 + c1X+ c2 X2 + …. + cn-1 Xn-1
where the power of X corresponds to the bit position, and
the coefficients are 0’s and 1’s.
• Example:
1010001 1+X2+X6
0101110 X+X3+X4+X5
Each codeword is represented by a polynomial of degree less than or equal to n-1: deg[c(X)] <= n-1.
The addition and multiplication are as follows:
a x^j + b x^j = (a + b) x^j and (a x^j)·(b x^k) = (a·b) x^(j+k),
where (a+b) and (a·b) are computed under GF(2), but j+k is ordinary integer addition.

Example:
m(x) = m0 + m1 x + m2 x^2
g(x) = g0 + g1 x

Addition:       m(x) + g(x) = (m0 + g0) + (m1 + g1) x + (m2 + 0) x^2
Multiplication: m(x) g(x) = m0 g0 + (m0 g1 + m1 g0) x + (m1 g1 + m2 g0) x^2 + m2 g1 x^3

Notice that in multiplication the coefficients are the same as the convolution sum.
Implementing the Shift
Let c = c0 c1 c2 …. cn-1
and c(i) = cn-i cn-i+1 …. cn-1 c0 …. cn-i-1 (i shifts to the right)
c(X) = c0 + c1X+ c2 X2 + …. + cn-1 Xn-1
c (i)(X) = cn-i + cn-i+1 X + …. + cn-1 Xi-1 + …. +
c0Xi +…. +cn-i-1 Xn-1
What is the relation between c(X) and c (i)(X)?
Apparently, shifting a bit one place to the right is equivalent
to multiplying the term by X.
Xic(X)= c0Xi +c1X i+1 + ….+ cn-i-1 Xn-1 + cn-i Xn ….+ cn-1 Xn+i-1
Implementing the Shift (cont’d)
Xic(X) = cn-i Xn +…+cn-1 Xn+i-1 +c0Xi +c1X i+1 + …+ cn-i-1 Xn-1
The first i terms have powers >= n, and are not suitable for representing bit locations.
Add to the polynomial the zero-valued sequence:
(cn-i + cn-i ) + (cn-i+1 + cn-i+1 )X + …. + (cn-1 + cn-1 )Xi-1
Xic(X) = cn-i (Xn +1) + cn-i+1 X (Xn +1)+…. +cn-1 Xi-1 (Xn +1)+ cn-i
+ cn-i+1 X +…. +cn-1 Xi-1+
c0Xi +c1X i+1 + …. + cn-i-1 Xn-1
That is:
Xic(X) = q(X)(Xn +1) + c(i)(X)
Implementing the Shift (cont’d)
c(i)(X) is the remainder from dividing Xic(X) by (Xn +1).
c(i)(X) = Rem[Xic(X)/ (Xn +1)] = Xic(X) mod (Xn +1).
Example:
c = 0101110. c(X) = X + X3 + X4 + X5.
X3c(X) = X4 + X6 + X7 + X8
Rem[X3c(X)/ (X7 +1)] = 1 + X + X4 + X6 [Show]
c(3) = 1100101
Short cut of long division:
Xic(X)|Xn=1 = q(X)(Xn +1) |Xn=1 + c(i)(X) |Xn=1
That is c(i)(X) = Xic(X)|Xn=1
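The shift-as-remainder relation can be checked with a small sketch; the polynomial is stored as a list of GF(2) coefficients, lowest power first, and the values reproduce the example above (c = 0101110, i = 3).

def poly_mod_xn_plus_1(coeffs, n):
    """Reduce a GF(2) polynomial modulo X^n + 1 (X^n is congruent to 1)."""
    reduced = [0] * n
    for power, bit in enumerate(coeffs):
        if bit:
            reduced[power % n] ^= 1
    return reduced

def cyclic_shift(c, i):
    n = len(c)
    shifted = [0] * i + list(c)          # multiply c(X) by X^i
    return poly_mod_xn_plus_1(shifted, n)

c = [0, 1, 0, 1, 1, 1, 0]                # 0101110, c(X) = X + X^3 + X^4 + X^5
print(cyclic_shift(c, 3))                 # -> [1, 1, 0, 0, 1, 0, 1] = 1100101, as in the slide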
More on Code Polynomials
• The nonzero code polynomial of minimum degree in a
cyclic code C is unique.
(If not, the sum of the two polynomials will be a code polynomial of degree less than
the minimum. Contradiction)

• Let g(X) = g0 + g1X +….+ gr-1Xr-1 +Xr be the nonzero code


polynomial of minimum degree in an (n,k) cyclic code.
Then the constant term g0 must be equal to 1.
(If not, then one cyclic shift to the left will produce a code polynomial of degree less
than the minimum. Contradiction)

• For the (7,4) code given in the Table, the nonzero code
polynomial of minimum degree is g(X) = 1 + X + X3
Generator Polynomial
• Since the code is cyclic: Xg(X), X2g(X),…., Xn-r-1g(X) are
code polynomials in C. (Note that deg[Xn-r-1g(X)] = n-1).
• Since the code is linear:
(a0 + a1X + …. + an-r-1 Xn-r-1)g(X) is also a code
polynomial, where ai = 0 or 1.
• A binary polynomial of degree n-1 or less is a code
polynomial if and only if it is a multiple of g(X).
(First part shown. Second part: if a code polynomial c(X) is not a
multiple of g(X), then Rem[c(X)/g(X)] must be a code polynomial of
degree less than the minimum. Contradiction)
Generator Polynomial (cont’d)
• All code polynomials are generated from the multiplication c(X) = a(X)g(X).
deg[c(X)] <= n-1, deg[g(X)] = r  ==>  deg[a(X)] <= n-r-1
# codewords (2^k) = # different ways of forming a(X) (2^(n-r))
Therefore, r = deg[g(X)] = n-k
• Since deg[a(X)] <= k-1, the polynomial a(X) may be taken to be the information polynomial u(X) (a polynomial whose coefficients are the information bits). Encoding is performed by the multiplication c(X) = u(X)g(X).
• g(X), the generator polynomial, completely defines the code.
(7,4) Code Generated by 1+X+X 3

Infor. Code Code polynomials


0000 0000000 0 = 0 . g(X)
1000 1101000 1 + X + X3 = 1 . g(X)
0100 0110100 X + X2 + X4 = X . g(X)
1100 1011100 1 + X2 + X3 + X4 = (1 + X) . g(X)
0010 0011010 X2 + X3 + X5 = X2 . g(X)
1010 1110010 1 + X+ X2 + X5 = (1 + X2) . g(X)
0110 0101110 X+ X3 + X4 + X5 = (X+ X2) . g(X)
1110 1000110 1 + X4 + X5 = (1 + X + X2) . g(X)
0001 0001101 X3 + X4 + X6 = X3 . g(X)
(7,4) Code Generated by 1+X+X3
(Cont’d)
Infor. Code Code polynomials
1001 1100101 1 + X + X4 + X6 = (1 + X3) . g(X)
0101 0111001 X+ X2 + X3 + X6 = (X+ X3) . g(X)
1101 1010001 1 + X2 + X6 = (1 + X + X3) . g(X)
0011 0010111 X2 + X4 + X5 + X6 = (X2 + X3). g(X)
1011 1111111 1 + X + X2 + X3 + X4 + X5 + X6
= (1 + X2 + X3) . g(X)
0111 0100011 X + X5 + X6 = (X + X2 + X3). g(X)
1111 1001011 1 + X3 + X5 + X6
= (1 + X + X2 + X3) . g(X)
Constructing g(X)
• The generator polynomial g(X) of an (n,k) cyclic code is a
factor of Xn+1.
Xkg(X) is a polynomial of degree n.
Xkg(X)/ (Xn+1)=1 and remainder r(X). Xkg(X) = (Xn+1)+ r(X).
But r(X)=Rem[Xkg(X)/(Xn+1)]=g(k)(X) =code polynomial= a(X)g(X).
Therefore, Xn+1= Xkg(X) + a(X)g(X)= {Xk + a(X)}g(X). Q.E.D.

(1) To construct a cyclic code of length n, find the factors of the polynomial Xn+1.
(2)The factor (or product of factors) of degree n-k serves as
the generator polynomial of an (n,k) cyclic code. Clearly,
a cyclic code of length n does not exist for every k.
Constructing g(X) (cont’d)
(3)The code generated this way is guaranteed to be cyclic.
But we know nothing yet about its minimum distance. The
generated code may be good or bad.
Example: What cyclic codes of length 7 can be constructed?
X7+1 = (1 + X)(1 + X + X3)(1 + X2 + X3)
g(X) Code g(X) Code
(1 + X) (7,6) (1 + X)(1 + X + X3) (7,3)
(1 + X + X3) (7,4) (1 + X) (1 + X2 + X3) (7,3)
(1 + X2 + X3) (7,4) (1 + X + X3)(1 + X2 + X3) (7,6)
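A quick sketch that multiplies polynomials over GF(2) to confirm the factorisation of X^7 + 1 used in this table.

def gf2_poly_mul(a, b):
    """Multiply GF(2) polynomials given as coefficient lists, lowest power first."""
    result = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                result[i + j] ^= bj          # XOR accumulates coefficients mod 2
    return result

f1 = [1, 1]             # 1 + X
f2 = [1, 1, 0, 1]       # 1 + X + X^3
f3 = [1, 0, 1, 1]       # 1 + X^2 + X^3

print(gf2_poly_mul(gf2_poly_mul(f1, f2), f3))  # -> [1, 0, 0, 0, 0, 0, 0, 1], i.e. 1 + X^7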
Circuit for Multiplying Polynomials (1)

• u(X) = uk-1Xk-1 + …. + u1X + u0


• g(X) = grXr + gr-1Xr-1 + …. + g1X + g0
• u(X)g(X) = uk-1grXk+r-1
+ (uk-2gr+ uk-1gr-1)Xk+r-2 + ….
+ (u0g2+ u1g1 +u2g0)X2 +(u0g1+ u1g0)X +u0g0

[Circuit diagram: multiplier taps gr, gr-1, gr-2, …, g1, g0 feed a chain of mod-2 adders between shift-register stages; the input is shifted in serially and the product appears at the output]
Circuit for Multiplying Polynomials (2)

• u(X)g(X) = uk-1Xk-1(grXr + gr-1Xr-1 + …. + g1X + g0)


+ ….
+ u1X(grXr + gr-1Xr-1 + …. + g1X + g0)
+ u0(grXr + gr-1Xr-1 + …. + g1X + g0)

[Circuit diagram: alternative multiplier with taps g0, g1, g2, …, gr-1, gr and mod-2 adders; input shifted in, product at the output]
Systematic Cyclic Codes
Systematic: b0 b1 b2 …. bn-k-1 u0 u1 u2 …. uk-1
b(X) = b0 + b1X+….+bn-k-1Xn-k-1, u(X) = u0+u1X+ ….+uk-1Xk-1
then c(X) = b(X) + Xn-k u(X)
a(X)g(X) = b(X) + Xn-k u(X)
Xn-k u(X)/g(X) = a(X) + b(X)/g(X)
Or b(X) = Rem[Xn-k u(X)/g(X)]
Encoding Procedure:
1. Multiply u(X) by Xn-k
2. Divide Xn-k u(X) by g(X), obtaining the remainder b(X).
3. Add b(X) to Xn-k u(X), obtaining c(X) in systematic form.
Systematic Cyclic Codes (cont’d)
Example
Consider the (7,4) cyclic code generated by
g(X) = 1 + X + X3. Find the systematic codeword for
the message 1001.
u(X) = 1 + X3
X3u(X) = X3 + X6
b(X) = Rem[X3u(X)/g(X)] = X3u(X) evaluated with X3 = 1 + X (since g(X) = 0 implies X3 = 1 + X)
= X3(X3 + 1) = (1 + X)·X = X + X2
Therefore, c = 0111001
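A compact sketch of the three-step systematic encoding procedure, reproducing this worked example (g(X) = 1 + X + X^3, message 1001). Polynomials are GF(2) coefficient lists, lowest power first.

def gf2_poly_rem(dividend, divisor):
    """Remainder of GF(2) polynomial division."""
    rem = list(dividend)
    for i in range(len(rem) - 1, len(divisor) - 2, -1):
        if rem[i]:                                   # cancel the leading term
            shift = i - (len(divisor) - 1)
            for j, d in enumerate(divisor):
                rem[shift + j] ^= d
    return rem[:len(divisor) - 1]

def cyclic_encode_systematic(u, g, n):
    k = len(u)
    shifted = [0] * (n - k) + list(u)                # step 1: X^(n-k) u(X)
    b = gf2_poly_rem(shifted, g)                     # step 2: parity polynomial b(X)
    return b + list(u)                               # step 3: c = parity followed by message

g = [1, 1, 0, 1]                                     # g(X) = 1 + X + X^3
u = [1, 0, 0, 1]                                     # message 1001, u(X) = 1 + X^3
print(cyclic_encode_systematic(u, g, 7))             # -> [0, 1, 1, 1, 0, 0, 1] = 0111001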
Circuit for Dividing Polynomials

[Circuit diagram: divider with taps g0, g1, g2, …, gr-1, gr and mod-2 adders between register stages; the quotient appears at the output and the remainder is left in the register]
Encoder Circuit

[Circuit diagram: dividing register with a feedback gate and taps g1, g2, …, gr-1]
• Gate ON. k message bits are shifted into the channel. The
parity bits are formed in the register.
• Gate OFF. Contents of register are shifted into the channel.
(7,4) Encoder Based on 1 + X + X 3

[Encoder circuit for g(X) = 1 + X + X3: dividing register with input gate]

Input: 1 1 0 1
Register: 000 (initial), 110 (1st shift), 101 (2nd shift), 100 (3rd shift), 100 (4th shift)

Codeword: 1 0 0 1 0 1 1
Parity-Check Polynomial
• Xn +1 = g(X)h(X)
• deg[g(x)] = n-k, deg[h(x)] = k
• g(x)h(X) mod (Xn +1) = 0.
• h(X) is called the parity-check polynomial. It plays the role of the H matrix for linear codes.
• h(X) is the generator polynomial of an (n,n-k)
cyclic code, which is the dual of the (n,k) code
generated by g(X).
Decoding of Cyclic Codes

• STEPS:
(1) Syndrome computation
(2) Associating the syndrome to the error pattern
(3) Error correction
Syndrome Computation

• Received word: r(X) = r0 + r1X +….+ rn-1Xn-1


• If r(X) is a correct codeword, it is divisible by g(X). Otherwise:
r(X) = q(X)g(X) + s(X).
• deg[s(X)] <= n-k-1.
• s(X) is called the syndrome polynomial.
• s(X) = Rem[r(X)/g(X)] = Rem[ (a(X)g(X) + e(X))/g(x)] =
Rem[e(X)/g(X)]
• The syndrome polynomial depends on the error pattern only.
• s(X) is obtained by shifting r(X) into a divider-by-g(X) circuit.
The register contents are the syndrome bits.
Example: Circuit for Syndrome Computation

[Syndrome computation circuit: a divide-by-g(X) shift register with feedback, input gated in]
r = 0010110

Shift   Input   Register contents
0               0 0 0 (initial state)
1       0       0 0 0
2       1       1 0 0
3       1       1 1 0
4       0       0 1 1
5       1       0 1 1
6       0       1 1 1
7       0       1 0 1 (syndrome s)

Exercises: What is g(X)? Find the syndrome using long division. Find the syndrome using the shortcut for the remainder.
7 0 1 0 1 (syndrome s)
Association of Syndrome to Error Pattern

• Look-up table implemented via a combinational logic circuit (CLC). The


complexity of the CLC tends to grow exponentially with the code length
and the number of errors to correct.
• Cyclic property helps in simplifying the decoding circuit.
• The circuit is designed to correct the error in a certain location only, say
the last location. The received word is shifted cyclically to trap the error,
if it exists, in the last location and then correct it. The CLC is simplified
since it is only required to yield a single output e telling whether the
syndrome, calculated after every cyclic shift of r(X), corresponds to an
error at the highest-order position.
• The received digits are thus decoded one at a time.
Meggit Decoder
Shift r(X) into the buffer B and the syndrome register R
simultaneously. Once r(X) is completely shifted in B, R
will contain s(X), the syndrome of r(X).
1. Based on the contents of R, the detection circuit yields
the output e (0 or 1).
2. During the next clock cycle:
(a) Add e to the rightmost bit of B while shifting the
contents of B. (The rightmost bit of B may be read out).
Call the modified content of B r1(1)(X).
Meggit Decoder (cont’d)
(b) Add e to the left of R while shifting the
contents of R. The modified content of R is s1(1)
(X), the syndrome of r1(1)(X) [will be shown
soon].
Repeat steps 1-2 n times.
General Decoding Circuit
.
More on Syndrome Computation
• Let s(X) be the syndrome of a received polynomial r(X) = r 0 +
r1X +….+ rn-1Xn-1 . Then the remainder resulting from dividing
Xs(X) by g(X) is the syndrome of r(1)(X), which is a cyclic shift
of r(X).
• Proof: r(X) = r0 + r1X +….+ rn-1Xn-1
r(1)(X) = rn-1 + r0X +….+ rn-2Xn-1 = rn-1 + Xr(X) + rn-1Xn
= rn-1(Xn+1) + Xr(X)
c(X)g(X) + y(X) = rn-1 g(X)h(X)+ X{a(X)g(x) + s(X)}
where y(X) is the syndrome of r(1)(X) .
Xs(X) = {c(X) + a(X) + rn-1 h(X)}g(X) + y(X)
Therefore, Syndrome of r(1)(X)= Rem[Xs(X)/g(X)]. Q.E.D.
More on Syndrome Computation (cont’d)
Note: for simplicity of notation, let Rem[Xs(X)/g(X)] be
denoted by s(1)(X). s(1)(X) is NOT a cyclic shift of s(X), but
the syndrome of r(1)(X) which is a cyclic shift of r(X).
Example:
r(X) = X2 + X4 + X5; g(X) = 1 + X + X3
s(X) = Rem[r(X)/g(X)] = 1 + X2
r(1)(X) = X3 + X5 + X6
s(1)(X) = Rem[r(1)(X)/g(X)] = 1 (polynomial)
Also, s(1)(X) = Rem[Xs(X)/g(X)] = 1.
More on Syndrome Computation
(cont’d)
[Syndrome register with input gate and feedback; r = 0010110 is shifted in, then the gate is turned off and the register continues to shift]

Shift   Input              Register contents
0                          0 0 0 (initial state)
1       0                  0 0 0
2       1                  1 0 0
3       1                  1 1 0
4       0                  0 1 1
5       1                  0 1 1
6       0                  1 1 1
7       0                  1 0 1 (syndrome s)
8       (input gate off)   1 0 0 (syndrome s(1))
9       -                  0 1 0 (syndrome s(2))
More on Syndrome Computation (cont’d)
Let r(X) = r0 + r1X +….+ rn-1Xn-1 has the syndrome s(X).Then
r(1)(X) = rn-1 + r0 X + ….+ rn-2Xn-1 has the syndrome:
s(1)(X) = Rem[r(1)(X)/g(X)].
Define r1 (X) = r(X) + Xn-1 = r0 + r1X +….+ (rn-1+1)Xn-1
The syndrome of r1 (X), call it s1 (X):
s1 (X)= Rem[{r(X)+ Xn-1}/g(X)] = s(X) + Rem[Xn-1/g(X)]
r1(1)(X), which is one cyclic shift of r1 (X), has the syndrome
s1(1)(X) = Rem[X s1 (X)/g(X)] = Rem[Xs(X)/g(X)+ Xn/g(X)]
= s(1)(X) + 1 (since Xn +1 = g(X)h(X))
Worked Example
Consider the (7,4) Hamming code generated by 1+X+X3.

Error pattern   Syndrome polynomial   Syndrome
X6              1 + X2                101
X5              1 + X + X2            111
X4              X + X2                011
X3              1 + X                 110
X2              X2                    001
X1              X                     010
X0              1                     100

Let c = 1 0 0 1 0 1 1 and r = 1 0 1 1 0 1 1
Cyclic Decoding of the (7,4) Code
.
Error Correction Capability

• Error correction capability is inferred from the roots of g(X).
Results from the algebra of finite fields:
X^n + 1 has n roots (in an extension field).
These roots can be expressed as powers of one element, a.
The roots are a^0, a^1, …., a^(n-1).
The roots occur in conjugates: the elements a^(i·2^j) (exponents taken mod n), j = 0, 1, 2, …, constitute a conjugate set.
Designing a Cyclic Code

• Theorem:
If g(X) has l roots (out of it n-k roots) that are consecutive
powers of a, then the code it generates has a minimum
distance d = l + 1.
• To design a cyclic code with a guaranteed minimum distance
of d, form g(X) to have d-1 consecutive roots. The parameter
d is called the designed minimum distance of the code.
• Since roots occur in conjugates, the actual number of
consecutive roots, say l, may be greater than d-1. dmin = l + 1
is called the actual minimum distance of the code.
Design Example
X15 + 1 has the roots 1= a 0, a1 , …., a14.
Conjugate group Corresponding polynomial
(a0) f1(X)=1 + X
(a, a2 , a4 , a8) f2(X)= 1 + X + X4
(a3 , a6 , a9 , a12) f3(X)= 1 + X + X2 + X3 + X4
(a5 , a10) f4(X)= 1 + X + X2
(a7, a14 , a13 , a11) f5(X)= 1 + X3 + X4
Design Example (cont’d)
• Find g(X) that is guaranteed to be a double error
correcting code.
The code must have a, a2 , a3 and a4 as roots.
g(X) = f2(X)f3(X) = 1 + X4 + X6 + X7 + X8
This generator polynomial generates a (15, 7) cyclic code
of minimum distance at least 5.
Roots of g(X) = a, a2, a3 , a4 , a6, a8 , a9 , a12.
Number of consecutive roots = 4.
The actual minimum distance of the code is 5.
Some Standard Cyclic Codes
[Diagram: BCH codes and Hamming codes shown as subsets of cyclic codes, which are in turn a subset of linear block codes]

• The Hamming Codes: single-error correcting codes


which can be expressed in cyclic form.
• BCH: the Bose-Chaudhuri-Hocquenghem codes are among the most important of all cyclic block codes. They are an extension of Hamming codes to t-error-correcting codes.
• Some Burst-Correcting Codes: good burst-correcting
codes have been found mainly by computer search.
• Cyclic Redundancy Check Codes: shortened cyclic error-
detecting codes used in automatic repeat request (ARQ)
systems.
BCH Codes
• Definition of the codes:
• For any positive integers m (m > 2) and t0 (t0 < n/2), there is a binary BCH code of length n = 2^m - 1 which corrects all combinations of t0 or fewer errors and has no more than m·t0 parity-check bits.
Block length: n = 2^m - 1
Number of parity-check bits: n - k <= m·t0
Minimum distance: dmin >= 2·t0 + 1
Table of Some BCH Codes
n k d (designed) d ( actual) g(X)*
7 4 3 3 13
15 11 3 3 23
15 7 5 5 721
15 5 7 7 2463
31 26 3 3 45
31 16 5 7 107657
31 11 7 11 5423325

* Octal representation with highest order at the left.


721 is 111 010 001 representing 1+X4+X6+X7+X8
Burst Correcting Codes
• Good burst-correcting codes have been found mainly by computer search.
• The length of an error burst, b, is the total number of bits in error from the first error to the last error, inclusive.
• The minimum possible number of parity-check bits required to correct a burst of length b or less is given by the Rieger bound: r >= 2b.
• The best understood codes for correcting burst errors are cyclic codes.
• For correcting longer bursts, interleaving is used.
Table of Good Burst-Correcting Codes

n k b g(X) (octal)
7 3 2 35 (try to find dmin!)
15 10 2 65
15 9 3 171
31 25 2 161
63 56 2 355
63 55 3 711
511 499 4 10451
1023 1010 4 22365
Cyclic Redundancy Check Codes
• Shortened cyclic codes
• Error-detecting codes
• used in automatic repeat request (ARQ)
systems.
• Usually concatenated with error correcting code
[Block diagram] Transmit side: CRC Encoder -> Error Correction Encoder -> To Transmitter.
Receive side: Error Correction Decoder -> CRC Syndrome Checker -> To Info Sink.
Performance of CRC Codes
• CRC are typically evaluated in terms of their
– error pattern coverage
– Burst error detection capability
– Probability of undetected error
• For a (n,k) CRC the coverage, λ, is the ratio of the number of invalid
blocks of length n to the total number of blocks of length n.
• This ratio is a measure of the probability that a randomly chosen block is
not a valid code block. By definition,
λ = 1 - 2^(-r)
where r is the number of check bits.
• For some near-optimal CRC codes, see table 5.6.5:

Code      Coverage
CRC-12    0.999756
CRC-ANSI  0.999985
CRC-32A   0.99999999977
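A quick check of the coverage formula; the check-bit counts r = 12, 16 and 32 for the three codes in the table are assumptions made for this illustration.

# coverage = 1 - 2^(-r), where r is the number of check bits (values assumed)
for name, r in (("CRC-12", 12), ("CRC-ANSI", 16), ("CRC-32A", 32)):
    print(f"{name}: coverage = {1 - 2 ** (-r):.11f}")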
Basic Definitions

• k =1, n = 2 , (2,1) Rate-1/2 convolutional code


• Two-stage register ( M=2 )
• Each input bit influences the output for 3 intervals (K=3)
• K = constraint length of the code = M + 1
Generator Polynomial
• A convolutional code may be defined by a set
of n generating polynomials for each input bit.
• For the circuit under consideration:
g1(D) = 1 + D + D2
g2(D) = 1 + D2
• The set {gi(D)} defines the code completely.
The length of the shift register is equal to the degree of the highest-degree generator polynomial.
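A small sketch of this encoder; the generators g1(D) = 1 + D + D^2 and g2(D) = 1 + D^2 are the ones given above, while the input sequence is illustrative.

def conv_encode(bits):
    """Rate-1/2 convolutional encoder, g1 = 1 + D + D^2, g2 = 1 + D^2."""
    s1 = s2 = 0                      # two-stage shift register (K = 3)
    out = []
    for b in bits:
        v1 = b ^ s1 ^ s2             # output of g1 = 1 + D + D^2
        v2 = b ^ s2                  # output of g2 = 1 + D^2
        out.extend((v1, v2))
        s2, s1 = s1, b               # shift the register
    return out

print(conv_encode([1, 0, 1, 1, 0, 0]))   # -> [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]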
State Diagram Representation
• The output depends on the current input and
the state of the encoder ( i. e. the contents of
the shift register).
Trellis Diagram Representation
• Expansion of state diagram in time.
Decoding

• A message m is encoded into the code


sequence c.
• Each code sequence represents a path in
the trellis diagram.
• Minimum Distance Decoding
– Upon receiving the received sequence r, search
for the path that is closest ( in Hamming
distance) to r .
The Viterbi Algorithm
• Walk through the trellis and compute the Hamming
distance between that branch of r and those in the trellis.
• At each level, consider the two paths entering the same
node and are identical from this node onwards. From these
two paths, the one that is closer to r at this stage will still
be so at any time in the future. This path is retained, and
the other path is discarded.
• Proceeding this way, at each stage one path will be saved
for each node. These paths are called the survivors. The
decoded sequence (based on MDD) is guaranteed to be
one of these survivors.
The Viterbi Algorithm (cont’d)
• Each survivor is associated with a metric of
the accumulated Hamming distance (the
Hamming distance up to this stage).
• Carry out this process until the received
sequence is considered completely. Choose
the survivor with the smallest metric.
The Viterbi Algorithm:

• The Viterbi algorithm is used to decode convolutional codes and any structure or system that can be described by a trellis.
• It is a maximum likelihood decoding algorithm that selects the most probable path, i.e., the one that maximizes the likelihood function.
• The algorithm is based on an add-compare-select operation that keeps the best path into each state at each time step.
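A hard-decision Viterbi decoder sketch for the same rate-1/2, K = 3 code; the received sequence in the example is an assumed codeword with its first bit flipped, which the decoder corrects.

def viterbi_decode(received):
    """Hard-decision Viterbi decoding for g1 = 1 + D + D^2, g2 = 1 + D^2."""
    pairs = [tuple(received[i:i + 2]) for i in range(0, len(received), 2)]
    states = [(a, b) for a in (0, 1) for b in (0, 1)]          # (s1, s2)
    metric = {s: (0 if s == (0, 0) else float("inf")) for s in states}
    paths = {s: [] for s in states}

    for r in pairs:
        new_metric = {s: float("inf") for s in states}
        new_paths = {}
        for (s1, s2) in states:
            if metric[(s1, s2)] == float("inf"):
                continue
            for b in (0, 1):                                   # try both input bits
                out = (b ^ s1 ^ s2, b ^ s2)                    # branch output
                dist = (out[0] != r[0]) + (out[1] != r[1])     # branch Hamming metric
                nxt = (b, s1)
                cand = metric[(s1, s2)] + dist
                if cand < new_metric[nxt]:                     # add-compare-select
                    new_metric[nxt] = cand
                    new_paths[nxt] = paths[(s1, s2)] + [b]
        metric, paths = new_metric, new_paths

    best = min(metric, key=metric.get)                          # survivor with smallest metric
    return paths[best]

# Assumed example: codeword of message 101100 with its first bit flipped
print(viterbi_decode([0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]))    # -> [1, 0, 1, 1, 0, 0]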
Example: For the convolutional code example in the previous lecture, starting from state zero, decode the following received sequence.

[Trellis diagram with annotations:]
• Add the weight of the path at each state.
• Compute the two possible paths entering each state and select the one with the smaller cumulative Hamming weight; this is called the survival path.
• At the end of the trellis, select the path with the minimum cumulative Hamming weight; this is the survival path in this example.
• Decoded sequence is m = [10 1110]
Distance Properties of Conv. Codes
• Def: The free distance, dfree, is the minimum Hamming
distance between any two code sequences.
• Criteria for good convolutional codes:
– Large free distance, dfree.
– Small Hamming distance (i.e. as few differences as possible )
between the input information sequences that produce the
minimally separated code sequences. dinf
• There is no known constructive way of designing a conv.
code of given distance properties. However, a given code
can be analyzed to find its distance properties.
Distance Prop. of Conv. Codes (cont’d)
• Convolutional codes are linear. Therefore, the Hamming
distance between any pair of code sequences
corresponds to the Hamming distance between the all-
zero code sequence and some nonzero code sequence.
Thus for a study of the distance properties it is possible
to focus on the Hamming distance between the all-zero
code sequence and all nonzero code sequences.
• The nonzero sequence of minimum Hamming weight
diverges from the all-zero path at some point and
remerges with the all-zero path at some later point.
Distance Properties: Illustration

• sequence 2: Hamming weight = 5, dinf = 1


• sequence 3: Hamming weight = 7, dinf = 3.
Modified State Diagram
• The span of interest to us of a nonzero path starts from
the 00 state and ends when the path first returns to the 00
state. Split the 00 state (state a) to two states: a0 and a1.
• The branches are labeled with the dummy variables D, L
and N, where:
The power of D is the Hamming weight (# of 1’s) of the
output corresponding to that branch.
The power of N is the Hamming weight (# of 1’s) of the
information bit(s) corresponding to that branch.
The power of L is the length of the branch (always = 1).
Modified State Diagram (cont’d)
.
Properties of the Path

Sequence 2:
code sequence: .. 00 11 10 11 00 ..
state sequence: a0 b c a1
Labeled: (D2LN)(DL)(D2L) = D5L3N
Prop. : w =5, dinf =1, diverges from the allzero path by 3 branches.
Sequence 3:
code sequence: .. 00 11 01 01 00 10 11 00 ..
state sequence: a0 b d c b c a 1
Labeled: (D2LN)(DLN)(DL)(DL)(LN)(D2L) = D7L6N3
Prop. : w =7, dinf =3, diverges from the allzero path by 6 branches.
Transfer Function
• Input-Output relations:
a0 =1
b = D2LN a0 + LNc
c = DLb + DLNd
d = DLNb + DLNd
a1 = D2Lc
• The transfer function is T(D,L,N) = a1/a0:

T(D, L, N) = D^5 L^3 N / (1 - D N L (1 + L))
Transfer Function (cont’d)
• Performing long division:
T = D5L3N + D6L4N2 + D6L5N2 + D7L5N3 + ….
• If interested in the Hamming distance property of
the code only, set N = 1 and L = 1 to get the
distance transfer function:
T (D) = D5 + 2D6 + 4D7
There is one code sequence of weight 5. Therefore dfree=5.
There are two code sequences of weight 6,
four code sequences of weight 7, ….
Decoding of Convolutional Codes
• Let Cm be the set of allowable code sequences of length m.
• Not all sequences in {0,1}^m are allowable code sequences!
• Each code sequence c in Cm can be represented by a unique path through the trellis diagram.
• What is the probability that the code sequence c is sent and the binary sequence y is received?

Pr[y | c] = p^dH(y,c) · (1 - p)^(m - dH(y,c))

where p is the probability of bit error of the BSC from modulation.
Decoding Rule for Convolutional Codes
• Maximum Likelihood Decoding Rule:

max over c in Cm of Pr[y | c]  <=>  min over c in Cm of dH(y, c)

• Choose the code sequence through the trellis which has the smallest Hamming distance to the received sequence!
The Viterbi Algorithm
 The Viterbi Algorithm (Viterbi, 1967) is a clever way of
implementing Maximum Likelihood Decoding.
Computer Scientists will recognize the Viterbi Algorithm as an example of a CS technique called "Dynamic Programming".

• Reference: G. D. Forney, "The Viterbi Algorithm", Proceedings of the IEEE, 1973
 Chips are available from many manufacturers which
implement the Viterbi Algorithm for K < 10
 Can be used for either hard or soft decision decoding
We consider hard decision decoding initially
Basic Idea of Viterbi Algorithm
• There are 2^(rm) code sequences in Cm.
 This number of sequences approaches infinity as m
becomes large
 Instead of searching through all possible sequences,
find the best code sequence "one stage at a time"
The Viterbi Algorithm
(Hamming Distance Metric)
 Initialization:
Let time i = 0.
We assign each state j a metric Z j (0) at time 0.
We know that the code must start in the state 0.
Therefore we assign:
Z_0(0) = 0
Z_j(0) = ∞ for all other states j
The Viterbi Algorithm (continued)
Consider decoding of the ith segment:
• Let y_i be the segment of n bits received between times i and i+1.
• There are several code segments c_i of n bits which lead into state j at time i+1. We wish to find the most likely one.
• Let s(c_i) be the state from which the code segment c_i emerged.
• For each state j, we assume that c_i is the path leading into state j if
Z_s(c_i)(i) + dH(c_i, y_i)
is the smallest over all the code segments leading into state j.
The Viterbi Algorithm (continued)
• Iteration:
• Let Z_j(i+1) = Z_s(c_i)(i) + dH(c_i, y_i)
• Let i = i+1
• Repeat the previous step
• Incorrect paths drop out as i approaches infinity.
Viterbi Algorithm Decoding Example
• r = 1/2, K = 3 code from previous example
• c = (00 11 01 00 10 10 11) is sent
• y = (01 11 01 00 10 10 11) is received.

What path through the trellis does the Viterbi Algorithm choose?
Viterbi Algorithm Decoding Example
(continued)
SUMMARY
• Learnt the concepts of entropy and the source coding techniques
• Statements of Shannon's theorems
• Concepts of mutual information and channel capacity
• Understood error control coding techniques and the concepts of linear block codes, cyclic codes, convolutional codes & the Viterbi decoding algorithm
END OF 4th UNIT
