Information Theory and Coding: What You Need To Know in Today's ICE Age!
CODING
thiru@hyderabad.bits-pilani.ac.in
Chamber: A214
Website:
http://sites.google.com/site/pkthiruvikraman/
For lecture slides and other material
[Block diagram of a communication system: Channel coder → Channel (usually has some noise) → Channel decoder (for error detection and correction) → Decrypter → Decoder (for decompression)]
Compression in daily life: Abbreviations
BITS, CGPA, ASAP, BTW, SMS, OMG, LOL,
LTC, SAC
So in general, compression is achieved by encoding the
information contained in the source symbols into code
words.
For storage purposes, we compress it and at a later stage, it
can be uncompressed.
Sometimes, a given codeword may not be uniquely
decodable!
E.g., WWF: World Wildlife Fund
or World Wrestling Federation
BITSian: “If Business Standard can interpret AI
as Air India, then I know another way we can
interpret BS”
More howlers in the news:
India's state TV channel has fired a news anchor
for referring to Chinese President Xi Jinping as
"Eleven" Jinping.
Data is the means by which we convey information.
Various amounts of data may be used to convey the
same information. Hence compression is possible.
Compression is achieved by removing redundancy.
An example of redundancy in the English
language:
Queen, quick, quite, etc.: q is always followed by u,
hence we can compress the message by not
transmitting the u.
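The q-followed-by-u rule can be sketched as a toy compressor; this is a hypothetical illustration (the function names `compress` and `decompress` are not from the slides), and it is lossless only because, under the stated rule, q never occurs without a following u:

```python
def compress(text):
    # Drop the redundant 'u' that always follows 'q' in English words.
    return text.replace("qu", "q")

def decompress(text):
    # Reinsert the 'u' after every 'q' to recover the original message.
    return text.replace("q", "qu")

print(compress("queen quick quite"))   # qeen qick qite
print(decompress(compress("queen")))   # queen
```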
Then why do we have redundancy in the first
place?
Error detection and correction : it’s an insurance
against failures
Redundancies in daily life:
•Redundancy in Computer Networks
•Repetition in lectures: Students who doze off for a
minute can “get back” into the lecture
•Redundancy in movies: You can miss half the
movie and still understand the story!
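The "insurance against failures" idea can be made concrete with a triple-repetition code, a standard textbook scheme (not described in the slides themselves): each bit is transmitted three times, and the receiver takes a majority vote, so any single flipped bit per group of three is corrected.

```python
def encode(bits):
    # Add redundancy: repeat each bit three times.
    return [b for b in bits for _ in range(3)]

def decode(bits):
    # Majority vote over each group of three received bits.
    return [1 if sum(bits[i:i+3]) >= 2 else 0 for i in range(0, len(bits), 3)]

sent = encode([1, 0, 1])   # [1,1,1, 0,0,0, 1,1,1]
sent[4] = 1                # the channel flips one bit of the middle triple
print(decode(sent))        # [1, 0, 1] -- the error is corrected
```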
Coding:
Aim: To convert a message, i.e., a sequence of
symbols (source alphabet), into a binary (or, in
general, q-ary) message consisting of code words.
Here the code alphabets are 0 and 1, and a string
made up of 0s and 1s is called a code word.
Definition:
Given finite sets A (source alphabet) and
B (code alphabet), a coding is a rule which assigns
to each source symbol exactly one word in the code
alphabet.
A code is a set of vectors called codewords
Compression ratio = n1/n2,
where n1 and n2 are the number of information-carrying
units in the two data sets (before and after compression).
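As a quick numerical sketch of the ratio n1/n2 (the character counts here are my own example, not from the slides), abbreviating "as soon as possible" to "asap" shrinks 19 characters to 4:

```python
def compression_ratio(n1, n2):
    # n1: information-carrying units before compression; n2: after.
    return n1 / n2

# "as soon as possible" (19 characters) -> "asap" (4 characters)
print(compression_ratio(len("as soon as possible"), len("asap")))  # 4.75
```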
A code is a rule which assigns to each
source symbol a code word.
A 00001
B 00010
C 00011
D 00100
E 00101
Uniquely decodable code (UDC):
Example of a code which is not a UDC
Source symbol Code word
A 00
B 10
C 101
D 110
E 1001
Example of a code which IS uniquely (in fact instantaneously)
decodable — no code word is a prefix of another:
Source symbol Code word
0 0
1 10
2 110
3 111
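A code like the one above is instantaneous exactly when no code word is a prefix of another. A minimal checker, assuming the code words are given as bit strings (the function name is my own):

```python
def is_prefix_free(codewords):
    # A code is instantaneous iff no code word is a prefix of another.
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))           # True: instantaneous
print(is_prefix_free(["00", "10", "101", "110", "1001"]))  # False: 10 is a prefix of 101
```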
Let the code word lengths, arranged in increasing order, be
d1 ≤ d2 ≤ d3 ≤ ....... ≤ dn.
Let us say we have chosen a binary word K(a1) of length
d1. We want to choose a binary word of length d2, but we
want to avoid those words which have K(a1) as a prefix.
The number of such words is:
2^(d2 - d1)
Explanation:
[Diagram: a word of length d2 (e.g. 1 0 1 0 0 0 0 1 1 0) whose first d1 bits
equal K(a1); the remaining d2 - d1 bits are free, giving 2^(d2 - d1) ruled-out words.]
For at least one word of length d2 to remain:
2^d2 - 2^(d2 - d1) ≥ 1
i.e., Number of words of length d2 ≥ Number of words
ruled out + 1
Similarly, we have to choose a word of length d3 after
ruling out the words with prefixes of length d1 and d2.
Therefore:
2^d3 - 2^(d3 - d1) - 2^(d3 - d2) ≥ 1
Or, dividing by 2^d3:
1 ≥ 2^(-d1) + 2^(-d2) + 2^(-d3)
Continuing in the same way for all n code words:
1 ≥ 2^(-d1) + 2^(-d2) + 2^(-d3) + .......... + 2^(-dn)
Theorem:
Given a source alphabet of n symbols and a code
alphabet of k symbols, an instantaneous code
with given lengths d1, d2, d3, ...., dn of code words
exists whenever the following Kraft's inequality holds:
1 ≥ k^(-d1) + k^(-d2) + k^(-d3) + .......... + k^(-dn)
Note:
If a given code satisfies Kraft's inequality,
it does not mean that the given code is
instantaneous. The conclusion is only that there exists
at least one instantaneous code with the same
lengths of code words. The ultimate test for an
instantaneous code is the no-prefix condition.
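The note can be checked numerically (this example is my own, chosen to match the note): the code {0, 01, 011} satisfies Kraft's inequality (1/2 + 1/4 + 1/8 ≤ 1) yet is not prefix-free, while {0, 10, 110} has the same lengths and is instantaneous.

```python
def kraft_sum(lengths, k=2):
    # Kraft's inequality: the sum of k^(-d_i) must be <= 1.
    return sum(k ** -d for d in lengths)

def is_prefix_free(codewords):
    # The ultimate test for an instantaneous code: the no-prefix condition.
    return not any(a != b and b.startswith(a)
                   for a in codewords for b in codewords)

bad  = ["0", "01", "011"]   # satisfies Kraft, but NOT instantaneous
good = ["0", "10", "110"]   # same lengths, instantaneous

print(kraft_sum([1, 2, 3]))   # 0.875 <= 1
print(is_prefix_free(bad))    # False
print(is_prefix_free(good))   # True
```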
Decoding of instantaneous codes:
•Compare the first bit received with the code words in the
look-up table. If there is a match, then the corresponding
source symbol can be obtained from the look-up table.
•If there is no match, then concatenate this bit with the
next bit received and then again search for a match with
the code words in the look-up table.
•This process of concatenating the last received bit with all
the preceding bits is continued till a match is found.
•Once a match is found, the temp variable where the
concatenated bits were stored is cleared and the next bit
which is received is stored there and the entire process is
repeated.
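The steps above can be sketched in Python (a minimal sketch, assuming the look-up table maps code word strings back to source symbols; `temp` plays the role of the temp variable in the slide):

```python
def decode(bits, table):
    # table: code word -> source symbol, for an instantaneous code.
    out, temp = [], ""
    for bit in bits:
        temp += bit               # concatenate the last received bit
        if temp in table:         # search for a match in the look-up table
            out.append(table[temp])
            temp = ""             # match found: clear the temp variable
    return out

table = {"0": "A", "10": "B", "110": "C", "111": "D"}
print(decode("010110111", table))  # ['A', 'B', 'C', 'D']
```

Because the code is instantaneous, each match can be emitted immediately; no look-ahead past the current bit is ever needed.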
Huffman code:
We will now describe a systematic procedure for
arriving at an efficient instantaneous coding scheme called
the Huffman code.
Information Source: An information source is a source
alphabet together with a probability distribution; i.e., a set
{a1,…….an} together with numbers P(a1),……P(an)
satisfying
Σ (i = 1 to n) P(ai) = 1 and P(ai) ≥ 0 for each i.
[Table residue from the worked example: source symbols and their
probabilities, e.g. D 0.1, F 0.1, combined column by column.]
We now proceed backwards by “splitting” each code word
w assigned to a sum of two probabilities to two words w0
and w1.
[Table: symbols a* with probabilities P(a*) and their code words after the
1st, 2nd, 3rd and 4th splitting steps; e.g. the symbol with probability 0.1
receives the code word 101.]
lavg = 2 × 0.4 + 2 × 0.2 + 3 × 0.4 = 0.8 + 0.4 + 1.2 = 2.4 bits/symbol
If we represent the symbols in the usual fixed-length binary code, lavg = 3 bits/symbol.
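The numbers above are consistent with a six-symbol source with probabilities 0.4, 0.2, 0.1, 0.1, 0.1, 0.1 (an assumption reconstructed from lavg, since the slide's table is garbled). A forward-merging sketch using a heap, rather than the slide's backward "splitting", gives the same code word lengths:

```python
import heapq

def huffman_lengths(probs):
    # Repeatedly merge the two least probable entries; each original
    # symbol's code word length equals how many merges it sits under.
    heap = [(p, [i]) for i, p in enumerate(probs)]  # (prob, symbols underneath)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for s in s1 + s2:         # every symbol under the merged node
            lengths[s] += 1       # moves one level deeper in the code tree
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [0.4, 0.2, 0.1, 0.1, 0.1, 0.1]
lengths = huffman_lengths(probs)
lavg = sum(p * d for p, d in zip(probs, lengths))
print(round(lavg, 10))  # 2.4, versus 3 bits for a fixed-length code
```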
Implementation of Huffman Encoding:
•We have to keep track of the probabilities which
have been combined. This will be used when we
“split” the code-words when proceeding in the
reverse direction.
function [m,k] = huff(m,k)
% One combining step of Huffman coding: merge the two least probable
% symbols (rows m and m-1 of column k of palph1) and re-insert the
% combined probability into column k+1, keeping the column sorted
% in descending order.
global palph1 fl code;
symp = palph1(m,k) + palph1(m-1,k);   % combined probability
% Copy down the probabilities larger than symp to find the
% insertion point i.
for i = 1:m-1
    if symp <= palph1(i,k)
        palph1(i,k+1) = palph1(i,k);
    else
        break;
    end
end
fl(i,k+1) = 1;          % flag the combined entry, used later when "splitting"
palph1(i,k+1) = symp;   % insert the combined probability
% Shift the remaining (smaller) probabilities down by one row.
for j = i+1:m-1
    palph1(j,k+1) = palph1(j-1,k);
end
m = m - 1;              % one fewer symbol after combining
k = k + 1;              % move on to the next column
if m == 2               % only two symbols left: combining is done
    return;
end
[m,k] = huff(m,k);      % recurse until two symbols remain
% cd is a cell array which will contain all the code words.
% cellstr creates a cell array of strings from a character array.
cd = cell(n,n-1);
cd(1,n-1) = cellstr('1');   % start: two symbols, with code words '1' and '0'
cd(2,n-1) = cellstr('0');
while m < n
    % Find the row bj that was flagged as a combined probability;
    % unflagged rows simply carry their code word over.
    j = 1;
    while j < m+1
        if fl(j,n-m+1) == 0
            cd(j,n-m) = cd(j,n-m+1);
        else
            bj = j;
            j = m+1;        % exit the inner loop
        end
        j = j+1;
    end
    % "Split" the code word w of the combined entry into w1 and w0.
    t1 = cd(bj,n-m+1);
    t11 = strcat(t1,'1');
    t2 = strcat(t1,'0');
    t1 = cellstr(t11);
    t2 = cellstr(t2);
    cd(m,n-m) = t1;
    cd(m+1,n-m) = t2;
    % Shift the code words below the split entry up by one row.
    k = bj+1;
    while k < m+1
        cd(k-1,n-m) = cd(k,n-m+1);
        k = k+1;
    end
    m = m+1;                % one more symbol after splitting
end
1. a) Is the code given in the following table
uniquely decodable? Is it instantaneously
decodable? Why?