Information Theory
INTRODUCTION
Information theory is a branch of probability theory which can be applied to the study of communication systems. In general, communication of information is statistical in nature, and the main aim of information theory is to study simple, idealized statistical communication models. Information theory was invented by communication scientists while they were studying the statistical structure of electrical communication equipment.
Communication systems deal with the flow of some sort of information in some network. The information may be electrical signals, words, digital data, pictures, music, etc. There are three basic blocks of a communication system:
(i) Transmitter or source.
(ii) Channel or transmission network, which conveys the communique from transmitter to receiver.
(iii) Receiver or destination.
Figure 10.1 shows the simplest form of a communication system. In practice, there are generally a number of transmitters and receivers connected through a complex network. In such cases, it is desirable to study the distribution of information in the system, and therefore some measure of transmission efficiency is to be defined.
10.1 UNIT OF INFORMATION
The communication systems considered here are of a statistical nature; i.e., the performance of the system can never be described in a deterministic sense. It is always described in statistical terms. Thus, the most significant feature of the communication system shown in Fig. 10.1 is its unpredictability or uncertainty: the transmitter transmits, at random, any one of a set of pre-specified messages. Only the probability of transmitting each individual message is known, and it is these probabilities that determine the performance of the system. Our system model therefore measures the amount of information carried by an event in terms of its probability:

$$I(x_j) = \log \frac{1}{p(x_j)} \qquad (10.1.1)$$

where $x_j$ is an event with a probability $p(x_j)$ and the amount of information associated with it is $I(x_j)$.

Now, let there be another event $y_k$ such that $x_j$ and $y_k$ are independent. Hence, the probability of the joint event is $p(x_j, y_k) = p(x_j)\,p(y_k)$, with the associated information content

$$I(x_j, y_k) = \log \frac{1}{p(x_j)\,p(y_k)} \qquad (10.1.2)$$
The total information $I(x_j, y_k)$ must be equal to the sum of the individual informations $I(x_j)$ and $I(y_k)$, where $I(y_k) = \log \frac{1}{p(y_k)}$. Thus, it can be seen that the function on the RHS of Eq. 10.1.2 must be one which converts the operation of multiplication into addition. The logarithm is one such function. Thus,

$$I(x_j, y_k) = \log \frac{1}{p(x_j)} + \log \frac{1}{p(y_k)} = I(x_j) + I(y_k)$$

When the logarithm is taken to the base 2, the unit of information is the bit; base-2 logarithms are used throughout this chapter.
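The additivity of information over independent events can be checked numerically. The following is a minimal Python sketch (not part of the original text; the function name is ours), assuming base-2 logarithms so that information is measured in bits:

```python
import math

def information(p: float) -> float:
    """Amount of information, in bits, of an event with probability p (Eq. 10.1.1)."""
    return math.log2(1.0 / p)

# Two independent events: the information of the joint event equals
# the sum of the individual informations (Eq. 10.1.2).
p_x, p_y = 0.5, 0.25
print(information(p_x))          # 1.0 bit
print(information(p_y))          # 2.0 bits
print(information(p_x * p_y))    # 3.0 bits = 1.0 + 2.0
```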
10.2 ENTROPY
A communication system is not meant to deal with a single message only, but with all possible messages. Hence, although the instantaneous information flows corresponding to individual messages from the source may be erratic, we may describe the source in terms of the average information per individual message, known as the entropy of the source.
It is necessary at this stage to understand the difference between the 'arithmetic average' and the 'statistical average'. The concept of the 'arithmetic average' is applicable when the quantities to be dealt with are deterministic in nature and hence are to be considered just once. An example is finding the average height of the students in a class. Let there be M students in the class and let their heights be

$$h_i \quad (i = 1, 2, 3, \ldots, M)$$

The average height (arithmetic average) will then be

$$\bar{h} = \frac{1}{M}\sum_{i=1}^{M} h_i$$
Let us apply the same definition to a problem involving non-deterministic quantities, where a given quantity may occur again and again. The procedure to find the average will then be to find the total of all the quantities occurring over a sufficiently long period of time, divided by the number of quantities occurring during that time interval. This is the 'statistical average'. It is to be noted that, since in communication theory we have to deal with statistical quantities and not deterministic quantities, whenever 'average' is referred to, it always means the 'statistical average' and not the 'arithmetic average'.
The average information per individual message can now be calculated in the following manner. Let there be M different messages m_1, m_2, ..., m_M, with respective probabilities of occurrence p_1, p_2, ..., p_M. Let us assume that in a long time interval, L messages have been generated. Let L be very large, so that L >> M; then the number of messages m_1 generated is p_1 L.

The amount of information in each message m_1 is log(1/p_1). Thus, the total amount of information in all the m_1 messages is p_1 L log(1/p_1). The total amount of information in all L messages will then be

$$I_t = p_1 L \log\frac{1}{p_1} + p_2 L \log\frac{1}{p_2} + \cdots + p_M L \log\frac{1}{p_M}$$

The average information per message, or entropy, will then be

$$H = \frac{I_t}{L} = p_1\log\frac{1}{p_1} + p_2\log\frac{1}{p_2} + \cdots + p_M\log\frac{1}{p_M} = \sum_{k=1}^{M} p_k \log\frac{1}{p_k} = -\sum_{k=1}^{M} p_k \log p_k \qquad (10.2.1)$$
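Equation 10.2.1 translates directly into a short function; the following is a hedged Python sketch (the helper name is ours, not the text's), skipping zero-probability terms since lim p→0 of p log(1/p) = 0:

```python
import math

def entropy(probs) -> float:
    """H = sum(p_k * log2(1/p_k)) in bits/message (Eq. 10.2.1)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([1.0]))        # 0.0 -> a single certain message carries no information
print(entropy([0.5, 0.5]))   # 1.0 -> binary source at p = 0.5 (the maximum)
print(entropy([0.25] * 4))   # 2.0 -> log2(4), the M-ary maximum of Eq. 10.2.3
```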
If there is only a single possible message, i.e. M = 1 and p_1 = 1, then

$$H = p_1\log\frac{1}{p_1} = 1\cdot\log\frac{1}{1} = 0$$

Thus, it can be seen that in the case of a single possible message, the reception of that message conveys no information.
On the other hand, let there be only one message out of the M messages having a probability 1, and all the others a probability 0. In that case,

$$H = \sum_{k=1}^{M} p_k\log\frac{1}{p_k} = 1\cdot\log\frac{1}{1} + \lim_{p\to 0}\left(p\log\frac{1}{p} + \cdots + p\log\frac{1}{p}\right) = 0 + 0 = 0$$

Again, no information is conveyed, because the same message is received every time.
Next, consider a binary source (M = 2):

$$H = p_1\log\frac{1}{p_1} + p_2\log\frac{1}{p_2}$$

Let p_1 = p; then p_2 = 1 - p_1 = 1 - p = q. Hence,

$$H = p\log\frac{1}{p} + (1-p)\log\frac{1}{1-p} = p\log\frac{1}{p} + q\log\frac{1}{q} = H(p) = H(q) \qquad (10.2.2)$$

To find the extremum of H, set the derivative with respect to p equal to zero:

$$\frac{dH}{dp} = -\frac{1}{\ln 2} - \log p + \frac{1}{\ln 2} + \log(1-p) = 0$$

[Fig. 10.2.1: H as a function of p]

i.e.

$$\log(1-p) = \log p$$
i.e.

$$p = 0.5$$

This shows that there is either a maximum or a minimum at p = 0.5. Now, if the second derivative is positive, then there is a minimum, and if it is negative, then there is a maximum. Here,

$$\left.\frac{d^2H}{dp^2}\right|_{p=0.5} = \left.-\frac{1}{p(1-p)\ln 2}\right|_{p=0.5} < 0$$
Hence, H has a maximum at p = 0.5. The maximum value of H can be found from Eq. 10.2.2 by putting p = 0.5:

$$H_{\max} = 0.5\log 2 + 0.5\log 2 = 1 \text{ bit/message}$$

We have seen that for the binary case (M = 2), the entropy is maximum when p = 0.5, i.e. when both the messages are equally likely. Similarly, it can be shown that for an M-ary case, the entropy is maximum when all the messages are equally likely, with each p_k = 1/M. In this case, the maximum entropy is

$$H_{\max} = \sum_{k=1}^{M}\frac{1}{M}\log M = \log M \text{ bits/message} \qquad (10.2.3)$$

since there are M identical terms in the summation.
It is interesting to consider the situation when all the messages are equiprobable. In this case,

$$p_1 = p_2 = \cdots = p_M = \frac{1}{M}$$

Hence,

$$\text{Average information} = \sum_{k=1}^{M}\frac{1}{M}\log M = \log M$$

This is the same as the arithmetic average. The reason is simple: the relative weight of all the quantities is the same, because they are equiprobable, and hence the definition of the arithmetic average is applicable. Thus, it can be concluded that the statistical average is equal to the arithmetic average when all the quantities are equiprobable.
Example 10.2.1
A quaternary source generates information with probabilities p_1 = 0.1, p_2 = 0.2, p_3 = 0.3 and p_4 = 0.4. Find the entropy of the system. What percentage of the maximum possible information is being generated by this source?

Solution

$$H = \sum_{k=1}^{4} p_k\log\frac{1}{p_k} = 0.1\log 10 + 0.2\log 5 + 0.3\log\frac{10}{3} + 0.4\log 2.5 = 1.8464 \text{ bits/message}$$

$$H_{\max} = \log M = \log 4 = 2 \text{ bits/message}$$

$$\frac{H}{H_{\max}}\times 100 = \frac{1.8464}{2}\times 100 = 92.32\%$$
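The numbers of Example 10.2.1 can be verified in a few lines of Python (a sketch in our own notation, not part of the original solution):

```python
import math

probs = [0.1, 0.2, 0.3, 0.4]                      # quaternary source of Example 10.2.1
H = sum(p * math.log2(1 / p) for p in probs)      # Eq. 10.2.1
H_max = math.log2(len(probs))                     # Eq. 10.2.3
print(round(H, 4))                                # 1.8464 bits/message
print(round(100 * H / H_max, 2))                  # 92.32 (% of maximum)
```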
10.3 RATE OF INFORMATION
If a message source generates messages at the rate of r messages per second, the rate of information (or information rate) R is defined as the average number of bits of information per second. Now, H is the average number of bits of information per message. Hence,
$$R = rH \text{ bits/second} \qquad (10.3.1)$$

Let us consider two sources of equal entropy H, generating r_1 and r_2 messages per second, respectively. The first source will transmit the information at a rate R_1 = r_1 H, and the second source will transmit the information at a rate R_2 = r_2 H. Now, if r_1 > r_2, then R_1 > R_2. Thus, in a given period, more information is transmitted from the first source than from the second source, placing greater demands on the communication channel. Hence, a source is not described by its entropy alone, but also by its rate of information.
Example 10.3.1
A source emits one of six possible messages with probabilities 1/2, 1/4, 1/8, 1/16, 1/32 and 1/32, at a rate of 16 outcomes per second. Find the entropy and the rate of information.

Solution

$$H = \sum_{k=1}^{6} p_k\log\frac{1}{p_k} = \frac{1}{2}\log 2 + \frac{1}{4}\log 4 + \frac{1}{8}\log 8 + \frac{1}{16}\log 16 + \frac{1}{32}\log 32 + \frac{1}{32}\log 32 = \frac{31}{16} \text{ bits/message}$$

Now, r = 16 outcomes/second. Hence,

$$R = rH = 16\times\frac{31}{16} = 31 \text{ bits/second}$$
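A minimal numerical check of Example 10.3.1, assuming the probabilities reconstructed above:

```python
import math

probs = [1/2, 1/4, 1/8, 1/16, 1/32, 1/32]
r = 16                                            # outcomes per second
H = sum(p * math.log2(1 / p) for p in probs)      # 31/16 bits/message
print(H, r * H)                                   # 1.9375  31.0 bits/second
```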
Example 10.3.2
A continuous signal is band-limited to 5 kHz. The signal is quantized in 8 levels of a PCM system with the probabilities 0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05 and 0.05. Calculate the entropy and the rate of information.

Solution  By the sampling theorem, the signal should be sampled at a frequency of at least 5 × 2 = 10 kHz. Each sample is then quantized to one of the eight levels. Looking at each quantized level as a message,

$$H = -\sum_{k=1}^{8} p_k\log p_k = 2.74 \text{ bits/message}$$

As the sampling frequency is 10 kHz, the message rate r = 10,000 messages/second. Hence, the rate of information is

$$R = rH = 10{,}000\times 2.74 = 27{,}400 \text{ bits/second}$$
Example 10.3.3
n analog source has a 4 ki: BW. The signalis sampled at 2.5 times the Nvquist rate. Each sample
is quantized into 256 equally likely levels. The successive samples are statisticaly independent.
What is the information rate of the source? IES 2003]
Solution The BW of the signal is
Jm4 kHz
Nyquist frequency = 2/m
or Js min8 kHz
The sampling rate = 2.5 x Nyquist frequency
or S 2.5 x 8
20 kHz
The number of quantization levels = 256
256
n= 8
Hence,
The information rate =
size of the word x sampling rate
nxJs
= 8 x20
= 160 kbps
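The arithmetic of Example 10.3.3 in code form (a sketch; the variable names are ours):

```python
import math

f_m = 4_000                    # bandwidth, Hz
f_s = 2.5 * (2 * f_m)          # 2.5 x Nyquist rate -> 20 kHz
levels = 256                   # equally likely, independent samples
H = math.log2(levels)          # 8 bits/sample (maximum-entropy case, Eq. 10.2.3)
print(f_s, H * f_s)            # 20000.0  160000.0 -> 160 kbps
```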
Example 10.3.4
Two sources are generating information as given below:

Source 1: p_1 = p_2 = p_3 = p_4 = 1/4
Source 2: p_1 = 1/2, p_2 = 1/4, p_3 = p_4 = 1/8

The message rates are respectively 200 and 250 messages per second. Compare H and R of the two sources.

Solution

$$H_1 = \sum_{k=1}^{4} p_k\log\frac{1}{p_k} = 2 \text{ bits/message}$$

$$H_2 = \sum_{k=1}^{4} p_k\log\frac{1}{p_k} = 1.75 \text{ bits/message}$$

$$R_1 = r_1 H_1 = 200\times 2 = 400 \text{ bits/second}$$

$$R_2 = r_2 H_2 = 250\times 1.75 = 437.5 \text{ bits/second}$$

Thus, although the first source has the greater entropy, the second source has the higher information rate.
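A quick check of Example 10.3.4, showing that a higher message rate can outweigh a higher entropy (a sketch, using the probabilities reconstructed above):

```python
import math

def rate(r, probs):
    """R = r * H, in bits/second (Eq. 10.3.1)."""
    return r * sum(p * math.log2(1 / p) for p in probs)

print(rate(200, [1/4] * 4))              # 400.0 bits/second (source 1)
print(rate(250, [1/2, 1/4, 1/8, 1/8]))   # 437.5 bits/second (source 2)
```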
10.4 TWO-DIMENSIONAL PROBABILITY SCHEME

In order to study the behaviour of a communication system, we must simultaneously study the behaviour of the transmitter and the receiver. This gives rise to the concept of a two-dimensional probability scheme. The results of a one-dimensional probability scheme may be extended to a two-dimensional probability scheme, which may further be extended to study any finite-dimensional probability scheme.

Let there be two finite discrete sample spaces S_1 and S_2, and let their product space be S = S_1 S_2 (see Fig. 10.4.1). Let

$$X = \{x_1, x_2, \ldots, x_m\} \quad\text{and}\quad Y = \{y_1, y_2, \ldots, y_n\}$$

be the sets of events in S_1 and S_2, respectively. Each event x_j of S_1 may occur in conjunction with any event y_k of S_2. Hence, the complete set of events in S = S_1 S_2 is

$$XY = \begin{pmatrix} x_1y_1 & x_1y_2 & \cdots & x_1y_n \\ x_2y_1 & x_2y_2 & \cdots & x_2y_n \\ \vdots & & & \vdots \\ x_my_1 & x_my_2 & \cdots & x_my_n \end{pmatrix}$$
A joint probability matrix P(X,Y) is associated with this scheme, its element p(x_j, y_k) being the probability of the joint occurrence of x_j and y_k. The following entropies may then be defined:

$$H(X) = -\sum_{j=1}^{m} p(x_j)\log p(x_j) \qquad (10.4.1)$$

where

$$p(x_j) = \sum_{k=1}^{n} p(x_j, y_k)$$

$$H(Y) = -\sum_{k=1}^{n} p(y_k)\log p(y_k) \qquad (10.4.2)$$

where

$$p(y_k) = \sum_{j=1}^{m} p(x_j, y_k)$$

$$H(X,Y) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j, y_k)\log p(x_j, y_k) \qquad (10.4.3)$$

H(X) and H(Y) are the marginal entropies of X and Y, respectively, and H(X,Y) is the joint entropy of X and Y.
Next, consider the conditional probability

$$p(x_j/y_k) = \frac{p(x_j, y_k)}{p(y_k)}$$

We know that y_k may occur in conjunction with x_1, x_2, ..., x_m. Thus,

$$P(X/y_k) = \left[\frac{p(x_1,y_k)}{p(y_k)}\;\; \frac{p(x_2,y_k)}{p(y_k)}\;\; \cdots\;\; \frac{p(x_m,y_k)}{p(y_k)}\right] \qquad (10.4.4)$$

Now,

$$p(x_1,y_k) + p(x_2,y_k) + \cdots + p(x_m,y_k) = p(y_k)$$

Hence,

$$\sum_{j=1}^{m} p(x_j/y_k) = 1$$

Therefore, the sum of the elements of the matrix given by Eq. 10.4.4 is unity. Hence, the probability scheme defined by Eq. 10.4.4 is complete, and an entropy may be associated with it. Thus,

$$H(X/y_k) = -\sum_{j=1}^{m}\frac{p(x_j,y_k)}{p(y_k)}\log\frac{p(x_j,y_k)}{p(y_k)} = -\sum_{j=1}^{m} p(x_j/y_k)\log p(x_j/y_k)$$
We may take the average of this conditional entropy over all admissible values of y_k, in order to obtain a measure of the average conditional entropy of the system:

$$H(X/Y) = \overline{H(X/y_k)} = \sum_{k=1}^{n} p(y_k)\,H(X/y_k) = -\sum_{k=1}^{n}\sum_{j=1}^{m} p(y_k)\,p(x_j/y_k)\log p(x_j/y_k)$$

$$= -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log p(x_j/y_k) \qquad (10.4.5)$$
Similarly, it can be shown that

$$H(Y/X) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log p(y_k/x_j) \qquad (10.4.6)$$

H(X/Y) and H(Y/X) are average conditional entropies, or simply conditional entropies.
Thus, in all, there are five entropies associated with a two-dimensional probability scheme: H(X), H(Y), H(X,Y), H(X/Y) and H(Y/X).
H(X): Average information per character at the transmitter, or entropy of the transmitter.
H(Y): Average information per character at the receiver, or entropy of the receiver.
H(X,Y): Average information per pair of the transmitted and received characters, or average uncertainty of the communication system as a whole.
H(X/Y): A received character y_k may be the result of the transmission of one of the x_j's, each with a given probability. The entropy associated with this probability scheme, averaged as y_k runs over all received symbols, i.e. the average of H(X/y_k), is the conditional entropy H(X/Y); it is a measure of the information about the transmitter when it is known that Y is received.
H(Y/X): A transmitted character x_j may result in the reception of one of the y_k's, each with a given probability. The entropy associated with this probability scheme, averaged as x_j runs over all transmitted symbols, i.e. the average of H(Y/x_j), is the conditional entropy H(Y/X); it is a measure of the information about the receiver when it is known that X is transmitted.
H(X) and H(Y) give indications of the probabilistic natures of the transmitter and the receiver, respectively. H(X/Y) indicates how well one can recover the transmitted symbols from the received symbols; i.e. it gives a measure of equivocation. H(Y/X) indicates how well one can recover the received symbols from the transmitted symbols; i.e. it gives a measure of error, or noise. The meanings of H(X/Y) and H(Y/X) will become clearer after we study mutual information.
Since p(x_j, y_k) = p(x_j/y_k) p(y_k), the joint entropy can be expanded as

$$H(X,Y) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log p(x_j,y_k) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log\left[p(x_j/y_k)\,p(y_k)\right]$$

$$= H(X/Y) + H(Y) \qquad (10.4.7)$$

Similarly, it can be shown that

$$H(X,Y) = H(Y/X) + H(X) \qquad (10.4.8)$$
Example 10.4.1
Complete the following probability matrix in all possible ways:

$$\begin{pmatrix} 0.1 & 0.3 & a & 0.4 \\ 0.3 & b & 0.1 & 0.5 \\ 0.2 & 0.4 & c & 0.3 \\ 0.2 & 0.2 & 0.1 & d \end{pmatrix}$$

Solution  With the values of a, b, c and d so chosen that the sum of each row becomes 1, i.e. a = 0.2, b = 0.1, c = 0.1 and d = 0.5, the given matrix becomes the conditional-probability matrix P(Y/X). Again, as the sum of all the given entries in the fourth column of the matrix is 1.2 (which is greater than 1), it cannot be completed as a conditional-probability matrix P(X/Y). Thus, there is only one possible way to complete the given probability matrix, and it is

$$P(Y/X) = \begin{pmatrix} 0.1 & 0.3 & 0.2 & 0.4 \\ 0.3 & 0.1 & 0.1 & 0.5 \\ 0.2 & 0.4 & 0.1 & 0.3 \\ 0.2 & 0.2 & 0.1 & 0.5 \end{pmatrix}$$
Example 10.4.2
A discrete source transmits messages x_1, x_2 and x_3 with the probabilities 0.3, 0.4 and 0.3. The source is connected to the channel of Fig. 10.4.2, whose conditional probability matrix is

$$P(Y/X) = \begin{pmatrix} 0.8 & 0.2 & 0 \\ 0 & 1 & 0 \\ 0 & 0.3 & 0.7 \end{pmatrix}$$

Calculate all the entropies.

[Fig. 10.4.2: Conditional probability matrix for Example 10.4.2]

Solution  Given P(X) = [0.3 0.4 0.3]. Multiplying each row of P(Y/X) by the corresponding p(x_j) gives the joint probability matrix:

$$P(X,Y) = \begin{pmatrix} 0.8\times 0.3 & 0.2\times 0.3 & 0 \\ 0 & 1\times 0.4 & 0 \\ 0 & 0.3\times 0.3 & 0.7\times 0.3 \end{pmatrix} = \begin{pmatrix} 0.24 & 0.06 & 0 \\ 0 & 0.4 & 0 \\ 0 & 0.09 & 0.21 \end{pmatrix}$$

The probabilities p(y_1), p(y_2) and p(y_3) can be obtained by adding the columns of P(X,Y), giving

$$p(y_1) = 0.24, \quad p(y_2) = 0.06 + 0.4 + 0.09 = 0.55, \quad p(y_3) = 0.21$$

Dividing each column of P(X,Y) by the corresponding p(y_k) gives

$$P(X/Y) = \begin{pmatrix} 1 & 0.109 & 0 \\ 0 & 0.727 & 0 \\ 0 & 0.164 & 1 \end{pmatrix}$$

(Check: the sum of each column of P(X/Y) is 1.)

The entropies can now be calculated:

$$H(X) = -\sum_{j=1}^{3} p(x_j)\log p(x_j) = -(0.3\log 0.3 + 0.4\log 0.4 + 0.3\log 0.3) = 1.571 \text{ bits/message}$$

$$H(Y) = -\sum_{k=1}^{3} p(y_k)\log p(y_k) = -(0.24\log 0.24 + 0.55\log 0.55 + 0.21\log 0.21) = 1.441 \text{ bits/message}$$

$$H(Y/X) = -\sum_{j=1}^{3}\sum_{k=1}^{3} p(x_j,y_k)\log p(y_k/x_j)$$
$$= -(0.24\log 0.8 + 0.06\log 0.2 + 0.4\log 1 + 0.09\log 0.3 + 0.21\log 0.7) = 0.482 \text{ bit/message}$$

$$H(X,Y) = H(Y/X) + H(X) = 0.482 + 1.571 = 2.053 \text{ bits/message}$$

$$H(X/Y) = H(X,Y) - H(Y) = 2.053 - 1.441 = 0.612 \text{ bit/message}$$

(Note: H(X/Y) and H(Y/X) can also be found directly from P(X/Y) and P(Y/X), but using Eqs 10.4.7 and 10.4.8, as above, is the easier way, because the knowledge of P(Y/X) and P(X/Y) is then not necessary and the computational work is relatively simple.)
Example 10.4.3
A transmitter has an alphabet of four letters [x_1, x_2, x_3, x_4] and the receiver has an alphabet of three letters [y_1, y_2, y_3]. The joint probability matrix is

$$P(X,Y) = \begin{pmatrix} 0.3 & 0.05 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0.15 & 0.05 \\ 0 & 0.05 & 0.15 \end{pmatrix}$$

Calculate all the entropies.
Solution  The probabilities of the transmitter symbols are found by a summation of the rows, and the probabilities of the receiver symbols are found by a summation of the columns, of the matrix P(X,Y). Thus,

$$P(X) = [0.35\;\; 0.25\;\; 0.2\;\; 0.2] \quad\text{and}\quad P(Y) = [0.3\;\; 0.5\;\; 0.2]$$

$$H(X) = -\sum_{j=1}^{4} p(x_j)\log p(x_j) = -(0.35\log 0.35 + 0.25\log 0.25 + 0.2\log 0.2 + 0.2\log 0.2) = 1.96 \text{ bits/message}$$

$$H(Y) = -\sum_{k=1}^{3} p(y_k)\log p(y_k) = 1.49 \text{ bits/message}$$

$$H(X,Y) = -(0.3\log 0.3 + 0.05\log 0.05 + 0.25\log 0.25 + 0.15\log 0.15 + 0.05\log 0.05 + 0.05\log 0.05 + 0.15\log 0.15) = 2.49 \text{ bits/message}$$

$$H(X/Y) = H(X,Y) - H(Y) = 2.49 - 1.49 = 1.00 \text{ bit/message}$$

$$H(Y/X) = H(X,Y) - H(X) = 2.49 - 1.96 = 0.53 \text{ bit/message}$$
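All five entropies of Example 10.4.3 can be computed mechanically from the joint probability matrix; the following Python sketch (the helper is ours, not from the text) uses the marginals together with Eqs 10.4.7 and 10.4.8:

```python
import math

def H(probs):
    """Entropy, in bits, of a list of probabilities (zero entries are skipped)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Joint probability matrix P(X,Y) of Example 10.4.3 (rows x_j, columns y_k).
P = [[0.30, 0.05, 0.00],
     [0.00, 0.25, 0.00],
     [0.00, 0.15, 0.05],
     [0.00, 0.05, 0.15]]

p_x = [sum(row) for row in P]               # P(X): row sums
p_y = [sum(col) for col in zip(*P)]         # P(Y): column sums
H_X, H_Y = H(p_x), H(p_y)
H_XY = H([p for row in P for p in row])     # joint entropy H(X,Y)
print(round(H_X, 2), round(H_Y, 2), round(H_XY, 2))   # 1.96 1.49 2.49
print(round(H_XY - H_Y, 2))                 # 1.0  -> H(X/Y), Eq. 10.4.7
print(round(H_XY - H_X, 2))                 # 0.53 -> H(Y/X), Eq. 10.4.8
```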
Example 10.4.4
A discrete source is connected to the channel given in Fig. 10.4.3. Calculate all the entropies.

[Fig. 10.4.3: Conditional probability matrix for Example 10.4.4]

Solution  Figure 10.4.3 gives the conditional probability matrix P(X/Y) as

$$P(X/Y) = \begin{pmatrix} 0.8 & 0 & 0 \\ 0.2 & 1 & 0.1 \\ 0 & 0 & 0.9 \end{pmatrix}$$

P(Y) is not given; it is therefore assumed to be P(Y) = [1/3 1/3 1/3], as equiprobable messages give the best possible results.

Multiplying each column of P(X/Y) by the corresponding p(y_k) gives the joint probability matrix:

$$P(X,Y) = \begin{pmatrix} 0.8\times\frac{1}{3} & 0 & 0 \\ 0.2\times\frac{1}{3} & 1\times\frac{1}{3} & 0.1\times\frac{1}{3} \\ 0 & 0 & 0.9\times\frac{1}{3} \end{pmatrix} = \begin{pmatrix} 0.267 & 0 & 0 \\ 0.067 & 0.333 & 0.033 \\ 0 & 0 & 0.3 \end{pmatrix}$$

The row sums give P(X) = [0.267 0.433 0.3]. Dividing each row of P(X,Y) by the corresponding p(x_j) gives

$$P(Y/X) = \begin{pmatrix} 1 & 0 & 0 \\ 0.154 & 0.769 & 0.077 \\ 0 & 0 & 1 \end{pmatrix}$$

Note that each row summation is 1, as desired.

The entropies can now be calculated as follows:

$$H(X) = -\sum_{j=1}^{3} p(x_j)\log p(x_j) = -(0.267\log 0.267 + 0.433\log 0.433 + 0.3\log 0.3) = 1.552 \text{ bits/message}$$

$$H(Y) = -\sum_{k=1}^{3} p(y_k)\log p(y_k) = \log 3 = 1.585 \text{ bits/message}$$

$$H(X,Y) = -\sum_{j=1}^{3}\sum_{k=1}^{3} p(x_j,y_k)\log p(x_j,y_k)$$
$$= -(0.267\log 0.267 + 0.067\log 0.067 + 0.333\log 0.333 + 0.033\log 0.033 + 0.3\log 0.3) = 1.982 \text{ bits/message}$$

$$H(Y/X) = -\sum_{j=1}^{3}\sum_{k=1}^{3} p(x_j,y_k)\log p(y_k/x_j)$$
$$= -(0.267\log 1 + 0.067\log 0.154 + 0.333\log 0.769 + 0.033\log 0.077 + 0.3\log 1) = 0.430 \text{ bit/message}$$

$$H(X/Y) = H(X,Y) - H(Y) = 1.982 - 1.585 = 0.397 \text{ bit/message}$$
10.5 MUTUAL INFORMATION

The information gained about x_j by the reception of y_k is the net reduction in its uncertainty, and is known as the mutual information I(x_j; y_k). Thus,

I(x_j; y_k) = initial uncertainty - final uncertainty

$$I(x_j; y_k) = \log\frac{1}{p(x_j)} - \log\frac{1}{p(x_j/y_k)} = \log\frac{p(x_j/y_k)}{p(x_j)} = \log\frac{p(x_j,y_k)}{p(x_j)\,p(y_k)} \qquad (10.5.1)$$

Thus, we see that the mutual information is symmetrical in x_j and y_k:

$$I(x_j; y_k) = I(y_k; x_j)$$

[Note: Self-information may be treated as a special case of mutual information in which y_k = x_j. Thus, I(x_j; x_j) = log(1/p(x_j)) = I(x_j).]

The average of the mutual information, i.e. the entropy corresponding to the mutual information, is given by

$$I(X;Y) = \overline{I(x_j;y_k)} = \sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\,I(x_j;y_k) = \sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log\frac{p(x_j/y_k)}{p(x_j)}$$

$$= -\sum_{j=1}^{m} p(x_j)\log p(x_j) - H(X/Y) = H(X) - H(X/Y)$$

$$= H(X) + H(Y) - H(X,Y) \quad\text{(using Eq. 10.4.7)} \qquad (10.5.2)$$
Example 10.5.1
Find the mutual information for the channels given in (A) Example 10.4.2, (B) Example 10.4.3 and (C) Example 10.4.4.

Solution
(A) I(X;Y) = H(X) - H(X/Y) = 1.571 - 0.612 = 0.959 bit/message
As a check, I(X;Y) = H(Y) - H(Y/X) = 1.441 - 0.482 = 0.959 bit/message

(B) I(X;Y) = H(X) - H(X/Y) = 1.96 - 1.00 = 0.96 bit/message
As a check, I(X;Y) = H(Y) - H(Y/X) = 1.49 - 0.53 = 0.96 bit/message

(C) I(X;Y) = H(X) - H(X/Y) = 1.552 - 0.397 = 1.155 bits/message
As a check, I(X;Y) = H(Y) - H(Y/X) = 1.585 - 0.430 = 1.155 bits/message
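Mutual information follows from the same joint-matrix bookkeeping via Eq. 10.5.2; a self-contained sketch for channel (A) of Example 10.5.1 (exact arithmetic gives about 0.96; the text's 0.959 reflects its intermediate rounding):

```python
import math

def H(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Joint probability matrix P(X,Y) of Example 10.4.2.
P = [[0.24, 0.06, 0.00],
     [0.00, 0.40, 0.00],
     [0.00, 0.09, 0.21]]

H_X = H([sum(row) for row in P])
H_Y = H([sum(col) for col in zip(*P)])
H_XY = H([p for row in P for p in row])
I_XY = H_X + H_Y - H_XY                 # Eq. 10.5.2
print(round(I_XY, 2))                   # 0.96 bit/message
```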
Noise-free Channel
Let us consider the communication channel shown in Fig. 10.5.1. It is known as a noise-free channel, since each transmitted symbol x_j is received as the single distinct symbol y_j. The joint probability matrix P(X,Y) is of the diagonal form

[Fig. 10.5.1: Noise-free channel]

$$P(X,Y) = \begin{pmatrix} p(x_1,y_1) & 0 & \cdots & 0 \\ 0 & p(x_2,y_2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & p(x_m,y_m) \end{pmatrix} \qquad (10.5.4)$$

and the channel probability matrices [P(Y/X)] and [P(X/Y)] are unity-diagonal matrices:

$$[P(Y/X)] = [P(X/Y)] = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \qquad (10.5.5)$$

Now,

$$H(X,Y) = -\sum_{j=1}^{m}\sum_{k=1}^{m} p(x_j,y_k)\log p(x_j,y_k), \quad\text{with } p(x_j,y_k) = 0 \text{ for } j\neq k$$

Also, from Eq. 10.5.4, it can be seen that

$$p(x_j,y_j) = p(x_j) = p(y_j)$$

Hence,

$$H(Y/X) = H(X/Y) = -m\,(1\cdot\log 1) = 0$$

(because there are m unity terms in the conditional probability matrices and the remaining terms are zero). Thus,

$$I(X;Y) = H(X) - H(X/Y) = H(X) = H(Y) = H(X,Y) \qquad (10.5.6)$$
Channels with Independent Input and Output
Let us now consider the channels shown in Fig. 10.5.2, for which the input and the output are independent. The joint probability matrix for the channel shown in Fig. 10.5.2(a), with n identical columns, is

$$P(X,Y) = \begin{pmatrix} p_1 & p_1 & \cdots & p_1 \\ p_2 & p_2 & \cdots & p_2 \\ \vdots & & & \vdots \\ p_m & p_m & \cdots & p_m \end{pmatrix} \qquad (10.5.7)$$

[Fig. 10.5.2: (a) and (b) Channels with independent input and output]

It can be seen from Eq. 10.5.7 that

$$p(x_j) = n\,p_j, \quad j = 1, 2, \ldots, m$$

$$p(y_k) = \sum_{j=1}^{m} p_j = \frac{1}{n}, \quad k = 1, 2, \ldots, n$$

and $p(x_j, y_k) = p_j$. Hence,

$$p(x_j, y_k) = p(x_j)\,p(y_k) \qquad (10.5.8)$$

Equation 10.5.8 shows that x_j and y_k are independent for all j and k, i.e. the input and output are independent for the channel shown in Fig. 10.5.2(a).
Now, since independence gives p(y_k/x_j) = p(y_k),

$$H(Y/X) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log p(y_k/x_j) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j)\,p(y_k)\log p(y_k)$$

Thus,

$$H(Y/X) = \sum_{j=1}^{m} p(x_j)\left[-\sum_{k=1}^{n} p(y_k)\log p(y_k)\right] = -\sum_{k=1}^{n} p(y_k)\log p(y_k) = H(Y)$$

since $\sum_j p(x_j) = 1$. Hence,

$$I(X;Y) = H(Y) - H(Y/X) = H(Y) - H(Y) = 0 \qquad (10.5.11)$$

Equation 10.5.11 states that no information is transmitted through the channel shown in Fig. 10.5.2(a).
The joint probability matrix for the channel shown in Fig. 10.5.2(b), with m identical rows, is

$$P(X,Y) = \begin{pmatrix} p_1 & p_2 & \cdots & p_n \\ p_1 & p_2 & \cdots & p_n \\ \vdots & & & \vdots \\ p_1 & p_2 & \cdots & p_n \end{pmatrix} \qquad (10.5.12)$$

It can be seen from Eq. 10.5.12 that

$$p(x_j) = \sum_{k=1}^{n} p_k = \frac{1}{m}, \quad j = 1, 2, \ldots, m$$

$$p(y_k) = m\,p_k, \quad k = 1, 2, \ldots, n$$

and $p(x_j, y_k) = p_k$. Hence,

$$p(x_j, y_k) = p(x_j)\,p(y_k) \qquad (10.5.13)$$

Equation 10.5.13 shows that x_j and y_k are independent for all j and k, i.e. the input and output are independent for the channel shown in Fig. 10.5.2(b). Following the same procedure, it can be shown that in this case also I(X;Y) = 0; i.e. no information is transmitted through the channel shown in Fig. 10.5.2(b).
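The conclusion that a channel with independent input and output transmits no information can be verified for any matrix of the form of Eq. 10.5.12; a sketch with hypothetical numbers (two identical rows):

```python
import math

def H(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Identical rows, as in Eq. 10.5.12, so p(x_j, y_k) = p(x_j) * p(y_k).
P = [[0.2, 0.2, 0.1],
     [0.2, 0.2, 0.1]]

H_X = H([sum(row) for row in P])
H_Y = H([sum(col) for col in zip(*P)])
H_XY = H([p for row in P for p in row])
print(round(H_X + H_Y - H_XY, 12))      # 0.0 -> I(X;Y) = 0, no information transmitted
```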
Example 10.5.2
Consider the channel whose joint probability matrix is

$$P(X,Y) = \begin{pmatrix} 0.25 & 0.25 \\ 0.15 & 0.15 \\ 0.1 & 0.1 \end{pmatrix}$$

Calculate the entropies.

Solution  The row sums give P(X) = [0.5 0.3 0.2] and the column sums give P(Y) = [0.5 0.5].

$$H(X) = -\sum_{j=1}^{3} p(x_j)\log p(x_j) = -(0.5\log 0.5 + 0.3\log 0.3 + 0.2\log 0.2) = 1.485 \text{ bits/message}$$

$$H(Y) = -\sum_{k=1}^{2} p(y_k)\log p(y_k) = -(0.5\log 0.5 + 0.5\log 0.5) = 1 \text{ bit/message}$$

H(X, Y) =