Information Theory
INTRODUCTION
Information theory is a branch of probability theory which can be applied to the study of communication systems. In general, communication of information is statistical in nature, and the main aim of information theory is to study simple, idealized statistical communication models. Information theory was invented by communication scientists while they were studying the statistical structure of electrical communication equipment.
Communication systems deal with the flow of some sort of information in some network. The information may be electrical signals, words, digital data, pictures, music, etc. There are three basic blocks of a communication system:
(i) Transmitter or source.
(ii) Channel or transmission network, which conveys the communique from transmitter to receiver.
(iii) Receiver or destination.
Figure 10.1 shows the simplest form of a communication system. In practice, there are generally a number of transmitters and receivers connected through a complex network. In such cases, it is desirable to study the distribution of information in the system, and therefore some measure of transmission efficiency is to be defined.
10.1 UNIT OF INFORMATION
The communication systems considered here are of a statistical nature; i.e., the performance of the system can never be described in a deterministic sense. It is always described in statistical terms. Thus, the most significant feature of the communication system shown in Fig. 10.1 is its unpredictability or uncertainty: the transmitter transmits, at random, any one of a set of pre-specified messages. Only the probability of transmitting each individual message is known, and it is these probabilities that determine the performance of the system. Our system model therefore measures the amount of information carried by an event in terms of its probability:

$$I(x_j) = \log \frac{1}{p(x_j)} \qquad (10.1.1)$$

where $x_j$ is an event with a probability $p(x_j)$ and the amount of information associated with it is $I(x_j)$.

Now, let there be another event $y_k$ such that $x_j$ and $y_k$ are independent. Hence, the probability of the joint event is $p(x_j, y_k) = p(x_j)\,p(y_k)$, with the associated information content

$$I(x_j, y_k) = \log \frac{1}{p(x_j)\,p(y_k)} \qquad (10.1.2)$$
The total information $I(x_j, y_k)$ must be equal to the sum of the individual informations $I(x_j)$ and $I(y_k)$, where $I(y_k) = \log \frac{1}{p(y_k)}$. Thus, it can be seen that the function on the RHS of Eq. 10.1.2 must be one which converts the operation of multiplication into addition. The logarithm is one such function. Thus,

$$I(x_j, y_k) = \log \frac{1}{p(x_j)} + \log \frac{1}{p(y_k)} = I(x_j) + I(y_k)$$

When the logarithm is taken to the base 2, the unit of information is the bit; base-2 logarithms are used throughout this chapter.
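The additivity of information over independent events can be checked numerically. The following is a minimal Python sketch (not part of the original text; the function name is ours), assuming base-2 logarithms so that information is measured in bits:

```python
import math

def information(p: float) -> float:
    """Amount of information, in bits, of an event with probability p (Eq. 10.1.1)."""
    return math.log2(1.0 / p)

# Two independent events: the information of the joint event equals
# the sum of the individual informations (Eq. 10.1.2).
p_x, p_y = 0.5, 0.25
print(information(p_x))          # 1.0 bit
print(information(p_y))          # 2.0 bits
print(information(p_x * p_y))    # 3.0 bits = 1.0 + 2.0
```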
10.2 ENTROPY
A communication system is not meant to deal with a single message only, but with all possible messages. Hence, although the instantaneous information flows corresponding to individual messages from the source may be erratic, we may describe the source in terms of the average information per individual message, known as the entropy of the source.
It is necessary at this stage to understand the difference between the 'arithmetic average' and the 'statistical average'. The concept of the 'arithmetic average' is applicable when the quantities to be dealt with are deterministic in nature and hence are to be considered just once. An example is finding the average height of the students in a class. Let there be M students in the class and let their heights be

$$h_i \quad (i = 1, 2, 3, \ldots, M)$$

The average height (arithmetic average) will then be

$$\bar{h} = \frac{1}{M}\sum_{i=1}^{M} h_i$$
Let us apply the same definition to a problem involving non-deterministic quantities, where a given quantity may occur again and again. The procedure to find the average will then be to find the total of all the quantities occurring over a sufficiently long period of time, divided by the number of quantities occurring during that time interval. This is the 'statistical average'. It is to be noted that, since in communication theory we have to deal with statistical quantities and not deterministic quantities, whenever 'average' is referred to, it always means the 'statistical average' and not the 'arithmetic average'.
The average information per individual message can now be calculated in the following manner. Let there be M different messages m_1, m_2, ..., m_M, with respective probabilities of occurrence p_1, p_2, ..., p_M. Let us assume that in a long time interval, L messages have been generated. Let L be very large, so that L >> M; then the number of messages m_1 generated is p_1 L.

The amount of information in each message m_1 is log(1/p_1). Thus, the total amount of information in all the m_1 messages is p_1 L log(1/p_1). The total amount of information in all L messages will then be

$$I_t = p_1 L \log\frac{1}{p_1} + p_2 L \log\frac{1}{p_2} + \cdots + p_M L \log\frac{1}{p_M}$$

The average information per message, or entropy, will then be

$$H = \frac{I_t}{L} = p_1\log\frac{1}{p_1} + p_2\log\frac{1}{p_2} + \cdots + p_M\log\frac{1}{p_M} = \sum_{k=1}^{M} p_k \log\frac{1}{p_k} = -\sum_{k=1}^{M} p_k \log p_k \qquad (10.2.1)$$
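Equation 10.2.1 translates directly into a short function; the following is a hedged Python sketch (the helper name is ours, not the text's), skipping zero-probability terms since lim p→0 of p log(1/p) = 0:

```python
import math

def entropy(probs) -> float:
    """H = sum(p_k * log2(1/p_k)) in bits/message (Eq. 10.2.1)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([1.0]))        # 0.0 -> a single certain message carries no information
print(entropy([0.5, 0.5]))   # 1.0 -> binary source at p = 0.5 (the maximum)
print(entropy([0.25] * 4))   # 2.0 -> log2(4), the M-ary maximum of Eq. 10.2.3
```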
If there is only a single possible message, i.e. M = 1 and p_1 = 1, then

$$H = p_1\log\frac{1}{p_1} = 1\cdot\log\frac{1}{1} = 0$$

Thus, it can be seen that in the case of a single possible message, the reception of that message conveys no information.
On the other hand, let there be only one message out of the M messages having a probability 1, and all the others a probability 0. In that case,

$$H = \sum_{k=1}^{M} p_k\log\frac{1}{p_k} = 1\cdot\log\frac{1}{1} + \lim_{p\to 0}\left(p\log\frac{1}{p} + \cdots + p\log\frac{1}{p}\right) = 0 + 0 = 0$$

Again, no information is conveyed, because the same message is received every time.
Next, consider a binary source (M = 2):

$$H = p_1\log\frac{1}{p_1} + p_2\log\frac{1}{p_2}$$

Let p_1 = p; then p_2 = 1 - p_1 = 1 - p = q. Hence,

$$H = p\log\frac{1}{p} + (1-p)\log\frac{1}{1-p} = p\log\frac{1}{p} + q\log\frac{1}{q} = H(p) = H(q) \qquad (10.2.2)$$

To find the extremum of H, set the derivative with respect to p equal to zero:

$$\frac{dH}{dp} = -\frac{1}{\ln 2} - \log p + \frac{1}{\ln 2} + \log(1-p) = 0$$

[Fig. 10.2.1: H as a function of p]

i.e.

$$\log(1-p) = \log p$$
i.e.

$$p = 0.5$$

This shows that there is either a maximum or a minimum at p = 0.5. Now, if the second derivative is positive, then there is a minimum, and if it is negative, then there is a maximum. Here,

$$\left.\frac{d^2H}{dp^2}\right|_{p=0.5} = \left.-\frac{1}{p(1-p)\ln 2}\right|_{p=0.5} < 0$$
Hence, H has a maximum at p = 0.5. The maximum value of H can be found from Eq. 10.2.2 by putting p = 0.5:

$$H_{\max} = 0.5\log 2 + 0.5\log 2 = 1 \text{ bit/message}$$

We have seen that for the binary case (M = 2), the entropy is maximum when p = 0.5, i.e. when both the messages are equally likely. Similarly, it can be shown that for an M-ary case, the entropy is maximum when all the messages are equally likely, with each p_k = 1/M. In this case, the maximum entropy is

$$H_{\max} = \sum_{k=1}^{M}\frac{1}{M}\log M = \log M \text{ bits/message} \qquad (10.2.3)$$

since there are M identical terms in the summation.
It is interesting to consider the situation when all the messages are equiprobable. In this case,

$$p_1 = p_2 = \cdots = p_M = \frac{1}{M}$$

Hence,

$$\text{Average information} = \sum_{k=1}^{M}\frac{1}{M}\log M = \log M$$

This is the same as the arithmetic average. The reason is simple: the relative weight of all the quantities is the same, because they are equiprobable, and hence the definition of the arithmetic average is applicable. Thus, it can be concluded that the statistical average is equal to the arithmetic average when all the quantities are equiprobable.
Example 10.2.1
A quaternary source generates information with probabilities p_1 = 0.1, p_2 = 0.2, p_3 = 0.3 and p_4 = 0.4. Find the entropy of the system. What percentage of the maximum possible information is being generated by this source?

Solution

$$H = \sum_{k=1}^{4} p_k\log\frac{1}{p_k} = 0.1\log 10 + 0.2\log 5 + 0.3\log\frac{10}{3} + 0.4\log 2.5 = 1.8464 \text{ bits/message}$$

$$H_{\max} = \log M = \log 4 = 2 \text{ bits/message}$$

$$\frac{H}{H_{\max}}\times 100 = \frac{1.8464}{2}\times 100 = 92.32\%$$
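The numbers of Example 10.2.1 can be verified in a few lines of Python (a sketch in our own notation, not part of the original solution):

```python
import math

probs = [0.1, 0.2, 0.3, 0.4]                      # quaternary source of Example 10.2.1
H = sum(p * math.log2(1 / p) for p in probs)      # Eq. 10.2.1
H_max = math.log2(len(probs))                     # Eq. 10.2.3
print(round(H, 4))                                # 1.8464 bits/message
print(round(100 * H / H_max, 2))                  # 92.32 (% of maximum)
```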
10.3 RATE OF INFORMATION
If a message source generates messages at the rate of r messages per second, the rate of information (or information rate) R is defined as the average number of bits of information per second. Now, H is the average number of bits of information per message. Hence,
$$R = rH \text{ bits/second} \qquad (10.3.1)$$

Let us consider two sources of equal entropy H, generating r_1 and r_2 messages per second, respectively. The first source will transmit the information at a rate R_1 = r_1 H, and the second source will transmit the information at a rate R_2 = r_2 H. Now, if r_1 > r_2, then R_1 > R_2. Thus, in a given period, more information is transmitted from the first source than from the second source, placing greater demands on the communication channel. Hence, a source is not described by its entropy alone, but also by its rate of information.
Example 10.3.1
A source emits one of six possible messages with probabilities 1/2, 1/4, 1/8, 1/16, 1/32 and 1/32, at a rate of 16 outcomes per second. Find the entropy and the rate of information.

Solution

$$H = \sum_{k=1}^{6} p_k\log\frac{1}{p_k} = \frac{1}{2}\log 2 + \frac{1}{4}\log 4 + \frac{1}{8}\log 8 + \frac{1}{16}\log 16 + \frac{1}{32}\log 32 + \frac{1}{32}\log 32 = \frac{31}{16} \text{ bits/message}$$

Now, r = 16 outcomes/second. Hence,

$$R = rH = 16\times\frac{31}{16} = 31 \text{ bits/second}$$
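A minimal numerical check of Example 10.3.1, assuming the probabilities reconstructed above:

```python
import math

probs = [1/2, 1/4, 1/8, 1/16, 1/32, 1/32]
r = 16                                            # outcomes per second
H = sum(p * math.log2(1 / p) for p in probs)      # 31/16 bits/message
print(H, r * H)                                   # 1.9375  31.0 bits/second
```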
Example 10.3.2
A continuous signal is band-limited to 5 kHz. The signal is quantized in 8 levels of a PCM system with the probabilities 0.25, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05 and 0.05. Calculate the entropy and the rate of information.

Solution  By the sampling theorem, the signal should be sampled at a frequency of at least 5 × 2 = 10 kHz. Each sample is then quantized to one of the eight levels. Looking at each quantized level as a message,

$$H = -\sum_{k=1}^{8} p_k\log p_k = 2.74 \text{ bits/message}$$

As the sampling frequency is 10 kHz, the message rate r = 10,000 messages/second. Hence, the rate of information is

$$R = rH = 10{,}000\times 2.74 = 27{,}400 \text{ bits/second}$$
Example 10.3.3
n analog source has a 4 ki: BW. The signalis sampled at 2.5 times the Nvquist rate. Each sample
is quantized into 256 equally likely levels. The successive samples are statisticaly independent.
What is the information rate of the source? IES 2003]
Solution The BW of the signal is
Jm4 kHz
Nyquist frequency = 2/m
or Js min8 kHz
The sampling rate = 2.5 x Nyquist frequency
or S 2.5 x 8
20 kHz
The number of quantization levels = 256
256
n= 8
Hence,
The information rate =
size of the word x sampling rate
nxJs
= 8 x20
= 160 kbps
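The arithmetic of Example 10.3.3 in code form (a sketch; the variable names are ours):

```python
import math

f_m = 4_000                    # bandwidth, Hz
f_s = 2.5 * (2 * f_m)          # 2.5 x Nyquist rate -> 20 kHz
levels = 256                   # equally likely, independent samples
H = math.log2(levels)          # 8 bits/sample (maximum-entropy case, Eq. 10.2.3)
print(f_s, H * f_s)            # 20000.0  160000.0 -> 160 kbps
```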
Example 10.3.4
Two sources are generating information as given below:

Source 1: p_1 = p_2 = p_3 = p_4 = 1/4
Source 2: p_1 = 1/2, p_2 = 1/4, p_3 = p_4 = 1/8

The message rates are respectively 200 and 250 messages per second. Compare H and R of the two sources.

Solution

$$H_1 = \sum_{k=1}^{4} p_k\log\frac{1}{p_k} = 2 \text{ bits/message}$$

$$H_2 = \sum_{k=1}^{4} p_k\log\frac{1}{p_k} = 1.75 \text{ bits/message}$$

$$R_1 = r_1 H_1 = 200\times 2 = 400 \text{ bits/second}$$

$$R_2 = r_2 H_2 = 250\times 1.75 = 437.5 \text{ bits/second}$$

Thus, although the first source has the greater entropy, the second source has the higher information rate.
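A quick check of Example 10.3.4, showing that a higher message rate can outweigh a higher entropy (a sketch, using the probabilities reconstructed above):

```python
import math

def rate(r, probs):
    """R = r * H, in bits/second (Eq. 10.3.1)."""
    return r * sum(p * math.log2(1 / p) for p in probs)

print(rate(200, [1/4] * 4))              # 400.0 bits/second (source 1)
print(rate(250, [1/2, 1/4, 1/8, 1/8]))   # 437.5 bits/second (source 2)
```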
10.4 TWO-DIMENSIONAL PROBABILITY SCHEME

In order to study the behaviour of a communication system, we must simultaneously study the behaviour of the transmitter and the receiver. This gives rise to the concept of a two-dimensional probability scheme. The results of a one-dimensional probability scheme may be extended to a two-dimensional probability scheme, which may further be extended to study any finite-dimensional probability scheme.

Let there be two finite discrete sample spaces S_1 and S_2, and let their product space be S = S_1 S_2 (see Fig. 10.4.1). Let

$$X = \{x_1, x_2, \ldots, x_m\} \quad\text{and}\quad Y = \{y_1, y_2, \ldots, y_n\}$$

be the sets of events in S_1 and S_2, respectively. Each event x_j of S_1 may occur in conjunction with any event y_k of S_2. Hence, the complete set of events in S = S_1 S_2 is

$$XY = \begin{pmatrix} x_1y_1 & x_1y_2 & \cdots & x_1y_n \\ x_2y_1 & x_2y_2 & \cdots & x_2y_n \\ \vdots & & & \vdots \\ x_my_1 & x_my_2 & \cdots & x_my_n \end{pmatrix}$$
A joint probability matrix P(X,Y) is associated with this scheme, its element p(x_j, y_k) being the probability of the joint occurrence of x_j and y_k. The following entropies may then be defined:

$$H(X) = -\sum_{j=1}^{m} p(x_j)\log p(x_j) \qquad (10.4.1)$$

where

$$p(x_j) = \sum_{k=1}^{n} p(x_j, y_k)$$

$$H(Y) = -\sum_{k=1}^{n} p(y_k)\log p(y_k) \qquad (10.4.2)$$

where

$$p(y_k) = \sum_{j=1}^{m} p(x_j, y_k)$$

$$H(X,Y) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j, y_k)\log p(x_j, y_k) \qquad (10.4.3)$$

H(X) and H(Y) are the marginal entropies of X and Y, respectively, and H(X,Y) is the joint entropy of X and Y.
Next, consider the conditional probability

$$p(x_j/y_k) = \frac{p(x_j, y_k)}{p(y_k)}$$

We know that y_k may occur in conjunction with x_1, x_2, ..., x_m. Thus,

$$P(X/y_k) = \left[\frac{p(x_1,y_k)}{p(y_k)}\;\; \frac{p(x_2,y_k)}{p(y_k)}\;\; \cdots\;\; \frac{p(x_m,y_k)}{p(y_k)}\right] \qquad (10.4.4)$$

Now,

$$p(x_1,y_k) + p(x_2,y_k) + \cdots + p(x_m,y_k) = p(y_k)$$

Hence,

$$\sum_{j=1}^{m} p(x_j/y_k) = 1$$

Therefore, the sum of the elements of the matrix given by Eq. 10.4.4 is unity. Hence, the probability scheme defined by Eq. 10.4.4 is complete, and an entropy may be associated with it. Thus,

$$H(X/y_k) = -\sum_{j=1}^{m}\frac{p(x_j,y_k)}{p(y_k)}\log\frac{p(x_j,y_k)}{p(y_k)} = -\sum_{j=1}^{m} p(x_j/y_k)\log p(x_j/y_k)$$
We may take the average of this conditional entropy over all admissible values of y_k, in order to obtain a measure of the average conditional entropy of the system:

$$H(X/Y) = \overline{H(X/y_k)} = \sum_{k=1}^{n} p(y_k)\,H(X/y_k) = -\sum_{k=1}^{n}\sum_{j=1}^{m} p(y_k)\,p(x_j/y_k)\log p(x_j/y_k)$$

$$= -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log p(x_j/y_k) \qquad (10.4.5)$$
Similarly, it can be shown that

$$H(Y/X) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log p(y_k/x_j) \qquad (10.4.6)$$

H(X/Y) and H(Y/X) are average conditional entropies, or simply conditional entropies.
Thus, in all, there are five entropies associated with a two-dimensional probability scheme: H(X), H(Y), H(X,Y), H(X/Y) and H(Y/X).
H(X): Average information per character at the transmitter, or entropy of the transmitter.
H(Y): Average information per character at the receiver, or entropy of the receiver.
H(X,Y): Average information per pair of the transmitted and received characters, or average uncertainty of the communication system as a whole.
H(X/Y): A received character y_k may be the result of the transmission of one of the x_j's, each with a given probability. The entropy associated with this probability scheme, averaged as y_k runs over all received symbols, i.e. the average of H(X/y_k), is the conditional entropy H(X/Y); it is a measure of the information about the transmitter when it is known that Y is received.
H(Y/X): A transmitted character x_j may result in the reception of one of the y_k's, each with a given probability. The entropy associated with this probability scheme, averaged as x_j runs over all transmitted symbols, i.e. the average of H(Y/x_j), is the conditional entropy H(Y/X); it is a measure of the information about the receiver when it is known that X is transmitted.
H(X) and H(Y) give indications of the probabilistic natures of the transmitter and the receiver, respectively. H(X/Y) indicates how well one can recover the transmitted symbols from the received symbols; i.e. it gives a measure of equivocation. H(Y/X) indicates how well one can recover the received symbols from the transmitted symbols; i.e. it gives a measure of error, or noise. The meanings of H(X/Y) and H(Y/X) will become clearer after we study mutual information.
Since p(x_j, y_k) = p(x_j/y_k) p(y_k), the joint entropy can be expanded as

$$H(X,Y) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log p(x_j,y_k) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log\left[p(x_j/y_k)\,p(y_k)\right]$$

$$= H(X/Y) + H(Y) \qquad (10.4.7)$$

Similarly, it can be shown that

$$H(X,Y) = H(Y/X) + H(X) \qquad (10.4.8)$$
Example 10.4.1
Complete the following probability matrix in all possible ways:

$$\begin{pmatrix} 0.1 & 0.3 & a & 0.4 \\ 0.3 & b & 0.1 & 0.5 \\ 0.2 & 0.4 & c & 0.3 \\ 0.2 & 0.2 & 0.1 & d \end{pmatrix}$$

Solution  With the values of a, b, c and d so chosen that the sum of each row becomes 1, i.e. a = 0.2, b = 0.1, c = 0.1 and d = 0.5, the given matrix becomes the conditional-probability matrix P(Y/X). Again, as the sum of all the given entries in the fourth column of the matrix is 1.2 (which is greater than 1), it cannot be completed as a conditional-probability matrix P(X/Y). Thus, there is only one possible way to complete the given probability matrix, and it is

$$P(Y/X) = \begin{pmatrix} 0.1 & 0.3 & 0.2 & 0.4 \\ 0.3 & 0.1 & 0.1 & 0.5 \\ 0.2 & 0.4 & 0.1 & 0.3 \\ 0.2 & 0.2 & 0.1 & 0.5 \end{pmatrix}$$
Example 10.4.2
A discrete source transmits messages x_1, x_2 and x_3 with the probabilities 0.3, 0.4 and 0.3. The source is connected to the channel of Fig. 10.4.2, whose conditional probability matrix is

$$P(Y/X) = \begin{pmatrix} 0.8 & 0.2 & 0 \\ 0 & 1 & 0 \\ 0 & 0.3 & 0.7 \end{pmatrix}$$

Calculate all the entropies.

[Fig. 10.4.2: Conditional probability matrix for Example 10.4.2]

Solution  Given P(X) = [0.3 0.4 0.3]. Multiplying each row of P(Y/X) by the corresponding p(x_j) gives the joint probability matrix:

$$P(X,Y) = \begin{pmatrix} 0.8\times 0.3 & 0.2\times 0.3 & 0 \\ 0 & 1\times 0.4 & 0 \\ 0 & 0.3\times 0.3 & 0.7\times 0.3 \end{pmatrix} = \begin{pmatrix} 0.24 & 0.06 & 0 \\ 0 & 0.4 & 0 \\ 0 & 0.09 & 0.21 \end{pmatrix}$$

The probabilities p(y_1), p(y_2) and p(y_3) can be obtained by adding the columns of P(X,Y), giving

$$p(y_1) = 0.24, \quad p(y_2) = 0.06 + 0.4 + 0.09 = 0.55, \quad p(y_3) = 0.21$$

Dividing each column of P(X,Y) by the corresponding p(y_k) gives

$$P(X/Y) = \begin{pmatrix} 1 & 0.109 & 0 \\ 0 & 0.727 & 0 \\ 0 & 0.164 & 1 \end{pmatrix}$$

(Check: the sum of each column of P(X/Y) is 1.)

The entropies can now be calculated:

$$H(X) = -\sum_{j=1}^{3} p(x_j)\log p(x_j) = -(0.3\log 0.3 + 0.4\log 0.4 + 0.3\log 0.3) = 1.571 \text{ bits/message}$$

$$H(Y) = -\sum_{k=1}^{3} p(y_k)\log p(y_k) = -(0.24\log 0.24 + 0.55\log 0.55 + 0.21\log 0.21) = 1.441 \text{ bits/message}$$

$$H(Y/X) = -\sum_{j=1}^{3}\sum_{k=1}^{3} p(x_j,y_k)\log p(y_k/x_j)$$
$$= -(0.24\log 0.8 + 0.06\log 0.2 + 0.4\log 1 + 0.09\log 0.3 + 0.21\log 0.7) = 0.482 \text{ bit/message}$$

$$H(X,Y) = H(Y/X) + H(X) = 0.482 + 1.571 = 2.053 \text{ bits/message}$$

$$H(X/Y) = H(X,Y) - H(Y) = 2.053 - 1.441 = 0.612 \text{ bit/message}$$

(Note: H(X/Y) and H(Y/X) can also be found directly from P(X/Y) and P(Y/X), but using Eqs 10.4.7 and 10.4.8, as above, is the easier way, because the knowledge of P(Y/X) and P(X/Y) is then not necessary and the computational work is relatively simple.)
Example 10.4.3
A transmitter has an alphabet of four letters [x_1, x_2, x_3, x_4] and the receiver has an alphabet of three letters [y_1, y_2, y_3]. The joint probability matrix is

$$P(X,Y) = \begin{pmatrix} 0.3 & 0.05 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0.15 & 0.05 \\ 0 & 0.05 & 0.15 \end{pmatrix}$$

Calculate all the entropies.
Solution  The probabilities of the transmitter symbols are found by a summation of the rows, and the probabilities of the receiver symbols are found by a summation of the columns, of the matrix P(X,Y). Thus,

$$P(X) = [0.35\;\; 0.25\;\; 0.2\;\; 0.2] \quad\text{and}\quad P(Y) = [0.3\;\; 0.5\;\; 0.2]$$

$$H(X) = -\sum_{j=1}^{4} p(x_j)\log p(x_j) = -(0.35\log 0.35 + 0.25\log 0.25 + 0.2\log 0.2 + 0.2\log 0.2) = 1.96 \text{ bits/message}$$

$$H(Y) = -\sum_{k=1}^{3} p(y_k)\log p(y_k) = 1.49 \text{ bits/message}$$

$$H(X,Y) = -(0.3\log 0.3 + 0.05\log 0.05 + 0.25\log 0.25 + 0.15\log 0.15 + 0.05\log 0.05 + 0.05\log 0.05 + 0.15\log 0.15) = 2.49 \text{ bits/message}$$

$$H(X/Y) = H(X,Y) - H(Y) = 2.49 - 1.49 = 1.00 \text{ bit/message}$$

$$H(Y/X) = H(X,Y) - H(X) = 2.49 - 1.96 = 0.53 \text{ bit/message}$$
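All five entropies of Example 10.4.3 can be computed mechanically from the joint probability matrix; the following Python sketch (the helper is ours, not from the text) uses the marginals together with Eqs 10.4.7 and 10.4.8:

```python
import math

def H(probs):
    """Entropy, in bits, of a list of probabilities (zero entries are skipped)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Joint probability matrix P(X,Y) of Example 10.4.3 (rows x_j, columns y_k).
P = [[0.30, 0.05, 0.00],
     [0.00, 0.25, 0.00],
     [0.00, 0.15, 0.05],
     [0.00, 0.05, 0.15]]

p_x = [sum(row) for row in P]               # P(X): row sums
p_y = [sum(col) for col in zip(*P)]         # P(Y): column sums
H_X, H_Y = H(p_x), H(p_y)
H_XY = H([p for row in P for p in row])     # joint entropy H(X,Y)
print(round(H_X, 2), round(H_Y, 2), round(H_XY, 2))   # 1.96 1.49 2.49
print(round(H_XY - H_Y, 2))                 # 1.0  -> H(X/Y), Eq. 10.4.7
print(round(H_XY - H_X, 2))                 # 0.53 -> H(Y/X), Eq. 10.4.8
```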
Example 10.4.4
A discrete source is connected to the channel given in Fig. 10.4.3. Calculate all the entropies.

[Fig. 10.4.3: Conditional probability matrix for Example 10.4.4]

Solution  Figure 10.4.3 gives the conditional probability matrix P(X/Y) as

$$P(X/Y) = \begin{pmatrix} 0.8 & 0 & 0 \\ 0.2 & 1 & 0.1 \\ 0 & 0 & 0.9 \end{pmatrix}$$

P(Y) is not given; it is therefore assumed to be P(Y) = [1/3 1/3 1/3], as equiprobable messages give the best possible results.

Multiplying each column of P(X/Y) by the corresponding p(y_k) gives the joint probability matrix:

$$P(X,Y) = \begin{pmatrix} 0.8\times\frac{1}{3} & 0 & 0 \\ 0.2\times\frac{1}{3} & 1\times\frac{1}{3} & 0.1\times\frac{1}{3} \\ 0 & 0 & 0.9\times\frac{1}{3} \end{pmatrix} = \begin{pmatrix} 0.267 & 0 & 0 \\ 0.067 & 0.333 & 0.033 \\ 0 & 0 & 0.3 \end{pmatrix}$$

The row sums give P(X) = [0.267 0.433 0.3]. Dividing each row of P(X,Y) by the corresponding p(x_j) gives

$$P(Y/X) = \begin{pmatrix} 1 & 0 & 0 \\ 0.154 & 0.769 & 0.077 \\ 0 & 0 & 1 \end{pmatrix}$$

Note that each row summation is 1, as desired.

The entropies can now be calculated as follows:

$$H(X) = -\sum_{j=1}^{3} p(x_j)\log p(x_j) = -(0.267\log 0.267 + 0.433\log 0.433 + 0.3\log 0.3) = 1.552 \text{ bits/message}$$

$$H(Y) = -\sum_{k=1}^{3} p(y_k)\log p(y_k) = \log 3 = 1.585 \text{ bits/message}$$

$$H(X,Y) = -\sum_{j=1}^{3}\sum_{k=1}^{3} p(x_j,y_k)\log p(x_j,y_k)$$
$$= -(0.267\log 0.267 + 0.067\log 0.067 + 0.333\log 0.333 + 0.033\log 0.033 + 0.3\log 0.3) = 1.982 \text{ bits/message}$$

$$H(Y/X) = -\sum_{j=1}^{3}\sum_{k=1}^{3} p(x_j,y_k)\log p(y_k/x_j)$$
$$= -(0.267\log 1 + 0.067\log 0.154 + 0.333\log 0.769 + 0.033\log 0.077 + 0.3\log 1) = 0.430 \text{ bit/message}$$

$$H(X/Y) = H(X,Y) - H(Y) = 1.982 - 1.585 = 0.397 \text{ bit/message}$$
10.5 MUTUAL INFORMATION

The information gained about x_j by the reception of y_k is the net reduction in its uncertainty, and is known as the mutual information I(x_j; y_k). Thus,

I(x_j; y_k) = initial uncertainty - final uncertainty

$$I(x_j; y_k) = \log\frac{1}{p(x_j)} - \log\frac{1}{p(x_j/y_k)} = \log\frac{p(x_j/y_k)}{p(x_j)} = \log\frac{p(x_j,y_k)}{p(x_j)\,p(y_k)} \qquad (10.5.1)$$

Thus, we see that the mutual information is symmetrical in x_j and y_k:

$$I(x_j; y_k) = I(y_k; x_j)$$

[Note: Self-information may be treated as a special case of mutual information in which y_k = x_j. Thus, I(x_j; x_j) = log(1/p(x_j)) = I(x_j).]

The average of the mutual information, i.e. the entropy corresponding to the mutual information, is given by

$$I(X;Y) = \overline{I(x_j;y_k)} = \sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\,I(x_j;y_k) = \sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log\frac{p(x_j/y_k)}{p(x_j)}$$

$$= -\sum_{j=1}^{m} p(x_j)\log p(x_j) - H(X/Y) = H(X) - H(X/Y)$$

$$= H(X) + H(Y) - H(X,Y) \quad\text{(using Eq. 10.4.7)} \qquad (10.5.2)$$
Example 10.5.1
Find the mutual information for the channels given in (A) Example 10.4.2, (B) Example 10.4.3 and (C) Example 10.4.4.

Solution
(A) I(X;Y) = H(X) - H(X/Y) = 1.571 - 0.612 = 0.959 bit/message
As a check, I(X;Y) = H(Y) - H(Y/X) = 1.441 - 0.482 = 0.959 bit/message

(B) I(X;Y) = H(X) - H(X/Y) = 1.96 - 1.00 = 0.96 bit/message
As a check, I(X;Y) = H(Y) - H(Y/X) = 1.49 - 0.53 = 0.96 bit/message

(C) I(X;Y) = H(X) - H(X/Y) = 1.552 - 0.397 = 1.155 bits/message
As a check, I(X;Y) = H(Y) - H(Y/X) = 1.585 - 0.430 = 1.155 bits/message
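Mutual information follows from the same joint-matrix bookkeeping via Eq. 10.5.2; a self-contained sketch for channel (A) of Example 10.5.1 (exact arithmetic gives about 0.96; the text's 0.959 reflects its intermediate rounding):

```python
import math

def H(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Joint probability matrix P(X,Y) of Example 10.4.2.
P = [[0.24, 0.06, 0.00],
     [0.00, 0.40, 0.00],
     [0.00, 0.09, 0.21]]

H_X = H([sum(row) for row in P])
H_Y = H([sum(col) for col in zip(*P)])
H_XY = H([p for row in P for p in row])
I_XY = H_X + H_Y - H_XY                 # Eq. 10.5.2
print(round(I_XY, 2))                   # 0.96 bit/message
```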
Noise-free Channel
Let us consider the communication channel shown in Fig. 10.5.1. It is known as a noise-free channel, since each transmitted symbol x_j is received as the single distinct symbol y_j. The joint probability matrix P(X,Y) is of the diagonal form

[Fig. 10.5.1: Noise-free channel]

$$P(X,Y) = \begin{pmatrix} p(x_1,y_1) & 0 & \cdots & 0 \\ 0 & p(x_2,y_2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & p(x_m,y_m) \end{pmatrix} \qquad (10.5.4)$$

and the channel probability matrices [P(Y/X)] and [P(X/Y)] are unity-diagonal matrices:

$$[P(Y/X)] = [P(X/Y)] = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \qquad (10.5.5)$$

Now,

$$H(X,Y) = -\sum_{j=1}^{m}\sum_{k=1}^{m} p(x_j,y_k)\log p(x_j,y_k), \quad\text{with } p(x_j,y_k) = 0 \text{ for } j\neq k$$

Also, from Eq. 10.5.4, it can be seen that

$$p(x_j,y_j) = p(x_j) = p(y_j)$$

Hence,

$$H(Y/X) = H(X/Y) = -m\,(1\cdot\log 1) = 0$$

(because there are m unity terms in the conditional probability matrices and the remaining terms are zero). Thus,

$$I(X;Y) = H(X) - H(X/Y) = H(X) = H(Y) = H(X,Y) \qquad (10.5.6)$$
Channels with Independent Input and Output
Let us now consider the channels shown in Fig. 10.5.2, for which the input and the output are independent. The joint probability matrix for the channel shown in Fig. 10.5.2(a), with n identical columns, is

$$P(X,Y) = \begin{pmatrix} p_1 & p_1 & \cdots & p_1 \\ p_2 & p_2 & \cdots & p_2 \\ \vdots & & & \vdots \\ p_m & p_m & \cdots & p_m \end{pmatrix} \qquad (10.5.7)$$

[Fig. 10.5.2: (a) and (b) Channels with independent input and output]

It can be seen from Eq. 10.5.7 that

$$p(x_j) = n\,p_j, \quad j = 1, 2, \ldots, m$$

$$p(y_k) = \sum_{j=1}^{m} p_j = \frac{1}{n}, \quad k = 1, 2, \ldots, n$$

and $p(x_j, y_k) = p_j$. Hence,

$$p(x_j, y_k) = p(x_j)\,p(y_k) \qquad (10.5.8)$$

Equation 10.5.8 shows that x_j and y_k are independent for all j and k, i.e. the input and output are independent for the channel shown in Fig. 10.5.2(a).
Now, since independence gives p(y_k/x_j) = p(y_k),

$$H(Y/X) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j,y_k)\log p(y_k/x_j) = -\sum_{j=1}^{m}\sum_{k=1}^{n} p(x_j)\,p(y_k)\log p(y_k)$$

Thus,

$$H(Y/X) = \sum_{j=1}^{m} p(x_j)\left[-\sum_{k=1}^{n} p(y_k)\log p(y_k)\right] = -\sum_{k=1}^{n} p(y_k)\log p(y_k) = H(Y)$$

since $\sum_j p(x_j) = 1$. Hence,

$$I(X;Y) = H(Y) - H(Y/X) = H(Y) - H(Y) = 0 \qquad (10.5.11)$$

Equation 10.5.11 states that no information is transmitted through the channel shown in Fig. 10.5.2(a).
The joint probability matrix for the channel shown in Fig. 10.5.2(b), with m identical rows, is

$$P(X,Y) = \begin{pmatrix} p_1 & p_2 & \cdots & p_n \\ p_1 & p_2 & \cdots & p_n \\ \vdots & & & \vdots \\ p_1 & p_2 & \cdots & p_n \end{pmatrix} \qquad (10.5.12)$$

It can be seen from Eq. 10.5.12 that

$$p(x_j) = \sum_{k=1}^{n} p_k = \frac{1}{m}, \quad j = 1, 2, \ldots, m$$

$$p(y_k) = m\,p_k, \quad k = 1, 2, \ldots, n$$

and $p(x_j, y_k) = p_k$. Hence,

$$p(x_j, y_k) = p(x_j)\,p(y_k) \qquad (10.5.13)$$

Equation 10.5.13 shows that x_j and y_k are independent for all j and k, i.e. the input and output are independent for the channel shown in Fig. 10.5.2(b). Following the same procedure, it can be shown that in this case also I(X;Y) = 0; i.e. no information is transmitted through the channel shown in Fig. 10.5.2(b).
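The conclusion that a channel with independent input and output transmits no information can be verified for any matrix of the form of Eq. 10.5.12; a sketch with hypothetical numbers (two identical rows):

```python
import math

def H(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Identical rows, as in Eq. 10.5.12, so p(x_j, y_k) = p(x_j) * p(y_k).
P = [[0.2, 0.2, 0.1],
     [0.2, 0.2, 0.1]]

H_X = H([sum(row) for row in P])
H_Y = H([sum(col) for col in zip(*P)])
H_XY = H([p for row in P for p in row])
print(round(H_X + H_Y - H_XY, 12))      # 0.0 -> I(X;Y) = 0, no information transmitted
```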
Example 10.5.2
Consider the channel whose joint probability matrix is

$$P(X,Y) = \begin{pmatrix} 0.25 & 0.25 \\ 0.15 & 0.15 \\ 0.1 & 0.1 \end{pmatrix}$$

Calculate the entropies.

Solution  The row sums give P(X) = [0.5 0.3 0.2] and the column sums give P(Y) = [0.5 0.5].

$$H(X) = -\sum_{j=1}^{3} p(x_j)\log p(x_j) = -(0.5\log 0.5 + 0.3\log 0.3 + 0.2\log 0.2) = 1.485 \text{ bits/message}$$

$$H(Y) = -\sum_{k=1}^{2} p(y_k)\log p(y_k) = -(0.5\log 0.5 + 0.5\log 0.5) = 1 \text{ bit/message}$$

H(X, Y) =