Lect3 - 2021 IT

Measure of Information

UNIVERSITY OF MINES AND TECHNOLOGY

INFORMATION THEORY

Course Instructor: Dr. Abdel-Fatao Hamidu

Computer Science and Engineering Department

March 15, 2022

Recap of MoI

Some Points to Remember

• The number of bits used to represent a message is completely different from the amount of information it conveys
• Intuitive feel:
• The occurrence of a less probable event conveys more information
• Since a lower probability implies a higher degree of uncertainty (and vice versa), a random variable with a higher degree of uncertainty contains more information
• This correlation between uncertainty and information forms the basis of all the physical interpretations that follow

Measure of Information (MoI)

Uncertainty and Information


• Consider a discrete random variable X with possible outcomes xi, where i = 1, 2, ..., n
• The self-information of the event X = xi is defined as

      I(xi) = log( 1 / P(xi) ) = −log P(xi)        (1)

• When the base of the logarithm is 2, the units of I(xi) are bits
• When the base is e, the units of I(xi) are nats (natural units)
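A quick numeric illustration of Eq. (1) may help; the sketch below is a minimal example, assuming Python with only the standard library, and the function name self_information and the sample probabilities are illustrative choices rather than anything from the slides.

```python
import math

def self_information(p, base=2):
    """Self-information I(x) = -log_base P(x) of an outcome with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p, base)

# A less probable outcome carries more information
print(self_information(0.5))               # 1.0 bit (fair coin outcome)
print(self_information(0.125))             # 3.0 bits (a 1-in-8 outcome)
print(self_information(0.5, base=math.e))  # ~0.693 nats
```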

Example 1
• Consider a binary source A which tosses a fair coin
• It produces an output equal to 1 if a head appears and a 0 if a
tail appears
• What is the information content of each output?

Solution
• For the source, P(1) = P(0) = 0.5
• The information content of each output from the source is

      I(xi) = −log2 P(xi) = −log2(0.5) = 1 bit

• This is consistent with intuition, since the output of the fair coin can be represented with one bit (1 for a head, 0 for a tail)

Example 1 contd
• Suppose the successive outputs from this binary source are statistically independent, i.e. the source is memoryless
• Consider a block of m binary digits
• There are 2^m possible m-bit blocks, each of which is equally probable with probability 2^(−m)
• The self-information of an m-bit block is

      I(xi) = −log2 P(xi) = −log2 2^(−m) = m bits

• Again, we observe that we indeed need m bits to represent all the possible m-bit blocks
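As a quick check of this result, a minimal sketch (assuming Python; the block length m = 8 is an arbitrary illustrative choice):

```python
import math

m = 8                       # block length in bits
p_block = 0.5 ** m          # each of the 2^m blocks is equally probable
info = -math.log2(p_block)  # self-information of one block
print(info)                 # 8.0, i.e. exactly m bits
```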

Example 2
• Consider a discrete memoryless source C that generates two bits at a time
• This source comprises two binary sources (A and B), each contributing one bit
• The two binary sources within source C are independent
• What is the information content of the aggregate source C?

Solution
• Intuitively, the information content of the aggregate source C should be the sum of the information contained in the outputs of the two independent sources that constitute source C
• Since A and B are independent,
      P(C) = P(A)P(B) = 0.5 × 0.5 = 0.25 (the probability of any particular two-bit output)
      I(C) = −log2 P(C) = −log2(0.25) = 2 bits
• The answer is again consistent with intuition
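A minimal numeric check of this additivity, assuming Python and the slide's setup of two fair, independent binary sources; the variable names are illustrative.

```python
import math

p_a = p_b = 0.5          # probability of any given output of A and of B
i_a = -math.log2(p_a)    # 1 bit from source A
i_b = -math.log2(p_b)    # 1 bit from source B

p_c = p_a * p_b          # independence: probabilities multiply
i_c = -math.log2(p_c)    # 2 bits from the aggregate source C

assert i_c == i_a + i_b  # information adds up
print(i_a, i_b, i_c)
```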

Example
• Given a bag containing 3 green, 4 red and 2 yellow balls, what is the average surprise associated with choosing a ball at random from the bag?
⊣ What is the information gained by choosing a green ball?
⊣ Differentiate between the two types of information obtained.
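One way to work this exercise is sketched below, treating the average surprise as the expected self-information over the ball colours; this assumes Python, and the helper names are illustrative.

```python
import math

counts = {"green": 3, "red": 4, "yellow": 2}
total = sum(counts.values())
probs = {colour: n / total for colour, n in counts.items()}

# Average surprise: the expected value of -log2 P(colour) over all colours
avg_surprise = -sum(p * math.log2(p) for p in probs.values())

# Information gained by drawing a green ball: the self-information of that outcome
info_green = -math.log2(probs["green"])

print(round(avg_surprise, 3))  # ~1.530 bits, averaged over the whole bag
print(round(info_green, 3))    # ~1.585 bits, for this particular outcome
```

The first number is a property of the whole distribution, while the second is the surprise of one specific outcome, which is the distinction the last part of the question is after.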

Example
• Calculate the entropy of a fair coin.
⊣ Calculate the entropy of a biased coin that comes up heads 75% of the time.
⊣ What is the entropy of the coin if somehow it is incapable of landing tails?
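A sketch for this exercise using the binary entropy function is given below; it assumes Python, takes 0·log 0 as 0 by convention, and the function name is illustrative.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)  # 0*log(0) -> 0

for p in (0.5, 0.75, 1.0):
    print(p, round(binary_entropy(p), 4))
# fair coin -> 1.0 bit, 75% heads -> ~0.8113 bits, always heads -> 0.0 bits
```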

Why the Logarithm?


Consider two independent sources:
• Independent events ⇒ probabilities multiply
• Independent sources ⇒ information must add up
• The logarithm turns products into sums, so it does exactly this job

[Figure: visual representation of the independent sources]


• Note that coin A may be a fair coin while coin B is a biased coin
• In that case, the rate of information generation by coin A is not the same as that of coin B
• The rate of information generation is the amount of information generated (number of bits) per second

Mutual Information

• Consider two discrete random variables X and Y with possible outcomes xi, where i = 1, 2, ..., n, and yj, where j = 1, 2, ..., m, respectively
• Suppose we observe some outcome Y = yj and we want to determine the amount of information this event provides about the event X = xi, ∀i = 1, 2, ..., n
• That is, we want to mathematically represent this mutual information
• Mutual information is found in applications such as DNA sequencing, where Y could be the profile of a disease and X the genetic code

Note the following
• If X and Y are independent events, the occurrence of Y = yj provides no information about X = xi
• If X and Y are completely dependent events, the occurrence of Y = yj determines the occurrence of X = xi

Mutual Information – Transinformation


• Definition: The mutual information I(xi; yj) between xi and yj is defined as

      I(xi; yj) = log( P(xi|yj) / P(xi) )        (2)

• Mutual information is a measure of how much information can be obtained about one random variable by observing another
• As before, the units of I(xi; yj) are determined by the base of the logarithm, which is usually selected as 2 or e
• When the base is 2, the units are in bits
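A minimal sketch of Eq. (2) is shown below for a pair of binary random variables; it assumes Python, and the joint distribution used is an arbitrary illustrative example rather than anything from the slides.

```python
import math

# Illustrative joint distribution P(X = x, Y = y) for binary X and Y
joint = {(0, 0): 0.4, (0, 1): 0.1,
         (1, 0): 0.1, (1, 1): 0.4}

def p_x(x): return sum(p for (xi, _), p in joint.items() if xi == x)
def p_y(y): return sum(p for (_, yj), p in joint.items() if yj == y)

def mutual_information(x, y):
    """I(x; y) = log2( P(x|y) / P(x) ) for a single pair of outcomes."""
    p_x_given_y = joint[(x, y)] / p_y(y)
    return math.log2(p_x_given_y / p_x(x))

print(round(mutual_information(0, 0), 4))  # positive: observing Y = 0 favours X = 0
print(round(mutual_information(1, 0), 4))  # negative: observing Y = 0 disfavours X = 1
```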

• Representation:

      I(xi; yj) = log( P(xi|yj) / P(xi) )

• Observe that

      P(xi|yj) / P(xi) = P(xi|yj)P(yj) / [P(xi)P(yj)] = P(xi, yj) / [P(xi)P(yj)] = P(yj|xi) / P(yj)

• Therefore there exists a two-way relationship

      I(xi; yj) = log( P(xi|yj) / P(xi) ) = log( P(yj|xi) / P(yj) ) = I(yj; xi)

• That is, mutual information is symmetric
Physical Interpretation of Mutual Information

The case of two extremes


• When the random variables X and Y are statistically independent, P(xi|yj) = P(xi), which leads to I(xi; yj) = 0
• When the occurrence of Y = yj uniquely determines the occurrence of the event X = xi, P(xi|yj) = 1, and the mutual information becomes

      I(xi; yj) = log( 1 / P(xi) ) = −log P(xi)

• This is the self-information of the event X = xi
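These two extremes can be checked numerically with a couple of lines, assuming Python; the marginal probability P(xi) = 0.25 is an arbitrary illustrative value.

```python
import math

p_x = 0.25                   # illustrative marginal probability P(xi)

# Extreme 1: X and Y independent, so P(xi|yj) = P(xi)
print(math.log2(p_x / p_x))  # 0.0 bits of mutual information

# Extreme 2: Y = yj uniquely determines X = xi, so P(xi|yj) = 1
print(math.log2(1.0 / p_x))  # 2.0 bits = -log2 P(xi), the self-information
```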

Mutual Information - A Binary Symmetric Channel (BSC)

[Figure: a BSC with equally likely inputs, P(X=0) = P(X=1) = 0.5, and crossover probability p, i.e. P(Y=1|X=0) = P(Y=0|X=1) = p]

P(Y = 0) = P(X = 0)P(Y = 0|X = 0) + P(X = 1)P(Y = 0|X = 1) = 0.5(1 − p) + 0.5(p) = 0.5

P(Y = 1) = P(X = 0)P(Y = 1|X = 0) + P(X = 1)P(Y = 1|X = 1) = 0.5(p) + 0.5(1 − p) = 0.5

I(x0; y0) = I(0; 0) = log2( P(Y=0|X=0) / P(Y=0) ) = log2( (1 − p) / 0.5 ) = log2 2(1 − p)

I(x1; y0) = I(1; 0) = log2( P(Y=0|X=1) / P(Y=0) ) = log2( p / 0.5 ) = log2 2p
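A short sketch of this calculation from first principles, assuming Python; the crossover probability p = 0.1 is an arbitrary illustrative value.

```python
import math

p = 0.1                          # illustrative crossover probability
px0 = px1 = 0.5                  # equally likely inputs

py0 = px0 * (1 - p) + px1 * p    # P(Y = 0), comes out to 0.5
i_00 = math.log2((1 - p) / py0)  # I(x0; y0) = log2 2(1 - p)
i_10 = math.log2(p / py0)        # I(x1; y0) = log2 2p

print(round(py0, 3), round(i_00, 4), round(i_10, 4))
# 0.5  0.848  -2.3219  for p = 0.1
```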

CASE 1
• Suppose p = 0; the channel is ideal (noiseless)
• In that case,
      I(x0; y0) = I(0; 0) = log2 2(1 − p) = 1 bit
• Hence, having observed the output, we can determine with certainty what was transmitted
• Recall that the self-information of the event X = x0 was 1 bit

CASE 2
• If p = 0.5, we obtain
      I(x0; y0) = I(0; 0) = log2 2(1 − p) = log2 2(0.5) = 0 bits
• This implies that having observed the output, we have no information about what was transmitted
• Thus, it is a useless channel
⊣ For such a channel, there is no point in observing the received symbol and trying to guess what was sent
⊣ Instead, we might as well toss a fair coin at the receiver to estimate what was sent

Variation of Mutual Information with Probability in BSC

[Figure: plot of the mutual information I(0; 0) = log2 2(1 − p) against the crossover probability p]

• The lower the crossover probability p of the channel, the higher the mutual information
• At p = 0.5, the mutual information is 0
• At p > 0.5, the mutual information becomes negative
• Negative mutual information implies that the channel is likely to have been in error
⊣ The channel may have delivered a 0 when a 1 was sent, and vice versa
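The sign change described above can be tabulated directly from I(0; 0) = log2 2(1 − p), as in the sketch below (Python assumed; the chosen p values are illustrative).

```python
import math

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9):
    i_00 = math.log2(2 * (1 - p))  # I(x0; y0) for a BSC with crossover probability p
    print(f"p = {p:4.2f}   I(0;0) = {i_00:+.3f} bits")
# Positive below p = 0.5, exactly 0 at p = 0.5, negative above p = 0.5
```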

Mutual Information - Binary Channel

[Figure: a binary (asymmetric) channel with equally likely inputs and crossover probabilities P(Y=1|X=0) = p0 and P(Y=0|X=1) = p1]

From the channel transition probabilities we have


P(Y = 0) = P(X = 0)P(Y = 0|X = 0) + P(X = 1)P(Y = 0|X = 1) = 0.5(1 − p0) + 0.5(p1) = 0.5(1 − p0 + p1)

P(Y = 1) = P(X = 0)P(Y = 1|X = 0) + P(X = 1)P(Y = 1|X = 1) = 0.5(p0) + 0.5(1 − p1) = 0.5(1 + p0 − p1)

The mutual information about the occurrence of the event X = 0, given that Y = 0, is

I(x0; y0) = I(0; 0) = log2( P(Y=0|X=0) / P(Y=0) ) = log2( (1 − p0) / [0.5(1 − p0 + p1)] ) = log2( 2(1 − p0) / (1 − p0 + p1) )

I(x1; y0) = I(1; 0) = log2( P(Y=0|X=1) / P(Y=0) ) = log2( p1 / [0.5(1 − p0 + p1)] ) = log2( 2p1 / (1 − p0 + p1) )
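A sketch of these two expressions, assuming Python; the crossover probabilities p0 = 0.1 and p1 = 0.3 are arbitrary illustrative values, and the closed forms from the slide are cross-checked against a first-principles calculation.

```python
import math

p0, p1 = 0.1, 0.3                 # illustrative: P(Y=1|X=0) and P(Y=0|X=1)
py0 = 0.5 * (1 - p0) + 0.5 * p1   # P(Y = 0) = 0.5(1 - p0 + p1)

i_00 = math.log2((1 - p0) / py0)  # mutual information I(x0; y0)
i_10 = math.log2(p1 / py0)        # mutual information I(x1; y0)

# Cross-check against the closed-form expressions
assert math.isclose(i_00, math.log2(2 * (1 - p0) / (1 - p0 + p1)))
assert math.isclose(i_10, math.log2(2 * p1 / (1 - p0 + p1)))
print(round(i_00, 4), round(i_10, 4))  # ~0.585 and -1.0 bits for these values
```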

ADIOS

THANK YOU, QUESTIONS!!!
