
Information Theory and Computing

Assignment No. 1

April 10, 2020

Que. 1.
Total number of outcomes = 32.
Since the random variable is uniformly distributed, the probability of each outcome is $\frac{1}{32}$.
Entropy of the random variable $X$:
\[
\begin{aligned}
H(X) &= -\sum_{i=1}^{32} p(x_i)\log_2 p(x_i) \\
     &= -32 \cdot \frac{1}{32}\log_2\frac{1}{32} \\
     &= -\log_2\frac{1}{32} \\
     &= \log_2 32 \\
     &= 5 \text{ bits}
\end{aligned}
\]
\[
\therefore H(X) = 5 \text{ bits}
\]
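As a quick numerical sanity check (not part of the original solution), the same value can be computed directly, assuming a uniform distribution over the 32 outcomes:

```python
import math

# Uniform distribution over 32 equally likely outcomes.
p = [1 / 32] * 32

# Shannon entropy in bits: H(X) = -sum_x p(x) * log2 p(x).
H = -sum(px * math.log2(px) for px in p)
print(H)  # 5.0
```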

Que. 2.
Entropy of the random variable $X$:
\[
\begin{aligned}
H(X) &= -\sum_{x} p(x)\log_2 p(x) \\
     &= -\left( \frac{1}{2}\log_2\frac{1}{2} + \frac{1}{4}\log_2\frac{1}{4} + \frac{1}{8}\log_2\frac{1}{8} + \frac{1}{16}\log_2\frac{1}{16} + 4\cdot\frac{1}{64}\log_2\frac{1}{64} \right) \\
     &= \frac{1}{2}\log_2 2 + \frac{1}{4}\log_2 4 + \frac{1}{8}\log_2 8 + \frac{1}{16}\log_2 16 + \frac{4}{64}\log_2 64 \\
     &= \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{16}\cdot 4 + \frac{4}{64}\cdot 6 \\
     &= \frac{1}{2} + \frac{1}{2} + \frac{3}{8} + \frac{1}{4} + \frac{3}{8} \\
     &= 2 \text{ bits}
\end{aligned}
\]
\[
\therefore H(X) = 2 \text{ bits}
\]
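A short numerical check, assuming the distribution read off above, i.e. probabilities 1/2, 1/4, 1/8, 1/16 and four outcomes of probability 1/64 each:

```python
import math

# Distribution as read from the solution: the final four outcomes
# each carry probability 1/64 (the 4 * (1/64) term above).
p = [1/2, 1/4, 1/8, 1/16] + [1/64] * 4
assert abs(sum(p) - 1.0) < 1e-12  # valid probability distribution

H = -sum(px * math.log2(px) for px in p)
print(H)  # 2.0
```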

Que. 3.
From the given marginal distribution of X:
\[
\begin{aligned}
H(X) &= -\sum_{x} p(x)\log_2 p(x) \\
     &= -\left( \frac{1}{2}\log_2\frac{1}{2} + \frac{1}{4}\log_2\frac{1}{4} + \frac{1}{8}\log_2\frac{1}{8} + \frac{1}{8}\log_2\frac{1}{8} \right) \\
     &= \frac{1}{2}\log_2 2 + \frac{1}{4}\log_2 4 + \frac{1}{8}\log_2 8 + \frac{1}{8}\log_2 8 \\
     &= \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{8}\cdot 3 \\
     &= 1 + 2\cdot\frac{3}{8} \\
     &= \frac{7}{4} \text{ bits}
\end{aligned}
\]
\[
\therefore H(X) = 1.75 \text{ bits}
\]
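Since every probability here is a power of two, $-\log_2 p(x)$ is an integer and the result can be verified with exact arithmetic; a minimal sketch:

```python
from fractions import Fraction as F

# Marginal distribution of X as given above.
p = [F(1, 2), F(1, 4), F(1, 8), F(1, 8)]
assert sum(p) == 1

# For a dyadic probability 1/2**k, -log2(p) = k = bit_length(denominator) - 1.
H = sum(px * (px.denominator.bit_length() - 1) for px in p)
print(H)  # 7/4
```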

Que. 4.
From the given marginal distribution of X:
\[
\begin{aligned}
H(X) &= -\sum_{x} p(x)\log_2 p(x) \\
     &= -\left( \frac{1}{2}\log_2\frac{1}{2} + \frac{1}{4}\log_2\frac{1}{4} + \frac{1}{8}\log_2\frac{1}{8} + \frac{1}{8}\log_2\frac{1}{8} \right) \\
     &= \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{8}\cdot 3 \\
     &= \frac{7}{4} \text{ bits}
\end{aligned}
\]
\[
\therefore H(X) = 1.75 \text{ bits}
\]

Similarly, from the marginal distribution of Y:
\[
\begin{aligned}
H(Y) &= -\sum_{y} p(y)\log_2 p(y) \\
     &= -4 \cdot \frac{1}{4}\log_2\frac{1}{4} \\
     &= 4 \cdot \frac{1}{4}\log_2 4 \\
     &= \log_2 4 \\
     &= 2 \text{ bits}
\end{aligned}
\]
\[
\therefore H(Y) = 2 \text{ bits}
\]

Using the values of the joint probability distribution of X and Y:
\[
\begin{aligned}
H(X,Y) &= -\sum_{i}\sum_{j} p(x_i, y_j)\log_2 p(x_i, y_j) \\
       &= -\left( \frac{1}{4}\log_2\frac{1}{4} + 2\cdot\frac{1}{8}\log_2\frac{1}{8} + 6\cdot\frac{1}{16}\log_2\frac{1}{16} + 4\cdot\frac{1}{32}\log_2\frac{1}{32} \right) \\
       &= \frac{1}{8}\left( 2\log_2 4 + 2\log_2 8 + 3\log_2 16 + \log_2 32 \right) \\
       &= \frac{1}{8}\left( 2\cdot 2 + 2\cdot 3 + 3\cdot 4 + 5 \right) \\
       &= \frac{27}{8} \text{ bits}
\end{aligned}
\]
\[
\therefore H(X,Y) = 3.375 \text{ bits}
\]

For the conditional entropy H(X|Y), since $H(X,Y) = H(Y) + H(X|Y)$,
\[
\begin{aligned}
H(X|Y) &= H(X,Y) - H(Y) \\
       &= \frac{27}{8} - 2 = \frac{27}{8} - \frac{16}{8} = \frac{11}{8} \text{ bits}
\end{aligned}
\]
\[
\therefore H(X|Y) = 1.375 \text{ bits}
\]

And for the conditional entropy H(Y|X), since $H(X,Y) = H(X) + H(Y|X)$,
\[
\begin{aligned}
H(Y|X) &= H(X,Y) - H(X) \\
       &= \frac{27}{8} - \frac{7}{4} = \frac{27}{8} - \frac{14}{8} = \frac{13}{8} \text{ bits}
\end{aligned}
\]
\[
\therefore H(Y|X) = 1.625 \text{ bits}
\]

For the mutual information I(X;Y), since $I(X;Y) = H(X) - H(X|Y)$,
\[
I(X;Y) = \frac{7}{4} - \frac{11}{8} = \frac{14}{8} - \frac{11}{8} = \frac{3}{8} \text{ bits}
\]
\[
\therefore I(X;Y) = 0.375 \text{ bits}
\]
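These values can be verified numerically. The sketch below uses only the marginals given above and the multiset of joint-cell probabilities read off the table (one cell of 1/4, two of 1/8, six of 1/16, four of 1/32); the exact layout of the joint table is not needed for these quantities:

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginals used above and the joint cell probabilities.
p_x = [1/2, 1/4, 1/8, 1/8]
p_y = [1/4, 1/4, 1/4, 1/4]
joint = [1/4] + [1/8] * 2 + [1/16] * 6 + [1/32] * 4

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(joint)
print(H_X, H_Y, H_XY)      # 1.75  2.0  3.375
print(H_XY - H_Y)          # H(X|Y) = 1.375
print(H_XY - H_X)          # H(Y|X) = 1.625
print(H_X - (H_XY - H_Y))  # I(X;Y) = 0.375
```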

Que. 6.
(a) This follows from the chain rule for entropy applied to the random variables X and g(X), i.e. $H(X, Y) = H(X) + H(Y|X)$, so $H(X, g(X)) = H(X) + H(g(X)|X)$.

(b) Intuitively, g(X) depends only on X, so once the value of X is known, g(X) is completely determined. The entropy of a deterministic quantity is 0, so $H(g(X)|X) = 0$ and $H(X) + H(g(X)|X) = H(X)$.

(c) Again, this follows from the chain rule for entropy, this time in the form $H(X, Y) = H(Y) + H(X|Y)$.

(d) Proving that $H(g(X)) + H(X|g(X)) \geq H(g(X))$ amounts to proving that $H(X|g(X)) \geq 0$: non-negativity is one of the basic properties of entropy and follows from its definition, since the logarithm of a probability (a quantity always less than or equal to 1) is non-positive. In particular, $H(X|g(X)) = 0$ if knowing the value of g(X) completely determines the value of X (for example, if g is injective); otherwise $H(X|g(X)) > 0$.
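A small numerical illustration of the resulting inequality $H(g(X)) \leq H(X)$ (not part of the original solution; the distribution and the two example functions are chosen arbitrarily):

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a dict {value: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def pushforward(pmf, g):
    """Distribution of g(X) induced by the distribution of X."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)

p_X = {x: 1 / 4 for x in (1, 2, 3, 4)}  # X uniform on {1, 2, 3, 4}

print(entropy(p_X))                                 # H(X) = 2.0
print(entropy(pushforward(p_X, lambda x: x % 2)))   # non-injective g: H(g(X)) = 1.0 < 2.0
print(entropy(pushforward(p_X, lambda x: 2 * x)))   # injective g:     H(g(X)) = 2.0 (equality)
```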

Que. 7.
Computing the marginal distributions:
\[
p(x) = \left( \frac{2}{3}, \frac{1}{3} \right), \qquad p(y) = \left( \frac{1}{3}, \frac{2}{3} \right)
\]
(a) H(X), H(Y)
\[
H(X) = -\left( \frac{2}{3}\log_2\frac{2}{3} + \frac{1}{3}\log_2\frac{1}{3} \right) \approx 0.918 \text{ bits}
\]
\[
\therefore H(X) \approx 0.918 \text{ bits}
\]
\[
H(Y) = -\left( \frac{1}{3}\log_2\frac{1}{3} + \frac{2}{3}\log_2\frac{2}{3} \right) \approx 0.918 \text{ bits}
\]
\[
\therefore H(Y) \approx 0.918 \text{ bits}
\]

(b) H(X|Y), H(Y|X)
\[
\begin{aligned}
H(X|Y) &= \sum_{i=0}^{1} p(y = i)\, H(X \mid Y = i) \\
       &= \frac{1}{3}\, H(X \mid Y = 0) + \frac{2}{3}\, H(X \mid Y = 1) \\
       &= \frac{1}{3}\, H(1, 0) + \frac{2}{3}\, H\!\left(\frac{1}{2}, \frac{1}{2}\right) \\
       &= \frac{2}{3} \text{ bits}
\end{aligned}
\]
\[
\therefore H(X|Y) = \frac{2}{3} \text{ bits}
\]
\[
\begin{aligned}
H(Y|X) &= \sum_{i=0}^{1} p(x = i)\, H(Y \mid X = i) \\
       &= \frac{2}{3}\, H(Y \mid X = 0) + \frac{1}{3}\, H(Y \mid X = 1) \\
       &= \frac{2}{3}\, H\!\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{1}{3}\, H(0, 1) \\
       &= \frac{2}{3} \text{ bits}
\end{aligned}
\]
\[
\therefore H(Y|X) = \frac{2}{3} \text{ bits}
\]

(c) H(X,Y)
\[
\begin{aligned}
H(X,Y) &= -\sum_{x=0}^{1}\sum_{y=0}^{1} p(x, y)\log_2 p(x, y) \\
       &= -3 \cdot \frac{1}{3}\log_2\frac{1}{3} \\
       &= \log_2 3 \approx 1.585 \text{ bits}
\end{aligned}
\]

(d) H(Y) − H(Y|X)
\[
H(Y) - H(Y|X) = \left( \log_2 3 - \frac{2}{3} \right) - \frac{2}{3} = \log_2 3 - \frac{4}{3} \approx 0.2516 \text{ bits}
\]
This is exactly the mutual information $I(X;Y)$ computed in part (e).

(e) I(X;Y)
\[
\begin{aligned}
I(X;Y) &= \sum_{x,y} p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)} \\
       &= \frac{1}{3}\log_2\frac{1/3}{(2/3)(1/3)} + \frac{1}{3}\log_2\frac{1/3}{(2/3)(2/3)} + \frac{1}{3}\log_2\frac{1/3}{(1/3)(2/3)} \\
       &= \frac{1}{3}\log_2\frac{3}{2} + \frac{1}{3}\log_2\frac{3}{4} + \frac{1}{3}\log_2\frac{3}{2} \\
       &= \frac{1}{3}\log_2\frac{27}{16} \approx 0.2516 \text{ bits}
\end{aligned}
\]
\[
\therefore I(X;Y) \approx 0.2516 \text{ bits}
\]
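Parts (a)-(e) can also be checked numerically. The sketch below assumes the joint table $p(0,0) = p(0,1) = p(1,1) = \frac{1}{3}$, $p(1,0) = 0$, which is the table consistent with the marginals and conditional entropies used above:

```python
import math

# Assumed joint table, consistent with p(x) = (2/3, 1/3) and p(y) = (1/3, 2/3).
p = {(0, 0): 1/3, (0, 1): 1/3, (1, 0): 0.0, (1, 1): 1/3}

def entropy(probs):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

p_x = [sum(p[(x, y)] for y in (0, 1)) for x in (0, 1)]  # (2/3, 1/3)
p_y = [sum(p[(x, y)] for x in (0, 1)) for y in (0, 1)]  # (1/3, 2/3)

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p.values())
print(H_X, H_Y)            # (a) 0.918..., 0.918...
print(H_XY - H_Y)          # (b) H(X|Y) = 0.666...
print(H_XY - H_X)          # (b) H(Y|X) = 0.666...
print(H_XY)                # (c) 1.58496... (= log2 3)
print(H_Y - (H_XY - H_X))  # (d), (e) I(X;Y) = 0.25163...
```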

(f) Venn diagrams
