
EE2340/EE5847: Information Sciences/Information Theory 2021

Homework 3: Properties of information measures


Instructor: Shashank Vatedka

Instructions: You are encouraged to discuss and collaborate with your classmates. However, you must
explicitly mention at the top of your submission who you collaborated with, and all external resources (websites, books) you used, if any. Copying is NOT permitted, and solutions must be written independently and
in your own words.
Please scan a copy of your handwritten assignment as pdf with filename <your ID> HW<homework no>.pdf.
Example: EE19BTECH00000 HW1.pdf.
For programming questions, create separate files. Please use the naming convention <your ID> HW<homework no> problem<problem no>.*. Example: EE19BTECH00000 HW1 problem1.c. You may upload c, cpp, py or m files only. No other format will be allowed.
Finally, upload your submission as a single zip file which includes all your programs and the pdf file. The zip file should have filename <your ID> HW<homework no>.zip. Example: EE19BTECH00000 HW1.zip.

Exercise 3.1. Prove that for every discrete random variable X and every function g(X),

1. H(X, g(X)) = H(X)

2. H(X, g(X)) ≥ H(g(X))
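
As a quick sanity check before writing the proof, the following short Python sketch (not part of the required submission) evaluates all three entropies directly from the definition for one arbitrary pmf and one arbitrary non-injective g; both are placeholders chosen purely for illustration.

import numpy as np

def entropy(p):
    # Shannon entropy in bits of a probability vector (zero entries are ignored).
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary pmf of X on {0, 1, 2, 3} and an arbitrary non-injective function g.
p_X = np.array([0.1, 0.2, 0.3, 0.4])
g = lambda x: x % 2

# Joint pmf of (X, g(X)): all probability mass sits on the pairs (x, g(x)).
joint, p_gX = {}, {}
for x, px in zip(range(4), p_X):
    joint[(x, g(x))] = joint.get((x, g(x)), 0.0) + px
    p_gX[g(x)] = p_gX.get(g(x), 0.0) + px

print(entropy(p_X))                      # H(X)
print(entropy(list(joint.values())))     # H(X, g(X)): should equal H(X)
print(entropy(list(p_gX.values())))      # H(g(X)): should be no larger than H(X, g(X))
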
Exercise 3.2. Give separate examples of jointly distributed random variables X, Y, Z such that

1. I(X; Y | Z) < I(X; Y)

2. I(X; Y | Z) > I(X; Y)

This shows that, unlike for entropy, conditioning does not always decrease (or increase) mutual information.
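
If you want to verify a candidate example numerically before writing it up, the sketch below computes both quantities from a joint pmf stored as a 3-dimensional array p[x, y, z]; the random pmf at the end is only a placeholder, to be replaced by your own construction.

import numpy as np

def H(p):
    # Entropy in bits of a pmf array of any shape.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I_XY(pxy):
    # I(X; Y) = H(X) + H(Y) - H(X, Y) for a 2-D joint pmf.
    return H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy)

def I_XY_given_Z(pxyz):
    # I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z) for a 3-D joint pmf.
    return (H(pxyz.sum(axis=1)) + H(pxyz.sum(axis=0))
            - H(pxyz) - H(pxyz.sum(axis=(0, 1))))

# Placeholder joint pmf on binary X, Y, Z; substitute your own example here.
p = np.random.rand(2, 2, 2)
p /= p.sum()
print(I_XY(p.sum(axis=2)), I_XY_given_Z(p))
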
Exercise 3.3. Let X_1, X_2, ..., X_n be n discrete random variables that are jointly distributed according to some arbitrary distribution p_{X_1, X_2, ..., X_n}.
For any subset S = {i_1, i_2, ..., i_k} ⊂ {1, 2, 3, ..., n}, let us define

f(S) = H(X_{i_1}, X_{i_2}, ..., X_{i_k}).

For example, if S = {2, 5, 9}, then f(S) = H(X_2, X_5, X_9).


Prove that for any subsets S_1, S_2 of {1, 2, ..., n},

f(S_1 ∪ S_2) + f(S_1 ∩ S_2) ≤ f(S_1) + f(S_2).

For example, if S_1 = {1, 3} and S_2 = {2, 3, 4}, then the above inequality says that

H(X_1, X_2, X_3, X_4) + H(X_3) ≤ H(X_1, X_3) + H(X_2, X_3, X_4).
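
This inequality (the submodularity of entropy) is easy to test numerically. Here is a minimal Python sketch that draws a random joint pmf of four binary random variables and checks the displayed instance of the inequality; the random pmf is an arbitrary choice for illustration only.

import numpy as np

def H(p):
    # Entropy in bits of a pmf array of any shape.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Random joint pmf of (X1, X2, X3, X4); axis i corresponds to X_{i+1}.
p = np.random.rand(2, 2, 2, 2)
p /= p.sum()

lhs = H(p) + H(p.sum(axis=(0, 1, 3)))            # H(X1, X2, X3, X4) + H(X3)
rhs = H(p.sum(axis=(1, 3))) + H(p.sum(axis=0))   # H(X1, X3) + H(X2, X3, X4)
print(lhs <= rhs + 1e-12)                        # should print True on every run
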

Exercise 3.4. Let X, Y be jointly Gaussian with mean zero, variance 1, and correlation coefficient ρ. Find the mutual information I(X; Y).


Exercise 3.5. Let X, Y be jointly distributed random variables. Prove that a function f(X) is a sufficient statistic for Y given X if and only if there exist functions g, h such that

p_{X,Y}(x, y) = g(y, f(x)) h(x).

Using this (or otherwise), show the following:

1. If M is a real-valued random variable, and X_1, X_2, ..., X_n are iid Gaussian with mean M and variance 1, then (1/n) Σ_{i=1}^n X_i is a sufficient statistic for M given X_1, X_2, ..., X_n.

2. Let α be a random variable distributed over the interval [0, 1], and X_1, ..., X_n be iid Bernoulli(α). Then Σ_{i=1}^n X_i is a sufficient statistic for α given X_1, ..., X_n.

3. Let α be a real-valued random variable and U_1, ..., U_n be iid uniform over the interval [0, α]. Prove that max_i U_i is a sufficient statistic for α given U_1, ..., U_n.

The above are all natural choices of statistics for the respective parameters that we are interested in.
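
For intuition on item 1, the sketch below (assuming the iid N(M, 1) model of that item) compares the likelihoods of two different datasets that happen to have the same sample mean: the log-likelihood ratio does not depend on the mean m, which is exactly the behaviour the factorization criterion captures. The specific numbers are arbitrary.

import numpy as np

def log_likelihood(x, m):
    # Log of the joint density of iid N(m, 1) samples x_1, ..., x_n.
    x = np.asarray(x, dtype=float)
    return -0.5 * np.sum((x - m) ** 2) - 0.5 * len(x) * np.log(2 * np.pi)

# Two different datasets with the same sample mean (equal to 2.0 here).
x_a = np.array([1.0, 2.0, 3.0])
x_b = np.array([0.5, 2.5, 3.0])

# If the sample mean is sufficient for M, this log-ratio is constant in m.
for m in [-1.0, 0.0, 0.7, 3.0]:
    print(m, log_likelihood(x_a, m) - log_likelihood(x_b, m))
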
Exercise 3.6. Frequently in statistics and machine learning, we want to know how “different” two distributions are. As we've seen in class, the KL divergence is one such measure. More generally, given any real-valued convex function f which satisfies f(1) = 0, the f-divergence between two distributions p_X and q_X is defined as

D_f(p_X, q_X) = E_q[ f( p_X(X) / q_X(X) ) ] = Σ_{x∈X} q_X(x) f( p_X(x) / q_X(x) ).

1. For what choice of f do you get D_f to be the KL divergence?


2. The total variation distance is defined as

D_TV(p_X, q_X) = (1/2) Σ_{x∈X} |p_X(x) − q_X(x)|.

For what choice of f do you get this?


3. The Jensen-Shannon divergence is defined as

D_JS(p_X, q_X) = D(p_X ‖ (p_X + q_X)/2) + D(q_X ‖ (p_X + q_X)/2).

For what choice of f do you get this?

In each case, show that the corresponding f is convex.


Also show that D_f(p, q) is a convex function of the pair (p, q).
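
To experiment with candidate choices of f numerically, the sketch below evaluates D_f(p, q) directly from the definition for any supplied f with f(1) = 0, alongside the total variation distance; the two pmfs are arbitrary placeholders, and the demo uses f(t) = (t − 1)² (the chi-squared divergence), which is not one of the answers to parts 1-3.

import numpy as np

def f_divergence(p, q, f):
    # D_f(p, q) = sum_x q(x) f(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = q > 0
    return np.sum(q[mask] * f(p[mask] / q[mask]))

def total_variation(p, q):
    # D_TV(p, q) = (1/2) sum_x |p(x) - q(x)|.
    return 0.5 * np.sum(np.abs(np.asarray(p) - np.asarray(q)))

# Two arbitrary pmfs on a four-letter alphabet.
p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.25, 0.25, 0.25, 0.25])

print(f_divergence(p, q, lambda t: (t - 1) ** 2))   # chi-squared divergence, for illustration
print(total_variation(p, q))                        # compare against your candidate f for part 2
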
Exercise 3.7. Let P_θ be a family of distributions parameterized by the variable θ ∈ Θ, for some open interval Θ. For example, we could have Θ = (0, 1), with P_θ the Gaussian pdf with mean zero and variance θ. Or, P_θ could be the Bernoulli distribution with mean θ. Let us suppose that for each x, P_θ(x) is continuously differentiable over Θ. The following quantity is called the Fisher information:
I(θ) = Σ_{x∈X} (1/P_θ(x)) (∂P_θ(x)/∂θ)²

Show that for every θ ∈ Θ,

lim_{θ'→θ} D(P_{θ'} ‖ P_θ) / (θ' − θ)² = I(θ) / ln 4.

The Fisher information arises as a fundamental lower bound in estimation problems. Suppose that we want to estimate θ from a sample X distributed according to P_θ. Then, for every function (estimator) f satisfying E f(X) = θ, the mean squared error E(f(X) − θ)² is lower bounded as follows:

E(f(X) − θ)² = Var(f(X)) ≥ 1/I(θ).

In the above, the expectation is taken over P_θ. For this, you might need the Cauchy-Schwarz inequality (look it up online if you haven't seen it already).
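
A quick numerical check of the limit is possible for a simple family, say Bernoulli(θ); the sketch below (an illustration, not a proof) computes I(θ) by finite differences from the definition above, with KL divergence measured in bits, and shows the ratio D(P_{θ'} ‖ P_θ)/(θ' − θ)² approaching I(θ)/ln 4 as θ' → θ.

import numpy as np

def bernoulli_pmf(theta):
    # pmf of Bernoulli(theta) written as the vector [P(0), P(1)].
    return np.array([1.0 - theta, theta])

def kl_bits(p, q):
    # KL divergence D(p || q) in bits between two pmfs on the same alphabet.
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def fisher_information(theta, eps=1e-6):
    # I(theta) = sum_x (dP_theta(x)/dtheta)^2 / P_theta(x), via a central difference.
    p = bernoulli_pmf(theta)
    dp = (bernoulli_pmf(theta + eps) - bernoulli_pmf(theta - eps)) / (2 * eps)
    return np.sum(dp ** 2 / p)

theta = 0.3
target = fisher_information(theta) / np.log(4)      # I(theta) / ln 4

for delta in [1e-1, 1e-2, 1e-3, 1e-4]:
    ratio = kl_bits(bernoulli_pmf(theta + delta), bernoulli_pmf(theta)) / delta ** 2
    print(delta, ratio, target)                     # ratio should approach target
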
