Homework 3: Properties of Information Measures: $X_1, X_2, \ldots, X_N$
Instructions: You are encouraged to discuss and collaborate with your classmates. However, you must
explicitly mention at the top of your submission who you collaborated with, and all external resources (websites, books) you used, if any. Copying is NOT permitted, and solutions must be written independently and
in your own words.
Please scan a copy of your handwritten assignment as a pdf with filename <your ID> HW<homework no>.pdf.
Example: EE19BTECH00000 HW1.pdf.
For programming questions, create separate files. Please use the naming convention <your ID> HW<homework no> problem<problem no>.*. Example: EE19BTECH00000 HW1 problem1.c. You may upload c, cpp, py, or m files only. No other format will be allowed.
Finally, upload your submission as a single zip file which includes all your programs and the pdf file. The zip file should have filename <your ID> HW<homework no>.zip. Example: EE19BTECH00000 HW1.zip.
Exercise 3.1. Prove that for every discrete random variable $X$ and every function $g(X)$,
This shows that unlike entropy, conditioning does not always decrease (or increase) mutual information.
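To make this remark concrete, here is a minimal numerical sketch (my own addition, not part of the assignment; the two toy joint distributions and all variable names are my choices). It evaluates $I(X;Y)$ and $I(X;Y\mid Z)$ for one joint pmf where conditioning increases mutual information ($Z = X \oplus Y$) and one where it decreases it ($Y = Z = X$).

```python
# Sketch: conditioning can either increase or decrease mutual information.
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_info(pxy):
    """I(X;Y) in bits from a joint pmf given as a 2-D array."""
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

def cond_mutual_info(pxyz):
    """I(X;Y|Z) in bits from a joint pmf given as a 3-D array indexed (x, y, z)."""
    total = 0.0
    for z in range(pxyz.shape[2]):
        pz = pxyz[:, :, z].sum()
        if pz > 0:
            total += pz * mutual_info(pxyz[:, :, z] / pz)
    return total

# Example 1: X, Y iid Bernoulli(1/2), Z = X xor Y  ->  I(X;Y) = 0 but I(X;Y|Z) = 1.
p1 = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        p1[x, y, x ^ y] = 0.25

# Example 2: X uniform, Y = Z = X  ->  I(X;Y) = 1 but I(X;Y|Z) = 0.
p2 = np.zeros((2, 2, 2))
for x in range(2):
    p2[x, x, x] = 0.5

for p in (p1, p2):
    print(mutual_info(p.sum(axis=2)), cond_mutual_info(p))
```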
Exercise 3.3. Let $X_1, X_2, \ldots, X_n$ be $n$ discrete random variables that are jointly distributed according to some arbitrary distribution $p_{X_1, X_2, \ldots, X_n}$.
For any subset $S = \{i_1, i_2, \ldots, i_k\} \subset \{1, 2, 3, \ldots, n\}$, let us define
For example, if $S_1 = \{1, 3\}$ and $S_2 = \{2, 3, 4\}$, then the above inequality says that
Exercise 3.4. Let $X, Y$ be jointly Gaussian with mean zero, variance 1, and correlation coefficient $\rho$. Find the mutual information $I(X; Y)$.
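If you want to sanity-check your answer to Exercise 3.4, the following is a rough sketch (my own addition, not part of the assignment). It draws correlated Gaussian samples and forms a crude histogram-based estimate of $I(X;Y)$ in nats; the bin count, sample size, and value of $\rho$ are arbitrary choices, and the estimate is only approximate.

```python
# Crude Monte Carlo / histogram estimate of I(X;Y) for jointly Gaussian X, Y.
import numpy as np

def mi_gaussian_estimate(rho, n_samples=500_000, bins=60, rng=np.random.default_rng(0)):
    cov = [[1.0, rho], [rho, 1.0]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples)
    joint, xedges, yedges = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    # I(X;Y) = sum_{x,y} p(x,y) log( p(x,y) / (p(x) p(y)) ), in nats
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

print(mi_gaussian_estimate(0.9))  # compare against your closed-form answer
```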
Exercise 3.5. Let $X, Y$ be jointly distributed random variables. Prove that a function $f(X)$ is a sufficient statistic for $Y$ given $X$ if and only if there exist functions $g, h$ such that
1. If $M$ is a real-valued random variable, and $X_1, X_2, \ldots, X_n$ are iid Gaussian with mean $M$ and variance 1, then $\frac{1}{n}\sum_{i=1}^{n} X_i$ is a sufficient statistic for $M$ given $X_1, X_2, \ldots, X_n$.
2. Let $\alpha$ be a random variable distributed over the interval $[0, 1]$, and $X_1, \ldots, X_n$ be iid Bernoulli($\alpha$). Then, $\sum_{i=1}^{n} X_i$ is a sufficient statistic for $\alpha$ given $X_1, \ldots, X_n$.
3. Let $\alpha$ be a real-valued random variable and $U_1, \ldots, U_n$ be iid uniform over the interval $[0, \alpha]$. Prove that $\max_i U_i$ is a sufficient statistic for $\alpha$ given $U_1, \ldots, U_n$.
The above are all natural choices of statistics for the respective parameters that we are interested in.
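As an optional illustration of part 3 above (my own sketch, not part of the assignment): for $U_1, \ldots, U_n$ iid Uniform$[0, \alpha]$, the likelihood of a sample is $\alpha^{-n}\,\mathbf{1}\{\max_i u_i \le \alpha\}$, so as a function of $\alpha$ it depends on the data only through $\max_i u_i$. The two toy data sets below are arbitrary; they share the same maximum and therefore produce identical likelihoods.

```python
# The likelihood of iid Uniform[0, alpha] data depends only on the sample maximum.
import numpy as np

def uniform_likelihood(alpha, u):
    """Joint density of the sample u under Uniform[0, alpha]: alpha^(-n) if max(u) <= alpha, else 0."""
    u = np.asarray(u, dtype=float)
    return (alpha ** -len(u)) * float(u.max() <= alpha)

sample_a = [0.2, 0.7, 0.9, 0.1]   # max = 0.9
sample_b = [0.9, 0.5, 0.33, 0.6]  # same max, otherwise different values

for alpha in [0.8, 1.0, 1.5, 2.0]:
    print(alpha, uniform_likelihood(alpha, sample_a), uniform_likelihood(alpha, sample_b))
# The two likelihood columns agree for every alpha: the data enters only through its maximum.
```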
Exercise 3.6. Frequently in statistics and machine learning, we want to know how "different" two distributions are. As we've seen in class, the KL divergence is one such measure. More generally, given any real-valued convex function $f$ which satisfies $f(1) = 0$, the $f$-divergence between two distributions $p_X$ and $q_X$ is defined as
$$D_f(p_X, q_X) = \mathbb{E}_q\!\left[ f\!\left(\frac{p_X(X)}{q_X(X)}\right) \right] = \sum_{x \in \mathcal{X}} q_X(x)\, f\!\left(\frac{p_X(x)}{q_X(x)}\right).$$
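A minimal sketch of this definition in code (my own addition): it evaluates $D_f(p_X, q_X)$ for pmfs given as arrays, assuming $q_X(x) > 0$ wherever $p_X(x) > 0$, and instantiates $f(t) = t\log t$ (which recovers the KL divergence in nats) and $f(t) = |t-1|/2$ (total variation).

```python
# f-divergence D_f(p, q) = sum_x q(x) f(p(x)/q(x)) for pmfs on a common alphabet.
import numpy as np

def f_divergence(p, q, f):
    """Assumes q(x) > 0 wherever p(x) > 0; terms with q(x) = 0 are dropped."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * f(p[mask] / q[mask])))

def kl_f(t):
    """f(t) = t log t, with the convention 0 log 0 = 0; gives KL(p||q) in nats."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] * np.log(t[pos])
    return out

tv_f = lambda t: 0.5 * np.abs(t - 1)  # f(t) = |t-1|/2 gives total variation distance

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(f_divergence(p, q, kl_f), f_divergence(p, q, tv_f))
```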
The Fisher information arises as a fundamental lower bound for the estimation problem. Suppose that we want to estimate $\theta$ from a sample $X$ distributed according to $P_\theta$. Then, for all functions (estimators) $f$ satisfying $\mathbb{E} f(X) = \theta$, the mean squared error $\mathbb{E}(f(X) - \theta)^2$ is lower bounded as follows:
$$\mathbb{E}(f(X) - \theta)^2 = \mathrm{Var}(f(X)) \ge \frac{1}{I(\theta)}.$$
In the above, the expectation is taken over $P_\theta$. For this, you might need the Cauchy–Schwarz inequality (look it up online if you haven't seen it already).
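As an optional numerical check of this bound (my own sketch, not part of the assignment): for $n$ iid samples from $\mathcal{N}(\theta, 1)$ the Fisher information is $I(\theta) = n$, so any unbiased estimator must have variance at least $1/n$. The sample mean meets the bound with equality, while the wasteful estimator that keeps only $X_1$ does not. The values of $\theta$, $n$, and the Monte Carlo size below are arbitrary.

```python
# Monte Carlo check of Var(f(X_1..X_n)) >= 1/I(theta) for N(theta, 1) samples, I(theta) = n.
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 2.0, 4, 200_000
x = rng.normal(theta, 1.0, size=(trials, n))    # each row is one sample X_1, ..., X_n

sample_mean = x.mean(axis=1)                    # unbiased, variance ~ 1/n (meets the bound)
first_only = x[:, 0]                            # unbiased, variance ~ 1   (looser than 1/n)

for est in (sample_mean, first_only):
    print(round(est.mean() - theta, 4), round(est.var(), 4), "bound:", 1 / n)
```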