

COMPSCI 650 Applied Information Theory                                Jan 21, 2016

                                   Lecture 2

Instructor: Arya Mazumdar

1  Entropy

Definition: Entropy is a measure of the uncertainty of a random variable. The entropy of a discrete random variable X with alphabet \mathcal{X} is

    H(X) = - \sum_{x \in \mathcal{X}} p(x) \log p(x).

When the base of the logarithm is 2, entropy is measured in bits.

Example: One can model the temperature in a city (e.g., Amherst) as a random variable X. Then the entropy of X measures the uncertainty in the Amherst temperature. Let Y be a random variable representing the temperature in Northampton. We know that X and Y are not independent (they are usually quite similar). Hence, when Y is given, some of the uncertainty about X goes away.

Example: Consider a fair die with pmf p(1) = p(2) = \dots = p(6) = 1/6. Its entropy is

    H(X) = - \sum_{i=1}^{6} \frac{1}{6} \log \frac{1}{6} = \log 6.

Maximum entropy is achieved when all outcomes of a random variable occur with equal probability. (Note: you can prove this by assigning a variable p_i to the probability of outcome i. Then partially differentiate the entropy function with respect to each p_i, set the derivatives to zero, and solve for the p_i's. You will see that they are all equal.) In general, for M equally probable outcomes, the entropy is H(X) = \log M.

1.1  Joint Entropy

Definition: For two random variables X and Y, with x \in \mathcal{X}, y \in \mathcal{Y}, the joint entropy is defined as

    H(X, Y) = - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y),

where p(x, y) = \Pr[X = x, Y = y] is the joint pmf of X and Y.

1.2  Conditional Entropy

Definition: The conditional entropy of a random variable Y given X = x is

    H(Y \mid X = x) = - \sum_{y \in \mathcal{Y}} p(y \mid x) \log p(y \mid x).

When a particular value x is not given, we average over all possible values of X:

    H(Y \mid X) = \sum_{x \in \mathcal{X}} p(x) \Big( - \sum_{y \in \mathcal{Y}} p(y \mid x) \log p(y \mid x) \Big)
                = - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x).

The conditional entropy of X given Y is

    H(X \mid Y) = - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x \mid y).

In general, H(X \mid Y) \ne H(Y \mid X).

1.3  Chain Rule for Entropy

The Chain Rule for Entropy states that the joint entropy of two random variables is the entropy of one plus the conditional entropy of the other:

    H(X, Y) = H(X) + H(Y \mid X).                                            (1)

Proof:

    H(X, Y) = - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \big( p(x) \, p(y \mid x) \big)
            = - \sum_{x \in \mathcal{X}} \Big( \sum_{y \in \mathcal{Y}} p(x, y) \Big) \log p(x)
              - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x)
            = - \sum_{x \in \mathcal{X}} p(x) \log p(x)
              - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x)
            = H(X) + H(Y \mid X).

Similarly, it can also be shown that

    H(X, Y) = H(Y) + H(X \mid Y).                                            (2)

From (1) and (2), we see that

    H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = I(X; Y).

I(X; Y) is known as Mutual Information, which can be thought of as a measure of the reduction in uncertainty.

Example: Consider the random variables X \in \{0, 1\}, Y \in \{0, 1\}, representing two coin tosses. The joint distribution is shown in Table 1.

    p(x, y)   Y = 0   Y = 1
    X = 0      1/2     1/8
    X = 1      1/4     1/8

    Table 1: Joint distribution of two coin tosses

The joint entropy of X and Y is

    H(X, Y) = - \frac{1}{2} \log \frac{1}{2} - \frac{1}{4} \log \frac{1}{4} - 2 \cdot \frac{1}{8} \log \frac{1}{8} = 1.75 bits.

Note that if all probabilities were equal, we would have H(X, Y) = \log 4 = 2 bits, which is the maximum entropy.

The individual entropies are

    H(X) = - \frac{5}{8} \log \frac{5}{8} - \frac{3}{8} \log \frac{3}{8} \approx 0.9544 bits,
    H(Y) = - \frac{3}{4} \log \frac{3}{4} - \frac{1}{4} \log \frac{1}{4} \approx 0.8113 bits.

Question: What is the conditional entropy of X given Y?

Answer: One possible way of solving this problem is to compute the conditional distribution of X given Y for all possible values of X and Y. However, since we have already determined the joint and individual entropies, we can instead use the Chain Rule for Entropy:

    H(X, Y) = H(Y) + H(X \mid Y)
    H(X \mid Y) = 1.75 - 0.8113 = 0.9387 bits.
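As a quick numerical check of the chain-rule calculation above, here is a minimal Python sketch (not part of the original notes; the helper name `H` and the row/column layout of the joint pmf are just illustrative) that recomputes H(X), H(Y), H(X, Y), and H(X | Y) directly from Table 1.

```python
import math

def H(probs):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint pmf of Table 1: rows are X = 0, 1; columns are Y = 0, 1.
joint = [[1/2, 1/8],
         [1/4, 1/8]]

p_x = [sum(row) for row in joint]                              # marginal of X: (5/8, 3/8)
p_y = [sum(joint[x][y] for x in range(2)) for y in range(2)]   # marginal of Y: (3/4, 1/4)

H_xy = H([p for row in joint for p in row])   # joint entropy H(X, Y)
H_x, H_y = H(p_x), H(p_y)
H_x_given_y = H_xy - H_y                      # chain rule: H(X|Y) = H(X,Y) - H(Y)

print(f"H(X)   = {H_x:.4f} bits")             # ~0.9544
print(f"H(Y)   = {H_y:.4f} bits")             # ~0.8113
print(f"H(X,Y) = {H_xy:.4f} bits")            # 1.7500
print(f"H(X|Y) = {H_x_given_y:.4f} bits")     # ~0.9387
```

Running it reproduces the values above: H(X, Y) = 1.75 bits and H(X | Y) \approx 0.9387 bits.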
2  Relative Entropy

Let X be a random variable with alphabet \mathcal{X} = \{0, 1\}. Consider the following two distributions:

    p(0) = 1/4,  p(1) = 3/4,
    q(0) = 1/2,  q(1) = 1/2.

Let r be another probability distribution on \{0, 1\} with r(1) > 1/2, so that r, like p, favors the outcome X = 1.

Question: Is r closer to p or to q?

Answer: r is closer to p, because they are both biased toward the outcome X = 1.

How do we measure similarity? One way is to use relative entropy.

Relative entropy, also known as Divergence or Kullback-Leibler distance, is defined as

    D(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}.

Property 1: Relative entropy is not symmetric. In other words, it is not necessarily true that D(p \| q) = D(q \| p).

Property 2: Relative entropy is always non-negative: D(p \| q) \ge 0. Equality is achieved only if p(x) = q(x) for all x \in \mathcal{X}. This property is also known as the Information Inequality.

Property 3: Relative entropy does not satisfy the triangle inequality.

Because KL-distance does not obey the triangle inequality and is not symmetric, it is not a true metric.

Example: Consider the distributions p and q introduced earlier, with p(0) = 1/4, p(1) = 3/4, q(0) = 1/2, q(1) = 1/2. Then

    D(p \| q) = \frac{1}{4} \log \frac{1/4}{1/2} + \frac{3}{4} \log \frac{3/4}{1/2} \approx 0.1887 bits,
    D(q \| p) = \frac{1}{2} \log \frac{1/2}{1/4} + \frac{1}{2} \log \frac{1/2}{3/4} \approx 0.2075 bits.

Note that D(p \| q) \ne D(q \| p).

Proof of Property 2 (Information Inequality):

Identity: \log_2 y \le \frac{y - 1}{\log_e 2}.

Proof of identity: First, note the following fact:

    1 + x \le e^x,  \forall x \in \mathbb{R}.                                (3)

[Figure 1: the functions 1 + x and e^x.]

Taking the natural log of both sides of (3),

    \log_e(1 + x) \le x.

Using the fact that \log_e T = \log_e 2 \cdot \log_2 T (change of base formula),

    \log_e 2 \cdot \log_2(1 + x) \le x,
    \log_2(1 + x) \le \frac{x}{\log_e 2}.

Let y = x + 1; then \log_2 y \le \frac{y - 1}{\log_e 2}.

Going back to the proof of the Information Inequality,

    -D(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log_2 \frac{q(x)}{p(x)}
              \le \frac{1}{\log_e 2} \sum_{x \in \mathcal{X}} p(x) \Big( \frac{q(x)}{p(x)} - 1 \Big)
              = \frac{1}{\log_e 2} \sum_{x \in \mathcal{X}} \big( q(x) - p(x) \big)
              = 0,

hence D(p \| q) \ge 0.

Identity: Let X \in \mathcal{X} with |\mathcal{X}| = M. Then \log M - H(X) \ge 0.

Proof: As stated previously, the maximum value of entropy is \log |\mathcal{X}| = \log M, which occurs when X is uniformly distributed. We can now prove this via KL-distance. Let u denote the uniform distribution on \mathcal{X}, u(x) = 1/M. Then

    \log M - H(X) = \sum_{x \in \mathcal{X}} p(x) \log M + \sum_{x \in \mathcal{X}} p(x) \log p(x)
                  = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{1/M}
                  = D(p \| u) \ge 0.

Note that the inequality above follows from the non-negativity property of KL-distance. If the distribution of X is uniform, i.e., p(X = x) = 1/M for all x \in \mathcal{X}, then D(p \| u) = 0 and H(X) = \log M.

3  Mutual Information

Definition: Mutual information is defined by

    I(X; Y) = H(X) - H(X \mid Y)
            = H(Y) - H(Y \mid X)
            = D\big( p(x, y) \,\|\, p(x) p(y) \big).

Proof:

    D\big( p(x, y) \,\|\, p(x) p(y) \big) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}
        = \sum_{x, y} p(x, y) \log \frac{p(x) \, p(y \mid x)}{p(x) p(y)}
        = \sum_{x, y} p(x, y) \log \frac{p(y \mid x)}{p(y)}
        = - \sum_{x, y} p(x, y) \log p(y) + \sum_{x, y} p(x, y) \log p(y \mid x)
        = H(Y) - H(Y \mid X).

If X and Y are independent, I(X; Y) = 0.
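To tie Sections 2 and 3 together numerically, the sketch below (again an illustrative Python snippet, not from the notes; the helper name `kl` is assumed) evaluates D(p || q) and D(q || p) for the example distributions, and computes I(X; Y) for the Table 1 joint pmf as D(p(x, y) || p(x)p(y)).

```python
import math

def kl(p, q):
    """Relative entropy D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

# Distributions from the relative-entropy example.
p = [1/4, 3/4]
q = [1/2, 1/2]
print(f"D(p||q) = {kl(p, q):.4f} bits")   # ~0.1887
print(f"D(q||p) = {kl(q, p):.4f} bits")   # ~0.2075 -- not symmetric (Property 1)

# Mutual information for the Table 1 joint pmf, computed as D(p(x,y) || p(x)p(y)).
joint = [[1/2, 1/8],
         [1/4, 1/8]]
p_x = [sum(row) for row in joint]
p_y = [sum(joint[x][y] for x in range(2)) for y in range(2)]

joint_flat   = [joint[x][y]     for x in range(2) for y in range(2)]
product_flat = [p_x[x] * p_y[y] for x in range(2) for y in range(2)]
print(f"I(X;Y)  = {kl(joint_flat, product_flat):.4f} bits")   # ~0.0157
```

The asymmetry of the two divergences (about 0.1887 vs. 0.2075 bits) illustrates Property 1, and I(X; Y) \approx 0.0157 bits matches H(X) - H(X \mid Y) = 0.9544 - 0.9387.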
