Chapter 8: Differential Entropy
Chapter 8 outline
• Motivation
• Definitions
• Relation to discrete entropy
• Joint and conditional differential entropy
• Relative entropy and mutual information
• Properties
• AEP for Continuous Random Variables
Motivation

[Figure: a wireless channel with fading, Y = hX + N, with the transmitted and received waveforms shown over time]

C = (1/2) log( (|h|² P + P_N) / P_N ) = (1/2) log(1 + SNR)   (bits/channel use)
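As a quick sanity check on the capacity expression above, here is a minimal sketch (the function name and the example powers are illustrative choices, not from the slides):

import math

def awgn_capacity(h, P, PN):
    """Capacity (bits/channel use) of Y = hX + N with signal power P and noise power PN."""
    snr = (abs(h) ** 2) * P / PN
    return 0.5 * math.log2(1 + snr)

# Example: unit-power signal, fading gain h = 0.5, noise power 0.1
print(awgn_capacity(h=0.5, P=1.0, PN=0.1))   # ~0.90 bits/channel use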
Definitions - densities
• Discrete X, p(x):
• Continuous X, f(x):

Properties - densities
[Figure 8.1: Quantization of a continuous random variable — the density f(x) is partitioned into bins of width ∆]
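A small sketch of what Figure 8.1 illustrates (the N(0,1) density and the bin width ∆ are my choices, not from the slides): quantizing X into bins of width ∆ gives a discrete variable whose entropy satisfies H(X^∆) + log ∆ ≈ h(X), consistent with (8.84) in the chapter summary.

import numpy as np

# Quantize a N(0,1) density into bins of width delta and compare
# H(X_delta) + log2(delta) with the closed form h(X) = 0.5*log2(2*pi*e).
delta = 0.01
edges = np.arange(-10, 10 + delta, delta)        # support truncated to [-10, 10]
centers = 0.5 * (edges[:-1] + edges[1:])
pdf = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi)
p = pdf * delta                                  # p_i ≈ f(x_i) * delta
p = p[p > 0] / p.sum()                           # renormalize the tiny truncation error
H_discrete = -np.sum(p * np.log2(p))             # entropy of the quantized variable
h_true = 0.5 * np.log2(2 * np.pi * np.e)
print(H_discrete + np.log2(delta), h_true)       # both ≈ 2.047 bits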
Differential entropy - definition

Examples

[Figure: a density f(x) supported on the interval [a, b]]
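A minimal numerical sketch of the definition (the Riemann-sum integrator, the Uniform(a, b) example, and the N(0, σ²) example are my choices, not from the slides); note that the uniform case shows differential entropy can be negative:

import numpy as np

def diff_entropy_bits(f, lo, hi, n=200_000):
    """Approximate h(X) = -∫ f(x) log2 f(x) dx on [lo, hi] by a Riemann sum."""
    x = np.linspace(lo, hi, n)
    fx = np.clip(f(x), 1e-300, None)             # avoid log(0); such terms contribute ~0
    return np.sum(-fx * np.log2(fx)) * (x[1] - x[0])

a, b = 0.0, 0.5                                  # Uniform(a, b): h = log2(b - a) = -1 bit
print(diff_entropy_bits(lambda x: np.full_like(x, 1.0 / (b - a)), a, b), np.log2(b - a))

sigma = 2.0                                      # N(0, sigma^2): h = 0.5 log2(2 pi e sigma^2)
gauss = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
print(diff_entropy_bits(gauss, -40, 40), 0.5 * np.log2(2 * np.pi * np.e * sigma**2))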
(Proof that Pr(X = X′) ≥ 2^{−H(p)−D(p||r)} when X ∼ p(x) and X′ ∼ r(x) are drawn independently.)

Proof: We have

2^{−H(p)−D(p||r)} = 2^{Σ p(x) log p(x) + Σ p(x) log (r(x)/p(x))}   (2.151)
                  = 2^{Σ p(x) log r(x)}                            (2.152)
                  ≤ Σ p(x) 2^{log r(x)}                            (2.153)
                  = Σ p(x) r(x)                                    (2.154)
                  = Pr(X = X′),                                    (2.155)

where the inequality follows from Jensen's inequality and the convexity of the function f(y) = 2^y.
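A quick numerical check of the inequality just proved (the two pmfs p and r below are arbitrary examples of mine, not from the slides):

import numpy as np

# Check Pr(X = X') >= 2^{-H(p) - D(p||r)} for X ~ p and X' ~ r drawn independently.
p = np.array([0.5, 0.3, 0.2])
r = np.array([0.2, 0.3, 0.5])

H_p = -np.sum(p * np.log2(p))
D_pr = np.sum(p * np.log2(p / r))
collision = np.sum(p * r)            # Pr(X = X') = sum_x p(x) r(x)

print(collision, 2.0 ** (-H_p - D_pr))   # ~0.29 >= ~0.27, as the bound requires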
SUMMARY

Definition: The entropy H(X) of a discrete random variable X is defined by

H(X) = − Σ_{x∈X} p(x) log p(x).   (2.156)

Properties of H
1. H(X) ≥ 0.
2. H_b(X) = (log_b a) H_a(X).
3. (Conditioning reduces entropy) For any two random variables X and Y, we have
   H(X|Y) ≤ H(X)   (2.157)
   with equality if and only if X and Y are independent.
4. H(X_1, X_2, ..., X_n) ≤ Σ_{i=1}^n H(X_i), with equality if and only if the X_i are independent.
5. H(X) ≤ log |X|, with equality if and only if X is distributed uniformly over X.
6. H(p) is concave in p.
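A small sketch (the random 4×3 joint pmf is my own example, not from the slides) that spot-checks properties 3–5 numerically:

import numpy as np

rng = np.random.default_rng(0)

def H(p):
    """Entropy in bits of a pmf given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Random joint pmf p(x, y) on a 4 x 3 alphabet.
pxy = rng.random((4, 3)); pxy /= pxy.sum()
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

H_X = H(px)
H_XY = H(pxy.ravel())
H_X_given_Y = H_XY - H(py)           # chain rule: H(X|Y) = H(X,Y) - H(Y)

print(H_X_given_Y <= H_X)            # property 3: conditioning reduces entropy
print(H_XY <= H_X + H(py))           # property 4 for n = 2: subadditivity
print(H_X <= np.log2(len(px)))       # property 5: H(X) <= log |X|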
Parallels with discrete entropy....

Alternative expressions

H(X) = E_p log (1/p(X))   (2.160)
H(X, Y) = E_p log (1/p(X, Y))   (2.161)
H(X|Y) = E_p log (1/p(X|Y))   (2.162)
I(X; Y) = E_p log (p(X, Y) / (p(X) p(Y)))   (2.163)
D(p||q) = E_p log (p(X) / q(X))   (2.164)

Definition: The relative entropy D(p||q) of the probability mass function p with respect to the probability mass function q is defined by

D(p||q) = Σ_x p(x) log (p(x)/q(x)).   (2.158)

Definition: The mutual information between two random variables X and Y is defined as

I(X; Y) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log (p(x, y) / (p(x) p(y))).   (2.159)

Properties of D and I
1. I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y).
2. D(p||q) ≥ 0 with equality if and only if p(x) = q(x) for all x ∈ X.
3. I(X; Y) = D(p(x, y)||p(x)p(y)) ≥ 0, with equality if and only if p(x, y) = p(x)p(y) (i.e., X and Y are independent).
4. If |X| = m and u is the uniform distribution over X, then D(p||u) = log m − H(p).
5. D(p||q) is convex in the pair (p, q).

Chain rules
Entropy: H(X_1, X_2, ..., X_n) = Σ_{i=1}^n H(X_i | X_{i−1}, ..., X_1).
Mutual information: I(X_1, X_2, ..., X_n; Y) = Σ_{i=1}^n I(X_i; Y | X_1, X_2, ..., X_{i−1}).
Relative entropy: D(p(x, y)||q(x, y)) = D(p(x)||q(x)) + D(p(y|x)||q(y|x)).

Jensen's inequality: If f is a convex function, then Ef(X) ≥ f(EX).

Log sum inequality: For n positive numbers a_1, a_2, ..., a_n and b_1, b_2, ..., b_n,

Σ_{i=1}^n a_i log (a_i/b_i) ≥ (Σ_{i=1}^n a_i) log ( Σ_{i=1}^n a_i / Σ_{i=1}^n b_i )   (2.165)

with equality if and only if a_i/b_i = constant.

Data-processing inequality: If X → Y → Z forms a Markov chain, then I(X; Y) ≥ I(X; Z).

Sufficient statistic: T(X) is sufficient relative to {f_θ(x)} if and only if I(θ; X) = I(θ; T(X)) for all distributions on θ.

Fano's inequality: Let P_e = Pr{X̂(Y) ≠ X}. Then

H(P_e) + P_e log |X| ≥ H(X|Y).   (2.166)

Inequality: If X and X′ are independent and identically distributed, then

Pr(X = X′) ≥ 2^{−H(X)}.   (2.167)

PROBLEMS
2.1 Coin flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips required.
(a) Find the entropy H(X) in bits. The following expressions may be useful:

Σ_{n=0}^∞ r^n = 1/(1 − r),   Σ_{n=0}^∞ n r^n = r/(1 − r)².
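A minimal sketch checking two of the identities above on an arbitrary joint pmf (the random 3×4 pmf and the function names are illustrative, not from the slides): I(X;Y) computed as H(X) + H(Y) − H(X,Y) should match D(p(x,y)||p(x)p(y)), and both should be nonnegative.

import numpy as np

rng = np.random.default_rng(1)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def D(p, q):
    """Relative entropy D(p||q) in bits (assumes q > 0 wherever p > 0)."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

pxy = rng.random((3, 4)); pxy /= pxy.sum()
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

I_via_entropies = H(px) + H(py) - H(pxy.ravel())
I_via_D = D(pxy.ravel(), np.outer(px, py).ravel())
print(I_via_entropies, I_via_D)      # the two expressions agree, and both are >= 0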
Properties
• What is I(X; Y)?

AEP for Continuous Random Variables
• The AEP for discrete RVs said.....

Theorem 3.1.1 (AEP): If X_1, X_2, ... are i.i.d. ∼ p(x), then

− (1/n) log p(X_1, X_2, ..., X_n) → H(X) in probability.   (3.2)

Proof: Functions of independent random variables are also independent random variables. Thus, since the X_i are i.i.d., so are log p(X_i). Hence, by the weak law of large numbers,

− (1/n) log p(X_1, X_2, ..., X_n) = − (1/n) Σ_i log p(X_i)   (3.3)
                                  → − E log p(X) = H(X) in probability.

• The AEP for continuous RVs says.....

Prove 2 ways!
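A minimal Monte Carlo sketch of the continuous statement (the Gaussian example, sample size, and variable names are my choices, not from the slides): for X_i i.i.d. with density f, the empirical average −(1/n) Σ log f(X_i) should concentrate around h(X).

import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
n = 100_000

x = rng.normal(0.0, sigma, size=n)
f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
empirical = -np.mean(np.log2(f))                        # -(1/n) sum log2 f(X_i)
closed_form = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(empirical, closed_form)                           # both ≈ 2.047 bits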
Estimation error and differential entropy
• A counterpart to Fano's inequality for discrete RVs...

Recall the discrete setting: we calculate a function g(Y) = X̂, where X̂ is an estimate of X and takes on values in X̂. We will not restrict the alphabet X̂ to be equal to X, and we will also allow the function g(Y) to be random. We wish to bound the probability that X̂ ≠ X. We observe that X → Y → X̂ forms a Markov chain. Define the probability of error

P_e = Pr{X̂ ≠ X}.   (2.129)

Proof: We first ignore the role of Y and prove the first inequality in (2.130). We will then use the data-processing inequality to prove the more traditional form of Fano's inequality, given by the second inequality in (2.130). Define an error random variable,

E = 1 if X̂ ≠ X,  0 if X̂ = X.   (2.133)
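The continuous counterpart bounds the mean-squared estimation error by the conditional differential entropy, E(X − X̂(Y))² ≥ (1/2πe) e^{2h(X|Y)} (it reappears in the summary below). A small sketch, assuming a scalar Gaussian source observed in Gaussian noise (my example, not the slides'), where the bound is met with equality:

import numpy as np

# X ~ N(0, sx2) observed through Y = X + Z, Z ~ N(0, sz2):
# the MMSE equals (1/2πe) e^{2 h(X|Y)}, so the bound holds with equality.
sx2, sz2 = 4.0, 1.0
mmse = sx2 * sz2 / (sx2 + sz2)                        # E(X - X̂(Y))^2 for the optimal estimator
h_X_given_Y = 0.5 * np.log(2 * np.pi * np.e * mmse)   # nats; X given Y is Gaussian with variance mmse
bound = np.exp(2 * h_X_given_Y) / (2 * np.pi * np.e)
print(mmse, bound)                                    # both 0.8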
SUMMARY

h(X) = h(f) = − ∫_S f(x) log f(x) dx   (8.81)

f(X^n) ≐ 2^{−nh(X)}   (8.82)

Vol(A_ε^(n)) ≐ 2^{nh(X)}   (8.83)

H([X]_{2^{−n}}) ≈ h(X) + n   (8.84)

h(N(0, σ²)) = (1/2) log 2πeσ²   (8.85)

h(N_n(μ, K)) = (1/2) log (2πe)^n |K|   (8.86)

D(f||g) = ∫ f log (f/g) ≥ 0   (8.87)

h(X_1, X_2, ..., X_n) = Σ_{i=1}^n h(X_i | X_1, X_2, ..., X_{i−1})   (8.88)

h(X|Y) ≤ h(X)   (8.89)

h(aX) = h(X) + log |a|   (8.90)

I(X; Y) = ∫ f(x, y) log ( f(x, y) / (f(x) f(y)) ) ≥ 0   (8.91)

max_{EXXᵗ = K} h(X) = (1/2) log (2πe)^n |K|   (8.92)

E(X − X̂(Y))² ≥ (1/2πe) e^{2h(X|Y)}

2^{nH(X)} is the effective alphabet size for a discrete random variable.
2^{nh(X)} is the effective support set size for a continuous random variable.
2^C is the effective alphabet size of a channel of capacity C.

PROBLEMS
8.1 Differential entropy. Evaluate the differential entropy h(X) = − ∫ f ln f for the following:
(a) The exponential density, f(x) = λe^{−λx}, x ≥ 0.
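As a small numerical companion to (8.92) (the Laplace comparison density is my choice, not from the slides): among densities with the same variance, the Gaussian has the largest differential entropy.

import numpy as np

# Compare closed-form differential entropies (in nats) of a Gaussian and a Laplace
# density matched to the same variance sigma^2.
sigma2 = 3.0
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)
b = np.sqrt(sigma2 / 2.0)                 # Laplace scale: variance 2 b^2 = sigma^2
h_laplace = 1.0 + np.log(2.0 * b)
print(h_gauss, h_laplace)                 # h_gauss > h_laplace, as (8.92) requires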