Chapter 8: Differential Entropy
Chapter 8 outline
• Motivation
• Definitions
• Relation to discrete entropy
• Joint and conditional differential entropy
• Relative entropy and mutual information
• Properties
• AEP for Continuous Random Variables
Motivation

[Figure: a wireless channel with fading, Y = hX + N, with the transmitted and received waveforms shown over time]

C = (1/2) log( (|h|² P + P_N) / P_N ) = (1/2) log(1 + SNR)   (bits/channel use)
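As a quick sanity check on the capacity expression above, here is a minimal sketch (the function name and the example powers are illustrative choices, not from the slides):

import math

def awgn_capacity(h, P, PN):
    """Capacity (bits/channel use) of Y = hX + N with signal power P and noise power PN."""
    snr = (abs(h) ** 2) * P / PN
    return 0.5 * math.log2(1 + snr)

# Example: unit-power signal, fading gain h = 0.5, noise power 0.1
print(awgn_capacity(h=0.5, P=1.0, PN=0.1))   # ~0.90 bits/channel use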
Definitions - densities
• Discrete X, p(x):
• Continuous X, f(x):

Properties - densities
[Figure 8.1: Quantization of a continuous random variable — the density f(x) is partitioned into bins of width ∆]
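A small sketch of what Figure 8.1 illustrates (the N(0,1) density and the bin width ∆ are my choices, not from the slides): quantizing X into bins of width ∆ gives a discrete variable whose entropy satisfies H(X^∆) + log ∆ ≈ h(X), consistent with (8.84) in the chapter summary.

import numpy as np

# Quantize a N(0,1) density into bins of width delta and compare
# H(X_delta) + log2(delta) with the closed form h(X) = 0.5*log2(2*pi*e).
delta = 0.01
edges = np.arange(-10, 10 + delta, delta)        # support truncated to [-10, 10]
centers = 0.5 * (edges[:-1] + edges[1:])
pdf = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi)
p = pdf * delta                                  # p_i ≈ f(x_i) * delta
p = p[p > 0] / p.sum()                           # renormalize the tiny truncation error
H_discrete = -np.sum(p * np.log2(p))             # entropy of the quantized variable
h_true = 0.5 * np.log2(2 * np.pi * np.e)
print(H_discrete + np.log2(delta), h_true)       # both ≈ 2.047 bits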
Differential entropy - definition

Examples

[Figure: a density f(x) supported on the interval [a, b]]
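A minimal numerical sketch of the definition (the Riemann-sum integrator, the Uniform(a, b) example, and the N(0, σ²) example are my choices, not from the slides); note that the uniform case shows differential entropy can be negative:

import numpy as np

def diff_entropy_bits(f, lo, hi, n=200_000):
    """Approximate h(X) = -∫ f(x) log2 f(x) dx on [lo, hi] by a Riemann sum."""
    x = np.linspace(lo, hi, n)
    fx = np.clip(f(x), 1e-300, None)             # avoid log(0); such terms contribute ~0
    return np.sum(-fx * np.log2(fx)) * (x[1] - x[0])

a, b = 0.0, 0.5                                  # Uniform(a, b): h = log2(b - a) = -1 bit
print(diff_entropy_bits(lambda x: np.full_like(x, 1.0 / (b - a)), a, b), np.log2(b - a))

sigma = 2.0                                      # N(0, sigma^2): h = 0.5 log2(2 pi e sigma^2)
gauss = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
print(diff_entropy_bits(gauss, -40, 40), 0.5 * np.log2(2 * np.pi * np.e * sigma**2))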
(Proof that Pr(X = X′) ≥ 2^{−H(p)−D(p||r)} when X ∼ p(x) and X′ ∼ r(x) are drawn independently.)

Proof: We have

2^{−H(p)−D(p||r)} = 2^{Σ p(x) log p(x) + Σ p(x) log (r(x)/p(x))}   (2.151)
                  = 2^{Σ p(x) log r(x)}                            (2.152)
                  ≤ Σ p(x) 2^{log r(x)}                            (2.153)
                  = Σ p(x) r(x)                                    (2.154)
                  = Pr(X = X′),                                    (2.155)

where the inequality follows from Jensen's inequality and the convexity of the function f(y) = 2^y.
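A quick numerical check of the inequality just proved (the two pmfs p and r below are arbitrary examples of mine, not from the slides):

import numpy as np

# Check Pr(X = X') >= 2^{-H(p) - D(p||r)} for X ~ p and X' ~ r drawn independently.
p = np.array([0.5, 0.3, 0.2])
r = np.array([0.2, 0.3, 0.5])

H_p = -np.sum(p * np.log2(p))
D_pr = np.sum(p * np.log2(p / r))
collision = np.sum(p * r)            # Pr(X = X') = sum_x p(x) r(x)

print(collision, 2.0 ** (-H_p - D_pr))   # ~0.29 >= ~0.27, as the bound requires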
SUMMARY

Definition: The entropy H(X) of a discrete random variable X is defined by

H(X) = − Σ_{x∈X} p(x) log p(x).   (2.156)

Properties of H
1. H(X) ≥ 0.
2. H_b(X) = (log_b a) H_a(X).
3. (Conditioning reduces entropy) For any two random variables X and Y, we have
   H(X|Y) ≤ H(X)   (2.157)
   with equality if and only if X and Y are independent.
4. H(X_1, X_2, ..., X_n) ≤ Σ_{i=1}^n H(X_i), with equality if and only if the X_i are independent.
5. H(X) ≤ log |X|, with equality if and only if X is distributed uniformly over X.
6. H(p) is concave in p.
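A small sketch (the random 4×3 joint pmf is my own example, not from the slides) that spot-checks properties 3–5 numerically:

import numpy as np

rng = np.random.default_rng(0)

def H(p):
    """Entropy in bits of a pmf given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Random joint pmf p(x, y) on a 4 x 3 alphabet.
pxy = rng.random((4, 3)); pxy /= pxy.sum()
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

H_X = H(px)
H_XY = H(pxy.ravel())
H_X_given_Y = H_XY - H(py)           # chain rule: H(X|Y) = H(X,Y) - H(Y)

print(H_X_given_Y <= H_X)            # property 3: conditioning reduces entropy
print(H_XY <= H_X + H(py))           # property 4 for n = 2: subadditivity
print(H_X <= np.log2(len(px)))       # property 5: H(X) <= log |X|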
Parallels with discrete entropy....

Alternative expressions

H(X) = E_p log (1/p(X))   (2.160)
H(X, Y) = E_p log (1/p(X, Y))   (2.161)
H(X|Y) = E_p log (1/p(X|Y))   (2.162)
I(X; Y) = E_p log (p(X, Y) / (p(X) p(Y)))   (2.163)
D(p||q) = E_p log (p(X) / q(X))   (2.164)

Definition: The relative entropy D(p||q) of the probability mass function p with respect to the probability mass function q is defined by

D(p||q) = Σ_x p(x) log (p(x)/q(x)).   (2.158)

Definition: The mutual information between two random variables X and Y is defined as

I(X; Y) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log (p(x, y) / (p(x) p(y))).   (2.159)

Properties of D and I
1. I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y).
2. D(p||q) ≥ 0 with equality if and only if p(x) = q(x) for all x ∈ X.
3. I(X; Y) = D(p(x, y)||p(x)p(y)) ≥ 0, with equality if and only if p(x, y) = p(x)p(y) (i.e., X and Y are independent).
4. If |X| = m and u is the uniform distribution over X, then D(p||u) = log m − H(p).
5. D(p||q) is convex in the pair (p, q).

Chain rules
Entropy: H(X_1, X_2, ..., X_n) = Σ_{i=1}^n H(X_i | X_{i−1}, ..., X_1).
Mutual information: I(X_1, X_2, ..., X_n; Y) = Σ_{i=1}^n I(X_i; Y | X_1, X_2, ..., X_{i−1}).
Relative entropy: D(p(x, y)||q(x, y)) = D(p(x)||q(x)) + D(p(y|x)||q(y|x)).

Jensen's inequality: If f is a convex function, then Ef(X) ≥ f(EX).

Log sum inequality: For n positive numbers a_1, a_2, ..., a_n and b_1, b_2, ..., b_n,

Σ_{i=1}^n a_i log (a_i/b_i) ≥ (Σ_{i=1}^n a_i) log ( Σ_{i=1}^n a_i / Σ_{i=1}^n b_i )   (2.165)

with equality if and only if a_i/b_i = constant.

Data-processing inequality: If X → Y → Z forms a Markov chain, then I(X; Y) ≥ I(X; Z).

Sufficient statistic: T(X) is sufficient relative to {f_θ(x)} if and only if I(θ; X) = I(θ; T(X)) for all distributions on θ.

Fano's inequality: Let P_e = Pr{X̂(Y) ≠ X}. Then

H(P_e) + P_e log |X| ≥ H(X|Y).   (2.166)

Inequality: If X and X′ are independent and identically distributed, then

Pr(X = X′) ≥ 2^{−H(X)}.   (2.167)

PROBLEMS
2.1 Coin flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips required.
(a) Find the entropy H(X) in bits. The following expressions may be useful:

Σ_{n=0}^∞ r^n = 1/(1 − r),   Σ_{n=0}^∞ n r^n = r/(1 − r)².
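A minimal sketch checking two of the identities above on an arbitrary joint pmf (the random 3×4 pmf and the function names are illustrative, not from the slides): I(X;Y) computed as H(X) + H(Y) − H(X,Y) should match D(p(x,y)||p(x)p(y)), and both should be nonnegative.

import numpy as np

rng = np.random.default_rng(1)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def D(p, q):
    """Relative entropy D(p||q) in bits (assumes q > 0 wherever p > 0)."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

pxy = rng.random((3, 4)); pxy /= pxy.sum()
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

I_via_entropies = H(px) + H(py) - H(pxy.ravel())
I_via_D = D(pxy.ravel(), np.outer(px, py).ravel())
print(I_via_entropies, I_via_D)      # the two expressions agree, and both are >= 0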
Properties
• What is I(X; Y)?

AEP for Continuous Random Variables
• The AEP for discrete RVs said.....

Theorem 3.1.1 (AEP): If X_1, X_2, ... are i.i.d. ∼ p(x), then

− (1/n) log p(X_1, X_2, ..., X_n) → H(X) in probability.   (3.2)

Proof: Functions of independent random variables are also independent random variables. Thus, since the X_i are i.i.d., so are log p(X_i). Hence, by the weak law of large numbers,

− (1/n) log p(X_1, X_2, ..., X_n) = − (1/n) Σ_i log p(X_i)   (3.3)
                                  → − E log p(X) = H(X) in probability.

• The AEP for continuous RVs says.....

Prove 2 ways!
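A minimal Monte Carlo sketch of the continuous statement (the Gaussian example, sample size, and variable names are my choices, not from the slides): for X_i i.i.d. with density f, the empirical average −(1/n) Σ log f(X_i) should concentrate around h(X).

import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
n = 100_000

x = rng.normal(0.0, sigma, size=n)
f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
empirical = -np.mean(np.log2(f))                        # -(1/n) sum log2 f(X_i)
closed_form = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(empirical, closed_form)                           # both ≈ 2.047 bits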
Estimation error and differential entropy
• A counterpart to Fano's inequality for discrete RVs...

Recall the discrete setting: we calculate a function g(Y) = X̂, where X̂ is an estimate of X and takes on values in X̂. We will not restrict the alphabet X̂ to be equal to X, and we will also allow the function g(Y) to be random. We wish to bound the probability that X̂ ≠ X. We observe that X → Y → X̂ forms a Markov chain. Define the probability of error

P_e = Pr{X̂ ≠ X}.   (2.129)

Proof: We first ignore the role of Y and prove the first inequality in (2.130). We will then use the data-processing inequality to prove the more traditional form of Fano's inequality, given by the second inequality in (2.130). Define an error random variable,

E = 1 if X̂ ≠ X,  0 if X̂ = X.   (2.133)
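The continuous counterpart bounds the mean-squared estimation error by the conditional differential entropy, E(X − X̂(Y))² ≥ (1/2πe) e^{2h(X|Y)} (it reappears in the summary below). A small sketch, assuming a scalar Gaussian source observed in Gaussian noise (my example, not the slides'), where the bound is met with equality:

import numpy as np

# X ~ N(0, sx2) observed through Y = X + Z, Z ~ N(0, sz2):
# the MMSE equals (1/2πe) e^{2 h(X|Y)}, so the bound holds with equality.
sx2, sz2 = 4.0, 1.0
mmse = sx2 * sz2 / (sx2 + sz2)                        # E(X - X̂(Y))^2 for the optimal estimator
h_X_given_Y = 0.5 * np.log(2 * np.pi * np.e * mmse)   # nats; X given Y is Gaussian with variance mmse
bound = np.exp(2 * h_X_given_Y) / (2 * np.pi * np.e)
print(mmse, bound)                                    # both 0.8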
SUMMARY

h(X) = h(f) = − ∫_S f(x) log f(x) dx   (8.81)

f(X^n) ≐ 2^{−nh(X)}   (8.82)

Vol(A_ε^(n)) ≐ 2^{nh(X)}   (8.83)

H([X]_{2^{−n}}) ≈ h(X) + n   (8.84)

h(N(0, σ²)) = (1/2) log 2πeσ²   (8.85)

h(N_n(μ, K)) = (1/2) log (2πe)^n |K|   (8.86)

D(f||g) = ∫ f log (f/g) ≥ 0   (8.87)

h(X_1, X_2, ..., X_n) = Σ_{i=1}^n h(X_i | X_1, X_2, ..., X_{i−1})   (8.88)

h(X|Y) ≤ h(X)   (8.89)

h(aX) = h(X) + log |a|   (8.90)

I(X; Y) = ∫ f(x, y) log ( f(x, y) / (f(x) f(y)) ) ≥ 0   (8.91)

max_{EXXᵗ = K} h(X) = (1/2) log (2πe)^n |K|   (8.92)

E(X − X̂(Y))² ≥ (1/2πe) e^{2h(X|Y)}

2^{nH(X)} is the effective alphabet size for a discrete random variable.
2^{nh(X)} is the effective support set size for a continuous random variable.
2^C is the effective alphabet size of a channel of capacity C.

PROBLEMS
8.1 Differential entropy. Evaluate the differential entropy h(X) = − ∫ f ln f for the following:
(a) The exponential density, f(x) = λe^{−λx}, x ≥ 0.
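As a small numerical companion to (8.92) (the Laplace comparison density is my choice, not from the slides): among densities with the same variance, the Gaussian has the largest differential entropy.

import numpy as np

# Compare closed-form differential entropies (in nats) of a Gaussian and a Laplace
# density matched to the same variance sigma^2.
sigma2 = 3.0
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)
b = np.sqrt(sigma2 / 2.0)                 # Laplace scale: variance 2 b^2 = sigma^2
h_laplace = 1.0 + np.log(2.0 * b)
print(h_gauss, h_laplace)                 # h_gauss > h_laplace, as (8.92) requires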