
INFORMATION THEORY & CODING

Week 10: Differential Entropy

Dr. Rui Wang

Department of Electrical and Electronic Engineering


Southern University of Science and Technology (SUSTech)

Email: wang.r@sustech.edu.cn

November 16, 2020

Differential Entropy - 1

Definitions

AEP for Continuous Random Variables

Relation of differential entropy to discrete entropy

From discrete to continuous variables

Differential Entropy

Definition
Let X be a random variable with cumulative distribution function (CDF)
F(x) = Pr(X ≤ x). If F(x) is continuous, the random variable X is said to be
continuous. Let f(x) = F'(x) when the derivative is defined. If
\int_{-\infty}^{+\infty} f(x)\, dx = 1, then f(x) is called the probability density
function (pdf) of X. The set of x where f(x) > 0 is called the support set of X.

Definition
The differential entropy h(X) of a continuous random variable X with
density f(x) is defined as

h(X) = -\int_S f(x) \log f(x)\, dx = h(f),

where S is the support set of the random variable.

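As a quick numerical sanity check (not part of the slides), the definition can be approximated directly. The sketch below is a minimal Riemann-sum estimate of h(f) in bits, assuming NumPy and a density supplied as a Python function on a finite interval:

import numpy as np

def differential_entropy_bits(pdf, lo, hi, num_points=100_000):
    # Riemann-sum approximation of h(f) = -integral over S of f(x) log2 f(x) dx
    x = np.linspace(lo, hi, num_points)
    dx = x[1] - x[0]
    f = pdf(x)
    mask = f > 0                    # restrict the sum to the support set S
    return -np.sum(f[mask] * np.log2(f[mask])) * dx

# Illustrative check against the uniform example on the next slide: h(X) = log2(a)
a = 2.0
print(differential_entropy_bits(lambda x: np.full_like(x, 1.0 / a), 0.0, a))  # ~1.0 bit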
Example: Uniform distribution

f(x) = \frac{1}{a}, \quad x \in [0, a]

The differential entropy is:


h(X) = -\int_0^a \frac{1}{a} \log \frac{1}{a}\, dx = \log a \text{ bits}

For a < 1, h(X) = log a < 0: differential entropy can be negative (unlike discrete entropy)!

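A concrete instance (not on the slide): for a = 1/2, the density is f(x) = 2 on [0, 1/2], so

h(X) = \log_2 \tfrac{1}{2} = -1 \text{ bit},

which is negative because f(x) = 2 > 1 on the whole support, so -\log f(x) < 0 pointwise.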
Example: Normal distribution

X \sim \phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right), \quad x \in \mathbb{R}

Differential entropy:

h(\phi) = \frac{1}{2} \log 2\pi e \sigma^2 \text{ bits}

Calculation:

h(\phi) = -\int \phi \log \phi\, dx
        = -\int \phi(x) \left[ -\frac{x^2}{2\sigma^2} \log e - \log \sqrt{2\pi\sigma^2} \right] dx
        = \frac{E(X^2)}{2\sigma^2} \log e + \frac{1}{2} \log 2\pi\sigma^2
        = \frac{1}{2} \log e + \frac{1}{2} \log 2\pi\sigma^2
        = \frac{1}{2} \log 2\pi e \sigma^2

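As a hedged numerical cross-check (again assuming NumPy; not slide material), a Riemann-sum estimate of h(\phi) agrees with the closed form \frac{1}{2}\log_2 2\pi e\sigma^2:

import numpy as np

sigma = 1.5
phi = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

x = np.linspace(-10 * sigma, 10 * sigma, 200_000)   # wide truncation of the real line
dx = x[1] - x[0]
f = phi(x)
h_numeric = -np.sum(f * np.log2(f)) * dx

h_closed_form = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(h_numeric, h_closed_form)                     # both ~2.63 bits for sigma = 1.5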
AEP for continuous random variables

Discrete world: for a sequence of i.i.d. random variables


-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X).
Continuous world: for a sequence of i.i.d. random variables
-\frac{1}{n} \log f(X_1, X_2, \ldots, X_n) \to E[-\log f(X)] = h(X) \quad \text{in probability.}
Proof follows from the weak law of large numbers.

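A small simulation (an illustrative sketch assuming NumPy; the Gaussian source and sample sizes are arbitrary choices, not slide material) shows the continuous AEP in action: the normalized log-likelihood -\frac{1}{n}\log_2 f(X_1,\ldots,X_n) concentrates around h(X) as n grows.

import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
h_true = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)     # ~2.05 bits

def log2_density(x):
    # log2 of the N(0, sigma^2) density, evaluated elementwise
    return -x**2 / (2 * sigma**2) * np.log2(np.e) - 0.5 * np.log2(2 * np.pi * sigma**2)

for n in [10, 100, 10_000, 1_000_000]:
    samples = rng.normal(0.0, sigma, size=n)
    estimate = -np.mean(log2_density(samples))          # = -(1/n) log2 f(X_1, ..., X_n)
    print(n, estimate)                                   # approaches h_true as n grows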
Typical set
Discrete case: number of typical sequences

\left| A_\epsilon^{(n)} \right| \approx 2^{nH(X)}

Continuous case: The volume of the typical set


\mathrm{Vol}(A) = \int_A dx_1\, dx_2 \ldots dx_n, \quad A \subset \mathbb{R}^n.

Definition
For \epsilon > 0 and any n, we define the typical set A_\epsilon^{(n)} with respect to f(x) as follows:

A_\epsilon^{(n)} = \left\{ (x_1, x_2, \ldots, x_n) \in S^n : \left| -\frac{1}{n} \log f(x_1, x_2, \ldots, x_n) - h(X) \right| \le \epsilon \right\},

where f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n f(x_i).

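A hedged Monte Carlo sketch (assuming NumPy; the exponential source with rate \lambda = 1, for which h(X) = \log_2(e/\lambda) bits, is an illustrative choice) checks membership in A_\epsilon^{(n)} for many independent length-n blocks and thereby estimates \Pr(A_\epsilon^{(n)}):

import numpy as np

rng = np.random.default_rng(1)
lam, eps = 1.0, 0.05
h = np.log2(np.e / lam)                          # differential entropy of Exp(lam), in bits

def in_typical_set(block):
    # membership test: | -(1/n) sum_i log2 f(x_i) - h(X) | <= eps
    log2_f = np.log2(lam) - lam * block * np.log2(np.e)
    return abs(-np.mean(log2_f) - h) <= eps

for n in [10, 100, 1000, 10_000]:
    blocks = rng.exponential(1.0 / lam, size=(2000, n))
    prob = np.mean([in_typical_set(b) for b in blocks])
    print(n, prob)                                # estimated Pr(A_eps^(n)) climbs toward 1

This anticipates property 1 of the theorem on the next slide.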
Typical set

Theorem
The typical set A_\epsilon^{(n)} has the following properties:
1. Pr(A_\epsilon^{(n)}) > 1 - \epsilon for n sufficiently large.
2. Vol(A_\epsilon^{(n)}) \le 2^{n(h(X)+\epsilon)} for all n.
3. Vol(A_\epsilon^{(n)}) \ge (1-\epsilon)\, 2^{n(h(X)-\epsilon)} for n sufficiently large.

Proof. 1.
Similar to the discrete case. By definition,

-\frac{1}{n} \log f(X^n) = -\frac{1}{n} \sum_{i=1}^n \log f(X_i) \to h(X) \quad \text{in probability.}

Typical set

Theorem
The typical set A_\epsilon^{(n)} has the following properties:
2. Vol(A_\epsilon^{(n)}) \le 2^{n(h(X)+\epsilon)} for all n.

Proof. 2.
1 = \int_{S^n} f(x_1, x_2, \ldots, x_n)\, dx_1 dx_2 \ldots dx_n
  \ge \int_{A_\epsilon^{(n)}} f(x_1, x_2, \ldots, x_n)\, dx_1 dx_2 \ldots dx_n
  \ge \int_{A_\epsilon^{(n)}} 2^{-n(h(X)+\epsilon)}\, dx_1 dx_2 \ldots dx_n
  = 2^{-n(h(X)+\epsilon)} \int_{A_\epsilon^{(n)}} dx_1 dx_2 \ldots dx_n
  = 2^{-n(h(X)+\epsilon)}\, \mathrm{Vol}(A_\epsilon^{(n)}).

Typical set

Theorem
The typical set A_\epsilon^{(n)} has the following properties:
3. Vol(A_\epsilon^{(n)}) \ge (1-\epsilon)\, 2^{n(h(X)-\epsilon)} for n sufficiently large.

Proof. 3.
1 - \epsilon \le \int_{A_\epsilon^{(n)}} f(x_1, x_2, \ldots, x_n)\, dx_1 dx_2 \ldots dx_n
  \le \int_{A_\epsilon^{(n)}} 2^{-n(h(X)-\epsilon)}\, dx_1 dx_2 \ldots dx_n
  = 2^{-n(h(X)-\epsilon)} \int_{A_\epsilon^{(n)}} dx_1 dx_2 \ldots dx_n
  = 2^{-n(h(X)-\epsilon)}\, \mathrm{Vol}(A_\epsilon^{(n)}).

An interpretation

The volume of the smallest set that contains most of the probability is approximately 2^{nh(X)}.

For an n-dimensional volume, this means that each dimension has side length \left(2^{nh(X)}\right)^{1/n} = 2^{h(X)}.

Mean value theorem (MVT)
If a function f is continuous on the closed interval [a, b], and differentiable
on (a, b), then there exists a point c ∈ (a, b) such that

f'(c) = \frac{f(b) - f(a)}{b - a}

Relation of differential entropy to discrete entropy

Consider a random variable X with pdf f(x). We divide the range of X into bins of length ∆.

MVT: there exists a value x_i \in (i\Delta, (i+1)\Delta) within each bin such that

f(x_i)\,\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\, dx.

Relation of differential entropy to discrete entropy

Define the quantized random variable as X^\Delta = x_i if i\Delta \le X < (i+1)\Delta, with pmf

p_i = \Pr[X^\Delta = x_i] = \int_{i\Delta}^{(i+1)\Delta} f(x)\, dx = f(x_i)\,\Delta.

The entropy of X^\Delta is

H(X^\Delta) = -\sum_{i=-\infty}^{+\infty} p_i \log p_i = -\sum_i \Delta f(x_i) \log f(x_i) - \log \Delta.

If f(x) is Riemann integrable, then as ∆ → 0,

H(X^\Delta) + \log \Delta \to h(f) = h(X).

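A hedged numerical sketch (assuming NumPy; the Gaussian source, the midpoint approximation of each p_i, and the chosen bin widths are illustrative, not slide material) showing H(X^∆) + \log_2 ∆ approaching h(X) as ∆ shrinks:

import numpy as np

sigma = 1.0
h_true = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)     # ~2.05 bits
phi = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

for delta in [1.0, 0.1, 0.01, 0.001]:
    edges = np.arange(-10 * sigma, 10 * sigma, delta)
    # p_i = integral of f over bin i, approximated here by f(bin midpoint) * delta
    p = phi(edges + delta / 2) * delta
    p = p[p > 0] / p.sum()                              # renormalize the truncated pmf
    H_quantized = -np.sum(p * np.log2(p))
    print(delta, H_quantized + np.log2(delta))           # approaches h_true as delta shrinks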
Reading & Homework

Reading: Chapter 8: 8.1 - 8.3


Homework: Problems 8.1, 8.5, 8.7

