Week 10 - Differential Entropy
Email: wang.r@sustech.edu.cn
Dr. Rui Wang (EEE) INFORMATION THEORY & CODING November 16, 2020 1 / 16
Differential Entropy - 1
Definitions
From discrete to continuous variables
Differential Entropy
Definition
Let X be a random variable with cumulative distribution function (CDF) F(x) = Pr(X ≤ x). If F(x) is continuous, the random variable is said to be continuous. Let f(x) = F′(x) when the derivative is defined. If ∫_{−∞}^{+∞} f(x) dx = 1, then f(x) is called the probability density function (pdf) of X. The set of x where f(x) > 0 is called the support set S of X.
Definition
The differential entropy h(X) of a continuous random variable X with density f(x) is defined as

    h(X) = −∫_S f(x) log f(x) dx = h(f),

where S is the support set of X.
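The definition can be checked numerically. Below is a minimal sketch (ours, not part of the slides): it approximates h(X) = −∫ f log f dx with a midpoint Riemann sum, using the Exp(λ) density as a test case. The helper name `differential_entropy` is our own.

```python
import math

def differential_entropy(f, lo, hi, n_bins=100_000):
    """Approximate h(X) = -∫ f(x) log2 f(x) dx by a midpoint Riemann sum."""
    dx = (hi - lo) / n_bins
    h = 0.0
    for i in range(n_bins):
        x = lo + (i + 0.5) * dx
        fx = f(x)
        if fx > 0:  # only the support set S contributes to the integral
            h -= fx * math.log2(fx) * dx
    return h

# Test case: Exp(lam) density f(x) = lam * exp(-lam x) on [0, ∞),
# whose exact differential entropy is log2(e / lam) bits
lam = 2.0
h_num = differential_entropy(lambda x: lam * math.exp(-lam * x), 0.0, 40.0)
h_exact = math.log2(math.e / lam)
```

Truncating the integral at 40 is harmless here because the exponential tail beyond that point is negligible.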
Dr. Rui Wang (EEE) INFORMATION THEORY & CODING November 16, 2020 4 / 16
Example: Uniform distribution
f(x) = 1/a for x ∈ [0, a].

Differential entropy:

    h(X) = −∫_0^a (1/a) log (1/a) dx = log a bits.

Note that h(X) = log a < 0 for a < 1: unlike discrete entropy, differential entropy can be negative.
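A quick sanity check (ours, not from the slides): for the uniform density, −log f(X) equals the constant log a on the whole support, so a Monte Carlo estimate of h(X) = E[−log f(X)] matches log a exactly.

```python
import math, random

a = 8.0
h_exact = math.log2(a)  # h(X) = log2 a bits for Uniform[0, a]

# Monte Carlo: h(X) = E[-log2 f(X)]; here f(X) = 1/a wherever X lands,
# so every sample contributes the same value log2 a
random.seed(0)
samples = [random.uniform(0.0, a) for _ in range(10_000)]
h_mc = sum(-math.log2(1.0 / a) for _ in samples) / len(samples)

# differential entropy can be negative: a = 1/4 gives h = -2 bits
h_negative = math.log2(0.25)
```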
Example: Normal distribution
X ∼ φ(x) = (1/√(2πσ²)) exp(−x²/(2σ²)), x ∈ ℝ.

Differential entropy:

    h(φ) = ½ log(2πeσ²) bits.

Calculation:

    h(φ) = −∫ φ log φ dx
         = −∫ φ(x) [ −(x²/(2σ²)) log e − log √(2πσ²) ] dx
         = (E[X²]/(2σ²)) log e + ½ log(2πσ²)
         = ½ log e + ½ log(2πσ²)
         = ½ log(2πeσ²).
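The closed form can be verified by Monte Carlo, since h(φ) = E[−log φ(X)]. A sketch (ours, not from the slides), with σ chosen arbitrarily:

```python
import math, random

sigma = 3.0
# closed form from the derivation above: h(φ) = ½ log2(2πeσ²) bits
h_exact = 0.5 * math.log2(2 * math.pi * math.e * sigma**2)

# Monte Carlo estimate of h(φ) = E[-log2 φ(X)] for X ~ N(0, σ²)
random.seed(0)
n = 200_000
acc = 0.0
for _ in range(n):
    x = random.gauss(0.0, sigma)
    # log2 φ(x) = -(x²/(2σ²)) log2 e - ½ log2(2πσ²), as in the expansion above
    acc -= (-x * x / (2 * sigma**2) * math.log2(math.e)
            - 0.5 * math.log2(2 * math.pi * sigma**2))
h_mc = acc / n  # should be close to h_exact
```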
AEP for continuous random variables
Typical set
Discrete case: the number of typical sequences is |A_ε^(n)| ≈ 2^{nH(X)}.

Definition
For ε > 0 and any n, we define the typical set A_ε^(n) with respect to f(x) as follows:

    A_ε^(n) = { (x_1, x_2, …, x_n) ∈ S^n : |−(1/n) log f(x_1, x_2, …, x_n) − h(X)| ≤ ε },

where f(x_1, x_2, …, x_n) = ∏_{i=1}^n f(x_i).
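To see the definition in action, here is a small sketch (ours, not from the slides): for i.i.d. samples from N(0, 1), the quantity −(1/n) log f(x_1, …, x_n) concentrates near h(X) = ½ log2(2πe) ≈ 2.05 bits, so a long random sequence is typical.

```python
import math, random

# h(X) for the standard normal density, in bits
h = 0.5 * math.log2(2 * math.pi * math.e)

def neg_log_rate(xs):
    """-(1/n) log2 f(x_1,...,x_n) for the i.i.d. N(0,1) density."""
    log2_f = sum(-x * x / 2 * math.log2(math.e) - 0.5 * math.log2(2 * math.pi)
                 for x in xs)
    return -log2_f / len(xs)

random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]
rate = neg_log_rate(xs)  # close to h, so this sequence lies in the typical set
```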
Typical set
Theorem
The typical set A_ε^(n) has the following properties:
1. Pr(A_ε^(n)) > 1 − ε for n sufficiently large.
2. Vol(A_ε^(n)) ≤ 2^{n(h(X)+ε)} for all n.
3. Vol(A_ε^(n)) ≥ (1 − ε) 2^{n(h(X)−ε)} for n sufficiently large.

Proof of 1.
Similar to the discrete case: by the weak law of large numbers,

    −(1/n) log f(X^n) = −(1/n) Σ_i log f(X_i) → h(X) in probability,

so for n large enough the probability that this average falls within ε of h(X) exceeds 1 − ε.
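Property 1 can also be seen empirically. The sketch below (ours; the parameters are chosen only for illustration) estimates Pr(A_ε^(n)) for a standard normal source at a small and a large n.

```python
import math, random

eps = 0.05
h = 0.5 * math.log2(2 * math.pi * math.e)  # h(X) for N(0, 1), in bits

def is_typical(xs):
    """True iff |-(1/n) log2 f(x^n) - h(X)| <= eps for the N(0,1) density."""
    log2_f = sum(-x * x / 2 * math.log2(math.e) - 0.5 * math.log2(2 * math.pi)
                 for x in xs)
    return abs(-log2_f / len(xs) - h) <= eps

random.seed(2)
def prob_typical(n, trials=500):
    hits = sum(is_typical([random.gauss(0.0, 1.0) for _ in range(n)])
               for _ in range(trials))
    return hits / trials

# p_small stays well below 1, while p_large is close to 1, matching property 1
p_small, p_large = prob_typical(10), prob_typical(2000)
```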
Typical set
Theorem
The typical set A_ε^(n) has the following properties:
2. Vol(A_ε^(n)) ≤ 2^{n(h(X)+ε)} for all n.

Proof of 2.

    1 = ∫_{S^n} f(x_1, x_2, …, x_n) dx_1 dx_2 … dx_n
      ≥ ∫_{A_ε^(n)} f(x_1, x_2, …, x_n) dx_1 dx_2 … dx_n
      ≥ ∫_{A_ε^(n)} 2^{−n(h(X)+ε)} dx_1 dx_2 … dx_n = 2^{−n(h(X)+ε)} ∫_{A_ε^(n)} dx_1 dx_2 … dx_n
      = 2^{−n(h(X)+ε)} Vol(A_ε^(n)),

where the last inequality uses f(x_1, …, x_n) ≥ 2^{−n(h(X)+ε)} on A_ε^(n). Rearranging gives the bound.
Typical set
Theorem
The typical set A_ε^(n) has the following properties:
3. Vol(A_ε^(n)) ≥ (1 − ε) 2^{n(h(X)−ε)} for n sufficiently large.

Proof of 3.
For n sufficiently large,

    1 − ε ≤ ∫_{A_ε^(n)} f(x_1, x_2, …, x_n) dx_1 dx_2 … dx_n
          ≤ ∫_{A_ε^(n)} 2^{−n(h(X)−ε)} dx_1 dx_2 … dx_n
          = 2^{−n(h(X)−ε)} ∫_{A_ε^(n)} dx_1 dx_2 … dx_n
          = 2^{−n(h(X)−ε)} Vol(A_ε^(n)).
An interpretation
The volume of the smallest set that contains most of the probability is approximately 2^{nh(X)}.
For an n-dimensional volume, this means that each dimension has side length (2^{nh(X)})^{1/n} = 2^{h(X)}.
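To put numbers on the per-dimension side length 2^{h(X)} (our illustration, not from the slides):

```python
import math

# Uniform[0, a]: h = log2 a, so the side is 2^{log2 a} = a -- the full support,
# which makes sense because every uniform sequence is typical
a = 8.0
side_uniform = 2 ** math.log2(a)

# N(0, 1): h = ½ log2(2πe) ≈ 2.05 bits, so the side is √(2πe) ≈ 4.13,
# roughly a ±2σ slab per coordinate
side_gauss = 2 ** (0.5 * math.log2(2 * math.pi * math.e))
```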
Mean value theorem (MVT)
If a function f is continuous on the closed interval [a, b], and differentiable
on (a, b), then there exists a point c ∈ (a, b) such that
    f′(c) = (f(b) − f(a)) / (b − a).
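A small numeric illustration (our example, not from the slides): for f(x) = x³ on [0, 2] the secant slope is (f(2) − f(0))/2 = 4, so the MVT guarantees some c ∈ (0, 2) with f′(c) = 3c² = 4, i.e. c = 2/√3 ≈ 1.1547. Bisection recovers it:

```python
import math

f = lambda x: x ** 3
a, b = 0.0, 2.0
secant = (f(b) - f(a)) / (b - a)  # = 4

# f'(x) = 3x² is increasing on (0, 2), so we can bisect g(c) = 3c² - secant,
# which changes sign on the interval
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if 3 * mid * mid < secant:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2  # the MVT point, 2/√3 ≈ 1.1547
```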
Relation of differential entropy to discrete entropy
Relation of differential entropy to discrete entropy
Divide the range of X into bins of length ∆. By the mean value theorem, each bin [i∆, (i+1)∆) contains a point x_i with f(x_i)∆ = ∫_{i∆}^{(i+1)∆} f(x) dx. Define the quantized variable X^∆ = x_i if i∆ ≤ X < (i+1)∆, so that p_i = f(x_i)∆.

The entropy of X^∆ is

    H(X^∆) = −Σ_i p_i log p_i = −Σ_i ∆ f(x_i) log f(x_i) − log ∆,

using Σ_i f(x_i)∆ = 1. As ∆ → 0, the first term approaches h(f), so H(X^∆) + log ∆ → h(f).
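The limit can be checked numerically. A sketch (ours, not from the slides) using X ~ Exp(1), chosen because the bin probabilities p_i = e^{−i∆}(1 − e^{−∆}) form a geometric distribution whose entropy has a closed form:

```python
import math

h_f = math.log2(math.e)  # h(f) = log2 e ≈ 1.443 bits for the Exp(1) density

def quantized_entropy(delta):
    """H(X^Δ) for X ~ Exp(1) quantized into bins of width Δ."""
    q = math.exp(-delta)   # p_i = q**i * (1 - q), a geometric distribution
    s = 1.0 - q
    # closed-form entropy of the geometric distribution, in bits
    return (-q * math.log2(q) - s * math.log2(s)) / s

# the gap |H(X^Δ) + log2 Δ - h(f)| shrinks as Δ → 0
gaps = [abs(quantized_entropy(d) + math.log2(d) - h_f) for d in (1.0, 0.1, 0.01)]
```

Each tenfold reduction of ∆ shrinks the gap by roughly a factor of 100, consistent with an O(∆²) quantization error.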
Reading & Homework