
9 July 2008

Source Coding and Simulation


Robert M. Gray
Information Systems Laboratory
Department of Electrical Engineering
Stanford, CA 94305
rmgray@stanford.edu

http://ee.stanford.edu/~gray

Historical and recent research described here was supported in part by

Source Coding and Simulation 1


Source coding and simulation

Source coding/compression/quantization

source {Xn} → encoder → bits → decoder → reproduction {X̂n}

Simulation/synthesis/fake process

random bits → coder → simulation {X̃n}

Source Coding and Simulation 2


Source

X = {Xn; n ∈ Z} stationary and ergodic random process, distribution µ

Xn ∈ AX = alphabet: discrete, continuous, or mixed

random vectors X^N = (X0, X1, · · · , XN−1), distribution µN

Shannon entropy

H(X^N) = H(µN) = −∑_{x^N} µN(x^N) log µN(x^N)   (AX discrete);   = ∞ otherwise

Shannon entropy (rate) H(X) = H(µ) = inf_N H(X^N)/N = lim_{N→∞} H(X^N)/N

& other information measures
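
For a concrete feel of the discrete case, a minimal Python sketch (the binary
Markov source, its parameters, and the sample size are arbitrary illustrative
choices) estimating H(X^N)/N from empirical N-block frequencies:

# Minimal sketch (illustrative only): estimate H(X^N)/N for a binary Markov
# source from empirical N-block frequencies; parameters are arbitrary.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def markov_binary(n, p01=0.1, p10=0.3):
    """Sample a binary Markov chain of length n."""
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)
    for i in range(1, n):
        p_one = p01 if x[i - 1] == 0 else 1 - p10   # P(X_i = 1 | X_{i-1})
        x[i] = rng.random() < p_one
    return x

def block_entropy_rate(x, N):
    """Empirical estimate of H(X^N)/N in bits per symbol."""
    blocks = [tuple(x[i:i + N]) for i in range(0, len(x) - N + 1, N)]
    counts = np.array(list(Counter(blocks).values()), dtype=float)
    pmf = counts / counts.sum()
    return -np.sum(pmf * np.log2(pmf)) / N

x = markov_binary(200_000)
for N in (1, 2, 4, 8):
    print(N, block_entropy_rate(x, N))   # non-increasing, approaching H(X)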


Source Coding and Simulation 3
Source coding with a fidelity criterion

[Shannon (1959)]

Communicate a source {Xn} to a user through a bit pipe

source {Xn} → encoder → bits → decoder → reproduction {X̂n}

What is the best tradeoff between the rate in bits per source sample
and the quality of the reproduction with respect to the input?

Shannon rate-distortion theory, source coding with a fidelity criterion,
lossy data compression, quantization

Source Coding and Simulation 4


The simulation problem (1977)

Simulate (synthesize, imitate, model, fake) a source {Xn}

random bits → coder → simulation {X̃n}

What is the best simulation of the source given

• a simple random bit generator, e.g., coin flips (iid),

• a stationary (time-invariant) coder, and

• a constraint on # of bits (possibly infinite) per simulated symbol?

Source Coding and Simulation 5


Would like a simulated process to

• have key properties of original process: stationarity, ergodicity,
mixing, 0-1 law (purely nondeterministic, K)

• “resemble” original as closely as possible

• be perfect if bitrate sufficient. I.e., same distributions as original
source. What stationary ergodic processes have exactly this form?
(Not all do!) (modeling, taxonomy of random processes)

An alternative notion of simulation was introduced by Steinberg and
Verdu (1996) and is related to source coding. It does not require
stationarity and ergodicity (or preservation of such properties).

Source Coding and Simulation 6


An information theoretic “folk theorem”

If source code nearly optimal, then bits ≈ iid fair coin flips

source {Xn} → encoder → bits → decoder → reproduction {X̂n}

Bits are maximally informative, maximum entropy.


True??
Coin flips provide simple input mechanism for simulation.

Suggests connection between source coding and simulation:

Source Coding and Simulation 7


Source coding/compression

source {Xn} → encoder → bits → decoder → reproduction {X̂n}

Simulation/synthesis/fake process

random bits → coder → simulation {X̃n}

Does nearly optimal performance ⇒ “nearly” iid bits?


Are source decoders and source simulators equivalent?
Are source coding and simulation equivalent?

Source Coding and Simulation 8


Coding

Two basic coding structures for coding a process X with alphabet AX
into Y with alphabet AY:

Block coding (BC) Map each nonoverlapping block of source symbols
into an index or block of encoded symbols (e.g., bits)
(standard for IT)

Sliding-block coding (SBC) Map overlapping blocks of source symbols
into a single encoded symbol (e.g., a bit)
(standard for ergodic theory)

There are constructions in IT and ergodic theory to get BC from SBC
& vice versa.
Source Coding and Simulation 9
Block Coding E : A^N_X → A^N_Y (or other index set), N = block length

· · · , (X−N, X−N+1, . . . , X−1), (X0, X1, . . . , XN−1), (XN, XN+1, . . . , X2N−1), · · ·
                ↓ E                      ↓ E                     ↓ E
· · · , (Y−N, Y−N+1, . . . , Y−1), (Y0, Y1, . . . , YN−1), (YN, YN+1, . . . , Y2N−1), · · ·

Sliding-block Coding N = window length = N1 + N2 + 1, f : A^N_X → AY

· · · , Xn−N1, Xn−N1+1, · · · , Xn, Xn+1, · · · , Xn+N2, Xn+N2+1, · · ·
(slide the length-N window along the source)

Yn = f(Xn−N1, . . . , Xn, . . . , Xn+N2)
Yn+1 = f(Xn−N1+1, . . . , Xn+1, . . . , Xn+N2+1)
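
A minimal Python sketch contrasting the two structures (the maps E and f
below are arbitrary illustrative choices, not particular codes):

# Illustrative sketch: block coding vs. sliding-block coding of a sequence.
import numpy as np

def block_code(x, N, E):
    """Apply E to each non-overlapping length-N block (output is N-stationary)."""
    return np.concatenate([E(x[i:i + N]) for i in range(0, len(x) - N + 1, N)])

def sliding_block_code(x, f, N1, N2):
    """Yn = f(X_{n-N1}, ..., Xn, ..., X_{n+N2}); a stationary coding of the input."""
    y = np.empty(len(x) - N1 - N2)
    for n in range(N1, len(x) - N2):
        y[n - N1] = f(x[n - N1:n + N2 + 1])
    return y

x = np.random.default_rng(1).standard_normal(20)
print(block_code(x, 4, lambda b: np.round(b)))          # e.g. rounding each block
print(sliding_block_code(x, lambda w: w.mean(), 2, 2))  # e.g. a length-5 moving average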

Source Coding and Simulation 10


Block coding

• (+) Far more known about design: e.g., transform codes, vector
quantization, clustering

• (−) Does not preserve key properties (stationarity, ergodicity, mixing,
0-1 law). In general the output is neither stationary nor ergodic (it is
N-stationary and can have a periodic structure, not necessarily N-ergodic).
Can “stationarize” with a uniform random start, but possible periodicities
remain. Not equivalent to an SBC of the input.

• (−) Not defined for infinite block length, no limiting codes.


Source Coding and Simulation 11
Sliding-block (stationary, sliding-window) coding

• preserves key properties of input process: stationarity, ergodicity,
mixing, 0-1 law

• well-defined for N = ∞. Infinite codes can be approximated by finite
codes. Sequence of finite codes can converge.

• models many communication and signal processing techniques:
time-invariant convolutional codes, predictive quantization, nonlinear
and linear time-invariant filtering, wavelet coefficient evaluation

• used to prove fundamental results in ergodic theory, e.g., the
Kolmogorov-Sinai-Ornstein isomorphism theorem:

Source Coding and Simulation 12


Sliding-block coding and isomorphism
A sliding-block code (SBC) has the form

{Xn} → window (Xn−N1, . . . , Xn, . . . , Xn+N2) → f → Yn = f(Xn−N1, . . . , Xn, . . . , Xn+N2)


Infinite N1, N2 are allowed. Two processes are isomorphic if there
exists an invertible SBC from either process to the other.
A process is a B-process if it is an SBC of an iid process.
Ornstein proved (1970) that two B-processes are isomorphic iff
their entropy rates are equal.

Source Coding and Simulation 13


Source Coding: Block Coding

Distortion measure dN(x^N, y^N) = (1/N) ∑_{i=0}^{N−1} d1(xi, yi)

Codebook/Decoder CN = {DN(i); i ∈ I}, |I| = M

Encoder EN : A^N_X → I

 
Distortion D(EN, DN) = E[ dN(X^N, DN(EN(X^N))) ]

Rate R(EN) = (1/N) log M (fixed-rate)  or  N^{−1} H(EN(X^N)) (variable-rate)
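
A small Python sketch of these definitions for a fixed-rate block code with
squared-error d1 (the random codebook and the iid Gaussian source are
arbitrary illustrative choices): EN is nearest-neighbor search, DN is table
lookup, and both rate notions are estimated from samples.

# Sketch of the block-coding definitions with squared-error d1 and an
# arbitrary random codebook (illustrative values, not an optimized design).
import numpy as np

rng = np.random.default_rng(0)
N, M = 2, 8                                   # block length, codebook size
codebook = rng.standard_normal((M, N))        # DN(i) = codebook[i]

def encode(xN):
    """EN: nearest-neighbor index under the per-symbol squared error dN."""
    return int(np.argmin(((codebook - xN) ** 2).mean(axis=1)))

blocks = rng.standard_normal((5000, N))       # iid Gaussian source blocks
idx = np.array([encode(b) for b in blocks])
xhat = codebook[idx]

distortion = ((blocks - xhat) ** 2).mean()                          # D(EN, DN)
fixed_rate = np.log2(M) / N                                         # (1/N) log M
pmf = np.bincount(idx, minlength=M) / len(idx)
variable_rate = -np.sum(pmf[pmf > 0] * np.log2(pmf[pmf > 0])) / N   # H(EN(X^N))/N
print(distortion, fixed_rate, variable_rate)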

Source Coding and Simulation 14


Optimal performance? Operational distortion-rate function (DRF)

δ^{(N)}_BC(R) = inf_{EN, DN : R(EN) ≤ R} D(EN, DN)

δBC(R) = inf_N δ^{(N)}_BC(R) = lim_{N→∞} δ^{(N)}_BC(R)

Not computable. Evaluate by Shannon DRF:

DX(R) = inf_N DN(R) = lim_{N→∞} DN(R)

DN(R) = inf_{pN : pN ⇒ µN, N^{−1} I(X^N; Y^N) ≤ R} E dN(X^N, Y^N)

Block Source Coding Theorem: For a stationary and ergodic source*,
δBC(R) = DX(R)

*With the usual technical conditions.
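
For a finite-alphabet iid source and N = 1, the Shannon DRF can be computed
numerically with the Blahut-Arimoto algorithm. A minimal Python sketch (the
binary symmetric source, Hamming distortion, and multiplier grid are
arbitrary illustrative choices):

# Minimal Blahut-Arimoto sketch tracing (R, D) points for an iid finite-alphabet
# source; here R(D) should track 1 - h2(D) for the binary symmetric source.
import numpy as np

def blahut_arimoto(p, d, beta, iters=500):
    """Return one (R, D) point on the rate-distortion curve for multiplier beta."""
    q = np.full(d.shape[1], 1.0 / d.shape[1])       # output marginal q(y)
    for _ in range(iters):
        w = q * np.exp(-beta * d)                   # p(y|x) proportional to q(y) exp(-beta d(x,y))
        w /= w.sum(axis=1, keepdims=True)
        q = p @ w                                   # update output marginal
    D = np.sum(p[:, None] * w * d)
    R = np.sum(p[:, None] * w * np.log2(w / q))     # mutual information in bits
    return R, D

p = np.array([0.5, 0.5])                            # binary symmetric source
d = 1.0 - np.eye(2)                                 # Hamming distortion
for beta in (0.5, 1.0, 2.0, 5.0, 10.0):
    R, D = blahut_arimoto(p, d, beta)
    print(f"R = {R:.3f} bits, D = {D:.3f}")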

Source Coding and Simulation 15


Source Coding: Sliding-Block Coding
Encoder fN : A^N_X → AU,  Un = fN(Xn−N1, . . . , Xn+N2)
Decoder gK : A^K_U → ÂX,  X̂n = gK(Un−K1, . . . , Un+K2)

Distortion D(f, g) = E[ d1(X0, X̂0) ],  Rate R(f) = log |AU|

Optimal performance:

δ^{(N,K)}_SBC(R) = inf_{fN, gK : R(fN) ≤ R} D(fN, gK)

δSBC(R) = inf_{N,K} δ^{(N,K)}_SBC(R) = inf_{f, g : R(f) ≤ R} D(f, g)

Sliding-block Source Coding Theorem: For a stationary and ergodic source*,
δBC(R) = δSBC(R) = DX(R)

*ditto

Source Coding and Simulation 16


Block coding:

(X0, X1, . . . , XN−1)  (XN, XN+1, . . . , X2N−1)  (X2N, X2N+1, . . . , X3N−1)  · · ·
      ↓ EN                    ↓ EN                       ↓ EN
(U0, U1, . . . , UN−1)  (UN, UN+1, . . . , U2N−1)  (U2N, U2N+1, . . . , U3N−1)  · · ·
      ↓ DN                    ↓ DN                       ↓ DN
(X̂0, X̂1, . . . , X̂N−1)  (X̂N, X̂N+1, . . . , X̂2N−1)  (X̂2N, X̂2N+1, . . . , X̂3N−1)  · · ·

vs. sliding-block coding:

· · · , Xn−N1−1, Xn−N1, · · · , Xn, · · · , Xn+N2, Xn+N2+1, · · ·
          ↓ f (sliding window)
· · · , Un−K1−1, Un−K1, · · · , Un, · · · , Un+K2, Un+K2+1, · · ·
          ↓ g (sliding window)
· · · , X̂n−1, X̂n, X̂n+1, · · ·

If coding nearly optimal, is Un nearly iid?

Source Coding and Simulation 17


Process Distance Measures

How quantify “nearly iid”?

Related: How quantify “best” simulation?

One approach: process distortion measures

Useful example in information theory and ergodic theory:

d̄-distance: Kantorovich/Vasershtein/Ornstein distance

Source Coding and Simulation 18


Basic ideas:

Two stationary random processes, X with distribution µ, Y with
distribution ν. Vector distortion dN.

d̄N(µN, νN) = inf_{pN ⇒ µN, νN} E_{pN} dN(X^N, Y^N)

d̄(µ, ν) = sup_N d̄N(µN, νN) = inf_{p ⇒ µ, ν} E_p d1(X0, Y0)

Smallest achievable distortion between two processes with given marginals,
over all joint distributions consistent with the marginals.

Many equivalent definitions. E.g., how much one must change a typical
sequence of one source to get a typical sequence of the other.
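
For N = 1 and discrete marginals, d̄1 is a small transportation linear program
over joint pmfs with the given marginals. A minimal Python sketch using
scipy's linprog (the marginals and Hamming d1 are arbitrary illustrative
choices; for Hamming distortion the optimum equals the total variation
distance):

# Sketch: d̄_1(µ1, ν1) for discrete marginals as a transportation LP.
import numpy as np
from scipy.optimize import linprog

mu = np.array([0.5, 0.3, 0.2])        # marginal of X0
nu = np.array([0.2, 0.2, 0.6])        # marginal of Y0
d = 1.0 - np.eye(3)                   # d1 = Hamming distance

n, m = len(mu), len(nu)
# variables: joint pmf p(x, y), flattened row-major; minimize sum p(x,y) d(x,y)
A_eq = np.zeros((n + m, n * m))
for x in range(n):
    A_eq[x, x * m:(x + 1) * m] = 1.0          # row sums = mu
for y in range(m):
    A_eq[n + y, y::m] = 1.0                   # column sums = nu
b_eq = np.concatenate([mu, nu])

res = linprog(c=d.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print("dbar_1 =", res.fun)            # here equals total variation distance 0.4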

Source Coding and Simulation 19


Historical aside

d̄N rediscovered and renamed numerous times.

Kantorovich (1942): metrics on compact metric spaces. Often called the
Kantorovich or transportation metric. Inseparable from the development
of linear programming.

Early focus on the scalar case and ℓr norms: Dall’Aglio (1956), Frechet
(1956), Vasershtein/Wasserstein (1969), Mallows (1972), Vallender (1973).

Ornstein (1970-73) used the idea with the Hamming distance on vectors and
processes. Called the d̄ distance. First appearance as a distance measure
on processes.

Source Coding and Simulation 20


Gray, Neuhoff, and Shields (1975) considered the vector and process case
using additive distortion measures, including d1(x, y) = |x − y|^2, calling
the distortion ρ̄ after Ornstein. The vector case is equivalent to the
subsequent extension of the Kantorovich development to ℓr norms on vectors
(the Lr-minimal metric):

ρ̄^{1/r}(µN, νN) ≜ ℓr(µN, νN) = [N d̄N(µN, νN)]^{1/r} = inf_{pN ⇒ µN, νN} [E(||X^N − Y^N||_r^r)]^{1/r}

Usually reserve the notation d̄ for Ornstein (Hamming) and use ρ̄ for the ℓr^r distortion.

Rediscovered as the “earth mover’s distance” in the CS literature, used in
clustering algorithms for pattern recognition. Later renamed (1981) the
“Mallows distance” after the 1972 rediscovery of the scalar Kantorovich metric.

Source Coding and Simulation 21


Properties

• Ornstein d̄ distance and the Lr-minimal distance/ρ̄^{1/r} are metrics.

• Infimum is actually a minimum.

• The class of all B-processes of a given alphabet is the closure under
Ornstein’s d̄ of all k-step mixing Markov processes of that alphabet.

• Entropy rate is continuous in d̄, Shannon DRF in ρ̄

• Can evaluate ρ̄ for iid, purely nondeterministic Gaussian processes,
filtered uniform iid, d̄ for discrete iid. In general a linear programming
problem. (A scalar Gaussian check is sketched below.)
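
For iid scalar Gaussian processes with d1(x, y) = |x − y|^2, ρ̄ reduces to the
single-letter squared Wasserstein distance (m1 − m2)^2 + (σ1 − σ2)^2, attained
by the monotone (quantile) coupling. A quick numerical check in Python
(arbitrary parameters, illustrative only):

# Sketch: check the closed form for ρ̄ between iid scalar Gaussian processes
# (squared error) against the monotone (sorted/quantile) coupling.
import numpy as np

rng = np.random.default_rng(0)
m1, s1, m2, s2 = 0.0, 1.0, 1.5, 2.0
n = 200_000

x = np.sort(rng.normal(m1, s1, n))
y = np.sort(rng.normal(m2, s2, n))
empirical = np.mean((x - y) ** 2)          # monotone coupling is optimal in 1-D
closed_form = (m1 - m2) ** 2 + (s1 - s2) ** 2
print(empirical, closed_form)              # both approximately 3.25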

Source Coding and Simulation 22


Application 1: Geometric view of source coding

δBC(R) = δSBC(R) = DX(R) = inf_{ν : H(ν) ≤ R} ρ̄(µ, ν)
[Gray, Neuhoff, and Omura (1974)]

A form of simulation, but cannot say ν generated from iid.

Distance to “closest” process in ρ̄ with entropy rate ≤ R

Compare with the process version of the Shannon DRF [Marton (1972)]:

DX(R) = inf_{p : p ⇒ µ, I(X; Y) ≤ R} E[d1(X0, Y0)]

Source Coding and Simulation 23


Application 2: Quantization as distribution
approximation

[Pollard (1982), Graf and Luschgy (2000)]

(Vector) Quantizer ⇔ probability distribution on codebook

Block coding/quantization: fixed rate

δ^{(N)}_BC(R) = inf_{νN} ρ̄N(µN, νN)

Minimum is over all discrete distributions νN with 2^{NR} atoms.

Suppose a discrete distribution (π, CN) = {πi, yi; i = 1, . . . , 2^{NR}}, with
∑_{i=1}^{2^{NR}} πi = 1 and yi ∈ A^N_X, solves the minimization ⇒
a discrete simulation of X^N ⇒ a block-independent, N-stationary
process simulation
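
A Python sketch of this view (N, R, the Gaussian source, and the Lloyd
iteration count are arbitrary illustrative choices): a few Lloyd iterations
fit a 2^{NR}-atom discrete distribution (π, CN) to µN under squared error, and
drawing iid blocks from (π, CN) gives a block-independent, N-stationary
simulation.

# Sketch: fit a discrete distribution (pi, C_N) with 2^{NR} atoms by Lloyd
# iterations (squared error), then sample a block-independent simulation.
import numpy as np

rng = np.random.default_rng(0)
N, R = 2, 1
M = 2 ** (N * R)                                  # number of atoms
blocks = rng.standard_normal((20_000, N))         # training blocks ~ mu_N

centers = blocks[rng.choice(len(blocks), M, replace=False)]
for _ in range(30):                               # Lloyd iterations
    idx = ((blocks[:, None, :] - centers) ** 2).sum(axis=2).argmin(axis=1)
    for i in range(M):
        if np.any(idx == i):
            centers[i] = blocks[idx == i].mean(axis=0)

pi = np.bincount(idx, minlength=M) / len(idx)     # atom weights pi_i

# block-independent, N-stationary simulation: iid blocks drawn from (pi, C_N)
sim_blocks = centers[rng.choice(M, size=1000, p=pi)]
simulation = sim_blocks.reshape(-1)               # concatenated simulated process
print(pi, simulation[:10])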

Source Coding and Simulation 24


Application 3: Optimal simulation and source coding

A definition of optimal simulation of process X ∼ µ using an SBC of
an iid process Z [Gray (1977)]:

∆(X|Z) = inf_{µ̃ : Zn → g → X̃n ∼ µ̃} ρ̄(µ, µ̃)

i.e., the infimum is over all process distributions µ̃ obtainable by
sliding-block coding of Z.

Sliding-block coding reduces entropy ⇒ H(Z) ≥ H(µ̃) ⇒

∆(X|Z) ≥ inf_{stationary ergodic µ̂ : H(µ̂) ≤ H(Z)} ρ̄(µ, µ̂) = DX(H(Z))

Source Coding and Simulation 25


If X is a B-process, converse is true and

∆(X|Z) = DX(H(Z)) = δSBC(H(Z))

⇒ the source coding and simulation problems are equivalent if the
source is a B-process

Proof: Choose f, g with E d1(X0, X̂0) ≈ DX(1), and an iid Zn with
H(Z) = 1 ≥ H(U).

Xn → f → Un → g → X̂n ∼ µ̂
Zn → α → Un   (α exists by the Sinai theorem since H(Z) ≥ H(U))

The cascade β = gα is an SBC producing X̂ from Z, and
E d1(X0, X̂0) ≥ ∆(X|Z). □

Source Coding and Simulation 26


Bit behavior for near optimal codes

Suppose a block code CN is used to code the source. Let π denote the
induced index pmf.

What can be said about π if the code performance is near Shannon optimal?

Approximately uniform, like 2^N coin flips?

Sort of . . .

Shannon ⇒ there is an asymptotically optimal sequence of block codes
C^{(N)} for which DN = E dN(X^N, X̂^N) ↓ DX(1)
Source Coding and Simulation 27
RX (D) is a continuous function, hence

1 = N^{−1} log2 2^N ≥ N^{−1} H(E(X^N)) ≥ N^{−1} H(X̂^N)
  ≥ N^{−1} I(X^N; X̂^N) ≥ RN(DN) ≥ RX(DN) → 1 as N → ∞

As the blocklength grows, the indexes have nearly maximal per-symbol
entropy and hence can be thought of as approximately uniformly
distributed, but they are not stationary or ergodic and one cannot get a
process theorem: this does not determine the entropy rate or show that
the overall process behavior is like coin flips, even after stationarizing.

If use SBCs, can get rigorous process version:

Source Coding and Simulation 28


Choose f^{(N)}, g^{(N)} so that DN = D(f^{(N)}, g^{(N)}) ↓ DX(1)

Let U^{(N)}, X̂^{(N)} denote the encoded and reproduction processes
(necessarily stationary and ergodic)

1 ≥ H(U^{(N)}) ≥ H(X̂^{(N)}) ≥ I(X; X̂^{(N)}) ≥ R(DN) → 1 as N → ∞

lim_{N→∞} H(U^{(N)}) = 1 ⇒ lim_{N→∞} d̄(U^{(N)}, Z) = 0

Proof: Marton’s inequality for relative entropy and d̄ (T. Linder)

As the average distortion nears the Shannon limit for a stationary ergodic
source, the binary channel process approaches coin flips in d̄

Source Coding and Simulation 29


Recap

Old: If the source is a stationary filtering of an iid process (a B-process,
discrete or continuous alphabet), then the source coding problem and the
simulation problem have the same solution (and the optimal simulator and
decoder are equivalent).

New: If stationary source coding performs close to the Shannon optimum,
the encoded process is close to iid in d̄

Frosting: An excuse to present ideas of modeling, coding, and process
distance measures common to ergodic theory and information theory.

Source Coding and Simulation 30


A few final thoughts and questions

• The d̄ close to iid property is nice for intuition, but does it actually
help?

E.g., B-processes (SBCs of iid processes) have many special properties.
Are there weak versions of those properties for processes that are SBCs
of a process d̄-close to iid?

• Does equivalence of source coding and simulation hold for the more
general case of stationary and ergodic sources? Steinberg/Verdu results
hold more generally, but in ergodic theory it is known that there are
stationary, ergodic, mixing, purely nondeterministic processes which are
not d̄-close to a B-process.

Source Coding and Simulation 31


• Source coding as “almost isomorphism,” avoids hard part
(invertibility).

• How does fitting a model using ρ̄ compare to the Itakura-Saito
distortion used in speech processing to fit autoregressive speech models
to real speech? Can Marton/Talagrand inequalities be extended?
(Steinberg/Verdu considered relative entropy rates in their simulation
problem formulation.)

• Shortcoming of B-processes: In speech, they only model unvoiced sounds
well. Voiced sounds are better modeled by a periodic (0-entropy) input to
the same filter type. Composite models? Connections to Pinsker’s
(disproved) conjecture regarding products of K processes and 0-entropy
processes?

• Simulator design, e.g., best fake Gaussian from bits? (A naive sketch follows.)
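
One naive sketch of such a simulator in Python (illustrative only, not an
optimal design): map blocks of iid fair bits to nearly uniform variables,
apply the inverse Gaussian CDF, and filter to impose a target correlation;
the block width and the AR(1) filter are arbitrary choices.

# Naive "fake Gaussian from bits" sketch: bits -> near-uniform -> inverse
# Gaussian CDF -> causal linear filter (a B-process-style construction).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
B = 16                                            # bits per simulated sample
bits = rng.integers(0, 2, size=B * 100_000)       # iid fair coin flips

u = bits.reshape(-1, B) @ (2.0 ** -np.arange(1, B + 1)) + 2.0 ** -(B + 1)
z = norm.ppf(u)                                   # approximately N(0, 1) samples

a = 0.9                                           # target AR(1) correlation
x = np.empty_like(z)
x[0] = z[0]
for n in range(1, len(z)):
    x[n] = a * x[n - 1] + np.sqrt(1 - a ** 2) * z[n]

print(x.mean(), x.var(), np.corrcoef(x[:-1], x[1:])[0, 1])   # approx. 0, 1, 0.9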



Source Coding and Simulation 32
