
9 July 2008

Source Coding and Simulation


Robert M. Gray
Information Systems Laboratory
Department of Electrical Engineering
Stanford, CA 94305
rmgray@stanford.edu

http://ee.stanford.edu/~gray

Historical and recent research described here was supported in part by

Source Coding and Simulation 1


Source coding and simulation

Source coding/compression/quantization

source {Xn} → encoder → bits → decoder → reproduction {X̂n}

Simulation/synthesis/fake process

random bits → coder → simulation {X̃n}

Source Coding and Simulation 2


Source

X = {Xn; n ∈ Z} stationary and ergodic random process, distribution µ

Xn ∈ AX = alphabet: discrete, continuous, or mixed

random vectors X^N = (X0, X1, · · · , XN−1), distribution µN

Shannon entropy

H(X^N) = H(µN) = −∑_{x^N} µN(x^N) log µN(x^N)   (AX discrete);   = ∞ otherwise

Shannon entropy (rate) H(X) = H(µ) = inf_N H(X^N)/N = lim_{N→∞} H(X^N)/N

& other information measures
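
For a concrete feel of the discrete case, a minimal Python sketch (the binary
Markov source, its parameters, and the sample size are arbitrary illustrative
choices) estimating H(X^N)/N from empirical N-block frequencies:

# Minimal sketch (illustrative only): estimate H(X^N)/N for a binary Markov
# source from empirical N-block frequencies; parameters are arbitrary.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def markov_binary(n, p01=0.1, p10=0.3):
    """Sample a binary Markov chain of length n."""
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)
    for i in range(1, n):
        p_one = p01 if x[i - 1] == 0 else 1 - p10   # P(X_i = 1 | X_{i-1})
        x[i] = rng.random() < p_one
    return x

def block_entropy_rate(x, N):
    """Empirical estimate of H(X^N)/N in bits per symbol."""
    blocks = [tuple(x[i:i + N]) for i in range(0, len(x) - N + 1, N)]
    counts = np.array(list(Counter(blocks).values()), dtype=float)
    pmf = counts / counts.sum()
    return -np.sum(pmf * np.log2(pmf)) / N

x = markov_binary(200_000)
for N in (1, 2, 4, 8):
    print(N, block_entropy_rate(x, N))   # non-increasing, approaching H(X)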


Source Coding and Simulation 3
Source coding with a fidelity criterion

[Shannon (1959)]

Communicate a source {Xn} to a user through a bit pipe

source {Xn} → encoder → bits → decoder → reproduction {X̂n}

What is the best tradeoff between the rate in bits per source sample
and the quality of the reproduction with respect to the input?

Shannon rate-distortion theory, source coding with a fidelity criterion,
lossy data compression, quantization

Source Coding and Simulation 4


The simulation problem (1977)

Simulate (synthesize, imitate, model, fake) a source {Xn}

random bits → coder → simulation {X̃n}

What is the best simulation of the source given

• a simple random bit generator, e.g., coin flips (iid),

• a stationary (time-invariant) coder, and

• a constraint on # of bits (possibly infinite) per simulated symbol?

Source Coding and Simulation 5


Would like a simulated process to

• have key properties of original process: stationarity, ergodicity,
mixing, 0-1 law (purely nondeterministic, K)

• “resemble” original as closely as possible

• be perfect if bitrate sufficient. I.e., same distributions as original
source. What stationary ergodic processes have exactly this form?
(Not all do!) (modeling, taxonomy of random processes)

An alternative notion of simulation was introduced by Steinberg and
Verdu (1996) and is related to source coding. It does not require
stationarity and ergodicity (or preservation of such properties).

Source Coding and Simulation 6


An information theoretic “folk theorem”

If source code nearly optimal, then bits ≈ iid fair coin flips

source {Xn} → encoder → bits → decoder → reproduction {X̂n}

Bits are maximally informative, maximum entropy.


True??
Coin flips provide simple input mechanism for simulation.

Suggests connection between source coding and simulation:

Source Coding and Simulation 7


Source coding/compression

source {Xn} → encoder → bits → decoder → reproduction {X̂n}

Simulation/synthesis/fake process

random bits → coder → simulation {X̃n}

Does nearly optimal performance ⇒ “nearly” iid bits?


Are source decoders and source simulators equivalent?
Are source coding and simulation equivalent?

Source Coding and Simulation 8


Coding

Two basic coding structures for coding a process X with alphabet AX
into Y with alphabet AY:

Block coding (BC) Map each nonoverlapping block of source symbols
into an index or block of encoded symbols (e.g., bits)
(standard for IT)

Sliding-block coding (SBC) Map overlapping blocks of source symbols
into a single encoded symbol (e.g., a bit)
(standard for ergodic theory)

There are constructions in IT and ergodic theory to get BC from SBC
& vice versa.
Source Coding and Simulation 9
Block Coding E : A^N_X → A^N_Y (or other index set), N = block length

· · · , (X−N, X−N+1, . . . , X−1), (X0, X1, . . . , XN−1), (XN, XN+1, . . . , X2N−1), · · ·
                ↓ E                      ↓ E                     ↓ E
· · · , (Y−N, Y−N+1, . . . , Y−1), (Y0, Y1, . . . , YN−1), (YN, YN+1, . . . , Y2N−1), · · ·

Sliding-block Coding N = window length = N1 + N2 + 1, f : A^N_X → AY

· · · , Xn−N1, Xn−N1+1, · · · , Xn, Xn+1, · · · , Xn+N2, Xn+N2+1, · · ·
(slide the length-N window along the source)

Yn = f(Xn−N1, . . . , Xn, . . . , Xn+N2)
Yn+1 = f(Xn−N1+1, . . . , Xn+1, . . . , Xn+N2+1)
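
A minimal Python sketch contrasting the two structures (the maps E and f
below are arbitrary illustrative choices, not particular codes):

# Illustrative sketch: block coding vs. sliding-block coding of a sequence.
import numpy as np

def block_code(x, N, E):
    """Apply E to each non-overlapping length-N block (output is N-stationary)."""
    return np.concatenate([E(x[i:i + N]) for i in range(0, len(x) - N + 1, N)])

def sliding_block_code(x, f, N1, N2):
    """Yn = f(X_{n-N1}, ..., Xn, ..., X_{n+N2}); a stationary coding of the input."""
    y = np.empty(len(x) - N1 - N2)
    for n in range(N1, len(x) - N2):
        y[n - N1] = f(x[n - N1:n + N2 + 1])
    return y

x = np.random.default_rng(1).standard_normal(20)
print(block_code(x, 4, lambda b: np.round(b)))          # e.g. rounding each block
print(sliding_block_code(x, lambda w: w.mean(), 2, 2))  # e.g. a length-5 moving average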

Source Coding and Simulation 10


Block coding

• (+) Far more known about design: e.g., transform codes, vector
quantization, clustering

• (−) Does not preserve key properties (stationarity, ergodicity, mixing,
0-1 law). In general the output is neither stationary nor ergodic (it is
N-stationary and can have a periodic structure, not necessarily N-ergodic).
Can “stationarize” with a uniform random start, but possible periodicities
remain. Not equivalent to an SBC of the input.

• (−) Not defined for infinite block length, no limiting codes.


Source Coding and Simulation 11
Sliding-block (stationary, sliding-window) coding

• preserves key properties of input process: stationarity, ergodicity,
mixing, 0-1 law

• well-defined for N = ∞. Infinite codes can be approximated by finite
codes. Sequence of finite codes can converge.

• models many communication and signal processing techniques:
time-invariant convolutional codes, predictive quantization, nonlinear
and linear time-invariant filtering, wavelet coefficient evaluation

• used to prove fundamental results in ergodic theory, e.g., the
Kolmogorov-Sinai-Ornstein isomorphism theorem:

Source Coding and Simulation 12


Sliding-block coding and isomorphism
A sliding-block code (SBC) has the form

{Xn} → window (Xn−N1, . . . , Xn, . . . , Xn+N2) → f → Yn = f(Xn−N1, . . . , Xn, . . . , Xn+N2)


Infinite N1, N2 are allowed. Two processes are isomorphic if there
exists an invertible SBC from either process to the other.
A process is a B-process if it is an SBC of an iid process.
Ornstein proved (1970) that two B-processes are isomorphic iff
their entropy rates are equal.

Source Coding and Simulation 13


Source Coding: Block Coding

Distortion measure dN(x^N, y^N) = (1/N) ∑_{i=0}^{N−1} d1(xi, yi)

Codebook/Decoder CN = {DN(i); i ∈ I}, |I| = M

Encoder EN : A^N_X → I

 
Distortion D(EN, DN) = E[ dN(X^N, DN(EN(X^N))) ]

Rate R(EN) = (1/N) log M (fixed-rate)  or  N^{−1} H(EN(X^N)) (variable-rate)
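
A small Python sketch of these definitions for a fixed-rate block code with
squared-error d1 (the random codebook and the iid Gaussian source are
arbitrary illustrative choices): EN is nearest-neighbor search, DN is table
lookup, and both rate notions are estimated from samples.

# Sketch of the block-coding definitions with squared-error d1 and an
# arbitrary random codebook (illustrative values, not an optimized design).
import numpy as np

rng = np.random.default_rng(0)
N, M = 2, 8                                   # block length, codebook size
codebook = rng.standard_normal((M, N))        # DN(i) = codebook[i]

def encode(xN):
    """EN: nearest-neighbor index under the per-symbol squared error dN."""
    return int(np.argmin(((codebook - xN) ** 2).mean(axis=1)))

blocks = rng.standard_normal((5000, N))       # iid Gaussian source blocks
idx = np.array([encode(b) for b in blocks])
xhat = codebook[idx]

distortion = ((blocks - xhat) ** 2).mean()                          # D(EN, DN)
fixed_rate = np.log2(M) / N                                         # (1/N) log M
pmf = np.bincount(idx, minlength=M) / len(idx)
variable_rate = -np.sum(pmf[pmf > 0] * np.log2(pmf[pmf > 0])) / N   # H(EN(X^N))/N
print(distortion, fixed_rate, variable_rate)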

Source Coding and Simulation 14


Optimal performance? Operational distortion-rate function (DRF)

δ^{(N)}_BC(R) = inf_{EN, DN : R(EN) ≤ R} D(EN, DN)

δBC(R) = inf_N δ^{(N)}_BC(R) = lim_{N→∞} δ^{(N)}_BC(R)

Not computable. Evaluate by Shannon DRF:

DX(R) = inf_N DN(R) = lim_{N→∞} DN(R)

DN(R) = inf_{pN : pN ⇒ µN, N^{−1} I(X^N; Y^N) ≤ R} E dN(X^N, Y^N)

Block Source Coding Theorem: For a stationary and ergodic source*,
δBC(R) = DX(R)

*With the usual technical conditions.
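
For a finite-alphabet iid source and N = 1, the Shannon DRF can be computed
numerically with the Blahut-Arimoto algorithm. A minimal Python sketch (the
binary symmetric source, Hamming distortion, and multiplier grid are
arbitrary illustrative choices):

# Minimal Blahut-Arimoto sketch tracing (R, D) points for an iid finite-alphabet
# source; here R(D) should track 1 - h2(D) for the binary symmetric source.
import numpy as np

def blahut_arimoto(p, d, beta, iters=500):
    """Return one (R, D) point on the rate-distortion curve for multiplier beta."""
    q = np.full(d.shape[1], 1.0 / d.shape[1])       # output marginal q(y)
    for _ in range(iters):
        w = q * np.exp(-beta * d)                   # p(y|x) proportional to q(y) exp(-beta d(x,y))
        w /= w.sum(axis=1, keepdims=True)
        q = p @ w                                   # update output marginal
    D = np.sum(p[:, None] * w * d)
    R = np.sum(p[:, None] * w * np.log2(w / q))     # mutual information in bits
    return R, D

p = np.array([0.5, 0.5])                            # binary symmetric source
d = 1.0 - np.eye(2)                                 # Hamming distortion
for beta in (0.5, 1.0, 2.0, 5.0, 10.0):
    R, D = blahut_arimoto(p, d, beta)
    print(f"R = {R:.3f} bits, D = {D:.3f}")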

Source Coding and Simulation 15


Source Coding: Sliding-Block Coding
Encoder fN : A^N_X → AU,  Un = fN(Xn−N1, . . . , Xn+N2)
Decoder gK : A^K_U → ÂX,  X̂n = gK(Un−K1, . . . , Un+K2)

Distortion D(f, g) = E[ d1(X0, X̂0) ],  Rate R(f) = log |AU|

Optimal performance:

δ^{(N,K)}_SBC(R) = inf_{fN, gK : R(fN) ≤ R} D(fN, gK)

δSBC(R) = inf_{N,K} δ^{(N,K)}_SBC(R) = inf_{f, g : R(f) ≤ R} D(f, g)

Sliding-block Source Coding Theorem: For a stationary and ergodic source*,
δBC(R) = δSBC(R) = DX(R)

*ditto

Source Coding and Simulation 16


Block coding:

(X0, X1, . . . , XN−1)  (XN, XN+1, . . . , X2N−1)  (X2N, X2N+1, . . . , X3N−1)  · · ·
      ↓ EN                    ↓ EN                       ↓ EN
(U0, U1, . . . , UN−1)  (UN, UN+1, . . . , U2N−1)  (U2N, U2N+1, . . . , U3N−1)  · · ·
      ↓ DN                    ↓ DN                       ↓ DN
(X̂0, X̂1, . . . , X̂N−1)  (X̂N, X̂N+1, . . . , X̂2N−1)  (X̂2N, X̂2N+1, . . . , X̂3N−1)  · · ·

vs. sliding-block coding:

· · · , Xn−N1−1, Xn−N1, · · · , Xn, · · · , Xn+N2, Xn+N2+1, · · ·
          ↓ f (sliding window)
· · · , Un−K1−1, Un−K1, · · · , Un, · · · , Un+K2, Un+K2+1, · · ·
          ↓ g (sliding window)
· · · , X̂n−1, X̂n, X̂n+1, · · ·

If coding nearly optimal, is Un nearly iid?

Source Coding and Simulation 17


Process Distance Measures

How quantify “nearly iid”?

Related: How quantify “best” simulation?

One approach: process distortion measures

Useful example in information theory and ergodic theory:

d̄-distance: Kantorovich/Vasershtein/Ornstein distance

Source Coding and Simulation 18


Basic ideas:

Two stationary random processes, X with distribution µ, Y with
distribution ν. Vector distortion dN.

d̄N(µN, νN) = inf_{pN ⇒ µN, νN} E_{pN} dN(X^N, Y^N)

d̄(µ, ν) = sup_N d̄N(µN, νN) = inf_{p ⇒ µ, ν} E_p d1(X0, Y0)

Smallest achievable distortion between two processes with given marginals,
over all joint distributions consistent with the marginals.

Many equivalent definitions. E.g., how much one must change a typical
sequence of one source to get a typical sequence of the other.
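
For N = 1 and discrete marginals, d̄1 is a small transportation linear program
over joint pmfs with the given marginals. A minimal Python sketch using
scipy's linprog (the marginals and Hamming d1 are arbitrary illustrative
choices; for Hamming distortion the optimum equals the total variation
distance):

# Sketch: d̄_1(µ1, ν1) for discrete marginals as a transportation LP.
import numpy as np
from scipy.optimize import linprog

mu = np.array([0.5, 0.3, 0.2])        # marginal of X0
nu = np.array([0.2, 0.2, 0.6])        # marginal of Y0
d = 1.0 - np.eye(3)                   # d1 = Hamming distance

n, m = len(mu), len(nu)
# variables: joint pmf p(x, y), flattened row-major; minimize sum p(x,y) d(x,y)
A_eq = np.zeros((n + m, n * m))
for x in range(n):
    A_eq[x, x * m:(x + 1) * m] = 1.0          # row sums = mu
for y in range(m):
    A_eq[n + y, y::m] = 1.0                   # column sums = nu
b_eq = np.concatenate([mu, nu])

res = linprog(c=d.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print("dbar_1 =", res.fun)            # here equals total variation distance 0.4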

Source Coding and Simulation 19


Historical aside

d̄N rediscovered and renamed numerous times.

Kantorovich (1942): metrics on compact metric spaces. Often called the
Kantorovich or transportation metric. Inseparable from the development
of linear programming.

Early focus on the scalar case and ℓr norms: Dall’Aglio (1956), Frechet
(1956), Vasershtein/Wasserstein (1969), Mallows (1972), Vallender (1973).

Ornstein (1970-73) used the idea with the Hamming distance on vectors and
processes. Called the d̄ distance. First appearance as a distance measure
on processes.

Source Coding and Simulation 20


Gray, Neuhoff, and Shields (1975) considered the vector and process case
using additive distortion measures, including d1(x, y) = |x − y|^2, calling
the distortion ρ̄ after Ornstein. The vector case is equivalent to the
subsequent extension of the Kantorovich development to ℓr norms on vectors
(the Lr-minimal metric):

ρ̄^{1/r}(µN, νN) ≜ ℓr(µN, νN) = [N d̄N(µN, νN)]^{1/r} = inf_{pN ⇒ µN, νN} [E(||X^N − Y^N||_r^r)]^{1/r}

Usually reserve the notation d̄ for Ornstein (Hamming) and use ρ̄ for the ℓr^r distortion.

Rediscovered as the “earth mover’s distance” in the CS literature, used in
clustering algorithms for pattern recognition. Later renamed (1981) the
“Mallows distance” after the 1972 rediscovery of the scalar Kantorovich metric.

Source Coding and Simulation 21


Properties

• Ornstein d̄ distance and the Lr-minimal distance/ρ̄^{1/r} are metrics.

• Infimum is actually a minimum.

• The class of all B-processes of a given alphabet is the closure under
Ornstein’s d̄ of all k-step mixing Markov processes of that alphabet.

• Entropy rate is continuous in d̄, Shannon DRF in ρ̄

• Can evaluate ρ̄ for iid, purely nondeterministic Gaussian processes,
filtered uniform iid, d̄ for discrete iid. In general a linear programming
problem. (A scalar Gaussian check is sketched below.)
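
For iid scalar Gaussian processes with d1(x, y) = |x − y|^2, ρ̄ reduces to the
single-letter squared Wasserstein distance (m1 − m2)^2 + (σ1 − σ2)^2, attained
by the monotone (quantile) coupling. A quick numerical check in Python
(arbitrary parameters, illustrative only):

# Sketch: check the closed form for ρ̄ between iid scalar Gaussian processes
# (squared error) against the monotone (sorted/quantile) coupling.
import numpy as np

rng = np.random.default_rng(0)
m1, s1, m2, s2 = 0.0, 1.0, 1.5, 2.0
n = 200_000

x = np.sort(rng.normal(m1, s1, n))
y = np.sort(rng.normal(m2, s2, n))
empirical = np.mean((x - y) ** 2)          # monotone coupling is optimal in 1-D
closed_form = (m1 - m2) ** 2 + (s1 - s2) ** 2
print(empirical, closed_form)              # both approximately 3.25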

Source Coding and Simulation 22


Application 1: Geometric view of source coding

δBC(R) = δSBC(R) = DX(R) = inf_{ν : H(ν) ≤ R} ρ̄(µ, ν)
[Gray, Neuhoff, and Omura (1974)]

A form of simulation, but cannot say ν generated from iid.

Distance to “closest” process in ρ̄ with entropy rate ≤ R

Compare with the process version of the Shannon DRF [Marton (1972)]:

DX(R) = inf_{p : p ⇒ µ, I(X; Y) ≤ R} E[d1(X0, Y0)]

Source Coding and Simulation 23


Application 2: Quantization as distribution
approximation

[Pollard (1982), Graf and Luschgy (2000)]

(Vector) Quantizer ⇔ probability distribution on codebook

Block coding/quantization: fixed rate

δ^{(N)}_BC(R) = inf_{νN} ρ̄N(µN, νN)

Minimum is over all discrete distributions νN with 2^{NR} atoms.

Suppose a discrete distribution (π, CN) = {πi, yi; i = 1, . . . , 2^{NR}}, with
∑_{i=1}^{2^{NR}} πi = 1 and yi ∈ A^N_X, solves the minimization ⇒
a discrete simulation of X^N ⇒ a block-independent, N-stationary
process simulation
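
A Python sketch of this view (N, R, the Gaussian source, and the Lloyd
iteration count are arbitrary illustrative choices): a few Lloyd iterations
fit a 2^{NR}-atom discrete distribution (π, CN) to µN under squared error, and
drawing iid blocks from (π, CN) gives a block-independent, N-stationary
simulation.

# Sketch: fit a discrete distribution (pi, C_N) with 2^{NR} atoms by Lloyd
# iterations (squared error), then sample a block-independent simulation.
import numpy as np

rng = np.random.default_rng(0)
N, R = 2, 1
M = 2 ** (N * R)                                  # number of atoms
blocks = rng.standard_normal((20_000, N))         # training blocks ~ mu_N

centers = blocks[rng.choice(len(blocks), M, replace=False)]
for _ in range(30):                               # Lloyd iterations
    idx = ((blocks[:, None, :] - centers) ** 2).sum(axis=2).argmin(axis=1)
    for i in range(M):
        if np.any(idx == i):
            centers[i] = blocks[idx == i].mean(axis=0)

pi = np.bincount(idx, minlength=M) / len(idx)     # atom weights pi_i

# block-independent, N-stationary simulation: iid blocks drawn from (pi, C_N)
sim_blocks = centers[rng.choice(M, size=1000, p=pi)]
simulation = sim_blocks.reshape(-1)               # concatenated simulated process
print(pi, simulation[:10])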

Source Coding and Simulation 24


Application 3: Optimal simulation and source coding

A definition of optimal simulation of process X ∼ µ using an SBC of
an iid process Z [Gray (1977)]:

∆(X|Z) = inf_{µ̃ : Zn → g → X̃n ∼ µ̃} ρ̄(µ, µ̃)

i.e., the infimum is over all process distributions µ̃ obtainable by
sliding-block coding of Z.

Sliding-block coding reduces entropy ⇒ H(Z) ≥ H(µ̃) ⇒

∆(X|Z) ≥ inf_{stationary ergodic µ̂ : H(µ̂) ≤ H(Z)} ρ̄(µ, µ̂) = DX(H(Z))

Source Coding and Simulation 25


If X is a B-process, converse is true and

∆(X|Z) = DX(H(Z)) = δSBC(H(Z))

⇒ the source coding and simulation problems are equivalent if the
source is a B-process

Proof: Choose f, g with E d1(X0, X̂0) ≈ DX(1), and an iid Zn with
H(Z) = 1 ≥ H(U).

Xn → f → Un → g → X̂n ∼ µ̂
Zn → α → Un   (α exists by the Sinai theorem since H(Z) ≥ H(U))

The cascade β = gα is an SBC producing X̂ from Z, and
E d1(X0, X̂0) ≥ ∆(X|Z). □

Source Coding and Simulation 26


Bit behavior for near optimal codes

Suppose a block code CN is used to code the source. Let π denote the
induced index pmf.

What can be said about π if the code performance is near Shannon optimal?

Approximately uniform, like 2^N coin flips?

Sort of . . .

Shannon ⇒ there is an asymptotically optimal sequence of block codes
C^{(N)} for which DN = E dN(X^N, X̂^N) ↓ DX(1)
Source Coding and Simulation 27
RX (D) is a continuous function, hence

1 = N^{−1} log2 2^N ≥ N^{−1} H(E(X^N)) ≥ N^{−1} H(X̂^N)
  ≥ N^{−1} I(X^N; X̂^N) ≥ RN(DN) ≥ RX(DN) → 1 as N → ∞

As the blocklength grows, the indexes have nearly maximal per-symbol
entropy and hence can be thought of as approximately uniformly
distributed, but they are not stationary or ergodic and one cannot get a
process theorem: this does not determine the entropy rate or show that
the overall process behavior is like coin flips, even after stationarizing.

If use SBCs, can get rigorous process version:

Source Coding and Simulation 28


Choose f^{(N)}, g^{(N)} so that DN = D(f^{(N)}, g^{(N)}) ↓ DX(1)

Let U^{(N)}, X̂^{(N)} denote the encoded and reproduction processes
(necessarily stationary and ergodic)

1 ≥ H(U^{(N)}) ≥ H(X̂^{(N)}) ≥ I(X; X̂^{(N)}) ≥ R(DN) → 1 as N → ∞

lim_{N→∞} H(U^{(N)}) = 1 ⇒ lim_{N→∞} d̄(U^{(N)}, Z) = 0

Proof: Marton’s inequality for relative entropy and d̄ (T. Linder)

As the average distortion nears the Shannon limit for a stationary ergodic
source, the binary channel process approaches coin flips in d̄

Source Coding and Simulation 29


Recap

Old: If the source is a stationary filtering of an iid process (a B-process,
discrete or continuous alphabet), then the source coding problem and the
simulation problem have the same solution (and the optimal simulator and
decoder are equivalent).

New: If stationary source coding performs close to the Shannon optimum,
the encoded process is close to iid in d̄

Frosting: An excuse to present ideas of modeling, coding, and process
distance measures common to ergodic theory and information theory.

Source Coding and Simulation 30


A few final thoughts and questions

• The d̄ close to iid property is nice for intuition, but does it actually
help?

E.g., B-processes (SBCs of iid processes) have many special properties.
Are there weak versions of those properties for processes that are SBCs
of a process d̄-close to iid?

• Does equivalence of source coding and simulation hold for the more
general case of stationary and ergodic sources? Steinberg/Verdu results
hold more generally, but in ergodic theory it is known that there are
stationary, ergodic, mixing, purely nondeterministic processes which are
not d̄-close to a B-process.

Source Coding and Simulation 31


• Source coding as “almost isomorphism,” avoids hard part
(invertibility).

• How does fitting a model using ρ̄ compare to the Itakura-Saito
distortion used in speech processing to fit autoregressive speech models
to real speech? Can Marton/Talagrand inequalities be extended?
(Steinberg/Verdu considered relative entropy rates in their simulation
problem formulation.)

• Shortcoming of B-processes: In speech, they only model unvoiced sounds
well. Voiced sounds are better modeled by a periodic (0-entropy) input to
the same filter type. Composite models? Connections to Pinsker’s
(disproved) conjecture regarding products of K processes and 0-entropy
processes?

• Simulator design, e.g., best fake Gaussian from bits? (A naive sketch follows.)
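
One naive sketch of such a simulator in Python (illustrative only, not an
optimal design): map blocks of iid fair bits to nearly uniform variables,
apply the inverse Gaussian CDF, and filter to impose a target correlation;
the block width and the AR(1) filter are arbitrary choices.

# Naive "fake Gaussian from bits" sketch: bits -> near-uniform -> inverse
# Gaussian CDF -> causal linear filter (a B-process-style construction).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
B = 16                                            # bits per simulated sample
bits = rng.integers(0, 2, size=B * 100_000)       # iid fair coin flips

u = bits.reshape(-1, B) @ (2.0 ** -np.arange(1, B + 1)) + 2.0 ** -(B + 1)
z = norm.ppf(u)                                   # approximately N(0, 1) samples

a = 0.9                                           # target AR(1) correlation
x = np.empty_like(z)
x[0] = z[0]
for n in range(1, len(z)):
    x[n] = a * x[n - 1] + np.sqrt(1 - a ** 2) * z[n]

print(x.mean(), x.var(), np.corrcoef(x[:-1], x[1:])[0, 1])   # approx. 0, 1, 0.9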



Source Coding and Simulation 32
