
Signal Processing, Representation, Modeling, and Analysis

Alfred M. Bruckstein & Ron Kimmel

Geometric Image Processing Lab.
CS Dept. Technion

Winter 2019-2020

1 / 102
Chapter 2
Representations with choices

34 / 102
Representative numbers
How to find more than one “representative” number for a group of
objects and to use such sets of representatives.
Consider the list of ordered numbers
$$x_1 \le x_2 \le \cdots \le x_N, \qquad x_i \in \mathbb{R}. \qquad (37)$$
How should we choose two or more (say $k \ll N$) numbers to
represent them?
Denote the describing $\theta$'s by $\theta_1, \theta_2, \ldots, \theta_k$. We could order them
as $\theta_1 < \theta_2 < \cdots < \theta_k$.
In this case we map each $x_i$ to a corresponding $\theta_j$,
$$\begin{array}{cccccccccc}
x_1 & \cdots & x_{r_1} & x_{r_1+1} & \cdots & x_{r_2} & \cdots & x_{r_{k-1}+1} & \cdots & x_N \\
\updownarrow & & \updownarrow & \updownarrow & & \updownarrow & & \updownarrow & & \updownarrow \\
\theta_1 & \cdots & \theta_1 & \theta_2 & \cdots & \theta_2 & \cdots & \theta_k & \cdots & \theta_k.
\end{array} \qquad (38)$$

35 / 102
Note that we mapped the nondecreasing sequence of $x_i$'s to a strictly
increasing sequence of $\theta_j$'s by selecting $k-1$ breakpoints $r_1, r_2, \ldots, r_{k-1}$.
To determine $\theta_1, \theta_2, \ldots, \theta_k$ we need the $r_j$'s, while in order to
determine the $r_j$'s we need the data and the optimization criterion.
Consider the MSE loss function. Then, for a sequence $\{r_j\}$ and
numbers $\{\theta_i\}$ we can write
$$\mathrm{MSE}(\{r_j\}, \{\theta_i\}) = \frac{1}{N}\left( \sum_{i=1}^{r_1} (x_i - \theta_1)^2 + \cdots + \sum_{i=r_l+1}^{r_{l+1}} (x_i - \theta_{l+1})^2 + \cdots + \sum_{i=r_{k-1}+1}^{r_k=N} (x_i - \theta_k)^2 \right)$$
$$= \frac{1}{N} \sum_{l=0}^{k-1} \; \sum_{i=r_l+1}^{r_{l+1}} (x_i - \theta_{l+1})^2, \qquad (39)$$

where $r_0 \equiv 0$ and $r_k = N$.

36 / 102
Given the data $x_1, x_2, \ldots, x_N$ we want to determine the breakpoints $\{r_j\}$ and
the numbers $\{\theta_i\}$ that minimize the MSE loss function.
Fixing $r_l$ and $r_{l+1}$, we have that
$$\theta_{l+1}^{opt} = \frac{1}{r_{l+1} - r_l} \sum_{i=r_l+1}^{r_{l+1}} x_i, \qquad (40)$$

which is the average of the $x_i$'s over the index interval $[r_l + 1, r_{l+1}]$.
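As a small illustration of Eq. (40), here is a minimal NumPy sketch (my own, with illustrative variable names) that computes the per-interval averages for a given set of breakpoints:

import numpy as np

def interval_means(x_sorted, breakpoints):
    # Optimal representatives (Eq. 40): the mean of the sorted data over each
    # index interval (r_l, r_{l+1}], with r_0 = 0 and r_k = N.
    N = len(x_sorted)
    bounds = [0] + list(breakpoints) + [N]
    return np.array([x_sorted[bounds[l]:bounds[l + 1]].mean()
                     for l in range(len(bounds) - 1)])

# Example: N = 6 samples, k = 2 representatives, breakpoint r_1 = 3.
x = np.sort(np.array([0.1, 0.2, 0.4, 2.0, 2.1, 2.3]))
print(interval_means(x, [3]))   # -> [0.2333..., 2.1333...]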

37 / 102
It is difficult to optimize over the breakpoints as integers. Hence,
consider the real line, with the numbers $x_1, x_2, \ldots, x_N$ as the range in
which a random variable $X_w$ takes values, and assume its
realizations are the $x_i$, each with probability $1/N$.
The probability distribution function is
$$p(x) = \frac{1}{N} \sum_{i=1}^{N} \delta(x - x_i),$$

where $\delta(\cdot)$ is the impulse or Dirac function, with the properties
$$\delta(x) = 0 \;\; \forall x \neq 0, \qquad \int_{-\infty}^{+\infty} \delta(x)\,dx = 1.$$

38 / 102
We clearly have
$$E\{X_w\} = \int_{-\infty}^{+\infty} x\,p(x)\,dx = \sum_{i=1}^{N} \frac{1}{N} x_i = \frac{1}{N} \sum_{i=1}^{N} x_i. \qquad (41)$$

Next, we define a way to represent the realizations of the random
variable $X_w$ by a set of $k$ representation values $\theta_1, \theta_2, \ldots, \theta_k$ as
follows:
Select threshold levels $r_1, r_2, \ldots, r_{k-1}$ and define a mapping
function $Q : \mathbb{R} \to \{\theta_1, \theta_2, \ldots, \theta_k\}$ (where $\theta_1 < \theta_2 < \cdots < \theta_k$),
$$Q(x) = \begin{cases}
\theta_1 & \text{if } x \le r_1 \\
\theta_i & \text{if } r_{i-1} < x \le r_i, \quad i = 2, \ldots, k-1 \\
\theta_k & \text{if } r_{k-1} < x.
\end{cases} \qquad (42)$$

This function is a $k$-level representation, or quantization, function.
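A minimal sketch of such a k-level quantization function (my own illustrative code, assuming the thresholds and levels are given as sorted arrays):

import numpy as np

def make_quantizer(thresholds, levels):
    # Q: R -> {theta_1, ..., theta_k} as in Eq. (42);
    # thresholds = [r_1, ..., r_{k-1}] (sorted), levels = [theta_1, ..., theta_k].
    thresholds = np.asarray(thresholds)
    levels = np.asarray(levels)
    assert len(levels) == len(thresholds) + 1
    def Q(x):
        # searchsorted(..., side='left') returns the cell index: 0 if x <= r_1,
        # i-1 if r_{i-1} < x <= r_i, and k-1 if x > r_{k-1}, matching Eq. (42).
        return levels[np.searchsorted(thresholds, x, side='left')]
    return Q

Q = make_quantizer([1.0, 2.0], [0.5, 1.5, 2.5])
print(Q(np.array([0.3, 1.0, 1.7, 9.0])))   # -> [0.5 0.5 1.5 2.5]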

39 / 102
We restate the problem of representing $x_1, x_2, \ldots, x_N$ as one of
representing the random variable $X_w$ with $k$ values $\theta_1, \theta_2, \ldots, \theta_k$
"optimally" - by a proper choice of $r_1, r_2, \ldots, r_{k-1}$ and
$\theta_1, \theta_2, \ldots, \theta_k$.
The optimality criterion will be the (expected w.r.t. the
distribution $p(x)$) mean squared error,
$$E_{SE} = \int_{-\infty}^{+\infty} (x - Q(x))^2 p(x)\,dx = \sum_{i=1}^{k} \int_{r_{i-1}}^{r_i} (x - \theta_i)^2 p(x)\,dx, \qquad (43)$$

where we define $r_0 = -\infty$ and $r_k = +\infty$, the end points of the
range of $X_w$. We mapped our original discrete problem into one
with "continuous" variables $\theta_1, \theta_2, \ldots, \theta_k$ and $r_1, r_2, \ldots, r_{k-1}$.

40 / 102
Note that if we fix the values $r_1, r_2, \ldots, r_{k-1}$, then the optimal
choices for $\theta_1, \theta_2, \ldots, \theta_k$ are completely determined, since clearly
$$\min_{\theta} \int_{r_{i-1}}^{r_i} (x - \theta)^2 p(x)\,dx = \min_{\theta} \int_{r_{i-1}}^{r_i} (x^2 - 2\theta x + \theta^2)\,p(x)\,dx \qquad (44)$$
$$\Rightarrow \;\; \theta_i^{opt} = \frac{1}{\int_{r_{i-1}}^{r_i} p(x)\,dx} \int_{r_{i-1}}^{r_i} x\,p(x)\,dx
= \frac{1}{\big(\text{num of } x_j \text{ falling in the range } (r_{i-1}, r_i)\big)\,\frac{1}{N}} \cdot \frac{1}{N} \sum_{r_{i-1} < x_j < r_i} x_j.$$

41 / 102
Important observation.
If we set the representation values $\theta_1 < \theta_2 < \cdots < \theta_k$, and we
have a value $x \in \mathbb{R}$ as a realization of $X_w$, then to minimize $E_{SE}$
we must assign to $x$ the closest representation value. That is, we
must ask which $\theta_i$ is the closest in the $|x - \theta_i|$ sense.
Thus, the decision levels $r_1, r_2, \ldots, r_{k-1}$ are determined by the
representation levels $\theta_1, \theta_2, \ldots, \theta_k$ to be
$$r_i = \frac{\theta_i + \theta_{i+1}}{2}, \qquad i = 1, 2, \ldots, k-1. \qquad (45)$$
These are the midpoints of the intervals between the
representation levels, regardless of the distribution of the $x$'s!
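A quick numerical check of this observation (my own illustrative code): nearest-level assignment gives exactly the same result as thresholding at the midpoints of Eq. (45).

import numpy as np

theta = np.array([0.1, 0.9, 2.0])               # representation levels
r = (theta[:-1] + theta[1:]) / 2                # midpoint thresholds, Eq. (45)

x = np.random.default_rng(0).uniform(-1.0, 3.0, 1000)
nearest = theta[np.argmin(np.abs(x[:, None] - theta[None, :]), axis=1)]
by_threshold = theta[np.searchsorted(r, x)]     # threshold rule of Eq. (42)
print(np.array_equal(nearest, by_threshold))    # True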

42 / 102
Therefore, if we know the thresholds $r_1, r_2, \ldots, r_{k-1}$, we can
determine the representation levels, and if we know the
representation levels $\theta_1, \theta_2, \ldots, \theta_k$ we can determine the thresholds.
Yet, we have to determine both, jointly optimizing the function
$Q(x)$ w.r.t. the criterion $E_{SE}$,
$$E_{SE}(r_1, \ldots, r_{k-1}, \theta_1, \ldots, \theta_k) = \sum_{i=1}^{k} \int_{r_{i-1}}^{r_i} (x - \theta_i)^2 p(x)\,dx. \qquad (46)$$

To minimize this function w.r.t. the $2k-1$ variables we write the
necessary conditions for a minimum (in fact, a stationary point!),
$$\frac{\partial E_{SE}}{\partial \theta_i} = 0 \quad \text{and} \quad \frac{\partial E_{SE}}{\partial r_i} = 0, \qquad (47)$$
which yield
$$\int_{r_{i-1}}^{r_i} (x - \theta_i)\,p(x)\,dx = 0, \qquad i = 1, 2, \ldots, k,$$
$$(r_i - \theta_i)^2 p(r_i) - (r_i - \theta_{i+1})^2 p(r_i) = 0, \qquad i = 1, 2, \ldots, k-1. \qquad (48)$$
43 / 102
Assuming (for the moment) $p(r_i) \neq 0$, we get
$$\theta_i = \frac{1}{\int_{r_{i-1}}^{r_i} p(x)\,dx} \int_{r_{i-1}}^{r_i} x\,p(x)\,dx, \qquad r_i = \frac{\theta_i + \theta_{i+1}}{2}, \qquad (49)$$
the conditions we have seen before that are necessary for
optimality.

44 / 102
Suppose we get the solution of the optimization problem. What
would the expected mean squared error be?
$$E_{SE}(opt) = \sum_{i=1}^{k} \int_{r_{i-1}}^{r_i} (x^2 - 2\theta_i x + \theta_i^2)\,p(x)\,dx$$
$$= \int_{r_0=-\infty}^{r_k=+\infty} x^2 p(x)\,dx - 2\sum_{i=1}^{k} \theta_i \int_{r_{i-1}}^{r_i} x\,p(x)\,dx + \sum_{i=1}^{k} \theta_i^2 \int_{r_{i-1}}^{r_i} p(x)\,dx$$
$$= \int_{-\infty}^{+\infty} x^2 p(x)\,dx - \sum_{i=1}^{k} \theta_i^2 \int_{r_{i-1}}^{r_i} p(x)\,dx$$
$$= \int_{-\infty}^{+\infty} x^2 p(x)\,dx - \sum_{i=1}^{k} \frac{1}{\int_{r_{i-1}}^{r_i} p(x)\,dx} \left( \int_{r_{i-1}}^{r_i} x\,p(x)\,dx \right)^2$$
$$= \int_{-\infty}^{+\infty} x^2 p(x)\,dx - \sum_{i=1}^{k} \theta_i^2 \Pr\{r_{i-1} \le x < r_i\}, \qquad (50)$$

where we used $\int_{r_{i-1}}^{r_i} x\,p(x)\,dx = \theta_i \int_{r_{i-1}}^{r_i} p(x)\,dx$.

45 / 102
How should we determine the optimal representation levels
$\theta_1, \theta_2, \ldots, \theta_k$ and corresponding decision levels $r_1, r_2, \ldots, r_{k-1}$?
An awful-wonderful idea.
▶ Start with "arbitrary" decision levels $\{r_1^0, r_2^0, \ldots, r_{k-1}^0\}$, say
uniformly spaced between $x_1$ and $x_N$.
▶ Compute the corresponding optimal representations over the
regions defined by $\{r_1^0, r_2^0, \ldots, r_{k-1}^0\}$, and obtain
$\{\theta_1^0, \theta_2^0, \ldots, \theta_k^0\}$.
▶ Next, with these representation levels we can compute new
decision levels
$$r_i^1 = \frac{\theta_i^0 + \theta_{i+1}^0}{2}, \qquad i = 1, \ldots, k-1. \qquad (51)$$
▶ Proceed as before (iterate the above steps).
How can we prove this procedure reduces the error?

46 / 102
At each step we select variables with respect to which we optimize.
That is, each step reduces $E_{SE}$.
The process is called coordinate descent, which in this context is
also known as the Lloyd-Max algorithm, and in its more general
setting as the k-means clustering method.
The process decreases the error at each step, as
$E_{SE}(\text{step } t+1) \le E_{SE}(\text{step } t)$, and we can stop iterating when
there is no (or little) improvement, assuming we are at a local
minimum.
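A minimal sketch of this coordinate-descent loop on a data sample (my own illustrative code; the initialization, the stopping rule, and the assumption that no cell becomes empty are my own simplifications):

import numpy as np

def lloyd_max(x, k, iters=100):
    # Alternate the two necessary conditions: representation levels are the
    # cell means (Eq. 40), decision levels are midpoints of the levels (Eq. 45).
    x = np.sort(np.asarray(x, dtype=float))
    r = np.linspace(x[0], x[-1], k + 1)[1:-1]   # "arbitrary" initial thresholds
    for _ in range(iters):
        cells = np.searchsorted(r, x)           # which cell each sample falls in
        theta = np.array([x[cells == j].mean() for j in range(k)])  # assumes no empty cell
        r_new = 0.5 * (theta[:-1] + theta[1:])  # midpoint rule
        if np.allclose(r_new, r):               # stop when there is no improvement
            break
        r = r_new
    return r, theta

rng = np.random.default_rng(1)
r, theta = lloyd_max(rng.normal(size=2000), k=4)
print(np.round(r, 3), np.round(theta, 3))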

47 / 102
HW:
Assume that $X_w$ is a random variable uniformly distributed over
$[x_L, x_H]$, or over $[0,1]$ w.l.o.g. The pdf over $[0,1]$ is
$$p(x) = \begin{cases} 0 & x \notin [0,1] \\ 1 & x \in [0,1]. \end{cases} \qquad (52)$$
Let us start the Lloyd-Max process with arbitrary decision levels
$r_1, r_2, \ldots, r_{k-1} \in (0,1)$.
For the given pdf $p(x)$ we have
$$\theta_i = \frac{1}{r_i - r_{i-1}} \int_{r_{i-1}}^{r_i} x\,dx
= \frac{1}{r_i - r_{i-1}} \left. \frac{x^2}{2} \right|_{r_{i-1}}^{r_i}
= \frac{1}{2}\,\frac{r_i^2 - r_{i-1}^2}{r_i - r_{i-1}}
= \frac{r_i + r_{i-1}}{2} \qquad (53)$$
and
$$r_i = \frac{\theta_i + \theta_{i+1}}{2}. \qquad (54)$$
48 / 102
Hence, given $\{r_1^0, r_2^0, \ldots, r_{k-1}^0\}$ with $r_0^0 = 0$, $r_k^0 = 1$, we get
$$\theta_1^0 = \frac{0 + r_1^0}{2}, \quad \theta_2^0 = \frac{r_1^0 + r_2^0}{2}, \quad \ldots, \quad \theta_k^0 = \frac{r_{k-1}^0 + 1}{2}, \qquad (55)$$
then
$$r_1^1 = \frac{\frac{r_1^0}{2} + \frac{r_1^0 + r_2^0}{2}}{2}, \quad \ldots, \qquad (56)$$
i.e.,
$$r_1^1 = \frac{r_0^0}{4} + \frac{r_1^0}{2} + \frac{r_2^0}{4}, \qquad r_2^1 = \frac{r_1^0}{4} + \frac{r_2^0}{2} + \frac{r_3^0}{4}, \quad \ldots$$
$$r_i^{\text{next}} = \frac{\frac{r_{i-1}+r_i}{2} + \frac{r_i+r_{i+1}}{2}}{2} = \frac{r_{i-1}}{4} + \frac{r_i}{2} + \frac{r_{i+1}}{4}. \qquad (57)$$
Write the iterations in matrix form. Analyze the eigenstructure of
the operator matrix $M$, in order to prove that $M^n$, as $n \to \infty$,
leads to threshold levels $r_i = i/k$.
See Row straightening via local interactions by Wagner and Bruckstein, CSSP 1997.
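For the HW, a small numerical sketch (my own code) of the iteration in matrix form: the boundary values r_0 = 0 and r_k = 1 enter as a constant vector b, and M below is one way to write the averaging operator of Eq. (57).

import numpy as np

k = 5
M = (np.diag(np.full(k - 1, 0.50))
     + np.diag(np.full(k - 2, 0.25), 1)
     + np.diag(np.full(k - 2, 0.25), -1))   # r_i <- r_{i-1}/4 + r_i/2 + r_{i+1}/4
b = np.zeros(k - 1)
b[-1] = 0.25                                # boundary contribution of r_k = 1 (r_0 = 0 adds nothing)

r = np.sort(np.random.default_rng(2).uniform(0.0, 1.0, k - 1))
for _ in range(2000):
    r = M @ r + b                           # one sweep of Eq. (57)

print(np.round(r, 6))                       # -> [0.2 0.4 0.6 0.8], i.e. r_i = i/k
print(np.round(np.linalg.eigvalsh(M), 3))   # eigenvalues lie strictly inside (0, 1)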
49 / 102
Hence, for a uniformly distributed $X_w$ the optimal decision levels to
which the Lloyd-Max algorithm converges are uniformly spread
over $[0,1]$. That is, $r_i = i/k$ for $i = 0, 1, \ldots, k$.
Here, for $\theta_i = \frac{r_{i-1}+r_i}{2}$ and $\Pr\{r_{i-1} \le x \le r_i\} = \frac{1}{k}$, the expected
mean squared error is
$$E_{SE}(opt) = \int_{x_L}^{x_H} x^2\,\frac{1}{x_H - x_L}\,dx - \sum_{i=1}^{k} \left( \frac{r_{i-1}+r_i}{2} \right)^2 \Pr\{r_{i-1} \le x \le r_i\}$$
$$= \frac{1}{x_H - x_L}\,\frac{x_H^3 - x_L^3}{3} - \sum_{i=1}^{k} \frac{1}{k} \left( \frac{i-1+i}{2k} \right)^2 (x_H - x_L)^2 = \ldots \qquad (58)$$

Work out the rest of the expression at home (note that there are
inconsistencies/inaccuracies in the handwritten notes!).

50 / 102
Note that for the specific example of the uniform distribution we could
search for a solution of $2k-1$ equations with $2k-1$ unknowns,
$\theta_1, \ldots, \theta_k, r_1, \ldots, r_{k-1}$, with fixed $r_0 = 0$ and $r_k = 1$,
$$\theta_1 = \tfrac{1}{2} r_0 + \tfrac{1}{2} r_1$$
$$\theta_2 = \tfrac{1}{2} r_1 + \tfrac{1}{2} r_2$$
$$\vdots$$
$$\theta_k = \tfrac{1}{2} r_{k-1} + \tfrac{1}{2} r_k$$
$$r_1 = \tfrac{1}{2} \theta_1 + \tfrac{1}{2} \theta_2$$
$$\vdots$$
$$r_{k-1} = \tfrac{1}{2} \theta_{k-1} + \tfrac{1}{2} \theta_k, \qquad (59)$$

which is solved uniquely by $r_i = i/k$ and $\theta_j = \frac{j - \frac{1}{2}}{k}$.
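A brute-force check of this claim (my own code): assemble the 2k-1 linear equations of Eq. (59) and solve them numerically; the unique solution matches r_i = i/k and theta_j = (j - 1/2)/k.

import numpy as np

k = 4
n = 2 * k - 1                              # unknowns: theta_1..theta_k, r_1..r_{k-1}
A = np.zeros((n, n))
b = np.zeros(n)

for i in range(k):                         # theta_{i+1} = (r_i + r_{i+1})/2, with r_0 = 0, r_k = 1
    A[i, i] = 1.0
    if i > 0:
        A[i, k + i - 1] = -0.5             # -r_i / 2
    if i < k - 1:
        A[i, k + i] = -0.5                 # -r_{i+1} / 2
    else:
        b[i] = 0.5                         # r_k = 1 contributes 1/2
for i in range(k - 1):                     # r_{i+1} = (theta_{i+1} + theta_{i+2})/2
    A[k + i, k + i] = 1.0
    A[k + i, i] = -0.5
    A[k + i, i + 1] = -0.5

sol = np.linalg.solve(A, b)
print(np.round(sol[:k], 4))                # theta_j = (j - 1/2)/k -> [0.125 0.375 0.625 0.875]
print(np.round(sol[k:], 4))                # r_i = i/k             -> [0.25  0.5   0.75 ]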

51 / 102
The "best" mapping function $Q(\cdot)$ replaces a random variable $X_w$
with another, defined by
$$\theta_w \equiv Q(X_w) \in \{\theta_1, \theta_2, \ldots, \theta_k\}, \qquad (60)$$

that takes values in a finite set of representation levels. The error
random variable is
$$E_w \equiv X_w - \theta_w, \qquad (61)$$

and we have that
$$E(E_w^2) = E_{SE}(opt) = \int x^2 p(x)\,dx - \sum_{i=1}^{k} \theta_i^2 \Pr\{\theta_w = \theta_i\} \equiv E(X_w^2) - E(\theta_w^2). \qquad (62)$$

52 / 102
Hence,
$$E(E_w^2) \equiv E\big((X_w - \theta_w)^2\big) = E(X_w^2 - 2 X_w \theta_w + \theta_w^2) = E(X_w^2) - E(\theta_w^2), \qquad (63)$$

yielding
$$E(X_w^2) - 2E(X_w \theta_w) + E(\theta_w^2) = E(X_w^2) - E(\theta_w^2), \qquad (64)$$

or
$$2E(X_w \theta_w) - 2E(\theta_w^2) = 0, \qquad (65)$$

or
$$E\big((X_w - \theta_w)\theta_w\big) = 0. \qquad (66)$$

53 / 102
This shows that while $\theta_w$ is clearly dependent on $X_w$ (in fact,
$\theta_w \equiv Q(X_w)$), the error $E_w$ is "orthogonal" to $\theta_w$ in
the sense that $E(E_w \theta_w) = 0$.
We just proved that $\theta_w$ is the orthogonal "projection" of the
random variable $X_w$ on the space of random variables defined by
$Q_{r_0, r_1, \ldots, r_{k-1}, r_k, \theta_1, \theta_2, \ldots, \theta_k}(x \mid x \text{ drawn according to } X_w)$.

Fig. 2.25 - The error $E_w$ is orthogonal to $\theta_w$.
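A numerical illustration of this orthogonality (my own code; the decision levels are an arbitrary choice, and the levels are the cell means estimated from the samples, so the identities hold up to floating-point round-off):

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200_000)                       # realizations of X_w

r = np.array([-1.0, 0.0, 1.0])                     # some decision levels (k = 4 cells)
cells = np.searchsorted(r, x)
theta = np.array([x[cells == j].mean() for j in range(4)])  # cell means, Eq. (40)

q = theta[cells]                                   # theta_w = Q(X_w)
e = x - q                                          # error E_w
print(np.mean(e * q))                              # ~ 0: E(E_w * theta_w) = 0
print(np.mean(e**2), np.mean(x**2) - np.mean(q**2))  # equal, cf. Eqs. (62) and (63)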

54 / 102
Approximately optimal high-k quantization
Suppose that the number of quantization levels is large, the
range of $X_w$ is finite, $[x_L, x_H]$, and we are given $p(x)$ over
$[x_L, x_H]$. If $k$ is large, we have small intervals defined between the
decision levels $r_0 = x_L, r_1, r_2, \ldots, r_{k-1}, r_k = x_H$. Then, the expected
squared error for $Q(\cdot)$ designed with these levels and the
representation levels $\theta_1, \theta_2, \ldots, \theta_k$ is given by
$$E_{SE}(Q) \equiv \int_{x_L}^{x_H} (x - Q(x))^2 p(x)\,dx = \sum_{i=1}^{k} \int_{r_{i-1}}^{r_i} (x - \theta_i)^2 p(x)\,dx$$
$$\approx \sum_{i=1}^{k} \int_{r_{i-1}}^{r_i} (x - \theta_i)^2\, p\!\left( \frac{r_{i-1}+r_i}{2} \right) dx
= \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \left. \frac{(x - \theta_i)^3}{3} \right|_{r_{i-1}}^{r_i}$$
$$= \frac{1}{3} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \Big( (r_i - \theta_i)^3 - (r_{i-1} - \theta_i)^3 \Big). \qquad (67)$$
55 / 102
In this case we can determine the optimal $\theta_i$'s and $r_i$'s as follows,
$$\frac{\partial E_{SE}}{\partial \theta_i} = 0 \;\;\Rightarrow\;\; \frac{1}{3}\, p\!\left( \frac{r_{i-1}+r_i}{2} \right) \Big( -3(r_i - \theta_i)^2 + 3(r_{i-1} - \theta_i)^2 \Big)
= p\!\left( \frac{r_{i-1}+r_i}{2} \right) (r_{i-1} + r_i - 2\theta_i)(r_{i-1} - r_i) = 0.$$

This yields
$$\theta_i^{*} \equiv \theta_i^{opt} = \frac{r_{i-1} + r_i}{2}. \qquad (68)$$
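A quick numerical sanity check of Eq. (68) (my own code): for a smooth density and a narrow cell, the exact minimizer of the cell error, i.e. the conditional mean of Eq. (49), is very close to the cell midpoint.

import numpy as np

p = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # a smooth density (standard normal)
a, b = 0.50, 0.55                                      # a narrow cell [r_{i-1}, r_i]

xs = np.linspace(a, b, 100_001)
w = p(xs)
cond_mean = (xs * w).sum() / w.sum()                   # discretized conditional mean (Eq. 49)
print(cond_mean, (a + b) / 2)                          # nearly identical to the midpoint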

56 / 102
Next, setting $\theta_i^{*}$ in the expression for the expected squared error
yields
$$E_{SE}(\theta_1^{*}, \theta_2^{*}, \ldots, \theta_k^{*})
= \frac{1}{3} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \left( \Big( r_i - \frac{r_{i-1}+r_i}{2} \Big)^3 - \Big( r_{i-1} - \frac{r_{i-1}+r_i}{2} \Big)^3 \right)$$
$$= \frac{1}{3} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \left( \Big( \frac{r_i - r_{i-1}}{2} \Big)^3 - \Big( \frac{r_{i-1} - r_i}{2} \Big)^3 \right)$$
$$= \frac{1}{3} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \frac{2(r_i - r_{i-1})^3}{8}$$
$$= \frac{1}{12} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) (r_i - r_{i-1})^3, \qquad (69)$$

where $r_0 \equiv x_L$ and $r_k = x_H$.

57 / 102
It yields
$$E_{SE}(r_0 = x_L, r_1, r_2, \ldots, r_{k-1}, r_k = x_H)
= \frac{1}{12} \sum_{i=1}^{k} \left( p^{1/3}\!\left( \frac{r_{i-1}+r_i}{2} \right) (r_i - r_{i-1}) \right)^3
= \frac{1}{12} \sum_{i=1}^{k} \mu_i^3, \qquad (70)$$

where
$$\mu_i \equiv p^{1/3}\!\left( \frac{r_{i-1}+r_i}{2} \right) (r_i - r_{i-1}). \qquad (71)$$

Note that we have
$$\sum_{i=1}^{k} \mu_i = \sum_{i=1}^{k} p^{1/3}\!\left( \frac{r_{i-1}+r_i}{2} \right) (r_i - r_{i-1})
\approx \int_{x_L}^{x_H} (p(x))^{1/3}\,dx = \text{a constant}. \qquad (72)$$

58 / 102
We are led to determine the decision levels $\{r_i\}$ via the $\mu_i$ by
minimizing
$$E_{SE} = \frac{1}{12} \sum_{i=1}^{k} \mu_i^3 \quad \text{subject to} \quad \sum_{i=1}^{k} \mu_i = \text{a constant, say } M. \qquad (73)$$
Using Lagrange multipliers, we obtain
$$\arg\min_{\{\mu_i\}} \; \frac{1}{12} \sum_{i=1}^{k} \mu_i^3 + \lambda \left( M - \sum_{i=1}^{k} \mu_i \right), \qquad (74)$$
which is solved by
$$\frac{\partial}{\partial \mu_i} \left( \frac{1}{12} \sum_{i=1}^{k} \mu_i^3 + \lambda \left( M - \sum_{i=1}^{k} \mu_i \right) \right) = 0, \qquad (75)$$
yielding
$$\frac{1}{4} \mu_i^2 - \lambda = 0 \;\;\Rightarrow\;\; \mu_i = 2\sqrt{\lambda}. \qquad (76)$$
59 / 102
To satisfy the constraint for all $i$'s we need
$$\sum_{i=1}^{k} \mu_i = M \quad \text{or} \quad 2\sqrt{\lambda}\,k = M, \qquad (77)$$

which yields $2\sqrt{\lambda} = M/k$.
From here the optimal $\mu_i$'s are
$$\mu_i^{*} = M/k = \text{constant}. \qquad (78)$$

We therefore obtained the optimality conditions for the decision
levels as
$$p^{1/3}\!\left( \frac{r_{i-1}+r_i}{2} \right) (r_i - r_{i-1}) = \frac{\int_{x_L}^{x_H} p^{1/3}(x)\,dx}{k}. \qquad (79)$$

60 / 102
Define a function
$$\Lambda(r) \equiv \int_{x_L}^{r} p^{1/3}(x)\,dx, \qquad (80)$$

and consider the $r_i$'s for which
$$\Lambda(r_i) = \frac{i}{k} \int_{x_L}^{x_H} p^{1/3}(x)\,dx = \frac{i}{k}\,\Lambda(x_H). \qquad (81)$$

We clearly have
$$\Lambda(r_{i-1}) = \int_{x_L}^{r_{i-1}} p^{1/3}(x)\,dx = \frac{i-1}{k} \int_{x_L}^{x_H} p^{1/3}(x)\,dx \qquad (82)$$

and
$$\Lambda(r_i) = \int_{x_L}^{r_i} p^{1/3}(x)\,dx = \frac{i}{k} \int_{x_L}^{x_H} p^{1/3}(x)\,dx. \qquad (83)$$

61 / 102
Therefore,
$$\Lambda(r_i) - \Lambda(r_{i-1}) = \frac{1}{k} \int_{x_L}^{x_H} p^{1/3}(x)\,dx. \qquad (84)$$
But
$$\Lambda(r_i) - \Lambda(r_{i-1}) \equiv \int_{r_{i-1}}^{r_i} p^{1/3}(x)\,dx \approx p^{1/3}\!\left( \frac{r_{i-1}+r_i}{2} \right) (r_i - r_{i-1}). \qquad (85)$$
We see that the decision levels $r_1, \ldots, r_{k-1}$ are the places where
the function $\Lambda(r)$ crosses the levels $iM/k$, or $\frac{i}{k} \int_{x_L}^{x_H} p^{1/3}(x)\,dx$, for
$i = 1, \ldots, k-1$.

Fig. 2.33 - $\Lambda(r)$ with $M$ and $iM/k$ corresponding to $r_i$.
62 / 102


The "approximately optimal" quantizer is

$$X_w = x \;\longrightarrow\; \Lambda(\cdot) \;\longrightarrow\; v = \Lambda(x) \in [0, M] \;\longrightarrow\; \text{uniform } k\text{-level } Q(\cdot) \;\longrightarrow\; i,$$

and for output index $i$ (mapping $[0, M]$ to the numbers $\{1, \ldots, k\}$) the
representation value is $\theta_i^{*} = \dfrac{r_i + r_{i-1}}{2}$.

This quantizer achieves the (approximately) optimal mean squared
error
$$E_{SE}(opt) = \frac{1}{12} \sum_{i=1}^{k} (\mu_i^{*})^3 = \frac{1}{12}\, k\, \frac{M^3}{k^3} = \frac{1}{12}\, \frac{M^3}{k^2}
= \frac{1}{12}\, \frac{\left( \int_{x_L}^{x_H} p^{1/3}(x)\,dx \right)^3}{k^2} = \frac{\text{constant}}{k^2}. \qquad (86)$$
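A sketch of this design recipe in code (my own illustration; the density, the grid, and the inversion of Λ by interpolation are my choices): build Λ(r), place the decision levels where Λ crosses iM/k, take the cell midpoints as representation levels, and compare the resulting error with M³/(12k²).

import numpy as np

x_L, x_H, k = 0.0, 1.0, 16
xs, dx = np.linspace(x_L, x_H, 200_001, retstep=True)
p = np.exp(-4.0 * xs)
p /= p.sum() * dx                                  # numerically normalized density on [x_L, x_H]

Lam = np.cumsum(p**(1.0 / 3.0)) * dx               # Lambda(r) = int_{x_L}^{r} p^{1/3}(x) dx
M = Lam[-1]
r = np.interp(np.arange(1, k) * M / k, Lam, xs)    # Lambda(r_i) = i M / k
edges = np.concatenate(([x_L], r, [x_H]))
theta = 0.5 * (edges[:-1] + edges[1:])             # theta_i* = cell midpoints, Eq. (68)

cells = np.searchsorted(r, xs)
ese = np.sum((xs - theta[cells])**2 * p) * dx      # numerical E_SE of this quantizer
print(ese, M**3 / (12 * k**2))                     # the two values should nearly agree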
12 k 2 k2

63 / 102
Suppose that the number of quantization levels is high, the range
of values for a random variable $X_w$ is $[x_L, x_H]$, and we have
$p(x)$ over the range.
Assume that we simply divide the range $[x_L, x_H]$ into $k$ intervals
given by
$$\Delta_i = \left[ x_L + (i-1)\frac{x_H - x_L}{k},\; x_L + i\,\frac{x_H - x_L}{k} \right] \quad \text{for } i = 1, 2, \ldots, k. \qquad (87)$$
It is clear that with this division the least mean squared error will
be
$$E_{SE}^{u}(Q^u) = \int_{x_L}^{x_H} (x - Q^u(x))^2 p(x)\,dx
\approx \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \int_{r_{i-1}}^{r_i} (x - \theta_i)^2\,dx$$
$$= \frac{1}{3} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \Big( (r_i - \theta_i)^3 - (r_{i-1} - \theta_i)^3 \Big), \qquad (88)$$

for which the best $\theta_i$'s are the interval midpoints, as before,
$$\theta_i^{opt} = x_L + (i-1)\frac{x_H - x_L}{k} + \frac{1}{2}\,\frac{x_H - x_L}{k}. \qquad (89)$$
64 / 102
Setting these values for $\theta_i^{opt}$ in the expression for the expected
squared error,
$$E_{SE}^{u} \equiv E\big( (X_w - Q^u(X_w))^2 \big)
= \frac{1}{12} \sum_{i=1}^{k} p(\text{midpoint of } \Delta_i)\, \frac{(x_H - x_L)^3}{k^3}$$
$$= \frac{1}{12}\, \frac{(x_H - x_L)^2}{k^2} \sum_{i=1}^{k} p(\text{midpoint of } \Delta_i)\, |\Delta_i|
\approx \frac{1}{12}\, \frac{(x_H - x_L)^2}{k^2} \int_{x_L}^{x_H} p(x)\,dx
= \frac{1}{12}\, \frac{(x_H - x_L)^2}{k^2}. \qquad (90)$$
Note that by optimizing the decision levels with respect to a
nonuniform quantizer adapted to $p(x)$ we obtained (for high $k$)
$$E_{SE} = \frac{1}{12}\, \frac{1}{k^2} \left( \int_{x_L}^{x_H} p^{1/3}(x)\,dx \right)^3. \qquad (91)$$
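A numerical comparison of Eq. (90) and Eq. (91) for a concrete non-uniform density (my own code; the density p(x) = 2x on [0, 1] is an arbitrary choice):

import numpy as np

x_L, x_H, k = 0.0, 1.0, 32
xs, dx = np.linspace(x_L, x_H, 200_001, retstep=True)
p = 2.0 * xs                                                   # p(x) = 2x, a valid pdf on [0, 1]

uniform_ese = (x_H - x_L)**2 / (12 * k**2)                     # Eq. (90)
adapted_ese = (np.sum(p**(1.0 / 3.0)) * dx)**3 / (12 * k**2)   # Eq. (91)
print(adapted_ese, "<=", uniform_ese)                          # the adapted quantizer is never worse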

65 / 102
This result raises the question: are we indeed doing better with the
non-uniform "adapted" quantizer? That is, do we have
$$E_{SE}(Q^{opt}) \le E_{SE}^{u}(Q^{uniform})? \qquad (92)$$

In other words, do we have
$$\frac{1}{12}\,\frac{1}{k^2} \left( \int_{x_L}^{x_H} p^{1/3}(x)\,dx \right)^3 \le \frac{1}{12}\,\frac{(x_H - x_L)^2}{k^2}? \qquad (93)$$

The answer is that indeed we do better, due to Hölder's
inequality, by which, for two functions $f, g$, we have
$$\int_S |fg|\,dx \le \left( \int_S |f|^p \right)^{1/p} \left( \int_S |g|^q \right)^{1/q}, \qquad (94)$$

for $p, q \in [1, \infty)$ satisfying $\frac{1}{p} + \frac{1}{q} = 1$, that is, $\frac{1}{q} = 1 - \frac{1}{p} = \frac{p-1}{p}$,
or $q = p/(p-1)$.
By Hölder's inequality $\|fg\|_1 \le \|f\|_p \|g\|_q$ for $\frac{1}{p} + \frac{1}{q} = 1$. The proof is based on Young's inequality $ab \le \frac{a^p}{p} + \frac{b^q}{q}$.

66 / 102
We take $f = p^{1/3}(x)$ and $g = 1$ for $x \in [x_L, x_H]$, and obtain
$$\int_{x_L}^{x_H} p^{1/3}(x) \cdot 1\,dx \le \left( \int_{x_L}^{x_H} |p^{1/3}(x)|^3\,dx \right)^{1/3} \left( \int_{x_L}^{x_H} 1^{3/2}\,dx \right)^{2/3}, \qquad (95)$$

that yields
$$\left( \int_{x_L}^{x_H} p^{1/3}(x)\,dx \right)^3 \le \overbrace{\left( \int_{x_L}^{x_H} p(x)\,dx \right)}^{1} (x_H - x_L)^2 \qquad (96)$$

or
$$\left( \int_{x_L}^{x_H} p^{1/3}(x)\,dx \right)^3 \le (x_H - x_L)^2, \qquad (97)$$

as expected.

67 / 102
Equality happens only if there are real numbers $\alpha, \beta > 0$ such that
$\alpha |f|^p = \beta |g|^q$ almost everywhere, with
$$\alpha \equiv \int_{x_L}^{x_H} |g|^q\,dx = (x_H - x_L) \quad (q = 3/2),$$
$$\beta \equiv \int_{x_L}^{x_H} |f|^p\,dx = \int_{x_L}^{x_H} p(x)\,dx = 1 \quad (p = 3), \qquad (98)$$

so that
$$(x_H - x_L)\,p(x) = 1 \quad \text{almost everywhere}, \qquad (99)$$

that is, $p(x) = \frac{1}{x_H - x_L}$ is a uniform distribution.

In other words - we are always doing better than a uniform
quantizer, except for the case when $p(x)$ is uniform, for which we
know that the uniform quantizer is the best!

68 / 102
More of the same

Next, in light gray, is a repetition of the last part of the lecture.


Just to make things a little bit unclear ;-)

69 / 102
Quantization
Our quantization function replaces one random variable with
another, $\theta_\omega \equiv Q(X_\omega) \in \{\theta_1, \theta_2, \ldots, \theta_k\}$. The error is given by
$E_\omega \equiv X_\omega - \theta_\omega$, such that
$$E(E_\omega^2) = E_{SE}(\text{optimal}) = \int x^2 p(x)\,dx - \sum_{i=1}^{k} \theta_i^2 \Pr\{\theta_\omega = \theta_i\} \equiv E(X_\omega^2) - E(\theta_\omega^2), \qquad (100)$$

yielding
$$E(X_\omega^2) - 2E(X_\omega \theta_\omega) + E(\theta_\omega^2) = E(X_\omega^2) - E(\theta_\omega^2) \qquad (101)$$

or
$$2E(X_\omega \theta_\omega) - 2E(\theta_\omega^2) = 0, \qquad (102)$$

70 / 102
i.e.,
$$E\{(X_\omega - \theta_\omega)\theta_\omega\} = 0, \qquad E(E_\omega \theta_\omega) = 0. \qquad (103)$$

In other words $E_\omega \perp \theta_\omega$.

71 / 102
Approximately optimal quantization (large k)
We divide the interval $[x_L, x_H]$ into intervals s.t.
$r_0 = x_L, r_1, \ldots, r_k = x_H$ that correspond to representation levels
$\theta_1, \ldots, \theta_k$. Then,
$$E_{SE} = \int_{x_L}^{x_H} (x - Q(x))^2 p(x)\,dx = \sum_{i=1}^{k} \int_{r_{i-1}}^{r_i} (x - Q(x))^2 p(x)\,dx$$
$$\approx \sum_{i=1}^{k} \int_{r_{i-1}}^{r_i} (x - Q(x))^2\, p\!\left( \frac{r_{i-1}+r_i}{2} \right) dx
= \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \left. \frac{(x - \theta_i)^3}{3} \right|_{r_{i-1}}^{r_i}$$
$$= \frac{1}{3} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \Big( (r_i - \theta_i)^3 - (r_{i-1} - \theta_i)^3 \Big).$$

72 / 102
Setting $\frac{d E_{SE}}{d \theta_i} = 0$, we solve
$$\frac{1}{3}\, p\!\left( \frac{r_{i-1}+r_i}{2} \right) \times \Big( -3(r_i - \theta_i)^2 + 3(r_{i-1} - \theta_i)^2 \Big)
= p\!\left( \frac{r_{i-1}+r_i}{2} \right) (r_{i-1} + r_i - 2\theta_i)(r_{i-1} - r_i) = 0.$$
That yields (as before) $\theta_i^{*} = \frac{r_{i-1}+r_i}{2}$. In which case
$$E_{SE}(\theta_1^{*}, \theta_2^{*}, \ldots, \theta_k^{*})
= \frac{1}{3} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \left( \Big( r_i - \frac{r_{i-1}+r_i}{2} \Big)^3 - \Big( r_{i-1} - \frac{r_{i-1}+r_i}{2} \Big)^3 \right)$$
$$= \frac{1}{3} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \left( \Big( \frac{r_i - r_{i-1}}{2} \Big)^3 - \Big( \frac{r_{i-1} - r_i}{2} \Big)^3 \right)$$
$$= \frac{1}{3} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) \frac{2(r_i - r_{i-1})^3}{8}$$
$$= \frac{1}{12} \sum_{i=1}^{k} p\!\left( \frac{r_{i-1}+r_i}{2} \right) (r_i - r_{i-1})^3
= \frac{1}{12} \sum_{i=1}^{k} \left( p^{1/3}\!\left( \frac{r_{i-1}+r_i}{2} \right) (r_i - r_{i-1}) \right)^3
= \frac{1}{12} \sum_{i=1}^{k} \mu_i^3$$
73 / 102
with $r_0 \equiv x_L$ and $r_k \equiv x_H$, and $\mu_i \equiv p^{1/3}\!\left( \frac{r_{i-1}+r_i}{2} \right)(r_i - r_{i-1})$.
But
$$\sum_{i=1}^{k} \mu_i = \sum_{i=1}^{k} p^{1/3}\!\left( \frac{r_{i-1}+r_i}{2} \right)(r_i - r_{i-1})
\approx \int_{x_L}^{x_H} p(x)^{1/3}\,dx = M, \;\text{ a computable constant}.$$

We could compute the optimal $\{r_i\}$'s by minimizing
$$E_{SE} = \frac{1}{12} \sum_{i=1}^{k} \mu_i^3 \quad \text{s.t.} \quad \sum_{i=1}^{k} \mu_i = M. \qquad (104)$$

Using Lagrange multipliers,
$$\arg\min_{\{\mu_i\}} \; \frac{1}{12} \sum_{i=1}^{k} \mu_i^3 + \lambda \left( M - \sum_{i=1}^{k} \mu_i \right), \qquad (105)$$

74 / 102
whose derivative
$$\frac{d}{d\mu_i} \left( \frac{1}{12} \sum_{i=1}^{k} \mu_i^3 + \lambda \left( M - \sum_{i=1}^{k} \mu_i \right) \right) = 0$$
leads to $\frac{1}{4}\mu_i^2 - \lambda = 0$, thus $\mu_i = 2\sqrt{\lambda}, \;\forall i$.
The constraint is
$$\sum_{i=1}^{k} \mu_i = M \;\Rightarrow\; 2\sqrt{\lambda}\,k = M \;\Rightarrow\; 2\sqrt{\lambda} = M/k.$$

Thus the optimal $\mu_i$ are $\mu_i^{*} = M/k$, the same constant for all $i$.
The optimality condition for the decision levels is
$$p^{1/3}\!\left( \frac{r_{i-1}+r_i}{2} \right)(r_i - r_{i-1}) = \frac{\int_{x_L}^{x_H} (p(x))^{1/3}\,dx}{k}. \qquad (106)$$

75 / 102
Define the function $\Lambda(r) = \int_{x_L}^{r} p^{1/3}(x)\,dx$ and find the $r_i$'s s.t.
$$\Lambda(r_i) = \frac{i}{k} \int_{x_L}^{x_H} p^{1/3}(x)\,dx = \frac{i}{k}\,\Lambda(x_H).$$
Clearly, $\Lambda(r_{i-1}) \equiv \int_{x_L}^{r_{i-1}} p^{1/3}(x)\,dx = \frac{i-1}{k} \int_{x_L}^{x_H} p^{1/3}(x)\,dx$, and
$\Lambda(r_i) \equiv \int_{x_L}^{r_i} p^{1/3}(x)\,dx = \frac{i}{k} \int_{x_L}^{x_H} p^{1/3}(x)\,dx$.
Thus,
$$\Lambda(r_i) - \Lambda(r_{i-1}) = \frac{1}{k} \int_{x_L}^{x_H} p^{1/3}(x)\,dx,$$
but
$$\Lambda(r_i) - \Lambda(r_{i-1}) \equiv \int_{r_{i-1}}^{r_i} p^{1/3}(x)\,dx \approx p^{1/3}\!\left( \frac{r_i + r_{i-1}}{2} \right)(r_i - r_{i-1}).$$

76 / 102
The decision levels $\{r_1, \ldots, r_{k-1}\}$ are those at which $\Lambda(r)$ crosses
the values $iM/k$, or $\frac{i}{k} \int_{x_L}^{x_H} p^{1/3}(x)\,dx$, for $i = 1, \ldots, k-1$.
Graph of $\Lambda(r)$ with uniform quantization along the y axis.

Graph with optimal quantizer: $X_\omega = x$ as input, $v = \Lambda(x)$, and
uniform quantization of $v$ by $Q(v)$ into $k$ levels. Then,
$\theta_i^{*} = \frac{r_i + r_{i-1}}{2}$.

77 / 102
$$E_{SE}(\text{optimal}) \equiv \frac{1}{12} \sum_{i=1}^{k} (\mu_i^{*})^3 = \frac{1}{12}\, k \left( \frac{M}{k} \right)^3 = \frac{1}{12}\, \frac{M^3}{k^2}
= \frac{1}{12}\, \frac{\left( \int_{x_L}^{x_H} p^{1/3}(x)\,dx \right)^3}{k^2},$$
where, given $p(x)$, $M^3$ is a constant.

78 / 102
Uniform Quantization for Large k
Same conditions as before, but now assume we uniformly divide
$[x_L, x_H]$ into the intervals
$$\Delta_i = \left[ x_L + (i-1)\frac{x_H - x_L}{k},\; x_L + i\,\frac{x_H - x_L}{k} \right], \quad i = 1, \ldots, k.$$
Then
$$E_{SE}^{u}(Q^u) = \int_{x_L}^{x_H} (x - Q^u(x))^2 p(x)\,dx
\approx \sum_{i=1}^{k} p_x\!\left( \frac{r_{i-1}+r_i}{2} \right) \int_{r_{i-1}}^{r_i} (x - \theta_i)^2\,dx$$
$$= \frac{1}{3} \sum_{i=1}^{k} p_x\!\left( \frac{r_{i-1}+r_i}{2} \right) \Big( (r_i - \theta_i)^3 - (r_{i-1} - \theta_i)^3 \Big),$$

and as before $\theta_i^{opt} = x_L + (i-1)\frac{x_H - x_L}{k} + \frac{1}{2}\,\frac{x_H - x_L}{k}$.

79 / 102
The estimated $E_{SE}$ for that $\theta^{opt}$ is
$$E_{SE}^{u} \equiv E\big( (X_\omega - Q^u(X_\omega))^2 \big)
\approx \frac{1}{12} \sum_{i=1}^{k} p_x(\text{midpoint of } \Delta_i)\, \frac{(x_H - x_L)^3}{k^3}$$
$$= \frac{1}{12}\, \frac{(x_H - x_L)^2}{k^2} \sum_{i=1}^{k} p_x(\text{midpoint of } \Delta_i)\, |\Delta_i|
\approx \frac{1}{12}\, \frac{(x_H - x_L)^2}{k^2} \int_{x_L}^{x_H} p_x(x)\,dx = \frac{1}{12}\, \frac{(x_H - x_L)^2}{k^2}.$$

With the previous nonuniform quantizer we obtained
$$E_{SE}(Q^{opt}) = \frac{1}{12}\, \frac{\left( \int_{x_L}^{x_H} p(x)^{1/3}\,dx \right)^3}{k^2}.$$
Can we prove that $E_{SE}(Q^{opt}) \le E_{SE}^{u}(Q^u)$?
We need to prove that
$$\frac{1}{12}\, \frac{\left( \int_{x_L}^{x_H} p(x)^{1/3}\,dx \right)^3}{k^2} \le \frac{1}{12}\, \frac{(x_H - x_L)^2}{k^2}.$$
80 / 102
Hölder's Inequality

Proven by Rogers in 1888 and a year later by Hölder (based on
Young's inequality). For two functions $f(x)$ and $g(x)$, we have
$$\int_S |f(x) g(x)|\,dx \le \left( \int_S |f(x)|^p\,dx \right)^{1/p} \left( \int_S |g(x)|^q\,dx \right)^{1/q}$$

for $p, q \in [1, \infty)$ and $\frac{1}{p} + \frac{1}{q} = 1$, or $q = \frac{p}{p-1}$.

Choose $f(x) = p^{1/3}(x)$ and $g(x) = 1$, defined for $x \in [x_L, x_H]$, and
obtain
$$\int_{x_L}^{x_H} p(x)^{1/3}\,dx \le \left( \int_{x_L}^{x_H} |p^{1/3}(x)|^3\,dx \right)^{1/3} \left( \int_{x_L}^{x_H} 1^{3/2}\,dx \right)^{2/3}.$$

81 / 102
We proved that $\left( \int_{x_L}^{x_H} p(x)^{1/3}\,dx \right)^3 \le (x_H - x_L)^2$, as expected.
Equality is obtained iff $p(x) = 1/(x_H - x_L)$ almost everywhere, i.e.,
for the uniform distribution, for which uniform quantization is the best,
since then
$$\left( \int_{x_L}^{x_H} p(x)^{1/3}\,dx \right)^3 = \left( \int_{x_L}^{x_H} \left( \frac{1}{x_H - x_L} \right)^{1/3} dx \right)^3 = (x_H - x_L)^2.$$

82 / 102
