Signal Processing, Representation, Modeling, and Analysis: Alfred M. Bruckstein & Ron Kimmel
Winter 2019-2020
Chapter 2
Representations with choices
Representative numbers
How to find more than one “representative” number for a group of objects, and how to use such sets of representatives.
Consider the list of ordered numbers
$$x_1 \le x_2 \le \cdots \le x_N, \qquad x_i \in \mathbb{R}. \qquad (37)$$
$$\begin{array}{ccccccccccc}
x_1 & x_2 & \ldots & x_{r_1} & x_{r_1+1} & \ldots & x_{r_2} & \ldots & x_{r_{k-1}+1} & \ldots & x_N\\
\updownarrow & \updownarrow & & \updownarrow & \updownarrow & & \updownarrow & & \updownarrow & & \updownarrow\\
\theta_1 & \theta_1 & \ldots & \theta_1 & \theta_2 & \ldots & \theta_2 & \ldots & \theta_k & \ldots & \theta_k
\end{array} \qquad (38)$$
Note that we mapped the nondecreasing sequence of $x_i$'s to a strictly increasing sequence of $\theta_r$'s by selecting $r_1, r_2, \ldots, r_{k-1}$ breakpoints.
To determine $\theta_1, \theta_2, \ldots, \theta_k$ we need the $r_j$'s, while in order to determine the $r_j$'s we need the data and the optimization criterion.
Consider the MSE loss function. Then, for a sequence $\{r_j\}$ and numbers $\{\theta_i\}$ we can write
$$\begin{aligned}
\mathrm{MSE}(\{r_j\},\{\theta_i\}) &= \frac{1}{N}\left(\sum_{i=1}^{r_1}(x_i-\theta_1)^2 + \cdots + \sum_{i=r_l+1}^{r_{l+1}}(x_i-\theta_{l+1})^2 + \cdots + \sum_{i=r_{k-1}+1}^{N}(x_i-\theta_k)^2\right)\\
&= \frac{1}{N}\sum_{l=0}^{k-1}\,\sum_{i=r_l+1}^{r_{l+1}}(x_i-\theta_{l+1})^2, \qquad (39)
\end{aligned}$$
where $r_0 \equiv 0$ and $r_k = N$.
Given the data $x_1, x_2, \ldots, x_N$ we want to determine $\{r_j\}$ and numbers $\{\theta_i\}$ to minimize the MSE loss function.
Fixing $r_l$ and $r_{l+1}$, we have that
$$\theta_{l+1}^{\mathrm{opt}} = \frac{1}{r_{l+1}-r_l}\sum_{i=r_l+1}^{r_{l+1}} x_i, \qquad (40)$$
the average of the data points in that segment.
It is difficult to optimize for the breakpoints as integers. Hence,
consider the real line with numbers x1 , x2 , . . . , xN as the range in
which a random variable Xw takes values, and assume its
realizations are xi with probability 1/N for each i.
The corresponding probability density function is
$$p(x) = \frac{1}{N}\sum_{i=1}^{N}\delta(x - x_i),$$
where $\delta(\cdot)$ is the Dirac delta,
$$\delta(x) = 0 \;\;\forall x \neq 0, \qquad \int_{-\infty}^{+\infty}\delta(x)\,dx = 1.$$
We clearly have
$$E\{X_w\} = \int_{-\infty}^{+\infty} x\,p(x)\,dx = \sum_{i=1}^{N}\frac{1}{N}\,x_i = \frac{1}{N}\sum_{i=1}^{N}x_i. \qquad (41)$$
We restate the problem of representing $x_1, x_2, \ldots, x_N$ as one of representing the random variable $X_w$ with $k$ values $\theta_1, \theta_2, \ldots, \theta_k$ “optimally” - by a proper choice of $r_1, r_2, \ldots, r_{k-1}$ and $\theta_1, \theta_2, \ldots, \theta_k$.
The optimality criterion will be the (expected w.r.t. the distribution $p(x)$) mean squared error,
$$E_{SE} = \int_{-\infty}^{+\infty}(x - Q(x))^2\,p(x)\,dx = \sum_{i=1}^{k}\int_{r_{i-1}}^{r_i}(x - \theta_i)^2\,p(x)\,dx. \qquad (43)$$
Note that if we fix the values $r_1, r_2, \ldots, r_{k-1}$, then the optimal choices for $\theta_1, \theta_2, \ldots, \theta_k$ are completely determined, since clearly
$$\min_{\theta}\int_{r_{i-1}}^{r_i}(x-\theta)^2\,p(x)\,dx = \min_{\theta}\int_{r_{i-1}}^{r_i}(x^2 - 2\theta x + \theta^2)\,p(x)\,dx \qquad (44)$$
$$\Rightarrow\quad \theta_i^{\mathrm{opt}} = \frac{1}{\int_{r_{i-1}}^{r_i}p(x)\,dx}\int_{r_{i-1}}^{r_i}x\,p(x)\,dx
= \frac{1}{(\text{number of } x_j \text{ falling in } (r_{i-1}, r_i))\cdot\frac{1}{N}}\cdot\frac{1}{N}\sum_{r_{i-1}<x_j<r_i}x_j,$$
i.e. the average of the $x_j$'s falling in $(r_{i-1}, r_i)$.
Important observation.
If we set the representation values $\theta_1 < \theta_2 < \cdots < \theta_k$, and we have a value $x \in \mathbb{R}$ as a realization of $X_w$, then to minimize the expected squared error we must assign to $x$ the closest representation value; that is, we must ask which $\theta_i$ is closest in the $|x - \theta_i|$ sense.
Thus, the decision levels $r_1, r_2, \ldots, r_{k-1}$ are determined by the representation levels $\theta_1, \theta_2, \ldots, \theta_k$ to be
$$r_i = \frac{\theta_i + \theta_{i+1}}{2}, \qquad i = 1, 2, \ldots, k-1. \qquad (45)$$
These are the midpoints of the intervals between the representation levels, regardless of the distribution of the $x$'s!
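A small numerical illustration of this observation (our own sketch, assuming NumPy is available; the particular levels and sample are arbitrary): assigning each sample to its nearest representation level coincides with thresholding at the midpoints (45).

```python
import numpy as np

theta = np.array([0.1, 0.4, 0.9])        # representation levels (sorted)
r = (theta[:-1] + theta[1:]) / 2         # decision levels: midpoints, eq. (45)

x = np.random.rand(1000)                 # realizations of X_w

# assignment by nearest representation level ...
nearest = np.argmin(np.abs(x[:, None] - theta[None, :]), axis=1)
# ... coincides with thresholding at the midpoints r
by_threshold = np.searchsorted(r, x)
assert np.array_equal(nearest, by_threshold)
```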
Therefore, if we know the thresholds $r_1, \ldots, r_{k-1}$ we can determine the representation levels, and if we know the representation levels $\theta_1, \theta_2, \ldots, \theta_k$ we can determine the thresholds.
Yet, we have to determine both, jointly optimizing the function $Q(x)$ w.r.t. the criterion $E_{SE}$,
$$E_{SE}(r_1, \ldots, r_{k-1}, \theta_1, \ldots, \theta_k) = \sum_{i=1}^{k}\int_{r_{i-1}}^{r_i}(x - \theta_i)^2\,p(x)\,dx. \qquad (46)$$
Suppose we get the solution of the optimization problem. What
would the expected mean squared error be?
$$\begin{aligned}
E_{SE}(\mathrm{opt}) &= \sum_{i=1}^{k}\int_{r_{i-1}}^{r_i}(x^2 - 2\theta_i x + \theta_i^2)\,p(x)\,dx\\
&= \int_{r_0=-\infty}^{r_k=+\infty} x^2 p(x)\,dx - 2\sum_{i=1}^{k}\theta_i\int_{r_{i-1}}^{r_i}x\,p(x)\,dx + \sum_{i=1}^{k}\theta_i^2\int_{r_{i-1}}^{r_i}p(x)\,dx\\
&= \int_{-\infty}^{+\infty} x^2 p(x)\,dx - \sum_{i=1}^{k}\theta_i^2\int_{r_{i-1}}^{r_i}p(x)\,dx\\
&= \int_{-\infty}^{+\infty} x^2 p(x)\,dx - \sum_{i=1}^{k}\frac{1}{\int_{r_{i-1}}^{r_i}p(x)\,dx}\left(\int_{r_{i-1}}^{r_i}x\,p(x)\,dx\right)^2\\
&= \int_{-\infty}^{+\infty} x^2 p(x)\,dx - \sum_{i=1}^{k}\theta_i^2\,\Pr\{r_{i-1}\le x < r_i\}, \qquad (50)
\end{aligned}$$
where we used $\int_{r_{i-1}}^{r_i}x\,p(x)\,dx = \theta_i\int_{r_{i-1}}^{r_i}p(x)\,dx$.
How should we determine the optimal representation levels $\theta_1, \theta_2, \ldots, \theta_k$ and the corresponding decision levels $r_1, r_2, \ldots, r_{k-1}$?
An awful-wonderful idea.
- Start with “arbitrary” decision levels $\{r_1^0, r_2^0, \ldots, r_{k-1}^0\}$, say uniformly spaced between $x_1$ and $x_N$.
- Compute the corresponding optimal representation levels over the regions defined by $\{r_1^0, r_2^0, \ldots, r_{k-1}^0\}$, obtaining $\{\theta_1^0, \theta_2^0, \ldots, \theta_k^0\}$.
- Next, with these representation levels compute new decision levels
  $$r_i^1 = \frac{\theta_i^0 + \theta_{i+1}^0}{2}, \qquad i = 1, \ldots, k-1. \qquad (51)$$
- Proceed as before (iterate the above steps).
How can we prove this procedure reduces the error?
At each step we fix one set of variables (the decision levels or the representation levels) and optimize with respect to the other; hence no step can increase $E_{SE}$.
The process is called coordinate descent, which in this context is also known as the Lloyd-Max algorithm, and in its more general setting as the k-means clustering method.
The process decreases (or at least does not increase) the error at each step, as $E_{SE}(\text{step } t+1) \le E_{SE}(\text{step } t)$, and we can stop iterating when there is no (or little) improvement, assuming we have then reached a local minimum.
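A minimal sketch of this coordinate-descent iteration on a finite data set (our own illustration, assuming NumPy; the function and variable names are ours, not from the notes):

```python
import numpy as np

def lloyd_max_1d(x, k, num_iters=100, tol=1e-12):
    """Coordinate descent (Lloyd-Max / 1-D k-means) for a finite data set x."""
    x = np.sort(np.asarray(x, dtype=float))
    # arbitrary initial decision levels: uniformly spaced between x_1 and x_N
    r = np.linspace(x[0], x[-1], k + 1)[1:-1]
    theta = np.linspace(x[0], x[-1], k)           # initial representation levels
    for _ in range(num_iters):
        labels = np.searchsorted(r, x)            # region index of every sample
        for i in range(k):                        # optimal theta_i: region mean, eq. (40)
            if np.any(labels == i):
                theta[i] = x[labels == i].mean()  # (empty regions keep their level)
        r_new = 0.5 * (theta[:-1] + theta[1:])    # new decision levels: midpoints, eq. (51)
        if np.max(np.abs(r_new - r)) < tol:       # stop when there is no improvement
            break
        r = r_new
    mse = np.mean((x - theta[np.searchsorted(r, x)]) ** 2)
    return theta, r, mse

# usage: k = 4 representative numbers for 1000 samples
theta, r, mse = lloyd_max_1d(np.random.randn(1000), k=4)
```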
HW:
Assume that $X_w$ is a random variable uniformly distributed over $[x_L, x_H]$, or $[0,1]$ w.l.o.g. The pdf over $[0,1]$ is
$$p(x) = \begin{cases} 0 & x \notin [0,1]\\ 1 & x \in [0,1] \end{cases} \qquad (52)$$
Let us start the Lloyd-Max process with arbitrary decision levels $r_1, r_2, \ldots, r_{k-1} \in (0,1)$.
For the given pdf $p(x)$ we have
$$\theta_i = \frac{1}{r_i - r_{i-1}}\int_{r_{i-1}}^{r_i}x\,dx
= \frac{1}{r_i - r_{i-1}}\left.\frac{x^2}{2}\right|_{r_{i-1}}^{r_i}
= \frac{1}{2}\,\frac{r_i^2 - r_{i-1}^2}{r_i - r_{i-1}} = \frac{r_i + r_{i-1}}{2} \qquad (53)$$
and
$$r_i = \frac{\theta_i + \theta_{i+1}}{2}. \qquad (54)$$
Hence, given $\{r_1^0, r_2^0, \ldots, r_{k-1}^0\}$ with $r_0^0 = 0$, $r_k^0 = 1$, we get the levels of the next iteration; work out the rest of the expression at home (note that there are inconsistencies/inaccuracies in the handwritten notes!).
Note that for the specific example of the uniform distribution we could search for a solution of $2k-1$ equations with $2k-1$ unknowns, $\theta_1, \ldots, \theta_k, r_1, \ldots, r_{k-1}$, with fixed $r_0 = 0$ and $r_k = 1$,
$$\begin{aligned}
\theta_1 &= \tfrac{1}{2}r_0 + \tfrac{1}{2}r_1\\
\theta_2 &= \tfrac{1}{2}r_1 + \tfrac{1}{2}r_2\\
&\;\;\vdots\\
\theta_k &= \tfrac{1}{2}r_{k-1} + \tfrac{1}{2}r_k\\
r_1 &= \tfrac{1}{2}\theta_1 + \tfrac{1}{2}\theta_2\\
&\;\;\vdots\\
r_{k-1} &= \tfrac{1}{2}\theta_{k-1} + \tfrac{1}{2}\theta_k, \qquad (59)
\end{aligned}$$
which is solved uniquely by $r_i = i/k$ and $\theta_j = \dfrac{j - \tfrac{1}{2}}{k}$.
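As a quick check (a worked example added here, not part of the original notes), substitute the claimed solution into (59):
$$\theta_j = \tfrac{1}{2}r_{j-1} + \tfrac{1}{2}r_j = \tfrac{1}{2}\cdot\tfrac{j-1}{k} + \tfrac{1}{2}\cdot\tfrac{j}{k} = \tfrac{j-\frac{1}{2}}{k},
\qquad
r_i = \tfrac{1}{2}\theta_i + \tfrac{1}{2}\theta_{i+1} = \tfrac{1}{2}\cdot\tfrac{i-\frac{1}{2}}{k} + \tfrac{1}{2}\cdot\tfrac{i+\frac{1}{2}}{k} = \tfrac{i}{k},$$
so both families of equations hold; for $k = 2$, for instance, $r_1 = \tfrac{1}{2}$, $\theta_1 = \tfrac{1}{4}$, $\theta_2 = \tfrac{3}{4}$.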
The “best” mapping function $Q(\cdot)$ replaces the random variable $X_w$ with another, $\theta_w \equiv Q(X_w)$; the quantization error is the random variable
$$E_w \equiv X_w - \theta_w. \qquad (61)$$
Hence,
$$E(E_w^2) \equiv E\big((X_w - \theta_w)^2\big) = E\big(X_w^2 - 2X_w\theta_w + \theta_w^2\big) = E(X_w^2) - E(\theta_w^2), \qquad (63)$$
yielding
$$E(X_w^2) - 2E(X_w\theta_w) + E(\theta_w^2) = E(X_w^2) - E(\theta_w^2), \qquad (64)$$
or
$$2E(X_w\theta_w) - 2E(\theta_w^2) = 0, \qquad (65)$$
or $E\big((X_w - \theta_w)\theta_w\big) = E(E_w\theta_w) = 0$.
This shows that while $\theta_w$ is clearly dependent on $X_w$ (in fact, $\theta_w \equiv Q(X_w)$), the error $E_w$ is “orthogonal” to $\theta_w$ in the sense that $E(E_w\theta_w) = 0$.
We just proved that $\theta_w$ is the orthogonal “projection” of the random variable $X_w$ onto the space of random variables defined by $Q_{r_0, r_1, \ldots, r_{k-1}, r_k, \theta_1, \theta_2, \ldots, \theta_k}(x \mid x \text{ drawn according to } X_w)$.
Approximately optimal high-$k$ quantization
Suppose that the number of quantization levels is large, the range of $X_w$ is finite, $[x_L, x_H]$, and we are given $p(x)$ over $[x_L, x_H]$. If $k$ is large, the intervals defined between the decision levels $r_0 = x_L, r_1, r_2, \ldots, r_{k-1}, r_k = x_H$ are small. Then the expected squared error for $Q(\cdot)$ designed with these levels and the representation levels $\theta_1, \theta_2, \ldots, \theta_k$ is given by
$$\begin{aligned}
E_{SE}(Q) &\equiv \int_{x_L}^{x_H}(x - Q(x))^2\,p(x)\,dx\\
&= \sum_{i=1}^{k}\int_{r_{i-1}}^{r_i}(x - \theta_i)^2\,p(x)\,dx\\
&\approx \sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\int_{r_{i-1}}^{r_i}(x - \theta_i)^2\,dx\\
&= \sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\left.\frac{(x-\theta_i)^3}{3}\right|_{r_{i-1}}^{r_i}\\
&= \frac{1}{3}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\Big[(r_i-\theta_i)^3 - (r_{i-1}-\theta_i)^3\Big]. \qquad (67)
\end{aligned}$$
In this case we can determine the optimal $\theta_i$'s and $r_i$'s as follows,
$$\frac{\partial E_{SE}}{\partial\theta_i} = 0 \;\Rightarrow\; \frac{1}{3}\,p\!\left(\frac{r_{i-1}+r_i}{2}\right)\Big[-3(r_i-\theta_i)^2 + 3(r_{i-1}-\theta_i)^2\Big]
= p\!\left(\frac{r_{i-1}+r_i}{2}\right)(r_{i-1}+r_i-2\theta_i)(r_{i-1}-r_i) = 0.$$
This yields
$$\theta_i^* \equiv \theta_i^{\mathrm{opt}} = \frac{r_{i-1}+r_i}{2}. \qquad (68)$$
Next, setting $\theta_i^*$ in the expression for the expected squared error yields
$$\begin{aligned}
E_{SE}(\theta_1^*, \theta_2^*, \ldots, \theta_k^*) &= \frac{1}{3}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\left[\Big(r_i - \frac{r_{i-1}+r_i}{2}\Big)^3 - \Big(r_{i-1} - \frac{r_{i-1}+r_i}{2}\Big)^3\right]\\
&= \frac{1}{3}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\left[\Big(\frac{r_i - r_{i-1}}{2}\Big)^3 - \Big(\frac{r_{i-1} - r_i}{2}\Big)^3\right]\\
&= \frac{1}{3}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\frac{2(r_i - r_{i-1})^3}{8}\\
&= \frac{1}{12}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)(r_i - r_{i-1})^3, \qquad (69)
\end{aligned}$$
where $r_0 \equiv x_L$ and $r_k = x_H$.
This yields
$$E_{SE}(r_0 = x_L, r_1, r_2, \ldots, r_{k-1}, r_k = x_H) = \frac{1}{12}\sum_{i=1}^{k}\left(p^{1/3}\!\left(\frac{r_{i-1}+r_i}{2}\right)(r_i - r_{i-1})\right)^3 = \frac{1}{12}\sum_{i=1}^{k}\mu_i^3, \qquad (70)$$
where
$$\mu_i \equiv p^{1/3}\!\left(\frac{r_{i-1}+r_i}{2}\right)(r_i - r_{i-1}). \qquad (71)$$
We are led to determine the decision levels $\{r_i\}$, via the $\mu_i$, by minimizing
$$E_{SE} = \frac{1}{12}\sum_{i=1}^{k}\mu_i^3 \qquad \text{subject to} \qquad \sum_{i=1}^{k}\mu_i = \text{a constant, say } M. \qquad (73)$$
Using Lagrange multipliers, we obtain
$$\arg\min_{\{\mu_i\}}\;\left(\frac{1}{12}\sum_{i=1}^{k}\mu_i^3 + \lambda\Big(M - \sum_{i=1}^{k}\mu_i\Big)\right), \qquad (74)$$
which is solved by
$$\frac{\partial}{\partial\mu_i}\left(\frac{1}{12}\sum_{i=1}^{k}\mu_i^3 + \lambda\Big(M - \sum_{i=1}^{k}\mu_i\Big)\right) = 0, \qquad (75)$$
yielding
$$\frac{1}{4}\mu_i^2 - \lambda = 0 \;\Rightarrow\; \mu_i = 2\sqrt{\lambda}. \qquad (76)$$
To satisfy the constraint for all $i$'s we need
$$\sum_{i=1}^{k}\mu_i = M \quad\text{or}\quad 2\sqrt{\lambda}\,k = M, \qquad (77)$$
which yields $2\sqrt{\lambda} = M/k$.
From here the optimal $\mu_i$'s are $\mu_i^* = M/k$, the same constant for all $i$.
Define a function
$$\Phi(r) \equiv \int_{x_L}^{r} p^{1/3}(x)\,dx. \qquad (80)$$
We clearly have
$$\Phi(r_{i-1}) = \int_{x_L}^{r_{i-1}} p^{1/3}(x)\,dx = \frac{i-1}{k}\int_{x_L}^{x_H} p^{1/3}(x)\,dx \qquad (82)$$
and
$$\Phi(r_i) = \int_{x_L}^{r_i} p^{1/3}(x)\,dx = \frac{i}{k}\int_{x_L}^{x_H} p^{1/3}(x)\,dx. \qquad (83)$$
Therefore,
$$\Phi(r_i) - \Phi(r_{i-1}) = \frac{1}{k}\int_{x_L}^{x_H} p^{1/3}(x)\,dx. \qquad (84)$$
But
$$\Phi(r_i) - \Phi(r_{i-1}) \equiv \int_{r_{i-1}}^{r_i} p^{1/3}(x)\,dx \approx p^{1/3}\!\left(\frac{r_{i-1}+r_i}{2}\right)(r_i - r_{i-1}). \qquad (85)$$
We see that the decision levels $r_1, \ldots, r_{k-1}$ are the places where the function $\Phi(r)$ crosses the levels $iM/k$, or $\frac{i}{k}\int_{x_L}^{x_H} p^{1/3}(x)\,dx$, for $i = 1, \ldots, k-1$.
Schematically,
$$X_w = x \;\longrightarrow\; v = \Phi(x)\in[0,M] \;\longrightarrow\; \text{uniform $k$-level } Q(\cdot) \text{ from } [0,M] \text{ to the indices } \{1,\ldots,k\} \;\longrightarrow\; i,$$
and for index $i$ the output is $\theta_i^* = \dfrac{r_i + r_{i-1}}{2}$.
This quantizer achieves the (approximately) optimal mean squared error
$$E_{SE}(\mathrm{opt}) = \frac{1}{12}\sum_{i=1}^{k}(\mu_i^*)^3 = \frac{1}{12}\,k\,\frac{M^3}{k^3} = \frac{1}{12}\,\frac{\left(\int_{x_L}^{x_H}p^{1/3}(x)\,dx\right)^3}{k^2} = \frac{1}{12}\,\frac{M^3}{k^2} = \frac{\text{constant}}{k^2}. \qquad (86)$$
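This construction is easy to test numerically. Below is a small sketch (our own illustration, assuming NumPy; the density $p(x) = 2x$ on $[0,1]$, the grid size, and all names are illustrative choices, not from the notes):

```python
import numpy as np

xL, xH, k = 0.0, 1.0, 64
grid = np.linspace(xL, xH, 100001)
p = 2.0 * grid                                # example density p(x) = 2x on [0, 1]

# Phi(r): cumulative (trapezoid) integral of p^(1/3) from xL to r, eq. (80)
p13 = p ** (1.0 / 3.0)
phi = np.concatenate(([0.0], np.cumsum(0.5 * (p13[1:] + p13[:-1]) * np.diff(grid))))
M = phi[-1]                                   # M = integral of p^(1/3) over [xL, xH]

# decision levels: where Phi crosses i*M/k, i = 1, ..., k-1
r = np.interp(np.arange(1, k) * M / k, phi, grid)
r_full = np.concatenate(([xL], r, [xH]))
theta = 0.5 * (r_full[:-1] + r_full[1:])      # representation levels: midpoints, eq. (68)

# expected squared error of this quantizer vs. the high-rate prediction M^3 / (12 k^2)
err2 = (grid - theta[np.searchsorted(r, grid)]) ** 2 * p
ese = np.sum(0.5 * (err2[1:] + err2[:-1]) * np.diff(grid))
print(ese, M ** 3 / (12 * k ** 2))            # the two values should nearly agree
```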
Suppose that the number of quantization levels is large, the range of values of the random variable $X_w$ is $[x_L, x_H]$, and we have $p(x)$ over the range.
Assume that we simply divide the range $[x_L, x_H]$ into $k$ intervals given by
$$\Delta_i = \left[x_L + (i-1)\frac{x_H - x_L}{k},\; x_L + i\,\frac{x_H - x_L}{k}\right] \quad \text{for } i = 1, 2, \ldots, k. \qquad (87)$$
It is clear that with this division the least mean squared error will be
$$\begin{aligned}
E_{SE}^{u}(Q^u) &= \int_{x_L}^{x_H}(x - Q^u(x))^2\,p(x)\,dx\\
&\approx \sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\int_{r_{i-1}}^{r_i}(x - \theta_i)^2\,dx\\
&= \frac{1}{3}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\Big[(r_i - \theta_i)^3 - (r_{i-1} - \theta_i)^3\Big]. \qquad (88)
\end{aligned}$$
This result raises the question: are we indeed doing better with the non-uniform “adapted” quantizer? That is, do we have
$$E_{SE}(Q^{\mathrm{opt}}) \le E_{SE}^{u}(Q^{\mathrm{uniform}})\,? \qquad (92)$$
By Hölder's inequality, $\|fg\|_1 \le \|f\|_p\,\|g\|_q$ for $p, q \in [1, \infty)$ satisfying $\frac{1}{p} + \frac{1}{q} = 1$, that is, $\frac{1}{q} = 1 - \frac{1}{p} = \frac{p-1}{p}$, or $q = p/(p-1)$. The proof is based on Young's inequality $ab \le \frac{a^p}{p} + \frac{b^q}{q}$.
We take $f = p^{1/3}(x)$ and $g = 1$ for $x \in [x_L, x_H]$, and obtain
$$\int_{x_L}^{x_H} p^{1/3}(x)\cdot 1\,dx \le \left(\int_{x_L}^{x_H}\big|p^{1/3}(x)\big|^{3}\,dx\right)^{1/3}\left(\int_{x_L}^{x_H} 1^{3/2}\,dx\right)^{2/3}, \qquad (95)$$
that yields
$$\left(\int_{x_L}^{x_H} p^{1/3}(x)\,dx\right)^{3} \le \overbrace{\left(\int_{x_L}^{x_H} p(x)\,dx\right)}^{=\,1}\,(x_H - x_L)^2, \qquad (96)$$
or
$$\left(\int_{x_L}^{x_H} p^{1/3}(x)\,dx\right)^{3} \le (x_H - x_L)^2, \qquad (97)$$
as expected.
Equality happens only if there are real numbers $\alpha, \beta > 0$ with $\alpha|f|^p = \beta|g|^q$ almost everywhere, where here
$$\alpha \equiv \int_{x_L}^{x_H}|g|^q\,dx = x_H - x_L \quad (q = 3/2), \qquad
\beta \equiv \int_{x_L}^{x_H}|f|^p\,dx = \int_{x_L}^{x_H}p(x)\,dx = 1 \quad (p = 3), \qquad (98)$$
so that equality requires $p(x) = \dfrac{1}{x_H - x_L}$ almost everywhere, i.e. the uniform distribution.
More of the same
Quantization
Our quantization function replaces one random variable with another, $\theta_w \equiv Q(X_w) \in \{\theta_1, \theta_2, \ldots, \theta_k\}$. The error is given by $E_w \equiv X_w - \theta_w$, such that
$$E(E_w^2) \equiv E\big((X_w - \theta_w)^2\big) = E\big(X_w^2 - 2X_w\theta_w + \theta_w^2\big) = E(X_w^2) - E(\theta_w^2),$$
yielding
$$E(X_w^2) - 2E(X_w\theta_w) + E(\theta_w^2) = E(X_w^2) - E(\theta_w^2),$$
or
$$2E(X_w\theta_w) - 2E(\theta_w^2) = 0,$$
i.e.
$$E\{(X_w - \theta_w)\theta_w\} = 0, \qquad E(E_w\theta_w) = 0. \qquad (103)$$
In other words $E_w \perp \theta_w$.
Approximately optimal quantization (large $k$)
We divide the interval $[x_L, x_H]$ into intervals s.t. $r_0 = x_L, r_1, \ldots, r_k = x_H$, corresponding to representation levels $\theta_1, \ldots, \theta_k$. Then,
$$\begin{aligned}
E_{SE} &= \int_{x_L}^{x_H}(x - Q(x))^2\,p(x)\,dx\\
&= \sum_{i=1}^{k}\int_{r_{i-1}}^{r_i}(x - Q(x))^2\,p(x)\,dx\\
&\approx \sum_{i=1}^{k}\int_{r_{i-1}}^{r_i}(x - Q(x))^2\,p\!\left(\frac{r_{i-1}+r_i}{2}\right)dx\\
&= \sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\left.\frac{(x - \theta_i)^3}{3}\right|_{r_{i-1}}^{r_i}\\
&= \frac{1}{3}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\Big[(r_i - \theta_i)^3 - (r_{i-1} - \theta_i)^3\Big].
\end{aligned}$$
Setting $\dfrac{d E_{SE}}{d\theta_i} = 0$, we solve
$$\frac{1}{3}\,p\!\left(\frac{r_{i-1}+r_i}{2}\right)\Big[-3(r_i - \theta_i)^2 + 3(r_{i-1} - \theta_i)^2\Big]
= p\!\left(\frac{r_{i-1}+r_i}{2}\right)(r_{i-1} + r_i - 2\theta_i)(r_{i-1} - r_i) = 0.$$
That yields (as before) $\theta_i^* = \dfrac{r_{i-1}+r_i}{2}$, in which case
$$\begin{aligned}
E_{SE}(\theta_1^*, \theta_2^*, \ldots, \theta_k^*) &= \frac{1}{3}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\left[\Big(r_i - \frac{r_{i-1}+r_i}{2}\Big)^3 - \Big(r_{i-1} - \frac{r_{i-1}+r_i}{2}\Big)^3\right]\\
&= \frac{1}{3}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\left[\Big(\frac{r_i - r_{i-1}}{2}\Big)^3 - \Big(\frac{r_{i-1} - r_i}{2}\Big)^3\right]\\
&= \frac{1}{3}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)\frac{2(r_i - r_{i-1})^3}{8}\\
&= \frac{1}{12}\sum_{i=1}^{k}p\!\left(\frac{r_{i-1}+r_i}{2}\right)(r_i - r_{i-1})^3\\
&= \frac{1}{12}\sum_{i=1}^{k}\left(p^{1/3}\!\left(\frac{r_{i-1}+r_i}{2}\right)(r_i - r_{i-1})\right)^3\\
&= \frac{1}{12}\sum_{i=1}^{k}\mu_i^3,
\end{aligned}$$
with $r_0 \equiv x_L$ and $r_k \equiv x_H$, and $\mu_i \equiv p^{1/3}\!\left(\dfrac{r_{i-1}+r_i}{2}\right)(r_i - r_{i-1})$.
But
$$\sum_{i=1}^{k}\mu_i = \sum_{i=1}^{k}p^{1/3}\!\left(\frac{r_{i-1}+r_i}{2}\right)(r_i - r_{i-1}) \approx \int_{x_L}^{x_H}p(x)^{1/3}\,dx = M, \quad \text{a computable constant.}$$
We minimize $\frac{1}{12}\sum_i\mu_i^3$ subject to $\sum_i\mu_i = M$ with a Lagrange multiplier $\lambda$; the derivative
$$\frac{d}{d\mu_i}\left(\frac{1}{12}\sum_{i=1}^{k}\mu_i^3 + \lambda\Big(M - \sum_{i=1}^{k}\mu_i\Big)\right) = 0$$
leads to $\frac{1}{4}\mu_i^2 - \lambda = 0$, thus $\mu_i = 2\sqrt{\lambda}$, $\forall i$.
The constraint is
$$\sum_{i=1}^{k}\mu_i = M \;\Rightarrow\; 2\sqrt{\lambda}\,k = M \;\Rightarrow\; 2\sqrt{\lambda} = M/k.$$
Thus the optimal $\mu_i$ are $\mu_i^* = M/k$, the same constant for all $i$.
The optimality condition for the decision levels is
$$p^{1/3}\!\left(\frac{r_{i-1}+r_i}{2}\right)(r_i - r_{i-1}) = \frac{\int_{x_L}^{x_H}(p(x))^{1/3}\,dx}{k}. \qquad (106)$$
Define the function $\Phi(r) = \int_{x_L}^{r}p^{1/3}(x)\,dx$ and find the $r_i$ s.t.
$$\Phi(r_i) = \frac{i}{k}\int_{x_L}^{x_H}p^{1/3}(x)\,dx = \frac{i}{k}\,\Phi(x_H).$$
Clearly, $\Phi(r_{i-1}) \equiv \int_{x_L}^{r_{i-1}}p^{1/3}(x)\,dx = \frac{i-1}{k}\int_{x_L}^{x_H}p^{1/3}(x)\,dx$ and $\Phi(r_i) \equiv \int_{x_L}^{r_i}p^{1/3}(x)\,dx = \frac{i}{k}\int_{x_L}^{x_H}p^{1/3}(x)\,dx$.
Thus,
$$\Phi(r_i) - \Phi(r_{i-1}) = \frac{1}{k}\int_{x_L}^{x_H}p^{1/3}(x)\,dx,$$
but
$$\Phi(r_i) - \Phi(r_{i-1}) \equiv \int_{r_{i-1}}^{r_i}p^{1/3}(x)\,dx \approx p^{1/3}\!\left(\frac{r_i + r_{i-1}}{2}\right)(r_i - r_{i-1}).$$
The decision levels $\{r_1, \ldots, r_{k-1}\}$ are those at which $\Phi(r)$ crosses the value $iM/k$, or $\frac{i}{k}\int_{x_L}^{x_H}p^{1/3}(x)\,dx$, for $i = 1, \ldots, k-1$.
[Figure: graph of $\Phi(r)$ with uniform quantization along the $y$ axis.]
$$E_{SE}(\text{optimal}) \equiv \frac{1}{12}\sum_{i=1}^{k}(\mu_i^*)^3 = \frac{1}{12}\,k\left(\frac{M}{k}\right)^3 = \frac{1}{12}\,\frac{M^3}{k^2} = \frac{1}{12}\,\frac{\left(\int_{x_L}^{x_H}p^{1/3}(x)\,dx\right)^3}{k^2},$$
where, given $p(x)$, $M^3$ is a constant.
Uniform Quantization for Large $k$
Same conditions as before, but now assume we uniformly divide $[x_L, x_H]$ into the intervals
$$\Delta_i = \left[x_L + (i-1)\frac{x_H - x_L}{k},\; x_L + i\,\frac{x_H - x_L}{k}\right], \qquad i = 1, \ldots, k.$$
$$\begin{aligned}
E_{SE}^{u}(Q^u) &= \int_{x_L}^{x_H}(x - Q^u(x))^2\,p(x)\,dx\\
&\approx \sum_{i=1}^{k}p_x\!\left(\frac{r_{i-1}+r_i}{2}\right)\int_{r_{i-1}}^{r_i}(x - \theta_i)^2\,dx\\
&= \frac{1}{3}\sum_{i=1}^{k}p_x\!\left(\frac{r_{i-1}+r_i}{2}\right)\Big[(r_i - \theta_i)^3 - (r_{i-1} - \theta_i)^3\Big].
\end{aligned}$$
The estimated $E_{SE}$ for that $\theta^{\mathrm{opt}}$ is
$$\begin{aligned}
E_{SE}^{u} &\equiv E\big(X_w - Q^u(X_w)\big)^2\\
&\approx \frac{1}{12}\sum_{i=1}^{k}p_x(\text{midpoint of }\Delta_i)\,\frac{(x_H - x_L)^3}{k^3}\\
&= \frac{1}{12}\,\frac{(x_H - x_L)^2}{k^2}\sum_{i=1}^{k}p_x(\text{midpoint of }\Delta_i)\,|\Delta_i|\\
&\approx \frac{1}{12}\,\frac{(x_H - x_L)^2}{k^2}\int_{x_L}^{x_H}p_x(x)\,dx = \frac{1}{12}\,\frac{(x_H - x_L)^2}{k^2}.
\end{aligned}$$
We proved that $\left(\int_{x_L}^{x_H}p(x)^{1/3}\,dx\right)^3 \le (x_H - x_L)^2$, as expected.
Equality is obtained iff $p(x) = 1/(x_H - x_L)$ almost everywhere, i.e. for the uniform distribution, for which uniform quantization is the best, since then
$$\left(\int_{x_L}^{x_H}p(x)^{1/3}\,dx\right)^3 = \left(\int_{x_L}^{x_H}\Big(\frac{1}{x_H - x_L}\Big)^{1/3}dx\right)^3 = (x_H - x_L)^2.$$
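A concrete illustration (a worked example added here, not in the original notes): take $p(x) = 2x$ on $[0,1]$, so $x_H - x_L = 1$. Then
$$M = \int_0^1 (2x)^{1/3}\,dx = 2^{1/3}\cdot\frac{3}{4}, \qquad M^3 = 2\cdot\frac{27}{64} = \frac{27}{32} \approx 0.84,$$
so for large $k$ the adapted quantizer achieves $E_{SE} \approx \dfrac{27/32}{12k^2}$, strictly smaller than the uniform quantizer's $E_{SE}^{u} \approx \dfrac{1}{12k^2}$, in agreement with the Hölder bound above.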