(Lecture Notes in Mathematics 1730) Siegfried Graf, Harald Luschgy (Auth.) - Foundations of Quantization For Probability Distributions (2000, Springer-Verlag Berlin Heidelberg) PDF
Editors:
A. Dold, Heidelberg
F. Takens, Groningen
B. Teissier, Paris
Springer
Berlin
Heidelberg
New York
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo
Siegfried Graf Harald Luschgy
Foundations of
Quantization for
Probability Distributions
Authors
Siegfried Graf
Faculty for Mathematics and Computer Science
University of Passau
94030 Passau, Germany
E-mail: graf@fmi.uni-passau.de
Harald Luschgy
FB IV, Mathematics
University of Trier
54286 Trier, Germany
E-mail: luschgy@uni-trier.de
Graf, Siegfried:
Foundations of quantization for probability distributions / Siegfried
Graf ; Harald Luschgy. - Berlin ; Heidelberg ; New York ; Barcelona ;
Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer,
2000
(Lecture notes in mathematics ; 1730)
ISBN 3-540-67394-6
This work is subject to copyright. All rights are reserved, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, re-use
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other
way, and storage in data banks. Duplication of this publication or parts thereof is
permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from
Springer-Verlag. Violations are liable for prosecution under the German Copyright
Law.
Springer-Verlag is a company in the BertelsmannSpringer publishing group.
© Springer-Verlag Berlin Heidelberg 2000
Printed in Germany
The use of general descriptive names, registered names, trademarks, etc. in this
publication does not imply, even in the absence of a specific statement, that such
names are exempt from the relevant protective laws and regulations and therefore
free for general use.
Typesetting: Camera-ready TEX output by the author
Printed on acid-free paper SPIN: 10724973 41/3143/du 543210
Contents
List of Tables
Introduction
2 Centers and moments of probability distributions
2.1 Uniqueness and characterization of centers
2.2 Moments of balls
4 Basic properties of optimal quantizers
4.3 Quantization error for ball packings
4.4 Examples
5 Uniqueness and optimality in one dimension
5.1 Uniqueness
7.3 Asymptotic optimality in one dimension
9 Random quantizers and quantization coefficients
10 Asymptotics for the covering radius
10.1 Basic properties
Bibliography
Symbols
Index
List of Figures
5.4 Exponential distribution E(1), r = 2
5.5 Gamma distribution Γ( ~ , 2), r = 2
7.1 Probability distributions P_r
The term "quantization" in the title originates in the theory of signal processing. It
was used by electrical engineers starting in the late 40's. In this context quantization
means a process of discretising signals and should not be mistaken for the same term in
quantum physics. As a mathematical topic quantization for probability distributions
concerns the best approximation of a d-dimensional probability distribution P by a
discrete probability with a given number n of supporting points or in other words,
the best approximation of a d-dimensional random vector X with distribution P by
a random vector Y with at most n values in its image. It turns out that for the error
measures used in this book there is always a best approximation of the form f ( X ) , a
"quantized version of X". The quantization problem can be rephrased as a partition
problem of the underlying space which explains the term quantization.
Much of the early attention in the engineering and statistical literature was concen-
trated on the one-dimensional quantization problem. See Bennett (1948), Panter and
Dite (1951), Lloyd's 1957 paper (published 1982), Dalenius (1950), and Cox (1957).
Steinhaus (1956) was apparently the first who explicitly dealt with the problem and
formulated it for general (3-dimensional) spaces. Since then quantization occurred in
various scientific fields, for instance
The aim of the present book is to describe the mathematical theory underlying the
different applications of quantization. The emphasis is on absolutely continuous as
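\[
\rho_r(P_1, P_2) = \inf \Big\{ \Big( \int \|x - y\|^r \, d\mu(x, y) \Big)^{1/r} : \mu \text{ probability on } \mathbb{R}^d \times \mathbb{R}^d \text{ with marginals } P_1 \text{ and } P_2 \Big\}
\]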
\[ E \min_{a \in \alpha} \|X - a\|^r \]
is called r-th quantization coefficient and can be expressed in terms of the r-th quan-
tization coefficient
\[
Q_r([0,1]^d) = \lim_{n \to \infty} n^{r/d} V_{n,r}(U([0,1]^d)) = \inf_{n \ge 1} n^{r/d} V_{n,r}(U([0,1]^d))
\]
of the uniform distribution on the unit cube of $\mathbb{R}^d$ and the density of the absolutely
continuous part of P with respect to Lebesgue measure. Quantization coefficients pro-
vide interesting parameters for probability distributions. They can be evaluated for
univariate distributions and some of them also for multivariate distributions.
Fundamental work is due to Fejes Tóth (1959) and Zador (1963). Next we define
asymptotically optimal sequences of quantizers and sets of centers for nonsingular probability
distributions P and investigate their properties. It is proved that the empirical mea-
sures corresponding to asymptotically optimal sets of centers converge weakly to a
probability on R d which is explicitly given using P. Furthermore, the asymptotic
performance of certain classes of quantizers is compared to that of (asymptotically)
optimal quantizers. In particular, we consider regular quantizers which are based on
space-filling figures in $\mathbb{R}^d$, lattice quantizers, product quantizers, and random
quantizers. The results provide bounds for the quantization coefficients $Q_r([0,1]^d)$.
All these considerations concern the case $1 \le r < +\infty$ and arbitrary norms on $\mathbb{R}^d$.
The rest of the chapter is devoted to the study of similar results for a geometric
covering problem which corresponds to the case $r = \infty$. Here the quantization error
of a probability $P$ (with compact support) and $n \in \mathbb{N}$ is defined to be
\[
e_{n,\infty}(P) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le n} \min \Big\{ s \ge 0 : \bigcup_{a \in \alpha} B(a, s) \supset \operatorname{supp}(P) \Big\}.
\]
The limit of the sequence $(n^{1/d} e_{n,\infty}(P))_{n \ge 1}$ exists in $(0, \infty)$ provided $\operatorname{supp}(P)$ is
compact Jordan measurable with positive ($d$-dimensional) volume. The limit
Chapter III deals with the asymptotic behaviour of the quantization error for
probabilities $P$ on $\mathbb{R}^d$ which are singular with respect to Lebesgue measure. Following
Zador (1982) we introduce the concept of quantization dimension of order $r$. For
$r \in [1, +\infty]$ define
\[ e_{n,r}(P) = V_{n,r}(P)^{1/r} \quad \text{if } 1 \le r < \infty, \]
for all balls $B(x, s)$ whose center lies in the support of $P$ and whose radius $s$ is
smaller than a certain value $s_0$. Examples of this type of measures are the normalized
Lebesgue measure on a convex compact set, the normalized surface measure on a
convex compact set or a smooth compact manifold, and the normalized Hausdorff
measure on certain self-similar sets. For each regular probability $P$ of dimension $D$
the quantization dimension of order $r$, $r \in [1, +\infty]$, is proved to be $D$, and moreover
$P$ satisfies the strong separation property if the $S_1, \ldots, S_N$ above can be chosen to
satisfy $S_i(\operatorname{supp}(P)) \cap S_j(\operatorname{supp}(P)) = \emptyset$ for $i \ne j$. If $(s_1, \ldots, s_N)$ are the contraction
numbers corresponding to $(S_1, \ldots, S_N)$ then the similarity dimension is the unique
$D \in [0, +\infty)$ with $s_1^D + \cdots + s_N^D = 1$.
If the probability vector $(p_1, \ldots, p_N)$ equals $(s_1^D, \ldots, s_N^D)$ then the corresponding
self-similar probability $P$ equals the normalized $D$-dimensional Hausdorff measure on the
support of $P$. If, in addition, $P$ satisfies the strong separation condition, then the
quantization dimension $D_r(P)$ of order $r$ equals $D$ for all $r \in [1, +\infty]$ and, moreover,
If $(p_1, \ldots, p_N) \ne (s_1^D, \ldots, s_N^D)$ and the strong separation condition holds then the
quantization dimension $D_r(P)$ of the corresponding $P$ satisfies
N
= 1
i=1
and $D_r(P) < D_t(P)$ if $r < t$. Still, for every $r \in [1, +\infty]$,
Thus, self-similar probabilities constitute a class of probabilities for which the
quantization dimensions of different orders do not all agree. It remains an open problem for
which probabilities $\lim_{n \to \infty} n \, e_{n,r}(P)^{D_r(P)}$ exists, but it can be shown that for the classical
Cantor distribution this limit does not exist if $r = 2$.
In the present book we do not intend to give a complete overview of the large
subject of quantization. We will focus on the quantization problem as stated earlier,
the so-called fixed rate quantization problem, and develop the underlying theory in
a mathematically rigorous way. For a comprehensive recent survey of the theory of
quantization including its historical development we refer the reader to the article
of Gray and Neuhoff (1998). This article also contains an extensive list of papers
published in electrical engineering journals on the subject.
1 Voronoi partitions
Voronoi partitions of $\mathbb{R}^d$ will play a central role as optimal quantizing partitions for
probability distributions on $\mathbb{R}^d$. In this section we introduce Voronoi regions, Voronoi
diagrams and Voronoi partitions with respect to discrete point sets and describe some
of their basic properties.
by
Proof
Let $x \in \mathbb{R}^d$. Since locally finite subsets of $\mathbb{R}^d$ are closed, there exists $a \in \alpha$ such that
$\|x - a\| = d(x, \alpha)$ and thus $x \in W(a|\alpha)$. This proves that the Voronoi diagram is a
covering of $\mathbb{R}^d$, that is
\[ \bigcup_{a \in \alpha} W(a|\alpha) = \mathbb{R}^d. \]
This gives $\gamma \subset B(0, 3s)$ and hence, $\gamma$ is finite. Thus the Voronoi diagram is locally
finite. []
The open Voronoi region generated by $a \in \alpha$ is defined by
These regions are pairwise disjoint but do not provide a covering of $\mathbb{R}^d$. A Borel
measurable partition $\{A_a : a \in \alpha\}$ of $\mathbb{R}^d$ is called Voronoi partition of $\mathbb{R}^d$ with
respect to $\alpha$ (and $P$) if
The Voronoi regions are closed and star-shaped relative to their generator point, that
is, the line segment joining any $x \in W(a|\alpha)$ and the point $a$ is contained in $W(a|\alpha)$.
In case $d = 1$, where the underlying norm is throughout the absolute value, Voronoi
regions are closed intervals. For $a, b \in \mathbb{R}^d$, let
1.2 Proposition
(a) $W(a|\alpha)$ is closed and star-shaped relative to $a$.
(b) $\operatorname{int} W(a|\alpha) = \bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ and $W_0(a|\alpha)$ is an open subset of $\operatorname{int} W(a|\alpha)$ which
is star-shaped relative to $a$. In particular, $a \in \operatorname{int} W(a|\alpha)$.
10 I. General properties of the quantization for probability distributions
Proof
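(a) Let $x \in W(a|\alpha)$ and $0 < s < 1$. The point $y = sx + (1 - s)a$ on the line segment
joining $a$ and $x$ satisfies
\[ \|x - a\| \le \|x - b\| \le \|x - y\| + \|y - b\| \quad \text{for every } b \in \alpha, \]
It remains to show that $\bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ is open. This is clearly true if $\alpha$ is finite. So
assume $\alpha$ is not finite. Let $x \in \bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ and set $\gamma = \{b \in \alpha : \|x - a\| = \|x - b\|\}$.
Since $\gamma \subset \alpha \cap B(x, \|x - a\|)$, $\gamma$ is finite by the local finiteness of $\alpha$. Therefore,
\[ \bigcap_{b \in \gamma} \operatorname{int} H(a, b) \cap W_0(a \,|\, (\alpha \setminus \gamma) \cup \{a\}) \]
\[ = \bigcup_{b \in \alpha} \partial H(a, b) \cap W(a|\alpha). \]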
[]
By the preceding proposition, one can find Voronoi partitions of $\mathbb{R}^d$ with respect to
$\alpha$ consisting of Borel sets $A_a$ which are star-shaped relative to $a \in \alpha$. In fact, let
$\alpha = \{a_1, a_2, \ldots\}$ be an enumeration of $\alpha$ and set, for instance,
\[ A_{a_1} = W(a_1|\alpha), \]
\[ A_{a_k} = W(a_k|\alpha) \setminus \bigcup_{j<k} W(a_j|\alpha) = W(a_k|\alpha) \cap W_0(a_k|\{a_1, \ldots, a_k\}), \quad k \ge 2. \]
Here ties are broken in favour of smaller indices. Some difficulties arise from the
fact that the intersection of different Voronoi regions may have interior points. This
corresponds to the fact that the separator of two points may have interior points; see
the subsequent Example 1.4. However, if the underlying norm is strictly convex this
cannot happen.
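The tie-breaking rule described above can be sketched numerically. The following is a minimal illustration, assuming a finite set of centers and using NumPy; the function name `voronoi_partition_labels` and the test points are hypothetical and not taken from the text. It relies on `argmin` returning the first minimizing index, which corresponds to breaking ties in favour of smaller indices.

```python
import numpy as np

def voronoi_partition_labels(points, centers, p=2):
    """Return, for each point, the index k of the cell A_{a_k} containing it.

    Ties between centers are broken in favour of the smallest index,
    because np.argmin returns the first index attaining the minimum.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # dist[i, k] = l_p distance between points[i] and centers[k]
    diff = points[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, ord=p, axis=2)
    return np.argmin(dist, axis=1)

labels = voronoi_partition_labels(
    [[1.0, 0.0], [1.5, 0.0], [1.0, 5.0], [-3.0, 1.0]],  # hypothetical sample points
    [[0.0, 0.0], [2.0, 0.0]],                            # hypothetical centers a_1, a_2
)
```

The points (1, 0) and (1, 5) are equidistant from both centers, so they fall into the cell of the first center.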
For $a, b \in \mathbb{R}^d$, $a \ne b$, the separator is defined by
The separator contains the midpoint $(a + b)/2$ but no other point from the line
through $a$ and $b$. The norm $\|\cdot\|$ is said to be strictly convex if $\|x\| = \|y\| = 1$, $x \ne y$
implies $\|sx + (1 - s)y\| < 1$ for every $s \in (0, 1)$. The $l_p$-norms are strictly convex for
$1 < p < \infty$, while the $l_1$-norm and the $l_\infty$-norm are not strictly convex.
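A quick numerical illustration of the strict convexity condition at $s = 1/2$, as a sketch only: the helper `midpoint_norm_of_unit_vectors` and the chosen vector pairs are hypothetical. Two distinct vectors are normalized in the $l_p$-norm and the norm of their midpoint is computed; a value of 1 exhibits a failure of strict convexity.

```python
import numpy as np

def midpoint_norm_of_unit_vectors(x, y, p):
    """Normalize x and y in the l_p norm and return the l_p norm of their midpoint."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x / np.linalg.norm(x, ord=p)
    y = y / np.linalg.norm(y, ord=p)
    return np.linalg.norm(0.5 * (x + y), ord=p)

# l_1: (1,0) and (0,1) are unit vectors whose midpoint still has norm 1
m_l1 = midpoint_norm_of_unit_vectors([1, 0], [0, 1], 1)
# l_inf: (1,1) and (1,-1) are unit vectors whose midpoint (1,0) has norm 1
m_linf = midpoint_norm_of_unit_vectors([1, 1], [1, -1], np.inf)
# l_2 and l_3: the midpoint lies strictly inside the unit ball
m_l2 = midpoint_norm_of_unit_vectors([1, 0], [0, 1], 2)
m_l3 = midpoint_norm_of_unit_vectors([1, 0], [0, 1], 3)
```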
1.3 Proposition (Strictly convex norms)
Suppose the underlying norm is strictly convex.
\[ \|y - a - s(b - a)\| = \|(1 - s)(y - a) + s(y - b)\| < \|y - a\|. \]
This gives
a contradiction.
Let $x \in \operatorname{int} H(a, b)$ with $x \ne a$ and choose $\varepsilon > 0$ such that $B(x, \varepsilon) \subset H(a, b)$. Let
$t = 1 + \varepsilon/\|x - a\|$ and $z = a + t(x - a)$. Then
provided the underlying norm is strictly convex. The following example shows that
all assertions of Proposition 1.3 (and of (1.8)) can fail if $\|\cdot\|$ is an arbitrary norm.
Figure 1.2: Voronoi region and separator with respect to the $l_1$-norm
1.4 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_1$-norm. For $a = (1, 0)$ and $b = (0, 1)$, we
obtain
and
Thus $H(a, b)$ is the union of three halfspaces and the separator is the disjoint union
of two quarterspaces and a line segment; see Figure 1.2. Clearly, all assertions of
Proposition 1.3 fail for $\alpha = \{a, b\}$.
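The claim that the $l_1$-separator of $a = (1, 0)$ and $b = (0, 1)$ has interior points can be checked numerically. The sketch below assumes, from a direct computation of $|x_1 - 1| + |x_2| = |x_1| + |x_2 - 1|$ and not from the text, that the two quarterspaces are $\{x_1 \ge 1, x_2 \ge 1\}$ and $\{x_1 \le 0, x_2 \le 0\}$ and that the line segment lies on the diagonal $x_1 = x_2$ between them. The helper `on_separator` is hypothetical.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

def on_separator(x, tol=1e-12):
    """True if x is l_1-equidistant from a and b (up to rounding)."""
    x = np.asarray(x, dtype=float)
    return abs(np.sum(np.abs(x - a)) - np.sum(np.abs(x - b))) <= tol

rng = np.random.default_rng(0)
upper = rng.uniform(1.0, 5.0, size=(1000, 2))    # samples with x1 >= 1, x2 >= 1
lower = rng.uniform(-5.0, 0.0, size=(1000, 2))   # samples with x1 <= 0, x2 <= 0

all_upper = all(on_separator(x) for x in upper)
all_lower = all(on_separator(x) for x in lower)
diagonal_point = on_separator((0.3, 0.3))        # on the segment x1 = x2 in the unit square
off_diagonal_point = on_separator((0.3, 0.6))    # inside the unit square, off the diagonal
```

Every sampled point of the two quarterspaces is equidistant from $a$ and $b$, so the separator contains open sets, in contrast to the strictly convex case.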
Under various conditions, Voronoi regions are geometrically regular. Let $|A|$ denote
the cardinality of a set $A$ and let $\lambda^d$ denote the $d$-dimensional Lebesgue measure.
1.5 Theorem (Boundary theorem)
Each of the following conditions implies
\[ \lambda^d(\partial W(a|\alpha)) = 0, \quad a \in \alpha. \]
Proof
According to Proposition 1.2 (c) it is sufficient to show that $\lambda^d(\partial H(a, b)) = 0$ for
$a \ne b$. Since $H(a, b) = H(a - b, 0) + b$, we may assume without loss of generality that
$b = 0$. Set
(1.9) < 1
Hence we get
\[ y_n = s_n a + t_n x, \quad s_n, t_n \in \mathbb{R}. \]
\[ a + \frac{1}{1 - s_n}(y_n - a) \in S(a, 0) \]
and hence
\[ \frac{1}{1 - s_n}(y_n - a) \in S(-a, 0). \]
Thus the convex function $h_n : \mathbb{R} \to \mathbb{R}$ given by $h_n(s) = \|y_n - a + sa\|$ satisfies
Since $h_n(1) = \|y_n\| < \|y_n - a\|$, one gets $1 < 1 - s_n$, $n \ge n_0$ and our claim is proved.
Since $t_n > -s_n > 0$, it follows from (1.10) that
\[ \frac{t_n}{s_n + t_n} x \in S\Big(-\frac{s_n}{s_n + t_n} a, 0\Big), \quad n \ge n_0. \]
Hence
- - z e S( a, O)
Sn + tn n+
and therefore
1 t= t= all - t=
II~y=-all=lls.+t= x s~+ s. T t l Izll, n___no.
This yields
\[ \frac{1}{s_n + t_n} y_n \in S(a, 0), \quad n \ge n_0, \]
We know from Example 1.4 that the Voronoi diagram of $\alpha$, in general, does not
provide a tesselation of the space $\mathbb{R}^d$. (Notice that the Voronoi diagrams in Figure
1.1 provide tesselations.) According to (1.8) and Theorem 1.5, the Voronoi diagram
of $\alpha$ with respect to a strictly convex norm is a tesselation of $\mathbb{R}^d$.
Two further properties of Voronoi regions concerning neighbouring regions and
equivariance under similarity transformations are of interest. A bijective mapping
$T : \mathbb{R}^d \to \mathbb{R}^d$ is called similarity transformation if there exists $c \in (0, \infty)$, the
scaling number, such that $\|Tx - Ty\| = c\|x - y\|$ for every $x, y \in \mathbb{R}^d$. Let
$T(\alpha) = \{Ta : a \in \alpha\}$; $T(\alpha)$ is locally finite.
1.6 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation. Then
\[ W(Ta|T(\alpha)) = T W(a|\alpha). \]
Proof
Obvious. []
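The equivariance in Lemma 1.6 can be illustrated with a small numerical experiment for the euclidean norm. This is a sketch under assumptions: the map `T` (a rotation, a scaling by `c` and a shift), the random centers and the sample points are all hypothetical choices. A point lies in the region of a center exactly when its image lies in the region of the image center, so the index of the nearest center should not change.

```python
import numpy as np

def nearest_index(x, centers):
    """Index of the center nearest to x in the euclidean norm."""
    return int(np.argmin(np.linalg.norm(centers - x, axis=1)))

rng = np.random.default_rng(1)
centers = rng.normal(size=(6, 2))     # hypothetical finite set alpha
points = rng.normal(size=(200, 2))    # hypothetical test points

theta, c, shift = 0.7, 2.5, np.array([3.0, -1.0])
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def T(z):
    """Similarity transformation with scaling number c: rotate, scale, shift."""
    return c * (z @ Q.T) + shift

same_region = all(
    nearest_index(x, centers) == nearest_index(T(x), T(centers))
    for x in points
)
```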
Voronoi regions are determined by their neighbouring regions in the following sense.
1.7 Lemma
For $a \in \alpha$, let
Proof
Clearly we have $W(a|\alpha) \subset W(a|\beta)$. To prove the converse inclusion, let $x \in \mathbb{R}^d \setminus
W(a|\alpha)$ and consider the line segment $\{y_s : s \in [0, 1]\}$ joining $a$ and $x$, where $y_s =
sx + (1 - s)a$. Since $W(a|\alpha)$ is closed, $a \in \operatorname{int} W(a|\alpha)$, and $W(a|\alpha)$ is star-shaped
relative to $a$, we obtain
Choose b • V with Yso • W(bl~). Then b • ~ and Yt • W(bl~) for some t • (So, 1].
Since Yt • W(al~), the point Yt satisfies Yt ¢ W(aIZ). This implies x ¢ W(alfl ). []
Notice that bounded Voronoi regions have a finite number of neighbouring regions by
the local finiteness of Voronoi diagrams.
Voronoi regions with respect to euclidean norms exhibit some special features. Let
$\langle \cdot, \cdot \rangle$ be any scalar product on $\mathbb{R}^d$ and $\|x\| = \langle x, x \rangle^{1/2}$. Then for $a \ne b$, $H(a, b)$ is the
closed halfspace
\[ H(a, b) = \{x \in \mathbb{R}^d : \langle a - b, x - \tfrac{1}{2}(a + b) \rangle \ge 0\} \tag{1.11} \]
The hyperplane $S(a, b)$ contains the midpoint $(a + b)/2$ and is perpendicular (with
respect to $\langle \cdot, \cdot \rangle$) to the line through $a$ and $b$. Thus the Voronoi regions $W(a|\alpha)$ are
convex. If $\alpha$ is finite, then $W(a|\alpha)$ is a polyhedral set, that is, a finite intersection of
closed halfspaces in $\mathbb{R}^d$. In the sequel a (convex) polytope means a compact polyhedral
set. By Lemma 1.7, bounded Voronoi regions are polytopes. The following example
shows that, in general, unbounded Voronoi regions are not polyhedral sets.
1.8 Example
Let the norm on $\mathbb{R}^2$ be the $l_2$-norm. Consider the set $\alpha = \{a_n : n \ge 0\}$ with $a_0 = (2, 0)$
and $a_n = (0, n)$ for $n \ge 1$. Then $\alpha$ is locally finite and the points
are extreme points of $W(a_0|\alpha)$; see Figure 1.3. Since polyhedral sets in $\mathbb{R}^d$ have a
finite number of extreme points, $W(a_0|\alpha)$ is not polyhedral.
Figure 1.3: Voronoi region with respect to the $l_2$-norm which is not a polyhedral set
1.9 Remark
As indicated above, the Voronoi regions are convex in the euclidean case. It is an
interesting fact that the convexity of Voronoi regions even characterizes euclidean
norms. More precisely, if $W(a|\alpha)$ is convex for every finite subset $\alpha$ of $\mathbb{R}^d$ and every
$a \in \alpha$, then the underlying norm is euclidean. This is a classical result of Mann
(1935). See also Gruber (1974).
1.10 Proposition
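Let $\|x\| = \langle x, x \rangle^{1/2}$ for some scalar product $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^d$ and let $a \in \alpha$. Then $W(a|\alpha)$
is bounded if and only if $a \in \operatorname{int} \operatorname{conv} \alpha$.
Proof
For $u \in \mathbb{R}^d$, $u \ne 0$, consider the halfline $L_u = \{a + su : s \ge 0\}$ with initial point $a$.
We have $L_u \subset W(a|\alpha)$, that is,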
1.11 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_1$-norm. Consider $\alpha = \{(0, 0), (0, -1), (2, 1),
(-2, 1)\}$ and let $a = (0, 0)$. Then $a \in \operatorname{int} \operatorname{conv} \alpha$, but $W(a|\alpha)$ is unbounded since, for
instance, the halfline $\{s(0, 1) : s \ge 0\}$ is contained in $W(a|\alpha)$; see Figure 1.4.
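Example 1.11 can be checked on a finite stretch of the halfline. The sketch below uses the set $\alpha$ and the point $a$ from the example; the helper `in_region_of_a`, the range of $s$ and the comparison with the $l_2$-norm are illustrative assumptions. For the $l_1$-norm every sampled point $s(0, 1)$ stays in the region of $a$, while for the $l_2$-norm the halfline leaves the region, in line with Proposition 1.10.

```python
import numpy as np

alpha = np.array([[0.0, 0.0], [0.0, -1.0], [2.0, 1.0], [-2.0, 1.0]])  # a = alpha[0]

def in_region_of_a(x, p, tol=1e-12):
    """True if x belongs to W(a|alpha) for a = (0,0) with respect to the l_p norm."""
    d = np.linalg.norm(alpha - np.asarray(x, dtype=float), ord=p, axis=1)
    return d[0] <= d.min() + tol

s_values = np.linspace(0.0, 50.0, 501)
l1_halfline_inside = all(in_region_of_a((0.0, s), 1) for s in s_values)
l2_halfline_inside = all(in_region_of_a((0.0, s), 2) for s in s_values)
```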
Notes
For detailed treatments of Voronoi diagrams of finite point sets we refer to the
notable book by Okabe et al. (1992), the review article by Aurenhammer (1991), and
the book by Klein (1989). A discussion of random Voronoi tesselations with respect
to the $l_2$-norm may be found in Møller (1994). Theorem 1.5 (iii) on the geometric
regularity of Voronoi regions is certainly known but we are not aware of a reference.
1.12 Conjecture
The assertion of Theorem 1.5, that is, $\lambda^d(\partial W(a|\alpha)) = 0$ for every $a \in \alpha$, holds for
arbitrary norms and arbitrary dimensions.
\[ E\|X\|^r < \infty \]
Let $C_r(P)$ denote the set of all centers of $P$ of order $r$. Centers of order 1 are usually
called (spatial) medians. The $r$-th (absolute) moment of $P$ about the center is
defined by
\[ M_r(A) = \frac{V_r(U(A))}{\lambda^d(A)^{r/d}} \tag{2.3} \]
2.1 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation with scaling number $c > 0$.
(a) $C_r(T(X)) = T C_r(X)$,
$V_r(T(X)) = c^r V_r(X)$.
Proof
(a) is obvious.
(b) If $X$ is $U(A)$-distributed, then $T(X)$ is $U(T(A))$-distributed. From (a) it follows
that
\[ M_r(T(A)) = \frac{V_r(T(X))}{\lambda^d(T(A))^{r/d}} = \frac{c^r V_r(X)}{(c^d \lambda^d(A))^{r/d}} = M_r(A). \]
[]
Proof
By convexity and continuity of $\psi_r$, the level sets are convex and closed. Since
2.3 Example
(a) If $P$ is symmetric (about the origin), then $0 \in C_r(P)$ and thus, $V_r(P) = E\|X\|^r$.
In fact, $C_r(P)$ is symmetric by Lemma 2.1 (a) and from convexity of $C_r(P)$ follows
$0 \in C_r(P)$.
(b) For the $l_2$-norm, we obtain $C_2(X) = \{EX\}$ and $V_2(X) = \sum_{i=1}^{d} \operatorname{Var} X_i$, where
$\operatorname{Var} X_i$ denotes the variance of $X_i$. If the underlying norm is the $l_1$-norm, then
$C_1(X) = \times_{i=1}^{d} \operatorname{Med}(X_i)$, where $\operatorname{Med}(X_i)$ is the set of medians of the real random
variable $X_i$.
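Part (b) of the example can be illustrated on an empirical measure. The following sketch uses a hypothetical random sample `X` and a hypothetical objective `psi(a, r, p)`, the sample mean of $\|x - a\|^r$ in the $l_p$-norm. It compares the value at the sample mean (for $r = 2$, $l_2$) and at the coordinatewise median (for $r = 1$, $l_1$) against randomly perturbed candidates; it is a spot check, not a proof of optimality.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(size=(500, 2))    # hypothetical sample; its empirical measure plays the role of P

def psi(a, r, p):
    """Empirical version of E||X - a||^r for the l_p norm."""
    return np.mean(np.linalg.norm(X - a, ord=p, axis=1) ** r)

mean_center = X.mean(axis=0)               # candidate center of order 2 for the l_2 norm
median_center = np.median(X, axis=0)       # candidate center of order 1 for the l_1 norm
perturbations = rng.normal(scale=0.3, size=(200, 2))

mean_is_best = all(
    psi(mean_center, 2, 2) <= psi(mean_center + h, 2, 2) + 1e-12 for h in perturbations
)
median_is_best = all(
    psi(median_center, 1, 1) <= psi(median_center + h, 1, 1) + 1e-12 for h in perturbations
)
# V_2 equals the sum of the coordinate variances (population variance, ddof = 0)
v2_matches_variance = np.isclose(psi(mean_center, 2, 2), X.var(axis=0).sum())
```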
The center of a probability distribution need not be unique; think of the median of
one-dimensional distributions. Conditions for the uniqueness of the center are derived
in the following theorem. Condition (iii) is due to Milasevic and Ducharme (1987)
(for euclidean norms) and Kemperman (1987).
(iii) The underlying norm is strictly convex, $r = 1$, and $P(L) < 1$ for every
line $L \subset \mathbb{R}^d$.
Proof
We show that $\psi_r$ is strictly convex. This yields the assertion. Let $a, b \in \mathbb{R}^d$, $a \ne b$,
and $0 < s < 1$. Then
Let
In case $r > 1$, $A$ is contained in the separator $S(a, b)$ since $t \mapsto t^r$ is strictly convex.
If, additionally, the norm is strictly convex, then $A = \emptyset$.
If $r = 1$ and the norm is strictly convex, then $A$ is a subset of the line $L$ through $a$ and
$b$ given by $L = \{ta + (1 - t)b : t \in \mathbb{R}\}$. To see this, let $x \in A$ and assume $x \ne b$. By
strict convexity of the norm, there exists $t \in \mathbb{R}_+$ such that $s(x - a) = t(1 - s)(x - b)$.
This gives
\[ x = \frac{s}{s - t + st} \, a + \frac{ts - t}{s - t + st} \, b \]
for every $y \in \mathbb{R}^d$. If the underlying norm is smooth, this condition takes the form
Proof
\[ \nabla^+ \psi_r(a, y) \ge 0 \]
The function $g : \mathbb{R} \to \mathbb{R}$ given by $g(t) = \|a - x + ty\|^r$ is convex and thus satisfies
:( f f Hx-allr-lllYHdP(x)
(xCa} {z=a}
(b) Suppose the underlying norm is smooth. Then $\psi_r$ is differentiable on $\mathbb{R}^d$ for $r > 1$
while $\psi_1$ is differentiable at every point $a \in \mathbb{R}^d$ with $P(\{a\}) = 0$. The derivative is
given by
{z~a}
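2.6 Lemma
(a) (Euclidean norms) Let $\|x\| = \langle x, x \rangle^{1/2}$ for some scalar product $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^d$.
Then
\[ C_r(P) \subset \operatorname{cl} \operatorname{conv}(\operatorname{supp}(P)). \]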
Proof
and therefore,
2.7 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_\infty$-norm. Consider $P = \frac{1}{2}(\delta_{(-1,0)} + \delta_{(1,0)})$,
where $\delta_x$ is the point mass at $x$. Since $P$ is symmetric about $EX = (0, 0)$, this point
belongs to $C_r(P)$ and thus, $V_r(P) = E\|X\|^r = 1$ for every $1 \le r < \infty$. We find
see Figure 2.1. Clearly, the assertion of Lemma 2.6 does not hold for $P$.
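A small computation makes the failure of Lemma 2.6 for the $l_\infty$-norm concrete. The sketch below evaluates the objective $E\|X - a\|^r$ for the two-point distribution of the example; the function `psi` and the test point $(0, 0.5)$ are illustrative choices, not taken from the text. The point $(0, 0.5)$ attains the value $V_r(P) = 1$ although it lies outside the segment joining the two support points.

```python
import numpy as np

support = np.array([[-1.0, 0.0], [1.0, 0.0]])   # P puts mass 1/2 on each point

def psi(a, r):
    """E||X - a||^r for the l_infinity norm and the two-point distribution P."""
    dist = np.linalg.norm(support - np.asarray(a, dtype=float), ord=np.inf, axis=1)
    return np.mean(dist ** r)

values_at_origin = [psi((0.0, 0.0), r) for r in (1, 2, 3)]
values_off_segment = [psi((0.0, 0.5), r) for r in (1, 2, 3)]

# coarse grid search: no grid point does better than the value 1
grid = np.linspace(-2.0, 2.0, 81)
grid_minimum = min(psi((u, v), 2) for u in grid for v in grid)
```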
Figure 2.1: $C_1(P)$ and $C_r(P)$, $r > 1$, with respect to the $l_\infty$-norm for a discrete
probability $P$ with two supporting points
2.2 Moments of balls
Balls have minimal moments for measures $\mu$ which vanish on spheres, i.e.
$\mu(\partial B(a, s)) = 0$ for every $a \in \mathbb{R}^d$ and every $s \ge 0$. (Note that $\partial B(a, s) = \{x \in
\mathbb{R}^d : \|x - a\| = s\}$.) This statement is meant in the following sense.
2.8 Lemma
Let $\mu$ be a Borel measure on $\mathbb{R}^d$ that is finite on compact sets and vanishes on spheres.
Then, for every bounded set $A \in \mathcal{B}(\mathbb{R}^d)$ with $\mu(A) > 0$ and every $a \in \mathbb{R}^d$ there is an
$s \ge 0$ with $\mu(B(a, s)) = \mu(A)$. Moreover, for such an $s$,
\[ \int_A \|x - a\|^r \, d\mu(x) \ge \int_{B(a,s)} \|x - a\|^r \, d\mu(x). \]
Proof
Since $A$ is bounded there exists an $s_0 > 0$ with $A \subset B(a, s_0)$, hence $0 < \mu(A) \le
\mu(B(a, s_0)) < \infty$. Since the map $\mathbb{R}_+ \to \mathbb{R}_+$, $s \mapsto \mu(B(a, s))$ is continuous under the
assumptions for $\mu$ the intermediate value theorem yields the existence of an $s > 0$
with $s \le s_0$ and $\mu(B(a, s)) = \mu(A)$. Then $\mu(A \setminus B(a, s)) = \mu(B(a, s) \setminus A)$ and we
have
\[ - \int_{B(a,s) \setminus A} \|x - a\|^r \, d\mu(x). \]
Obviously
and
\[ \int_{A \setminus B(a,s)} \|x - a\|^r \, d\mu(x) - \int_{B(a,s) \setminus A} \|x - a\|^r \, d\mu(x) \ge s^r \big( \mu(A \setminus B(a, s)) - \mu(B(a, s) \setminus A) \big) = 0. \]
Hence, the lemma is proved. []
We can deduce a well known fact about the moments of uniform distributions on balls.
2.9 Lemma
We have
\[ M_r(B(0, 1)) = \min\{M_r(A) : A \in \mathcal{B}(\mathbb{R}^d) \text{ bounded}, \ \lambda^d(A) > 0\} \]
and $B(0, 1)$ is the essentially unique minimizer of $M_r$ in that any bounded set $A \in
\mathcal{B}(\mathbb{R}^d)$ with $M_r(A) = M_r(B(0, 1))$, $\lambda^d(A) = \lambda^d(B(0, 1))$, and $0 \in C_r(U(A))$ satisfies
\[ \lambda^d(A \,\triangle\, B(0, 1)) = 0. \]
If, additionally, $A$ is regularly closed (that is, $A = \operatorname{cl}(\operatorname{int} A)$), then $A = B(0, 1)$.
Moreover,
\[ M_r(B(0, 1)) = \frac{d}{(d + r) \, \lambda^d(B(0, 1))^{r/d}}. \tag{2.5} \]
Proof
The first assertion follows from Lemma 2.8 with the choice $\mu = \lambda^d$ and Lemma 2.1
(b). As for uniqueness, let $A$ be a set with the above properties. Then
\[ \int_A \|x\|^r \, dx = \int_{B(0,1)} \|x\|^r \, dx. \]
It follows that
\[ \lambda^d(B(0, 1) \setminus A) = \lambda^d(A \setminus B(0, 1)) \le \int_{A \setminus B(0,1)} \|x\|^r \, dx \]
Therefore,
\[ \int_{A \setminus B(0,1)} (\|x\|^r - 1) \, dx = 0 \]
Therefore, $\lambda^d(\{\|x\| < 1\}) = \lambda^d(\operatorname{int} A)$ which implies $\{\|x\| < 1\} = \operatorname{int} A$. From regular
closedness of $A$ follows $A = B(0, 1)$.
Moreover, in view of the symmetry of $U(B(0, 1))$ one gets
\[ = \int_0^1 (1 - t^{d/r}) \, dt = \frac{d}{d + r}. \]
\[ \lambda^d(B(0, 1)) = \frac{\big(2 \Gamma(1 + \frac{1}{p})\big)^d}{\Gamma(1 + \frac{d}{p})} \tag{2.6} \]
(cf. e.g. Pisier, 1989, p. 11).
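Formulas (2.5) and (2.6) can be combined into a short numerical check. The sketch below assumes that (2.6) is the volume of the $l_p$ unit ball in $\mathbb{R}^d$; the function names and the Monte Carlo estimate for the euclidean disc ($d = r = 2$) are illustrative assumptions. It also compares the disc with the unit square, whose normalized second moment is $2 \cdot \frac{1}{12} = \frac{1}{6}$ since its volume is 1.

```python
import math
import numpy as np

def lp_ball_volume(d, p):
    """Volume of the l_p unit ball in R^d, following (2.6)."""
    return (2.0 * math.gamma(1.0 + 1.0 / p)) ** d / math.gamma(1.0 + d / p)

def ball_moment(d, r, p):
    """Normalized r-th moment M_r(B(0,1)) of the l_p unit ball, following (2.5)."""
    return d / ((d + r) * lp_ball_volume(d, p) ** (r / d))

def mc_disc_moment(n=200_000, seed=0):
    """Monte Carlo estimate of M_2 for the euclidean unit disc (d = r = 2)."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(n, 2))
    inside = pts[(pts ** 2).sum(axis=1) <= 1.0]     # rejection sampling of U(B(0,1))
    v2 = (inside ** 2).sum(axis=1).mean()           # V_2 about the center 0
    return v2 / math.pi                             # divide by volume^(r/d), here pi^1

disc_exact = ball_moment(2, 2, 2)    # equals 1 / (2 pi)
disc_mc = mc_disc_moment()
square_moment = 1.0 / 6.0            # M_2 of the unit square [0,1]^2
```

The disc has the smaller normalized moment, as Lemma 2.9 predicts for any competitor of positive volume.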
Notes
Among spatial centers the spatial medians have received special attention. We refer
to the survey article by Small (1990) for a discussion of several notions of spatial
medians. A good source for norm-based medians as defined in (2.1) is Kemperman
(1987).
\[ V_{n,r}(P) = \inf_{f \in \mathcal{F}_n} E\|X - f(X)\|^r. \]
3.1 Lemma
\[ V_{n,r}(P) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le n} E \min_{a \in \alpha} \|X - a\|^r. \]
3. The quantization problem 31
Proof
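For $f \in \mathcal{F}_n$, let $\alpha = f(\mathbb{R}^d)$ and $A_a = \{f = a\}$, $a \in \alpha$. Then
\[ = E \min_{b \in \alpha} \|X - b\|^r. \]
Conversely, for $\alpha \subset \mathbb{R}^d$ with $|\alpha| \le n$, let $\{A_a : a \in \alpha\}$ be a Voronoi partition of $\mathbb{R}^d$
with respect to $\alpha$ and let $f = \sum_{a \in \alpha} a 1_{A_a}$. Then $f \in \mathcal{F}_n$ and
\[ E \min_{a \in \alpha} \|X - a\|^r = \sum_{a \in \alpha} \int_{A_a} \|x - a\|^r \, dP(x) = E\|X - f(X)\|^r. \]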
[]
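The identity in the proof can be illustrated on an empirical measure. In the sketch below, the sample `X`, the set `centers` and the helper names are hypothetical. The quantizer that maps each point to its nearest center reproduces the value $E \min_{a} \|X - a\|^r$, and an arbitrary other assignment of points to the same centers cannot do better.

```python
import numpy as np

def nearest_quantizer(X, centers):
    """f(x) = nearest center, i.e. f = sum of a * 1_{A_a} for a Voronoi partition."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers[np.argmin(d, axis=1)]

def distortion(X, fX, r):
    """Empirical version of E||X - f(X)||^r for the euclidean norm."""
    return np.mean(np.linalg.norm(X - fX, axis=1) ** r)

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 2))         # hypothetical sample
centers = rng.normal(size=(4, 2))      # hypothetical set alpha with |alpha| = 4
r = 2

d_voronoi = distortion(X, nearest_quantizer(X, centers), r)
all_dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
d_min = np.mean(np.min(all_dist, axis=1) ** r)       # E min_a ||X - a||^r

# any other quantizer with values in alpha has at least this distortion
random_assignment = centers[rng.integers(0, len(centers), size=len(X))]
d_other = distortion(X, random_assignment, r)
```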
A set $\alpha \subset \mathbb{R}^d$ with $|\alpha| \le n$ is called n-optimal set of centers for $P$ of order $r$ if
\[ V_{n,r}(P) = E \min_{a \in \alpha} \|X - a\|^r \]
The following equivariance, scaling and invariance properties extend those of Lemma
2.1.
3.2 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation with scaling number $c > 0$.
(a) $C_{n,r}(T(X)) = T C_{n,r}(X)$,
$V_{n,r}(T(X)) = c^r V_{n,r}(X)$.
Proof
Obvious. []
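Next, we show that the quantization problem is equivalent to a partitioning problem
for the space $\mathbb{R}^d$.
3.3 Lemma
\[ V_{n,r}(P) = \inf_{\mathcal{A}} \sum_{A \in \mathcal{A}} V_r(P(\cdot|A)) P(A), \]
where the infimum is taken over all Borel measurable partitions $\mathcal{A}$ of $\mathbb{R}^d$ with $|\mathcal{A}| \le n$.
Proof
For $f \in \mathcal{F}_n$, let $\alpha = f(\mathbb{R}^d)$ and $A_a = \{f = a\}$. Then $\{A_a : a \in \alpha\}$ is a partition of $\mathbb{R}^d$
and
\[ E\|X - f(X)\|^r = \sum_{a \in \alpha} \int_{A_a} \|x - a\|^r \, dP(x) \ge \sum_{a \in \alpha} V_r(P(\cdot|A_a)) P(A_a). \]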
[]
A Borel measurable partition $\mathcal{A}$ of $\mathbb{R}^d$ with $|\mathcal{A}| \le n$ is called n-optimal partition
for $P$ of order $r$ if
\[ V_{n,r}(P) = \sum_{A \in \mathcal{A}} V_r(P(\cdot|A)) P(A). \]
The proof of the preceding lemma shows that if $f$ is an $n$-optimal quantizer, then
$\{\{f = a\} : a \in f(\mathbb{R}^d)\}$ is an $n$-optimal partition. Conversely, if $\mathcal{A}$ is an $n$-optimal
partition and $a_A \in C_r(P(\cdot|A))$ for $A \in \mathcal{A}$, then $f = \sum_{A \in \mathcal{A}} a_A 1_A$ is an $n$-optimal
quantizer.
where the infimum is taken over all Borel probabilities $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with fixed
marginals $P_1$ and $P_2$. The $L_r$-minimal metric $\rho_r$ ($L_r$-Wasserstein metric or
$L_r$-Kantorovich metric) is appropriate for the quantization problem. This has been
observed by Gray et al. (1975), Gray and Davisson (1975) and Pollard (1982a). By
$\mathcal{P}_n$ denote the set of all discrete probabilities $Q$ on $\mathbb{R}^d$ with $|\operatorname{supp}(Q)| \le n$.
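3.4 Lemma
Proof
Given $f \in \mathcal{F}_n$, let $\mu_f$ denote the image measure of $P$ under the map $\mathbb{R}^d \to \mathbb{R}^d \times
\mathbb{R}^d$, $x \mapsto (x, f(x))$. Then
If $Q \in \mathcal{P}_n$ with $Q(\alpha) = 1$, $|\alpha| \le n$, then for every Borel probability $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with
marginals $P$ and $Q$
\[ = \int \min_{a \in \alpha} \|x - a\|^r \, dP(x), \]
hence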
where the supremum is taken over all functions $g : \mathbb{R}^d \to \mathbb{R}$ satisfying the Lipschitz
condition $|g(x) - g(y)| \le \|x - y\|$ for all $x, y \in \mathbb{R}^d$. In case $d = 1$, $\rho_r$ admits the
representation
and
\[ \rho_1(P_1, P_2) = \int |F_1(t) - F_2(t)| \, dt, \]
where $F_i$ denotes the distribution function and $F_i^{-1}$ the quantile function of $P_i$
($F_i^{-1}(t) = \inf\{x \in \mathbb{R} : F_i(x) \ge t\}$, $t \in (0, 1)$). For this background on $L_r$-minimal
metrics we refer to Rachev (1991) and Rachev and Rüschendorf (1998, Chapters 2.5
and 2.6).
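For two empirical measures on the line, the integral of $|F_1 - F_2|$ can be computed exactly and compared with a quantile-based computation. The sketch below assumes two samples of equal size, so that the empirical quantile functions are step functions on a common grid and pairing the order statistics gives $\int_0^1 |F_1^{-1} - F_2^{-1}|$; that pairing is the standard one-dimensional description of $\rho_1$ and is used here as an assumption. The function names are hypothetical.

```python
import numpy as np

def rho1_from_cdfs(x, y):
    """Integral of |F_1(t) - F_2(t)| dt for the empirical distribution functions of x and y."""
    x, y = np.sort(x), np.sort(y)
    grid = np.sort(np.concatenate([x, y]))
    # F_i is constant on [grid[j], grid[j+1]); evaluate it at the left endpoint
    F1 = np.searchsorted(x, grid[:-1], side="right") / len(x)
    F2 = np.searchsorted(y, grid[:-1], side="right") / len(y)
    return np.sum(np.abs(F1 - F2) * np.diff(grid))

def rho1_from_quantiles(x, y):
    """Mean absolute difference of order statistics (requires equal sample sizes)."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

rng = np.random.default_rng(4)
x = rng.normal(size=50)                     # hypothetical sample from P_1
y = rng.normal(loc=0.5, size=50)            # hypothetical sample from P_2
value_cdf = rho1_from_cdfs(x, y)
value_quantile = rho1_from_quantiles(x, y)
```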
The empirical counterpart of quantization is cluster analysis. Somewhat more pre-
cisely, partitioning methods of cluster analysis for a finite sample according to a
norm-based optimality criterion correspond to quantization for the empirical mea-
sure.
3.5 Example (Empirical version, cluster analysis)
Let $x_1, \ldots, x_k \in \mathbb{R}^d$ with $x_i = (x_{i1}, \ldots, x_{id})$ and let $P = \frac{1}{k} \sum_{i=1}^{k} \delta_{x_i}$ denote the empirical
measure. We obtain from Lemma 3.3
\[ V_{n,r}(P) = \frac{1}{k} \min_{\mathcal{C}} \sum_{C \in \mathcal{C}} \min_{a \in \mathbb{R}^d} \sum_{i \in C} \|x_i - a\|^r, \]
where the infimum is taken over all partitions $\mathcal{C}$ of $\{1, \ldots, k\}$ with $|\mathcal{C}| \le n$. If the
underlying norm is the $l_2$-norm, then
\[ V_{n,2}(P) = \frac{1}{k} \min_{\mathcal{C}} \sum_{C \in \mathcal{C}} \sum_{i \in C} \|x_i - \bar{x}(C)\|^2, \]
where $\bar{x}(C) = \frac{1}{|C|} \sum_{i \in C} x_i$. This is the variance criterion for optimal grouping of data
$x_1, \ldots, x_k$. If the underlying norm is the $l_1$-norm, then
\[ V_{n,1}(P) = \frac{1}{k} \min_{\mathcal{C}} \sum_{C \in \mathcal{C}} \sum_{i \in C} \|x_i - \operatorname{med}(C)\|, \]
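For very small samples the variance criterion can be evaluated by brute force. The following sketch enumerates every assignment of the $k$ data points to at most $n$ groups and returns the smallest value of the $l_2$ criterion; the function name `empirical_vn2` and the data are hypothetical, and the enumeration has $n^k$ terms, so it only serves as an illustration.

```python
from itertools import product
import numpy as np

def empirical_vn2(x, n):
    """Brute-force V_{n,2} of the empirical measure of x for the l_2 norm."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                     # treat a flat list as points in R^1
    k = len(x)
    best = np.inf
    for labels in product(range(n), repeat=k):   # every assignment into at most n groups
        labels = np.array(labels)
        cost = 0.0
        for c in range(n):
            cluster = x[labels == c]
            if len(cluster) > 0:
                # squared distances to the group mean
                cost += np.sum((cluster - cluster.mean(axis=0)) ** 2)
        best = min(best, cost / k)
    return best

data = [0.0, 1.0, 10.0, 11.0]              # hypothetical data
v1 = empirical_vn2(data, 1)
v2 = empirical_vn2(data, 2)
v4 = empirical_vn2(data, 4)
```

With two groups the optimal grouping is {0, 1} and {10, 11}, each point lying at distance 0.5 from its group mean.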
The n-optimal sets of centers for P of order r correspond to global minima of the
function
Notice that $\psi_{1,r} = \psi_r$. While $\psi_r$ is convex, $\psi_{n,r}$ is typically not convex for $n \ge 2$.
Therefore, local minimum points of $\psi_{n,r}$ may not be global minimum points of $\psi_{n,r}$.
The lack of any straightforward solution for the quantization problem (at least for
$d \ge 2$) is a result of the difficulty in dealing with the nonconvex nature of quantization.
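The failure of convexity for $n = 2$ already shows up in the simplest case. The sketch below takes a hypothetical empirical measure with the two points $-1$ and $1$ on the line and evaluates the objective at two pairs of centers and at their midpoint; the function name `psi_2` is illustrative. Both pairs reproduce the data exactly, while their midpoint $(0, 0)$ does not, which a convex function would not allow.

```python
import numpy as np

x = np.array([-1.0, 1.0])          # hypothetical two-point empirical measure on the line

def psi_2(a1, a2, r=2):
    """Mean of min(|x - a1|, |x - a2|)^r over the data, a function of the pair (a1, a2)."""
    d = np.minimum(np.abs(x - a1), np.abs(x - a2))
    return np.mean(d ** r)

u = (-1.0, 1.0)                    # centers placed on the data points
v = (1.0, -1.0)                    # the same centers in the other order
mid = (0.0, 0.0)                   # midpoint of u and v

value_u = psi_2(*u)
value_v = psi_2(*v)
value_mid = psi_2(*mid)
```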
Notes
Occasionally, it may be preferable to use other measures for the quantization error
than $L_r$-metrics as in (3.1). The limiting case of the $L_\infty$-metric ("worst-case error")
is studied in Graf and Luschgy (1999a). While the first metric requires X to be
bounded, the latter does not. The Ky Fan error measure leads to the approximation
problem for P with respect to the Prohorov metric (in the sense of Lemma 3.4). An
investigation of the quantization problem based on the geometric mean error
\[ \exp E \log \|X - f(X)\| \]
as measure of performance can be found in Graf and Luschgy (1999b). Input weighted
error measures of the form
where $B(x)$ is a positive definite matrix for every $x \in \mathbb{R}^d$, have proved useful in speech
and image compression systems. For various aspects of the quantization problem
based on this error see e.g. Gray and Karnin (1982), Gardner and Rao (1995), Li et
al. (1999) and Linder et al. (1999).
Basically different quantization problems have been treated by Elias (1970) and more
recently by Bock (1992) and Pötzelberger and Strasser (1999).
4. Basic properties of optimal quantizers 37
The following two theorems provide necessary conditions for n-optimality of quantiz-
ers. They provide the gateway to most available algorithmic solutions.
4.1 Theorem (Necessary conditions for optimality)
Let $\alpha \in C_{n,r}(P)$ and let $\{A_a : a \in \alpha\}$ be a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$
and $P$. Then
$|\alpha| = n$, $P(A_a) > 0$ for every $a \in \alpha$,
$\beta \in C_{m,r}\big(P(\cdot \,|\, \bigcup_{a \in \beta} A_a)\big)$ for every $\beta \subset \alpha$ with $|\beta| = m$.
In particular,
\[ P(W(a|\alpha)) > 0, \quad a \in C_r(P(\cdot|W(a|\alpha))) \quad \text{for every } a \in \alpha. \tag{4.1} \]
Proof
Let $\gamma = \{a \in \alpha : P(A_a) > 0\}$ and assume $|\gamma| < n$. Obviously, $\gamma \in C_{n,r}(P)$. Since
$P \notin \mathcal{P}_{n-1}$, there exists $a \in \gamma$ such that $P(\cdot|A_a)$ is not a point mass. We can conclude
that
\[ P(H(a, b)^c \cap A_a) > 0 \]
for some $b \in \mathbb{R}^d$. (Recall $H(a, b) = \{x \in \mathbb{R}^d : \|x - a\| \le \|x - b\|\}$.) In fact, we have
$P(A_a \setminus \{a\}) > 0$ and hence, there is a compact set $K \subset A_a \setminus \{a\}$ with $P(K) > 0$.
Since $K \subset \bigcup_{b \in K} H(a, b)^c$ and $H(a, b)^c$ is open, we can find a finite subset $B$ of $K$ such
that $K \subset \bigcup_{b \in B} H(a, b)^c$. This gives the existence of a point $b \in B$ with the required
property. It follows that
\[ V_{n,r}(P) = E \min_{a \in \gamma} \|X - a\|^r > E \min_{a \in \gamma \cup \{b\}} \|X - a\|^r \ge V_{n,r}(P), \]
a contradiction.
As for the assertion concerning $\beta$, assume $\beta \notin C_{m,r}\big(P(\cdot \,|\, \bigcup_{a \in \beta} A_a)\big)$. Then there exists
$\delta \subset \mathbb{R}^d$ with $|\delta| \le m$ and
It follows that
a contradiction. []
We know from (1.8) and Theorem 1.5 that the Voronoi diagram of every finite subset
of $\mathbb{R}^d$ is a $P$-tesselation provided the underlying norm is strictly convex and $P$ is
absolutely continuous with respect to $\lambda^d$. So the following result is of interest for
probability distributions $P$ which are not absolutely continuous with respect to $\lambda^d$.
(Such probabilities are considered in Chapter III.)
4.2 Theorem (Necessary condition for optimality)
Let $\alpha \in C_{n,r}(P)$ and let $r > 1$ or $P(\alpha) = 0$. Suppose the underlying norm is strictly convex and smooth. Then the Voronoi diagram of $\alpha$ is a P-tesselation of $\mathbb{R}^d$.
Proof
We have to prove
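\[ P(W(a \mid \alpha) \cap W(b \mid \alpha)) = 0 \]
for every $a, b \in \alpha$, $a \ne b$. Fix $a, b \in \alpha$, $a \ne b$ and assume $P(W(a \mid \alpha) \cap W(b \mid \alpha)) > 0$. Choose a Voronoi partition $\{A_c : c \in \alpha\}$ with respect to $\alpha$ such that $A_a = W(a \mid \alpha) \setminus W(b \mid \alpha)$. Then by Theorem 4.1,
\[ a \in C_r(P(\cdot \mid A_a)) \cap C_r(P(\cdot \mid W(a \mid \alpha))). \]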
and
which yields
$Q$ has two different centers $a$ and $b$ of order $r$. By Theorem 2.4, this can happen only if $r = 1$ and $Q(L) = 1$, where $L$ is the line through $a$ and $b$. Since
a contradiction. []
A set $\alpha \subset \mathbb{R}^d$ with $|\alpha| = n$ satisfying condition (4.1) is called an n-stationary set of centers for $P$ of order $r$. Let $S_{n,r}(P)$ denote the set of all these n-stationary sets for $P$ and denote by $SS_{n,r}(P)$ the subset of $S_{n,r}(P)$ consisting of all $\alpha \in S_{n,r}(P)$ such that the Voronoi diagram of $\alpha$ is a P-tesselation. Then by Theorem 4.1,
\[ C_{n,r}(P) \subset S_{n,r}(P). \]
Note that any Voronoi partition $\{A_a : a \in \alpha\}$ with respect to $\alpha \in SS_{n,r}(P)$ and $P$ satisfies $A_a = W(a \mid \alpha)$ P-a.s., $a \in \alpha$. We also write $S_{n,r}(X)$ and $SS_{n,r}(X)$ instead of $S_{n,r}(P)$ and $SS_{n,r}(P)$, respectively.
4.3 Corollary
(a) Let $\mathcal{A}$ be an n-optimal partition for $P$ of order $r$. Then $|\mathcal{A}| = n$, $P(A) > 0$ for every $A \in \mathcal{A}$, $C_r(P(\cdot \mid A)) \cap C_r(P(\cdot \mid B)) = \emptyset$ for every $A, B \in \mathcal{A}$, $A \ne B$, and $\mathcal{A}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha = \{a_A : A \in \mathcal{A}\}$ and $P$ for any choice of $a_A \in C_r(P(\cdot \mid A))$.
(b) Let $f \in \mathcal{F}_n$ be an n-optimal quantizer for $P$ of order $r$ and let $\alpha = f(\mathbb{R}^d)$. Then $\alpha \in S_{n,r}(P)$, $\{\{f = a\} : a \in \alpha\}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$ and $P$, $P(\{f = a\}) > 0$ and $a \in C_r(P(\cdot \mid \{f = a\}))$ for every $a \in \alpha$.
Proof
(a) We have
>_ V,,,r(P).
4.4 Lemma
Suppose $C_{n,r}(P) \subset SS_{n,r}(P)$, that is, the Voronoi diagram of every $\alpha \in C_{n,r}(P)$ is a P-tesselation of $\mathbb{R}^d$. Then the set of n-optimal quantizing measures for $P$ of order $r$ coincides with the set
Proof
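Let $Q = \sum_{a \in \alpha} p_a \delta_a$ be an n-optimal quantizing measure of order $r$. Choose a Borel probability $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $P$ and $Q$ such that $\rho_r(P,Q)^r = \int \|x - y\|^r \, d\mu(x,y)$ and let $f = \sum_{a \in \alpha} a 1_{A_a}$, where $\{A_a : a \in \alpha\}$ denotes a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$. Then $f$ is an n-optimal quantizer and $\alpha \in C_{n,r}(P)$. Therefore
\[ \sum_{a \in \alpha} \int_{\mathbb{R}^d \times \{a\}} \|x - a\|^r \, d\mu(x,y) = \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|^r \, d\mu(x,y) = \rho_r(P,Q)^r = V_{n,r}(P) \]
\[ = \int \min_{b \in \alpha} \|x - b\|^r \, dP(x) = \sum_{a \in \alpha} \int_{\mathbb{R}^d \times \{a\}} \min_{b \in \alpha} \|x - b\|^r \, d\mu(x,y). \]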
This implies
4.5 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_\infty$-norm. Consider $P = \frac14(\delta_{(-1,0)} + \delta_{(0,1)} + \delta_{(1,0)} + \delta_{(0,-1)})$ and let $n = 2$, $r = 1$. It is geometrically rather obvious that $V_{2,1}(P) = 1/2$ and $C_{2,1}(P)$ consists of all sets $\{a,b\}$ with $a, b \in \{x \in \mathbb{R}^2 : |x_1| + |x_2| = 1\}$ such that the line segment joining $a$ and $b$ meets the line $\{x_1 = 0\}$; see Figure 4.1.
Now let $a = (-1,0)$ and $b = (1,0)$. Then $\{a,b\} \in C_{2,1}(P)$ and $S(a,b)$ contains the line through $(0,-1)$ and $(0,1)$. One obtains $P(S(a,b)) = 1/2 > 0$ and hence, the Voronoi diagram $\{H(a,b), H(b,a)\}$ of $\{a,b\}$ is not a P-tesselation. Furthermore, the probability $Q = \frac58 \delta_a + \frac38 \delta_b$ is a 2-optimal quantizing measure for $P$. In fact, let $x_1 = (-1,0)$, $x_2 = (0,1)$, $x_3 = (1,0)$, $x_4 = (0,-1)$, and define a discrete probability $\mu$ on $\mathbb{R}^2 \times \mathbb{R}^2$ by
\[ \mu(\{(x_1,a)\}) = \mu(\{(x_3,b)\}) = \mu(\{(x_4,a)\}) = 1/4, \]
\[ \mu(\{(x_2,a)\}) = \mu(\{(x_2,b)\}) = 1/8. \]
Then the marginals of $\mu$ are $P$ and $Q$, respectively, and
Figure 4.1: 2-optimal centers of order 1 with respect to the $l_\infty$-norm
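The numbers in Example 4.5 can be checked by direct enumeration. The following is a minimal sketch, assuming the norm is the $l_\infty$-norm and the coupling $\mu$ puts mass 1/4 on $(x_1,a)$, $(x_3,b)$, $(x_4,a)$ and mass 1/8 on $(x_2,a)$, $(x_2,b)$. The grid search is only an illustrative lower-bound check on a coarse grid; the names `SUPPORT`, `GRID`, `error` and `linf` are hypothetical helpers, not notation from the text.

```python
from itertools import combinations_with_replacement

# Support of P in Example 4.5; each point carries mass 1/4.
SUPPORT = [(-1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, -1.0)]

def linf(p, q):
    """l_infinity distance in the plane."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def error(centers):
    """E min_{a in centers} ||X - a|| (order r = 1) for X distributed as P."""
    return sum(0.25 * min(linf(x, c) for c in centers) for x in SUPPORT)

# The pair a = (-1, 0), b = (1, 0) discussed in the example.
a, b = (-1.0, 0.0), (1.0, 0.0)
err_ab = error([a, b])

# Illustrative brute force over all pairs from a coarse grid (step 1/4).
GRID = [(i / 4, j / 4) for i in range(-6, 7) for j in range(-6, 7)]
best_on_grid = min(error(pair) for pair in combinations_with_replacement(GRID, 2))

# Transport cost of the coupling mu: entries are (support point, center, mass).
coupling = [((-1.0, 0.0), a, 0.25), ((1.0, 0.0), b, 0.25), ((0.0, -1.0), a, 0.25),
            ((0.0, 1.0), a, 0.125), ((0.0, 1.0), b, 0.125)]
cost_mu = sum(m * linf(x, c) for x, c, m in coupling)

print(err_ab, best_on_grid, cost_mu)
```

All three printed values agree with $V_{2,1}(P) = 1/2$: the pair $\{a,b\}$ attains it, no grid pair does better, and the coupling with marginal $Q = \frac58\delta_a + \frac38\delta_b$ has the same cost.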
4.6 Remark (Euclidean norms)
Let $\|x\| = \langle x, x \rangle^{1/2}$ for some scalar product $\langle \cdot , \cdot \rangle$ on $\mathbb{R}^d$.
(a) We have
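\[ \sum_{a \in \alpha} a \, P(W(a \mid \alpha)) = EX. \]
In case $d = 1$, a simple but sometimes useful consequence is that $\min \alpha < EX < \max \alpha$ holds for $\alpha \in SS_{n,2}(P)$ ($n \ge 2$).
(c) If $\alpha \in SS_{n,2}(P)$, then
\[ E \min_{a \in \alpha} \|X - a\|^2 = V_2(X) + \|EX\|^2 - \sum_{a \in \alpha} \|a\|^2 P(W(a \mid \alpha)). \]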
4.7 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation. Then
\[ S_{n,r}(T(X)) = T S_{n,r}(X), \]
\[ SS_{n,r}(T(X)) = T SS_{n,r}(X). \]
Proof
Easy consequence of the equivariance properties of Voronoi regions and Cr(X) given
in Lemmas 1.6 and 2.1 (a). []
Stationary product quantizers are discussed in the following lemma.
4.8 Lemma (Product quantizers)
Let the underlying norm be the $l_r$-norm. Let $n_i \in \mathbb{N}$, $\beta_i \subset \mathbb{R}$ with $|\beta_i| = n_i$, $1 \le i \le d$, and $\alpha = \times_{i=1}^d \beta_i$.
Proof
(a) Let $P_i$ denote the distribution of $X_i$, $1 \le i \le d$. For $a = (b_1, \ldots, b_d) \in \alpha$ with $b_i \in \beta_i$ for every $i$, we have
\[ W(a \mid \alpha) = \times_{i=1}^d W(b_i \mid \beta_i). \]
d
1-I P~(W(bd,6~)) = P ( W ( a l a ) ) > 0, a = (b~,... ,bd) C o~
i=1
J=lw(a[a)
f f
wC~la)
IIx - all" dP(x)
and hence
Since for ci C R
f
f ]xi - cd'-dP~(xi)/P~(W(bil~i)) = [ [x~ - c~l'-dP(x)/P(W(alt~)),
W(bd~O w(~l~)
Conversely, assume j3i E Sn,,~(Xi) for every i. Then [a[ = n and P(W(a[a)) > 0. Let
One obtains
C ---- ( C 1 , . . . , Cd) E N d.
w(ala) ~=1w(bil~)
d
f
I,:1 W(bil,5. )
Ix'-c~[r dPi(x~)/P'(W(b~[/3~))
= / IIx-cll~dP(x)
w(~l,~)
and hence a E C~(P(.IW(alo~))). Therefore, oc E Sn,r(X).
(b) We have
d
E rain IIX - all ~ = ~ E min IXi - b r
aEa ~ bE~i
i=1
d
=
i=1
[]
The n-stationary sets for $P$ are related to the stationary points of the function $\psi_{n,r}$ (see (3.4)).
4.9 Lemma
$\psi_{n,r}$ is continuous on $(\mathbb{R}^d)^n$.
Proof
Immediate consequence of the continuity of $(a_1, \ldots, a_n) \mapsto \min_{1 \le i \le n} \|x - a_i\|^r$ for every $x \in \mathbb{R}^d$ and Lebesgue's dominated convergence theorem. []
4.10 Lemma
Let $a_1, \ldots, a_n \in \mathbb{R}^d$ with $a_i \ne a_j$ for $i \ne j$. Suppose the Voronoi diagram of $\alpha = \{a_1, \ldots, a_n\}$ is a P-tesselation of $\mathbb{R}^d$. Then $\psi_{n,r}$ has a one-sided directional derivative at $a = (a_1, \ldots, a_n)$ in every direction $y = (y_1, \ldots, y_n) \in (\mathbb{R}^d)^n$ given by
If the underlying norm is smooth and furthermore, $r > 1$ or $P(\alpha) = 0$, then $\psi_{n,r}$ is differentiable at the point $a$ with derivative
Proof
Recall t h a t
For b = (bl,...,b,~) • (Re) '~ set d(x,b) = minl_<i_<,~I1= - bdl. By assumption, the
V o r o n o i d i a g r a m of a is a P - t e s s e l a t i o n of R e which gives P ( 0 Wo(ail°~)) = 1.
i=I
Furthermore, we have
< m a x Ilbdl
- l<i<,~
and since
we obtain
x E Rd,b E (Rd) '~ with maxt<i<,~ [Ibi[I < 1 for numerical constants C1,C2 >_ 0 not
depending on x and b.
Let y = ( Y l , - - - , Y,) • (Rd) '~. Then
t-l(¢~,r(a+ty)-¢~,r(a)) = ~ / t-l(d(x'a+ty)r-d(x'a)~)de(x)"
i=lw0(ad~)
Thus the assertion about the one-sided directional derivative of ¢~,r follows from
Lebesgue's dominated convergence theorem in view of (4.2).
Now assume t h a t the underlying norm is smooth. Then we have
I. General properties of the quantization for probability distributions
/ .I,,I'I°,-.l,-I.l,,,))
~=i W(~la)\{~}
P
(lmi~,~_._. ]lbjH) -1 ~ J/ (d(x, a + b) r - d(x, a) r - (V H H"(a, - x),b,)) d P ( x )
d
where ( x , z ) = ~ x j z j , x , z • R d. For x • Wo(aila), there exists ~ > 0 such that the
j=l
Rd-components of a + b are pairwise different and
In view of (4.2), the assertion about $\nabla \psi_{n,r}(a)$ follows from Lebesgue's dominated convergence theorem. []
Consequently, in view of Lemma 2.5, n-stationary sets $\alpha \in SS_{n,r}(P)$ of centers provide stationary points of $\psi_{n,r}$, i.e. $\nabla^+ \psi_{n,r}(a, y) \ge 0$ for every $y \in (\mathbb{R}^d)^n$. The following example taken from Lloyd (1982) shows that an n-stationary set of centers does not necessarily yield a local minimum point of $\psi_{n,r}$.
4.11 Example
Let $P = c_1 U([-1, 0]) + c_2 U([0, 1])$ with $c_2 > c_1 > 0$, $c_1 + c_2 = 1$, and let $n = 2$, $r = 2$. Then $EX = \frac12 (c_2 - c_1)$, $EX^2 = \frac13$ and $\{-\frac12, \frac12\} \in S_{2,2}(P)$. For $-1 < a_1 < a_2 < 1$, we have
{{ :}} if..,:
S2,2(P) = C2,2(P) U
{{ 11}}
- , if cz > ~.
The next theorem ensures the existence of n-optimal quantizers. We follow the lines
of Pollard's (1982a) proof for the Euclidean case and $r = 2$.
4.12 Theorem (Existence)
We have $V_{n,r}(P) < V_{n-1,r}(P)$. The level set $\{\psi_{n,r} \le c\}$ is compact for every $0 \le c < V_{n-1,r}(P)$. In particular, $C_{n,r}(P)$ is not empty and $\bigcup \{\alpha : \alpha \in C_{n,r}(P)\}$ is a bounded subset of $\mathbb{R}^d$.
Proof
By Lemma 4.9, the level sets of $\psi_{n,r}$ are closed. Choose $0 < s < S$ (depending on $n$, $r$, $P$ and $c$) such that
\[ P(B(0,s)) > 0, \qquad (S - s)^r P(B(0,s)) > c, \]
\[ 2^r \int_{B(0,2S)^c} \|x\|^r \, dP(x) < V_{n-1,r}(P) - c. \]
Let $(a_1, \ldots, a_n) \in \{\psi_{n,r} \le c\}$. Since $c < V_{n-1,r}(P)$, we have $a_i \ne a_j$ for $i \ne j$. Assume without loss of generality $\|a_1\| \le \ldots \le \|a_n\|$. Then $\|a_1\| \le S$. Otherwise
Hence, if $V_{n,r}(P) < c < V_{n-1,r}(P)$, the level set $\{\psi_{n,r} \le c\}$ is not empty and compact. This implies that $\{\psi_{n,r} = V_{n,r}(P)\}$ is not empty and compact and, in particular, $C_{n,r}(P) \ne \emptyset$ provided $V_{n,r}(P) < V_{n-1,r}(P)$.
Finally, we observe that this condition holds. We have $C_r(P) \ne \emptyset$ by Lemma 2.2. Therefore, $V_{2,r}(P) < V_r(P)$, since otherwise there exists a 2-optimal set $\alpha$ of centers with $|\alpha| = 1$ which contradicts Theorem 4.1. Proceed inductively: if $V_{m,r}(P) < V_{m-1,r}(P)$ for some $2 \le m \le n - 1$, then $C_{m,r}(P) \ne \emptyset$ by the preceding part of the proof and hence, $V_{m+1,r}(P) < V_{m,r}(P)$ again by Theorem 4.1. []
From Example 4.5 we know that there may be more than one n-optimal set of centers
for $P$. Here is another example of this fact for a univariate symmetric distribution.
4.13 Example
Let $P$ denote the uniform distribution on $[-2,-1] \cup [1,2]$ and let $n = 3$, $r = 2$. Then
The following simple properties of the n-th quantization error functional turn out to
be useful.
4.14 Lemma
Let $P = \sum_{i=1}^m s_i P_i$, $s_i \ge 0$, $\sum_{i=1}^m s_i = 1$, $\int \|x\|^r \, dP_i(x) < \infty$.
Proof
(a) Let a • C~,r(P). T h e n
= ~ s, fminllz-a[I r dPi(x)
/=1 J age
__
i=1
gt~
i=1
[]
Let X = ( X 1 , . . . , Xd).
d
<
i=t
and equality holds if and only if there exists $\alpha \in C_{n,r}(X)$ of the type $\alpha = \times_{i=1}^d \beta_i$ with $\beta_i \subset \mathbb{R}$ and $|\beta_i| = n_i$ for every $i$. Moreover, such n-optimal product sets $\alpha$ satisfy $\beta_i \in C_{n_i,r}(X_i)$ for every $i$.
Proof
d
For i < i < d, let fli C R with Iflil <- ni and let ol = Xid=l fli . T h e n [c~l _< 1-Ini _< n
i=1
and
d
Y~,r(X) < E ~ n l l X - alF = ~-" E min I X / - b F.
i=l
d
<
(cf. L e m m a 4.8 (b)) and if equality holds, then a E C,~,~(X). In particular, In[ = n
by Theorem 4.1 which gives lfli[ = ni for every i. Conversely, assume a ~ C,~,r(X).
Then
d d
d
implying Vn,r(X) = ~ Vm,~(Xi ) and fli C Cm,r(Xi ) for every i. []
i=1
Ball packings consisting of $n$ translates of a ball minimize the normalized n-th quantization error for bounded sets. This observation extends the corresponding statement of Lemma 2.9 for balls to the case $n \ge 2$. By a $\mu$-packing in $\mathbb{R}^d$ we mean a countable family $\{C_j : j \in \mathbb{N}\}$ of Borel sets $C_j \subset \mathbb{R}^d$ such that $\mu(C_i \cap C_j) = 0$ for $i \ne j$. A $\lambda^d$-packing is simply called a packing.
Moreover,
i=,r(B) = 1)),
(al,..., a=} e
Proof
Let A c B(R d) be bounded with A4(A) > 0 and denote by Q the uniform distribution
U(A). Let C be an n-optimal partition for Q of order r. By Corollary 4.3 we know
that IC]-- n a n d Q ( C ) > 0 for every C E C. Note that Q(-]C) = U(A;3C). One
and hence
~_, Q(C) (~+r)/d >_n-~l d.
C~C
This implies
V,~,r(Q) >_ (Ad(A)/n)r/4Mr(B(O, 1))
and
M,~,r(A) >_n-r/dir(B(O, 1)).
Now let B = [J,~=l B(a~, s) and denote by P the uniform distribution U(B). Note
1 n
that P = ~ ~ U(B(ai, s)). By Lemma 4.14 (b), we have
i=1
= Vr(U(B(O, s)))
= (A~(B)/n)~/dM~(B(O, 1)).
The last equality follows from the scale invariance of Mr (see Lemma 2.1). Thus we
obtain
v.(p): f min ilx- ai[Ir dP(x) = Vr(U(B(O, s)))
J 1<i<,~
and
4.4 Examples
We present some examples of optimal quantizers and stationary sets of centers for dimensions $d \ge 2$. Optimal quantizers for several univariate distributions are given in the next section.
4.17 Example (Uniform distribution on a cube and the cube quantizer)
Let $P = U([0,1]^d)$ and consider a tesselation of $[0,1]^d$ consisting of $n = k^d$ translates $C_1, \ldots, C_n$ of the cube $[0, \frac1k]^d$. Denote by $a_i$ the midpoint of $C_i$. Then $\alpha = \{a_1, \ldots, a_n\} = \{\frac{2i-1}{2k} : i = 1, \ldots, k\}^d$ and by the symmetry of $C_i$ about $a_i$, we have $a_i \in C_r(U(C_i)) = C_r(P(\cdot \mid C_i))$. Let $f_n = \sum_{i=1}^n a_i 1_{C_i}$; see Figure 4.2. From scale and translation invariance of $M_r$ it follows that
\[ E\|X - f_n(X)\|^r = \sum_{i=1}^n M_r(C_i) P(C_i)^{(d+r)/d} = n^{-r/d} M_r([0,1]^d). \]
We see that n-level quantization for $P$ reduces $V_r(P) = M_r([0,1]^d)$ at least by a factor $n^{-r/d}$. Note that
\[ M_r([0,1]^d) = \int_{[-\frac12, \frac12]^d} \|x\|^r \, dx \]
and for the $l_\infty$-norm
\[ M_r([0,1]^d) = M_r(B(0,1)) = \frac{d}{(d + r) 2^r}. \]
provided $C_i = [0,1]^d \cap W(a_i \mid \alpha)$ for every $i$. This condition is satisfied, for instance, if the underlying norm is the $l_p$-norm for $1 \le p \le \infty$.
The error of the cube quantizer $f_n$ is of optimal order $n^{-r/d}$ but the constant $M_r([0,1]^d)$ is conjectured to be not optimal for common norms with one exception. (This will be seen from the asymptotics for the n-th quantization error as $n \to \infty$ treated in Chapter II.) The exceptional case concerns the $l_\infty$-norm in arbitrary dimensions. In this case, we have $C_i = B(a_i, \frac{1}{2k})$ and therefore, by Theorem 4.16, $f_n$ is (P-a.s. equal to) an n-optimal quantizer of order $r$ and $\alpha \in C_{n,r}(P)$ for every $r \ge 1$. In particular, for the $l_\infty$-norm we obtain
4.18 Example (Spherical distributions)
Let the underlying norm be the $l_2$-norm on $\mathbb{R}^d$ and let $P$ be a spherical probability, that is, $P$ is invariant under the orthogonal group $O(\mathbb{R}^d)$. Consider the case $r = 2$ and $n = 2$. Suppose $E\|X\|^2 < \infty$. If $\alpha = \{a_1, a_2\} \in SS_{2,2}(X)$ then $0 = EX = \sum_{i=1}^2 a_i P(W(a_i \mid \alpha))$ by Remark 4.6 (b) and hence, one can find $T \in O(\mathbb{R}^d)$ such that $Ta_1 = (c_1, 0, \ldots, 0)$ and $Ta_2 = (c_2, 0, \ldots, 0)$ with $c_1 < 0 < c_2$. By Lemma 4.7, $T(\alpha) \in SS_{2,2}(X)$. Since
one gets $\{c_1, c_2\} \in SS_{2,2}(X_1)$. Note that $X_1$ is symmetric (about the origin). Now we use a uniqueness result which is discussed in the next section. Suppose the distribution of $X_1$ is strongly unimodal. Then it follows from Theorem 5.1 that $S_{2,2}(X_1) = \{\{-E|X_1|, E|X_1|\}\}$. Therefore, $c_1 = -E|X_1|$ and $c_2 = E|X_1|$. This yields
4.19 Example (Uniform distribution on a Euclidean ball)
Let the underlying norm be the $l_2$-norm on $\mathbb{R}^d$ and let $P = U(B(0,1))$. Then $P$ is a spherical distribution. Consider the case $n = 2$ and $r = 2$. The distribution function of $X_1$ is given by
\[ F(t) = P([-1,t] \times \mathbb{R}^{d-1}) = \frac{1}{\lambda^d(B_d(0,1))} \int_{-1}^t \lambda^{d-1}\big(B_{d-1}(0, \sqrt{1 - y^2})\big) \, dy \]
\[ = \frac{\lambda^{d-1}(B_{d-1}(0,1))}{\lambda^d(B_d(0,1))} \int_{-1}^t (1 - y^2)^{(d-1)/2} \, dy = \frac{\Gamma(1 + \frac d2)}{\sqrt{\pi}\, \Gamma(\frac12 + \frac d2)} \int_{-1}^t (1 - y^2)^{(d-1)/2} \, dy, \quad |t| \le 1. \]
We thus see that $X_1$ has a Beta distribution. Since $\log[(1 - y^2)^{(d-1)/2}]$ is concave on $(-1, 1)$, the distribution of $X_1$ is strongly unimodal. So Example 4.18 applies. We have
\[ E|X_1| = \frac{2\Gamma(1 + \frac d2)}{\sqrt{\pi}\, \Gamma(\frac12 + \frac d2)} \int_0^1 y (1 - y^2)^{(d-1)/2} \, dy = \frac{2\Gamma(1 + \frac d2)}{(d + 1)\sqrt{\pi}\, \Gamma(\frac12 + \frac d2)}, \]
\[ EX_1^2 = \frac{1}{d + 2} \]
and hence
\[ V_{2,2}(X) = \frac{d}{d + 2} - (E|X_1|)^2. \]
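The closed forms of Example 4.19 are easy to evaluate. The sketch below assumes the formulas read $E|X_1| = 2\Gamma(1 + d/2) / ((d+1)\sqrt{\pi}\,\Gamma(1/2 + d/2))$ and $V_{2,2}(X) = d/(d+2) - (E|X_1|)^2$; the function names are illustrative only.

```python
import math

def mean_abs_x1(d):
    """E|X_1| for X uniform on the Euclidean unit ball of R^d (assumed closed form)."""
    return (2.0 * math.gamma(1.0 + d / 2.0)
            / ((d + 1) * math.sqrt(math.pi) * math.gamma(0.5 + d / 2.0)))

def v22_ball(d):
    """d/(d+2) - (E|X_1|)^2, the error of the two centers (+-E|X_1|, 0, ..., 0)."""
    return d / (d + 2.0) - mean_abs_x1(d) ** 2

for d in (1, 2, 3):
    print(d, mean_abs_x1(d), v22_ball(d))
```

For $d = 1$ this reproduces the uniform distribution on $[-1,1]$ with centers $\pm\frac12$ and error $\frac1{12}$, which agrees with Example 5.5 after rescaling; for $d = 2$ it gives $E|X_1| = 4/(3\pi)$.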
4.20 Example (d-dimensional standard normal distribution)
Let the underlying norm be the $l_2$-norm and let $P = N_d(0, I_d)$ where $I_d$ is the unit matrix. Let $r = 2$. Here Example 4.18 applies and we obtain
determined 3-optimal set of centers for $X_1$ of order 2 (cf. Theorem 5.1). By Lemma 4.8, $\alpha_1 \in S_{3,2}(X)$ and
\[ E \min_{a \in \alpha_1} \|X - a\|^2 = 1.1902 \]
2 '
with b > 0. Then conv a2 is a equilateral triangle and P(W(a[o~)) = 1/3, a 6 a2. If b
satisfies
W((O,b)]a2)
oo
Note that $\alpha_2$ is considerably better than $\alpha_1$. Flury (1990) provides numerical evidence that $\alpha_2 \in C_{3,2}(X)$.
For $n = 4$, we find three types of 4-stationary sets of centers. Let $\beta_1 = \gamma \times \{0\}$, where $\gamma = \{-c_2, -c_1, c_1, c_2\}$ with $0 < c_1 < c_2$ is the uniquely determined 4-optimal set of centers for $X_1$ of order 2 (cf. Theorem 5.1). Then by Lemma 4.8, $\beta_1 \in S_{4,2}(X)$ and
\[ E \min_{a \in \beta_1} \|X - a\|^2 = V_{4,2}(X_1) + 1. \]
b/2 b/z
Here • denotes the distribution function of N(0, 1). Then for a E/32, a ¢ (0, 0)
aP(W(ai&)) = f xdP(x),
W(al~2)
P(W(al&)) = P(W((O, b)l&) )
oo
= - 1) dN(O,1)(y).
hi2
Since ~ a = (0, 0), this implies
aE~
E maE~
i n l l X - all 2 = 2 - ~ Ilall2P(W(al&))
aGfl2
oo
Gray and Karnin (1982) provide some numerical evidence for their conjecture that $\beta_i$, $i = 1, 2, 3$ are the only 4-stationary sets of centers of order 2 (up to $l_2$-isometries). But a formal proof of this conjecture has not yet been given. Figure 4.3 shows the above stationary sets and the corresponding Voronoi tesselations. (Instead of $\beta_3$, a rotated version of $\beta_3$ is used.)
In three dimensions the product quantizer $\{-\sqrt{2/\pi}, \sqrt{2/\pi}\}^3$ can be improved upon.
For n = 8, Gray and Karnin (1982) give three different configurations that beat
the product quantizer. The authors report simulation results to show that these
quantizers are superior. Iyengar and Solomon (1983) provide similar results based on
numerical integration.
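\[ \Big| \Big( \int \min_{a \in A} \|x - a\|^r \, dP(x) \Big)^{1/r} - \Big( \int \min_{b \in B} \|x - b\|^r \, dP(x) \Big)^{1/r} \Big| \]
(4.5)
\[ \le \Big( \int \Big| \min_{a \in A} \|x - a\| - \min_{b \in B} \|x - b\| \Big|^r dP(x) \Big)^{1/r} \le d_H(A, B). \]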
Figure 4.3: 3- and 4-stationary sets of centers for $P = N_2(0, I_2)$ of order $r = 2$ and Voronoi diagrams with respect to the $l_2$-norm
Let $D_{n,r}(P)$ denote the set of n-optimal quantizing measures for $P \in \mathcal{M}_r$ of order $r$.
4.21 Theorem
Let $\rho_r(P_k, P) \to 0$ for $P_k, P \in \mathcal{M}_r$ and suppose $|\mathrm{supp}(P)| \ge n$, $n \in \mathbb{N}$.
(a) Let $Q_k \in D_{n,r}(P_k)$, $k \in \mathbb{N}$. Then the set of $\rho_r$-cluster points of the sequence $(Q_k)_{k \ge 1}$ is a nonempty subset of $D_{n,r}(P)$ and
(b) Let $\alpha_k \in C_{n,r}(P_k)$, $k \in \mathbb{N}$. Then the set of $d_H$-cluster points of the sequence $(\alpha_k)_{k \ge 1}$ is a nonempty subset of $C_{n,r}(P)$ and
The preceding theorem can be derived from a simple statement for arbitrary metric
spaces.
4.22 Lemma
Let $(M, d)$ be a metric space and let $N \subset M$ be a nonempty subset.
(a) Let $f : N \to \mathbb{R}_+$ be a lower semicontinuous function and suppose the level set
\[ L(c) = \{y \in N : f(y) \le c\} \]
\[ D = \{y \in N : f(y) = \inf_{z \in N} f(z)\} \]
and let $(y_k)_{k \ge 1}$ be a minimizing sequence in $N$ for $f$, i.e., $f(y_k) \to \inf_{z \in N} f(z)$.
\[ d(y_k, D) \to 0, \quad k \to \infty. \]
Proof
(a) Let y • M be the limit of a convergent subsequence (Yk,)n_>l of (Yk)k_>l. Then
y • L(c) c N and
i g f ( z ) = lim f ( y k , ) >_ f(y).
(Yk,)n>_l of (Yk)k>l satisfying d(yk.,D) _> ¢ for every n _> 1 and limya~ E D, a
contradiction.
(b) Since
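4.23 Lemma
Let $B \subset \mathbb{R}^d$ be nonempty and compact. Then
\[ \{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n, \ \alpha \subset B\} \]
is $d_H$-compact and
\[ \{Q \in \mathcal{P}_n : Q(B) = 1\} \]
is $\rho_r$-compact.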
Proof
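The $d_H$-compactness of $\{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n, \ \alpha \subset B\}$ follows immediately from the $d_H$-continuity of the map $(a_1, \ldots, a_n) \mapsto \{a_1, \ldots, a_n\}$, $a_i \in \mathbb{R}^d$. Set $\mathcal{Q} = \{Q \in \mathcal{P}_n : Q(B) = 1\}$. It is clear that $\mathcal{Q}$ is relatively $\rho_r$-compact. To show that $\mathcal{Q}$ is $\rho_r$-closed, let $(Q_k)_{k \ge 1}$ be a sequence in $\mathcal{Q}$ and $Q \in \mathcal{M}_r$ such that $\rho_r(Q_k, Q) \to 0$. Set $\alpha_k = \mathrm{supp}(Q_k)$ and let $\alpha \subset \mathbb{R}^d$, $1 \le |\alpha| \le n$, $\alpha \subset B$ be the limit of a $d_H$-convergent subsequence $(\alpha_{k_j})_{j \ge 1}$ of $(\alpha_k)_{k \ge 1}$. Then for $\varepsilon > 0$ there is a $j_0 \in \mathbb{N}$ such that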
U cU
j>_jo aea
Since
limsupQkj ( U B ( a , ~ ) )
3-~°° aE~
Q(U
<_
aE~
B(a,c)),
Proof of Theorem 4.21
(a) To show that the assertion follows from Lemma 4.22 applied to the metric space $(\mathcal{M}_r, \rho_r)$, $N = \mathcal{P}_n$, and $f = \rho_r(P, \cdot)$, it suffices to verify that
is $\rho_r$-compact for some $c > V_{n,r}(P)^{1/r}$. Choose $c$ such that $V_{n,r}(P)^{1/r} < c < V_{n-1,r}(P)^{1/r}$, where $V_{0,r}(P) = \infty$ (cf. Theorem 4.12). For $Q \in L(c)$ and $\alpha = \mathrm{supp}(Q)$, we have
Note first that $f$ is $d_H$-continuous. This follows from (4.5). Next, consider the level set
\[ L(c) = \{\alpha \in N : f(\alpha) \le c\} \]
for $V_{n,r}(P) < c < V_{n-1,r}(P)$. By Theorem 4.12 (or Lemma 2.2 in case $n = 1$), there is a compact set $B \subset \mathbb{R}^d$ such that
\[ L(c) \subset \{\alpha \in N : \alpha \subset B\}. \]
k
1 inf ~ min [[Xi - a[[L
4.24 Corollary (Consistency)
Let $P \in \mathcal{M}_r$.
Proof
Since $\rho_r(P_k, P) \to 0$ a.s. by the Glivenko-Cantelli theorem for $\rho_r$, the assertions follow from Theorem 4.21 and (4.4). []
Rates of convergence in empirical quantization can be found in Rhee and Talagrand
(1989a), Linder et al. (1994), Bartlett et al. (1998) and Graf and Luschgy (1999c).
Notes
Some material on the issue of this section is contained in Gersho and Gray (1992) and Graf and Luschgy (1994a). Theorem 4.1 belongs to the folklore of this area. Theorem 4.2 seems to be new. The characterization given in Lemma 4.4 is due to Pollard (1982a) for the $l_2$-norm and $r = 2$. The Counterexample 4.5 is new. In case the underlying norm is the $l_2$-norm, the differentiability of $\psi_{n,r}$ (cf. Lemma 4.10) has been proved by Pollard (1982b) for $r = 2$ and for arbitrary $r$ a proof is contained in Pagès (1997). Theorem 4.16 is new. Examples 4.18-4.20 on the quantization of spherical distributions and the d-dimensional standard normal distribution are essentially taken from Gray and Karnin (1982), Iyengar and Solomon (1983), Flury (1990), Tarpey et al. (1995), and Tarpey (1995). See also Tarpey (1998).
Let us mention that n-stationary sets of centers are sometimes called self-consistent
sets.
The central limit problem for n-optimal empirical centers of order $r = 2$ with respect to the $l_2$-norm has been solved by Pollard (1982b) under a uniqueness condition for the n-optimal population centers. A central limit result in a nonregular setting has been given by Serinko and Babu (1992) for the univariate case, $d = 1$, and an extension to non-i.i.d. sampling can be found in Serinko and Babu (1995) for $d = 1$. Hartigan (1978) has conjectured the asymptotic distribution of the empirical quantization error for a special population distribution where the uniqueness condition fails but has given no proof.
Consistency results for a quantization (clustering) procedure based on a projection pursuit technique can be found in Stute and Zhu (1995). Stability and consistency results for a trimmed version of the quantization problem are contained in Cuesta-Albertos et al. (1997) and a central limit theorem for trimmed quantizers has been given by García-Escudero et al. (1999).
Theorem 4.1 provides the basis for the famous Lloyd algorithm used to design quan-
tizers. To construct an approximation to an n-stationary set of centers for P of order
r the iterative method proceeds as follows:
Let $\varepsilon > 0$ be given.
Step 1.
Choose an initial set $\alpha^{(0)}$ of $n$ points in $\mathbb{R}^d$, calculate $e_0 = E \min_{a \in \alpha^{(0)}} \|X - a\|^r$.
Step 2.
Determine a Voronoi partition $\mathcal{A}^{(i)}$ with respect to $\alpha^{(i)}$.
Step 3.
For each set $A \in \mathcal{A}^{(i)}$ with $P(A) > 0$ choose a center $a_A$ for the conditional probability $P(\cdot \mid A)$ of order $r$ and set $\alpha^{(i+1)} = \{a_A : A \in \mathcal{A}^{(i)}\}$.
Step 4.
Calculate $e_{i+1} = E \min_{a \in \alpha^{(i+1)}} \|X - a\|^r$. If $(e_i - e_{i+1}) < \varepsilon e_i$ then stop. Otherwise increase $i$ by one and repeat Steps 2, 3 and 4.
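Steps 1 to 4 can be sketched for the one case where they are directly computable, a discrete probability with finite support. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it fixes the Euclidean norm and $r = 2$ (so that the center of $P(\cdot \mid A)$ is the conditional mean), breaks Voronoi ties by lowest index, and uses $\le$ instead of $<$ in the stopping rule so that an error of exactly zero also terminates. The function name `lloyd` and its parameters are hypothetical.

```python
def lloyd(points, weights, centers, eps=1e-9, max_iter=1000):
    """Lloyd's method I for a discrete P (finite support), Euclidean norm, r = 2."""
    dim = len(points[0])

    def dist2(p, c):
        return sum((p[k] - c[k]) ** 2 for k in range(dim))

    def error(cs):
        # E min_{a in cs} ||X - a||^2 for the discrete distribution.
        return sum(w * min(dist2(p, c) for c in cs) for p, w in zip(points, weights))

    centers = list(centers)
    e = error(centers)                                  # Step 1
    for _ in range(max_iter):
        # Step 2: Voronoi partition; ties go to the center with the lowest index.
        cells = [[] for _ in centers]
        for p, w in zip(points, weights):
            j = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            cells[j].append((p, w))
        # Step 3: conditional means of the cells; cells with P(A) = 0 are dropped.
        new_centers = []
        for cell in cells:
            mass = sum(w for _, w in cell)
            if mass > 0:
                new_centers.append(tuple(sum(w * p[k] for p, w in cell) / mass
                                         for k in range(dim)))
        # Step 4: stop when the relative decrease of the error is at most eps.
        e_new = error(new_centers)
        centers = new_centers
        if e - e_new <= eps * e:
            return centers, e_new
        e = e_new
    return centers, e

# Four equally likely points on the line, started from a poor initial set.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
weights = [0.25] * 4
centers, err = lloyd(points, weights, [(0.0,), (1.0,)])
print(sorted(centers), err)
```

In this small example the iteration reaches the 2-stationary set $\{0.5, 10.5\}$ with error $0.25$ after two updates. In general the method only guarantees a non-increasing error, not a global optimum.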
This algorithm was independently discovered by Steinhaus (1956) and Lloyd in 1957 (see Lloyd 1982). It is often called Lloyd's method I, since Lloyd developed a second type of algorithm (Method II) to design quantizers in the one-dimensional case.
Many people rediscovered Lloyd's method later on. For a description of the history
of the algorithm we refer the reader to Gray and Neuhoff (1998). As it stands the
algorithm is hard to use in practice. But if P is a discrete probability with finite
support then the above algorithm can immediately be applied. The properties of
Lloyd's algorithm in the context of general deterministic descent algorithms have been
discussed in Sabin and Gray (1986). Recently Bouton and Pagès (1997) thoroughly
investigated a constant step stochastic gradient descent algorithm for the design of
quantizers which is closely related to the Kohonen algorithms used in the theory of
neural networks.
Mentioning just these two algorithms for the design of quantizers is an arbitrary act
since there exists a vast amount of literature concerning this subject. For a survey
we refer the reader again to Gray and Neuhoff (1998).
5.1 Uniqueness
The following theorem is due to Kieffer (1983). See also Trushkin (1984).
5.1 Theorem (Uniqueness)
If $P$ is strongly unimodal, then $|S_{n,r}(P)| = 1$ for every $n \in \mathbb{N}$, $1 \le r < \infty$.
Strongly unimodal distributions are unimodal about some mode $a \in \mathbb{R}$, i.e., the $\lambda$-density $h$ of $P$ is increasing on $(-\infty, a)$ and decreasing on $(a, \infty)$. Example 4.11 (as well as the subsequent Example 5.2) shows that the assertion of Theorem 5.1 may fail for unimodal distributions.
In view of Lemma 4.7, the unique n-optimal set of centers of order r for a symmetric,
strongly unimodal distribution is symmetric. It is a surprising fact that symmetric,
2-stationary sets of centers may fail to be 2-optimal for symmetric, unimodal (abso-
lutely continuous) distributions. This is illustrated by the following example taken
from Abaya and Wise (1981). The same phenomenon occurs for truncated Cauchy
distributions, hyper-exponential distributions and for certain variance mixtures of
normal distributions. See Karlin (1982), Tarpey (1994) and Flury (1990).
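5.2 Example
Let $P = h\lambda$ with
\[ h(x) = \begin{cases} \frac{5}{12} - \frac{|x|}{3}, & |x| \le 1, \\ \frac{7 - |x|}{72}, & 1 \le |x| \le 7, \\ 0, & |x| > 7. \end{cases} \]
$P$ is symmetric and unimodal about 0. Let $n = 2$ and $r = 2$. Then $V_2(P) = \mathrm{Var}\, X = \frac{101}{18} = 5.611\ldots$ and it is easily verified that $S_{2,2}(P) = \{\alpha_1, \alpha_2, \alpha_3\}$ with
One obtains
\[ E \min_{a \in \alpha_1} |X - a|^2 = E \min_{a \in \alpha_2} |X - a|^2 = \frac{47}{18} = 2.611\ldots, \]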
5. Uniqueness and optimality in one dimension 65
\[ E \min_{a \in \alpha_3} |X - a|^2 = \frac{3551}{1296} = 2.739\ldots \]
(Use the formula of Remark 4.6 (c).) Hence, $C_{2,2}(P) = \{\alpha_1, \alpha_2\}$ and $V_{2,2}(P) = \frac{47}{18}$; see Figure 5.1. We see that the symmetric, 2-stationary set $\alpha_3$ is not 2-optimal. It is the sharp peak of the density which causes asymmetric optimal sets of centers (and prevents $P$ from being strongly unimodal).
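The three numerical values of Example 5.2 can be reproduced with exact rational arithmetic. The sketch below rests on an assumption about the density, namely $h(x) = \frac{5}{12} - \frac{|x|}{3}$ for $|x| \le 1$ and $h(x) = \frac{7 - |x|}{72}$ for $1 \le |x| \le 7$; this reading is consistent with $\mathrm{Var}\,X = 101/18$. The cut points $m = 0$ and $m = 1$ and the helper names are illustrative choices, and the error is computed with the identity of Remark 4.6 (c), which holds when each center is the conditional mean of its cell.

```python
from fractions import Fraction as Fr

# Assumed density on x >= 0 (extended symmetrically): pieces (lo, hi, c0, c1)
# mean h(x) = c0 + c1 * x on [lo, hi].
PIECES = [(Fr(0), Fr(1), Fr(5, 12), Fr(-1, 3)),
          (Fr(1), Fr(7), Fr(7, 72), Fr(-1, 72))]

def moment(k, lo, hi):
    """Exact integral of x^k h(x) over [lo, hi] for 0 <= lo <= hi."""
    total = Fr(0)
    for p_lo, p_hi, c0, c1 in PIECES:
        a, b = max(lo, p_lo), min(hi, p_hi)
        if a < b:
            total += c0 * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
            total += c1 * (b ** (k + 2) - a ** (k + 2)) / (k + 2)
    return total

SECOND_MOMENT = 2 * moment(2, Fr(0), Fr(7))   # E X^2 = Var X, because E X = 0

def two_cells(m):
    """Conditional means of (-inf, m] and [m, inf) for m >= 0, and the error."""
    mass_r, first_r = moment(0, m, Fr(7)), moment(1, m, Fr(7))
    mass_l, first_l = 1 - mass_r, -first_r     # total mass 1 and E X = 0
    a_l, a_r = first_l / mass_l, first_r / mass_r
    err = SECOND_MOMENT - a_l ** 2 * mass_l - a_r ** 2 * mass_r
    return a_l, a_r, err

for m in (Fr(0), Fr(1)):
    a_l, a_r, err = two_cells(m)
    # A pair of conditional means is 2-stationary when its midpoint equals m.
    print(m, a_l, a_r, err, (a_l + a_r) / 2 == m)
```

Under this assumed density the cut at $m = 0$ gives the symmetric pair $\pm\frac{61}{36}$ with error $\frac{3551}{1296}$, while the cut at $m = 1$ gives the asymmetric pair $\{-1, 3\}$ with error $\frac{47}{18}$; by symmetry $\{-3, 1\}$ has the same error.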
$\alpha \subset (0, \infty)$. This follows from Theorem 4.1 since $\bigcup_{a \in \alpha} W(a \mid \alpha \cup (-\alpha)) = [0, \infty)$.
(b) We have $V_{n,r}(P) \le V_{k,r}(Q)$, $n = 2k$, and equality holds if and only if there exists a symmetric set $\beta \in C_{n,r}(P)$. In this case, $\alpha \in C_{k,r}(Q)$ implies $\alpha \cup (-\alpha) \in C_{n,r}(P)$.
In fact, for $\alpha \subset (0, \infty)$, $|\alpha| \le k$,
\[ = \int \min_{a \in \alpha} |x - a|^r \, dQ(x). \]
Choosing $\alpha \in C_{k,r}(Q)$ gives $V_{n,r}(P) \le V_{k,r}(Q)$ and if $V_{n,r}(P) = V_{k,r}(Q)$, then $\beta = \alpha \cup (-\alpha) \in C_{n,r}(P)$. Conversely, if $\beta \in C_{n,r}(P)$ is symmetric and $\alpha = \beta \cap (0, \infty)$,
then
As yet only few examples of distributions $P$ which are not strongly unimodal but satisfy $|S_{n,r}(P)| = 1$ are known. See Fort and Pagès (1999) and for the particularly simple case $n = 2$ and $r = 2$, Yamamoto and Shinozaki (1999).
\[ W(a_1 \mid \alpha) = (-\infty, m_1], \]
\[ W(a_i \mid \alpha) = [m_{i-1}, m_i], \quad 2 \le i \le n - 1, \]
\[ W(a_n \mid \alpha) = [m_{n-1}, \infty). \]
We assume that $P$ is continuous so that the boundaries of the Voronoi regions have P-measure zero. Let $F$ denote the distribution function of $P$. By Lemma 2.5, we have $\alpha \in S_{n,r}(P)$ if and only if $P(W(a_i \mid \alpha)) > 0$ for every $i$ and
al ml
--00 ~l
f
mn-1 an
\[ 2F(a_1) = F(m_1), \]
(5.2) \[ 2F(a_i) = F(m_i) + F(m_{i-1}), \quad 2 \le i \le n - 1, \]
\[ 2F(a_n) = 1 + F(m_{n-1}). \]
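One obtains, for instance, $\{F^{-1}(\frac14), F^{-1}(\frac34)\} \in S_{2,1}(P)$ for symmetric probabilities $P$. ($F^{-1}(y) = \inf\{x \in \mathbb{R} : F(x) \ge y\}$, $y \in (0,1)$.)
In case $r = 2$, (5.1) takes the form
\[ a_1 F(m_1) = \int_{-\infty}^{m_1} x \, dP(x), \]
(5.3) \[ a_i [F(m_i) - F(m_{i-1})] = \int_{m_{i-1}}^{m_i} x \, dP(x), \quad 2 \le i \le n - 1, \]
\[ a_n [1 - F(m_{n-1})] = \int_{m_{n-1}}^{\infty} x \, dP(x). \]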
both cases it is enough to solve the first (or last) k equations of (5.1).
A remarkable property appears for the exponential distribution. It is the content of
part (a) of the following proposition.
5.4 Proposition
(a) Let $P = E(c)$ and let $\alpha = \{a_1, \ldots, a_n\} \in C_{n,r}(P)$ with $a_1 < \ldots < a_n$. Then
\[ V_{n,r}(P) = a_1^r. \]
(b) Let $P = DE(c)$ and let $\alpha = \{a_1, \ldots, a_n\} \in C_{n,r}(P)$ with $a_1 < \ldots < a_n$. Then
\[ V_{n,r}(P) = a_{k+1}^r, \quad \text{if } n = 2k, \]
\[ V_{n,r}(P) = r c^r \int_0^{a_{k+2}/2c} x^{r-1} e^{-x} \, dx, \quad \text{if } n = 2k + 1. \]
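Part (a) of Proposition 5.4 lends itself to a numerical check. The following sketch takes $c = 1$ and $r = 2$, finds the stationary set of $E(1)$ by the fixed-point (Lloyd) iteration on the conditional means, and compares the resulting error with $a_1^2$. The starting points, the iteration count and the helper names are assumptions made for illustration; uniqueness of the stationary set (Theorem 5.1) is what justifies treating the limit as the n-optimal set.

```python
import math

def cond_mean_exp(lo, hi):
    """E[X | lo <= X <= hi] for X ~ E(1); hi may be math.inf."""
    if hi == math.inf:
        return lo + 1.0
    num = (lo + 1.0) * math.exp(-lo) - (hi + 1.0) * math.exp(-hi)
    return num / (math.exp(-lo) - math.exp(-hi))

def cell_error_exp(a, lo, hi):
    """Integral of (x - a)^2 e^{-x} over [lo, hi], using the antiderivative -G."""
    def G(x):
        if x == math.inf:
            return 0.0
        return math.exp(-x) * ((x - a) ** 2 + 2.0 * (x - a) + 2.0)
    return G(lo) - G(hi)

def lloyd_exp(n, iters=5000):
    """Fixed-point iteration for the n-stationary set of E(1) of order 2."""
    a = [float(i + 1) for i in range(n)]           # arbitrary increasing start
    for _ in range(iters):
        bounds = [0.0] + [(a[i] + a[i + 1]) / 2 for i in range(n - 1)] + [math.inf]
        a = [cond_mean_exp(bounds[i], bounds[i + 1]) for i in range(n)]
    bounds = [0.0] + [(a[i] + a[i + 1]) / 2 for i in range(n - 1)] + [math.inf]
    err = sum(cell_error_exp(a[i], bounds[i], bounds[i + 1]) for i in range(n))
    return a, err

for n in (1, 2, 3, 4):
    centers_n, err_n = lloyd_exp(n)
    print(n, centers_n[0], err_n, centers_n[0] ** 2)
```

For $n = 1$ the center is the mean 1 and the error is the variance 1; for $n = 2$ the smallest center is about 0.5936 and the error about 0.3524, its square.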
Proof
We may assume without loss of generality that $c = 1$. By Theorem 5.1, $\alpha$ is the unique n-stationary set of order $r$.
(a) We have
Vn,~(P) = E min IX - a, F
l<_i<_n
ml u-i rni
= f o [ x - a l l r e - ~ d x + i ~ 2f ~ _ ._ 'x-ai]~e-~dx
+ j I x - a, lre -xdx
ran--1
al O4
n--1 rnri oo
i=1
ai an
Therefore
q- r ~ / ( x - ai)r-le-X dx q- r / ( x - an)r-le-X dx -
i:1 o4 an
al ai
V,,,r(P) = a r1.
(b) If $n = 2k$, the assertion follows from (a) and Remark 5.3. Now let $n = 2k + 1$. Then
mk+l k mk+i
+ i Ix - anlre-x dx
mk+l ]g~-I ak+i
- i'"-'"+z i
0 i~2m.k+i_ 1
k rn~÷i oo
q-i~2 ] (x-ak+i)re-xdxq- f
= ak+i an
5.5 Example (Uniform distribution)
Let $P = U([0,1])$ and let $\alpha = \{\frac{2i - 1}{2n} : i = 1, \ldots, n\}$. By Example 4.17, $\alpha \in C_{n,r}(P)$ for every $r \ge 1$ and
\[ V_{n,r}(P) = \frac{1}{n^r (1 + r) 2^r}. \]
Since $P$ is strongly unimodal, $\alpha$ is the unique set of n-optimal centers of order $r$.
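The closed form of Example 5.5 can be compared with a direct numerical integration. This is a small sketch, assuming the centers are the midpoints $(2i-1)/(2n)$ and the error reads $V_{n,r}(P) = 1/(n^r (1+r) 2^r)$; the midpoint rule, the grid size and the function names are illustrative choices.

```python
def uniform_error(n, r, cells_per_half=1000):
    """Midpoint-rule value of E min_i |X - a_i|^r for X ~ U([0,1])."""
    centers = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]
    # The grid is aligned with the centers and the cell boundaries i/n.
    N = 2 * n * cells_per_half
    total = 0.0
    for j in range(N):
        x = (j + 0.5) / N
        total += min(abs(x - a) for a in centers) ** r
    return total / N

def closed_form(n, r):
    """1 / (n^r (1 + r) 2^r), the value stated in Example 5.5."""
    return 1.0 / (n ** r * (1 + r) * 2 ** r)

for n in (1, 2, 5):
    for r in (1, 2, 3):
        print(n, r, uniform_error(n, r), closed_form(n, r))
```

For instance $n = 1$, $r = 2$ gives the variance $\frac1{12}$ of $U([0,1])$, and each doubling of $n$ divides the error by $2^r$.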
5.6 Example (Double exponential distribution)
Let $P = DE(1)$, that is, the $\lambda$-density $h$ of $P$ is given by $h(x) = \frac12 \exp(-|x|)$. Then $F(x) = \frac12 \exp(x)$ for $x \le 0$ and $P$ is symmetric and strongly unimodal. Let $r = 1$ and note that $V_1(P) = E|X| = 1$. For this distribution there exists a closed-form solution of (5.2). Let $y_i = \exp(a_i/2)$. For $n = 2k$, the first $k$ equations of (5.2) take the form
\[ 2y_1 = y_2, \]
\[ 2y_i = y_{i+1} + y_{i-1}, \quad 2 \le i \le k - 1, \]
\[ 2y_k^2 = 1 + y_{k-1} y_k \quad (y_0 = 0). \]
The solution of this difference equation is given by $y_i = i y_1$, $1 \le i \le k$, with $y_1 = (k^2 + k)^{-1/2}$. One obtains
\[ a_i = 2 \log\Big(\frac{i}{\sqrt{k^2 + k}}\Big), \quad 1 \le i \le k, \]
\[ a_i = 2 \log\Big(\frac{\sqrt{k^2 + k}}{n + 1 - i}\Big), \quad k + 1 \le i \le n. \]
2yl = Y2,
\[ a_i = 2 \log\Big(\frac{i}{k + 1}\Big), \quad 1 \le i \le k, \]
\[ a_i = 2 \log\Big(\frac{k + 1}{n + 1 - i}\Big), \quad k + 2 \le i \le n. \]
In both cases $\{a_1, \ldots, a_n\}$ is the unique set of n-optimal centers for $P$ of order 1. For the quantization error we have by Proposition 5.4
Then $\{a_1, \ldots, a_n\}$ is the unique set of n-optimal centers for $P$ of order $r = 1$. This is a consequence of Example 5.6 and Remark 5.3 (and the scaling property of $C_{n,1}(P)$ given in Lemma 3.2 (a)). Furthermore, by Proposition 5.4
\[ V_{n,1}(P) = c \log\Big(1 + \frac1n\Big). \]
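The closed-form centers of Example 5.6 can be substituted back into the stationarity equations (5.2). The sketch below assumes the centers read $a_i = 2\log(i/\sqrt{k^2+k})$ for $n = 2k$ and $a_i = 2\log(i/(k+1))$ for $n = 2k+1$ on the negative side, extended by symmetry, with middle center 0 in the odd case; the helper names are illustrative.

```python
import math

def de_cdf(x):
    """Distribution function of DE(1)."""
    return 0.5 * math.exp(x) if x <= 0 else 1.0 - 0.5 * math.exp(-x)

def de_centers(n):
    """Assumed closed-form centers of order 1 for DE(1)."""
    k = n // 2
    if n % 2 == 0:
        left = [2.0 * math.log(i / math.sqrt(k * k + k)) for i in range(1, k + 1)]
        return left + [-a for a in reversed(left)]
    left = [2.0 * math.log(i / (k + 1)) for i in range(1, k + 1)]
    return left + [0.0] + [-a for a in reversed(left)]

def residuals(centers):
    """Left side minus right side of each equation in (5.2)."""
    n = len(centers)
    mids = [(centers[i] + centers[i + 1]) / 2 for i in range(n - 1)]
    F_lo = [0.0] + [de_cdf(m) for m in mids]     # F(m_{i-1}), with F(-inf) = 0
    F_hi = [de_cdf(m) for m in mids] + [1.0]     # F(m_i), with F(+inf) = 1
    return [2.0 * de_cdf(a) - lo - hi for a, lo, hi in zip(centers, F_lo, F_hi)]

for n in range(1, 7):
    print(n, de_centers(n), max(abs(x) for x in residuals(de_centers(n))))
```

Each center is the median of its Voronoi cell, so all residuals vanish up to rounding. For $n = 2$ the centers are $\pm\log 2$, the median of $E(1)$ reflected about the origin, in line with Remark 5.3.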
The equations (5.2) and (5.3) have been solved using MATHEMATICA for various strongly unimodal distributions. The numerical solutions are given in the Tables 5.1-5.12. (In case $P = E(1/\log 2)$, $r = 1$ and $P = DE(1)$, $r = 1$, we obtained coincidence with the exact solutions given in Examples 5.6 and 5.7 up to 5 decimal places.) The behaviour of $V_{n,r}(P)$ reflects the value of the r-th quantization coefficient of $P$ introduced in Chapter II.
Notes
In the case of smooth densities and $r = 2$, Theorem 5.1 is due to Fleischer (1964), who provided the first uniqueness result. A proof of Theorem 5.1 (for $r = 2$) based
on the "mountain pass theorem" has been given by Lamberton and Pagès (1996). See Cohort (1997) for a detailed exposition. The property of the n-th quantization error for the exponential distribution described in Proposition 5.4 is new. In the non-quantization setting $n = 1$ it has been noticed by Gilat (1988). Example 5.6 is essentially contained in Williams (1967). However, Williams intends to find n-optimal centers of order $r = 2$, but he deals with the equations (5.2) which correspond to $r = 1$. Example 5.7 is new. Tables of n-optimal sets of centers of order $r = 2$ for the normal distribution $N(0,1)$, the double exponential distribution $DE(1/\sqrt{2})$, the exponential distribution $E(1)$, and the Rayleigh distribution $W(\sqrt{2}, 2)$ (cf. Tables 5.1, 5.3-5.6) can also be found in Cox (1957), Max (1960), Lloyd (1982), Fang and Wang (1994), Adams and Giesler (1978), and Pearlman and Senge (1979).
n               1        2        3        4        5        6        7        8
a1              0  -0.7979  -1.2240  -1.5104  -1.7241  -1.8936  -2.0334  -2.1520
a2                  0.7979        0  -0.4528  -0.7646  -1.0001  -1.1882  -1.3439
a3                           1.2240   0.4528        0  -0.3177  -0.5606  -0.7560
a4                                    1.5104   0.7646   0.3177        0  -0.2451
a5                                             1.7241   1.0001   0.5606   0.2451
a6                                                      1.8936   1.1882   0.7560
a7                                                               2.0334   1.3439
a8                                                                        2.1520
V_{n,2}         1   0.3634   0.1902   0.1175   0.0799   0.0580   0.0440   0.0345
Table 5.1: n-optimal centers and n-th quantization error for the normal distribution
N(0,1) of order r = 2
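The entries of Table 5.1 can be reproduced approximately with a fixed-point iteration. The sketch below uses Lloyd's iteration (alternate between midpoints of neighbouring centers and conditional means of the cells), which is a swapped-in technique and not the MATHEMATICA computation used for the tables; the function names and the starting grid are our own choices. For the standard normal density the conditional mean of a cell $[u, v]$ is $(\varphi(u) - \varphi(v))/(\Phi(v) - \Phi(u))$, and at a stationary point the error equals $1 - \sum_i p_i a_i^2$.

```python
import math

def phi(x: float) -> float:
    """Standard normal density; evaluates to 0.0 at +/- infinity."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lloyd_normal(n: int, iterations: int = 2000):
    """Approximate n-optimal centers of order 2 for N(0,1) by Lloyd's iteration."""
    # Hypothetical starting configuration: n equally spaced points in [-2, 2].
    centers = [-2.0 + 4.0 * (i + 0.5) / n for i in range(n)]
    for _ in range(iterations):
        # Cell boundaries are the midpoints between neighbouring centers.
        bounds = ([-math.inf]
                  + [(centers[i] + centers[i + 1]) / 2.0 for i in range(n - 1)]
                  + [math.inf])
        probs = [Phi(bounds[i + 1]) - Phi(bounds[i]) for i in range(n)]
        # Each center is replaced by the conditional mean of its cell.
        centers = [(phi(bounds[i]) - phi(bounds[i + 1])) / probs[i] for i in range(n)]
    # With centers equal to cell means, the mean squared error is E X^2 - sum p_i a_i^2.
    error = 1.0 - sum(p * a * a for p, a in zip(probs, centers))
    return centers, error

for n in (2, 3, 4):
    centers, error = lloyd_normal(n)
    print(n, [round(a, 4) for a in centers], round(error, 4))
```

Since the normal distribution is strongly unimodal, the stationary configuration is unique, so the limit of the iteration should agree with the tabulated n-optimal centers.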
n               1        2        3        4        5        6        7        8
a1              0  -0.7643  -1.2621  -1.6382  -1.9422  -2.1978  -2.4185  -2.6129
a2                  0.7643        0  -0.4569  -0.7947  -1.0671  -1.2971  -1.4971
a3                           1.2621   0.4569        0  -0.3270  -0.5862  -0.8033
a4                                    1.6382   0.7947   0.3270        0  -0.2548
a5                                             1.9422   1.0671   0.5862   0.2548
a6                                                      2.1978   1.2971   0.8033
a7                                                               2.4185   1.4971
a8                                                                        2.6129
V_{n,2}         1   0.4158   0.2307   0.1472   0.1022   0.0752   0.0576   0.0456
n               1        2        3        4        5        6        7        8
a1              0  -0.7071  -1.4142  -1.8340  -2.2537  -2.5535  -2.8533  -3.0867
a2                  0.7071        0  -0.4198  -0.8395  -1.1393  -1.4391  -1.6725
a3                           1.4142   0.4198        0  -0.2998  -0.5996  -0.8330
a4                                    1.8340   0.8395   0.2998        0  -0.2334
a5                                             2.2537   1.1393   0.5996   0.2334
a6                                                      2.5535   1.4391   0.8330
a7                                                               2.8533   1.6725
a8                                                                        3.0867
V_{n,2}         1   0.5000   0.2642   0.1762   0.1198   0.0899   0.0681   0.0545
n               1        2        3        4        5        6        7        8
a1              1   0.5936   0.4240   0.3301   0.2704   0.2290   0.1986   0.1753
a2                  2.5936   1.6112   1.1780   0.9305   0.7697   0.6565   0.5725
a3                           3.6112   2.3652   1.7784   1.4298   1.1972   1.0305
a4                                    4.3652   2.9657   2.2777   1.8574   1.5712
a5                                             4.9657   3.4650   2.7053   2.2313
a6                                                      5.4650   3.8926   3.0792
a7                                                               5.8926   4.2665
a8                                                                        6.2665
V_{n,2}         1   0.3524   0.1797   0.1090   0.0731   0.0524   0.0394   0.0307
n               1        2        3        4        5        6        7        8
a1         1.4142   0.9271   0.7108   0.5847   0.5009   0.4407   0.3950   0.3590
a2                  2.7353   1.8420   1.4269   1.1798   1.0136   0.8932   0.8014
a3                           3.5501   2.4815   1.9577   1.6363   1.4157   1.2537
a4                                    4.1445   2.9772   2.3842   2.0119   1.7523
a5                                             4.6135   3.3829   2.7420   2.3325
a6                                                      5.0012   3.7266   3.0505
a7                                                               5.3318   4.0248
a8                                                                        5.6199
V_{n,2}         1   0.3565   0.1836   0.1120   0.0755   0.0544   0.0410   0.0321
n               1        2        3        4        5        6        7        8
a1         1.9131   1.2657   0.9772   0.8079   0.6947   0.6130   0.5508   0.5016
a2                  2.9313   2.1140   1.7010   1.4421   1.2615   1.1270   1.0222
a3                           3.4848   2.6325   2.1738   1.8745   1.6599   1.4966
a4                                    3.8604   3.0025   2.5237   2.2032   1.9688
a5                                             4.1425   3.2882   2.8001   2.4675
a6                                                      4.3670   3.5197   3.0277
a7                                                               4.5529   3.7136
a8                                                                        4.7111
V_{n,2}         1   0.3408   0.1724   0.1042   0.0698   0.0501   0.0377   0.0294
n               1        2        3        4        5        6        7        8
a1              0  -0.8453  -1.2898  -1.5864  -1.8067  -1.9810  -2.1244  -2.2460
a2                  0.8453        0  -0.4734  -0.7974  -1.0412  -1.2353  -1.3959
a3                           1.2898   0.4734        0  -0.3303  -0.5819  -0.7838
a4                                    1.5864   0.7974   0.3303        0  -0.2540
a5                                             1.8067   1.0412   0.5819   0.2540
a6                                                      1.9810   1.2353   0.7838
a7                                                               2.1244   1.3959
a8                                                                        2.2460
V_{n,1}         1   0.5931   0.4258   0.3331   0.2739   0.2327   0.2024   0.1791
n               1        2        3        4        5        6        7        8
a1              0  -0.7925  -1.2716  -1.6218  -1.9000  -2.1314  -2.3298  -2.5037
a2                  0.7925        0  -0.4609  -0.7925  -1.0542  -1.2716  -1.4581
a3                           1.2716   0.4609        0  -0.3265  -0.5817  -0.7925
a4                                    1.6218   0.7925   0.3265        0  -0.2531
a5                                             1.9000   1.0542   0.5817   0.2531
a6                                                      2.1314   1.2716   0.7925
a7                                                               2.3298   1.4581
a8                                                                        2.5037
V_{n,1}         1   0.6226   0.4569   0.3620   0.3001   0.2564   0.2239   0.1988
n               1        2        3        4        5        6        7        8
a1              0  -0.6931  -1.3863  -1.7918  -2.1972  -2.4849  -2.7726  -2.9957
a2                  0.6931        0  -0.4055  -0.8109  -1.0986  -1.3863  -1.6094
a3                           1.3863   0.4055        0  -0.2877  -0.5754  -0.7985
a4                                    1.7918   0.8109   0.2877        0  -0.2231
a5                                             2.1972   1.0986   0.5754   0.2231
a6                                                      2.4849   1.3863   0.7985
a7                                                               2.7726   1.6094
a8                                                                        2.9957
V_{n,1}         1   0.6931   0.5000   0.4055   0.3333   0.2877   0.2500   0.2231
n               1        2        3        4        5        6        7        8
a1              1   0.5850   0.4150   0.3219   0.2630   0.2224   0.1926   0.1699
a2                  2.5850   1.5850   1.1520   0.9069   0.7485   0.6374   0.5552
a3                           3.5850   2.3219   1.7370   1.3923   1.1635        1
a4                                    4.3219   2.9069   2.2224   1.8074   1.5261
a5                                             4.9069   3.3923   2.6374   2.1699
a6                                                      5.3923   3.8074        3
a7                                                               5.8074   4.1699
a8                                                                        6.1699
V_{n,1}         1   0.5850   0.4150   0.3219   0.2630   0.2224   0.1926   0.1699
n               1        2        3        4        5        6        7        8
a1         1.5958   1.0650   0.8299   0.6922   0.6001   0.5344   0.4826   0.4423
a2                  2.9008   1.9829   1.5572   1.3029   1.1310   1.0058   0.9099
a3                           3.6870   2.6090   2.0824   1.7586   1.5355   1.3709
a4                                    4.2542   3.0882   2.4987   2.1281   1.8688
a5                                             4.6988   3.4774   2.8449   2.4405
a6                                                      5.0646   3.8053   3.1416
a7                                                               5.3754   4.0887
a8                                                                        5.6457
V_{n,1}         1   0.5883   0.4195   0.3265   0.2675   0.2266   0.1966   0.1737
n               1        2        3        4        5        6        7        8
a1         2.2501   1.5347   1.2121   1.0204   0.8906   0.7958   0.7229   0.6647
a2                  3.3040   2.4268   1.9835   1.7041   1.5079   1.3608   1.2455
a3                           3.8697   2.9631   2.4770   2.1592   1.9304   1.7555
a4                                    4.2516   3.3430   2.8389   2.5014   2.2539
a5                                             4.5375   3.6352   3.1233   2.7748
a6                                                      4.7648   3.8714   3.3567
a7                                                               4.9527   4.0688
a8                                                                        5.1124
V_{n,1}         1   0.5778   0.4091   0.3173   0.2594   0.2194   0.1902   0.1678
In this section we derive the exact asymptotic first order behaviour of $V_{n,r}(P)$ (up to
constants) as $n \to \infty$ in case P is not singular with respect to $\lambda^d$.
Let X be an $\mathbb{R}^d$-valued random variable with distribution P, let $\|\cdot\|$ denote any norm
on $\mathbb{R}^d$, and let $1 \le r < \infty$. Lemma 6.1 reflects the fact that $\bigcup_{n=1}^{\infty} \mathcal{F}_n$ is a dense subset
of the Banach space $L_r(P, \mathbb{R}^d)$.
6.1 Lemma
If $E\|X\|^r < \infty$, then
$$\lim_{n \to \infty} V_{n,r}(P) = 0.$$
78 H. Asymptotic quantization for nonsingular probability distributions
Proof
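Let $\{a_1, a_2, a_3, \ldots\}$ be a countable dense subset of $\mathbb{R}^d$ with $a_1 = 0$. For $\varepsilon > 0$,
$\{B(a_k, (\varepsilon/2)^{1/r}) : k \in \mathbb{N}\}$ is a covering of $\mathbb{R}^d$. Therefore, one can find a Borel measurable
partition $\{A_k : k \in \mathbb{N}\}$ of $\mathbb{R}^d$ satisfying $A_k \subset B(a_k, (\varepsilon/2)^{1/r})$ for every k. Choose
$n \in \mathbb{N}$ such that
and let $f_n = \sum_{k=1}^{n} a_k 1_{A_k}$. Then $f_n \in \mathcal{F}_n$ and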
[]
The following theorem in its present general form is stated in Bucklew and Wise
(1982) for the $l_2$-norm. Under some additional assumptions the result is due to Zador
(1963, 1982) (who is also dealing with the $l_2$-norm). See also Fejes Tóth (1959) for a
special case.
For a Borel measurable function $h : \mathbb{R}^d \to \mathbb{R}$ and $0 < p < \infty$ let
The proof is given below. For singular distributions, (6.3) only yields $V_{n,r}(P) =
o(n^{-r/d})$ provided the above moment condition holds. An investigation of the exact
order of $V_{n,r}(P)$ for several classes of singular (continuous) distributions P is contained
in Chapter III.
6. Asymptotics for the quantization error 79
6.3 Remark
(a) The moment condition $E\|X\|^{r+\delta} < \infty$ ensures that the limit in (6.3) is finite.
In fact, $h \in L_1(\lambda^d)$, $h \ge 0$, and $\int \|x\|^{r+\delta} h(x)\,dx < \infty$ for some $\delta > 0$, implies
$h \in L_{d/(d+r)}(\lambda^d)$. To see this, let $s = \frac{d}{d+r}$, $t = \frac{(r+\delta)d}{d+r}$, $p = \frac{d+r}{d}$ and $q = \frac{d+r}{r}$. Then
f h(x) s dx < c~
B(0,1)
/
B(0,1) c
h(x)S dx = f
B(0,1) c
h(x)Sllxlltllxlrt dx
B(O,1) c B(0,1) c
h(z) -- 2~(1+~)n2+
1 ~ if x E [2~, 2~+1), n C N.
but
2n-i-t
f ¢¢ l / [x[r+~ dx
~=1 2n
2n(~+e)2~
-> 2n(i+~)n2+~
2n6
7?.2+ r (X:)
n,=l
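6.4 Example
Let $x_k = 3 \cdot 2^{k-1}$ and
$$P(X = x_k) = \frac{c}{2^{rk}\, k \log^2 k}, \quad k \ge 2.$$
$$E X^r = \frac{3^r c}{2^r} \sum_{k=2}^{\infty} \frac{1}{k \log^2 k} < \infty,$$
$$E X^{r+\delta} = \frac{3^{r+\delta} c}{2^{r+\delta}} \sum_{k=2}^{\infty} \frac{2^{k\delta}}{k \log^2 k} = \infty, \quad \delta > 0.$$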
and hence
oo
E min IX - al r = m i n l x k - - al r¸ c
k=2 aea 2krk log2 k
c 1
kEI
oo
c 1
k=,~+2 k log2 k
c f __1 dx
>- --7
2 x log2 x
n+2
C
2 r log(n + 2)
This gives
Here the order of convergence to zero of $V_{n,r}(X)$ is $(\log n)^{-1}$. In fact, for $\beta =
\{x_2, \ldots, x_{n+1}\}$ one obtains
and hence
It follows that
$$\frac{c}{2^r} \le \liminf_{n \to \infty} \log n \, V_{n,r}(X) \le \limsup_{n \to \infty} \log n \, V_{n,r}(X) \le \frac{3^r c}{2^r}.$$
In case $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$, the number
We shall need the following lemmas for the proof of Theorem 6.2.
6.5 Lemma
Let $P = sP_1 + (1-s)P_2$, $0 < s < 1$, $\int \|x\|^r \, dP_i(x) < \infty$. Suppose
$$n^{r/d} V_{n,r}(P_1) \to c \in [0, \infty] \quad \text{as } n \to \infty.$$
Then
(a)
Proof
(a) The first inequality follows immediately from Lemma 4.14 (a). For $0 < \varepsilon < 1$, let
$n_1 = n_1(n, \varepsilon) = [(1-\varepsilon)n]$ and $n_2 = n_2(n, \varepsilon) = [\varepsilon n]$. ($[x]$ denotes the integer part of
$x \in \mathbb{R}$.) Then by Lemma 4.14 (b)
Proof
First, assume $P((1, \infty)) = 1$. We use a random quantizer argument. Consider i.i.d.
Pareto-distributed random variables $Y_1, \ldots, Y_n$ independent of X with distribution
function
$$G(y) = \begin{cases} 1 - y^{-\delta/r}, & y > 1, \\ 0, & y \le 1. \end{cases}$$
Set $b = \delta/r$. Then
$$= r \int_0^{x-1} \bigl[1 - (x-t)^{-b} + (x+t)^{-b}\bigr]^n \, t^{r-1} \, dt$$
oG
Since
and thus
one gets
OO
Furthermore,
CO
This implies
Proof
For $p \ge 1$, let $\|\cdot\|_p$ denote the $l_p$-norm on $\mathbb{R}^d$. Recall that all norms on $\mathbb{R}^d$ are
equivalent. Hence
where the constants $c_1, c_2, c_3 > 0$ depend only on $\delta$ and r. Furthermore,
for a constant $c_4 > 0$ depending only on r, $\delta$ and $\|\cdot\|$. This proves the assertion. □
We further need an elementary lemma.
6.8 Lemma
For $m \in \mathbb{N}$ and numbers $s_i > 0$, let
and
$$t_i = \frac{s_i^{d/(d+r)}}{\sum_{j=1}^{m} s_j^{d/(d+r)}}, \quad 1 \le i \le m.$$
satisfies
Proof
By Hölder's inequality (for exponents less than 1) with $p = d/(d+r)$ and $q = -d/r$,
one obtains
$$= F(t_1, \ldots, t_m)$$
(and sufficient) that $\sum_{i=1}^{m} v_i = 1$ and $s_i = c\, v_i^{1/p}$, $1 \le i \le m$, for some constant $c > 0$.
This implies $t_i = v_i$ for every i. □
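The minimization behind Lemma 6.8 is easy to probe numerically. The sketch below assumes that the function of the lemma is $F(v_1, \ldots, v_m) = \sum_i s_i v_i^{-r/d}$ on the simplex $\sum_i v_i = 1$ (the defining display is not preserved here, so this form is inferred from the exponents $p = d/(d+r)$, $q = -d/r$ in the proof and from (7.3)); the helper names and the sample values of $s_i$ are hypothetical. It checks that the weights $t_i$ attain the value $\bigl(\sum_i s_i^{d/(d+r)}\bigr)^{(d+r)/d}$ and that random points of the simplex do not do better.

```python
import random

def F(v, s, d, r):
    """Assumed objective of Lemma 6.8: sum_i s_i * v_i^(-r/d)."""
    return sum(si * vi ** (-r / d) for si, vi in zip(s, v))

def optimal_weights(s, d, r):
    """t_i = s_i^(d/(d+r)) / sum_j s_j^(d/(d+r))."""
    powers = [si ** (d / (d + r)) for si in s]
    total = sum(powers)
    return [p / total for p in powers]

def closed_form_minimum(s, d, r):
    """(sum_i s_i^(d/(d+r)))^((d+r)/d), the value of F at the weights t_i."""
    return sum(si ** (d / (d + r)) for si in s) ** ((d + r) / d)

def random_simplex_point(m, rng):
    """Random point with positive coordinates summing to one."""
    draws = [rng.expovariate(1.0) + 1e-12 for _ in range(m)]
    total = sum(draws)
    return [x / total for x in draws]

rng = random.Random(0)
s = [0.2, 1.5, 3.0, 0.7]   # illustrative values only
d, r = 2, 2
t = optimal_weights(s, d, r)
best = F(t, s, d, r)
# Smallest observed excess of F over F(t) among random competitors.
worst_gap = min(F(random_simplex_point(len(s), rng), s, d, r) - best
                for _ in range(10000))
print(best, closed_form_minimum(s, d, r), worst_gap)
```

The same allocation reappears in (7.3) and (7.4), where $s_i$ is replaced by $s_i Q_r(P_i)$.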
Proof of Theorem 6.2
The proof is given by a sequence of steps from the uniform case to the general case.
Step 1. Let $P = U([0,1]^d)$. Let $m, n \in \mathbb{N}$, $m < n$ and let $k = k(n,m) = [(n/m)^{1/d}]$.
Choose a tesselation of the unit cube $[0,1]^d$ consisting of $k^d$ translates $C_1, \ldots, C_{k^d}$ of
the cube $[0, 1/k]^d$. Then $P = k^{-d} \sum_{i=1}^{k^d} U(C_i)$. By Lemma 4.14 (b) and the translation
$$= k^{-d} \sum_{i=1}^{k^d} M_{m,r}(C_i)\, k^{-r}$$
$$= k^{-r} M_{m,r}([0,1]^d)$$
$$= k^{-r} V_{m,r}(P)$$
and hence
j=l
This implies
$$= Q_r([0,1]^d)\, \|h\|_{d/(d+r)}.$$
Then
= E si f be~,u-y,miIIx
n - b[I~ dx l -a
i=1 Ci,e
This implies
Since $0 < \varepsilon < 1/2$ is arbitrary and by Lemma 6.8, one obtains
$$= Q_r([0,1]^d)\, \|h\|_{d/(d+r)}.$$
Hence, (6.3) holds in this setting.
Step 3. Let P be absolutely continuous with respect to $\lambda^d$ and assume that P has a
compact support. Let $\mathrm{supp}(P) \subset C$ for some closed cube C whose edges are parallel
to the coordinate axes with edge-length $l(C) = l$. For $k \in \mathbb{N}$, consider a tesselation of
C consisting of $k^d$ closed cubes $C_1, \ldots, C_{k^d}$ of common edge-length $l(C_i) = l/k$. Set
$$P_k = \sum_{i=1}^{k^d} P(C_i)\, U(C_i),$$
dad - = ~ (cd
Since
this implies
Furthermore, by Step 2
For $n \in \mathbb{N}$ and $0 < \varepsilon < 1$, let $n_1 = n_1(n, \varepsilon) = [(1-\varepsilon)n]$ and $n_2 = n_2(n, \varepsilon) = [(\varepsilon n)^{1/d}]^d$.
Consider a tesselation of C consisting of $n_2$ closed cubes of edge-length $l/n_2^{1/d}$ and
let $\gamma = \gamma(n_2)$ denote the set of its $n_2$ midpoints, that is, $\gamma$ corresponds to the cube
$n_2$-quantizer for C. Then
for some constant $c_1 > 0$ depending only on the underlying norm. Let $\alpha(n_1, k) \in
C_{n_1,r}(P_k)$ and $\delta = \delta(n, k, \varepsilon) = \alpha(n_1, k) \cup \gamma(n_2)$. Then $|\delta| \le n$ and
n ~14 f
J
min
ae6(n,k,e)
Ilx--allrdPk(x) - f min
J ae~(u,k,e)
IIx-aHrdP(x)
- j a~5(n,k,~)
This implies
< n r/d jf
-
rain
¢,e,~(m,~)
[ i x - ~lrdPk(x) +c~llhk-- hll~
= n~/dv.~,~(Pk) + c211hk - hll~, k • N, n > m a x ~ 1, ~ }
- ~e 1-e
To prove the converse estimate, let $\beta(n_1) \in C_{n_1,r}(P)$ and $\tau = \tau(n, \varepsilon) = \beta(n_1) \cup \gamma(n_2)$.
Then $|\tau| \le n$ and as above
nr/d f f
_<c2IIh~ hL11,
J a~(,~,~) J ~(,~,~)
k E N , n>_max e , ~ _ e .
1}
This implies
Therefore
$$(1-\varepsilon)^{-r/d} \liminf_{n \to \infty} n_1^{r/d} V_{n_1,r}(P) \ge Q_r([0,1]^d)\, \|h_k\|_{d/(d+r)} - c_2 \|h_k - h\|_1, \quad k \in \mathbb{N}.$$
Letting k tend to infinity and then letting $\varepsilon$ tend to zero yields
(6.11) $\liminf_{n \to \infty} n^{r/d} V_{n,r}(P) = \liminf_{n \to \infty} n_1^{r/d} V_{n_1,r}(P) \ge Q_r([0,1]^d)\, \|h\|_{d/(d+r)}.$
Hence, (6.3) holds in this setting.
Step 4. Let P be singular with respect to $\lambda^d$ and assume that P has a compact
support. Let $\mathrm{supp}(P) \subset C$ for some open cube C whose edges are parallel to the
coordinate axes with edge-length $l(C) = l$. For any $\varepsilon > 0$, there is an open set $A \subset C$
such that $P(A) = 1$ and $\lambda^d(A) \le \varepsilon$. Moreover, there exists a countable partition
$\{C_i : i \in \mathbb{N}\}$ of A consisting of half-open cubes $C_i \ne \emptyset$ with edges parallel to the
coordinate axes (cf. Cohn, 1980, Lemma 1.4.2). Choose $m \in \mathbb{N}$ such that
cl
min I I z - a l l < - -
aea(,~) -- (n12) lid'
xEC, n>2
--
for some constant c > 0 depending only on the underlying norm. This implies
and hence
limsupnr/aV,~,r(P) < 2cr2r/%r/d.
~ - + oo
Step 6. Now suppose $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Set $h = dP_a/d\lambda^d$. For $k \in \mathbb{N}$,
let $C_k = [-k, k]^d$. Let $0 < \varepsilon < 1$. Using the decomposition $P = P(C_k)P(\cdot|C_k) +
P(C_k^c)P(\cdot|C_k^c)$, it follows from (6.13) and Lemma 6.5 (a) that
$$\limsup_{n \to \infty} n^{r/d} V_{n,r}(P) \le (1-\varepsilon)^{-r/d} Q_r([0,1]^d)\, \|h 1_{C_k}\|_{d/(d+r)}$$
By Corollary 6.7,
$$\lim_{k \to \infty} \int_{C_k^c} \|x\|^{r+\delta} \, dP(x) = 0.$$
Therefore, letting k tend to infinity in (6.15) and then letting $\varepsilon$ tend to zero yields
Notes
The approximation (6.3) of the n-th quantization error occurred apparently for the
first time in Panter and Dite (1951) for univariate absolutely continuous distributions
and r = 2.
The proof of Theorem 6.2 for distributions with compact support is a simplified ver-
sion (and an extension to arbitrary norms) of Bucklew and Wise (1982). The crucial
point in their treatment of distributions with unbounded support is a "compander" re-
sult whose proof is not complete (cf. Linder, 1991). As shown above, the unbounded
case can be resolved via the Pierce-Lemma 6.6 and its generalization to arbitrary
dimensions (Corollary 6.7).
The asymptotics for empirical versions of the quantization problem (or related lo-
cation problems) when both the level n and the sample size tend to infinity were
studied in Hochbaum and Steele (1982), Wong (1984), Zemel (1985), Rhee and Ta-
lagrand (1989a), McGivney and Yukich (1997), Yukich (1998), Pötzelberger (1998b),
and Graf and Luschgy (1999c).
7. Asymptotically optimal quantizers 93
provided $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Here $Q_r(P)$ denotes the r-th
quantization coefficient of P as defined in (6.4). Notice that if $(\alpha_n)_{n \ge 1}$ is asymptotically
n-optimal of order r and $\{A_a : a \in \alpha_n\}$ denotes a Voronoi partition of $\mathbb{R}^d$ with
respect to $\alpha_n$, then $(f_n)_{n \ge 1}$ with $f_n = \sum_{a \in \alpha_n} a 1_{A_a} \in \mathcal{F}_n$ is an asymptotically n-optimal
quantizer of order r, that is
(7.2) $\lim_{n \to \infty} n^{r/d} E\|X - f_n(X)\|^r = Q_r(P).$
7.1 Lemma
Let $P = \sum_{i=1}^{m} s_i P_i$, $s_i > 0$, $\sum_{i=1}^{m} s_i = 1$, $\int \|x\|^{r+\delta} \, dP_i(x) < \infty$ for some $\delta > 0$, $P_{i,a} \ne 0$ for
every i.
(b) Suppose
(7.3) $Q_r(P) = \Bigl( \sum_{i=1}^{m} (s_i Q_r(P_i))^{d/(d+r)} \Bigr)^{(d+r)/d}.$
Proof
(a) and (b). By Lemma 4.14(5)
This implies
[]
For P with $P_a = h\lambda^d \ne 0$ and $h \in L_{d/(d+r)}(\lambda^d)$, define
(7.5) $h_r = \dfrac{h^{d/(d+r)}}{\int h^{d/(d+r)} \, d\lambda^d}, \quad P_r = h_r \lambda^d.$
7.2 Lemma
Suppose $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Let $\{A_1, \ldots, A_m\}$ be a P-packing
in $\mathbb{R}^d$ (i.e., $P(A_i \cap A_j) = 0$ for $i \ne j$) such that $P_a(A_i) > 0$ for every i and
$P(\bigcup_{i=1}^{m} A_i) = 1$. Then for the mixture $P = \sum_{i=1}^{m} P(A_i) P(\cdot|A_i)$, (7.3) is satisfied and (7.4)
takes the form $t_i = P_r(A_i)$ for every i.
Proof
Let $P_a = h\lambda^d$. We have $P(\cdot|A_i)_a = h 1_{A_i} \lambda^d / P(A_i) \ne 0$ and thus
$$= Q_r(P)\, P_r(A_i)^{(d+r)/d} / P(A_i).$$
Therefore
i=1 i=1
7.3 Corollary (Invariant distributions)
Let G be a finite group of bijective isometries on $\mathbb{R}^d$. Suppose further P is G-invariant,
$P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Suppose there exists $A \in \mathcal{B}(\mathbb{R}^d)$
such that $\{T(A) : T \in G\}$ is a P-packing in $\mathbb{R}^d$ and $P(\bigcup_{T \in G} T(A)) = 1$. Let $n_1 =
n_1(n) = [n/|G|]$ and $\alpha_n \in C_{n_1,r}(P(\cdot|A))$. Then $(\bigcup_{T \in G} T(\alpha_n))_{n \ge 1}$ is an asymptotically
n-optimal set of centers for P of order r.
Proof
For T E G we have
p = p T = pT + pT,
t~CO~ n
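7.5 Theorem
Suppose P is absolutely continuous with respect to $\lambda^d$ and $E\|X\|^{r+\delta} < \infty$ for some
$\delta > 0$. Let $(\alpha_n)_{n \ge 1}$ be an asymptotically n-optimal set of centers for P of order r.
Then
$$\frac{1}{n} \sum_{a \in \alpha_n} \delta_a \xrightarrow{D} P_r \quad \text{as } n \to \infty.$$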
Proof
The proof relies on Theorem 6.2 and the equality case of Hölder's inequality (cf.
Lemma 6.8). Since certainly $\lim_{n \to \infty} |\alpha_n|/n = 1$, we may assume without loss of
generality that $|\alpha_n| = n$ for every $n \ge 1$. Let
$$\mu_n = \frac{1}{n} \sum_{a \in \alpha_n} \delta_a.$$
$\mu_n \to \mu$ vaguely
$\mu_n(A) \to \mu(A)$
and hence
$\mu_n(A^c) \to 1 - \mu(A)$.
$P_r(B_i) > P_r(A_i) - \varepsilon$, $i = 1, 2$.
Say h'~l -- k. T h e n we o b t a i n
2 P
f/ rain IIx - all" dPi(x)
m i n IIx - all" dP(x) = E s i j / aean
J aEan i=1
2
> E
--
s,
J
f a Emin
~nUTi
IIx - ~11" dP,(x)
i=1 Bi
2
We have
$$Q_r(P(\cdot|B_i))\, P(B_i) = Q_r(P)\, P_r(B_i)^{(d+r)/d} \ge Q_r(P)\, (P_r(A_i) - \varepsilon)^{(d+r)/d}.$$
Since $0 < \varepsilon \le \min_{i=1,2} P_r(A_i)$ is arbitrary and by Lemmas 6.8 and 7.2, one obtains
$$Q_r(P) \ge \sum_{i=1}^{2} s_i v_i^{-r/d}\, Q_r(P)\, P_r(A_i)^{(d+r)/d} / P(A_i) = \sum_{i=1}^{2} s_i Q_r(P_i)\, v_i^{-r/d}$$
2
Using Lemmas 6.8 and 7.2 again, this yields $v_i = P_r(A_i)$, $i = 1, 2$. Thus $\mu(A) = P_r(A)$.
If $P(A) = 0$, then omit the first summands in the above considerations. One gets
$v_2 = P_r(A_2) = 1$ and thus we have $\mu(A) = P_r(A) = 0$. If $P(A) = 1$, then omit the
second summands. One obtains $v_1 = P_r(A_1) = 1$.
Now we have $\mu(A) = P_r(A)$ for every (bounded) d-dimensional interval $A = (b, c]$
with $\mu(\partial A) = 0$. This implies $\mu = P_r$. □
Computations of $P_r$ can be found in Tables 7.1 and 7.2.
P                                                       P_r
d-dimensional Normal
  N_d(0, Σ), Σ positive definite                        N_d(0, ((d+r)/d) Σ)
Uniform
  U(B), B ∈ B(R^d) bounded, λ^d(B) > 0                  U(B)
d
P (~P)r
1
Logistic
d
L(a) ® aL(a,
i
Double exponential
d
DE(a) N~ DE(~(~+O~
1
Double Gamma
d
Dr(a, b) ® Dr(~(~r), d--~-
~+~ J
1
Hyper-exponential
d
HE(a, b) N HE(a( lib, b)
1
Exponential
d a d+r
E (a ) ® E ( ~ ~-J~-~~2d)
1
Gamma
d
r(a, b) ® r(~(d+r
--~,
b~+r~
d ) ' d+r /
1
Weibull
d
W(a,b) ¢¢~p(~:d+r~llb
"o'~--',~L d I
b~+r b)
,-y~-,
1
Pareto
d
P(a,b),bd>r ® P ( a , bd-~
d÷r /
1
For univariate distributions, the necessary condition of Theorem 7.5 can be turned
around and used to construct asymptotically n-optimal sets of centers.
Let d = 1 and let $P = h\lambda$ such that $I = \{h > 0\}$ is an open (possibly unbounded)
interval and h is continuous on I. Suppose $E|X|^{r+\delta} < \infty$ for some $\delta > 0$. For
$n \in \mathbb{N}$, let $a_i$ denote the $\frac{2i-1}{2n}$-quantile of $P_r$, $1 \le i \le n$, and let $m_i = (a_i + a_{i+1})/2$,
E min
l<__i<_n
I X - a i ] r~- f(al-z)~h(x)dx+ i=2 mi-1
(ai-x)rh(x)dz
--oQ
~-i t'mi
+ ~ j~ (x- ~,)~h(~)dx+
f~ o(~_ ~.)~h(~)~.
I
-- (1 + r)2 r+lh(u')(ai - °4-1)~+I'
a~<vi+l<mi<ai+l,l<i<n-1,
$$\frac{1}{n} = \frac{2i-1}{2n} - \frac{2(i-1)-1}{2n} = \int_{a_{i-1}}^{a_i} h_r(x)\,dx = h_r(w_i)(a_i - a_{i-1}),$$
$a_{i-1} < w_i < a_i$, $2 \le i \le n$.
This gives
~1 oo
+ (1 + ~)2~ -h~(w~)
- - - ~ ( ~ - ~-*)
i " h(v,) (ai - ai-1)].
+ ~ ~ h~(~,)~
i-=2
Note that $a_i, u_i, v_i, w_i$ depend on n. Under suitable assumptions on the density h, we
have
$$\lim_{n \to \infty} \sum_{i=2}^{n} \frac{h(y_i)}{h_r(w_i)^r} (a_i - a_{i-1}) = \int_I \frac{h}{h_r^r} \, d\lambda = \|h\|_{1/(1+r)},$$
$y_i \in \{u_i, v_i\}$, and the two remaining summands are of order $o(n^{-r})$. One obtains
(7.7) $\lim_{n \to \infty} n^r E \min_{1 \le i \le n} |X - a_i|^r = \dfrac{1}{(1+r)2^r} \|h\|_{1/(1+r)} = Q_r(P).$
For the latter equation see Example 5.5. Thus $(\{a_1, \ldots, a_n\})_{n \ge 1}$ is asymptotically n-optimal
for P of order r. The same result holds for the $\frac{i}{n+1}$-quantiles of $P_r$, $1 \le i \le n$.
Sufficient conditions for the above result to be valid can be found in Cambanis and
Gerr (1983) (with a gap in the proof), Linder (1991), Pötzelberger and Felsenstein
(1994).
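The limit in (7.7) identifies the one-dimensional quantization coefficient as $Q_r(P) = \|h\|_{1/(1+r)} / ((1+r)2^r)$ with $\|h\|_p = (\int h^p \, d\lambda)^{1/p}$. As a hedged illustration, the following Python sketch evaluates this expression by a midpoint rule for the standard normal density and for the double exponential density with scale $1/\sqrt{2}$, for r = 2; the helper names, integration ranges and step counts are our own choices. For these two densities the integral can be done by hand and gives $\pi\sqrt{3}/2$ and $9/2$ respectively.

```python
import math

def lp_quasi_norm(density, p, lo, hi, steps=200_000):
    """(integral over [lo, hi] of density^p)^(1/p), computed with the midpoint rule."""
    h = (hi - lo) / steps
    total = sum(density(lo + (k + 0.5) * h) ** p for k in range(steps)) * h
    return total ** (1.0 / p)

def quantization_coefficient_1d(density, r, lo, hi):
    """Q_r(P) = ||h||_{1/(1+r)} / ((1+r) 2^r), as in (7.7)."""
    return lp_quasi_norm(density, 1.0 / (1.0 + r), lo, hi) / ((1.0 + r) * 2.0 ** r)

def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def double_exponential_density(x, a=1.0 / math.sqrt(2.0)):
    # Scale parametrization: density exp(-|x|/a) / (2a), variance 2 a^2.
    return math.exp(-abs(x) / a) / (2.0 * a)

# The truncation ranges are wide enough that the neglected tails are negligible.
q2_normal = quantization_coefficient_1d(normal_density, 2, -40.0, 40.0)
q2_double_exp = quantization_coefficient_1d(double_exponential_density, 2, -100.0, 100.0)
print(q2_normal, math.pi * math.sqrt(3.0) / 2.0)
print(q2_double_exp, 4.5)
```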
$$a_i = (1+r)\,c \log\Bigl(\frac{2i}{n+1}\Bigr), \quad 1 \le i \le \frac{n+1}{2},$$
$$a_i = (1+r)\,c \log\Bigl(\frac{n+1}{2n+2-2i}\Bigr), \quad \frac{n+1}{2} \le i \le n.$$
In case n odd and r = 1, they coincide with the n-optimal set of centers for P of
order 1 (cf. Example 5.6).
The error of quantizers of the above type for various distributions is evaluated in
Tables 7.3 and 7.4. These values should be compared with the n-th quantization
error given in Tables 5.1 - 5.12.
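The comparison can be reproduced for the normal distribution with r = 2. The sketch below takes the two quantile rules to be the $(2i-1)/(2n)$-quantiles and the $i/(n+1)$-quantiles of $P_r$ (our reading of the construction above) and uses $P_2 = N(0, 3)$, since $h^{1/3}$ is proportional to an N(0, 3) density. The error of a given set of centers is computed in closed form from $\int_u^v (x-a)^2 \varphi(x)\,dx = (1+a^2)(\Phi(v)-\Phi(u)) - (v\varphi(v) - u\varphi(u)) + 2a(\varphi(v) - \varphi(u))$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def phi(x):
    """Standard normal density, set to 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def x_phi(x):
    """x * phi(x), with the limit value 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else x * phi(x)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cell_error(u, v, a):
    """Integral of (x - a)^2 * phi(x) over [u, v]."""
    return ((1.0 + a * a) * (Phi(v) - Phi(u))
            - (x_phi(v) - x_phi(u))
            + 2.0 * a * (phi(v) - phi(u)))

def quantizer_error(centers):
    """E min_i |X - a_i|^2 for X ~ N(0,1), using midpoints as cell boundaries."""
    centers = sorted(centers)
    mids = [(p + q) / 2.0 for p, q in zip(centers, centers[1:])]
    bounds = [-math.inf] + mids + [math.inf]
    return sum(cell_error(bounds[i], bounds[i + 1], a) for i, a in enumerate(centers))

P2 = NormalDist(0.0, math.sqrt(3.0))  # P_r for P = N(0,1), r = 2, d = 1

def midpoint_quantile_centers(n):
    return [P2.inv_cdf((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]

def equal_quantile_centers(n):
    return [P2.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

for n in (2, 3):
    print(n,
          round(quantizer_error(midpoint_quantile_centers(n)), 4),
          round(quantizer_error(equal_quantile_centers(n)), 4))
```

For n = 2 and n = 3 the two printed errors can be set against the N(0,1) rows of Table 7.3 and against the optimal values 0.3634 and 0.1902 of Table 5.1.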
P \ n                  2       3       4       5       6       7       8
N(0,1)            0.5006  0.2466  0.1456  0.0960  0.0680  0.0506  0.0392
                  0.3661  0.1913  0.1180  0.0802  0.0581  0.0441  0.0346
L(√3/π)           0.7332  0.3373  0.1978  0.1304  0.0925  0.0690  0.0535
                  0.4188  0.2319  0.1478  0.1026  0.0754  0.0578  0.0457
DE(1/√2)          1.0826  0.3657  0.2418  0.1508  0.1112  0.0811  0.0641
                  0.5234  0.2648  0.1782  0.1200  0.0903  0.0682  0.0546
E(1)              0.4836  0.2223  0.1282  0.0834  0.0586  0.0434  0.0335
                  0.6112  0.2763  0.1551  0.0987  0.0680  0.0497  0.0378
Γ(1/√2, 2)        0.4625  0.2232  0.1311  0.0861  0.0609  0.0453  0.0350
                  0.4761  0.2269  0.1323  0.0866  0.0610  0.0453  0.0350
W(2/√(4−π), 2)    0.4214  0.2040  0.1196  0.0784  0.0554  0.0411  0.0318
                  0.3896  0.1935  0.1151  0.0762  0.0541  0.0404  0.0313
P \ n                  2       3       4       5       6       7       8
N(0, π/2)         0.6512  0.4613  0.3565  0.2904  0.2450  0.2119  0.1866
                  0.5965  0.4279  0.3345  0.2748  0.2334  0.2029  0.1795
L(1/(2 log 2))    0.7284  0.5151  0.3987  0.3253  0.2749  0.2380  0.2099
                  0.6226  0.4569  0.3620  0.3001  0.2564  0.2239  0.1988
DE(1)             0.8863  0.5556  0.4504  0.3600  0.3091  0.2653  0.2358
                  0.6998  0.5000  0.4063  0.3333  0.2879  0.2500  0.2232
E(1/log 2)        0.6497  0.4459  0.3402  0.2752  0.2310  0.1991  0.1749
                  0.6890  0.4694  0.3553  0.2856  0.2387  0.2050  0.1796
Γ(a, 2)           0.6396  0.4476  0.3443  0.2797  0.2356  0.2035  0.1791
a = 0.9508...     0.6351  0.4434  0.3409  0.2771  0.2335  0.2017  0.1776
W(a, 2)           0.6177  0.4324  0.3323  0.2698  0.2270  0.1960  0.1724
a = 2.7027...     0.5990  0.4222  0.3260  0.2655  0.2240  0.1937  0.1706
Table 7.4: r = 1, $V_1(P) = 1$. Quantization error for $\frac{2i-1}{2n}$-quantiles (first line) and
$\frac{i}{n+1}$-quantiles (second line) of $P_1$, $1 \le i \le n$
7.7 Lemma
Let
$$B = \Bigl\{(v_1, \ldots, v_d) \in (0, \infty)^d : \prod_{i=1}^{d} v_i \le 1\Bigr\}$$
satisfies
$$F(t_1, \ldots, t_d) = \min_{(v_1, \ldots, v_d) \in B} F(v_1, \ldots, v_d) = d \prod_{i=1}^{d} s_i^{1/d}.$$
Proof
By the arithmetic-geometric mean inequality one obtains
$$F(v_1, \ldots, v_d) \ge d \Bigl( \prod_{i=1}^{d} s_i v_i^{-r} \Bigr)^{1/d}$$
$$= F(t_1, \ldots, t_d)$$
for $(v_1, \ldots, v_d) \in B$. □
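A small numerical experiment illustrates the arithmetic-geometric mean argument. The sketch assumes that the function of Lemma 7.7 is $F(v_1, \ldots, v_d) = \sum_i s_i v_i^{-r}$ and that the minimizer is $t_i = s_i^{1/r} / \prod_j s_j^{1/(rd)}$; both are inferred from the proof and from the choice of $t_i$ in Lemma 7.8 (b), because the defining displays are not preserved. Competitors are sampled on the boundary $\prod_i v_i = 1$, where the minimum over B must lie since F decreases in each coordinate. Names and sample values are hypothetical.

```python
import math
import random

def F(v, s, r):
    """Assumed objective of Lemma 7.7: sum_i s_i * v_i^(-r)."""
    return sum(si * vi ** (-r) for si, vi in zip(s, v))

def optimal_point(s, r):
    """t_i = s_i^(1/r) / prod_j s_j^(1/(r d)); the product of the t_i equals 1."""
    d = len(s)
    geometric = math.prod(si ** (1.0 / (r * d)) for si in s)
    return [si ** (1.0 / r) / geometric for si in s]

def random_point_with_unit_product(d, rng):
    """Random positive vector whose coordinates multiply to one."""
    logs = [rng.uniform(-2.0, 2.0) for _ in range(d)]
    mean = sum(logs) / d
    return [math.exp(x - mean) for x in logs]

rng = random.Random(1)
s = [0.5, 2.0, 1.2]   # illustrative values only
r = 2
d = len(s)
t = optimal_point(s, r)
value_at_t = F(t, s, r)
am_gm_bound = d * math.prod(si ** (1.0 / d) for si in s)
# Smallest observed excess of F over F(t) among random competitors.
worst_gap = min(F(random_point_with_unit_product(d, rng), s, r) - value_at_t
                for _ in range(10000))
print(value_at_t, am_gm_bound, worst_gap)
```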
7.8 Lemma
Let the underlying norm be the $l_r$-norm with $1 \le r < \infty$. Suppose $E\|X\|^{r+\delta} < \infty$
for some $\delta > 0$ and $Q_r(X_i) > 0$ for $1 \le i \le d$.
(a) $Q_r(X) \le d \prod_{i=1}^{d} Q_r(X_i)^{1/d}$.
(b) If $t_i = Q_r(X_i)^{1/r} / \prod_{j=1}^{d} Q_r(X_j)^{1/(rd)}$, $n_i = [t_i n^{1/d}]$ for $n \in \mathbb{N}$, $\beta_{i,n} \in C_{n_i,r}(X_i)$ and
$\alpha_n = \times_{i=1}^{d} \beta_{i,n}$, then
$$\lim_{n \to \infty} n^{r/d} E \min_{a \in \alpha_n} \|X - a\|^r = d \prod_{i=1}^{d} Q_r(X_i)^{1/d}.$$
Proof
The choice of $t_i$ comes from Lemma 7.7. We have
$$V_{n,r}(X) \le E \min_{a \in \alpha_n} \|X - a\|^r = \sum_{i=1}^{d} E \min_{b \in \beta_{i,n}} |X_i - b|^r = \sum_{i=1}^{d} V_{n_i,r}(X_i)$$
i:1
d
:
i=[
[]
7.9 Remark
Suppose $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Suppose further that the one-dimensional
marginal distributions $P_i$ of P are absolutely continuous with respect to $\lambda$, $1 \le i \le d$.
Let $n_i = n_i(n) \in \mathbb{N}$ such that $\prod_{i=1}^{d} n_i \le n$, let $(\beta_{i,n})_{n \ge 1}$ be an asymptotically $n_i$-optimal
set of centers for $P_i$ of order r and let $\alpha_n = \times_{i=1}^{d} \beta_{i,n}$. Then as $n \to \infty$
d
1
aCan i=1
In fact, we have
aEan = bEfli,n
~i=l ~r i=l
Notes
The observation in Corollary 7.3 seems to be new. Theorem 7.5 extends considerably
a corresponding result of McClure (1975) for one-dimensional distributions with com-
pact support. See also the review article by McClure (1980). Rates of convergence
in Theorem 7.5 for some one-dimensional distributions P (Pareto, exponential and
power-function distributions) with respect to various local distances have been com-
puted by Fort and Pagès (1999). A discussion of the vector quantizer advantage as
defined in (7.8) can be found in Lookabaugh and Gray (1989). Na and Neuhoff (1995)
provide a nice result about the asymptotic performance of suboptimal quantizers (like
product quantizers).
(a) If $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$, then
$$Q_r(T(X)) = c^r Q_r(X).$$
Proof
Immediate consequence of L e m m a 3.2 (or of (6.4)). []
The following lemma shows that the largest quantization error among distributions
concentrated on a fixed bounded set appears (asymptotically) for the uniform distri-
bution.
8.2 Lemma
Let $A \in \mathcal{B}(\mathbb{R}^d)$ be bounded with $\lambda^d(A) > 0$. Then
$$\max\{Q_r(P) : P(A) = 1, P_a \ne 0\} = Q_r(A) = Q_r([0,1]^d)\, \lambda^d(A)^{r/d}.$$
Proof
Immediate consequence of Hölder's inequality. □
Since one may not expect to be able to find the precise values of Qr([0,1] d) for all
dimensions d (and all norms), it is of great interest to find bounds. These bounds
immediately yield bounds for Qr(P).
8. Regular quantizers and quantization coefficients 107
The following lower bound for $Q_r([0,1]^d)$ indicates that the members of an n-optimal
partition for a uniform distribution tend to look like a ball. For the $l_2$-norm it is due
to Zador (1963, 1982) and for arbitrary norms this bound is contained in Yamada et
al. (1980).
8.3 Proposition
Proof
By (6.2) and Theorem 4.16
For low dimensions good upper bounds for $Q_r([0,1]^d)$ can be obtained by the normalized
r-th moment of space-filling figures in $\mathbb{R}^d$. Here a set $A \subset \mathbb{R}^d$ is called
space-filling if A is compact with $\lambda^d(A) > 0$ and there is a countable family $\mathcal{T}$ of
bijective isometries on $\mathbb{R}^d$ such that $\{T(A) : T \in \mathcal{T}\}$ is a tesselation of $\mathbb{R}^d$. This notion
depends on the underlying norm. In case the isometries $T \in \mathcal{T}$ can be chosen of the
form $T(x) = x + t$, $t \in \mathbb{R}^d$, then A is called space-filling by translation. Note that
if $S : \mathbb{R}^d \to \mathbb{R}^d$ is a similarity transformation and A is space-filling (by translation),
then S(A) is also space-filling (by translation).
Let us mention some properties of space-filling sets.
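8.4 Lemma
Let $A \subset \mathbb{R}^d$ be space-filling and let $\mathcal{T}$ be a corresponding family of bijective isometries.
Choose $a \in C_r(U(A))$ and let $\alpha_A = \{T(a) : T \in \mathcal{T}\}$.
Proof
Set $u = \mathrm{diam}(A)$.
(a) Let $I = \{T \in \mathcal{T} : T(A) \cap B(0, s) \ne \emptyset\}$ for some $s > 0$. Then
$$\bigcup_{T \in I} T(A) \subset B(0, s+u).$$
Therefore
$$|I|\, \lambda^d(A) = \sum_{T \in I} \lambda^d(T(A)) = \lambda^d\Bigl(\bigcup_{T \in I} T(A)\Bigr)$$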
Od C U (S-1T(A) MA)
T~T
TC,S
and thus
Figure 8.1: Tesselation of $[0,1]^2$ into m = 6 regular hexagons and a boundary region,
n = 10
8.5 Theorem
Let $A \subset \mathbb{R}^d$ be space-filling. Then
$$\lim_{n \to \infty} n^{r/d} \int \|x - f_{n,A}(x)\|^r \, dU([0,1]^d)(x) = M_r(A).$$
In particular
Proof
By Lemma 2.1, $cT(a) \in C_r(U(cT(A)))$ and $M_r(cT(A)) = M_r(A)$. Therefore
-- ~ f [Ix-cT(a)ll~ dx + fminTezI x - - c T ( a ) l l ~ dx
TCIcT(A) D
r~ d TEl
D
= n -~/4 ~ M ~ ( A ) + f minllx
g TCI
- cT(a)ll~dx.
D
There exists a constant $\gamma > 1$ such that for every $x \in [0,1]^d$ and sufficiently small
$s > 0$ one can find $y \in B(x, \gamma s)$ satisfying
Otherwise, choose $y \in B(x, \gamma c u)$ such that $B(y, cu) \subset B(x, \gamma c u) \cap [0,1]^d$ and then
choose $S \in \mathcal{T}$ such that $y \in cS(A)$. One obtains
Therefore
A basic question is whether equality holds in (8.3). The regular quantizer problem
consists in finding a space-filler $A \subset \mathbb{R}^d$ such that $Q_r^{(R)}([0,1]^d) = M_r(A)$. Both
problems are unsolved for $d \ge 3$.
One technique for obtaining upper bounds is to select space-fillers in higher dimensional
spaces by forming products of two (or more) lower dimensional space-fillers.
8.6 Lemma
Let the underlying norm be the $l_r$-norm, $1 \le r < \infty$. Let $A \subset \mathbb{R}^d$ and $B \subset \mathbb{R}^k$ be
space-filling. Then $A \times B \subset \mathbb{R}^{d+k}$ is space-filling and
Proof
Clearly $A \times B$ is space-filling in $\mathbb{R}^{d+k}$. Furthermore,
$$W(a|\Lambda) = W(0|\Lambda) + a, \quad a \in \Lambda.$$
8.7 Lemma
Let $\Lambda \subset \mathbb{R}^d$ be a lattice.
Proof
(a) Choose $s > 0$ such that $B(0, s)$ contains a fundamental parallelotope of $\Lambda$. Then
$\{B(a, s) : a \in \Lambda\}$ is a covering of $\mathbb{R}^d$. This implies $W(0|\Lambda) \subset B(0, s)$ and hence,
$W(0|\Lambda)$ is compact. Furthermore, if B denotes a fundamental parallelotope of $\Lambda$,
then
$$\lambda^d(W) = \sum_{a \in \Lambda} \lambda^d(W \cap (B + a)) = \sum_{a \in \Lambda} \lambda^d((W - a) \cap B) \ge \lambda^d(B) = \det(\Lambda),$$
where $W = W(0|\Lambda)$. Here the inequality follows from the fact that the Voronoi
diagram of $\Lambda$ is a covering of $\mathbb{R}^d$; see Proposition 1.1.
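(b) If $\Lambda$ is admissible, then in view of (a), $W(0|\Lambda)$ is space-filling by translations
with $\Lambda$ as set of translation vectors. This implies $\lambda^d(W(0|\Lambda)) = \det(\Lambda)$. If $\Lambda$ is not
admissible and $\lambda^d(\partial W(0|\Lambda)) = 0$, then
for some $a_1, a_2 \in \Lambda$, $a_1 \ne a_2$. Hence, there exist $x_1, x_2 \in \mathrm{int}\, W(0|\Lambda)$, $x_1 \ne x_2$, such
that $x_1 - x_2 \in \Lambda$. Let $b = x_1 - x_2$. Choose $\varepsilon > 0$ such that $B(x_i, \varepsilon) \subset W(0|\Lambda)$ and
$B(x_1, \varepsilon) \cap B(x_2, \varepsilon) = \emptyset$. Set $A = W(0|\Lambda) \setminus B(x_1, \varepsilon)$. Then
$$B(x_1, \varepsilon) = B(x_2, \varepsilon) + b \subset A + b$$
which yields
$$W(0|\Lambda) \subset A \cup (A + b).$$
This implies that $\{A + a : a \in \Lambda\}$ is a covering of $\mathbb{R}^d$, hence $\lambda^d(A) \ge \det(\Lambda)$. We
obtain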
8.8 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_1$-norm and let $\Lambda = \mathbb{Z}(-1, 1) + \mathbb{Z}(4, 0)$. Then
$$\Lambda = \{a \in \mathbb{Z}^2 : a_1 + a_2 \in 4\mathbb{Z}\}, \quad \det(\Lambda) = 4,$$
and $\lambda^2(W(0|\Lambda)) = 5$; see Figure 8.2. It follows from Lemma 8.7 that $\Lambda$ is not
admissible (for the $l_1$-norm).
Figure 8.2: Voronoi region $W(0|\Lambda)$ with respect to the $l_1$-norm for a nonadmissible
lattice
As concerns the convexity of $W(0|\Lambda)$ one can modify Remark 1.9 as follows: if $W(0|\Lambda)$
is convex for every lattice $\Lambda \subset \mathbb{R}^d$, then the underlying norm is euclidean (cf. Gruber,
1974, Theorem 2).
If $\Lambda \subset \mathbb{R}^d$ is an admissible lattice, then we know from Lemma 8.7 that the Voronoi
region $W(0|\Lambda)$ is space-filling by translation with $\Lambda$ as set of translation vectors.
Thus Theorem 8.5 applies to the n-quantizer $f_{n,\Lambda} = f_{n,W(0|\Lambda)}$ for $U([0,1]^d)$. Note
that $W(0|\Lambda)$ is symmetric (about the origin) and hence
(8.4) $M_r(W(0|\Lambda)) = \dfrac{\int_{W(0|\Lambda)} \|x\|^r \, dx}{\det(\Lambda)^{(d+r)/d}};$
8.9 Theorem
Let $\Lambda \subset \mathbb{R}^d$ be an admissible lattice and let X be $U([0,1]^d)$-distributed. Then
$$\lim_{n \to \infty} n^{r/d} \int \min_{b \in \alpha_{n,\Lambda}} \|x - b\|^r \, dU([0,1]^d)(x) = M_r(W(0|\Lambda))$$
and
In particular
$$Q_r([0,1]^d) \le M_r(W(0|\Lambda)).$$
Proof
We have
(when f,~,his defined in (8.1) with respect to the center a = 0). So the first assertion
follows from Theorem 8.5. N o w observe that a~,h does not depend on r and
Mr(W(Olh)) = d#(z).
lira nUdE min IIX - b]l~ -- f z r d # ( z ) for every 1 < r < OO,
Then
The lattice quantizer problem consists in finding an admissible lattice $\Lambda$ such that
$Q_r^{(L)}([0,1]^d) = M_r(W(0|\Lambda))$.
8.10 Remark
(a) Suppose $A \subset \mathbb{R}^d$ is space-filling by translation, where the corresponding set of
translation vectors is an admissible lattice $\Lambda$. Then
In fact, since
$$E \min_{b \in \alpha_{n,\Lambda}} \|X - b\|^r \le E\|X - f_{n,A}(X)\|^r$$
where X is $U([0,1]^d)$-distributed, the above inequality follows from Theorems 8.5 and
8.9.
(b) Suppose $A \subset \mathbb{R}^d$ is a convex space-filler by translation. Then A is a centrally
symmetric polytope (i.e. $-(A - x) = A - x$ for some $x \in \mathbb{R}^d$) and admits as set of
translation vectors a lattice $\Lambda$ (cf. McMullen, 1980). By (a) we have
$$M_r(W(0|\Lambda)) \le M_r(A)$$
Conversely, if $M_r(B(0,1)) = Q_r^{(L)}([0,1]^d)$ holds (for some r) and if the lattice quantizer
problem has a solution $\Lambda$, then $B(0,1)$ is space-filling. To see this, choose $s > 0$ such
that $\lambda^d(B(0, s)) = \det(\Lambda)$. By (the proof of) Lemma 2.9, we have $B(0, s) \subset W(0|\Lambda)$.
To verify the converse inclusion, assume that there exists $x \in W(0|\Lambda)$ with $s <
\|x\|$. Since the distance function $d(\cdot, \Lambda)$ is continuous on $\mathbb{R}^d$, one obtains $B(x, \varepsilon) \subset
(\bigcup_{a \in \Lambda} B(a, s))^c$ for some $\varepsilon > 0$. Choose $a \in \Lambda$ such that $\lambda^d(B(x, \varepsilon) \cap W(a|\Lambda)) > 0$.
Then
$$\lambda^d(B(0, s)) < \lambda^d(B(a, s)) + \lambda^d(B(x, \varepsilon) \cap W(a|\Lambda)) \le \lambda^d(W(a|\Lambda)) = \det(\Lambda),$$
8.12 Example (Hexagonal lattice in $\mathbb{R}^2$)
Let d = 2 and let $\Lambda = \mathbb{Z}(1, 0) + \mathbb{Z}(1/2, \sqrt{3}/2)$. Here we have $\det(\Lambda) = \sqrt{3}/2$. If the
underlying norm is the $l_2$-norm, then $W(0|\Lambda)$ is a regular hexagon,
Figure 8.3: Voronoi region $W(0|\Lambda)$ with respect to the $l_2$-norm for the hexagonal
lattice
We have
$$M_r(W(0|\Lambda)) = \frac{8 \cdot 2^{r/2}}{3^{(2+r)/4}} \int_0^{1/2} \int_0^{(1-x_1)/\sqrt{3}} (x_1^2 + x_2^2)^{r/2} \, dx_2 \, dx_1.$$
In case r = 1 and r = 2 one obtains
$$M_1(W(0|\Lambda)) = \frac{2 + 3\log\sqrt{3}}{\sqrt{2} \cdot 3^{7/4}} = 0.37719\ldots$$
and
$$M_2(W(0|\Lambda)) = \frac{5}{18\sqrt{3}} = 0.1603\ldots.$$
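The two hexagon moments can be checked by direct numerical integration. The sketch below evaluates the normalized moment as $\frac{8 \cdot 2^{r/2}}{3^{(2+r)/4}} \int_0^{1/2}\int_0^{(1-x_1)/\sqrt{3}} (x_1^2+x_2^2)^{r/2}\,dx_2\,dx_1$, that is, four times the integral over the part of the hexagon in the first quadrant divided by $\det(\Lambda)^{(2+r)/2}$ with $\det(\Lambda) = \sqrt{3}/2$. The midpoint rule, the grid size and the function name are our own choices.

```python
import math

def hexagon_normalized_moment(r: float, grid: int = 400) -> float:
    """Normalized r-th moment of the regular hexagon W(0|Lambda), l_2-norm, midpoint rule."""
    total = 0.0
    h1 = 0.5 / grid                          # step in x1 over [0, 1/2]
    for i in range(grid):
        x1 = (i + 0.5) * h1
        top = (1.0 - x1) / math.sqrt(3.0)    # upper limit of x2 for this x1
        h2 = top / grid
        for j in range(grid):
            x2 = (j + 0.5) * h2
            total += (x1 * x1 + x2 * x2) ** (r / 2.0) * h1 * h2
    prefactor = 8.0 * 2.0 ** (r / 2.0) / 3.0 ** ((2.0 + r) / 4.0)
    return prefactor * total

m1 = hexagon_normalized_moment(1.0)
m2 = hexagon_normalized_moment(2.0)
print(m1, m2, 5.0 / (18.0 * math.sqrt(3.0)))
```

For comparison, the disc of equal area has normalized second moment $1/(2\pi) = 0.1591\ldots$, slightly below the hexagon value, in line with the ball lower bound of Proposition 8.3.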
If the underlying norm is the $l_1$-norm, then $W(0|\Lambda)$ is the (nonregular) hexagon
$$W(0|\Lambda) = \{x \in \mathbb{R}^2 : |x_1| \le 1/2, \; |x_1| + |x_2| \le (1 + \sqrt{3})/4\}$$
$W(0|\Lambda) = [-1, 1]$, d = 1,
$W(0|\Lambda) = \{x \in \mathbb{R}^d : \sum_{i \in I} |x_i| \le 1$ for every
$I \subset \{1, \ldots, d\}, |I| = 2\}$, $d \ge 2$.
f llxll2dx = d/x~dx
W W
1
=a/~a~-~(W~,)dxl,
-1
Using
we deduce
d 1
Ilxll2dx = ~ + d--~?
W
118 H. Asymptotic quantization for nonsingular probability distributions
$$M_2(W(0|\Lambda)) = \frac{13}{30\sqrt{2}} = 0.3064\ldots, \quad d = 4.$$
If the underlying norm is the $l_1$-norm, then $W(0|\Lambda)$ coincides with the above $l_2$-Voronoi
region and $\Lambda$ is thus admissible. Here we obtain for r = 1
$$M_1(W(0|\Lambda)) = 2^{-1/d} \Bigl( \frac{d}{4} + \frac{1}{2(d+1)} \Bigr), \quad d \ge 1.$$
In particular,
$$M_1(W(0|\Lambda)) = \frac{7}{8 \cdot 2^{1/3}} = 0.6944\ldots, \quad d = 3.$$
8.14 Example (Dual lattices $D_d^*$)
The dual lattice of the lattice $D_d$ is defined by
Then
$$D_d^* = \mathbb{Z}^d + \mathbb{Z}(\tfrac{1}{2}, \ldots, \tfrac{1}{2})$$
and
Note that $2D_1^* = \mathbb{Z}$ and $2D_2^* = D_2$. Let $\Lambda = 2D_d^*$. Then $\det(\Lambda) = 2^d/\det(D_d) = 2^{d-1}$.
If the underlying norm is the $l_2$-norm, it is not difficult to verify that
For d = 3, this Voronoi region is a truncated octahedron; see Figure 8.4. It is more
difficult to compute the normalized second moment of $W(0|\Lambda)$. We obtain
$$M_2(W(0|\Lambda)) = \frac{19}{64 \cdot 2^{1/3}} = 0.2356\ldots, \quad d = 3.$$
$$T : \mathbb{R}^4 \to \mathbb{R}^4, \quad T(x) = (x_1 + x_2, \, x_1 - x_2, \, x_3 + x_4, \, x_3 - x_4)$$
with scaling factor $\sqrt{2}$ satisfies $T(D_4) = \Lambda$. Therefore, $W(0|\Lambda) = TW(0|D_4)$ which
yields
$$M_2(W(0|\Lambda)) = M_2(W(0|D_4)) = \frac{13}{30\sqrt{2}} = 0.3064\ldots, \quad d = 4$$
A general formula for $M_2(W(0|\Lambda))$ can be found in Conway and Sloane, 1993, pp.
470-471. The above upper bounds for $Q_r([0,1]^d)$, d = 3, 4, are close to the ball lower
bounds given in Proposition 8.3. We have
$$M_2(B(0,1)) = \Bigl(\frac{3}{4\pi}\Bigr)^{2/3} \frac{3}{5} = 0.2309\ldots, \quad d = 3,$$
If the underlying norm is the $l_1$-norm, the Voronoi region generated by 0 does not
change and so $\Lambda$ is admissible. For d = 3 and r = 1, we obtain
$$M_1(W(0|\Lambda)) = \frac{35}{32 \cdot 4^{1/3}} = 0.6890\ldots, \quad d = 3.$$
A further trivial case concerns the $l_\infty$-norm. Here $B(0,1) = W(0|\mathbb{Z}^d)$ is again space-filling
and so
(8.10) $Q_r([0,1]^d) = M_r(B(0,1)) = \dfrac{d}{(d+r)2^r}$, $l_\infty$-norm
8.15 Theorem
Let $d = 2$ and suppose the underlying norm is the $l_2$-norm. Let $\Lambda$ be the hexagonal lattice. Then
\[ Q_r([0,1]^2) = M_r(W(0|\Lambda)). \]
Proof
Set $A = W(0|\Lambda)$ and recall that $A$ is a regular hexagon. For every $n \in \mathbb{N}$ and every $\alpha \subset A$ with $|\alpha| = n$, we have
(cf. Fejes Tóth, 1972, p. 81). Let $\alpha \in C_{n,r}(U(A))$. It follows from Theorem 4.1 and Lemma 2.6 (a) that $\alpha \subset A$. Therefore
\[ {} \ge n \int_{n^{-1/2}A} \|x\|^r\,dx = n^{-r/2} M_r(A) \det(\Lambda)^{(2+r)/2} \]
which yields
\[ n^{r/2} V_{n,r}(U(A)) \ge M_r(A) \det(\Lambda)^{r/2}, \quad n \in \mathbb{N}. \]
This implies
\[ Q_r(A) \ge M_r(A) \det(\Lambda)^{r/2} \]
and hence
\[ Q_r([0,1]^2) \ge M_r(A). \]
This together with Theorem 8.9 gives the assertion. []
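For orientation, the constant in Theorem 8.15 can be evaluated numerically. The sketch below assumes the normalization $M_r(A) = \int_A \|x\|^r\,dx / \lambda^2(A)^{(2+r)/2}$ and integrates in polar coordinates over one of the twelve congruent right triangles of a regular hexagon with inradius 1; the function name and the Simpson step count are illustrative choices, not taken from the text.

```python
import math

def hexagon_moment(r, steps=2000):
    """Normalized r-th moment of a regular hexagon centred at 0 (l2-norm).

    In polar coordinates the hexagon with inradius 1 is bounded by
    s <= 1 / cos(theta) on 0 <= theta <= pi/6 (one of 12 congruent pieces),
    so the integral of ||x||^r equals
    12 / (r + 2) * integral of cos(theta)^-(r+2) over [0, pi/6].
    """
    a, b = 0.0, math.pi / 6
    h = (b - a) / steps                      # steps must be even for Simpson
    f = lambda t: math.cos(t) ** (-(r + 2))
    s = f(a) + f(b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(a + i * h)
    angular = s * h / 3
    integral = 12.0 / (r + 2) * angular
    area = 2.0 * math.sqrt(3.0)              # area of the hexagon with inradius 1
    return integral / area ** ((2 + r) / 2)

if __name__ == "__main__":
    print(hexagon_moment(2))   # compare with 5 / (18 * sqrt(3))
```

For $r = 2$ the value lies between the ball lower bound $1/(2\pi)$ and the square value $1/6$, as expected from Proposition 8.3 and the cube quantizer bound.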
8.16 Remark
The above results allow us to prove by a quantization argument that $\alpha_{n,\Lambda}$ is uniformly distributed in $[0,1]^d$ for every admissible lattice $\Lambda \subset \mathbb{R}^d$ with convex Voronoi region $W(0|\Lambda)$ in the sense that
\[ \frac{1}{|\alpha_{n,\Lambda}|} \sum_{b \in \alpha_{n,\Lambda}} \delta_b \xrightarrow{D} U([0,1]^d) \quad \text{as } n \to \infty. \]
First, observe that $A = W(0|\Lambda)$ is the unit ball of some norm $\|\cdot\|_0$. Then forget the underlying norm which was only used to form the Voronoi region $W(0|\Lambda)$ and through this $\alpha_{n,\Lambda}$ and proceed with the norm $\|\cdot\|_0$. The Voronoi region $W(0|\Lambda, \|\cdot\|_0)$ with respect to $\|\cdot\|_0$ coincides with $A$. Therefore, by Theorem 8.9 and Proposition 8.3
\[ \lim_{n \to \infty} n^{r/d} \int \min_{b \in \alpha_{n,\Lambda}} \|x - b\|_0^r\,dU([0,1]^d)(x) = M_r(A, \|\cdot\|_0) = Q_r([0,1]^d, \|\cdot\|_0). \]
The assertion now follows from Theorem 7.5.
P    $Q_r(P)$
Normal
$N(0, \sigma^2)$    $(\pi\sigma^2/2)^{r/2}\,(1+r)^{(r-1)/2}$
Logistic
p[ I X2+2r
L(a)
'2"" (I + r)p(:_::)'+~
Double Exponential
$DE(a)$    $(a(1+r))^r$
Double Gamma
p/b+r~l+r
Dr(a, b) ar(l+r)b+r-1 -~V~J
p(b)
Hyper-exponential
HE(a,b) (~)~(1 + r) O+r-b)/b
Uniform
$U([a,b])$    $\frac{1}{1+r}\left(\frac{b-a}{2}\right)^r$
Triangular
2 +T)(b--
T(a, b; c) (2+r)'+'((1 2 a))~
Exponential
E(a)
Gamma
P    $Q_2(P)$
$N(0,1)$    $\frac{\sqrt{3}\,\pi}{2} = 2.7206\ldots$
L(v/'3/~r) F(~)~ - 3.7709...
47r2r(~) 3
$DE(1/\sqrt{2})$    $\frac{9}{2} = 4.5$
3b+ip(b+2~3
Dr(a, b) --~ 3 J
r(b + 21
a 2 - _
1
_
range: (0,3F(2) 3) ----(0, 7.4488...)
b(1 + b)
HE(a, b)
a~_ r(~) r(~)~ 3(~-b)/b
r(~)
range: (1, oo)
$U([a,b])$, $b = a + 2\sqrt{3}$    $1$
$T(a, b; \frac{a+b}{2})$, $b = a + 2\sqrt{6}$    $\frac{27}{16} = 1.6875$
$E(1)$    $\frac{9}{4} = 2.25$
3b+l r(-r)
bd-2 3
r(~, b)
4r(b + 1)
a2 _ 1
range:
( 3r(~)
~- ~ , ~ . 1 = (1.8622...,2.7206...)
W(a,b)
a 2
1 91/br(~) 3
r(~)- r(~F 452[r(~_ _ r(~_~)2]
range: [2.1555, c~)
9b-1 2
P(a, b), b > 2,
a2 = ( b - 2)(b- 1) 2
range: (9, c<))
b
P QI(P)
7[
N(O, ~) - = 1.5707...
2
L ( ~ )1 - 1.7798...
8 log 2
DE(l) 2
Dr(a, b)
2br( )
r(b + 1)
1
range: (0,r)
b
HE(a,b)
br( )
r(~)
a = p(}) range : (1, co)
v([a, hi), 1
b=a+4
4
T(a, b; a+b~
2 Y'
- ----1.3333...
3
b=a+6
E 1 1
---- 1.4426...
log 2
P(a, b), b > 1,
b-1
a - b(21/b _ 1) (b--1)(21/b--1)
1
range: ( ~ , c o )
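Several entries of the univariate tables above can be cross-checked numerically. The sketch below assumes the one-dimensional form of Zador's formula, $Q_r(P) = Q_r([0,1])\,\|h\|_{1/(1+r)}$ with $Q_r([0,1]) = 1/((1+r)2^r)$ and $\|h\|_p = (\int h^p\,d\lambda)^{1/p}$, where $h$ is the density of $P$; the helper name, the truncation intervals and the midpoint rule are illustrative choices.

```python
import math

def q_r_univariate(density, r, lo, hi, steps=200_000):
    """Approximate Q_r(P) for a density on the real line.

    Uses Q_r(P) = (integral of h^(1/(1+r)))^(1+r) / ((1+r) * 2^r),
    with the integral truncated to [lo, hi] and evaluated by the midpoint rule.
    """
    p = 1.0 / (1.0 + r)
    h = (hi - lo) / steps
    integral = sum(density(lo + (i + 0.5) * h) ** p for i in range(steps)) * h
    return integral ** (1 + r) / ((1 + r) * 2 ** r)

# Densities with variance one (the normalization used for Q_2 in the table).
normal = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)          # N(0, 1)
double_exp = lambda x: math.exp(-abs(x) * math.sqrt(2)) / math.sqrt(2)    # DE(1/sqrt(2))
exponential = lambda x: math.exp(-x)                                       # E(1) on [0, inf)

if __name__ == "__main__":
    print(q_r_univariate(normal, 2, -40.0, 40.0))
    print(q_r_univariate(double_exp, 2, -100.0, 100.0))
    print(q_r_univariate(exponential, 2, 0.0, 150.0))
```

The three values should agree with $\sqrt{3}\pi/2$, $9/2$ and $9/4$ up to the discretization error of the quadrature.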
Notes
The issue of space-filling sets in the quantization setting was raised by Gersho (1979) for the $l_2$-norm. Expositions concerning tessellations (tilings) can be found in Grünbaum and Shephard (1986) and Schulte (1993). For a background on lattices, we refer to the books of Cassels (1959) and Gruber and Lekkerkerker (1987).
Gersho (1979) contains upper bounds for $Q_r([0,1]^d)$ of the type (8.3) for the $l_2$-norm. Theorems 8.5 and 8.9 provide a rigorous derivation. The observation in Theorem 8.9 concerning the distributional convergence seems to be new. Different proofs for the Hexagon-Theorem 8.15 can be found in Newman (1982) (for $r = 2$), Wong (1982) (also for $r = 2$) and Haimovich and Magnanti (1988). Discussions and applications
8.17 Conjecture
$Q_r([0,1]^3) = M_r(W(0|D_3^*))$ for every $r \in [1, \infty)$ when the underlying norm on $\mathbb{R}^3$ is the $l_2$-norm (cf. Example 8.14, (8.8), and Remark 10.11(c)).
9. Random quantizers and quantization coefficients 127
Let $X$ be a $\mathbb{R}^d$-valued random variable with distribution $P$, let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$, and let $1 \le r < \infty$.
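9.1 Theorem
Suppose $P = P_a$. Let $Y_1, Y_2, \ldots$ be i.i.d. $\mathbb{R}^d$-valued random variables with distribution $Q$ independent of $X$ and let $g = dQ_a/d\lambda^d$.
(a) Assume $P(g > 0) = 1$. Then $n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \xrightarrow{D} \nu$, where $\nu$ has the distribution function
\[ F_\nu(t) = \begin{cases} 0, & t \le 0, \\ 1 - \int \exp\bigl(-\lambda^d(B(0,1))\,g(x)\,t^d\bigr)\,dP(x), & t > 0. \end{cases} \]
(b) Assume $P(g = 0) > 0$. Then the sequence $(n^{1/d} \min_{1 \le i \le n} \|X - Y_i\|)_{n \ge 1}$ is stochastically unbounded.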
Proof
(a) Let $t > 0$. For $x \in \mathbb{R}^d$, we have
By differentiation of measures
as $n \to \infty$ for $\lambda^d$-almost all $x$ and hence for $P$-almost all $x \in \mathbb{R}^d$ (cf. Chatterji, 1973, Chapitre V). Therefore, by Lebesgue's dominated convergence theorem
\[ \mathbb{P}\bigl(n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \le t\bigr) = \int \mathbb{P}\bigl(n^{1/d} \min_{1 \le i \le n} \|x - Y_i\| \le t\bigr)\,dP(x) \to F_\nu(t). \]
Obviously, $\mathbb{P}(n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| = 0) = 0 = F_\nu(0)$. Since $P(g > 0) = 1$, $F_\nu$ is a distribution function and we obtain
\[ n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \xrightarrow{D} \nu. \]
This, together with
(cf. Lemma 6.8). In fact, by Hölder's inequality with $p = d/(d+r)$ and $q = -d/r$, one obtains
9.2 Theorem
Let $A \subset \mathbb{R}^d$ be a compact set with $\lambda^d(A) > 0$ and let $Y_1, Y_2, \ldots$ be i.i.d. $U(A)$-distributed random variables independent of $X$. Assume $P = P_a$, $\operatorname{supp}(P) \subset A$ or $P(\operatorname{int} A) = 1$. Assume further that there are constants $c > 0$ and $t_0 > 0$ such that
Then
and
\[ F_\nu(t) = \begin{cases} 0, & t \le 0, \\ 1 - \exp\bigl(-\lambda^d(B(0,1))\,t^d/\lambda^d(A)\bigr), & t > 0. \end{cases} \]
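The limit law in Theorem 9.2 is a Weibull-type distribution, and its moments can be written in closed form. The sketch below is an illustration under the assumption that $F_\nu(t) = 1 - \exp(-\lambda^d(B(0,1))\,t^d/\lambda^d(A))$; the moment formula $\Gamma(1 + r/d)\,(\lambda^d(A)/\lambda^d(B(0,1)))^{r/d}$ is obtained here by direct integration and is not quoted from the text. All function and parameter names are hypothetical.

```python
import math

def limit_cdf(t, d, vol_ball, vol_A):
    """Assumed limit distribution function of n^(1/d) * min_i ||X - Y_i||."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-vol_ball * t ** d / vol_A)

def limit_moment(r, d, vol_ball, vol_A):
    """Closed-form r-th moment: Gamma(1 + r/d) * (vol_A / vol_ball)^(r/d)."""
    return math.gamma(1 + r / d) * (vol_A / vol_ball) ** (r / d)

def numeric_moment(r, d, vol_ball, vol_A, upper=10.0, steps=100_000):
    """Midpoint rule for the integral of r * t^(r-1) * (1 - F(t)) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += r * t ** (r - 1) * (1.0 - limit_cdf(t, d, vol_ball, vol_A))
    return total * h

if __name__ == "__main__":
    # d = 2, Euclidean unit disc (area pi), A = unit square, r = 2
    print(limit_moment(2, 2, math.pi, 1.0), numeric_moment(2, 2, math.pi, 1.0))
```

With $A = [0,1]^d$ the closed-form moment reduces to $\Gamma(1 + r/d)\,\lambda^d(B(0,1))^{-r/d}$, which is the shape of the random quantizer upper bound discussed below.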
Proof
Let $Q = U(A)$. In case $P = P_a$ and $\operatorname{supp}(P) \subset A$, the first assertion follows from Theorem 9.1. In case $P(\operatorname{int} A) = 1$, we have for $t > 0$ and $x \in \operatorname{int} A$
\[ (9.4) \qquad \sup_{n \ge 1} n^{s/d}\, E \min_{1 \le i \le n} \|X - Y_i\|^s < \infty \quad \text{for every } s \in [1, \infty). \]
This property implies that $(n^{r/d} \min_{1 \le i \le n} \|X - Y_i\|^r)_{n \ge 1}$ is uniformly integrable which yields the second assertion.
To prove (9.4) note that $\operatorname{supp}(P) \subset A$. Let $x \in \operatorname{supp}(P)$. We have
Observe that (9.3) holds (with a different constant $c$) for every $t \in (0, \operatorname{diam}(A))$ if $\operatorname{diam}(A) > t_0$. In fact, choose $t_1 \in (0, t_0)$. Then for every $t_0 \le t < \operatorname{diam}(A)$
Using the inequality $(1 - z)^n \le e^{-nz}$, $0 \le z \le 1$, one obtains with $c_1 = c/\lambda^d(A)$
\[ {} \le \frac{s\,\Gamma(s/d)}{d\,(n c_1)^{s/d}} = c_2\, n^{-s/d}. \]
This gives
[]
The regularity condition (9.3) with supp(P) replaced by supp(U(A)) is discussed in
Section 12. Compact convex subsets A of R d with Ad(A) > 0 satisfy this condition.
Next we will deal with consequences for the quantization coefficients.
Clearly, random quantizers cannot be better than optimal quantizers giving the following upper bound for $Q_r([0,1]^d)$. For the $l_2$-norm, it is due to Zador (1963, 1982).
9.3 Proposition
Proof
Choose A = [0, 1]d and P = U([0, 1]d) in Theorem 9.2. Then
A comparison of the random quantizer and the cube quantizer upper bound shows
for the/2-norm and r = 2
A comparison of the random quantizer upper bound and the ball lower bound given
in Proposition 8.3 can be found in the Tables 9.1 and 9.2.
Table 9.1: $l_2$-norm, $r = 2$. Ball lower bound and random quantizer upper bound for $Q_2([0,1]^d)$
From the ball lower bound and the random quantizer upper bound for $Q_r([0,1]^d)$ we deduce the following approximation for large $d$.
9.4 Corollary
Let the underlying norm be the $l_p$-norm, $1 \le p < \infty$. Then
Proof
By (2.5) and (2.6)
\[ M_r(B(0,1)) = \frac{d\,\Gamma(1 + \frac{d}{p})^{r/d}}{(d+r)\,2^r\,\Gamma(1 + \frac{1}{p})^r}. \]
Using Stirling's formula
\[ \Gamma(x) \sim \sqrt{2\pi}\, x^{x - \frac12} e^{-x} \quad \text{as } x \to \infty, \]
we deduce
TJ,F(1 rip
Since $\lim_{d \to \infty} \Gamma(2 + \frac{r}{d}) = \Gamma(2) = 1$, the assertion follows from the Propositions 8.3 and 9.3. []
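The large-$d$ behaviour used in this proof can be observed numerically. The sketch below assumes the ball-moment formula $M_r(B(0,1)) = d\,\Gamma(1 + d/p)^{r/d} / ((d+r)\,2^r\,\Gamma(1 + 1/p)^r)$ for the $l_p$-norm and compares $d^{-r/p} M_r(B(0,1))$ with the constant $(p / (2 (ep)^{1/p}\,\Gamma(1/p)))^r$ that Stirling's formula produces; the function names are illustrative.

```python
import math

def ball_moment(d, r, p):
    """M_r(B(0,1)) for the l_p unit ball in R^d, computed via log-gamma."""
    log_m = (math.log(d) + (r / d) * math.lgamma(1 + d / p)
             - math.log(d + r) - r * math.log(2) - r * math.lgamma(1 + 1 / p))
    return math.exp(log_m)

def asymptotic_constant(r, p):
    """Limit of d^(-r/p) * M_r(B(0,1)) as d grows, from Stirling's formula."""
    return (p / (2 * (math.e * p) ** (1 / p) * math.gamma(1 / p))) ** r

if __name__ == "__main__":
    for d in (10, 1000, 10 ** 6):
        print(d, d ** (-1.0) * ball_moment(d, 2, 2), asymptotic_constant(2, 2))
```

For $p = r = 2$ the constant equals $1/(2\pi e)$, and for $p = r = 1$ it equals $1/(2e)$.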
\[ \lim_{d \to \infty} d^{-1} Q_1([0,1]^d) = \frac{1}{2e} = 0.1839\ldots, \]
\[ d^{-1} M_1([0,1]^d) = \frac14, \quad d \ge 1, \]
\[ \lim_{d \to \infty} d^{-1} M_1(W(0|D_d)) = \frac14. \]
The differential entropy is defined by
provided the integral exists. Note that in (9.5) and (9.6) the entropies are calculated
in nats and not in bits.
For the $l_2$-norm, the following result is contained in Zador (1963).
9.5 Proposition
Let $X_1, X_2, \ldots$ be i.i.d. real random variables with distribution $P$. Suppose $P_a \ne 0$ and $E|X_1|^{r+\delta} < \infty$ for some $\delta > 0$. Let the underlying norm be the $l_p$-norm, $1 \le p < \infty$. Then
\[ \lim_{d \to \infty} Q_r(X_1, \ldots, X_d) = 0 \quad \text{if } P_a \ne P, \]
\[ \lim_{d \to \infty} d^{-r/p}\, Q_r(X_1, \ldots, X_d) = \frac{p^r e^{rH(P)}}{2^r (ep)^{r/p}\, \Gamma(1/p)^r} \quad \text{if } P_a = P. \]
Proof
It follows from the assumptions that
d d
(®.)
1 1
,x.)ll
log ~ h :dlogllhlld/(~+~)
1 d/(d+r)
= (d + r) log f h d/(d+r) dA
= rHd/(,+~)(P~).
Therefore
d
lim log (~) h = rH(D~).
Furthermore
9.6 R e m a r k
(a) Proposition 9.5 immediately yields the d-asymptotics for the vector quantizer
advantage as defined in (7.8) in the i.i.d, case:
(Here the underlying norm is the $l_r$-norm, $E|X_1|^{r+\delta} < \infty$ for some $\delta > 0$, and the distribution of $X_1$ is absolutely continuous with respect to $\lambda$.)
(b) We know from Table 8.2 that $\inf Q_2(P) = 0$ and $\sup Q_2(P) = \infty$, where the infimum and the supremum are taken over all univariate symmetric (absolutely continuous) distributions $P$ with variance equal to one. On the other hand, the normal distribution $N(0,1)$ is the unique maximizer of the differential entropy among all probabilities $P$ with mean zero, variance equal to one and $\operatorname{supp}(P) = \mathbb{R}$. For such distributions $P$ it follows from Proposition 9.5 that
d d
Notes
P    $H(P)$
$N(0, \sigma^2)$    $\frac12 \log(2\pi\sigma^2) + \frac12$
$L(a)$    $\log a + 2$
$DE(a)$    $\log(2a) + 1$
$D\Gamma(a,b)$    $\log(2a\Gamma(b)) + (1-b)\psi(b) + b$
$HE(a,b)$    $\log(2a\Gamma(1/b)/b) + 1/b$
$U([a,b])$    $\log(b - a)$
$E(a)$    $\log a + 1$
$\Gamma(a,b)$    $\log(a\Gamma(b)) + (1-b)\psi(b) + b$
W(a,b) log(~) + b-b~:~-+ 1
b+l
P(a,b) log(~) + b
d
Qr(® P)
P Q,~ = exp(rHd/(d+r)( P) )
N(O,o
L(a) t~ ( T !
HE(a, b)
10 Asymptotics for the covering radius
and
\[ (10.1) \qquad e_{n,\infty}(P) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le n} \|d_\alpha\|_\infty, \]
where $d_\alpha(x) = d(x, \alpha) = \inf_{a \in \alpha} \|x - a\|$. It follows from Lemma 3.1 that, for $1 \le r < \infty$,
In this section we consider the case $r = \infty$ and discuss its relation to the quantization problem ($r < \infty$).
\[ \max_{x \in A} \min_{a \in \alpha} \|x - a\| = \min\Bigl\{ s \ge 0 : \bigcup_{a \in \alpha} B(a, s) \supset A \Bigr\} \]
searching for $\alpha \in C_{n,\infty}(A)$ is equivalent to the geometric problem of finding the most economical covering of $A$ by at most $n$ balls of equal radius. The number $e_{n,\infty}(A)$ is called the $n$-th covering radius for $A$. Recall that the Hausdorff metric is given by
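For a finite sample of $A$ and a finite set of centers, the max-min quantity above is straightforward to compute. The following sketch uses the Euclidean norm and the standard max-of-two-directed-distances form of the Hausdorff distance; both function names are hypothetical, and a finite grid only approximates a continuum $A$.

```python
import math

def covering_radius(points, centers):
    """max over x in points of min over a in centers of ||x - a|| (Euclidean)."""
    return max(min(math.dist(x, a) for a in centers) for x in points)

def hausdorff_distance(A, B):
    """Hausdorff distance between two finite point sets."""
    return max(covering_radius(A, B), covering_radius(B, A))

if __name__ == "__main__":
    grid = [(i / 100,) for i in range(101)]      # finite sample of [0, 1]
    centers = [(0.25,), (0.75,)]                 # two balls of radius 1/4 cover [0, 1]
    print(covering_radius(grid, centers))
    print(hausdorff_distance(centers, grid))
```

Here the two centers $1/4$ and $3/4$ realize the covering radius $1/4$ of the unit interval for $n = 2$.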
10.1 Lemma
Let $n \in \mathbb{N}$.
(c) Let $A$ denote the support of $P$ and suppose $A$ is compact and $|A| \ge n$. Let $\alpha_r \in C_{n,r}(P)$, $1 \le r < \infty$, and let $(r_k)_{k \ge 1}$ be a sequence in $[1, \infty)$ converging to infinity. Then the set of $d_H$-cluster points of the sequence $(\alpha_{r_k})_{k \ge 1}$ is a nonempty subset of $C_{n,\infty}(A)$ and
\[ \lim_{r \to \infty} d_H(\alpha_r, C_{n,\infty}(A)) = 0. \]
Proof
(a) follows from the fact that $\|\cdot\|_r \le \|\cdot\|_s$ for $r \le s$.
(b) and (c). Let $u = \operatorname{diam}(A)$ and choose $s > 0$ such that $A \subset B(0, s)$. It follows from Theorem 4.1 and Lemma 2.6 that
\[ C_{n,r}(P) \subset \{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n,\ \alpha \subset B(0, s + u)\} \]
for every $r \in [1, \infty)$ provided $|A| \ge n$. Using Lemma 4.23 we deduce the existence of a $d_H$-cluster point for $(\alpha_{r_k})_{k \ge 1}$. Now let $\alpha$ be the $d_H$-limit of a subsequence of $(\alpha_{r_k})_{k \ge 1}$ which is again denoted by $(\alpha_{r_k})_{k \ge 1}$. We have
\[ \|d_{\alpha_{r_k}} - d_\alpha\|_{r_k} \le \|d_{\alpha_{r_k}} - d_\alpha\|_\infty \le \sup_{x \in \mathbb{R}^d} |d(x, \alpha_{r_k}) - d(x, \alpha)| = d_H(\alpha_{r_k}, \alpha) \]
and hence
\[ \|d_{\alpha_{r_k}}\|_{r_k} \ge \|d_\alpha\|_{r_k} - \|d_{\alpha_{r_k}} - d_\alpha\|_{r_k} \ge \|d_\alpha\|_{r_k} - d_H(\alpha_{r_k}, \alpha). \]
By (a), $\lim_{r \to \infty} e_{n,r}(P)$ exists and is less than or equal to $e_{n,\infty}(P)$. Therefore
This gives (b) and the first assertion of (c). The second assertion of (c) follows from the first one. []
For $A \subset \mathbb{R}^d$ compact with $\lambda^d(A) > 0$, set
In spite of the slight inconsistency in notation we continue to write $M_\infty(A)$, $M_{n,\infty}(A)$, $Q_\infty(P)$ etc. for the corresponding notions in case $r = \infty$.
10.2 Lemma
Let $A \subset \mathbb{R}^d$ be a nonempty compact set and let $T\colon \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation with scaling number $c > 0$.
Proof
Obvious. []
The existence of $n$-optimal sets of centers of order $\infty$ can be derived from the existence of $n$-optimal sets of centers of order $r < \infty$ (cf. Theorem 4.12) and Lemma 10.1(c).
10.3 Lemma (Existence)
If $A \subset \mathbb{R}^d$ is a nonempty compact set, then
\[ C_{n,\infty}(A) \ne \emptyset. \]
Proof
We assume without loss of generality that $|A| \ge n$. To show that the assertion follows from Lemma 10.1(c) it suffices to note that $A = \operatorname{supp}(P)$ for some Borel probability measure $P$ on $\mathbb{R}^d$. If $A$ is finite, set $P = \sum_{a \in A} \delta_a/|A|$. Otherwise let $B = \{b_1, b_2, \ldots\}$
The covering problem can be formulated in terms of the Hausdorff metric and the $L_\infty$-minimal metric.
10.4 Lemma
Let $A \subset \mathbb{R}^d$ be a nonempty compact set. Then
Proof
Let $\alpha \in C_{n,\infty}(A)$ and set
This yields
\[ e_{n,\infty}(A) \ge \inf_{|\alpha| \le n} d_H(\alpha, A). \]
The converse inequality is obvious. Furthermore, the inclusion
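10.5 Lemma
Suppose $\operatorname{supp}(P)$ is compact and let $X$ be a $\mathbb{R}^d$-valued random variable with distribution $P$. Then
\[ e_{n,\infty}(P) = \inf_{Q \in \mathcal{P}_n} \rho_\infty(P, Q) = \inf_{f \in \mathcal{F}_n} \rho_\infty(P, P^f) = \inf_{f \in \mathcal{F}_n} \operatorname{ess\,sup} \|X - f(X)\|. \]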
Proof
Let A denote the support of P. If Q E P~ with Q(a) = 1, Io4 <_ n, let c > 0 such
that Q(B) < P(d8 <_~) for all B E B(Rd). Then
This implies
Therefore
Poo(P,P f ) (_ c = m a x d ( x , o:).
xEA
This implies
inf p~(P, Pf) < e,o~(P).
fEY~ -- '
= IId lW,
and if {A~ : a E a} is a Voronoi partition of R d with respect to a, then
Proof
(a) Let $\beta \in C_{n,\infty}(B)$. Then
\[ e_{n,\infty}(A) \le \max_{x \in A} \min_{b \in \beta} \|x - b\| \le e_{n,\infty}(B). \]
(b) Let $\alpha_i \in C_{n_i,\infty}(A_i)$ and let $\alpha = \bigcup_{i=1}^m \alpha_i$. Then $|\alpha| \le n$ and therefore,
\[ {} = \max_{1 \le i \le m} e_{n_i,\infty}(A_i). \]
[]
Now we can derive the exact asymptotic first order behaviour of the covering radius $e_{n,\infty}(A)$ for compact Jordan measurable sets $A$ with $\lambda^d(A) > 0$.
10.7 Theorem (Asymptotic covering radius)
Let $A \subset \mathbb{R}^d$ be a nonempty compact set with $\lambda^d(\partial A) = 0$. Let
Proof
The proof is given in three steps.
Step 1. Let $A = [0,1]^d$. Let $m, n \in \mathbb{N}$, $m < n$ and let $k = k(n,m) = [(\frac{n}{m})^{1/d}]$. Choose a tessellation of the unit cube $[0,1]^d$ consisting of $k^d$ translates $C_1, \ldots, C_{k^d}$ of the cube $[0, \frac1k]^d$. Then by Lemmas 10.2 and 10.6,
= max Mm,o~(Ci)k -t
1<i<~
= k-lMm,oo([O, 1] e)
= k-le~,~([O, 1] ~)
and hence
This implies
for every rrt • N. Therefore, lin~-,oo nl/den,oo([O, 1]d) exists in [0, cx3) and
nllUe,~,oo([O, 1] d) = [t~11}
n ~ Udnl1/den,,~t[u,
. . . . ±J~,) --+ ml/gQoo([O, 1]d) as n _+ co"
This implies
minllx - ~11 < i~f I1~ - yll for every x E Ci,~, 1 < i < m.
aCTi - - yEC~
Then
m a x em+k,oo([0, 1 ] a ) ( / - 2e).
l~i_<rn
Since $\sum_{i=1}^m n_i \le n$, we have $\sum_{i=1}^m v_i \le 1$. Furthermore, $v_i > 0$ for every $i$. Otherwise, Step 1 yields $L = \infty$, which contradicts (10.6). By taking a further subsequence we can assume without loss of generality that
This implies
L > max Qoo([0, 1]d)v~-t/d(l -- 2C).
l <i<_m
Since 0 < e < I/2 is arbitrary and maxl<i_<m v~ 1/a >_ m l/e, one obtains
Ak = [.J { Ci : A n Ci ¢ O, i < k d}
Since A C Ak, it follows from Step 2 and Lemma 10.6 (a) that
which yields
Now s u p p o s e )~d(OA) ---- 0. To prove that Qoo([0, 1]d))~d(A)t/d is a lower bound for
liminf~_~o~ nt/de,~,oo(A) we may assume that Ad(A) > 0. Set
Bk = LJ {Ci : Ci c A, i < k d}
< limsupnl/%m~(A )
Qoo([o, 1]d))~d(A)1/d, i <_r < co,
Here the first inequality follows from the Lemmas 10.1 and 10.6 (a) while the last in-
equality follows from (10.8). Since by assumption l i m ~ Q~([0,1]~) Ur = Qo~([0,1]~),
this implies
lim nl/de~,~ (A) = Q~ ([0,1] d)A~(A)1/d.
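10.9 Remark
Let $A \subset \mathbb{R}^d$ be an infinite compact set with $\lambda^d(\partial A) = 0$. For $\varepsilon > 0$, let $N(\varepsilon, A)$ be the minimal number of balls of radius $\varepsilon > 0$ which are necessary to cover $A$, i.e.,
\[ N(\varepsilon, A) = \min\Bigl\{ n \ge 1 : \exists\, \alpha \subset \mathbb{R}^d,\ |\alpha| \le n,\ A \subset \bigcup_{a \in \alpha} B(a, \varepsilon) \Bigr\}. \]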
>_eN(~),oo(A),
hence by Theorem 10.7
= Qoo([0, 1]d)dAd(A).
< eN(c)-l,oo(A),
hence again by Theorem 10.7
We obtain
(Actually, this limit result is equivalent to the assertion of Theorem 10.7). From (10.12) we deduce that $\lambda^d(B(0, Q_\infty([0,1]^d)))$ coincides with the density of the thinnest covering of the whole space by translates of $B(0,1)$ (cf. Gruber and Lekkerkerker, 1987, p. 237, Definition 6). The existence of the limit $\lim_{\varepsilon \to 0} N(\varepsilon, A)\varepsilon^d$ appears in Gruber and Lekkerkerker (1987, p. 237, Theorem 7) for convex compact bodies $A$.
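The relation between covering numbers and covering radii is easy to see in dimension one. The sketch below is a toy illustration for $A = [0,1]$, assuming $e_{n,\infty}([0,1]) = 1/(2n)$ (realized by $n$ intervals of equal length) and hence $N(\varepsilon, [0,1]) = \lceil 1/(2\varepsilon) \rceil$; exact rational arithmetic avoids rounding issues, and both function names are hypothetical.

```python
from fractions import Fraction
import math

def covering_radius_interval(n):
    """Assumed covering radius of [0, 1] by n balls: 1 / (2n)."""
    return Fraction(1, 2 * n)

def covering_number_interval(eps):
    """Least n with 1 / (2n) <= eps, i.e. ceil(1 / (2 eps))."""
    return math.ceil(Fraction(1, 2) / eps)

if __name__ == "__main__":
    for eps in (Fraction(1, 3), Fraction(1, 7), Fraction(3, 100)):
        n = covering_number_interval(eps)
        # sandwich: e_N <= eps < e_{N-1}
        print(eps, n, covering_radius_interval(n), covering_radius_interval(n - 1))
```

In this example $N(\varepsilon)\,\varepsilon$ tends to $1/2$ as $\varepsilon \to 0$, the one-dimensional covering coefficient of the unit interval.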
As for the quantization coefficients, the covering coefficient $Q_\infty([0,1]^d)$ is only known for $d = 1$, $d = 2$ ($l_1$-norm, $l_2$-norm), and in "trivial" cases for $d \ge 3$. Lower and upper bounds for $Q_\infty([0,1]^d)$ which correspond to those given in Proposition 8.3 and Theorem 8.9 can easily be derived.
For $A \subset \mathbb{R}^d$ compact with $\lambda^d(A) > 0$, set
\[ M_\infty(A) = M_{1,\infty}(A). \]
that is, $M_\infty(W(0|\Lambda))$ is the normalized covering radius of $\Lambda$ (with respect to the whole space).
10.10 Proposition
\[ \text{(a)} \quad \lim_{r \to \infty} Q_r([0,1]^d)^{1/r} \ge M_\infty(B(0,1)) = \frac{1}{\lambda^d(B(0,1))^{1/d}}. \]
Proof
(a) By Proposition 8.3
= io~(B(O, 1)).
c = c(n) = (n det(A))-l/d,
/3~,A = { c ~ : a • A, c W ( ~ l h ) n [0,1] d # 0},
k -- k(n) -- IZ,~,AI
and
A,~ = U W(blcA).
bE,8,,.,A
Furthermore,
This implies
Then
(10.17) = M~(W(0[A))
This result is due to Kershner (1939). Solutions of the lattice covering problem are known for dimensions $1 \le d \le 5$, among them the hexagonal lattice for $d = 2$ and $D_3^*$.
We have
(cf. Conway and Sloane, 1993, p. 12 and Chapter 2, Section 1.3 and the subsequent Remark 10.10 (a)).
A trivial case occurs for the $l_1$-norm and $d = 2$, where $B(0,1) = W(0|D_2)$ (cf. Example 8.13). Therefore
(b) ($d$-asymptotics) If the underlying norm is the $l_p$-norm, with $1 \le p < \infty$, then
\[ \liminf_{d \to \infty} d^{-1/p}\, Q_\infty([0,1]^d) \ge \frac{p}{2 (ep)^{1/p}\, \Gamma(\frac1p)}. \]
Therefore
\[ (10.21) \qquad \lim_{d \to \infty} d^{-1/p}\, Q_\infty([0,1]^d) = \frac{p}{2 (ep)^{1/p}\, \Gamma(1/p)}, \quad l_p\text{-norm}. \]
Thus we find for the covering coefficient $Q_\infty([0,1]^d)$ exactly the same $d$-asymptotics as for the $r$-th roots $Q_r([0,1]^d)^{1/r}$ of the $r$-th quantization coefficients, $1 \le r < \infty$ (cf. Proposition 9.4).
(c) Let the underlying norm be the $l_2$-norm. For $d = 2$, the solution of the lattice quantizer problem, given by the hexagonal lattice, does not depend on $r$ (cf. Theorem 8.15). For $d = 3$, the lattice $D_3^*$ solves the lattice quantizer problem for $r = 2$ (cf. (8.8)) and the lattice covering problem ($r = \infty$). So possibly $D_3^*$ solves the lattice quantizer problem for every $r$. For $d = 4$, the solution of the lattice covering problem is unique up to linear similarity transformations (cf. Baranovskii, 1965) and differs from the best known lattice quantizer $D_4$ for $r = 2$ (cf. Conway and Sloane, 1993, p. 12 and p. 61). Therefore, solutions of the lattice quantizer problem must depend on $r$.
One may use $n$-optimal (or asymptotically $n$-optimal) sets of centers of order $r = \infty$ as quantizers. Their asymptotic performance depends on the covering density of the unit ball (cf. Remark 10.8).
10.12 Proposition
Let $A \subset \mathbb{R}^d$ be a nonempty compact set with $\lambda^d(A) > 0$, let $(\alpha_n)_{n \ge 1}$ be an asymptotically $n$-optimal set of centers for $A$ of order $\infty$, i.e., $|\alpha_n| \le n$ and
\[ \lim_{n \to \infty} n^{1/d} \max_{x \in A} \min_{a \in \alpha_n} \|x - a\| = Q_\infty([0,1]^d)\,\lambda^d(A)^{1/d}, \]
\[ \limsup_{n \to \infty} n^{r/d} \int \min_{a \in \alpha_n} \|x - a\|^r\,dU(A)(x) \le \vartheta_d^{(d+r)/d} M_r(B(0,1))\,\lambda^d(A)^{r/d}, \]
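Proof
Let $s_n = \max_{x \in A} d(x, \alpha_n)$. Then
\[ {} \le \sum_{a \in \alpha_n} \int_{B(a, s_n)} \|x - a\|^r\,dx / \lambda^d(A) \]
\[ {} \le n M_r(B(0,1))\,\lambda^d(B(0, s_n))^{(d+r)/d} / \lambda^d(A) \]
\[ {} = n s_n^{d+r} \lambda^d(B(0,1))^{(d+r)/d} M_r(B(0,1)) / \lambda^d(A). \]
Therefore
\[ n^{r/d} \int \min_{a \in \alpha_n} \|x - a\|^r\,dU(A)(x) \le \bigl(n s_n^d \lambda^d(B(0,1))\bigr)^{(d+r)/d} M_r(B(0,1)) / \lambda^d(A) \]
for every $n \in \mathbb{N}$. This yields the assertion. []
The above covering density upper bound for $Q_r([0,1]^d)$ is better than the "trivial" bound $Q_\infty([0,1]^d)^r$ as long as $\vartheta_d < (d+r)/d$ while for the $r$-th root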
Then the set of $d_H$-cluster points of the sequence $(\alpha_k)_{k \ge 1}$ is a nonempty subset of $C_{n,\infty}(A)$ and
\[ d_H(\alpha_k, C_{n,\infty}(A)) \to 0 \quad \text{as } k \to \infty. \]
Proof
To show that the assertion follows from Lemma 4.22 applied to the space of all nonempty compact subsets of $\mathbb{R}^d$ equipped with the Hausdorff metric $d_H$, the subset $N = \{\alpha \subset \mathbb{R}^d : |\alpha| \le n,\ \alpha \ne \emptyset\}$, and $f = d_H(A, \cdot)$, it suffices to verify that
\[ L(c) \subset \{\alpha \in N : \alpha \subset B(0, c + s)\}. \]
Using Lemma 4.23 we deduce the $d_H$-compactness of $L(c)$. []
If in the preceding theorem the sets $\alpha_k \in C_{n,\infty}(A_k)$ satisfy $\max_{a \in \alpha_k} d(a, A_k) \le e_{n,\infty}(A_k)$ for all large $k$ (such a choice is always possible), then the assumption (10.23) can be dropped.
Under suitable conditions, weak convergence of probability distributions implies the $d_H$-convergence of their (compact) supports. The following special case will be needed.
10.14 Lemma
Let $P_k \xrightarrow{D} P$ for Borel probability measures on $\mathbb{R}^d$ with compact supports $A_k$ and $A$, respectively. Then
\[ \max_{x \in A} \min_{y \in A_k} \|x - y\| \to 0, \quad k \to \infty. \]
Proof
For $\varepsilon > 0$, choose a finite subset $\alpha$ of $A$ such that $A \subset \bigcup_{a \in \alpha} B(a, \varepsilon)$. For $a \in \alpha$, define a bounded continuous function $f_a\colon \mathbb{R}^d \to \mathbb{R}_+$ by
\[ \min_{a \in \alpha} P_k(B(a, \varepsilon)) \ge \min_{a \in \alpha} \int f_a\,dP_k > 0 \]
and therefore
max min Ila - Yll <
aE~ yEA~
10.15 Corollary (Consistency)
Let $A$ denote the support of $P$ and suppose $A$ is compact.
Proof
Since $P_k \xrightarrow{D} P$ a.s., the assertions follow from Theorem 10.13, Lemma 10.14, and (10.22). []
Notice that uniqueness $|C_{n,\infty}(A)| = 1$ implies (10.23). Under this uniqueness condition, Corollary 10.15 (c) is contained in Cuesta-Albertos et al. (1988, Theorem 12). Part (a) has been observed by Wagner (1971).
Notes
Some material about the issue of this section for the $l_2$-norm and $l_\infty$-norm may be found in Niederreiter (1992), Chapter 6. In particular, the exact order $n^{-1/d}$ of $e_{n,\infty}(A)$ is well known if $\lambda^d(A) > 0$. However, we are not aware of a reference concerning Theorem 10.7. Examples of $n$-optimal sets of centers for $[0,1]^2$ of order $\infty$ can be found in Johnson et al. (1990) for the $l_1$-norm and the $l_2$-norm. The covering density upper bound for the $r$-th quantization coefficients given in Proposition 10.12 seems to be new. A discussion of the relation between the quantization problem for $r = 2$, the covering problem and the packing problem can be found in Forney (1993) for the $l_2$-norm. For general treatments of the covering problem we refer to Gruber and Lekkerkerker (1987) and Conway and Sloane (1993).
Consistency and central limit results for a trimmed version of the covering problem have been proved by Cuesta-Albertos et al. (1998) and Cuesta-Albertos et al. (1999). Empirical versions of related covering problems and their asymptotics when both the level $n$ and the sample size $k$ tend to infinity were studied e.g. by Zemel (1985) and Rhee and Talagrand (1989b) for the $l_2$-norm.
Let us mention that $n$-optimal sets of centers of order $\infty$ are often called best $n$-nets and Chebyshev-centers in case $n = 1$ (cf. Garkavi, 1964, and Singer, 1970, Section II.6.4).
10.16 Conjecture
$\lim_{r \to \infty} Q_r([0,1]^d)^{1/r} = Q_\infty([0,1]^d)$ (cf. (10.11) and Remark 10.8).
If Conjecture 8.17 can be resolved, then Conjecture 10.16 is true for $d = 3$ and $l_2$-norm. Furthermore, if Conjecture 8.17 can be resolved, then the lattice $D_3^*$ provides a solution of the covering problem in $\mathbb{R}^3$ for the $l_2$-norm, i.e., $Q_\infty([0,1]^3) = M_\infty(W(0|D_3^*))$. This is a long standing conjecture in geometry.
Chapter III
11.1 Definition
$\underline{D}_r := \underline{D}_r(P) = \liminf_{n \to \infty} \frac{\log n}{-\log e_{n,r}}$ is called the lower quantization dimension of $P$ of order $r$.
$\overline{D}_r := \overline{D}_r(P) = \limsup_{n \to \infty} \frac{\log n}{-\log e_{n,r}}$ is called the upper quantization dimension of $P$ of order $r$.
If the two numbers $\underline{D}_r$ and $\overline{D}_r$ agree then their common value is denoted by $D_r$ ($= D_r(P)$) and called the quantization dimension of $P$ of order $r$.
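A simple case shows how slowly the defining ratio converges. The sketch below assumes the closed form $e_{n,r} = 1/(2n(1+r)^{1/r})$ for the uniform distribution on $[0,1]$ (the midpoints of $n$ equal subintervals are optimal) and evaluates $\log n / (-\log e_{n,r})$, which tends to the quantization dimension 1; the function names are illustrative.

```python
import math

def quantization_error_uniform(n, r):
    """Assumed n-th quantization error of order r for U([0, 1])."""
    return 1.0 / (2 * n * (1 + r) ** (1 / r))

def dimension_ratio(n, r):
    """log n / (-log e_{n,r}); its limit is the quantization dimension."""
    return math.log(n) / -math.log(quantization_error_uniform(n, r))

if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 6, 10 ** 12):
        print(n, dimension_ratio(n, 2))
```

The ratio increases towards 1 only at a logarithmic rate, since the constant factor in $e_{n,r}$ contributes an additive term to $-\log e_{n,r}$.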
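11.2 Remark
$\underline{D}_r$ and $\overline{D}_r$ do not depend on the underlying norm. $\underline{D}_\infty$ and $\overline{D}_\infty$ depend only on the support of $P$. Using the definition of $e_{n,\infty}(K)$ in (10.3) we also define $\underline{D}_\infty(K)$, $\overline{D}_\infty(K)$, and $D_\infty(K)$ for an arbitrary (nonempty) compact set $K \subset \mathbb{R}^d$.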
156 III. Asymptotic quantization for singular probability distributions
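11.3 Proposition
(a) If $0 \le t < \underline{D}_r < s$ then
Proof
Let us first prove (a). If $e_{n,r} = 0$ for some $n \in \mathbb{N}$ then $\underline{D}_r = 0$ and (a) is obvious. Suppose $e_{n,r} > 0$ for all $n \in \mathbb{N}$. For $0 \le t < \underline{D}_r$ choose $t' \in (t, \underline{D}_r)$. Then there exists an $n_0 \in \mathbb{N}$ with
\[ e_{n,r} < 1 \quad \text{and} \quad \frac{\log n}{-\log e_{n,r}} > t' \]
for all $n \ge n_0$. This implies
\[ n e_{n,r}^{t'} > 1 \]
and, hence
\[ n e_{n,r}^{t} > e_{n,r}^{t - t'} \]
for all $n \ge n_0$. Since $\lim_{n \to \infty} e_{n,r} = 0$ we deduce
For $\underline{D}_r < s$ there is an $s' \in (\underline{D}_r, s)$ and a subsequence $(e_{n_k,r})$ of $(e_{n,r})$ with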
11.4 Corollary
(a) If $1 \le r < s \le \infty$ then $\underline{D}_r \le \underline{D}_s$ and $\overline{D}_r \le \overline{D}_s$.
11. The quantization dimension 157
then $D_r = D$.
(c) Let $1 \le r < \infty$ and suppose $E(\|X\|^{r+\delta}) < +\infty$ for some $\delta > 0$. Then $\overline{D}_r \le d$. If the absolutely continuous part $P_a$ of $P$ does not vanish then $D_r = d$.
(d) Let $r = \infty$. Then $\overline{D}_\infty \le d$. If $\lambda^d(\operatorname{supp}(P)) > 0$ then $D_\infty = d$.
Proof
(a) follows from Lemma 10.1 (a) and Proposition 11.3.
(b) follows immediately from Proposition 11.3.
(c) and (d) follow from Proposition 11.3, Theorem 6.2, (10.8), and the fact that
\[ n\,e_{n,\infty}(P)^d \ge \lambda^d(\operatorname{supp}(P))/\lambda^d(B(0,1)). \]
[]
Next we will investigate the connection of the quantization dimension to other types of dimension. Let us first consider the relationship of $\underline{D}_r(P)$ to the Hausdorff dimension of the support of $P$.
For a set $A \subset \mathbb{R}^d$ and $\varepsilon > 0$ an $\varepsilon$-cover of $A$ is a cover of $A$ by sets $U_i$ each of diameter at most $\varepsilon$, i.e., $\operatorname{diam}(U_i) = \sup\{\|x - y\| : x, y \in U_i\} \le \varepsilon$. For $s \ge 0$ let
in particular
\[ \dim_H(K) \le \underline{D}_\infty(P). \]
Proof
Let $\alpha_n \in C_{n,\infty}$. Then $(B(a, e_{n,\infty}))_{a \in \alpha_n}$ is a cover of $K$. Let $\varepsilon > 0$ be arbitrary. Since $(e_{n,\infty})_{n \in \mathbb{N}}$ converges to 0 there is an $n_\varepsilon \in \mathbb{N}$ with $e_{n,\infty} \le \frac{\varepsilon}{2}$ for all $n \ge n_\varepsilon$. This implies
11.6 Theorem
For all $r \ge 1$,
\[ \dim_H(P) \le \underline{D}_r(P). \]
Proof.
The proof will be given in Corollary 12.16. []
Since $\dim_H(P) \le d$, the above inequality can be strict (see Example 6.4). In the case that $r = 2$ and $\operatorname{supp}(P)$ is compact the above theorem was proved by Pötzelberger (1998a).
is called the upper box dimension of $K$.
If $\underline{\dim}_B(K) = \overline{\dim}_B(K)$ this value is denoted by $\dim_B(K)$ and called the box dimension of $K$.
This definition suggests that there is a close relationship between $\underline{D}_\infty(K)$ and $\underline{\dim}_B(K)$ and between $\overline{D}_\infty(K)$ and $\overline{\dim}_B(K)$. We have the following result.
11.7 Theorem
Let $K \subset \mathbb{R}^d$ be compact. Then
Proof
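(i) To prove $\underline{\dim}_B(K) \le \underline{D}_\infty(K)$ let $n \ge 1$ be a natural number. Then
We deduce
\[ \underline{D}_\infty(K) = \liminf_{n \to \infty} \frac{\log n}{-\log e_{n,\infty}} \ge \liminf_{n \to \infty} \frac{\log N_n}{-\log e_{n,\infty}} \ge \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} = \underline{\dim}_B(K). \]
Next we show $\underline{\dim}_B(K) \ge \underline{D}_\infty(K)$. For $\varepsilon > 0$ the definition of $N(\varepsilon)$ implies
\[ e_{N(\varepsilon),\infty} \le \varepsilon, \]
hence
\[ \underline{D}_\infty(K) = \liminf_{n \to \infty} \frac{\log n}{-\log e_{n,\infty}} \le \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log e_{N(\varepsilon),\infty}} \le \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} = \underline{\dim}_B(K). \]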
for all n _> 1. For n _> 1 choose a E Cu,oo. Let fl C R d be of minimum cardinality
with
d~(x) _< 1
for all x E B(0, 1). Let k ----I~l. For y E R d, e > 0 and/~(y,e) = e~ + y we have
Let N,~ = N(e,~,~). T h e n N,~ _< n and since eu,,~ = e.,oo > 0, we have
1
CkNn,oo ~-- ~eN,,oo < Vn,oo.
T h i s implies n < kN,~.
log I n
> l i m sup
- ~-~o0 - log en,c,~
[ -logk logn ]
= l i m s u p [--]-og--'~n,~ ~ - log en,~ J
= Doo(K).
eN(v)_l,oo > E.
T h i s leads to
11.8 Corollary
Let $K \subset \mathbb{R}^d$ be compact. If the box dimension of $K$ exists then the quantization dimension of $K$ of order $\infty$ also exists and equals the box dimension.
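The classical middle-third Cantor set gives a concrete instance of this corollary. The sketch below assumes that for $n = 2^k$ the covering radius is $e_{n,\infty}(C) = 3^{-k}/2$ (one ball per level-$k$ interval, which cannot be improved because the $2^{k+1}$ interval endpoints are pairwise at least $3^{-k}$ apart) and evaluates the ratio along this subsequence; the function names are hypothetical.

```python
import math

def cantor_covering_radius(k):
    """Assumed covering radius of the Cantor set by n = 2**k balls."""
    return 3.0 ** (-k) / 2

def dimension_ratio(k):
    """log n / (-log e_{n,inf}) along the subsequence n = 2**k."""
    n = 2 ** k
    return math.log(n) / -math.log(cantor_covering_radius(k))

if __name__ == "__main__":
    for k in (5, 50, 200):
        print(k, dimension_ratio(k), math.log(2) / math.log(3))
```

Along this subsequence the ratio approaches $\log 2/\log 3$, the box dimension of the Cantor set.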
11.9 Proposition
Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Then, for $1 \le r < s < \infty$
Proof
The result follows immediately from Corollary 11.4 and Theorem 11.7. []
\[ \overline{\dim}_R(P) = \limsup_{\varepsilon \to 0} \frac{R_{P,r}(\varepsilon^r)}{-\log \varepsilon}. \]
The lower rate distortion dimension (of order $r$) of $P$ is
\[ \underline{\dim}_R(P) = \liminf_{\varepsilon \to 0} \frac{R_{P,r}(\varepsilon^r)}{-\log \varepsilon}. \]
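11.10 Theorem
If $1 \le r < \infty$ then
\[ \underline{\dim}_R(P) \le \underline{D}_r(P). \]
Proof
Let $\varepsilon > 0$ be given. Let $n \in \mathbb{N}$ satisfy $e_{n,r} \le \varepsilon$. Let $f\colon \mathbb{R}^d \to \mathbb{R}^d$ be an $n$-optimal quantizer of order $r$ and $Q$ the image of $P$ on $\mathbb{R}^d \times \mathbb{R}^d$ under the map $x \mapsto (x, f(x))$. Then $Q_1 = P$, $Q_2 = P^f = P \circ f^{-1}$ and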
h(x,y) J'o, y¢
~ l 1{ I = y } ( x ), y E c~.
{f= }
= ~ - ~ P ( ( x : f(x) = a and (x, f(x)) e A})
aE¢~
R~,.(c')< s(f, Q)
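\[ {} = \int h(x, y) \log h(x, y)\,d(Q_1 \otimes Q_2)(x, y) = \sum_{a \in \alpha} \int_{\{f = a\}} \log h(x, a)\,dP(x) = \sum_{a \in \alpha} \int_{\{f = a\}} \log \frac{1}{P(f = a)}\,dP(x) \]
\[ {} = -\sum_{a \in \alpha} P(f = a) \log P(f = a) \le \log |\alpha|. \]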
Since $f$ is $n$-optimal and $\alpha = f(\mathbb{R}^d)$ we know that $|\alpha| = n$ (cf. Theorem 4.1). Thus we have shown that
\[ e_{n,r} \le \varepsilon \quad \text{implies} \quad R_{P,r}(\varepsilon^r) \le \log n. \]
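The last step of this chain is the elementary fact that the entropy of a distribution on $n$ points is at most $\log n$. A minimal sketch, with a hypothetical helper name and entropies in nats:

```python
import math

def quantizer_entropy(cell_probabilities):
    """-sum_a P(f = a) log P(f = a) in nats; empty cells contribute nothing."""
    return -sum(p * math.log(p) for p in cell_probabilities if p > 0)

if __name__ == "__main__":
    probs = [0.5, 0.25, 0.125, 0.125]            # cell probabilities of a 4-level quantizer
    print(quantizer_entropy(probs), math.log(len(probs)))
    print(quantizer_entropy([0.25] * 4))          # equal cells attain log 4
```

Equality holds exactly when all cells carry the same probability, so the bound $\log n$ is attained only by quantizers with equiprobable cells.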
Now let ne C N be the smallest natural number with en,,r <_ e. Then we get
This implies
liminf RP'~(vr) <_lim inf logns
e-+0 - l o g e s~0 - l o g e ~ , ~ "
With e~ = e~,r we know that n~. _< n and e. .... = e~,r and get
11.11 Remark
It follows from the preceding theorem that, for 1 < r < cx),
Notes
12.1 Definition
Let $\mu$ be a finite Borel measure on $\mathbb{R}^d$.
12.2 Remark
A set or a measure which is regular of dimension $D$ in $\mathbb{R}^d$ with one given norm is also regular with dimension $D$ in $\mathbb{R}^d$ with any other norm. This follows from the well-known fact that any two norms on $\mathbb{R}^d$ are equivalent, i.e., if $\|\cdot\|$ and $|||\cdot|||$ are norms on $\mathbb{R}^d$ then there is a constant $c > 0$ with $\frac1c |||\cdot||| \le \|\cdot\| \le c\,|||\cdot|||$. The notion of regularity of dimension $D$ remains unchanged if one uses closed balls instead of open balls in the definition.
12.3 Lemma
Let $\mu$ be a finite measure on $\mathbb{R}^d$ such that there is a $c > 0$ and an $r_0 > 0$ with $\mu(\mathring{B}(a, r)) \le c r^D$ for all $a \in \operatorname{supp}(\mu)$ and all $r \in (0, r_0)$. Then there is a $c' > 0$ with $\mu(\mathring{B}(a, r)) \le c' r^D$ for all $a \in \mathbb{R}^d$ and all $r > 0$.
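Proof
First we will show that there is a $\tilde{c} > 0$ with $\mu(\mathring{B}(a, r)) \le \tilde{c} r^D$ for all $a \in \operatorname{supp}(\mu)$ and all $r > 0$. To this end let $a \in \operatorname{supp}(\mu)$ be arbitrary and define
\[ \tilde{c} = \max(c, \mu(\mathbb{R}^d) r_0^{-D}). \]
If $r \ge r_0$ then
\[ \tilde{c} r^D \ge \mu(\mathbb{R}^d) r_0^{-D} r^D \ge \mu(\mathbb{R}^d) \ge \mu(\mathring{B}(a, r)). \]
We claim that, for arbitrary $a \in \mathbb{R}^d$ and $r > 0$,
\[ \mu(\mathring{B}(a, r)) \le 2^D \tilde{c} r^D = c' r^D. \]
If $0 < r \le d(a, \operatorname{supp}(\mu))$ then $\mu(\mathring{B}(a, r)) = 0$ and the claim is true.
If $d(a, \operatorname{supp}(\mu)) < r$ choose $b \in \operatorname{supp}(\mu)$ with $\|a - b\| = d(a, \operatorname{supp}(\mu))$. Then we have
[]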
12.4 Lemma
A finite union of regular sets of dimension $D$ is regular of dimension $D$.
Proof
Let $M_1, \ldots, M_n \subset \mathbb{R}^d$ be regular of dimension $D$ and $M = M_1 \cup \ldots \cup M_n$. Then $M$ is compact and we have
for all a • R d and all r > O. W i t h o u t loss of generality we may assume d/ > c+ Set
c = d1 + . . . + dn and r0 = min(r~,0,... , rn,o). It follows that for all a • M and all
r • (0, r0)
1 1 °
~ ___rain( , . . . , ± ) r ~ <_ n ~ ( i n B(a, ~))
en
n
<_ ~ U ~ ( M ~ nb(a,~))
i=,
-< ~ c~TD
i=1
< cr D.
[]
12.5 Lemma
Let $M \subset \mathbb{R}^d$ be compact. Then $M$ is regular of dimension $D$ if and only if every point $x$ of $M$ has a regular neighbourhood of dimension $D$ in $M$.
Proof
Proof
Let rl = min d(z,g(U)C). Then we have rl > 0. Let r0 > 0 and c > 0 be such that
zeg(M)
for all a • M and r • (0, r0). Define r~ = min(r~, I r 0). For y • g(M) and r • (0, r0)
we obtain
o 1 o o
B(g-l(y), ~ r ) c g - l ( B ( y , r ) ) c B(g-'(~l,c'r)
and, hence,
1 1 D °
< n '(M n 1))
o
1 D o o
-~--57-l (M N g-l(B(y, r))) < 7-lO(g(M) n B(y, r))
(12.2)
<_ c'DnD(Mn r))).
o
Thus ~/Ig(M) has g(M) as its support and g(M) is regular of dimension D. []
12.7 Example (Convex sets)
Let $K \subset \mathbb{R}^d$ be a nonempty compact convex set. The dimension $D$ of $K$ is defined as the dimension of the affine subspace of $\mathbb{R}^d$ spanned by $K$. We will show that $K$ is a regular set of dimension $D$.
Without loss of generality we may assume that $K$ spans $\mathbb{R}^d$ (otherwise we take the affine subspace generated by $K$ and transform it by an affine isometry onto some $\mathbb{R}^d$). In this situation $D = d$, $\mathcal{H}^D_{|K}$ is just a non-zero multiple of the Lebesgue measure $\lambda^d_{|K}$, and, obviously,
\[ 0 < \lambda^d(K) < \infty \]
since $\operatorname{int} K \ne \emptyset$ (cf. Webster, 1994, p. 61, Theorem 2.3.1). By Remark 12.2 we may assume that $\mathbb{R}^d$ carries the $l_2$-norm. To prove that $K$ is regular of dimension $d$ it is, therefore, enough to show that $\lambda^d_{|K}$ is regular of dimension $d$. Let $r_0 > 0$ be arbitrary. The map $K \to \mathbb{R}$, $x \mapsto \lambda^d(K \cap \mathring{B}(x, r_0))$ is continuous. Thus $c = \min_{x \in K} \lambda^d(K \cap \mathring{B}(x, r_0))$ exists. Since $\mathring{B}(x, r_0) \cap \operatorname{int} K \ne \emptyset$ we know that $c > 0$. For $0 < t < 1$ and $x \in K$ the convexity of $K$ yields
\[ x + t(\mathring{B}(0, r_0) \cap (K - x)) \subset K \cap \mathring{B}(x, t r_0). \]
12. Regular sets and measures of dimension D 169
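We deduce
\[ \lambda^d(K \cap \mathring{B}(x, t r_0)) \ge \lambda^d\bigl(x + t(\mathring{B}(0, r_0) \cap (K - x))\bigr) = t^d \lambda^d(\mathring{B}(x, r_0) \cap K) \ge \frac{c}{r_0^d}\,(t r_0)^d. \]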
12.8 Example (Surfaces of convex sets)
Let $K \subset \mathbb{R}^d$ be nonempty, convex, and compact. Let $\operatorname{bd} K$ denote the relative boundary of $K$, i.e., the boundary of $K$ relative to the affine subspace spanned by $K$. Let $D + 1$ be the dimension of $K$. We will show that $\operatorname{bd} K$ is regular of dimension $D$.
As above we may assume that $D + 1 = d$ and that $\mathbb{R}^d$ carries the $l_2$-norm.
For $x, y \in \mathbb{R}^d$ the number $(x, y) \in \mathbb{R}$ is the standard scalar product of $x$ and $y$, i.e., if $x = (x_1, \ldots, x_d)$ and $y = (y_1, \ldots, y_d)$ then
\[ (12.3) \qquad (x, y) = \sum_{i=1}^d x_i y_i. \]
Once this has been proved the argument is finished as follows. The set $f(B(x, r))$ is open. Hence there is an $s > 0$ with $B(f(x), s) \subset f(B(x, r))$. By Example 12.7 the
implies $\varphi(y) \ge 0$. Thus $y \in \partial K$ yields $\varphi(y) = 0$. Obviously every $y \in B(x, r)$ with $\varphi(y) = 0$ belongs to $\partial K$.
Next we will show that $\varphi$ is convex. Let $y, y' \in B(x, r)$ and $t \in [0, 1]$ be arbitrary. Then $ty + (1 - t)y' + (t\varphi(y) + (1 - t)\varphi(y'))u = t(y + \varphi(y)u) + (1 - t)(y' + \varphi(y')u) \in K$, hence
$\varphi(ty + (1 - t)y') \le t\varphi(y) + (1 - t)\varphi(y').$
A convex function on an open set is locally Lipschitz (see Webster, 1994, p. 224/225, proof of Theorem 5.51). Thus, by making $r$ a bit smaller if necessary we may assume that $\varphi$ is Lipschitz, i.e., there is a $c > 0$ with $|\varphi(y) - \varphi(y')| \le c\|y - y'\|$ for all $y, y' \in B(x, r)$. Define $H = \{y \in \mathbb{R}^d: \langle y, u \rangle = 0\}$ and let $P_H$ be the orthogonal projection onto $H$. Define $f: B(x, r) \to \mathbb{R}^d$ by $f(y) = P_H(y) - \varphi(y)u$. Then $f$ is a Lipschitz map, since
$\varphi(z) = \min\{s \in \mathbb{R}: z + su \in K\}$
$= \min\{s \in \mathbb{R}: P_H(y) + tu + su \in K\}$
$= \min\{s \in \mathbb{R}: P_H(y) + \langle y, u \rangle u + (t + s - \langle y, u \rangle)u \in K\}.$
Moreover we have
(12.10) $f(B(x, r) \cap \partial K) = H \cap f(B(x, r)).$
This can be seen as follows:
If $y \in \partial K \cap B(x, r)$ then $\varphi(y) = 0$ and, hence,
$f(y) = P_H(y) \in H \cap f(B(x, r)).$
If $y \in B(x, r)$ and $f(y) \in H$ then $\varphi(y) = 0$, hence, $y \in \partial K \cap B(x, r)$.
12.9 Example (Compact differentiable manifolds)
Let $D \in \{1, \ldots, d - 1\}$ and $M$ be a compact $D$-dimensional $C^1$-submanifold of $\mathbb{R}^d$. Then $M$ is regular of dimension $D$.
To prove this we will show that every point in $M$ has a regular neighbourhood of dimension $D$ in $M$. The claim then follows from Lemma 12.5. Let $x \in M$ be arbitrary. By Federer (1969, p. 231, 3.1.19) there exists an open neighbourhood $U$ of $x$, an open set $V$ in $\mathbb{R}^d$, a $D$-dimensional vector subspace $W$ of $\mathbb{R}^d$, and a $C^1$-diffeomorphism $f: U \to V$ with $f(M \cap U) = W \cap V$. Let $r > 0$ be such that $B(f(x), r) \subset V$. By Example 12.7 the set $B(f(x), r) \cap W$ is regular of dimension $D$, $g = f^{-1}$ is bi-Lipschitz and maps $B(f(x), r) \cap W$ onto $f^{-1}(B(f(x), r)) \cap M$. By Lemma 12.6 $f^{-1}(B(f(x), r)) \cap M$ is regular of dimension $D$. Since $f^{-1}(B(f(x), r)) \cap M$ is obviously a neighbourhood of $x$ in $M$ the argument is finished.
12.10 Example (Self-similar sets)
Let $S_1, \ldots, S_N: \mathbb{R}^d \to \mathbb{R}^d$ be contracting similarity transformations with scaling numbers $s_1, \ldots, s_N \in (0, 1)$, i.e., for $i \in \{1, \ldots, N\}$ and all $x, y \in \mathbb{R}^d$,
$\|S_i x - S_i y\| = s_i \|x - y\|.$
It was shown by Hutchinson (1981) that there is a unique non-empty compact set $A$ in $\mathbb{R}^d$ with
$A = S_1(A) \cup \ldots \cup S_N(A).$
This set is called the self-similar set corresponding to $S_1, \ldots, S_N$. It is easy to see that there is a unique real number $D > 0$ with
$\sum_{i=1}^{N} s_i^D = 1,$
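Since $D \mapsto \sum_i s_i^D$ is continuous and strictly decreasing, the defining equation above can be solved numerically by bisection. The following Python sketch assumes at least two maps ($N \ge 2$); the function name `similarity_dimension` and the tolerance are illustrative choices and do not come from the text.

```python
import math

def similarity_dimension(scalings, tol=1e-12):
    """Solve sum(s_i ** D) = 1 for D > 0 by bisection (illustrative helper)."""
    if len(scalings) < 2 or not all(0.0 < s < 1.0 for s in scalings):
        raise ValueError("need at least two scaling numbers in (0, 1)")

    def g(D):
        # Strictly decreasing in D, with g(0) = N - 1 > 0.
        return sum(s ** D for s in scalings) - 1.0

    lo, hi = 0.0, 1.0
    while g(hi) > 0.0:          # enlarge the bracket until g changes sign
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two maps with scaling number 1/3 (the classical Cantor set).
cantor_dim = similarity_dimension([1 / 3, 1 / 3])
```

For equal scaling numbers $s$ the equation reduces to $N s^D = 1$, so the result can be compared with $\log N / \log(1/s)$.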
Before we deal with the quantization of regular measures of dimension $D$ let us mention the following characterization of these measures.
12.11 Proposition
Let $\mu$ be a finite measure on $\mathbb{R}^d$ with compact support $K$. Then the following properties are equivalent:
Proof
That (i) implies (ii) is an immediate consequence of a proposition in Falconer (1990, Proposition 4.9) and the definition of regularity of dimension $D$.
The converse implication (ii) $\Rightarrow$ (i) follows from the definition of regularity of dimension $D$. □
12.12 Proposition
Let $P$ be a probability on $\mathbb{R}^d$. Assume that there is a $c > 0$ and an $r_0 > 0$ such that
(12.11) $P(\mathring{B}(a, r)) \le c r^D$
Proof
o
By Lemma 12.3 there is a 5 > 0 with P ( B ( a , r)) < 5r D for all a ~ R a and all r > 0.
Let a E R a and let B be a Borel subset of R a. If P ( B ) = 0 then the conclusion
(12.12) obviously holds. Let us assume P ( B ) > 0 and set
o
Since lira P ( S ( a , r)) = 1 >_ P ( B ) there is an r > 0 with P ( B ( a , r)) >_ ½P(B). Hence,
r --+OO
which implies
P ( S ( a , r)) < 1 p ( B )
If (r,~),~eN is any increasing sequence with rn < rB and lim r,~ ----rB we deduce from
o o
> l (~-~)b(P(B))l+v.
[]
12.13 Corollary
Let $P$ be a probability on $\mathbb{R}^d$. Then the following conditions are equivalent:
o
/ llx - all dP(x) >_ctP(B(a, T)) l+b
B(a,r)
Proof
That (i) implies (iii) is Proposition 12.12 and that (iii) implies (ii) follows by setting $B = \mathring{B}(a, r)$. It remains to show that (ii) implies (i). Obviously, (ii) yields
and, hence,
o
1 D_D
(~) "1 >_P(B(a,r)).
[]
12.14 Corollary
Let $P$ be a probability on $\mathbb{R}^d$. Suppose that there is a $c > 0$ and an $r_0 > 0$ with
$P(\mathring{B}(a, r)) \le c r^D$
for all $a \in \operatorname{supp}(P)$ and all $r \in (0, r_0)$. Then there is a constant $b > 0$ such that, for
every e - p ~ k i n g {B1,... ,B~} in R d with P(Rd\ ~ BO = 0 o~d an ~ , , . . . , ~ e R d
i=1
Proof
Without loss of generality $D > 0$. Set $p = 1 + \frac{1}{D}$ and $q = 1 + D$. Then we have $\frac{1}{p} + \frac{1}{q} = 1$ and $p > 1$. Hölder's inequality yields
D
S := ~ 1 IIx - adl d P ( x )
= n~ IIx - a~ll d P ( x
This implies
Using Proposition 12.12 we get for the constant c' > 0 of that proposition
D
12.15 Proposition
Let $P$ be a probability on $\mathbb{R}^d$. Suppose that there are constants $c > 0$ and $r_0 > 0$ with
$P(\mathring{B}(a, r)) \le c r^D$
for every $a \in \operatorname{supp}(P)$ and every $r \in (0, r_0)$. Then
> o.
Proof
Let am E C~,1 and let {Aa: a E am} be a Voronoi partition o f R d with respect to am.
By Corollary 12.14 we have
12.16 Corollary
Let $P$ be a probability on $\mathbb{R}^d$. Then
Proof
By Corollary 11.4 it suffices to prove $\dim_H(P) \le \underline{D}_1(P)$. If $\dim_H(P) = 0$ then there is nothing to show. So let $\dim_H(P) > 0$ and let $t$ with $0 < t < \dim_H(P)$ be arbitrary.
By Falconer (1997, Prop. 10.3) we have
o
This implies
and, hence,
$P(\{x \in \mathbb{R}^d: \exists r_x > 0\ \forall r \le r_x: P(\mathring{B}(x, r)) \le r^t\}) > 0.$
Thus, there exists a compact set $K \subset \mathbb{R}^d$ with $P(K) > 0$ and an $r_0 > 0$ such that
$P(\mathring{B}(x, r)) \le r^t$
$Q(\mathring{B}(x, r)) \le \frac{1}{P(K)} r^t$
for all $x \in \operatorname{supp}(Q)$ and all $r \le r_0$ and Proposition 12.15 yields
[]
12.17 Proposition
Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Assume that there is a $c > 0$ and an $r_0 > 0$ with
Proof
If there is an $n_0 \in \mathbb{N}$ with $e_{n_0,\infty} = 0$ then $e_{n,\infty} = 0$ for all $n \ge n_0$ and the assertion of the proposition is obvious. So let us assume that $e_{n,\infty} > 0$ for all $n \in \mathbb{N}$. Since $K$ is compact there exists a finite set $\alpha_n \subset K$ of maximum cardinality satisfying $\|x - y\| \ge e_{n,\infty}$ for all $x, y \in \alpha_n$ with $x \neq y$.
We will show that $|\alpha_n| > n$. Assume the contrary. Then we know that
Hence there exists a $y \in K$ with $\|y - a\| \ge e_{n,\infty}$ for all $a \in \alpha_n$, which contradicts the maximality of $\alpha_n$.
For $x, y \in \alpha_n$ with $x \neq y$ we have
0 0
1
en,oo ~ r0
$1 \ge \sum_{a \in \alpha_n} c \left(\tfrac{1}{2} e_{n,\infty}\right)^D$
and, therefore,
$n e_{n,\infty}^D \le \frac{2^D}{c}.$
12.18 Theorem
Let $P$ be a regular probability of dimension $D$ on $\mathbb{R}^d$. Then, for $1 \le r \le \infty$,
Proof
The inequality (12.15) follows immediately from Propositions 12.15, 12.17 and Lemma 10.1 (a). The remaining statements follow from Corollary 11.4 and Proposition 12.11. □
12.19 Remark
It remains an open question for which regular probabilities $P$ of dimension $D$ the limit
$\lim_{n \to \infty} n e_{n,r}^D$
exists in $(0, \infty)$. Recall from (6.4) that in this situation, for $1 \le r < \infty$,
is called the $r$-th quantization coefficient of $P$. It follows from Theorem 6.2 that for the normalized volume measure $P$ of a convex compact set the $r$-th quantization coefficient exists. We conjecture that the same is true for the normalized surface measures on convex compact sets and compact $C^1$-manifolds. For the natural Hausdorff measure on a self-similar set the quantization coefficients exist in some cases while in other cases (like the classical Cantor set) they need not exist. We will discuss measures on self-similar sets in Section 14.
Notes
The concept of regularity for sets and measures can be found in several books on geometric measure theory (see, for instance, David and Semmes (1993, 1997) and Mattila (1995)). Since there are several different notions of regularity for sets and measures the above regularity is sometimes called Ahlfors-David regularity (see Mattila, 1995, p. 92). The elementary results on regular sets of dimension $D$ (Lemma 12.4-12.6) and the results concerning the regularity of convex sets and their boundaries as well as that of compact $C^1$-manifolds are probably well-known. We just could not find an explicit reference. To our knowledge the results concerning the quantization of regular sets and measures of dimension $D$ as stated above are new. A good introduction to the theory of convex sets is Webster (1994). The basic theory concerning self-similar sets as well as many examples can be found in Barnsley (1988). For the canonical normalized Hausdorff measure $P$ on a self-similar set with OSC the inequalities in (12.15) were first proved in Graf and Luschgy (1996). After this book had essentially been finished Pötzelberger (1998a) gave different conditions for a probability $P$ to ensure that $0 < \liminf_{n \to \infty} n e_{n,2}(P)^D$ or $\limsup_{n \to \infty} n e_{n,2}(P)^D < \infty$ or $\lim_{n \to \infty} n e_{n,2}(P)^D$ exists (for the $l_2$-norm), where $D$ is suitably chosen.
13 Rectifiable curves
Here we consider the length measures on rectifiable curves. These measures can be
obtained by restricting the one-dimensional Hausdorff measure to the given rectifiable
curve. In this way we get an elementary class of singular measures of quantization
dimension 1 for which the quantization coefficients exist and will be calculated. Nevertheless there are simple examples that show that the length measure on a rectifiable curve need not be regular of dimension 1 (see below).
In this section $\|\cdot\|$ will always denote a Euclidean norm on $\mathbb{R}^d$. First we will collect some basic results about rectifiable curves.
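13.1 Definition
Let $a, b \in \mathbb{R}$ with $a < b$. A curve (more exactly, a Jordan curve) $\Gamma$ is the image of a continuous injection $\gamma: [a, b] \to \mathbb{R}^d$. $\gamma$ is called a parametrization of $\Gamma$. A curve is called rectifiable if
$L = L(\Gamma) := \sup\left\{\sum_{i=1}^{n} \|\gamma(t_i) - \gamma(t_{i-1})\| : a = t_0 < t_1 < \ldots < t_n = b,\ n \in \mathbb{N}\right\} < \infty.$
$L$ is called the length of the curve $\Gamma$.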
13.2 Lemma
If $\Gamma \subset \mathbb{R}^d$ is a curve then $\mathcal{H}^1(\Gamma) = L(\Gamma)$, in particular $L(\Gamma)$ does not depend on the parametrization $\gamma$.
Proof
See Falconer (1985, p. 29, Lemma 3.2). □
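The length of a curve is approached from below by the lengths of inscribed polygons, and refining the partition can only increase the polygonal length. As a rough numerical illustration, the following Python sketch evaluates inscribed polygons of a quarter of the unit circle. The helper `polygonal_length`, the equal parameter steps and the example curve are assumptions made for illustration only.

```python
import math

def polygonal_length(gamma, a, b, n):
    """Length of the polygon inscribed in gamma at n equal parameter steps."""
    pts = [gamma(a + (b - a) * k / n) for k in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def quarter_circle(t):
    """Quarter of the unit circle, parametrized by the angle t in [0, pi/2]."""
    return (math.cos(t), math.sin(t))

# Each partition refines the previous one, so the values increase
# towards the length pi/2 of the quarter circle.
lengths = [polygonal_length(quarter_circle, 0.0, math.pi / 2, n)
           for n in (1, 4, 16, 64)]
```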
13.3 Definition
Let $\Gamma$ be a rectifiable curve of length $L$. A continuous injection $\gamma: [0, L] \to \Gamma$ is called a parametrization by arc length if $L(\gamma([0, t])) = t$ for all $t \in [0, L]$.
13.4 Remark
Every rectifiable curve admits a parametrization by arc length (see Falconer, 1985, p. 29).
13.5 Lemma
Let $\Gamma$ be a rectifiable curve of length $L$ and $\gamma: [0, L] \to \Gamma$ a parametrization by arc length. Let $\mu$ be 1-dimensional Lebesgue measure restricted to $[0, L]$. Then
Proof
The lemma follows immediately from Falconer (1985, p. 29, (3.2) and p. 30, Corollary 3.3). □
13.6 Lemma
Let $\Gamma$ be a rectifiable curve, $x \in \Gamma$ and $r \in (0, \frac{1}{2}\operatorname{diam} \Gamma)$. Then
(13.2) $\mathcal{H}^1(\Gamma \cap \mathring{B}(x, r)) \ge r.$
Proof
This follows from Falconer (1985, p. 30, Lemma 3.4). □
Inequality (13.2) is one half of the condition for regularity of dimension 1 of a rectifiable curve. That, in general, a rectifiable curve need not be regular of dimension 1 is shown by the following example.
13.7 Example
Let m E N, m > 1. Then there exist a continuous injection %~: [m---~' ~] --+ R2 with
~m(~) =
(~--~,o),
, ~
(~)
1
=(~,1 0 ), ~-r-1<ll%.(t)ll<'forallte~ [~-¥r,~),and
1 1
lm:=L(Tm([rnll,1])) :max( 1
m+l'v~
Define 7: [0,i] --+R ~ by
{°,
7(t)= %~(t),
t=0
1
if m E N with t E [~--~, 1
~]
7/I(B(0, )NF) = Ik
k=rn
k=m
so that
o 1
7-/1 (B(0, 1 ) n r ) -- v ~
Before we come to the quantization of rectifiable curves we will prove a result concerning the distance of the $n$-optimal set of centers of order $r$ from the support of the probability in question.
13.8 Lemma
Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Let $1 \le r < \infty$ and let $\alpha_n$ be an $n$-optimal set of centers for $P$ of order $r$. Define
5~ = ma~ max
aCa,~ xC W(al(xn)g)K
IIx- all = IId.ollo~
Then
o ~ ~
(13.3) 5..~2minP(B(Xx,eK 2 ))" -< e,~,,(P).
Proof
Let a E a~ satisfy 5u = max ]]x - all. Then there exists an x E W(a]a~) N K
xEW(ala~)nK
with 5~ = I]x - all. For every b E au we have
1
Ib - bll _> IIx - bll - IIx - yll -> II~ - all - II~ - yll = ~ - IIx - yll -> ~ .
Using this inequality we deduce
> / d.~(z)"dP(z)
~n~(~,½~)
_> (15n)rp( K A B(x,
o 1
-~,~))
13.9 Corollary
Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Let $n \le |K|$, $1 \le r < \infty$ and let $\alpha_n$ be an $n$-optimal set of centers for $P$ of order $r$. Then, for every $a \in \alpha_n$,
(13.4) $\frac{1}{2} d(a, K) \min_{y \in K} P(\mathring{B}(y, \tfrac{1}{2} d(a, K)))^{1/r} \le e_{n,r}(P).$
Proof
The corollary follows from Lemma 13.8 if one observes that
$\delta_n \ge d(a, K)$
for all $a \in \alpha_n$, since $W(a|\alpha_n) \cap K \neq \emptyset$ for all $a \in \alpha_n$ by (4.1). □
First we give a quantization result for line segments. For $x, y \in \mathbb{R}^d$ let $[x, y]$ be the line segment from $x$ to $y$, i.e.
(see Lemma 13.5).
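13.10 Lemma
Let $x, y \in \mathbb{R}^d$ with $x \neq y$ be given. Let $P = \frac{1}{\|x - y\|} \mathcal{H}^1_{|[x,y]}$. Then, for $1 \le r < \infty$ and $n \ge 1$,
(13.6) $e_{n,r}(P) = \left(\frac{1}{1 + r}\right)^{1/r} \frac{\|x - y\|}{2n}.$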
Proof
By the remark preceding the lemma we have $P = U([0, 1])^f = U([0, 1]) \circ f^{-1}$. Let $1 \le r < \infty$.
"$\le$": For $\alpha \in C_{n,r}(U([0, 1]))$ the set $\beta = f(\alpha)$ has $n$ points. Hence we deduce
This yields
"$\ge$": Let $\beta \in C_{n,r}(P)$. Since $\operatorname{supp}(P) = [x, y]$ is convex and since the underlying norm is Euclidean Remark 4.6 yields $\beta \subset [x, y]$. Let $\alpha \subset [0, 1]$ equal $f^{-1}(\beta)$. Then $\alpha$ has $n$ points and we get
and, hence,
□
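The closed form for line segments can be checked numerically. The sketch below places the $n$ centers at the midpoints of $n$ equal subintervals of a segment of length `length`, the configuration for which the value $(1/(1+r))^{1/r}\,\|x - y\|/(2n)$ is attained, and compares a midpoint-rule estimate of the error with that value. The function names, the grid size and the choice of cell midpoints as centers are assumptions made for this illustration.

```python
def segment_error_formula(length, n, r):
    """Closed form: (1 / (1 + r)) ** (1 / r) * length / (2 n)."""
    return (1.0 / (1.0 + r)) ** (1.0 / r) * length / (2.0 * n)

def segment_error_numeric(length, n, r, grid=20_000):
    """Midpoint-rule estimate of the L_r error for the n cell midpoints."""
    centers = [length * (2 * i + 1) / (2 * n) for i in range(n)]
    total = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid                 # sample position in (0, 1)
        x = length * u
        cell = min(int(n * u), n - 1)        # the nearest center is the cell midpoint
        total += abs(x - centers[cell]) ** r
    # Averaging over the grid integrates against the normalized length measure.
    return (total / grid) ** (1.0 / r)
```

For example, `segment_error_numeric(2.0, 3, 1)` and `segment_error_formula(2.0, 3, 1)` should agree up to the discretization error of the grid.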
Now we will give a first lower bound for the quantization errors for a normalized one-dimensional Hausdorff measure on a rectifiable curve.
13.11 Lemma
Let $\gamma: [a, b] \to \mathbb{R}^d$ be a continuous injection which is a parametrization of the rectifiable curve $\Gamma$ with length $L > 0$. Let $P = \frac{1}{L} \mathcal{H}^1_{|\Gamma}$. Then, for $1 \le r < \infty$,
1 !
(13.7) ~ r, ( p ) >- ((r (a)ii
Proof
Let $G = \{(1 - t)\gamma(a) + t\gamma(b): t \in \mathbb{R}\}$ be the line through $\gamma(a)$ and $\gamma(b)$. Let $P_G$ be the orthogonal projection onto $G$. By $[\gamma(a), \gamma(b)] = \{(1 - t)\gamma(a) + t\gamma(b): t \in [0, 1]\}$ we denote the line segment from $\gamma(a)$ to $\gamma(b)$. Let $Q$ denote the image of $P$ with respect to $P_G$. First we will show that
(13.8) $\mathcal{H}^1_{|P_G(\Gamma)} \le \mathcal{H}^1_{|\Gamma} \circ P_G^{-1}.$
Let $B$ be a Borel set in $\mathbb{R}^d$. Then using the fact that
$\|P_G(x) - P_G(y)\| \le \|x - y\|$
for all $x, y \in \mathbb{R}^d$ and Falconer (1985, p. 27, Proposition 2.2) we get
$\mathcal{H}^1_{|\Gamma}(P_G^{-1}(B)) = \mathcal{H}^1(P_G^{-1}(B) \cap \Gamma)$
$\ge \mathcal{H}^1(P_G(P_G^{-1}(B) \cap \Gamma))$
$= \mathcal{H}^1(B \cap P_G(\Gamma))$
r
er~r = V,~r = / d(x, a) ~ dP(x)
F
= 1L / d(x, a) r dn~r(Z )
r
= 1L[[-),(b)l~+~(a)[[ ([[7(b)~n.~(a)[])~
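13.12 Theorem
Let $\Gamma \subset \mathbb{R}^d$ be a rectifiable curve with length $L > 0$. Set $P = \frac{1}{L} \mathcal{H}^1_{|\Gamma}$. Then
(i) for $1 \le r < \infty$, $\lim_{n \to \infty} n e_{n,r}(P) = Q_r([0, 1])^{1/r} \mathcal{H}^1(\Gamma) = \left(\frac{1}{1 + r}\right)^{1/r} \frac{L}{2}$,
(ii) $\lim_{n \to \infty} n e_{n,\infty}(P) = Q_\infty([0, 1]) \mathcal{H}^1(\Gamma) = \frac{L}{2}$.
Proof
Let $\gamma: [0, L] \to \Gamma$ be a parametrization of $\Gamma$ by arc length.
First we will show (i).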
>
for i ~ j and all n > no. Since lim e,~,r = 0 there exists an no c N with
(13.12) Iid~il~minP(B ( 1
yEF \ \ ~ -- --
5
]lda, II~ <
for all n > no. By the definition of 6 and an,i this yields (13.10). It follows from
(13.10) that, for n > no,
> ~ ~ / a(x,.o)r~r(x)
i:l Fi
:nr~Ffd(x'°ln'i)rdP(x).=
.
w, _ "~ L, l f "d 1 x
i=l
Fi
= n" Z V.,,.(P,)
i:I
(13.14) s~ n -> s ,
i:1 i----1
hence
l+r
13.15) liminfnen,,>l ( 1 - - ~ ) 7
"$\le$": Now let $\beta \subset [0, L]$ be of cardinality less than or equal to $n$ and set $\alpha = \gamma(\beta)$. Using Lemma 13.5 we deduce
Eo,z]
[0,El
Thus we get
Combining (13.15) and (13.17) yields the first part of the theorem.
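$n e_{n,\infty}(P) \le n e_{n,\infty}(U([0, L])) = \frac{L}{2}.$
Hence
(13.18) $\limsup_{n \to \infty} n e_{n,\infty}(P) \le \frac{L}{2}.$
$L = \mathcal{H}^1(\Gamma) \le 2 \liminf_{n \to \infty} n e_{n,\infty}(P)$
so that
(13.19) $\liminf_{n \to \infty} n e_{n,\infty}(P) \ge \frac{L}{2},$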
13.13 Remark
In geometric measure theory a measure $\mu$ on $\mathbb{R}^d$ is called $m$-rectifiable if $m \in \{1, \ldots, d\}$, $\mu$ is absolutely continuous with respect to $\mathcal{H}^m$, and $\mu$ is supported on a countable union of $m$-dimensional $C^1$-manifolds. A length measure on a rectifiable curve is 1-rectifiable in this sense.
We conjecture that, for all $m$-rectifiable measures $\mu$ on $\mathbb{R}^d$ and all $1 \le r < \infty$,
lim ne,~,,.(p,) ~
7/ , -+00
Notes
The basic notions about rectifiable curves can, for instance, be found in the book of Falconer (1985, Chapter 3). A good introduction to the theory of rectifiable measures is given by Mattila (1995, §15-20).
The class of self-similar measures has been a central object of studies in fractal geom-
etry during the last two decades. Most self-similar measures are singular with respect
to Lebesgue measure, but the restriction of Lebesgue measure to the d-dimensional
cube is, for instance, also self-similar. In this section we determine the quantization
dimension of all self-similar measures that satisfy a certain separation condition. As
it turns out many self-similar measures have the property that their quantization
dimension of order r is strictly increasing with r. In this respect they are different
from all measures considered so far in this volume.
(14.3) $P = \sum_{i=1}^{N} p_i P \circ S_i^{-1}.$
$S_i(A) \cap S_j(A) = \emptyset$
14. Self-similar sets and measures 191
for $i \neq j$.
$(S_1, \ldots, S_N)$ satisfies the open set condition (OSC) if there exists a nonempty open set $U \subset \mathbb{R}^d$ with $S_i(U) \subset U$ and $S_i(U) \cap S_j(U) = \emptyset$ for $i \neq j$.
Schief (1994) has shown that the open set $U$ in the above definition can always be chosen to be bounded and satisfy $U \cap A \neq \emptyset$.
If $(S_1, \ldots, S_N)$ satisfies the strong separation property then it also satisfies the open set condition.
If $(S_1, \ldots, S_N)$ satisfies the OSC and $P$ is the self-similar measure corresponding to $(S_1, \ldots, S_N; s_1^D, \ldots, s_N^D)$ then $P$ is the normalized $D$-dimensional Hausdorff measure restricted to the attractor $A$, i.e. $0 < \mathcal{H}^D(A) < \infty$ and
(14.4) $P = \frac{1}{\mathcal{H}^D(A)} \mathcal{H}^D_{|A}$
(7- = ~ , n= 1
( (71 . . - (7n--1~ n > 1.
(71m ~- { 0,
(71 . . . a m ,
m = 0
m > 1
I -I and -II l =
r/ll~ I = a .
(14.5) $\sum_{\sigma \in \Gamma} q_\sigma = 1.$
We use the notation introduced in 14.1. In the following $(p_1, \ldots, p_N)$ is always a probability vector with $p_i > 0$ for $i = 1, \ldots, N$ and $P$ is the self-similar probability measure on $\mathbb{R}^d$ corresponding to $(S_1, \ldots, S_N; p_1, \ldots, p_N)$.
14.1 Lemma
For every $n \ge N$ and every $r \in [1, \infty)$,
Proof
Since $S_i$ is a similarity transformation it follows immediately (see Lemma 3.2) that
for all $i \in \{1, \ldots, N\}$ and all $m \in \mathbb{N}$. Using (14.3) Lemma 4.14 (b) implies, for $n_1, \ldots, n_N \in \mathbb{N}$ with $n_i \ge 1$ and $\sum_{i=1}^{N} n_i \le n$,
Using (14.7) to substitute each summand on the right hand side yields the assertion of the lemma. □
14.2 Corollary
Let $\Gamma \subset \{1, \ldots, N\}^*$ be a finite maximal antichain, $n \in \mathbb{N}$ with $n \ge |\Gamma|$ and $r \in [1, +\infty)$. Then
Proof
The corollary follows from Lemma 14.1 by induction on $\max\{|\sigma| : \sigma \in \Gamma\}$. □
14.3 Lemma
For every $n \ge N$,
(14.9) $e_{n,\infty}(P) \le \min\left\{\max_{1 \le i \le N} s_i e_{n_i,\infty}(P) : 1 \le n_i,\ \sum_{i=1}^{N} n_i \le n\right\}.$
Proof
Since $\operatorname{supp}(P \circ S_i^{-1}) = S_i(A)$ and $A = \bigcup_{i=1}^{N} S_i(A)$ the lemma is an immediate consequence of Lemma 10.2(b) and Lemma 10.6(b). □
14.4 Lemma
Let $r \in [1, +\infty)$ be fixed. Then there exists exactly one number $\kappa_r \in (0, +\infty)$ with
(14.10) $\sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}} = 1.$
Proof
Since $0 < p_i s_i^r < 1$ the function $t \mapsto \sum_{i=1}^{N} (p_i s_i^r)^t$ is strictly decreasing and continuous. Since this function tends to $N$ as $t$ tends to 0 and takes a value less than 1 at $t = 1$ the intermediate value theorem implies the existence of a unique $t \in (0, 1)$ with $\sum_{i=1}^{N} (p_i s_i^r)^t = 1$. Then $\kappa_r = \frac{rt}{1 - t}$ satisfies the conclusions of the lemma. □
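The proof is constructive enough to be turned into a small numerical procedure: find the root $t \in (0, 1)$ of $\sum_i (p_i s_i^r)^t = 1$ by bisection and set $\kappa_r = rt/(1 - t)$. The Python sketch below does exactly that; the function name `kappa`, the tolerance and the sample weights are illustrative assumptions, and at least two maps are assumed.

```python
import math

def kappa(p, s, r, tol=1e-13):
    """Solve sum((p_i * s_i**r) ** t) = 1 for t in (0, 1); return r*t/(1-t)."""
    if len(p) != len(s) or len(p) < 2:
        raise ValueError("need matching p and s with at least two entries")
    w = [pi * si ** r for pi, si in zip(p, s)]   # each weight lies in (0, 1)

    def h(t):
        # Strictly decreasing, h(0) = N - 1 > 0 and h(1) < 0.
        return sum(wi ** t for wi in w) - 1.0

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return r * t / (1.0 - t)

# Unequal weights on two maps with contraction number 1/3.
kappas = [kappa((0.7, 0.3), (1 / 3, 1 / 3), r) for r in (1, 2, 4, 8)]
```

With $p_i = s_i^D$ the routine returns the similarity dimension $D$ for every $r$; with other weights the values in `kappas` can be compared against $D = \log 2 / \log 3$.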
14.5 Proposition
Let $r \in [1, +\infty)$ and let $\kappa_r$ satisfy (14.10). Then
Proof
Let qi = (pisS)"+'~, and Co = min{qb .. ,qt¢}. Then we have ~0 > 0. Let m , n E N be
arbitrary with m < c ~ a n d s e t ¢ = ¢ 0 - - l m~-. F o r P ( ¢ ) = { a E {1,... ,N}*: q~- > ¢ >
q,} it follows by (14.5) that
1= ~-~q,
,er(~)
: ~ qa-qawl
~er(~)
> c¢olr(~)I ,
hence
act(E)
(p~s: )r+~, (p~s;),+~, vm,,(p)
Thus we obtain
r _ r g_ r
n~Vn,r(P) <_~o "r m~Vrn,~(P)
which implies
ne,~,,(P) '~" <_ ¢o'mem,r(P) "r.
Since, for fixed m, this inequality holds for all but a finite number of n we get
< +cx)
14.6 Remark
The proof of the preceding proposition shows that
(14.11) limsupne,~,,(P) '~" < max{(pls~) . , + ~. , , . ... , ( p., , , , ~ ) ~,+-, " ,~,~,,( )
14.7 Proposition
Let $D$ be the similarity dimension of $(S_1, \ldots, S_N)$. Then
Proof
Set qi = s ° and eo = m i n { q l , . . . , qN}. Let m, n E N be arbitrary with ~ < 5 2. Set
5 = 5 0- - I r~a a n d F ( 5 ) = {a E { 1 , . . . ,N}*: q~- _> 5 > q~}. SinceF(e) i s a m a x i m a l
finite antichain it follows by Lemma 14.3 and an induction argument that
14.8 Remark
The proof of the preceding proposition shows that
The general assumptions in this section are the same as those in the preceding section.
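14.9 Lemma
For every $\varepsilon > 0$,
Proof
Set $t = \left(\frac{\varepsilon}{\operatorname{diam}(A)}\right)^D$ and $\Gamma(t) = \{\sigma \in \{1, \ldots, N\}^*: s_{\sigma^-}^D \ge t > s_\sigma^D\}$. Then $\Gamma(t)$ is a finite maximal antichain. Thus $a = \min\{p_\sigma: \sigma \in \Gamma(t)\} > 0$. Let $z \in A$ be arbitrary. Since $A = \bigcup_{\sigma \in \Gamma(t)} S_\sigma(A)$ there is a $\tau \in \Gamma(t)$ with $z \in S_\tau(A)$. Since $S_\tau$ is a similarity we have
$\operatorname{diam}(S_\tau(A)) = s_\tau \operatorname{diam}(A)$
and, hence,
$\operatorname{diam}(S_\tau(A)) < \varepsilon.$
Thus, we have $S_\tau(A) \subset B(z, \varepsilon)$. From (14.3) it follows that
$P(S_\tau(A)) = \sum_{\sigma \in \Gamma(t)} p_\sigma P \circ S_\sigma^{-1}(S_\tau(A)) \ge p_\tau P(S_\tau^{-1}(S_\tau(A))) = p_\tau.$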
14.10 Lemma
Let $(S_1, \ldots, S_N)$ satisfy the strong separation condition and let $r \in [1, +\infty)$ be given. Then
Proof
"$\le$": That $V_{n,r}(P)$ is less than or equal to the right hand side for all but finitely many $n$ is the statement of Lemma 14.1.
"$\ge$": To show the converse inequality let $\delta = \min\{d(S_i(A), S_j(A)): i \neq j\}$. Then we have $\delta > 0$. From Lemma 14.9 we deduce
$\lim_{n \to \infty} V_{n,r}(P) = 0.$
for all $n > n_0$. Let $\alpha_n$ be an $n$-optimal set of centers for $P$ of order $r$. Then Lemma 13.8 implies that
for all $n > n_0$ and all $a \in \alpha_n$. Since the function $t \mapsto t^r \min_{y \in A} P(B(y, t))$ is non-decreasing it follows that
i.e.
for all $n \ge n_0$ and all $a \in \alpha_n$. By the definition of $\delta$ we deduce that, for $n > n_0$ and $i, j \in \{1, \ldots, N\}$ with $i \neq j$, the sets $\alpha_{n,i} = \{a \in \alpha_n: W(a|\alpha_n) \cap S_i(A) \neq \emptyset\}$ and $\alpha_{n,j} = \{a \in \alpha_n: W(a|\alpha_n) \cap S_j(A) \neq \emptyset\}$ are disjoint.
Using (14.3) we obtain
pisrVm,r(P),
•
where ni = I~,~1 ~ 1
Since
N
Z n i ~ IC~nl z n
i=1
we deduce
[]
14.11 Proposition
Let $(S_1, \ldots, S_N)$ satisfy the strong separation condition and let $r \in [1, +\infty)$ be given. Moreover, let $\kappa_r$ satisfy (14.10). Then
in particular, the lower quantization dimension $\underline{D}_r(P)$ is greater than or equal to $\kappa_r$.
Proof
Since [A[ = c~ we have V,,~(P) > 0 for all n 6 N. Let no E N be such that (14.14)
holds for all n _> no. Choose c > 0 with
V~,r(P) _> c n - ~
Vk,~(P) >_ c k - ~
N }
V,~,r(P) = m i n ~ E P i s : V ~ , r ( P ) : l < _ hi, E ni _< n
i=1
Thus we get
v~,~(P) >_ cn ~,
14.13 Proposition
Let $(S_1, \ldots, S_N)$ satisfy the open set condition and let $D$ be the similarity dimension of $(S_1, \ldots, S_N)$. Then
$\liminf_{n \to \infty} n e_{n,\infty}(P)^D > 0,$
Proof
Since $\operatorname{supp}(P) = A = \operatorname{supp}(\mathcal{H}^D_{|A})$ the statement of the proposition follows immediately from Example 12.10, Theorem 12.18 and Remark 11.2. □
The general assumptions are the same as in Section 14.2. We denote the similarity dimension of $(S_1, \ldots, S_N)$ by $D$. In this section we will show that, for most self-similar measures, the quantization dimensions of different orders are different.
14.14 Theorem
Let $r \in [1, +\infty)$, let $\kappa_r \in (0, \infty)$ be defined by
(14.16) $\sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}} = 1,$
Proof
The result follows from Proposition 14.5 and Proposition 14.11. □
Next we will prove some auxiliary results concerning the function $K: [1, +\infty) \to (0, +\infty)$, $r \mapsto \kappa_r$. Define $F: [1, +\infty) \times (0, +\infty) \to \mathbb{R}$ by
$F(r, t) = \sum_{i=1}^{N} (p_i s_i^r)^{\frac{t}{r + t}} - 1.$
and
N
- - ~ ( r , t ) = i=l (t + r)2(l°gpi + r l ° g s i )
14.15 Lemma
If there exists an $r_0 \ge 1$ with $K'(r_0) = 0$ then $p_i = s_i^D$ for $i = 1, \ldots, N$ and $\kappa_r = D$ for all $r \in [1, +\infty)$.
Proof
r ~o
Set qi : (pisi°) ~°+~°. Since K'(ro) -- 0 we derive from (14.17) and tcro > 0 that
N
(14.18) E qi (log pi - t%o logsi) = O.
i=l
N
E qi(r° + ~ro log Pi ro + aro log qi) = O.
i=1 7"0 r°
Since ~ r0
> 0 this implies
N
E qi log p--!= O.
i=1 qi
Pi = qi
i.e.
Pi = S~ r°
Since $\sum_{i=1}^{N} p_i = 1$ the definition of the similarity dimension of $(S_1, \ldots, S_N)$ yields
and $p_i = s_i^D$ for $i = 1, \ldots, N$.
Using this identity in (14.16) we obtain
$\sum_{i=1}^{N} (s_i^{D + r})^{\frac{\kappa_r}{r + \kappa_r}} = 1.$
This implies
$D = \kappa_r \frac{D + r}{\kappa_r + r}$
and, hence
$\kappa_r = D$
for all $r \in [1, +\infty)$. □
14.16 Lemma
If $(p_1, \ldots, p_N) = (s_1^D, \ldots, s_N^D)$ then $\kappa_r = D$ for all $r \in [1, +\infty)$.
If $(p_1, \ldots, p_N) \neq (s_1^D, \ldots, s_N^D)$ then $K: (0, +\infty) \to \mathbb{R}$, $r \mapsto \kappa_r$ is strictly increasing with
$\lim_{r \to \infty} \kappa_r = D.$
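Proof
If $(p_1, \ldots, p_N) = (s_1^D, \ldots, s_N^D)$ then the last part of the proof of Lemma 14.15 shows that $\kappa_r = D$ for all $r \in [1, +\infty)$.
If $(p_1, \ldots, p_N) \neq (s_1^D, \ldots, s_N^D)$ then it follows from Lemma 14.15 that $K'(r) \neq 0$ for all $r \in [1, +\infty)$. Since, as a consequence of the definition of $\kappa_r$, the function $K$ is increasing this implies that $K$ is strictly increasing. In particular $\kappa_\infty = \lim_{r \to \infty} \kappa_r$ exists in $[0, +\infty]$. Since
$1 = \sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}} \le \sum_{i=1}^{N} s_i^{\frac{r \kappa_r}{r + \kappa_r}}$
we obtain
$\frac{r \kappa_r}{r + \kappa_r} \le D,$
hence
$\kappa_r \le \frac{Dr}{r - D}$
for $r > D$. Thus we deduce
$\kappa_\infty \le D.$
Since
$1 = \lim_{r \to \infty} \sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}} = \lim_{r \to \infty} \sum_{i=1}^{N} p_i^{\frac{\kappa_r}{r + \kappa_r}} s_i^{\frac{r \kappa_r}{r + \kappa_r}} = \sum_{i=1}^{N} s_i^{\kappa_\infty}$
we deduce
$\kappa_\infty = D.$
□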
14.17 Theorem
Let $q, r \in [1, +\infty]$ and let $(S_1, \ldots, S_N)$ satisfy the strong separation condition.
(i) If $(p_1, \ldots, p_N) = (s_1^D, \ldots, s_N^D)$ then the quantization dimension $D_r(P)$ of order $r$ exists and equals $D$.
(ii) If $(p_1, \ldots, p_N) \neq (s_1^D, \ldots, s_N^D)$ then the quantization dimension $D_r(P)$ exists and
$D_r(P) = \begin{cases} D, & r = +\infty \\ \kappa_r, & r < +\infty. \end{cases}$
Moreover,
$q < r \Rightarrow D_q(P) < D_r(P)$
and
$\lim_{r \to \infty} D_r(P) = D.$
Proof
That $D_\infty(P)$ exists and equals $D$ follows from Proposition 14.7 and Proposition 14.13. The remaining statements follow from Theorem 14.14 and Lemma 14.16. □
14.18 Problem
Does Theorem 14.17 hold under the weaker assumption that $(S_1, \ldots, S_N)$ satisfies the open set condition?
14.19 Remark
It is shown by Kawabata and Dembo (1994, Theorem 4.1), that under the assumptions of Theorem 14.17 the rate distortion dimension of $P$ equals
$\frac{\sum_{i=1}^{N} p_i \log p_i}{\sum_{i=1}^{N} p_i \log s_i}$
and, therefore, equals the Hausdorff dimension $\dim_H(P)$ of $P$ by Cawley and Mauldin (1992, Theorem 2.1).
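The quotient in Remark 14.19 is straightforward to evaluate. As a hedged illustration, the Python sketch below computes it for a probability vector `p` and contraction numbers `s`; the function name `entropy_dimension` and the sample values are assumptions made for this example.

```python
import math

def entropy_dimension(p, s):
    """Return sum(p_i log p_i) / sum(p_i log s_i) for weights p and ratios s."""
    if len(p) != len(s) or abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("p must be a probability vector matching s")
    numerator = sum(pi * math.log(pi) for pi in p)
    denominator = sum(pi * math.log(si) for pi, si in zip(p, s))
    return numerator / denominator

# Equal weights on two maps with ratio 1/3 give log 2 / log 3.
uniform_case = entropy_dimension((0.5, 0.5), (1 / 3, 1 / 3))
# Unequal weights give a smaller value for the same maps.
skewed_case = entropy_dimension((0.7, 0.3), (1 / 3, 1 / 3))
```

For equal contraction numbers the denominator is constant, so the quotient is largest for the uniform weights, where it coincides with the similarity dimension.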
In the preceding sections we have shown that, for many self-similar probabilities $P$, the inequality
holds for $r \in [1, \infty]$ and the quantization dimension $D_r$ of order $r$ for $P$. It is, therefore, natural to investigate the problem under what conditions the above sequence has a finite and positive limit. Taking $r < \infty$ and generalizing Theorem 6.2 and (6.4) the $\frac{r}{D_r}$-th power of this limit, if it exists, is called the $r$-th quantization coefficient $Q_r(P)$. For $r = \infty$, $\lim_{n \to \infty} n^{1/D_\infty} e_{n,\infty}$, if it exists, is called the covering coefficient or quantization coefficient of order $\infty$ and does only depend on $\operatorname{supp}(P)$ ((10.10) and Theorem 10.7). Little seems to be known about the above problem. We will first state a positive result concerning the quantization coefficient of order $\infty$. To this end we need the following definition.
14.20 Definition
An $N$-tuple $(s_1, \ldots, s_N)$ of real numbers is called arithmetic if there is a positive $s \in \mathbb{R}$ with $s_1, \ldots, s_N \in s\mathbb{Z} := \{sn: n \in \mathbb{Z}\}$.
14.21 Theorem
Let $(S_1, \ldots, S_N)$ be an $N$-tuple of contracting similarity transformations of $\mathbb{R}^d$ satisfying the open set condition and let the corresponding $N$-tuple $(s_1, \ldots, s_N)$ of contraction numbers be such that $(\log s_1, \ldots, \log s_N)$ is not arithmetic. Let $A$ be the
attractor of $(S_1, \ldots, S_N)$ and $D$ its similarity dimension. Then $(n e_{n,\infty}(A)^D)_{n \in \mathbb{N}}$ has a finite and positive limit, hence the quantization coefficient $Q_\infty(A)$ of $A$ exists in
Proof
With an argument similar to that given in Remark 10.9 one can show that $\lim_{n \to \infty} n e_{n,\infty}(A)^D$ exists if and only if $\lim_{\varepsilon \to 0} N(\varepsilon) \varepsilon^D$ exists, where $N(\varepsilon)$ is the minimal number of balls of radius $\varepsilon > 0$ that cover $A$. If one of the limits exists then so does the other and they agree. Due to a result of Lalley (1988) combined with a result of Schief (1994), $\lim_{\varepsilon \downarrow 0} N(\varepsilon) \varepsilon^D$ exists in $(0, +\infty)$ under the assumptions of the theorem (see also Falconer, 1997, p. 123, Proposition 7.4). □
The following proposition shows that the quantization coefficient $Q_\infty(A)$ need not exist if the assumption is dropped that $(\log s_1, \ldots, \log s_N)$ is not arithmetic.
14.22 Proposition
Let $N \ge 2$, $(S_1, \ldots, S_N)$, $A$, and $D$ be as above but assume that $(S_1, \ldots, S_N)$ satisfies the strong separation condition and that all $S_i$ have the same contraction number $s$. Then
$0 < \liminf_{n \to \infty} n e_{n,\infty}(A)^D < \limsup_{n \to \infty} n e_{n,\infty}(A)^D < +\infty.$
Proof
According to Proposition 14.7 and 14.13 we know that
We claim that
for all n _> no, where [~] denotes the greatest integer less than or equal to ~. Setting
ni = [~] it follows from (14.20) that
N
Now let n l , . . . , nN C N satisfy 1 _< hi, ~ ni _< n, and
i=l
e,~,~(A) = se,~,,oo(A)
and, therefore,
{2no -}- 1~
-- )C,
which yields a contradiction and finishes the proof of the proposition• []
14.23 Remark
It remains an open problem to characterize those self-similar sets $A$ for which the quantization coefficient $Q_\infty(A)$ exists by a natural condition on the generating $N$-tuple $(S_1, \ldots, S_N)$. For $1 \le r < \infty$ and general self-similar probabilities $P$ almost nothing is known about the existence of the quantization coefficients $Q_r(P)$. The only self-similar probability $P$ for which the existence of all quantization coefficients $Q_r(P)$, $1 \le r \le \infty$, is known seems to be the restriction of Lebesgue measure to the unit cube in $\mathbb{R}^d$. The classical Cantor distribution $P$ on $\mathbb{R}$ has no quantization coefficient $Q_2(P)$ (cf. Graf and Luschgy, 1997).
Below we will summarize the known results for the Cantor distribution.
Let $S_1, S_2: \mathbb{R} \to \mathbb{R}$ be defined by $S_1 x = \frac{1}{3} x$ and $S_2 x = \frac{1}{3} x + \frac{2}{3}$. Then $S_1$ and $S_2$ are similarity transformations with contraction number $\frac{1}{3}$. The attractor of the pair
($1, $2) is the classical C a n t o r set C c [0,1]. The similarity dimension of ($1, $2)
equals D = ~log 3" Let P be the self-similar probability corresponding to ($1, $2,½,½)
(see (14.3)). According to (14.4) P is the normalized D-dimensional Hausdorff mea-
sure on C. This distribution is called the (classical) C a n t o r d i s t r i b u t i o n . Since
(S1, $2) satisfies the strong separation condition and since (s D, s D) = (½, ½) we know
from Theorem 14.17 that D is the quantization dimension of P of order r for all
r C [1, + o o 1. In the following theorem we will describe all optimal sets of n-centers,
the quantization errors V~,r (P), and all limits points of the sequence (n 2/DV,,,r(P)),~eN
for r - 2. In particular we show that the quantization coefficient Q2(P) does not
exist.
To do this we need some more notation. For $\sigma \in \{1, 2\}^*$ let $a_\sigma = S_\sigma(\frac{1}{2})$. For $n \ge 1$
let $l(n) = [\log_2 n]$. For $I \subset \{1, 2\}^{l(n)}$ with $|I| = n - 2^{l(n)}$ let
Define $f : [1, 2] \to \mathbb{R}$ by
$$f(x) = \frac{1}{72}\, x^{2/D} (17 - 8x).$$
14.24 Theorem
Let P be the Cantor distribution and let D, l(n), $\alpha_n(I)$, and f be defined as above.
(a) For every natural number $n \ge 1$ the following conditions are equivalent
$$V_{n,2}(P) = \frac{1}{18^{l(n)}} \cdot \frac{1}{8} \left( 2^{l(n)+1} - n + \frac{1}{9}\left(n - 2^{l(n)}\right) \right).$$
1 17
(Notice that =
In particular P has no second quantization coefficient.
The proof is given in Graf and Luschgy (1997) and will be omitted here.
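As a numerical illustration of why the second quantization coefficient cannot exist, the following sketch evaluates the closed formula for $V_{n,2}(P)$ and the function $f(x) = \frac{1}{72}x^{2/D}(17 - 8x)$ stated above. The function names (`l`, `v_n2`, `f`, `scaled_error`) are illustrative choices, not notation from the text. Under these assumptions one can check that $n^{2/D} V_{n,2}(P) = f(n/2^{l(n)})$, so the subsequence $n = 2^k$ stays at $1/8$ while $n = 3 \cdot 2^k$ stays at $f(3/2) \approx 0.25$.

```python
import math

D = math.log(2) / math.log(3)  # similarity dimension of the Cantor set


def l(n):
    """l(n) = [log_2 n], computed exactly with integer arithmetic."""
    return n.bit_length() - 1


def v_n2(n):
    """V_{n,2}(P) evaluated from the closed formula quoted in Theorem 14.24."""
    k = l(n)
    return (2 ** (k + 1) - n + (n - 2 ** k) / 9) / (8 * 18 ** k)


def f(x):
    """f(x) = x^(2/D) * (17 - 8x) / 72 for x in [1, 2]."""
    return x ** (2 / D) * (17 - 8 * x) / 72


def scaled_error(n):
    """The rescaled error n^(2/D) * V_{n,2}(P)."""
    return n ** (2 / D) * v_n2(n)


if __name__ == "__main__":
    for k in range(1, 6):
        # Two subsequences with two different constant values.
        print(2 ** k, scaled_error(2 ** k), 3 * 2 ** k, scaled_error(3 * 2 ** k))
```

Since the rescaled error depends only on the ratio $n / 2^{l(n)} \in [1, 2)$, the sequence keeps oscillating along dyadic blocks instead of converging.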
Notes
Univariate distributions
The following univariate distributions served as examples. Recall that the r-th (absolute)
moment about the center of a real random variable X is given by
$$V_r(X) = \inf_{a \in \mathbb{R}} E|X - a|^r.$$
Normal distribution $N(0, \sigma^2)$
The normal distribution is strongly unimodal. If X is $N(0, \sigma^2)$-distributed, then
$$V_r(X) = E|X|^r = \sqrt{\frac{2^r}{\pi}}\, \sigma^r\, \Gamma\left(\frac{r+1}{2}\right), \quad r \ge 1.$$
In particular
$$V_1(X) = \sigma \sqrt{\frac{2}{\pi}}.$$
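The absolute moments of a centered normal variable, $E|X|^r = \sqrt{2^r/\pi}\,\sigma^r\,\Gamma(\frac{r+1}{2})$, can be cross-checked numerically. The sketch below is only an illustration under simple assumptions: the function names and the midpoint-rule integrator are choices made here, not part of the text.

```python
import math


def normal_abs_moment(sigma, r):
    """Closed form E|X|^r for X ~ N(0, sigma^2)."""
    return math.sqrt(2 ** r / math.pi) * sigma ** r * math.gamma((r + 1) / 2)


def normal_abs_moment_numeric(sigma, r, steps=20_000, cutoff=12.0):
    """Midpoint rule for E|X|^r on [0, cutoff * sigma], doubled by symmetry."""
    h = cutoff * sigma / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** r * math.exp(-x * x / (2 * sigma ** 2))
    return 2 * total * h / (sigma * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, normal_abs_moment(1.5, r), normal_abs_moment_numeric(1.5, r))
```

For $r = 2$ the closed form reduces to the variance $\sigma^2$, which gives a quick sanity check.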
where a > 0 is a scale parameter. The logistic distribution is symmetric about the
origin and strongly unimodal. The distribution function takes the form
$$F(x) = \frac{1}{1 + e^{-x/a}} = \frac{e^{x/a}}{1 + e^{x/a}}.$$
Suppose that X has distribution L(a). Then
$$V_r(X) = E|X|^r = 2a^r\, \Gamma(r+1) \sum_{j=1}^{\infty} (-1)^{j-1} j^{-r}, \quad r \ge 1$$
$$= 2a^r\, \Gamma(r+1)\left(1 - 2^{-(r-1)}\right)\zeta(r), \quad r > 1,$$
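The alternating series in the logistic moment formula can be evaluated directly. The sketch below (function names are illustrative, and the truncation length is an arbitrary assumption) compares it with two standard values: the logistic variance $a^2\pi^2/3$ for $r = 2$, and $2a\log 2$ for $r = 1$, where the series is the alternating harmonic series.

```python
import math


def alternating_sum(r, terms=200_000):
    """Partial sum of sum_{j>=1} (-1)^(j-1) * j^(-r)."""
    return sum((-1) ** (j - 1) / j ** r for j in range(1, terms + 1))


def logistic_abs_moment(a, r):
    """V_r(X) = 2 a^r Gamma(r+1) sum_j (-1)^(j-1) j^(-r) for X ~ L(a)."""
    return 2 * a ** r * math.gamma(r + 1) * alternating_sum(r)


if __name__ == "__main__":
    print(logistic_abs_moment(1.0, 1), 2 * math.log(2))
    print(logistic_abs_moment(1.0, 2), math.pi ** 2 / 3)
```

The plain partial sum converges slowly for $r = 1$ (error of order one over the number of terms), so the comparison there is only accurate to a few decimal places.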
Generalized Logistic distribution GL(a, b)
Double Exponential distribution DE(a)
$$h(x) = \frac{1}{2a}\, e^{-|x|/a}, \quad x \in \mathbb{R},$$
$$F(x) = \begin{cases} \frac{1}{2}\, e^{x/a}, & x \le 0 \\ 1 - \frac{1}{2}\, e^{-x/a}, & x > 0. \end{cases}$$
If X is DE(a)-distributed, then
$$V_r(X) = E|X|^r = a^r\, \Gamma(1 + r), \quad r \ge 1.$$
In particular
$$V_1(X) = a, \quad V_2(X) = 2a^2.$$
where a > 0 is a scale parameter and b > 0 is a shape parameter. We have $D\Gamma(a, 1) =$
DE(a). If X is $D\Gamma(a, b)$-distributed, then
$$V_r(X) = E|X|^r = a^r\, \frac{\Gamma(b + r)}{\Gamma(b)}, \quad r \ge 1.$$
In particular
$$V_1(X) = ab, \quad V_2(X) = a^2 b(b + 1).$$
Hyper-exponential distribution HE(a, b)
Uniform distribution U([a, b])
Triangular distribution T(a, b; c)
$$V_r(X) = E\left|X - \frac{a+b}{2}\right|^r = \frac{(b - a)^r}{(r + 1)(r + 2)\, 2^{r-1}}, \quad r \ge 1.$$
In particular
$$V_1(X) = \frac{b - a}{6}, \quad V_2(X) = \frac{(b - a)^2}{24}.$$
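The moment formula $(b-a)^r / ((r+1)(r+2)2^{r-1})$ is centered at $(a+b)/2$, so the sketch below assumes the symmetric triangular distribution on $[a, b]$ with mode $c = (a+b)/2$ and density $\frac{1}{h}(1 - |x - c|/h)$, $h = (b-a)/2$. That density, the function names, and the midpoint-rule integrator are assumptions made for this illustration.

```python
import math


def triangular_abs_moment(a, b, r):
    """Closed form E|X - (a+b)/2|^r for the symmetric triangular law on [a, b]."""
    return (b - a) ** r / ((r + 1) * (r + 2) * 2 ** (r - 1))


def triangular_abs_moment_numeric(a, b, r, steps=20_000):
    """Midpoint rule against the assumed symmetric triangular density."""
    c = (a + b) / 2
    half = (b - a) / 2
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        density = (1 - abs(x - c) / half) / half
        total += abs(x - c) ** r * density
    return total * h


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, triangular_abs_moment(0.0, 1.0, r), triangular_abs_moment_numeric(0.0, 1.0, r))
```

For $r = 1$ and $r = 2$ the closed form gives $(b-a)/6$ and $(b-a)^2/24$.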
Exponential distribution E(a)
$$h(x) = \frac{1}{a}\, e^{-x/a}\, 1_{(0,\infty)}(x),$$
$$F(x) = \begin{cases} 1 - e^{-x/a}, & x > 0 \\ 0, & x \le 0. \end{cases}$$
If X is E(a)-distributed, then
$$\mathrm{Med}(X) = \{a \log 2\}, \quad EX = a,$$
$$V_1(X) = E|X - a \log 2| = a \log 2,$$
$$V_2(X) = \mathrm{Var}\, X = a^2.$$
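where a > 0 is a scale parameter and b > 0 is a shape parameter. The Weibull
distribution is strongly unimodal for $b \ge 1$. The distribution function takes the form
$$F(x) = \begin{cases} 1 - \exp\left(-\left(\frac{x}{a}\right)^b\right), & x > 0 \\ 0, & x \le 0. \end{cases}$$
We have W(a, 1) = E(a). The distribution W(a, 2) is called Rayleigh distribution.
Let X be W(a, b)-distributed. Then
$$\mathrm{Med}(X) = \{a (\log 2)^{1/b}\},$$
$$E X^r = a^r\, \Gamma\left(1 + \frac{r}{b}\right), \quad r \ge 0.$$
Let b = 2. Then
$$V_1(X) = E\left|X - a\sqrt{\log 2}\right| = a\left[\sqrt{\log 2} + \sqrt{\pi}\left(\frac{3}{2} - 2\Phi\left(\sqrt{2 \log 2}\right)\right)\right],$$
$$V_2(X) = a^2\left(1 - \frac{\pi}{4}\right).$$
For
$$a = \left[\sqrt{\log 2} + \sqrt{\pi}\left(\frac{3}{2} - 2\Phi\left(\sqrt{2 \log 2}\right)\right)\right]^{-1} = 2.7027\ldots$$
one obtains $V_1(X) = 1$.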
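The constant $2.7027\ldots$ can be reproduced numerically. The sketch below assumes that $\Phi$ denotes the standard normal distribution function and that W(a, 2) has density $\frac{2x}{a^2} e^{-(x/a)^2}$ on $(0, \infty)$, which follows from the distribution function given above. The function names and the midpoint-rule cross-check are illustrative only.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, expressed through erf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def rayleigh_v1(a):
    """V_1(X) = E|X - a sqrt(log 2)| for X ~ W(a, 2), closed form."""
    bracket = math.sqrt(math.log(2)) + math.sqrt(math.pi) * (
        1.5 - 2 * std_normal_cdf(math.sqrt(2 * math.log(2)))
    )
    return a * bracket


def rayleigh_v1_numeric(a, steps=100_000, cutoff=10.0):
    """Midpoint rule for E|X - median| against the W(a, 2) density."""
    median = a * math.sqrt(math.log(2))
    h = cutoff * a / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        density = 2 * x / a ** 2 * math.exp(-((x / a) ** 2))
        total += abs(x - median) * density
    return total * h


if __name__ == "__main__":
    print(1 / rayleigh_v1(1.0))  # scale parameter a for which V_1(X) = 1
    print(rayleigh_v1(1.0), rayleigh_v1_numeric(1.0))
```

Because $V_1$ is linear in the scale parameter, the value of $a$ normalizing $V_1(X)$ to 1 is simply the reciprocal of $V_1$ at $a = 1$.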
$$h(x) = \frac{1}{a^b\, \Gamma(b)}\, x^{b-1} e^{-x/a}\, 1_{(0,\infty)}(x),$$
Generalized Gamma distribution $G\Gamma(a, b, c)$
$$h(x) = b\, a^b\, x^{-(b+1)}\, 1_{(a,\infty)}(x),$$
where a > 0 is a scale parameter, b > 0. The distribution function takes the form
$$\mathrm{Med}(X) = \{a\, 2^{1/b}\},$$
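$$E X^r = \frac{a^r b}{b - r}, \quad b > r \ge 0,$$
$$V_1(X) = E\left|X - a\, 2^{1/b}\right| = \frac{ab\left(2^{1/b} - 1\right)}{b - 1}, \quad b > 1,$$
$$V_2(X) = \mathrm{Var}\, X = \frac{a^2 b}{(b - 2)(b - 1)^2}, \quad b > 2.$$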
Cantor distribution
Let $C \subset [0, 1]$ be the (classical) Cantor set and let $D = \frac{\log 2}{\log 3}$ be the Hausdorff
dimension of C. Then the Cantor distribution P is the normalized D-dimensional
Hausdorff measure on C. Define $S_1, S_2 : \mathbb{R} \to \mathbb{R}$ by $S_1 x = \frac{1}{3}x$ and $S_2 x = \frac{1}{3}x + \frac{2}{3}$.
Then P is the unique Borel probability on $\mathbb{R}$ with
$$P = \frac{1}{2}\left(P^{S_1} + P^{S_2}\right).$$
$$E(X) = \frac{1}{2}, \quad V_2(X) = \mathrm{Var}\, X = \frac{1}{8}.$$
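The mean $\frac{1}{2}$ and variance $\frac{1}{8}$ of the Cantor distribution can be checked with exact rational arithmetic. The sketch below is an illustration under the following assumptions: it uses the maps $S_1 x = \frac{1}{3}x$, $S_2 x = \frac{1}{3}x + \frac{2}{3}$, and approximates P by the uniform distribution on the $2^k$ points $S_\sigma(\frac{1}{2})$ for words $\sigma$ of length k. Self-similarity then suggests that these points have mean $\frac{1}{2}$ and variance $\frac{1}{8}(1 - 9^{-k})$, which tends to $\frac{1}{8}$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def apply_map(i, x):
    """S_1 x = x/3 and S_2 x = x/3 + 2/3."""
    return x / 3 if i == 1 else x / 3 + Fraction(2, 3)


def level_points(k):
    """Points S_sigma(1/2) for all words sigma in {1, 2}^k."""
    points = []
    for word in product((1, 2), repeat=k):
        x = Fraction(1, 2)
        for i in reversed(word):  # S_sigma = S_{sigma_1} o ... o S_{sigma_k}
            x = apply_map(i, x)
        points.append(x)
    return points


def mean_and_variance(points):
    """Exact mean and variance of the uniform distribution on the given points."""
    n = len(points)
    mean = sum(points) / n
    variance = sum((p - mean) ** 2 for p in points) / n
    return mean, variance


if __name__ == "__main__":
    for k in range(1, 7):
        mean, variance = mean_and_variance(level_points(k))
        print(k, mean, variance, float(variance))
```

The missing part of the variance at level k, namely $9^{-k}/8$, is the contribution of the $2^k$ scaled copies of P, each of mass $2^{-k}$ and variance $9^{-k}/8$.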
Bibliography
Abaya, E.F. and Wise, G.L. (1981). Some notes on optimal quantization. Pro-
ceedings of the International Conference on Communications (Denver, Colorado),
30.7.1-10.7.5. IEEE Press, New York.
Abaya, E.F. and Wise, G.L. (1984). Convergence of vector quantizers with applica-
tions to optimal quantizers. SIAM J. Appl. Math. 44, 183-189.
Abut, H., editor (1990). Vector Quantization. IEEE Press, New York.
Adams Jr., W.C. and Giesler, C.E. (1978). Quantizing characteristics for signals hav-
ing Laplacian amplitude probability density function. IEEE Trans. Communications
26, 1295-1297.
Agrell, E. and Eriksson, T. (1998). Optimization of lattices for quantization. IEEE
Trans. Inform. Theory 44, 1814-1828.
Anderberg, M.R. (1973). Cluster Analysis for Applications. Academic Press, San
Diego.
Aurenhammer, F. (1991). Voronoi diagrams: A survey of a fundamental geometric
data structure. ACM Computing Surveys 23, 345-405.
Baranovskii, E.P. (1965). Local density minima of a lattice covering of a four-
dimensional Euclidean space by equal spheres. Soviet Math. Dokl. 6, 1131-1133.
Barnes, E.S. and Sloane, N.J.A. (1983). The optimal lattice quantizer in three di-
mensions. SIAM J. Algebraic Discrete Methods 4, 30-41.
Barnsley, M. (1988). Fractals Everywhere. Academic Press, London.
Bartlett, P.L., Linder, T., and Lugosi, G. (1998). The minimax distortion redundancy
in empirical quantizer design. IEEE Trans. Inform. Theory 44, 1802-1813.
Benhenni, K. and Cambanis, S. (1996). The effect of quantization on the performance
of sampling designs. Techn. Report No. 481, Center for Stoch. Processes, Univ. of
North Carolina, Chapel Hill.
Bennett, W.R. (1948). Spectra of quantized signals. Bell Systems Tech. J. 27,
446-472.
Bock, H.H. (1974). Automatische Klassifikation. Vandenhoeck and Ruprecht,
Göttingen.
Cuesta-Albertos, J.A., Gordaliza, A., and Matrán, C. (1998). Trimmed best k-nets:
a robustified version of an $L_\infty$-based clustering method. Statist. Probab. Letters 36,
401-413.
Cuesta-Albertos, J.A., García-Escudero, L.A., and Gordaliza, A. (1999). Trimmed
best k-nets: asymptotics and applications. Preprint.
Dalenius, T. (1950). The problem of optimum stratification. Scandinavisk Aktuari-
etidskrift 33, 203-213.
David, G. and Semmes, S. (1993). Analysis of and on Uniformly Rectifiable Sets.
Mathematical Surveys and Monographs, Vol. 38, American Mathematical Society,
Rhode Island.
David, G. and Semmes, S. (1997). Fractured Fractals and Broken Dreams. Clarendon
Press, Oxford.
Dharmadhikari, S. and Joag-Dev, K. (1988). Unimodality, Convexity and Applica-
tions. Academic Press, Boston.
Diday, E. and Simon, J.C. (1976). Clustering analysis. Digital Pattern Recognition,
47-94 (ed., K.S. Fu). Springer, New York.
Elias, P. (1970). Bounds and asymptotes for the performance of multivariate quan-
tizers. Ann. Math. Statist. 41, 1249-1259.
Eubank, R.L. (1988). Optimal grouping, spacing, stratification, and piecewise con-
stant approximation. SIAM Review 30, 404-420.
Falconer, K.J. (1985). The Geometry of Fractal Sets. Cambridge University Press,
Cambridge.
Falconer, K.J. (1990). Fractal Geometry. Wiley, Chichester.
Falconer, K.J. (1997). Techniques in Fractal Geometry. Wiley, Chichester.
Fang, K.-T. and Wang, Y. (1994). Number-theoretic Methods in Statistics. Chapman
and Hall, London.
Federer, H. (1969). Geometric Measure Theory, Springer, Berlin-Heidelberg-New
York.
Fejes Tóth, L. (1959). Sur la représentation d'une population infinie par un nombre
fini d'éléments. Acta Math. Acad. Sci. Hung. 10, 299-304.
Fejes Tóth, L. (1972). Lagerungen in der Ebene, auf der Kugel und im Raum. Second
Edition. Springer, Berlin.
Fleischer, P.E. (1964). Sufficient conditions for achieving minimum distortion in a
quantizer. IEEE Int. Conv. Rec., part 1, 104-111.
Flury, B.A. (1990). Principal points. Biometrika 77, 33-41.
Forney Jr., G.D. (1993). On the duality of coding and quantization. Coding and
Quantization, 1-14 (eds., R. Calderbank et al.). DIMACS Vol. 14, American Mathe-
matical Society.
Fort, J.C. and Pagès, G. (1999). Asymptotics of optimal quantizers for some scalar
distributions. Preprint.
García-Escudero, L.A., Gordaliza, A., and Matrán, C. (1999). A central limit theo-
rem for multivariate generalized trimmed k-means. Ann. Statist. 27, 1061-1079.
Gardner, W.R. and Rao, B.D. (1995). Theoretical analysis of the high rate vector
quantization of LPC parameter. IEEE Trans. Speech Audio Processing 3, 367-381.
Garkavi, A.L. (1964). The best possible net and the best possible cross-section of a
set in a normed space. Amer. Math. Soc. Translations 39, 111-132.
Gersho, A. (1979). Asymptotically optimal block quantization. IEEE Trans. Inform.
Theory 25, 373-380.
Gersho, A. and Gray, R.M. (1992). Vector Quantization and Signal Compression.
Kluwer, Boston.
Gilat, D. (1988). On the ratio of the expected maximum of a martingale and the
Lp-norm of its last term. Israel J. Math. 63, 270-280.
Goddyn, L.A. (1990). Quantizers and the worst-case Euclidean traveling salesman
problem. J. Combinatorial Theory Series B 50, 65-81.
Graf, S. and Luschgy, H. (1994a). Foundations of quantization for random vectors.
Research Report No. 16, Applied Mathematics and Computer Science, University of
Münster.
Graf, S. and Luschgy, H. (1994b). Consistent estimation in the quantization problem
for random vectors. Trans. Twelfth Prague Conf. Inform. Theory, Stat. Decision
Functions, Random Processes, 84-87.
Graf, S. and Luschgy, H. (1996). The quantization dimension of self-similar sets.
Research Report No. 9, Dept. of Mathematics and Computer Science, University of
Passau.
Graf, S. and Luschgy, H. (1997). The quantization of the Cantor distribution. Math.
Nachrichten 183, 113-133.
Graf, S. and Luschgy, H. (1999a). Quantization for random vectors with respect to
the Ky Fan metric. Submitted.
Graf, S. and Luschgy, H. (1999b). Quantization for probability measures with respect
to the geometric mean error. Submitted.
Graf, S. and Luschgy, H. (1999c). Rates of convergence for the empirical quantization
error. Submitted.
Gray, R.M. (1990). Source Coding Theory. Kluwer, Boston.
Gray, R.M., Neuhoff, D.L., and Shields, P.C. (1975). A generalization of Ornstein's
distance with applications to information theory. Ann. Probab. 3, 315-328.
Gray, R.M. and Davisson, L.D. (1975). Quantizer mismatch. IEEE Trans. Commu-
nications 23, 439-443.
Gray, R.M. and Karnin, E.D. (1982). Multiple local optima in vector quantizers.
IEEE Trans. Inform. Theory 28, 256-261.
Gray, R.M. and Neuhoff, D.L. (1998). Quantization. IEEE Trans. Inform. Theory
44, 2325-2383.
Gruber, P. (1974). Über kennzeichnende Eigenschaften von euklidischen Räumen und
Ellipsoiden I. J. Reine Angew. Math. 265, 61-83.
Gruber, P.M. and Lekkerkerker, C.G. (1987). Geometry of Numbers. Second Edition.
North-Holland, Amsterdam.
Grünbaum, B. and Shephard, G.C. (1986). Tilings and Patterns. Freeman and
Company, New York.
Haimovich, M. and Magnanti, T.L. (1988). Extremum properties of hexagonal parti-
tioning and the uniform distribution in euclidean location. SIAM J. Discrete Math.
1, 50-64.
Hartigan, J.A. (1978). Asymptotic distributions for clustering criteria. Ann. Statist.
6, 117-131.
Hochbaum, D. and Steele, J.M. (1982). Steinhaus's geometric location problem for
random samples in the plane. Adv. Appl. Probab. 14, 56-67.
Hoffmann-Jørgensen, J. (1994). Probability with a View Toward Statistics. Vol. 1.
Chapman and Hall, New York.
Hutchinson, J.E. (1981). Fractals and self-similarity. Indiana Univ. Math. J. 30,
713-747.
Iyengar, S. and Solomon, H. (1983). Selecting representative points in normal pop-
ulations. Recent Advances in Statistics, Papers in Honor of H. Chernoff, 579-591.
Academic Press.
Jahnke, H. (1988). Clusteranalyse als Verfahren der schließenden Statistik. Vanden-
hoeck and Ruprecht, Göttingen.
Johnson, M.E., Moore, L.M., and Ylvisaker, D. (1990). Minimax and maximin dis-
tance designs. J. Statist. Plann. Inference 26, 131-148.
Karlin, S. (1982). Some results on optimal partitioning of variance and monotonicity
with truncation level. Statistics and Probability: Essays in Honor of C. R. Rao,
375-382 (eds., G. Kallianpur et al.). North-Holland, Amsterdam.
Kawabata, T. and Dembo, A. (1994). The rate distortion dimension of sets and
measures. IEEE Trans. Inform. Theory 40, 1564-1572.
McClure, D.E. (1980). Optimized grouping methods. Part 1 and part 2. Statistik
Tidskrift 18, 101-110, 189-198.
McGivney, K. and Yukich, J.E. (1997). Asymptotics for geometric location problems
over random samples. Preprint.
McMullen, P. (1980). Convex bodies which tile space by translation. Mathematika
27, 113-121. (Acknowledgement of priority: Mathematika 28, 191.)
Milasevic, P. and Ducharme, G.R. (1987). Uniqueness of the spatial median. Ann.
Statist. 15, 1332-1333.
Møller, J. (1994). Lectures on Random Voronoi Tesselations. Lecture Notes in Statis-
tics 87. Springer, New York.
Moran, P.A.P. (1946). Additive functions of intervals and Hausdorff measure. Proc.
Cambridge Phil. Soc. 42, 15-23.
Na, S. and Neuhoff, D.L. (1995). Bennett's integral for vector quantizers. IEEE
Trans. Inform. Theory 41, 886-900.
Newman, D.J. (1982). The Hexagon theorem. IEEE Trans. Inform. Theory 28,
137-139.
Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Meth-
ods. CBMS-NSF Regional Conference Series in Applied Math. Vol. 63. SIAM.
Okabe, A., Boots, B. and Sugihara, K. (1992). Spatial Tesselations: Concepts and
Applications of Voronoi Diagrams. Wiley, Chichester.
Pagès, G. (1997). A space quantization method for numerical integration. J. Comput.
Appl. Math. 89, 1-38.
Pärna, K. (1988). On the stability of k-means clustering in metric spaces. Tartu
Riikliku Ülikooli Toimetised 798, 19-36.
Pärna, K. (1990). On the existence and weak convergence of k-centres in Banach
spaces. Tartu Ülikooli Toimetised 893, 17-28.
Panter, P.F. and Dite, W. (1951). Quantization distortion in pulse-count modulation
with nonuniform spacing of levels. Proc. Inst. Radio Eng. 39, 44-48.
Pearlman, W.A. and Senge, G.H. (1979). Optimal quantization of the Rayleigh prob-
ability distribution. IEEE Trans. Communications 27, 101-112.
Pierce, J.N. (1970). Asymptotic quantizing error for unbounded random variables.
IEEE Trans. Inform. Theory 16, 81-83.
Pisier, G. (1989). The Volume of Convex Bodies and Banach Space Geometry. Cam-
bridge University Press, Cambridge.
Pollard, D. (1981). Strong consistency of k-means clustering. Ann. Statist. 9,
135-140.
Pollard, D. (1982a). Quantization and the method of k-means. IEEE Trans. Inform.
Theory 28, 199-205.
Pollard, D. (1982b). A central limit theorem for k-means clustering. Ann. Probab.
10, 919-926.
Pötzelberger, K. and Felsenstein, K. (1994). An asymptotic result on principal points
for univariate distributions. Optimization 28, 397-406.
Pötzelberger, K. (1998a). Asymptotik des Quantisierungsfehlers. Quantisierungsdi-
mension, Verallgemeinerung des Satzes von Zador und Verteilung der Prototypen.
Preprint.
Pötzelberger, K. (1998b). Asymptotik des empirischen Quantisierungsfehlers und
Konsistenz des Schätzers oder Quantisierungsdimension. Preprint.
Pötzelberger, K. and Strasser, H. (1999). Clustering and quantization by MSP-
partitions. Preprint.
Rachev, S.T. (1991). Probability Metrics and the Stability of Stochastic Models.
Wiley, Chichester.
Rachev, S.T. and Rüschendorf, L. (1998). Mass Transportation Problems. Vol. 1
and Vol. 2. Springer, New York.
Rényi, A. (1959). On the dimension and entropy of probability distributions. Acta
Math. Sci. Hung. 10, 193-215.
Rhee, W.T. and Talagrand, M. (1989a). A concentration inequality for the k-median
problem. Math. Oper. Res. 14, 189-202.
Rhee, W.T. and Talagrand, M. (1989b). On the k-center problem with many centers.
Oper. Res. Letters 8, 309-314.
Rogers, C.A. (1957). A note on coverings. Mathematika 4, 1-6.
Sabin, M.J. and Gray, R.M. (1986). Global convergence and empirical consistency of
the generalized Lloyd algorithm. IEEE Trans. Inform. Theory 32, 148-155.
Schief, A. (1994). Separation properties for self-similar sets. Proc. Amer. Math.
Soc. 122, 111-115.
Schulte, E. (1993). Tilings. Handbook of Convex Geometry, 899-932 (eds., P.M.
Gruber and J.M. Wills). Elsevier Science Publishers.
Semadeni, Z. (1971). Banach Spaces of Continuous Functions. Polish Scientific Pub-
lishers, Warszawa.
Serinko, R.J. and Babu, G.J. (1992). Weak limit theorems for univariate k-mean
clustering under a nonregular condition. J. Multivariate Anal. 41, 273-296.
Serinko, R.J. and Babu, G.J. (1995). Asymptotics of k-mean clustering under non-
i.i.d. sampling. Statist. Probab. Letters 24, 57-66.
Shannon, C.E. (1959). Coding theorems for a discrete source with a fidelity criterion.
IRE National Convention Record, Part 4, 142-163.
Singer, I. (1970). Best Approximation in Normed Linear Spaces by Elements of Linear
Subspaces. Springer, Berlin.
Small, C.G. (1990). A survey of multidimensional medians. Int. Statist. Review 58,
263-277.
Späth, H. (1985). Cluster Dissection and Analysis. Ellis Horwood Limited, Chich-
ester.
Stadje, W. (1995). Two asymptotic inequalities for the stochastic traveling salesman
problem. Sankhyā 57, Series A, 33-40.
Steinhaus, H. (1956). Sur la division des corps materiels en parties. Bull. Acad.
Polon. Sci. 4, 801-804.
Stute, W. and Zhu, L.X. (1995). Asymptotics of k-means clustering based on pro-
jection pursuit. Sankhyā 57, Series A, 462-471.
Su, Y. (1997). On the asymptotics of quantizers in two dimensions. J. Multivariate
Anal. 61, 67-85.
Tarpey, T. (1994). Two principal points of symmetric, strongly unimodal distribu-
tions. Statist. Probab. Letters 20, 253-257.
Tarpey, T. (1995). Principal points and self-consistent points of symmetric multivari-
ate distributions. J. Multivariate Anal. 53, 39-51.
Tarpey, T. (1998). Self-consistent patterns for symmetric multivariate distributions.
J. Classification 15, 57-79.
Tarpey, T., Li, L., and Flury, B.D. (1995). Principal points and self-consistent points
of elliptical distributions. Ann. Statist. 23, 103-112.
Tou, J.T. and Gonzales, R.C. (1974). Pattern Recognition Principles. Addison-
Wesley, Reading.
Trushkin, A.V. (1984). Monotony of Lloyd's method II for log-concave density and
convex error weighting function. IEEE Trans. Inform. Theory 30, 380-383.
Vajda, I. (1989). Theory of Statistical Inference and Information. Kluwer, Dordrecht.
Wagner, T.J. (1971). Convergence of the nearest neighbor rule. IEEE Trans. Inform.
Theory 17, 566-571.
Webster, R. (1994). Convexity. Oxford University Press, Oxford.
Williams, G. (1967). Quantization for minimum error with particular reference to
speech. Electronics Letters 3, 134-135.
Wong, M.A. (1982). Asymptotic properties of bivariate k-means clusters. Comm.
Statist. Theory Methods. 11, 1155-1171.
$\Delta$ symmetric difference, 27
$\delta_x$ point mass at x, 25
$\Gamma(a, b)$ Gamma distribution, 72
$\Gamma(\varepsilon)$ maximal finite antichain generated by $\varepsilon$, 192
$\lambda$ 1-dimensional Lebesgue measure, 55
$\lambda^d$ d-dimensional Lebesgue measure, 13
$\mu(\cdot \mid A)$ 26
$\rho_r$ $L_r$-minimal metric, 33, 140
$\sigma^-$ immediate predecessor of $\sigma$, 191
$|\sigma|$ length of $\sigma$, 191
$\sigma|_m$ restriction of $\sigma$ to m, 191
$\sigma \le \tau$ $\sigma$ is a predecessor of $\tau$, 191
$|A|$ cardinality of A, 13
$1_A$ indicator function of the set A, 31
$\xrightarrow{D}$ weak convergence, 57
$\partial$ boundary, 10
$\|g\|_r$ $L_r(P)$-norm of g, 137
$\|h\|_p$ $L_p(\lambda^d)$-(quasi-)norm of h, 78
228 Symbols
packing, 50
parametrization, 180
parametrization by arc length, 180
Pareto-distribution, 99
polyhedral set, 17
polytope, 17
predecessor, 191
product quantizer, 42
quantization coefficient of order $\infty$, 145
quantization dimension of order r, 155
tesselation, 15
triangular distribution, 123
truncated octahedron, 118
uniform distribution, 20
unimodal distribution, 64
Uniqueness theorem, 22, 64
upper box dimension, 159
upper quantization dimension of order r, 155
upper rate distortion dimension, 161