Lecture Notes in Mathematics 1730

Editors:
A. Dold, Heidelberg
F. Takens, Groningen
B. Teissier, Paris
Springer
Berlin
Heidelberg
New York
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo
Siegfried Graf Harald Luschgy

Foundations of
Quantization for
Probability Distributions

Authors
Siegfried Graf
Faculty for Mathematics and Computer Science
University of Passau
94030 Passau, Germany
E-mail: graf@fmi.uni-passau.de

Harald Luschgy
FB IV, Mathematics
University of Trier
54286 Trier, Germany
E-mail: luschgy@uni-trier.de

Cataloging-in-Publication Data applied for


Die Deutsche Bibliothek - CIP-Einheitsaufnahme

Graf, Siegfried:
Foundations of quantization for probability distributions / Siegfried
Graf ; Harald Luschgy. - Berlin ; Heidelberg ; New York ; Barcelona ;
Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer,
2000
(Lecture notes in mathematics ; 1730)
ISBN 3-540-67394-6

Mathematics Subject Classification (2000): 60Exx, 62H30, 28A80, 90B05, 94A29
ISSN 0075-8434
ISBN 3-540-67394-6 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, re-use
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other
way, and storage in data banks. Duplication of this publication or parts thereof is
permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from
Springer-Verlag. Violations are liable for prosecution under the German Copyright
Law.
Springer-Verlag is a company in the BertelsmannSpringer publishing group.
© Springer-Verlag Berlin Heidelberg 2000
Printed in Germany
The use of general descriptive names, registered names, trademarks, etc. in this
publication does not imply, even in the absence of a specific statement, that such
names are exempt from the relevant protective laws and regulations and therefore
free for general use.
Typesetting: Camera-ready TEX output by the author
Printed on acid-free paper SPIN: 10724973 41/3143/du 543210
Contents

List of Figures
List of Tables
Introduction

I General properties of the quantization for probability distributions
  1 Voronoi partitions
    1.1 General norms
    1.2 Euclidean norms
  2 Centers and moments of probability distributions
    2.1 Uniqueness and characterization of centers
    2.2 Moments of balls
  3 The quantization problem
  4 Basic properties of optimal quantizers
    4.1 Stationarity and existence
    4.2 The functional $V_{n,r}$
    4.3 Quantization error for ball packings
    4.4 Examples
    4.5 Stability properties and empirical versions
  5 Uniqueness and optimality in one dimension
    5.1 Uniqueness
    5.2 Optimal Quantizers

II Asymptotic quantization for nonsingular probability distributions
  6 Asymptotics for the quantization error
  7 Asymptotically optimal quantizers
    7.1 Mixtures and partitions
    7.2 Empirical measures
    7.3 Asymptotic optimality in one dimension
    7.4 Product quantizers
  8 Regular quantizers and quantization coefficients
    8.1 Ball lower bound
    8.2 Space-filling figures, regular quantizers and upper bounds
    8.3 Lattice quantizers
    8.4 Quantization coefficients of one-dimensional distributions
  9 Random quantizers and quantization coefficients
    9.1 Asymptotics for random quantizers
    9.2 Random quantizer upper bound
    9.3 d-asymptotics and entropy
  10 Asymptotics for the covering radius
    10.1 Basic properties
    10.2 Asymptotic covering radius
    10.3 Covering radius of lattices and bounds
    10.4 Stability properties and empirical versions

III Asymptotic quantization for singular probability distributions
  11 The quantization dimension
    11.1 Definition and elementary properties
    11.2 Comparison to the Hausdorff dimension
    11.3 Comparison to the box dimension
    11.4 Comparison to the rate distortion dimension
  12 Regular sets and measures of dimension D
    12.1 Definition and examples
    12.2 Asymptotics for the quantization error
  13 Rectifiable curves
  14 Self-similar sets and measures
    14.1 Basic notion and facts
    14.2 An upper bound for the quantization dimension
    14.3 A lower bound for the quantization dimension
    14.4 The quantization dimension
    14.5 The quantization coefficient

Appendix Univariate distributions

Bibliography

Symbols

Index
List of Figures

1.1 Voronoi diagram of a finite set in $\mathbb{R}^2$ with respect to the $l_p$-norm for (a) $p = 2$ and (b) $p = 1$
1.2 Voronoi region and separator with respect to the $l_1$-norm
1.3 Voronoi region with respect to the $l_2$-norm which is not a polyhedral set
1.4 Unbounded Voronoi region with respect to the $l_1$-norm generated by an interior point of conv $\alpha$
2.1 $C_1(P)$ and $C_r(P)$, $r > 1$, with respect to the $l_\infty$-norm for a discrete probability $P$ with two supporting points
3.1 Quantization scheme
4.1 2-optimal centers of order 1 with respect to the $l_\infty$-norm
4.2 Square quantizer for $U([0,1]^2)$
4.3 3- and 4-stationary sets of centers for $P = N_2(0, I_2)$ of order $r = 2$ and Voronoi diagrams with respect to the $l_2$-norm
5.1 2-optimal centers of order 2
8.1 Tessellation of $[0,1]^2$ into $m = 6$ regular hexagons and a boundary region, $n = 10$
8.2 Voronoi region $W(0|\Lambda)$ with respect to the $l_1$-norm for a nonadmissible lattice
8.3 Voronoi region $W(0|\Lambda)$ with respect to the $l_2$-norm for the hexagonal lattice
8.4 Truncated octahedron
8.5 Densities of hyper-exponential distributions $P = H(a, b)$ with variance equal to one and $Q_2(P) = 1.8470$ (top), $Q_2(P) = 3.3106$ (center), $Q_2(P) = 8.1000$ (bottom)
List of Tables

5.1 n-optimal centers and n-th quantization error for the normal distribution $N(0, 1)$ of order $r = 2$
5.2 Logistic distribution L(~), r = 2
5.3 Double exponential distribution DE(~2), r = 2
5.4 Exponential distribution $E(1)$, $r = 2$
5.5 Gamma distribution Γ(~, 2), r = 2
5.6 Rayleigh distribution W(4:~_~, 2), r = 2
5.7 Normal distribution N(0, ~), r = 1
5.8 Logistic distribution L(~)1, r = 1
5.9 Double exponential distribution DE(I), r = 1
5.10 Exponential distribution E(~)1, r = 1
5.11 Gamma distribution $\Gamma(a, 2)$, $a = 0.9508\ldots$, $r = 1$
5.12 Rayleigh distribution $W(a, 2)$, $a = 2.7027\ldots$, $r = 1$
7.1 Probability distributions $P_r$
7.2 Probability distributions (~ P) r
7.3 $r = 2$, $V_2(P) = 1$. Quantization error for ~-quantiles (first line) and i/n~-i-quantiles (second line) of $P_2$, $1 \le i \le n$
7.4 $r = 1$, $V_1(P) = 1$. Quantization error for ~-quantiles (first line) and n~v quantiles (second line) of $P_1$, $1 \le i \le n$
8.1 Quantization coefficients
8.2 $r = 2$. Quantization coefficients of distributions $P$ with $V_2(P) = 1$
8.3 $r = 1$. Quantization coefficients of distributions $P$ with $V_1(P) = 1$
9.1 $l_2$-norm, $r = 2$. Ball lower bound and random quantizer upper bound for $Q_2([0,1]^d)$
9.2 $l_1$-norm, $r = 1$
9.3 Differential entropies. $\psi = \Gamma'/\Gamma$, $\gamma$ = Euler's constant = 0.5772...
9.4 Quantization coefficients for product probability measures up to Qr ([0,1] 4)
Introduction

The term "quantization" in the title originates in the theory of signal processing. It
was used by electrical engineers starting in the late 40's. In this context quantization
means a process of discretising signals and should not be mistaken for the same term in
quantum physics. As a mathematical topic quantization for probability distributions
concerns the best approximation of a d-dimensional probability distribution P by a
discrete probability with a given number n of supporting points or in other words,
the best approximation of a d-dimensional random vector X with distribution P by
a random vector Y with at most n values in its image. It turns out that for the error
measures used in this book there is always a best approximation of the form f ( X ) , a
"quantized version of X". The quantization problem can be rephrased as a partition
problem of the underlying space which explains the term quantization.
Much of the early attention in the engineering and statistical literature was concentrated
on the one-dimensional quantization problem. See Bennett (1948), Panter and
Dite (1951), Lloyd's 1957 paper (published 1982), Dalenius (1950), and Cox (1957).
Steinhaus (1956) was apparently the first who explicitly dealt with the problem and
formulated it for general (3-dimensional) spaces. Since then quantization has occurred in
various scientific fields, for instance

• Information theory (signal compression): Shannon (1959), Gersho and Gray (1992)

• Cluster analysis (quantization of empirical measures), pattern recognition,
  speech recognition: Anderberg (1973), Bock (1974), Diday and Simon (1976),
  Tou and Gonzales (1974)

• Numerical integration: Pagès (1997)

• Stochastic processes (sampling design): Bucklew and Cambanis (1988),
  Benhenni and Cambanis (1996)

• Mathematical models in economics (optimal location of service centers):
  Bollobás (1972, 1973)

The aim of the present book is to describe the mathematical theory underlying the
different applications of quantization. The emphasis is on absolutely continuous as
well as on singular (continuous) distributions on $\mathbb{R}^d$. In the nonsingular case we
present a rigorous treatment of known results in quantization including various new
aspects while the results for singular distributions seem to be completely new.
In more detail we consider the following problem.
We concentrate on norm-based error measures. Let $\|\cdot\|$ be a norm on $\mathbb{R}^d$ and
$1 \le r < \infty$. Define the Wasserstein-Kantorovitch $L_r$-metric $\rho_r$ for probabilities $P_1, P_2$ by

\[
\rho_r(P_1, P_2) = \inf \Big\{ \Big( \int \|x - y\|^r \, d\mu(x, y) \Big)^{1/r} : \mu \text{ probability on } \mathbb{R}^d \times \mathbb{R}^d \text{ with marginals } P_1 \text{ and } P_2 \Big\}
\]

and the (minimal) quantization error of a given probability $P$ and $n \in \mathbb{N}$ by

\[
\begin{aligned}
V_{n,r}(P) &= \inf \{ \rho_r(P, Q)^r : |\operatorname{supp}(Q)| \le n \} \\
           &= \inf \{ E \|X - f(X)\|^r : f : \mathbb{R}^d \to \mathbb{R}^d \text{ measurable}, \ |f(\mathbb{R}^d)| \le n \}.
\end{aligned}
\]

For an optimal quantizing rule $f$, i.e. $f$ attaining the inf, the domains of constancy
provide $P$-almost surely a Voronoi partition of $\mathbb{R}^d$ with respect to its respective values.
As a consequence the quantization problem is also equivalent to the n-centers problem,
which requires finding a set $\alpha$ of $n$ elements which minimizes the expression

\[
E \min_{a \in \alpha} \|X - a\|^r
\]

whose minimum value equals $V_{n,r}(P)$, that is

\[
V_{n,r}(P) = \inf \{ E \min_{a \in \alpha} \|X - a\|^r : \alpha \subset \mathbb{R}^d, \ |\alpha| \le n \}.
\]
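The n-centers expression can be estimated from a sample by a plain average. The following Python sketch is only an illustration under stated assumptions: the helper names `lp_norm` and `empirical_error`, the sample size, and the two candidate center sets are hypothetical choices and do not come from the text.

```python
import random

def lp_norm(v, p=2.0):
    """l_p norm of a vector v; p = float('inf') gives the maximum norm."""
    if p == float("inf"):
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def empirical_error(sample, centers, r=2.0, p=2.0):
    """Sample average of min_a ||x - a||^r (Monte Carlo estimate of the expectation)."""
    total = 0.0
    for x in sample:
        nearest = min(lp_norm([xi - ai for xi, ai in zip(x, a)], p) for a in centers)
        total += nearest ** r
    return total / len(sample)

rng = random.Random(0)
sample = [(rng.random(), rng.random()) for _ in range(20000)]  # draws from U([0,1]^2)

one_center = [(0.5, 0.5)]
four_centers = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
err1 = empirical_error(sample, one_center)
err4 = empirical_error(sample, four_centers)
```

For the uniform distribution on the unit square with $r = 2$ and the Euclidean norm, the two estimates should be close to 1/6 and 1/24, the exact values of the expression for these particular center sets (which are not claimed here to be optimal).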

In Chapter I we present general properties of the quantization problem for a fixed
number $n$ of quantizing levels. We discuss existence of optimal quantizers, necessary
conditions for optimality, and a sufficient condition for uniqueness in the
one-dimensional case. Under this uniqueness condition it is easy to find optimal
quantizers for $P$ numerically in the one-dimensional case. However, for dimensions $d \ge 2$ it is
difficult even for small $n$ to determine optimal quantizers. This is caused (among other
things) by the fact that the minimization function
$(a_1, \ldots, a_n) \mapsto E \min_{1 \le i \le n} \|X - a_i\|^r$ is
(typically) nonconvex. As examples we consider uniform distributions on ball packings
and spherical distributions. Furthermore, we prove stability properties which
can be applied to the empirical analysis of the quantization problem.
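To make the remark about one-dimensional numerics concrete, here is a small Python sketch of the fixed-point iteration commonly attributed to Lloyd, applied to an empirical sample for order $r = 2$. The passage itself does not spell out an algorithm, so this is a swapped-in standard technique; the function names, the starting centers and the sample size are assumptions made for the illustration.

```python
import random

def lloyd_1d(sample, centers, iterations=40):
    """Alternate nearest-center assignment and cell means (order r = 2)."""
    centers = sorted(centers)
    for _ in range(iterations):
        cells = [[] for _ in centers]
        for x in sample:
            # ties go to the smaller index
            k = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            cells[k].append(x)
        # an empty cell keeps its previous center
        centers = sorted(
            sum(cell) / len(cell) if cell else centers[i]
            for i, cell in enumerate(cells)
        )
    return centers

def mean_sq_error(sample, centers):
    """Empirical version of E min_a |X - a|^2."""
    return sum(min((x - a) ** 2 for a in centers) for x in sample) / len(sample)

rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # draws from N(0, 1)
start = [-2.0, 0.5]
final = lloyd_1d(sample, start)
```

For the standard normal distribution and $n = 2$ the iteration should settle near the symmetric pair $\pm\sqrt{2/\pi} \approx \pm 0.798$. The iteration only produces a stationary configuration; it is the uniqueness condition mentioned above that allows one to conclude optimality.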
Chapters II and III focus on the asymptotic behaviour of the quantization problem
when the number $n$ of quantizing levels tends to infinity. Chapter II starts with
the investigation of the order of convergence to zero for the sequence of quantization
errors $(V_{n,r}(P))_{n \ge 1}$ for probabilities $P$ on $\mathbb{R}^d$ whose absolutely continuous part does
not vanish. It is shown that the limit of the sequence $(n^{r/d} V_{n,r}(P))_{n \ge 1}$ exists in $(0, \infty)$,
provided a certain moment condition is satisfied. The limit

\[
Q_r(P) = \lim_{n \to \infty} n^{r/d} V_{n,r}(P)
\]

is called r-th quantization coefficient and can be expressed in terms of the r-th
quantization coefficient

\[
Q_r([0,1]^d) = \lim_{n \to \infty} n^{r/d} V_{n,r}(U([0,1]^d)) = \inf_{n \ge 1} n^{r/d} V_{n,r}(U([0,1]^d))
\]

of the uniform distribution on the unit cube of $\mathbb{R}^d$ and the density of the absolutely
continuous part of $P$ with respect to Lebesgue measure. Quantization coefficients
provide interesting parameters for probability distributions. They can be evaluated for
univariate distributions and some of them also for multivariate distributions.
Fundamental work is due to Fejes Tóth (1959) and Zador (1963). Next we define
asymptotically optimal sequences of quantizers and sets of centers for nonsingular probability
distributions $P$ and investigate their properties. It is proved that the empirical
measures corresponding to asymptotically optimal sets of centers converge weakly to a
probability on $\mathbb{R}^d$ which is explicitly given using $P$. Furthermore, the asymptotic
performance of certain classes of quantizers is compared to that of (asymptotically)
optimal quantizers. In particular, we consider regular quantizers which are based on
space-filling figures in $\mathbb{R}^d$, lattice quantizers, product quantizers, and random
quantizers. The results provide bounds for the quantization coefficients $Q_r([0,1]^d)$.
All these considerations concern the case $1 \le r < +\infty$ and arbitrary norms on $\mathbb{R}^d$.
The rest of the chapter is devoted to the study of similar results for a geometric
covering problem which corresponds to the case $r = \infty$. Here the quantization error
of a probability $P$ (with compact support) and $n \in \mathbb{N}$ is defined to be

\[
\begin{aligned}
e_{n,\infty}(P) &= \inf \{ \rho_\infty(P, Q) : |\operatorname{supp}(Q)| \le n \} \\
                &= \inf \{ \operatorname{ess\,sup} \|X - f(X)\| : |f(\mathbb{R}^d)| \le n \} \\
                &= \inf_{\alpha \subset \mathbb{R}^d, \, |\alpha| \le n} \ \sup_{x \in \operatorname{supp}(P)} \ \min_{a \in \alpha} \|x - a\|,
\end{aligned}
\]

where $\rho_\infty$ denotes the $L_\infty$-minimal metric, and coincides with the covering radius of
the most economical covering of $\operatorname{supp}(P)$ by at most $n$ balls of equal radius, that is

\[
e_{n,\infty}(P) = \inf_{\alpha \subset \mathbb{R}^d, \, |\alpha| \le n} \ \min \Big\{ s \ge 0 : \bigcup_{a \in \alpha} B(a, s) \supset \operatorname{supp}(P) \Big\}.
\]

The limit of the sequence $(n^{1/d} e_{n,\infty}(P))_{n \ge 1}$ exists in $(0, \infty)$ provided $\operatorname{supp}(P)$ is
compact Jordan measurable with positive (d-dimensional) volume. The limit

\[
Q_\infty(\operatorname{supp}(P)) = \lim_{n \to \infty} n^{1/d} e_{n,\infty}(P)
\]

is called covering coefficient or quantization coefficient of order $\infty$ and can be
expressed in terms of the covering coefficient $Q_\infty([0,1]^d)$ of the unit cube and the
volume of $\operatorname{supp}(P)$. The results for $r = \infty$ cast new light on the quantization problem
for $r < \infty$.
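The sup-min form of the covering problem is easy to evaluate for a fixed set of centers once the support is replaced by a finite grid. The sketch below does this in Python; the function name `covering_radius`, the grid resolution and the choice of midpoints as centers are assumptions for illustration, and the code evaluates one particular center set rather than the infimum over all of them.

```python
def covering_radius(points, centers):
    """Largest Euclidean distance from a point to its nearest center."""
    def dist(x, a):
        return sum((xi - ai) ** 2 for xi, ai in zip(x, a)) ** 0.5
    return max(min(dist(x, a) for a in centers) for x in points)

# Finite grid standing in for supp(P) = [0, 1] (so d = 1).
grid = [(k / 1000,) for k in range(1001)]

n = 4
midpoints = [((2 * i + 1) / (2 * n),) for i in range(n)]  # centers of n equal cells
radius = covering_radius(grid, midpoints)
```

With these centers the radius equals $1/(2n)$, so $n$ times the radius is $1/2$, which matches the value $Q_\infty([0,1]) = 1/2$ quoted later in this introduction.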
Chapter III deals with the asymptotic behaviour of the quantization error for
probabilities $P$ on $\mathbb{R}^d$ which are singular with respect to Lebesgue measure. Following
Zador (1982) we introduce the concept of quantization dimension of order $r$. For
$r \in [1, +\infty]$ define

\[
e_{n,r}(P) =
\begin{cases}
V_{n,r}(P)^{1/r} & \text{if } 1 \le r < \infty, \\
e_{n,\infty}(P) & \text{if } r = \infty,
\end{cases}
\]

and the quantization dimension of order $r$

\[
D_r(P) = \lim_{n \to \infty} \frac{\log n}{|\log e_{n,r}(P)|}
\]

if this limit exists. We compare this concept of dimension to several concepts of
dimension which are used in fractal geometry or information theory, like Hausdorff
dimension, box dimension, and rate distortion dimension.
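As a sanity check on the definition, one can evaluate the ratio $\log n / |\log e_{n,r}(P)|$ for the uniform distribution on $[0,1]$. The Python sketch below assumes, without proof here, that the $n$ cell midpoints are optimal centers, which gives $e_{n,r} = ((1+r)2^r)^{-1/r} / n$; the function names are hypothetical.

```python
import math

def e_uniform(n, r):
    """e_{n,r} for U([0,1]) when the centers are the midpoints of n equal cells."""
    return (1.0 / ((1.0 + r) * 2.0 ** r)) ** (1.0 / r) / n

def dimension_ratio(n, r):
    """The quotient log n / |log e_{n,r}| whose limit defines the quantization dimension."""
    return math.log(n) / abs(math.log(e_uniform(n, r)))

ratios = [dimension_ratio(10 ** k, 2.0) for k in (1, 3, 6, 12)]
```

The ratios increase towards 1, the expected dimension of an interval, but only slowly: the constant factor in $e_{n,r}$ contributes an additive term to the logarithm that fades at rate $1/\log n$.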
Then we consider the class of regular probabilities of dimension $D$ on $\mathbb{R}^d$, where $D$
is a non-negative real number. A probability $P$ on $\mathbb{R}^d$ is regular of dimension $D$ if $P$
has compact support and there is a constant $c > 0$ so that

\[
\frac{1}{c} s^D \le P(B(x, s)) \le c s^D
\]

for all balls $B(x, s)$ whose center lies in the support of $P$ and whose radius $s$ is
smaller than a certain value $s_0$. Examples of this type of measures are the normalized
Lebesgue measure on a convex compact set, the normalized surface measure on a
convex compact set or a smooth compact manifold, and the normalized Hausdorff
measure on certain self-similar sets. For each regular probability $P$ of dimension $D$
the quantization dimension of order $r$, $r \in [1, +\infty]$, is proved to be $D$, and moreover

\[
0 < \liminf_{n \to \infty} n \, e_{n,r}(P)^D \le \limsup_{n \to \infty} n \, e_{n,r}(P)^D < +\infty.
\]

Here $e_{n,r}(P)$ is defined using an arbitrary norm on $\mathbb{R}^d$.

For the $l_2$-norm on $\mathbb{R}^d$ and $P$ the normalized one-dimensional Hausdorff measure on
a rectifiable curve in $\mathbb{R}^d$ of length $L$ we show that the quantization dimension of order
$r$, $r \in [1, +\infty]$, is one and that

\[
\lim_{n \to \infty} n \, e_{n,r}(P) =
\begin{cases}
Q_r([0,1])^{1/r} L & \text{if } 1 \le r < +\infty, \\
Q_\infty([0,1]) L & \text{if } r = +\infty,
\end{cases}
\]

where $Q_r([0,1]) = \frac{1}{(1+r) 2^r}$ and $Q_\infty([0,1]) = \frac{1}{2}$.
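The closed form for the one-dimensional coefficient, read here as $Q_r([0,1]) = 1/((1+r)2^r)$, can be checked numerically. The following Python sketch integrates $\min_a |x - a|^r$ over $[0,1]$ with a midpoint rule; it assumes that the $n$ cell midpoints are the relevant centers, and the function name and step count are illustrative choices.

```python
def midpoint_quantizer_error(n, r, steps=20000):
    """Midpoint-rule integral of min_a |x - a|^r over [0, 1] for n cell midpoints."""
    centers = [(2 * i + 1) / (2 * n) for i in range(n)]
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += min(abs(x - a) for a in centers) ** r
    return total * h

n = 5
scaled_r2 = n ** 2 * midpoint_quantizer_error(n, 2.0)  # compare with 1/12
scaled_r1 = n ** 1 * midpoint_quantizer_error(n, 1.0)  # compare with 1/4
```

For the uniform distribution and equal cells the scaled error $n^r V$ does not depend on $n$, which is why a single value of $n$ already reproduces the limit.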
Finally we deal with self-similar probabilities on $\mathbb{R}^d$. A probability $P$ on $\mathbb{R}^d$ is
self-similar if there are an $N \in \mathbb{N}$, $N \ge 2$, contracting similarity transformations
$S_1, \ldots, S_N$ of $\mathbb{R}^d$, and a probability vector $(p_1, \ldots, p_N)$ with

\[
P = \sum_{i=1}^{N} p_i \, P \circ S_i^{-1}.
\]

$P$ satisfies the strong separation property if the $S_1, \ldots, S_N$ above can be chosen to
satisfy $S_i(\operatorname{supp}(P)) \cap S_j(\operatorname{supp}(P)) = \emptyset$ for $i \ne j$. If $(s_1, \ldots, s_N)$ are the contraction
numbers corresponding to $(S_1, \ldots, S_N)$ then the similarity dimension is the unique
$D \in [0, +\infty)$ with $s_1^D + \ldots + s_N^D = 1$.
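Since $D \mapsto s_1^D + \ldots + s_N^D$ is strictly decreasing for contraction numbers in $(0,1)$, the similarity dimension can be found by bisection. The Python sketch below is one way to do this; the function name and the tolerance are assumptions for the illustration.

```python
import math

def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(s_i ** D) = 1 for D >= 0 by bisection (each s_i in (0, 1))."""
    lo, hi = 0.0, 1.0
    while sum(s ** hi for s in ratios) > 1.0:  # enlarge the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(s ** mid for s in ratios) > 1.0:
            lo = mid   # sum still too large: D must be bigger
        else:
            hi = mid
    return 0.5 * (lo + hi)

cantor = similarity_dimension([1 / 3, 1 / 3])    # two maps with ratio 1/3
interval = similarity_dimension([0.5, 0.5])      # two maps with ratio 1/2
```

For two maps with ratio 1/3 (the classical Cantor set) the result agrees with $\log 2 / \log 3$, and for two maps with ratio 1/2 it is 1.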
If the probability vector $(p_1, \ldots, p_N)$ equals $(s_1^D, \ldots, s_N^D)$ then the corresponding
self-similar probability $P$ equals the normalized D-dimensional Hausdorff measure on the
support of $P$. If, in addition, $P$ satisfies the strong separation condition, then the
quantization dimension $D_r(P)$ of order $r$ equals $D$ for all $r \in [1, +\infty]$ and, moreover,

\[
0 < \liminf_{n \to \infty} n \, e_{n,r}(P)^D \le \limsup_{n \to \infty} n \, e_{n,r}(P)^D < +\infty.
\]

If $(p_1, \ldots, p_N) \ne (s_1^D, \ldots, s_N^D)$ and the strong separation condition holds then the
quantization dimension $D_r(P)$ of the corresponding $P$ satisfies

\[
\sum_{i=1}^{N} (p_i s_i^r)^{\frac{D_r(P)}{r + D_r(P)}} = 1
\]

and $D_r(P) < D_t(P)$ if $r < t$. Still, for every $r \in [1, +\infty]$,

\[
0 < \liminf_{n \to \infty} n \, e_{n,r}(P)^{D_r(P)} \le \limsup_{n \to \infty} n \, e_{n,r}(P)^{D_r(P)} < +\infty.
\]

Thus, self-similar probabilities constitute a class of probabilities for which the
quantization dimensions of different orders do not all agree. It remains an open problem for
which probabilities $\lim_{n \to \infty} n \, e_{n,r}(P)^{D_r(P)}$ exists, but it can be shown that for the classical
Cantor distribution this limit does not exist if $r = 2$.
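The dependence of the quantization dimension on the order $r$ can be explored numerically. The Python sketch below assumes that $D_r(P)$ is determined by the equation $\sum_{i=1}^N (p_i s_i^r)^{D_r/(r+D_r)} = 1$ for finite $r$ and solves it by bisection; the function name, tolerance and the example weights are hypothetical.

```python
import math

def quantization_dimension(probs, ratios, r, tol=1e-12):
    """Solve sum((p_i * s_i**r) ** (D / (r + D))) = 1 for D > 0 by bisection."""
    def f(D):
        return sum((p * s ** r) ** (D / (r + D)) for p, s in zip(probs, ratios))
    lo, hi = 0.0, 1.0
    while f(hi) > 1.0:          # f is decreasing in D; enlarge the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

thirds = [1 / 3, 1 / 3]
d_uniform = quantization_dimension([0.5, 0.5], thirds, 2.0)  # weights p_i = s_i^D
d1 = quantization_dimension([0.9, 0.1], thirds, 1.0)         # unequal weights, r = 1
d2 = quantization_dimension([0.9, 0.1], thirds, 2.0)         # unequal weights, r = 2
```

With the weights $p_i = s_i^D$ the solution reduces to the similarity dimension $\log 2 / \log 3$ for every $r$, while for unequal weights the computed values increase with $r$, in line with the monotonicity stated above.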
In the present book we do not intend to give a complete overview over the large
subject of quantization. We will focus on the quantization problem as stated earlier,
the so-called fixed rate quantization problem, and develop the underlying theory in
a mathematically rigorous way. For a comprehensive recent survey of the theory of
quantization including its historical development we refer the reader to the article
of Gray and Neuhoff (1998). This article also contains an extensive list of papers
published in electrical engineering journals on the subject.

Acknowledgement. Helpful comments by W. Quebbemann and F. Fehringer at
an early stage of the project are gratefully acknowledged. Thanks are due to H.
Strasser and M. Scheutzow for having invited the authors to give some lectures on
quantization at the Wirtschaftsuniversität Wien in May 1997 and the Technische
Universität Berlin in August 1997. We further wish to thank S. Stark for his help in
preparing the figures and doing the numerical computations.
Chapter I

General properties of the quantization for probability distributions

In this chapter we introduce the quantization problem for probability distributions
on $\mathbb{R}^d$ with norm-based distortion measure and derive the basic features of optimal
quantizers. The investigation of optimal quantizers requires the concepts of Voronoi
diagrams and Voronoi partitions and the concepts of centers and moments of probability
distributions on $\mathbb{R}^d$. This chapter also serves to develop the properties of these
notions as needed for the quantization problem.

1 Voronoi partitions

Voronoi partitions of $\mathbb{R}^d$ will play a central role as optimal quantizing partitions for
probability distributions on $\mathbb{R}^d$. In this section we introduce Voronoi regions, Voronoi
diagrams and Voronoi partitions with respect to discrete point sets and describe some
of their basic properties.

1.1 General norms

Consider a nonempty subset $\alpha$ of $\mathbb{R}^d$. Throughout $\alpha$ is assumed to be locally finite in
the sense that the number of points of $\alpha$ within any bounded subset of $\mathbb{R}^d$ is finite.
This implies that $\alpha$ is countable and closed. The quantization problem is associated
with finite point sets. However, in Chapter II we deal with lattices which are infinite
sets of regularly placed points.
Let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$. The Voronoi region generated by $a \in \alpha$ is defined
by

\[
W(a|\alpha) = \{ x \in \mathbb{R}^d : \|x - a\| = \min_{b \in \alpha} \|x - b\| \} \tag{1.1}
\]

and $\{W(a|\alpha) : a \in \alpha\}$ is called the Voronoi diagram of $\alpha$; see Figure 1.1. Thus
$W(a|\alpha)$ consists of all points $x$ such that $a$ is a nearest point to $x$ in $\alpha$. The dependence
of the Voronoi regions (and of several other objects occurring later) on the norm is not
explicitly indicated. Most common norms are $l_p$-norms given by $\|x\| = (\sum_{i=1}^{d} |x_i|^p)^{1/p}$
for $1 \le p < \infty$ and $\|x\| = \max_{1 \le i \le d} |x_i|$ for $p = \infty$.
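Since the norm is suppressed in the notation, it may help to see how the same point can fall into different Voronoi regions under different $l_p$-norms. The Python sketch below tests membership in $W(a|\alpha)$ directly from the definition; the function names, the tolerance and the example points are assumptions for the illustration.

```python
def lp_norm(v, p):
    """l_p norm of v; p = float('inf') gives the maximum norm."""
    if p == float("inf"):
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def in_voronoi_region(x, a, alpha, p=2.0, tol=1e-12):
    """True if a is a nearest point to x in alpha, i.e. x lies in W(a|alpha)."""
    def dist(u, v):
        return lp_norm([ui - vi for ui, vi in zip(u, v)], p)
    return dist(x, a) <= min(dist(x, b) for b in alpha) + tol

alpha = [(0.0, 0.0), (2.0, 1.0)]
x = (1.2, 0.0)
in_first_l2 = in_voronoi_region(x, alpha[0], alpha, p=2.0)
in_first_l1 = in_voronoi_region(x, alpha[0], alpha, p=1.0)
in_first_linf = in_voronoi_region(x, alpha[0], alpha, p=float("inf"))
```

For this choice the point $x$ belongs to the region of $(0,0)$ under the $l_1$- and $l_2$-norms but to the region of $(2,1)$ under the $l_\infty$-norm.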

Figure 1.1: Voronoi diagram of a finite set in $\mathbb{R}^2$ with respect to the $l_p$-norm for (a)
$p = 2$ and (b) $p = 1$

A family $\mathcal{A}$ of subsets of $\mathbb{R}^d$ is called locally finite if the number of sets in $\mathcal{A}$
intersecting any bounded subset of $\mathbb{R}^d$ is finite. If $x \in \mathbb{R}^d$ and $A$ is a nonempty subset
of $\mathbb{R}^d$, the distance from $x$ to $A$ is

\[
d(x, A) = \inf_{a \in A} \|x - a\|.
\]

The closed ball with center $a \in \mathbb{R}^d$ and radius $r \ge 0$ is denoted by

\[
B(a, r) = \{ x \in \mathbb{R}^d : \|x - a\| \le r \}.
\]


1.1 Proposition
The Voronoi diagram $\{W(a|\alpha) : a \in \alpha\}$ is a locally finite covering of $\mathbb{R}^d$.

Proof
Let $x \in \mathbb{R}^d$. Since locally finite subsets of $\mathbb{R}^d$ are closed, there exists $a \in \alpha$ such that
$\|x - a\| = d(x, \alpha)$ and thus $x \in W(a|\alpha)$. This proves that the Voronoi diagram is a
covering of $\mathbb{R}^d$, that is

\[
\bigcup_{a \in \alpha} W(a|\alpha) = \mathbb{R}^d.
\]

Moreover, let $\gamma = \{a \in \alpha : W(a|\alpha) \cap B(0, s) \ne \emptyset\}$ with $s > \min_{a \in \alpha} \|a\|$. Choose
$b \in \alpha \cap B(0, s)$. If $a \in \gamma$, then there exists $x \in B(0, s)$ such that $\|x - a\| \le \|x - b\|$
implying that

\[
\|a\| \le \|x - b\| + \|x\| \le 2\|x\| + \|b\| \le 3s.
\]

This gives $\gamma \subset B(0, 3s)$ and hence, $\gamma$ is finite. Thus the Voronoi diagram is locally
finite. □
The open Voronoi region generated by $a \in \alpha$ is defined by

\[
W_0(a|\alpha) = \{ x \in \mathbb{R}^d : \|x - a\| < \min_{b \in \alpha \setminus \{a\}} \|x - b\| \}. \tag{1.2}
\]

These regions are pairwise disjoint but do not provide a covering of $\mathbb{R}^d$. A Borel
measurable partition $\{A_a : a \in \alpha\}$ of $\mathbb{R}^d$ is called Voronoi partition of $\mathbb{R}^d$ with
respect to $\alpha$ (and $P$) if

\[
A_a \subset W(a|\alpha) \quad (P\text{-a.s.}) \text{ for every } a \in \alpha, \tag{1.3}
\]

where $P$ denotes a Borel probability measure on $\mathbb{R}^d$. Proposition 1.1 shows that
Voronoi partitions of $\mathbb{R}^d$ with respect to $\alpha$ do exist and are locally finite. Furthermore,
the elements $A_a$ of a Voronoi partition satisfy

\[
W_0(a|\alpha) \subset A_a \text{ for every } a \in \alpha.
\]

The Voronoi regions are closed and star-shaped relative to their generator point, that
is, the line segment joining any $x \in W(a|\alpha)$ and the point $a$ is contained in $W(a|\alpha)$.
In case $d = 1$, where the underlying norm is throughout the absolute value, Voronoi
regions are closed intervals. For $a, b \in \mathbb{R}^d$, let

\[
H(a, b) = \{ x \in \mathbb{R}^d : \|x - a\| \le \|x - b\| \} \tag{1.4}
\]

be the Leibnitz "halfspace". Then $H(a, b) = W(a|\{a, b\})$ and

\[
W(a|\alpha) = \bigcap_{b \in \alpha} H(a, b). \tag{1.5}
\]

1.2 Proposition
(a) $W(a|\alpha)$ is closed and star-shaped relative to $a$.

(b) $\operatorname{int} W(a|\alpha) = \bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ and $W_0(a|\alpha)$ is an open subset of $\operatorname{int} W(a|\alpha)$ which
is star-shaped relative to $a$. In particular, $a \in \operatorname{int} W(a|\alpha)$.

(c) $\partial W(a|\alpha) = \bigcup_{b \in \alpha} \partial H(a, b) \cap W(a|\alpha)$.

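Proof
(a) Let $x \in W(a|\alpha)$ and $0 < s < 1$. The point $y = sx + (1 - s)a$ on the line segment
joining $a$ and $x$ satisfies

\[
\|x - y\| + \|y - a\| = \|x - a\|.
\]

Since

\[
\|x - a\| \le \|x - b\| \le \|x - y\| + \|y - b\| \text{ for every } b \in \alpha,
\]

subtracting $\|x - y\|$ gives $\|y - a\| \le \|y - b\|$ for every $b \in \alpha$, so we obtain
$y \in W(a|\alpha)$. This shows that $W(a|\alpha)$ is star-shaped relative to $a$. The
Voronoi region is obviously closed.
(b) The region $W_0(a|\alpha) = \{x \in \mathbb{R}^d : \|x - a\| < d(x, \alpha \setminus \{a\})\}$ is open because the
distance function $d(\cdot, A)$ is continuous. As in the proof of (a) one shows that $W_0(a|\alpha)$
is star-shaped relative to the point $a$. Furthermore, we have

\[
\operatorname{int} W(a|\alpha) \subset \bigcap_{b \in \alpha} \operatorname{int} H(a, b) \subset W(a|\alpha).
\]

It remains to show that $\bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ is open. This is clearly true if $\alpha$ is finite. So
assume $\alpha$ is not finite. Let $x \in \bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ and set $\gamma = \{b \in \alpha : \|x - a\| = \|x - b\|\}$.
Since $\gamma \subset \alpha \cap B(x, \|x - a\|)$, $\gamma$ is finite by the local finiteness of $\alpha$. Therefore,

\[
\bigcap_{b \in \gamma} \operatorname{int} H(a, b) \cap W_0(a|(\alpha \setminus \gamma) \cup \{a\})
\]

is an open subset of $\bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ containing $x$.

(c) follows immediately from (b). In fact, we have

\[
\begin{aligned}
\partial W(a|\alpha) &= W(a|\alpha) \cap (\operatorname{int} W(a|\alpha))^c \\
                     &= \bigcup_{b \in \alpha} (\operatorname{int} H(a, b))^c \cap W(a|\alpha) \\
                     &= \bigcup_{b \in \alpha} \partial H(a, b) \cap W(a|\alpha).
\end{aligned}
\]
□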

By the preceding proposition, one can find Voronoi partitions of $\mathbb{R}^d$ with respect to
$\alpha$ consisting of Borel sets $A_a$ which are star-shaped relative to $a \in \alpha$. In fact, let
$\alpha = \{a_1, a_2, \ldots\}$ be an enumeration of $\alpha$ and set, for instance,

\[
\begin{aligned}
A_{a_1} &= W(a_1|\alpha), \\
A_{a_k} &= W(a_k|\alpha) \setminus \bigcup_{j < k} W(a_j|\alpha) \\
        &= W(a_k|\alpha) \cap W_0(a_k|\{a_1, \ldots, a_k\}), \quad k \ge 2.
\end{aligned}
\]

Here ties are broken in favour of smaller indices. Some difficulties arise from the
fact that the intersection of different Voronoi regions may have interior points. This
corresponds to the fact that the separator of two points may have interior points; see
the subsequent Example 1.4. However, if the underlying norm is strictly convex this
cannot happen.
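The tie-breaking rule in this construction translates directly into code: each point is assigned to the nearest center with the smallest index. The Python sketch below does this for $l_p$-norms; the function name, the tolerance and the sample points are assumptions for the illustration, and indices are counted from 0.

```python
def voronoi_cell_index(x, centers, p=1.0, tol=1e-12):
    """Index k of the cell containing x: the smallest index among nearest centers."""
    def dist(u, v):
        diffs = [abs(ui - vi) for ui, vi in zip(u, v)]
        if p == float("inf"):
            return max(diffs)
        return sum(d ** p for d in diffs) ** (1.0 / p)
    dists = [dist(x, a) for a in centers]
    best = min(dists)
    # first index attaining the minimum, so ties go to the smaller index
    return next(k for k, d in enumerate(dists) if d <= best + tol)

centers = [(1.0, 0.0), (0.0, 1.0)]
tie_low = voronoi_cell_index((-1.0, -1.0), centers)   # equidistant in the l_1-norm
tie_high = voronoi_cell_index((2.0, 2.0), centers)    # equidistant in the l_1-norm
clear = voronoi_cell_index((0.5, 2.0), centers)       # strictly closer to the second center
```

With the $l_1$-norm both tied points are assigned to the first center, which illustrates how whole regions of ties (not just boundaries of measure zero) can be absorbed by one cell.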
For $a, b \in \mathbb{R}^d$, $a \ne b$, the separator is defined by

\[
S(a, b) = \{ x \in \mathbb{R}^d : \|x - a\| = \|x - b\| \}. \tag{1.6}
\]

The separator contains the midpoint $(a + b)/2$ but no other point from the line
through $a$ and $b$. The norm $\|\cdot\|$ is said to be strictly convex if $\|x\| = \|y\| = 1$, $x \ne y$
implies $\|sx + (1 - s)y\| < 1$ for every $s \in (0, 1)$. The $l_p$-norms are strictly convex for
$1 < p < \infty$, while the $l_1$-norm and the $l_\infty$-norm are not strictly convex.
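A concrete pair of unit vectors shows the failure of strict convexity for $p = 1$ and $p = \infty$. The Python sketch below evaluates the norm of the midpoint ($s = 1/2$ in the definition); the function names and the chosen vectors are assumptions for the illustration.

```python
def lp_norm(v, p):
    """l_p norm of v; p = float('inf') gives the maximum norm."""
    if p == float("inf"):
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def midpoint_norm(x, y, p):
    """Norm of the midpoint (x + y) / 2."""
    return lp_norm([(xi + yi) / 2 for xi, yi in zip(x, y)], p)

x, y = (1.0, 0.0), (0.0, 1.0)                 # unit vectors for both l_1 and l_2
mid_l1 = midpoint_norm(x, y, 1.0)             # stays on the l_1 unit sphere
mid_l2 = midpoint_norm(x, y, 2.0)             # strictly inside the l_2 unit ball
mid_linf = midpoint_norm((1.0, 0.0), (1.0, 1.0), float("inf"))  # l_inf unit vectors
```

The midpoint has norm 1 for $l_1$ and $l_\infty$ (the unit spheres contain line segments), while for $l_2$ it has norm $1/\sqrt{2} < 1$.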

1.3 Proposition (Strictly convex norms)
Suppose the underlying norm is strictly convex.

(a) $\operatorname{int} W(a|\alpha) = W_0(a|\alpha)$.

(b) $\partial W(a|\alpha) = \bigcup_{b \in \alpha, \, b \ne a} S(a, b) \cap W(a|\alpha)$.

(c) $W(a|\alpha) = \operatorname{cl} W_0(a|\alpha)$.
Proof
The proof is based on the following observation. For $b \in \alpha \setminus \{a\}$, we have

\[
sx + (1 - s)a \in \{ y \in \mathbb{R}^d : \|y - a\| < \|y - b\| \} \tag{1.7}
\]

for every $x \in H(a, b)$, $0 < s < 1$.

To verify this, let $y = sx + (1 - s)a$ with $0 < s < 1$. Notice that $y \in H(a, b)$ by
Proposition 1.2 (a) and $\frac{1}{s}(y - a) = x - a$. Assume $\|y - a\| = \|y - b\|$. From the strict
convexity of the norm it follows that

\[
\|y - a - s(b - a)\| = \|(1 - s)(y - a) + s(y - b)\| < \|y - a\|.
\]

This gives

\[
\begin{aligned}
\|x - b\| &= \Big\| \frac{1}{s}(y - a) - (b - a) \Big\| \\
          &= \frac{1}{s} \|y - a - s(b - a)\| \\
          &< \frac{1}{s} \|y - a\| = \|x - a\| \le \|x - b\|,
\end{aligned}
\]

a contradiction.
(a) In view of Proposition 1.2 (b) we have to show for $b \in \alpha \setminus \{a\}$

\[
\operatorname{int} H(a, b) \subset \{ x \in \mathbb{R}^d : \|x - a\| < \|x - b\| \}.
\]

Let $x \in \operatorname{int} H(a, b)$ with $x \ne a$ and choose $\varepsilon > 0$ such that $B(x, \varepsilon) \subset H(a, b)$. Let
$t = 1 + \varepsilon / \|x - a\|$ and $z = a + t(x - a)$. Then

\[
\|z - x\| = \|(t - 1)x - (t - 1)a\| = \varepsilon
\]

implying $z \in H(a, b)$. Since $x = \frac{1}{t} z + (1 - \frac{1}{t})a$ and $\frac{1}{t} < 1$, it follows from (1.7) that
$\|x - a\| < \|x - b\|$.
(b) follows immediately from (a) and Proposition 1.2 (c).
(c) follows from (1.7). □

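The preceding proposition implies

\[
\begin{aligned}
&\operatorname{int} W(a|\alpha) \cap \operatorname{int} W(b|\alpha) = \emptyset, \quad a, b \in \alpha, \ a \ne b, \\
&W(a|\alpha) \cap W(b|\alpha) = \partial W(a|\alpha) \cap \partial W(b|\alpha), \quad a, b \in \alpha, \ a \ne b, \\
&\partial W(a|\alpha) = \bigcup_{b \in \alpha, \, b \ne a} W(b|\alpha) \cap W(a|\alpha), \quad a \in \alpha,
\end{aligned}
\tag{1.8}
\]

provided the underlying norm is strictly convex. The following example shows that
all assertions of Proposition 1.3 (and of (1.8)) can fail if $\|\cdot\|$ is an arbitrary norm.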

Figure 1.2: Voronoi region and separator with respect to the $l_1$-norm

1.4 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_1$-norm. For $a = (1, 0)$ and $b = (0, 1)$, we
obtain

\[
H(a, b) = \{ x \in \mathbb{R}^2 : x_2 \le 0 \} \cup \{ x \in \mathbb{R}^2 : x_1 \ge 1 \} \cup \{ x \in \mathbb{R}^2 : x_2 \le x_1 \}
\]

and

\[
S(a, b) = (-\infty, 0]^2 \cup \{ s(1, 1) : 0 < s < 1 \} \cup [1, \infty)^2.
\]

Thus $H(a, b)$ is the union of three halfspaces and the separator is the disjoint union
of two quarterspaces and a line segment; see Figure 1.2. Clearly, all assertions of
Proposition 1.3 fail for $\alpha = \{a, b\}$.
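The two set descriptions in Example 1.4 can be compared with the defining inequalities on a grid of points. The Python sketch below does so; the function names and the grid are assumptions for the illustration, and the grid coordinates are multiples of 1/4 so that all distances are computed exactly in floating point.

```python
def l1_dist(x, y):
    """l_1 distance between two points of the plane."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

a, b = (1.0, 0.0), (0.0, 1.0)

def in_H(x):
    """Stated description of H(a, b) as a union of three halfspaces."""
    return x[1] <= 0 or x[0] >= 1 or x[1] <= x[0]

def in_S(x):
    """Stated description of S(a, b): two quarterspaces and an open diagonal segment."""
    lower = x[0] <= 0 and x[1] <= 0
    upper = x[0] >= 1 and x[1] >= 1
    diagonal = x[0] == x[1] and 0 < x[0] < 1
    return lower or upper or diagonal

grid = [(i / 4, j / 4) for i in range(-12, 13) for j in range(-12, 13)]
h_ok = all(in_H(x) == (l1_dist(x, a) <= l1_dist(x, b)) for x in grid)
s_ok = all(in_S(x) == (l1_dist(x, a) == l1_dist(x, b)) for x in grid)
```

Both checks agree on every grid point, and the quarterspace parts of the separator visibly contain interior points, which is exactly what cannot happen for a strictly convex norm.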

Under various conditions, Voronoi regions are geometrically regular. Let [A[ denote
the cardinality of a set A and let Ad denote the d-dimensional Lebesgue measure.

1.5 Theorem (Boundary theorem)
Each of the following conditions implies
\[ \lambda^d(\partial W(a|\alpha)) = 0, \quad a \in \alpha. \]

(i) The underlying norm is strictly convex.

(ii) The underlying norm is the $l_p$-norm with $1 \le p \le \infty$.

(iii) $d = 2$.

Proof
According to Proposition 1.2 (c) it is sufficient to show that $\lambda^d(\partial H(a, b)) = 0$ for
$a \ne b$. Since $H(a, b) = H(a - b, 0) + b$, we may assume without loss of generality that
$b = 0$. Set

\[ A = H(a,0)^c = \{x \in \mathbb{R}^d : \|x\| < \|x - a\|\}. \]

Then $\partial A = \partial H(a, 0)$ and by Proposition 1.2 (b), $A$ is open and star-shaped relative
to $0 \in A$. For $x \in \mathbb{R}^d$, let

\[ I(x) = \{t > 0 : tx \in \partial A\}. \]

Since $A$ and $\mathrm{cl}(A)$ are star-shaped, $I(x)$ is a closed interval (possibly empty, possibly
degenerate). Moreover, since

\[ \lambda^d(\partial A) = \int_S \int_{\mathbb{R}_+} r^{d-1} 1_{\partial A}(rx)\, dr\, d\sigma(x) = \int_S \int_{I(x)} r^{d-1}\, dr\, d\sigma(x), \]

where $S = \{x \in \mathbb{R}^d : \|x\|_{l_2} = 1\}$ denotes the unit $l_2$-sphere in $\mathbb{R}^d$ and $\sigma$ its surface
measure, we find that $\lambda^d(\partial A) = 0$ if and only if

\[ |I(x)| \le 1 \tag{1.9} \]

for $\sigma$-almost all $x \in S$.

Now assume (i). By (1.7), $sy \in A$ for every $y \in \partial A$, $0 < s < 1$. Therefore condition
(1.9) is satisfied for every $x \in \mathbb{R}^d$.

Assume (ii). Let $p = 1$ or $p = \infty$. Then $H(a, 0)$ is a finite union of polyhedral sets
implying $\lambda^d(\partial H(a, 0)) = 0$. If $1 < p < \infty$, then the $l_p$-norm is strictly convex.
Assume (iii). We show that (1.9) holds for every $x$. Assume on the contrary that
there exists a point $x$ and $t > 1$ such that $\{x, tx\} \subset \partial A$. Since $\partial A \subset S(a, 0)$, the
convex function $g : \mathbb{R} \to \mathbb{R}$ given by $g(s) = \|x - sa\|$ satisfies

\[ g(0) = g(1/t) = g(1) = \|x\|. \]

Hence we get

\[ g(s) = \|x\| \quad \text{for every } s \in [0, 1]. \]

This yields $\{sx : s \ge 1\} \subset S(a, 0)$ and thus

\[ \{sx : s \ge r\} \subset S(ra, 0) \quad \text{for every } r \ge 0. \tag{1.10} \]

By assumption there exists a sequence $(y_n)_{n \ge 1}$ in $A$ such that $\lim_{n \to \infty} y_n = tx$. Note
that $x$ and $a$ are linearly independent because the separator $S(a, 0)$ contains $a/2$ but
no other point from the line $\{sa : s \in \mathbb{R}\}$. So, using $d = 2$, we have

\[ y_n = s_n a + t_n x, \quad s_n, t_n \in \mathbb{R}. \]

Then $s_n \to 0$ and $t_n \to t > 1$. Choose $n_0 \in \mathbb{N}$ such that

\[ |s_n| < 1, \quad t_n \ge -s_n, \quad 0 < \frac{1}{s_n + t_n} < 1, \quad \frac{t_n}{1 - s_n} \ge 1 \quad \text{for every } n \ge n_0. \]

We claim that $s_n < 0$ for $n \ge n_0$. We have

\[ a + \frac{1}{1 - s_n}(y_n - a) = \Big(1 - \frac{1}{1 - s_n}\Big) a + \frac{s_n}{1 - s_n}\, a + \frac{t_n}{1 - s_n}\, x = \frac{t_n}{1 - s_n}\, x. \]

Therefore, it follows from (1.10) that

\[ a + \frac{1}{1 - s_n}(y_n - a) \in S(a, 0) \]

and hence

\[ \frac{1}{1 - s_n}(y_n - a) \in S(-a, 0). \]

Thus the convex function $h_n : \mathbb{R} \to \mathbb{R}$ given by $h_n(s) = \|y_n - a + sa\|$ satisfies

\[ h_n(0) = h_n(1 - s_n) = \|y_n - a\|. \]

This implies

\[ h_n(s) \le \|y_n - a\|, \quad 0 \le s \le 1 - s_n, \]

\[ h_n(s) \ge \|y_n - a\|, \quad s \ge 1 - s_n. \]

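Since $h_n(1) = \|y_n\| < \|y_n - a\|$, one gets $1 < 1 - s_n$, $n \ge n_0$, and our claim is proved.
Since $t_n > -s_n > 0$, it follows from (1.10) that

\[ \frac{t_n}{s_n + t_n}\, x \in S\Big(-\frac{s_n}{s_n + t_n}\, a,\, 0\Big), \quad n \ge n_0. \]

Hence

\[ \Big\|\frac{1}{s_n + t_n}\, y_n\Big\| = \Big\|\frac{t_n}{s_n + t_n}\, x + \frac{s_n}{s_n + t_n}\, a\Big\| = \frac{t_n}{s_n + t_n}\, \|x\|, \quad n \ge n_0. \]

Moreover, again by (1.10) we have

\[ \frac{t_n}{s_n + t_n}\, x \in S\Big(\frac{t_n}{s_n + t_n}\, a,\, 0\Big) \]

and therefore

\[ \Big\|\frac{1}{s_n + t_n}\, y_n - a\Big\| = \Big\|\frac{t_n}{s_n + t_n}\, x - \frac{t_n}{s_n + t_n}\, a\Big\| = \frac{t_n}{s_n + t_n}\, \|x\|, \quad n \ge n_0. \]

This yields

\[ \frac{1}{s_n + t_n}\, y_n \in S(a, 0), \quad n \ge n_0, \]

a contradiction in view of $y_n \in A$ and $0 < 1/(s_n + t_n) < 1$. []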


For a Borel subset $C$ of $\mathbb{R}^d$ and a Borel measure $\mu$ on $\mathbb{R}^d$, a $\mu$-tesselation of $C$
is a countable covering $\{C_n : n \in \mathbb{N}\}$ of $C$ by Borel subsets $C_n \subset C$ such that
$\mu(C_n \cap C_m) = 0$ for $n \ne m$. A $\lambda^d$-tesselation is simply called tesselation. In view of
Propositions 1.1 and 1.2 the Voronoi diagram of $\alpha$ is a $\mu$-tesselation of $\mathbb{R}^d$ if and only
if

\[ \mu\Big(\mathbb{R}^d \setminus \bigcup_{a \in \alpha} W_0(a|\alpha)\Big) = 0; \]

it is a tesselation of $\mathbb{R}^d$ if and only if

\[ \operatorname{int} W(a|\alpha) \cap \operatorname{int} W(b|\alpha) = \emptyset \quad \text{for every } a, b \in \alpha,\ a \ne b, \]

\[ \lambda^d(\partial W(a|\alpha)) = 0 \quad \text{for every } a \in \alpha. \]

We know from Example 1.4 that the Voronoi diagram of $\alpha$ in general does not
provide a tesselation of the space $\mathbb{R}^d$. (Notice that the Voronoi diagrams in Figure
1.1 provide tesselations.) According to (1.8) and Theorem 1.5, the Voronoi diagram
of $\alpha$ with respect to a strictly convex norm is a tesselation of $\mathbb{R}^d$.
Two further properties of Voronoi regions concerning neighbouring regions and
equivariance under similarity transformations are of interest. A bijective mapping
$T : \mathbb{R}^d \to \mathbb{R}^d$ is called a similarity transformation if there exists $c \in (0, \infty)$, the
scaling number, such that $\|Tx - Ty\| = c\|x - y\|$ for every $x, y \in \mathbb{R}^d$. Let
$T(\alpha) = \{Ta : a \in \alpha\}$; $T(\alpha)$ is locally finite.

1.6 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation. Then

\[ W(Ta|T(\alpha)) = T\,W(a|\alpha). \]

Proof
Obvious. []
Voronoi regions are determined by their neighbouring regions in the following sense.
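1.7 Lemma
For $a \in \alpha$, let

\[ \beta = \{b \in \alpha : W(b|\alpha) \cap W(a|\alpha) \ne \emptyset\}. \]

Then $W(a|\alpha) = W(a|\beta)$.

Proof
Clearly we have $W(a|\alpha) \subset W(a|\beta)$. To prove the converse inclusion, let $x \in \mathbb{R}^d \setminus W(a|\alpha)$
and consider the line segment $\{y_s : s \in [0, 1]\}$ joining $a$ and $x$, where $y_s =
sx + (1 - s)a$. Since $W(a|\alpha)$ is closed, $a \in \operatorname{int} W(a|\alpha)$, and $W(a|\alpha)$ is star-shaped
relative to $a$, we obtain

\[ W(a|\alpha) \cap \{y_s : s \in [0, 1]\} = \{y_s : s \in [0, s_0]\} \]

for some $0 < s_0 < 1$. By the local finiteness of the Voronoi diagram,

\[ \gamma = \{c \in \alpha : W(c|\alpha) \cap \{y_s : s \in (s_0, 1]\} \ne \emptyset\} \]

is finite. Since $\bigcup_{c \in \gamma} W(c|\alpha)$ is closed, one gets

\[ \{y_s : s \in [s_0, 1]\} \subset \bigcup_{c \in \gamma} W(c|\alpha). \]

Choose $b \in \gamma$ with $y_{s_0} \in W(b|\alpha)$. Then $b \in \beta$ and $y_t \in W(b|\alpha)$ for some $t \in (s_0, 1]$.
Since $y_t \notin W(a|\alpha)$, we have $\|y_t - b\| < \|y_t - a\|$, so the point $y_t$ satisfies $y_t \notin W(a|\beta)$.
Because $W(a|\beta)$ is star-shaped relative to $a$, this implies $x \notin W(a|\beta)$. []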
Notice that bounded Voronoi regions have a finite number of neighbouring regions by
the local finiteness of Voronoi diagrams.

1.2 Euclidean norms

Voronoi regions with respect to euclidean norms exhibit some special features. Let
$\langle\,\cdot\,, \cdot\,\rangle$ be any scalar product on $\mathbb{R}^d$ and $\|x\| = \langle x, x\rangle^{1/2}$. Then for $a \ne b$, $H(a, b)$ is the
closed halfspace

\[ H(a,b) = \{x \in \mathbb{R}^d : \langle a - b,\, x - \tfrac{1}{2}(a + b)\rangle \ge 0\} \tag{1.11} \]

bounded by the separating hyperplane

\[ S(a, b) = \{x \in \mathbb{R}^d : \langle a - b,\, x - \tfrac{1}{2}(a + b)\rangle = 0\}. \tag{1.12} \]

The hyperplane $S(a, b)$ contains the midpoint $(a + b)/2$ and is perpendicular (with
respect to $\langle\,\cdot\,, \cdot\,\rangle$) to the line through $a$ and $b$. Thus the Voronoi regions $W(a|\alpha)$ are
convex. If $\alpha$ is finite, then $W(a|\alpha)$ is a polyhedral set, that is, a finite intersection of
closed halfspaces in $\mathbb{R}^d$. In the sequel a (convex) polytope means a compact polyhedral
set. By Lemma 1.7, bounded Voronoi regions are polytopes. The following example
shows that, in general, unbounded Voronoi regions are not polyhedral sets.

1.8 Example
Let the norm on $\mathbb{R}^2$ be the $l_2$-norm. Consider the set $\alpha = \{a_n : n \ge 0\}$ with $a_0 = (2, 0)$
and $a_n = (0, n)$ for $n \ge 1$. Then $\alpha$ is locally finite and the points

\[ \big(\tfrac{1}{4}(n^2 + n) + 1,\ n + \tfrac{1}{2}\big), \quad n \ge 1, \]

are extreme points of $W(a_0|\alpha)$; see Figure 1.3. Since polyhedral sets in $\mathbb{R}^d$ have a
finite number of extreme points, $W(a_0|\alpha)$ is not polyhedral.
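
The corner points named in Example 1.8 can be verified with exact rational arithmetic. The sketch below is an illustration under stated assumptions, not the authors' argument: it checks that each point is equidistant from $a_0$, $a_n$ and $a_{n+1}$ and that none of the generators $a_1, \dots, a_{200}$ is strictly closer. The cutoff 200 and all function names are hypothetical choices.

```python
# Sketch: exact check that the points of Example 1.8 are vertices of W(a_0 | alpha).
from fractions import Fraction

def sq_dist(p, q):
    """Squared euclidean distance (comparing squares avoids square roots)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def generator(m):
    """a_0 = (2, 0) and a_m = (0, m) for m >= 1."""
    return (Fraction(2), Fraction(0)) if m == 0 else (Fraction(0), Fraction(m))

def vertex(n):
    """Candidate extreme point ((n^2 + n)/4 + 1, n + 1/2)."""
    return (Fraction(n * n + n, 4) + 1, Fraction(2 * n + 1, 2))

def is_voronoi_vertex(n, m_max=200):
    """True if vertex(n) ties with exactly a_n, a_{n+1} and nothing is closer."""
    p = vertex(n)
    d0 = sq_dist(p, generator(0))
    others = range(1, m_max + 1)
    tied = [m for m in others if sq_dist(p, generator(m)) == d0]
    closer = [m for m in others if sq_dist(p, generator(m)) < d0]
    return tied == [n, n + 1] and closer == []

if __name__ == "__main__":
    print(all(is_voronoi_vertex(n) for n in range(1, 11)))
```

The check reflects the identity $\|p_n - a_m\|^2 - \|p_n - a_0\|^2 = (m - n)(m - n - 1)$ for the point $p_n$ above, which vanishes only for $m = n$ and $m = n + 1$.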

Figure 1.3: Voronoi region with respect to the $l_2$-norm which is not a polyhedral set

1.9 Remark
As indicated above, the Voronoi regions are convex in the euclidean case. It is an
interesting fact that the convexity of Voronoi regions even characterizes euclidean
norms. More precisely, if $W(a|\alpha)$ is convex for every finite subset $\alpha$ of $\mathbb{R}^d$ and every
$a \in \alpha$, then the underlying norm is euclidean. This is a classical result of Mann
(1935). See also Gruber (1974).

For euclidean norms, there is a simple characterization of boundedness of Voronoi
regions. Denote by $\operatorname{conv} \alpha$ the convex hull of $\alpha$.

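1.10 Proposition
Let $\|x\| = \langle x, x\rangle^{1/2}$ for some scalar product $\langle\,\cdot\,, \cdot\,\rangle$ on $\mathbb{R}^d$ and let $a \in \alpha$. Then $W(a|\alpha)$
is bounded if and only if $a \in \operatorname{int} \operatorname{conv} \alpha$.

Proof
For $u \in \mathbb{R}^d$, $u \ne 0$, consider the halfline $L_u = \{a + su : s \ge 0\}$ with initial point $a$.
We have $L_u \subset W(a|\alpha)$, that is,

\[ \|a + su - b\|^2 = \|a - b\|^2 + 2s\langle u, a - b\rangle + s^2\|u\|^2 \ge \|a + su - a\|^2 = s^2\|u\|^2 \quad \text{for every } b \in \alpha,\ s \ge 0, \]

if and only if $\langle u, a\rangle \ge \langle u, b\rangle$ for every $b \in \alpha$.

Assume $a \in \partial \operatorname{conv} \alpha$. Then there passes a support hyperplane to $\operatorname{conv} \alpha$ through
$a$, that is, $\langle u, a\rangle \ge \langle u, b\rangle$ for every $b \in \alpha$ and some $u \in \mathbb{R}^d$, $u \ne 0$ (cf. Webster,
1994, Theorem 2.4.12). Hence, $L_u \subset W(a|\alpha)$ so that $W(a|\alpha)$ is not bounded. Now
assume $a \in \operatorname{int} \operatorname{conv} \alpha$. Then for every $u \in \mathbb{R}^d$, $u \ne 0$, there exists $b \in \alpha$ such that
$\langle u, a\rangle < \langle u, b\rangle$. Therefore, $W(a|\alpha)$ does not contain any halfline with initial point $a$.
This implies the boundedness of $W(a|\alpha)$ (cf. Webster, 1994, Theorem 2.5.1). []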
Notice that for arbitrary norms the interior point condition for the generator point
does not imply the boundedness of the corresponding Voronoi region. This is illus-
trated by the following example. See also Figure 1.1(b).

1.11 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_1$-norm. Consider $\alpha = \{(0, 0), (0, -1), (2, 1),
(-2, 1)\}$ and let $a = (0, 0)$. Then $a \in \operatorname{int} \operatorname{conv} \alpha$, but $W(a|\alpha)$ is unbounded since, for
instance, the halfline $\{s(0, 1) : s \ge 0\}$ is contained in $W(a|\alpha)$; see Figure 1.4.

Figure 1.4: Unbounded Voronoi region with respect to the $l_1$-norm generated by an
interior point of $\operatorname{conv} \alpha$

Notes

For detailed treatments of Voronoi diagrams of finite point sets we refer to the no-
table book by Okabe et al. (1992), the review article by Aurenhammer (1991), and
the book by Klein (1989). A discussion of random Voronoi tesselations with respect
to the $l_2$-norm may be found in Møller (1994). Theorem 1.5 (iii) on the geometric
regularity of Voronoi regions is certainly known but we are not aware of a reference.

1.12 Conjecture
The assertion of Theorem 1.5, that is, $\lambda^d(\partial W(a|\alpha)) = 0$ for every $a \in \alpha$, holds for
arbitrary norms and arbitrary dimensions.

2 Centers and moments of probability distributions

We present some facts about centers and moments of probability distributions on $\mathbb{R}^d$
needed in the sequel.

2.1 Uniqueness and characterization of centers

Let $X = (X_1, \dots, X_d)$ be an $\mathbb{R}^d$-valued random variable with distribution $P$. As
before, $\|\cdot\|$ denotes any norm on $\mathbb{R}^d$. Let $1 \le r < \infty$ and assume throughout

\[ E\|X\|^r < \infty. \]

By a center of $P$ of order $r$ we mean a point $a \in \mathbb{R}^d$ such that

\[ E\|X - a\|^r = \inf_{b \in \mathbb{R}^d} E\|X - b\|^r. \tag{2.1} \]

Let $C_r(P)$ denote the set of all centers of $P$ of order $r$. Centers of order 1 are usually
called (spatial) medians. The $r$-th (absolute) moment of $P$ about the center is
defined by

\[ V_r(P) = \inf_{a \in \mathbb{R}^d} E\|X - a\|^r. \tag{2.2} \]

We also write $V_r(X)$ and $C_r(X)$ instead of $V_r(P)$ and $C_r(P)$.

For $A \in \mathcal{B}(\mathbb{R}^d)$ bounded with $\lambda^d(A) > 0$, define the normalized $r$-th moment of
$A$ about the center by

\[ M_r(A) = \frac{V_r(U(A))}{\lambda^d(A)^{r/d}}, \tag{2.3} \]

where $U(A)$ denotes the uniform distribution on $A$. Normalization yields an important
scaling invariance property.
Recall that bijective isometries $T : \mathbb{R}^d \to \mathbb{R}^d$ with $T(0) = 0$ are linear (cf. Semadeni,
1971, Lemma 7.8.3) and $\lambda^d$ is invariant under bijective isometries on $\mathbb{R}^d$.

2.1 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation with scaling number $c > 0$.

(a) $C_r(T(X)) = T\,C_r(X)$,
    $V_r(T(X)) = c^r V_r(X)$.

(b) $M_r(T(A)) = M_r(A)$, if $A \in \mathcal{B}(\mathbb{R}^d)$ is bounded with $\lambda^d(A) > 0$.



Proof
(a) is obvious.
(b) If $X$ is $U(A)$-distributed, then $T(X)$ is $U(T(A))$-distributed. From (a) it follows
that

\[ M_r(T(A)) = \frac{V_r(T(X))}{\lambda^d(T(A))^{r/d}} = \frac{c^r V_r(X)}{(c^d \lambda^d(A))^{r/d}} = M_r(A). \]
[]

Centers of order $r$ are global minima of the function

\[ \psi_r : \mathbb{R}^d \to \mathbb{R}_+, \quad \psi_r(a) = E\|X - a\|^r. \tag{2.4} \]

The function $\psi_r$ is obviously convex and hence continuous. Therefore, local minima
of $\psi_r$ are global minima of $\psi_r$.
2.2 Lemma
The level set $\{\psi_r \le c\}$ is convex and compact for every $c \in \mathbb{R}_+$. In particular, $C_r(P)$
is convex, compact and nonempty. Moreover,

\[ C_r(P) \subset B\big(0,\ 2(2E\|X\|^r)^{1/r}\big) \]

and

\[ C_1(P) \subset B(0,\ 2E\|X\|). \]

Proof
By convexity and continuity of $\psi_r$, the level sets are convex and closed. Since

\[ \|a\| \le \|x - a\| + \|x\| \le 2 \max\{\|x - a\|, \|x\|\} \]

and hence

\[ \|a\|^r \le 2^r \max\{\|x - a\|^r, \|x\|^r\} \le 2^r(\|x - a\|^r + \|x\|^r), \quad a, x \in \mathbb{R}^d, \]

one obtains for $a \in \{\psi_r \le c\}$

\[ \|a\|^r \le 2^r(c + E\|X\|^r). \]

Therefore,

\[ \{\psi_r \le c\} \subset B\big(0,\ 2(c + E\|X\|^r)^{1/r}\big). \]

In case $r = 1$, we have

\[ \{\psi_1 \le c\} \subset B(0,\ c + E\|X\|). \]

Choosing $c = E\|X\|^r$ and $c = V_r(P)$, respectively, gives the assertions. []

2.3 Example
(a) If $P$ is symmetric (about the origin), then $0 \in C_r(P)$ and thus, $V_r(P) = E\|X\|^r$.
In fact, $C_r(P)$ is symmetric by Lemma 2.1 (a) and from convexity of $C_r(P)$ follows
$0 \in C_r(P)$.
(b) For the $l_2$-norm, we obtain $C_2(X) = \{EX\}$ and $V_2(X) = \sum_{i=1}^d \operatorname{Var} X_i$, where
$\operatorname{Var} X_i$ denotes the variance of $X_i$. If the underlying norm is the $l_1$-norm, then
$C_1(X) = \times_{i=1}^d \operatorname{Med}(X_i)$, where $\operatorname{Med}(X_i)$ is the set of medians of the real random
variable $X_i$.
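
Part (b) of Example 2.3 can be illustrated on an empirical distribution. The following sketch assumes a synthetic two-dimensional sample (the sampling distributions, the seed, and all function names are illustrative assumptions): it evaluates $\psi_r(a) = E\|X - a\|^r$ for the empirical measure and compares the mean and the coordinatewise median against perturbed candidates.

```python
# Sketch: centers of order 2 (l2-norm) and order 1 (l1-norm) of an empirical measure.
import random
import statistics

def l2(v):
    return sum(t * t for t in v) ** 0.5

def l1(v):
    return sum(abs(t) for t in v)

def psi(points, a, r, norm):
    """Empirical version of psi_r(a) = E ||X - a||^r."""
    return sum(norm([x - c for x, c in zip(p, a)]) ** r for p in points) / len(points)

def sample(k=201, seed=0):
    """Hypothetical sample; an odd k makes each coordinate median unique."""
    rng = random.Random(seed)
    return [(rng.gauss(1.0, 2.0), rng.expovariate(1.0)) for _ in range(k)]

def mean_center(points):
    return tuple(statistics.fmean(c) for c in zip(*points))

def median_center(points):
    return tuple(statistics.median(c) for c in zip(*points))

def sum_of_variances(points):
    """Sum of the coordinatewise (population) variances."""
    return sum(statistics.pvariance(c) for c in zip(*points))

if __name__ == "__main__":
    pts = sample()
    print(psi(pts, mean_center(pts), 2, l2), sum_of_variances(pts))
    print(median_center(pts), psi(pts, median_center(pts), 1, l1))
```

For the $l_2$-norm the value of $\psi_2$ at the mean should agree with the sum of the coordinate variances up to rounding, and shifting the candidate center away from the mean (or, for the $l_1$-norm and $r = 1$, away from the coordinatewise median) should not decrease the objective.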

The center of a probability distribution need not be unique; think of the median of
one-dimensional distributions. Conditions for the uniqueness of the center are derived
in the following theorem. Condition (iii) is due to Milasevic and Ducharme (1987)
(for euclidean norms) and Kemperman (1987).

2.4 Theorem (Uniqueness)
Each of the following conditions implies $|C_r(P)| = 1$.

(i) The underlying norm is strictly convex and $r > 1$.

(ii) $P(S(a, b)) < 1$ for every $a, b \in \mathbb{R}^d$, $a \ne b$, and $r > 1$.

(iii) The underlying norm is strictly convex, $r = 1$, and $P(L) < 1$ for every
line $L \subset \mathbb{R}^d$.

Proof
We show that $\psi_r$ is strictly convex. This yields the assertion. Let $a, b \in \mathbb{R}^d$, $a \ne b$,
and $0 < s < 1$. Then

\[
\begin{aligned}
\|x - (sa + (1 - s)b)\|^r &= \|s(x - a) + (1 - s)(x - b)\|^r \\
&\le (s\|x - a\| + (1 - s)\|x - b\|)^r \\
&\le s\|x - a\|^r + (1 - s)\|x - b\|^r, \quad x \in \mathbb{R}^d.
\end{aligned}
\]

Let

\[ A = \{x \in \mathbb{R}^d : \|s(x - a) + (1 - s)(x - b)\|^r = s\|x - a\|^r + (1 - s)\|x - b\|^r\}. \]

In case $r > 1$, $A$ is contained in the separator $S(a, b)$ since $t \mapsto t^r$ is strictly convex.
If, additionally, the norm is strictly convex, then $A = \emptyset$.
If $r = 1$ and the norm is strictly convex, then $A$ is a subset of the line $L$ through $a$ and
$b$ given by $L = \{ta + (1 - t)b : t \in \mathbb{R}\}$. To see this, let $x \in A$ and assume $x \ne b$. By
strict convexity of the norm, there exists $t \in \mathbb{R}_+$ such that $s(x - a) = t(1 - s)(x - b)$.
This gives

\[ x = \frac{s}{s - t + st}\, a + \frac{ts - t}{s - t + st}\, b \]

and hence $x \in L$. Notice that $s - t + st \ne 0$ since otherwise, $a = b$.

Therefore, under any of the above conditions we have $P(A) < 1$. This implies
$\psi_r(sa + (1 - s)b) < s\,\psi_r(a) + (1 - s)\,\psi_r(b)$. []
Next, we characterize the centers of $P$ by means of the derivative of the underlying
norm. Let $\nabla_+\|\cdot\|(x, y)$ denote the one-sided directional derivative of the norm at
$x \in \mathbb{R}^d$ in direction $y \in \mathbb{R}^d$ given by

\[ \nabla_+\|\cdot\|(x, y) = \lim_{t \to 0+} \frac{\|x + ty\| - \|x\|}{t}. \]

The norm $\|\cdot\|$ is said to be smooth if it is differentiable at every point $x \ne 0$. In the
smooth case, the (two-sided) directional derivative exists at every $x \ne 0$ and coincides
with $\langle \nabla\|\cdot\|(x), y\rangle$, that is,

\[ \langle \nabla\|\cdot\|(x), y\rangle = \lim_{t \to 0} \frac{\|x + ty\| - \|x\|}{t}, \quad y \in \mathbb{R}^d, \]

where $\nabla\|\cdot\|(x)$ denotes the derivative of the norm at $x$ and $\langle x, y\rangle = \sum_{i=1}^d x_i y_i$.
The $l_p$-norms are smooth for $1 < p < \infty$ while the $l_1$-norm and $l_\infty$-norm are not
smooth.
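2.5 Lemma
For $a \in \mathbb{R}^d$, we have $a \in C_r(P)$ if and only if

\[ \int \|x - a\|^{r-1}\, \nabla_+\|\cdot\|(a - x, y)\, dP(x) \ge 0 \]

for every $y \in \mathbb{R}^d$. If the underlying norm is smooth, this condition takes the form

\[ \int_{\{x \ne a\}} \|x - a\|^{r-1}\, \nabla\|\cdot\|(a - x)\, dP(x) = 0, \quad r > 1, \]

\[ \Big\langle \int_{\{x \ne a\}} \nabla\|\cdot\|(a - x)\, dP(x),\ y \Big\rangle \le P(\{a\})\,\|y\| \]

for every $y \in \mathbb{R}^d$, $r = 1$. (Recall $\langle z, y\rangle = \sum_{i=1}^d z_i y_i$.)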

Proof

Notice first that $a \in C_r(P)$ if and only if

\[ \nabla_+\psi_r(a, y) \ge 0 \]

for every $y \in \mathbb{R}^d$. This is a consequence of the convexity of $\psi_r$. Furthermore,

\[ \frac{\psi_r(a + ty) - \psi_r(a)}{t} = \int \frac{\|a - x + ty\|^r - \|a - x\|^r}{t}\, dP(x). \]

The function $g : \mathbb{R} \to \mathbb{R}$ given by $g(t) = \|a - x + ty\|^r$ is convex and thus satisfies

\[ g(0) - g(-1) \le \frac{g(t) - g(0)}{t} \le g(1) - g(0), \quad 0 < t \le 1 \]

(cf. Webster, 1994, Theorem 5.1.1). Therefore, by Lebesgue's dominated convergence
theorem

\[ \nabla_+\psi_r(a, y) = \int \nabla_+\|\cdot\|^r(a - x, y)\, dP(x). \]

Since

\[ \nabla_+\|\cdot\|^r(x, y) = r\|x\|^{r-1}\, \nabla_+\|\cdot\|(x, y), \]

this yields the first assertion.
Now assume that the norm is smooth. Since $\nabla_+\|\cdot\|(0, y) = \|y\|$ one gets

\[ \int \|x - a\|^{r-1}\, \nabla_+\|\cdot\|(a - x, y)\, dP(x) = \Big\langle \int_{\{x \ne a\}} \|x - a\|^{r-1}\, \nabla\|\cdot\|(a - x)\, dP(x),\ y \Big\rangle + \int_{\{x = a\}} \|x - a\|^{r-1}\,\|y\|\, dP(x) \]

for every $y \in \mathbb{R}^d$. This yields the second assertion. []


Remark
(a) Using the dual norm of $\|x\|$ given by

\[ \|x\|_D = \sup\{\langle x, y\rangle : \|y\| \le 1\}, \]

the above equivalent condition for $a \in \mathbb{R}^d$ to belong to $C_1(P)$ in the smooth case
means

\[ \Big\| \int_{\{x \ne a\}} \nabla\|\cdot\|(a - x)\, dP(x) \Big\|_D \le P(\{a\}). \]

(b) Suppose the underlying norm is smooth. Then $\psi_r$ is differentiable on $\mathbb{R}^d$ for $r > 1$
while $\psi_1$ is differentiable at every point $a \in \mathbb{R}^d$ with $P(\{a\}) = 0$. The derivative is
given by

\[ \nabla\psi_r(a) = r \int_{\{x \ne a\}} \|x - a\|^{r-1}\, \nabla\|\cdot\|(a - x)\, dP(x). \]

The diameter of a nonempty bounded subset $A$ of $\mathbb{R}^d$ is the number

\[ \operatorname{diam}(A) = \sup\{\|a - b\| : a, b \in A\}. \]

Denote by $\operatorname{supp}(P)$ the topological support of $P$.



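2.6 Lemma
(a) (Euclidean norms) Let $\|x\| = \langle x, x\rangle^{1/2}$ for some scalar product $\langle\,\cdot\,, \cdot\,\rangle$ on $\mathbb{R}^d$.
Then

\[ C_r(P) \subset \mathrm{cl}\, \operatorname{conv}(\operatorname{supp}(P)). \]

(b) Suppose $\operatorname{supp}(P)$ is compact. Then

\[ \sup_{a \in C_r(P)}\ \min_{x \in \operatorname{supp}(P)} \|x - a\| \le \operatorname{diam}(\operatorname{supp}(P)). \]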

Proof

(a) Let $K = \mathrm{cl}\, \operatorname{conv}(\operatorname{supp}(P))$ and $a \notin K$. Then $K$ and $a$ can be strictly separated by
a hyperplane $H$, that is, $K$ and $a$ lie in opposite open halfspaces determined by $H$.
Denote by $b$ the orthogonal projection of $a$ onto $H$. For $x \in K$, let $y$ be the point on
the line segment joining $a$ and $x$ which lies in $H$. Since $\langle y - b, a - b\rangle = 0$, we obtain

\[ \|y - a\|^2 = \|y - b\|^2 + \|b - a\|^2 > \|y - b\|^2 \]

and therefore,

\[ \|x - b\| \le \|x - y\| + \|y - b\| < \|x - y\| + \|y - a\| = \|x - a\|. \]

This implies $a \notin C_r(P)$.

(b) Let $a \in \mathbb{R}^d$ be such that $d(a, \operatorname{supp}(P)) > \operatorname{diam}(\operatorname{supp}(P))$ and let $y \in \operatorname{supp}(P)$. Then

\[ \|x - y\| \le \operatorname{diam}(\operatorname{supp}(P)) < \|x - a\| \]

for every $x \in \operatorname{supp}(P)$. This implies $a \notin C_r(P)$. []


The assertion of part (a) of the preceding lemma can fail if $\|\cdot\|$ is an arbitrary norm.
This is exhibited by the following example.

2.7 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_\infty$-norm. Consider $P = \frac{1}{2}(\delta_{(-1,0)} + \delta_{(1,0)})$,
where $\delta_x$ is the point mass at $x$. Since $P$ is symmetric about $EX = (0, 0)$, this point
belongs to $C_r(P)$ and thus, $V_r(P) = E\|X\|^r = 1$ for every $1 \le r < \infty$. We find

\[ C_1(P) = \{\psi_1 = 1\} = \{x \in \mathbb{R}^2 : |x_1| + |x_2| \le 1\}, \]

\[ C_r(P) = \{\psi_r = 1\} = \{s(0, 1) : -1 \le s \le 1\} \quad \text{for } r > 1; \]

see Figure 2.1. Clearly, the assertion of Lemma 2.6 does not hold for $P$.

Figure 2.1: $C_1(P)$ and $C_r(P)$, $r > 1$, with respect to the $l_\infty$-norm for a discrete
probability $P$ with two supporting points

2.2 Moments of balls

Balls have minimal moments for measures $\mu$ which vanish on spheres, i.e.
$\mu(\partial B(a, s)) = 0$ for every $a \in \mathbb{R}^d$ and every $s \ge 0$. (Note that $\partial B(a, s) = \{x \in
\mathbb{R}^d : \|x - a\| = s\}$.) This statement is meant in the following sense.

2.8 Lemma
Let $\mu$ be a Borel measure on $\mathbb{R}^d$ that is finite on compact sets and vanishes on spheres.
Then, for every bounded set $A \in \mathcal{B}(\mathbb{R}^d)$ with $\mu(A) > 0$ and every $a \in \mathbb{R}^d$ there is an
$s \ge 0$ with $\mu(B(a, s)) = \mu(A)$. Moreover, for such an $s$,

\[ \int_A \|x - a\|^r\, d\mu(x) \ge \int_{B(a,s)} \|x - a\|^r\, d\mu(x). \]

In particular, we have for $a \in C_r(\mu(\cdot|A))$

\[ V_r(\mu(\cdot|A)) \ge V_r(\mu(\cdot|B(a, s))), \]

where $\mu(\cdot|A) = \mu(\cdot \cap A)/\mu(A)$.

Proof
Since $A$ is bounded there exists an $s_0 > 0$ with $A \subset B(a, s_0)$, hence $0 < \mu(A) \le
\mu(B(a, s_0)) < \infty$. Since the map $\mathbb{R}_+ \to \mathbb{R}_+$, $s \mapsto \mu(B(a, s))$ is continuous under the
assumptions for $\mu$, the intermediate value theorem yields the existence of an $s > 0$
with $s \le s_0$ and $\mu(B(a, s)) = \mu(A)$. Then $\mu(A \setminus B(a, s)) = \mu(B(a, s) \setminus A)$ and we

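have

\[ \int_A \|x - a\|^r\, d\mu(x) = \int_{B(a,s)} \|x - a\|^r\, d\mu(x) + \int_{A \setminus B(a,s)} \|x - a\|^r\, d\mu(x) - \int_{B(a,s) \setminus A} \|x - a\|^r\, d\mu(x). \]

Obviously

\[ \int_{A \setminus B(a,s)} \|x - a\|^r\, d\mu(x) \ge s^r \mu(A \setminus B(a, s)) \]

and

\[ \int_{B(a,s) \setminus A} \|x - a\|^r\, d\mu(x) \le s^r \mu(B(a, s) \setminus A). \]

This implies

\[ \int_{A \setminus B(a,s)} \|x - a\|^r\, d\mu(x) - \int_{B(a,s) \setminus A} \|x - a\|^r\, d\mu(x) \ge s^r\big(\mu(A \setminus B(a, s)) - \mu(B(a, s) \setminus A)\big) = 0. \]

Hence, the lemma is proved. []
We can deduce a well known fact about the moments of uniform distributions on balls.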
2.9 Lemma
We have
\[ M_r(B(0, 1)) = \min\{M_r(A) : A \in \mathcal{B}(\mathbb{R}^d) \text{ bounded},\ \lambda^d(A) > 0\} \]
and $B(0, 1)$ is the essentially unique minimizer of $M_r$ in that any bounded set $A \in
\mathcal{B}(\mathbb{R}^d)$ with $M_r(A) = M_r(B(0, 1))$, $\lambda^d(A) = \lambda^d(B(0, 1))$, and $0 \in C_r(U(A))$ satisfies
$\lambda^d(A \,\triangle\, B(0, 1)) = 0$.
If, additionally, $A$ is regularly closed (that is, $A = \mathrm{cl}(\operatorname{int} A)$), then $A = B(0, 1)$.
Moreover,
\[ M_r(B(0, 1)) = \frac{d}{(d + r)\,\lambda^d(B(0, 1))^{r/d}}. \tag{2.5} \]
Proof
The first assertion follows from Lemma 2.8 with the choice $\mu = \lambda^d$ and Lemma 2.1
(b). As for uniqueness, let $A$ be a set with the above properties. Then

\[ \int_A \|x\|^r\, dx = \int_{B(0,1)} \|x\|^r\, dx. \]

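It follows that

\[ \lambda^d(B(0, 1) \setminus A) = \lambda^d(A \setminus B(0, 1)) \le \int_{A \setminus B(0,1)} \|x\|^r\, dx = \int_{B(0,1) \setminus A} \|x\|^r\, dx \le \lambda^d(B(0, 1) \setminus A). \]

Therefore,

\[ \int_{A \setminus B(0,1)} (\|x\|^r - 1)\, dx = 0, \]

which implies $A \setminus B(0, 1) \subset \partial B(0, 1)$ $\lambda^d$-a.s. Since $\lambda^d(\partial B(0, 1)) = 0$, we obtain
$\lambda^d(A \,\triangle\, B(0, 1)) = 0$. If $A$ is closed, then $B(0, 1) \subset A$. Otherwise $\{\|x\| < 1\} \setminus A$ is a
nonempty open set implying

\[ \lambda^d(B(0, 1) \setminus A) \ge \lambda^d(\{\|x\| < 1\} \setminus A) > 0, \]

a contradiction. It follows that $\{\|x\| < 1\} \subset \operatorname{int} A$ and hence

\[ \lambda^d(B(0, 1)) = \lambda^d(\{\|x\| < 1\}) \le \lambda^d(\operatorname{int} A) \le \lambda^d(A). \]

Therefore, $\lambda^d(\{\|x\| < 1\}) = \lambda^d(\operatorname{int} A)$ which implies $\{\|x\| < 1\} = \operatorname{int} A$. From regular
closedness of $A$ follows $A = B(0, 1)$.
Moreover, in view of the symmetry of $U(B(0, 1))$ one gets

\[
\begin{aligned}
V_r(U(B(0, 1))) &= \int \|x\|^r\, dU(B(0, 1))(x) \\
&= \int_0^\infty U(B(0, 1))(\|x\|^r > t)\, dt \\
&= \int_0^1 (1 - t^{d/r})\, dt = \frac{d}{d + r}.
\end{aligned}
\]

This gives the formula (2.5). []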


In view of the above formula for $M_r(B(0, 1))$ it is worth recalling that the volume of
unit balls with respect to the $l_p$-norms for $1 \le p < \infty$ is given by

\[ \lambda^d(B(0, 1)) = \frac{(2\Gamma(1 + \frac{1}{p}))^d}{\Gamma(1 + \frac{d}{p})} \tag{2.6} \]

(cf. e.g. Pisier, 1989, p. 11).
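
Formulas (2.5) and (2.6) combine into a closed expression for $M_r(B(0,1))$ under an $l_p$-norm. The sketch below simply evaluates them; the function names are illustrative, and the branch for $p = \infty$ (where the unit ball is the cube $[-1,1]^d$ of volume $2^d$) is an added convenience that is not covered by (2.6) as stated.

```python
# Sketch: evaluate (2.6) and (2.5) for l_p unit balls.
import math

def lp_ball_volume(d, p):
    """Volume of the unit l_p-ball in R^d, formula (2.6) for 1 <= p < infinity."""
    if math.isinf(p):
        # assumption outside (2.6): the l_infinity unit ball is the cube [-1, 1]^d
        return 2.0 ** d
    return (2.0 * math.gamma(1.0 + 1.0 / p)) ** d / math.gamma(1.0 + d / p)

def normalized_moment_of_ball(d, r, p):
    """M_r(B(0,1)) = d / ((d + r) * vol^(r/d)), formula (2.5)."""
    return d / ((d + r) * lp_ball_volume(d, p) ** (r / d))

if __name__ == "__main__":
    print(lp_ball_volume(2, 2))                # area of the euclidean unit disc
    print(normalized_moment_of_ball(1, 2, 2))  # d = 1, r = 2
```

In dimension $d = 1$ with $r = 2$ the value is $1/12$, the variance of the uniform distribution on an interval of length one, which is a quick sanity check on the normalization.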

Notes

Among spatial centers the spatial medians have received special attention. We refer
to the survey article by Small (1990) for a discussion of several notions of spatial
medians. A good source for norm-based medians as defined in (2.1) is Kemperman
(1987).

3 The quantization problem


In this section we will give several equivalent formulations of the quantization problem
for probability distributions on $\mathbb{R}^d$ with norm-based distortion measure. Let $X$ denote
an $\mathbb{R}^d$-valued random variable with distribution $P$. For $n \in \mathbb{N}$, let $\mathcal{F}_n$ be the set of
all Borel measurable maps $f : \mathbb{R}^d \to \mathbb{R}^d$ with $|f(\mathbb{R}^d)| \le n$. The elements of $\mathcal{F}_n$ are
called $n$-quantizers. For each $f \in \mathcal{F}_n$, $f(X)$ gives a quantized version of $X$. Let
$1 \le r < \infty$ and assume

\[ E\|X\|^r < \infty. \]

The $n$-th quantization error for $P$ of order $r$ is defined by

\[ V_{n,r}(P) = \inf_{f \in \mathcal{F}_n} E\|X - f(X)\|^r. \tag{3.1} \]

We will also write $V_{n,r}(X)$ instead of $V_{n,r}(P)$. A quantizer $f \in \mathcal{F}_n$ is called $n$-optimal
for $P$ of order $r$ if

\[ V_{n,r}(P) = E\|X - f(X)\|^r. \]

Note that $V_{1,r}(P) = V_r(P)$.

Figure 3.1: Quantization scheme
For fixed $n \in \mathbb{N}$, searching for an $n$-optimal quantizer is equivalent to the $n$-centers
problem.

3.1 Lemma

\[ V_{n,r}(P) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le n} E \min_{a \in \alpha} \|X - a\|^r. \]

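Proof
For $f \in \mathcal{F}_n$, let $\alpha = f(\mathbb{R}^d)$ and $A_a = \{f = a\}$, $a \in \alpha$. Then

\[ E\|X - f(X)\|^r = \sum_{a \in \alpha} \int_{A_a} \|x - a\|^r\, dP(x) \ge E \min_{b \in \alpha} \|X - b\|^r. \]

Conversely, for $\alpha \subset \mathbb{R}^d$ with $|\alpha| \le n$, let $\{A_a : a \in \alpha\}$ be a Voronoi partition of $\mathbb{R}^d$
with respect to $\alpha$ and let $f = \sum_{a \in \alpha} a 1_{A_a}$. Then $f \in \mathcal{F}_n$ and

\[ E \min_{a \in \alpha} \|X - a\|^r = \sum_{a \in \alpha} \int_{A_a} \|x - a\|^r\, dP(x) = E\|X - f(X)\|^r. \]

[]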
A set $\alpha \subset \mathbb{R}^d$ with $|\alpha| \le n$ is called $n$-optimal set of centers for $P$ of order $r$ if

\[ V_{n,r}(P) = E \min_{a \in \alpha} \|X - a\|^r. \]

The proof of Lemma 3.1 shows that if $f$ is an $n$-optimal quantizer, then $f(\mathbb{R}^d)$ is an
$n$-optimal set of centers. Conversely, if $\alpha \subset \mathbb{R}^d$ is an $n$-optimal set of centers and
$\{A_a : a \in \alpha\}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$, then $f = \sum_{a \in \alpha} a 1_{A_a}$ is an
$n$-optimal quantizer. Recall that by Proposition 1.2, the sets $A_a$ may be chosen to be
star-shaped relative to $a$.
Let $C_{n,r}(P)$ denote the set of all $n$-optimal sets of centers for $P$ of order $r$. We also
write $C_{n,r}(X)$ instead of $C_{n,r}(P)$. Note that $C_{1,r}(P)$ can be identified with $C_r(P)$.
For $A \in \mathcal{B}(\mathbb{R}^d)$ bounded with $\lambda^d(A) > 0$, define the normalized $n$-th quantization
error for $A$ of order $r$ by

\[ M_{n,r}(A) = \frac{V_{n,r}(U(A))}{\lambda^d(A)^{r/d}}. \tag{3.2} \]
The following equivariance, scaling and invariance properties extend those of Lemma
2.1.
3.2 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation with scaling number $c > 0$.

(a) $C_{n,r}(T(X)) = T\,C_{n,r}(X)$,
    $V_{n,r}(T(X)) = c^r V_{n,r}(X)$.

(b) $M_{n,r}(T(A)) = M_{n,r}(A)$, if $A \in \mathcal{B}(\mathbb{R}^d)$ is bounded with $\lambda^d(A) > 0$.

Proof
Obvious. []
Next, we show that the quantization problem is equivalent to a partitioning problem
for the space $\mathbb{R}^d$.

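3.3 Lemma

\[ V_{n,r}(P) = \inf_{\mathcal{A}} \sum_{A \in \mathcal{A}} V_r(P(\cdot|A))\, P(A), \]

where the infimum is taken over all Borel measurable partitions $\mathcal{A}$ of $\mathbb{R}^d$ with $|\mathcal{A}| \le n$.

Proof
For $f \in \mathcal{F}_n$, let $\alpha = f(\mathbb{R}^d)$ and $A_a = \{f = a\}$. Then $\{A_a : a \in \alpha\}$ is a partition of $\mathbb{R}^d$
and

\[ E\|X - f(X)\|^r = \sum_{a \in \alpha} \int_{A_a} \|x - a\|^r\, dP(x) \ge \sum_{a \in \alpha} V_r(P(\cdot|A_a))\, P(A_a). \]

Conversely, for a Borel measurable partition $\mathcal{A}$ of $\mathbb{R}^d$ with $|\mathcal{A}| \le n$, choose $a_A \in
C_r(P(\cdot|A))$, $A \in \mathcal{A}$, which is possible by Lemma 2.2, and let $f = \sum_{A \in \mathcal{A}} a_A 1_A$. (If
$P(A) = 0$, let $a_A$ be an arbitrary point in $\mathbb{R}^d$.) Then $f \in \mathcal{F}_n$ and

\[ \sum_{A \in \mathcal{A}} V_r(P(\cdot|A))\, P(A) = \sum_{A \in \mathcal{A}} \int_A \|x - a_A\|^r\, dP(x) = E\|X - f(X)\|^r. \]

[]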
A Borel measurable partition $\mathcal{A}$ of $\mathbb{R}^d$ with $|\mathcal{A}| \le n$ is called $n$-optimal partition
for $P$ of order $r$ if

\[ V_{n,r}(P) = \sum_{A \in \mathcal{A}} V_r(P(\cdot|A))\, P(A). \]

The proof of the preceding lemma shows that if $f$ is an $n$-optimal quantizer, then
$\{\{f = a\} : a \in f(\mathbb{R}^d)\}$ is an $n$-optimal partition. Conversely, if $\mathcal{A}$ is an $n$-optimal
partition and $a_A \in C_r(P(\cdot|A))$ for $A \in \mathcal{A}$, then $f = \sum_{A \in \mathcal{A}} a_A 1_A$ is an $n$-optimal
quantizer.

The quantization problem for $P$ is further equivalent to the problem of approximating
$P$ by a discrete probability with at most $n$ supporting points. For Borel probability
measures $P_1, P_2$ on $\mathbb{R}^d$ with $\int \|x\|^r\, dP_i(x) < \infty$, let

\[ \rho_r(P_1, P_2) = \inf \Big( \int \|x - y\|^r\, d\mu(x, y) \Big)^{1/r}, \tag{3.3} \]

where the infimum is taken over all Borel probabilities $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with fixed
marginals $P_1$ and $P_2$. The $L_r$-minimal metric $\rho_r$ ($L_r$-Wasserstein metric or $L_r$-Kantorovich
metric) is appropriate for the quantization problem. This has been
observed by Gray et al. (1975), Gray and Davisson (1975) and Pollard (1982a). By
$\mathcal{P}_n$ denote the set of all discrete probabilities $Q$ on $\mathbb{R}^d$ with $|\operatorname{supp}(Q)| \le n$.
3.4 Lemma

\[ V_{n,r}(P) = \inf_{f \in \mathcal{F}_n} \rho_r^r(P, P^f) = \inf_{Q \in \mathcal{P}_n} \rho_r^r(P, Q), \]

where $P^f$ denotes the image measure of $P$ under $f$.

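Proof
Given $f \in \mathcal{F}_n$, let $\mu_f$ denote the image measure of $P$ under the map $\mathbb{R}^d \to \mathbb{R}^d \times
\mathbb{R}^d$, $x \mapsto (x, f(x))$. Then

\[ E\|X - f(X)\|^r = \int \|x - y\|^r\, d\mu_f(x, y) \ge \rho_r^r(P, P^f). \]

This implies

\[ V_{n,r}(P) \ge \inf_{f \in \mathcal{F}_n} \rho_r^r(P, P^f) \ge \inf_{Q \in \mathcal{P}_n} \rho_r^r(P, Q). \]

If $Q \in \mathcal{P}_n$ with $Q(\alpha) = 1$, $|\alpha| \le n$, then for every Borel probability $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with
marginals $P$ and $Q$

\[
\begin{aligned}
\int \|x - y\|^r\, d\mu(x, y) &= \int_{\mathbb{R}^d \times \alpha} \|x - y\|^r\, d\mu(x, y) \\
&\ge \int_{\mathbb{R}^d \times \alpha} \min_{a \in \alpha} \|x - a\|^r\, d\mu(x, y) \\
&= \int \min_{a \in \alpha} \|x - a\|^r\, dP(x),
\end{aligned}
\]

hence

\[ \rho_r^r(P, Q) \ge E \min_{a \in \alpha} \|X - a\|^r. \]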


By Lemma 3.1, this yields

\[ \inf_{Q \in \mathcal{P}_n} \rho_r^r(P, Q) \ge V_{n,r}(P). \]

[]
A measure $Q \in \mathcal{P}_n$ is called $n$-optimal quantizing measure for $P$ of order $r$ if

\[ V_{n,r}(P) = \rho_r^r(P, Q). \]

If $f \in \mathcal{F}_n$ is an $n$-optimal quantizer, then $P^f \in \mathcal{P}_n$ is an $n$-optimal quantizing
measure. Conversely, if $Q \in \mathcal{P}_n$ is an $n$-optimal quantizing measure and $\{A_a : a \in \alpha\}$
is a Voronoi partition with respect to $\alpha = \operatorname{supp}(Q)$, then $f = \sum_{a \in \alpha} a 1_{A_a}$ is an $n$-optimal
quantizer.
Several functional descriptions of $\rho_r$ are known. Among them the most famous is the
Kantorovich representation for $r = 1$

\[ \rho_1(P_1, P_2) = \sup_g \Big| \int g\, dP_1 - \int g\, dP_2 \Big|, \]

where the supremum is taken over all functions $g : \mathbb{R}^d \to \mathbb{R}$ satisfying the Lipschitz
condition $|g(x) - g(y)| \le \|x - y\|$ for all $x, y \in \mathbb{R}^d$. In case $d = 1$, $\rho_r$ admits the
representation

\[ \rho_r(P_1, P_2) = \Big( \int_0^1 |F_1^{-1}(t) - F_2^{-1}(t)|^r\, dt \Big)^{1/r} \]

and

\[ \rho_1(P_1, P_2) = \int |F_1(t) - F_2(t)|\, dt, \]

where $F_i$ denotes the distribution function and $F_i^{-1}$ the quantile function of $P_i$
($F_i^{-1}(t) = \inf\{x \in \mathbb{R} : F_i(x) \ge t\}$, $t \in (0, 1)$). For this background on $L_r$-minimal
metrics we refer to Rachev (1991) and Rachev and Rüschendorf (1998, Chapters 2.5
and 2.6).
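
The one-dimensional representations of $\rho_r$ are easy to evaluate for empirical measures. The following sketch assumes two samples with the same number $k$ of atoms, each of mass $1/k$; in that case both quantile functions are constant on the intervals $((i-1)/k, i/k]$, so the quantile integral reduces to a sum over sorted pairs. The function names are illustrative assumptions.

```python
# Sketch: rho_r between two empirical measures on the real line.

def wasserstein_1d(xs, ys, r):
    """rho_r via quantile functions, for equally many equally weighted atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many atoms")
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(x - y) ** r for x, y in zip(xs, ys)) / len(xs)) ** (1.0 / r)

def cdf_distance(xs, ys):
    """Integral of |F_1 - F_2|, computed piecewise between consecutive atoms."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        f1 = sum(1 for x in xs if x <= left) / len(xs)
        f2 = sum(1 for y in ys if y <= left) / len(ys)
        total += abs(f1 - f2) * (right - left)
    return total

if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 3.0], [1.0, 2.0, 5.0]
    print(wasserstein_1d(xs, ys, 1), cdf_distance(xs, ys))
```

For $r = 1$ the two functions compute the same quantity by the two representations above, which gives a simple consistency check.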
The empirical counterpart of quantization is cluster analysis. Somewhat more pre-
cisely, partitioning methods of cluster analysis for a finite sample according to a
norm-based optimality criterion correspond to quantization for the empirical mea-
sure.

3.5 Example (Empirical version, cluster analysis)
Let $x_1, \dots, x_k \in \mathbb{R}^d$ with $x_i = (x_{i1}, \dots, x_{id})$ and let $P = \frac{1}{k}\sum_{i=1}^k \delta_{x_i}$ denote the empirical
measure. We obtain from Lemma 3.3

\[ V_{n,r}(P) = \frac{1}{k} \min_{\mathcal{C}} \sum_{C \in \mathcal{C}} \min_{a \in \mathbb{R}^d} \sum_{i \in C} \|x_i - a\|^r, \]

where the infimum is taken over all partitions $\mathcal{C}$ of $\{1, \dots, k\}$ with $|\mathcal{C}| \le n$. If the
underlying norm is the $l_2$-norm, then

\[ V_{n,2}(P) = \frac{1}{k} \min_{\mathcal{C}} \sum_{C \in \mathcal{C}} \sum_{i \in C} \|x_i - \bar{x}(C)\|^2, \]

where $\bar{x}(C) = \frac{1}{|C|} \sum_{i \in C} x_i$. This is the variance criterion for optimal grouping of data
$x_1, \dots, x_k$. If the underlying norm is the $l_1$-norm, then

\[ V_{n,1}(P) = \frac{1}{k} \min_{\mathcal{C}} \sum_{C \in \mathcal{C}} \sum_{i \in C} \|x_i - \operatorname{med}(C)\|, \]

where $\operatorname{med}(C)$ is an arbitrary element of $\times_{j=1}^d \operatorname{med}(x_{ij}, i \in C)$ and $\operatorname{med}(x_{ij}, i \in C)$ is
the set of empirical medians of the real data $x_{ij}$, $i \in C$ (cf. Example 2.3 (b)). This is
the $l_1$-criterion for optimal grouping of data. For treatments of cluster analysis which
contain discussions of the above optimality criteria we refer to Bock (1974) and Späth
(1985).
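
For very small data sets the variance criterion of Example 3.5 can be evaluated by exhaustive search. The sketch below is an illustration only and not a practical clustering method: it enumerates all assignments of the $k$ points to $n$ labels (so it costs $n^k$ evaluations) and returns the resulting value of $V_{n,2}(P)$ for the $l_2$-norm. The data and function names are hypothetical.

```python
# Sketch: brute-force V_{n,2}(P) for the empirical measure of a few points.
from itertools import product

def mean(points):
    k = len(points)
    return tuple(sum(c) / k for c in zip(*points))

def within_cost(points):
    """Sum of squared l2-distances of the points to their mean."""
    m = mean(points)
    return sum(sum((x - c) ** 2 for x, c in zip(p, m)) for p in points)

def empirical_quantization_error(data, n):
    """(1/k) * min over partitions into at most n groups of the within-group cost."""
    k = len(data)
    best = float("inf")
    for labels in product(range(n), repeat=k):   # every labelling induces a partition
        cost = 0.0
        for j in range(n):
            cluster = [data[i] for i in range(k) if labels[i] == j]
            if cluster:                           # empty groups contribute nothing
                cost += within_cost(cluster)
        best = min(best, cost)
    return best / k

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print([empirical_quantization_error(data, n) for n in (1, 2, 4)])
```

With $n = 1$ the result is the sum of the coordinate variances of the data, in line with Example 2.3 (b), and with $n = k$ it is zero.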

The $n$-optimal sets of centers for $P$ of order $r$ correspond to global minima of the
function

\[ \psi_{n,r} : (\mathbb{R}^d)^n \to \mathbb{R}_+, \quad \psi_{n,r}(a_1, \dots, a_n) = E \min_{1 \le i \le n} \|X - a_i\|^r. \tag{3.4} \]

Notice that $\psi_{1,r} = \psi_r$. While $\psi_r$ is convex, $\psi_{n,r}$ is typically not convex for $n \ge 2$.
Therefore, local minimum points of $\psi_{n,r}$ may not be global minimum points of $\psi_{n,r}$.
The lack of any straightforward solution for the quantization problem (at least for
$d \ge 2$) is a result of the difficulty in dealing with the nonconvex nature of quantization.
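
The failure of convexity of $\psi_{n,r}$ for $n \ge 2$ can be seen on a two-point distribution on the real line. The sketch below (function names are illustrative assumptions) evaluates $\psi_{2,2}$ for $P = \frac{1}{2}(\delta_{-1} + \delta_{1})$ at the configuration $(-1, 1)$, at its permutation $(1, -1)$, and at their midpoint $(0, 0)$.

```python
# Sketch: psi_{2,2} is not convex for P = (delta_{-1} + delta_{1}) / 2 on the real line.

def psi_n(data, centers, r=2):
    """Empirical psi_{n,r}(a_1, ..., a_n) = E min_i |X - a_i|^r."""
    return sum(min(abs(x - a) ** r for a in centers) for x in data) / len(data)

def midpoint(c1, c2):
    """Midpoint of two center configurations in (R^d)^n, here with d = 1."""
    return tuple((u + v) / 2 for u, v in zip(c1, c2))

if __name__ == "__main__":
    data = [-1.0, 1.0]
    c1, c2 = (-1.0, 1.0), (1.0, -1.0)
    print(psi_n(data, c1), psi_n(data, c2), psi_n(data, midpoint(c1, c2)))
```

Both endpoints give error 0 while the midpoint gives error 1, so the value at the midpoint exceeds the average of the endpoint values, which a convex function would not allow.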

Notes

Treatments of the quantization problem with applications in information theory


(analog-to-digital conversion, signal compression, coding theory) are contained in the
March 1982 Special Issue of IEEE Transactions on Information Theory (Vol. 28, pp.
127-202), in Gray (1990), Abut (1990), Gersho and Gray (1992), and in Calderbank
et al. (1993). Some material may also be found in Fang and Wang (1994). The $n$-th
quantization error $V_{n,r}(P)$ appears in error bounds for numerical integration; see
Pagès (1997). In the one-dimensional case the quantization problem for $r = 2$ corresponds
to the optimal stratification problem for Bowley (or proportional) sampling
schemes of Dalenius (1950). Also in the one-dimensional case the quantization problem
can be seen as optimal knot selection for piecewise constant $L_r$-approximation.
A review of the problem in this spirit for $r = 2$ can be found in Eubank (1988). A
fuzzy version of the quantization problem is discussed by Yang and Yu (1991).
Let us mention that n-optimal sets of centers are sometimes called sets of principal
points or representative points.

Occasionally, it may be preferable to use other measures for the quantization error
than $L_r$-metrics as in (3.1). The limiting case of the $L_\infty$-metric ("worst-case error")

\[ \operatorname{ess\,sup} \|X - f(X)\| = \inf\{c > 0 : \mathbb{P}(\|X - f(X)\| > c) = 0\}, \quad f \in \mathcal{F}_n, \]

is studied in Section 10 and the Ky Fan metric

\[ \inf\{\varepsilon > 0 : \mathbb{P}(\|X - f(X)\| > \varepsilon) \le \varepsilon\} \]

is studied in Graf and Luschgy (1999a). While the first metric requires $X$ to be
bounded, the latter does not. The Ky Fan error measure leads to the approximation
problem for $P$ with respect to the Prohorov metric (in the sense of Lemma 3.4). An
investigation of the quantization problem based on the geometric mean error

\[ \exp E \log \|X - f(X)\| \]

as measure of performance can be found in Graf and Luschgy (1999b). Input weighted
error measures of the form

\[ E\,(X - f(X))^t B(X) (X - f(X)), \]

where $B(x)$ is a positive definite matrix for every $x \in \mathbb{R}^d$, have proved useful in speech
and image compression systems. For various aspects of the quantization problem
based on this error see e.g. Gray and Karnin (1982), Gardner and Rao (1995), Li et
al. (1999) and Linder et al. (1999).
Basically different quantization problems have been treated by Elias (1970) and more
recently by Bock (1992) and Pötzelberger and Strasser (1999).

4 Basic properties of optimal quantizers


As in the previous section, let X be an $\mathbb{R}^d$-valued random variable with distribution
P such that $E\|X\|^r < \infty$ for some $1 \le r < \infty$. Further, we assume (with the only
exception of the last subsection) $n \ge 2$ and in order to avoid trivial cases, we also
assume $P \notin \mathcal{P}_{n-1}$, that is, $|\mathrm{supp}(P)| \ge n$.

4.1 Stationarity and existence

The following two theorems provide necessary conditions for n-optimality of quantiz-
ers. They provide the gateway to most available algorithmic solutions.
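4.1 Theorem (Necessary conditions for optimality)
Let $\alpha \in C_{n,r}(P)$ and let $\{A_a : a \in \alpha\}$ be a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$
and P. Then

\[ |\alpha| = n, \qquad P(A_a) > 0 \ \text{for every } a \in \alpha, \]
\[ \beta \in C_{m,r}\Big(P\Big(\cdot \,\Big|\, \bigcup_{a \in \beta} A_a\Big)\Big) \ \text{for every } \beta \subset \alpha \text{ with } |\beta| = m. \]

In particular,

\[ P(W(a|\alpha)) > 0, \qquad a \in C_r(P(\cdot\,|\,W(a|\alpha))) \ \text{for every } a \in \alpha. \tag{4.1} \]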

Proof
Let $\gamma = \{a \in \alpha : P(A_a) > 0\}$ and assume $|\gamma| < n$. Obviously, $\gamma \in C_{n,r}(P)$. Since
$P \notin \mathcal{P}_{n-1}$, there exists $a \in \gamma$ such that $P(\cdot\,|A_a)$ is not a point mass. We can conclude
that

\[ P(H(a,b)^c \cap A_a) > 0 \]

for some $b \in \mathbb{R}^d$. (Recall $H(a,b) = \{x \in \mathbb{R}^d : \|x - a\| \le \|x - b\|\}$.) In fact, we have
$P(A_a \setminus \{a\}) > 0$ and hence, there is a compact set $K \subset A_a \setminus \{a\}$ with $P(K) > 0$.
Since $K \subset \bigcup_{b \in K} H(a,b)^c$ and $H(a,b)^c$ is open, we can find a finite subset B of K such
that $K \subset \bigcup_{b \in B} H(a,b)^c$. This gives the existence of a point $b \in B$ with the required
property. It follows that

\[ V_{n,r}(P) = E \min_{a \in \gamma} \|X - a\|^r > E \min_{a \in \gamma \cup \{b\}} \|X - a\|^r \ge V_{n,r}(P), \]

a contradiction.
As for the assertion concerning $\beta$, assume $\beta \notin C_{m,r}(P(\cdot\,|\bigcup_{a \in \beta} A_a))$. Then there exists
$\delta \subset \mathbb{R}^d$ with $|\delta| \le m$ and

\[ \int_{\bigcup_{a \in \beta} A_a} \min_{b \in \beta} \|x - b\|^r \, dP(x) > \int_{\bigcup_{a \in \beta} A_a} \min_{b \in \delta} \|x - b\|^r \, dP(x). \]

It follows that

\[ V_{n,r}(P) = E \min_{a \in \alpha} \|X - a\|^r > E \min_{a \in \delta \cup (\alpha \setminus \beta)} \|X - a\|^r \ge V_{n,r}(P), \]

a contradiction. □
We know from (1.8) and Theorem 1.5 that the Voronoi diagram of every finite subset
of $\mathbb{R}^d$ is a P-tesselation provided the underlying norm is strictly convex and P is
absolutely continuous with respect to $\lambda^d$. So the following result is of interest for
probability distributions P which are not absolutely continuous with respect to $\lambda^d$.
(Such probabilities are considered in Chapter III.)

4.2 Theorem (Necessary condition for optimality)
Let $\alpha \in C_{n,r}(P)$ and let $r > 1$ or $P(\alpha) = 0$. Suppose the underlying norm is strictly
convex and smooth. Then the Voronoi diagram of $\alpha$ is a P-tesselation of $\mathbb{R}^d$.

Proof
We have to prove

\[ P(W(a|\alpha) \cap W(b|\alpha)) = 0 \]

for every $a, b \in \alpha$, $a \ne b$. Fix $a, b \in \alpha$, $a \ne b$ and assume $P(W(a|\alpha) \cap W(b|\alpha)) > 0$.
Choose a Voronoi partition $\{A_c : c \in \alpha\}$ with respect to $\alpha$ such that $A_a = W(a|\alpha) \setminus W(b|\alpha)$.
Then by Theorem 4.1,

\[ a \in C_r(P(\cdot\,|A_a)) \cap C_r(P(\cdot\,|W(a|\alpha))). \]

From Lemma 2.5 it follows that

\[ \int_{A_a \setminus \{a\}} \|x - a\|^{r-1} \nabla\|\cdot\|(a - x) \, dP(x) = 0 \]

and

\[ \int_{W(a|\alpha) \setminus \{a\}} \|x - a\|^{r-1} \nabla\|\cdot\|(a - x) \, dP(x) = 0, \]

which yields

\[ \int_{W(a|\alpha) \cap W(b|\alpha)} \|x - a\|^{r-1} \nabla\|\cdot\|(a - x) \, dP(x) = 0. \]

Therefore, again by Lemma 2.5, $a \in C_r(Q)$ with $Q = P(\cdot\,|W(a|\alpha) \cap W(b|\alpha))$. Since
$W(a|\alpha) \cap W(b|\alpha)$ is contained in the separator $S(a,b)$, this implies $b \in C_r(Q)$. Thus,
Q has two different centers a and b of order r. By Theorem 2.4, this can happen only
if $r = 1$ and $Q(L) = 1$, where L is the line through a and b. Since

\[ L \cap W(a|\alpha) \cap W(b|\alpha) \subset L \cap S(a,b) = \{(a+b)/2\}, \]

one obtains $Q = \delta_{(a+b)/2}$. It follows

\[ \{a, b\} \subset C_r(Q) = \{(a+b)/2\}, \]

a contradiction. □
A set $\alpha \subset \mathbb{R}^d$ with $|\alpha| = n$ satisfying condition (4.1) is called n-stationary set of
centers for P of order r. Let $S_{n,r}(P)$ denote the set of all these n-stationary sets
for P and denote by $SS_{n,r}(P)$ the subset of $S_{n,r}(P)$ consisting of all $\alpha \in S_{n,r}(P)$ such
that the Voronoi diagram of $\alpha$ is a P-tesselation. Then by Theorem 4.1,

\[ C_{n,r}(P) \subset S_{n,r}(P). \]

Note that any Voronoi partition $\{A_a : a \in \alpha\}$ with respect to $\alpha \in SS_{n,r}(P)$ and P
satisfies $A_a = W(a|\alpha)$ P-a.s., $a \in \alpha$. We also write $S_{n,r}(X)$ and $SS_{n,r}(X)$ instead of
$S_{n,r}(P)$ and $SS_{n,r}(P)$, respectively.
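For $r = 2$ and the euclidean norm, condition (4.1) says that every center is the conditional mean of its Voronoi cell, which suggests a fixed-point iteration alternating a partition step and a centroid step (commonly called Lloyd's algorithm). The following Python sketch does this for an empirical distribution in dimension $d = 1$. The function name `lloyd`, the tie-breaking rule and the data are assumptions made for illustration. The iteration returns a stationary set, which need not be n-optimal.

```python
def lloyd(data, centers, steps=100):
    """Alternate a Voronoi partition step and a centroid step (r = 2, d = 1)."""
    centers = sorted(centers)
    for _ in range(steps):
        cells = [[] for _ in centers]
        for x in data:
            # ties go to the lowest index: one admissible choice of Voronoi partition
            i = min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            cells[i].append(x)
        # replace each center by the mean of its cell; keep it if the cell is empty
        new = [sum(c) / len(c) if c else a for c, a in zip(cells, centers)]
        if new == centers:
            break
        centers = new
    return centers

data = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0]
print(lloyd(data, [0.0, 1.0]))
```

Starting from the poor initial guess (0, 1), the iteration settles at the cell means of {0, 1, 2, 3} and {10, 11}.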

4.3 Corollary
(a) Let $\mathcal{A}$ be an n-optimal partition for P of order r. Then $|\mathcal{A}| = n$, $P(A) > 0$ for
every $A \in \mathcal{A}$, $C_r(P(\cdot\,|A)) \cap C_r(P(\cdot\,|B)) = \emptyset$ for every $A, B \in \mathcal{A}$, $A \ne B$, and $\mathcal{A}$
is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha = \{a_A : A \in \mathcal{A}\}$ and P for any
choice of $a_A \in C_r(P(\cdot\,|A))$.
(b) Let $f \in \mathcal{F}_n$ be an n-optimal quantizer for P of order r and let $\alpha = f(\mathbb{R}^d)$. Then
$\alpha \in S_{n,r}(P)$, $\{\{f = a\} : a \in \alpha\}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$
and P, $P(\{f = a\}) > 0$ and $a \in C_r(P(\cdot\,|\{f = a\}))$ for every $a \in \alpha$.

Proof
(a) We have

\[ V_{n,r}(P) = \sum_{A \in \mathcal{A}} V_r(P(\cdot\,|A)) P(A) = \sum_{A \in \mathcal{A}} \int_A \|x - a_A\|^r \, dP(x) \]
\[ \ge \sum_{A \in \mathcal{A}} \int_A \min_{b \in \alpha} \|x - b\|^r \, dP(x) = \int \min_{b \in \alpha} \|x - b\|^r \, dP(x) \ge V_{n,r}(P). \]

This implies $\alpha \in C_{n,r}(P)$ and

\[ \int_A \|x - a_A\|^r \, dP(x) = \int_A \min_{b \in \alpha} \|x - b\|^r \, dP(x), \quad A \in \mathcal{A}. \]

Therefore, $\mathcal{A}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$ and P. The remaining
assertions follow from Theorem 4.1.
(b) As in (a) one can check that $\{\{f = a\} : a \in \alpha\}$ is a Voronoi partition of $\mathbb{R}^d$ with
respect to $\alpha$ and P. The remaining assertions follow from Theorem 4.1. □
Under the condition $C_{n,r}(P) \subset SS_{n,r}(P)$ there is a characterization of optimal quan-
tizing measures.

4.4 Lemma
Suppose $C_{n,r}(P) \subset SS_{n,r}(P)$, that is, the Voronoi diagram of every $\alpha \in C_{n,r}(P)$ is a
P-tesselation of $\mathbb{R}^d$. Then the set of n-optimal quantizing measures for P of order r
coincides with the set

\[ \{P^f : f \in \mathcal{F}_n \ \text{n-optimal for P of order r}\}. \]

Proof
Let $Q = \sum_{a \in \alpha} p_a \delta_a$ be an n-optimal quantizing measure of order r. Choose a Borel
probability $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals P and Q such that
$\rho_r(P,Q)^r = \int \|x - y\|^r \, d\mu(x,y)$ and let $f = \sum_{a \in \alpha} a 1_{A_a}$, where $\{A_a : a \in \alpha\}$ denotes a Voronoi partition
of $\mathbb{R}^d$ with respect to $\alpha$. Then f is an n-optimal quantizer and $\alpha \in C_{n,r}(P)$.
Therefore

\[ \sum_{a \in \alpha} \int_{\mathbb{R}^d \times \{a\}} \|x - a\|^r \, d\mu(x,y) = \int_{\mathbb{R}^d \times \alpha} \|x - y\|^r \, d\mu(x,y) = \rho_r(P,Q)^r = V_{n,r}(P) \]
\[ = \int \min_{b \in \alpha} \|x - b\|^r \, dP(x) = \sum_{a \in \alpha} \int_{\mathbb{R}^d \times \{a\}} \min_{b \in \alpha} \|x - b\|^r \, d\mu(x,y). \]

This implies

\[ \mathbb{R}^d \times \{a\} \subset W(a|\alpha) \times \mathbb{R}^d \quad \mu\text{-a.s.} \]

for every $a \in \alpha$. Hence $p_a \le P(W(a|\alpha))$, $a \in \alpha$. It follows from the assumption
that $\sum_{a \in \alpha} P(W(a|\alpha)) = 1$. Since $\sum_{a \in \alpha} p_a = \sum_{a \in \alpha} P(A_a) = 1$, one obtains
$p_a = P(W(a|\alpha)) = P(A_a)$, $a \in \alpha$. This gives $Q = P^f$. The converse inclusion was already
mentioned in Section 3. □
In general, the Voronoi diagram of an n-optimal set of centers $\alpha \in C_{n,r}(P)$ need not
be a P-tesselation and also, the assertion of Lemma 4.4 may fail. This is exhibited
by the following example.

4.5 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_\infty$-norm. Consider
$P = \frac14(\delta_{(-1,0)} + \delta_{(0,1)} + \delta_{(1,0)} + \delta_{(0,-1)})$ and let $n = 2$, $r = 1$. It is geometrically rather obvious that $V_{2,1}(P) = 1/2$
and $C_{2,1}(P)$ consists of all sets $\{a,b\}$ with $a, b \in \{x \in \mathbb{R}^2 : |x_1| + |x_2| = 1\}$ such that
the line segment joining a and b meets the line $\{x_1 = 0\}$; see Figure 4.1.
Now let $a = (-1,0)$ and $b = (1,0)$. Then $\{a,b\} \in C_{2,1}(P)$ and $S(a,b)$ contains
the line through $(0,-1)$ and $(0,1)$. One obtains $P(S(a,b)) = 1/2 > 0$ and hence,
the Voronoi diagram $\{H(a,b), H(b,a)\}$ of $\{a,b\}$ is not a P-tesselation. Furthermore,
the probability $Q = \frac58 \delta_a + \frac38 \delta_b$ is a 2-optimal quantizing measure for P. In fact, let
$x_1 = (-1,0)$, $x_2 = (0,1)$, $x_3 = (1,0)$, $x_4 = (0,-1)$, and define a discrete probability $\mu$
on $\mathbb{R}^2 \times \mathbb{R}^2$ by

\[ \mu(\{(x_1,a)\}) = \mu(\{(x_3,b)\}) = \mu(\{(x_4,a)\}) = 1/4, \]
\[ \mu(\{(x_2,a)\}) = \mu(\{(x_2,b)\}) = 1/8. \]

Then the marginals of $\mu$ are P and Q, respectively, and

\[ \int \|x - y\| \, d\mu(x,y) = 1/2. \]

This yields $V_{2,1}(P) = \rho_1(P,Q)$. Obviously, $Q \ne P^f$ for every $f \in \mathcal{F}_2$.

Figure 4.1: 2-optimal centers of order 1 with respect to the $l_\infty$-norm
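The arithmetic of Example 4.5 can be checked exactly. The Python sketch below assumes the $l_\infty$-norm, the centers $a = (-1,0)$, $b = (1,0)$ and a coupling with masses 1/4, 1/4, 1/4, 1/8, 1/8 in which $x_4$ is sent to $a$. All variable names are illustrative.

```python
from fractions import Fraction as F

def dist(x, y):
    """l_infinity distance on R^2."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

support = [(-1, 0), (0, 1), (1, 0), (0, -1)]     # x1, x2, x3, x4, each of mass 1/4
a, b = (-1, 0), (1, 0)

# quantization error of order 1 of the set {a, b}
error = sum(F(1, 4) * min(dist(x, a), dist(x, b)) for x in support)

# coupling mu stored as {(x, y): mass}
mu = {((-1, 0), a): F(1, 4), ((1, 0), b): F(1, 4), ((0, -1), a): F(1, 4),
      ((0, 1), a): F(1, 8), ((0, 1), b): F(1, 8)}
cost = sum(m * dist(x, y) for (x, y), m in mu.items())

# second marginal Q and the P-mass of the separator S(a, b)
Q = {c: sum(m for (_, y), m in mu.items() if y == c) for c in (a, b)}
on_separator = sum(F(1, 4) for x in support if dist(x, a) == dist(x, b))

print(error, cost, Q[a], Q[b], on_separator)
```

The masses of Q are not multiples of 1/4, which is why Q cannot be the image of P under any quantizer.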

4.6 Remark (Euclidean norms)
Let $\|x\| = \langle x, x \rangle^{1/2}$ for some scalar product $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^d$.
(a) We have

\[ \bigcup \{\alpha : \alpha \in S_{n,r}(P)\} \subset \mathrm{cl\,conv}(\mathrm{supp}(P)). \]

This follows from Lemma 2.6(a). Furthermore, by Theorem 4.2, $C_{n,r}(P) \subset SS_{n,r}(P)$
provided $r > 1$. Recall that in case $r = 2$, the second condition of (4.1) means
$a = E(X \,|\, X \in W(a|\alpha))$, $a \in \alpha$.

(b) If $\alpha \in SS_{n,2}(P)$, then

\[ \sum_{a \in \alpha} a P(W(a|\alpha)) = EX. \]

In case $d = 1$, a simple but sometimes useful consequence is that $\min \alpha < EX < \max \alpha$
holds for $\alpha \in SS_{n,2}(P)$ ($n \ge 2$).
(c) If $\alpha \in SS_{n,2}(P)$, then

\[ E \min_{a \in \alpha} \|X - a\|^2 = E\|X\|^2 - \sum_{a \in \alpha} \|a\|^2 P(W(a|\alpha)) = V_2(X) + \|EX\|^2 - \sum_{a \in \alpha} \|a\|^2 P(W(a|\alpha)). \]

Hence, for $\alpha \subset \mathbb{R}^d$ we have $\alpha \in C_{n,2}(P)$ if and only if $\alpha \in SS_{n,2}(P)$ and

\[ \sum_{a \in \alpha} \|a\|^2 P(W(a|\alpha)) = \max_{\beta \in SS_{n,2}(P)} \sum_{b \in \beta} \|b\|^2 P(W(b|\beta)). \]

(d) If $f \in \mathcal{F}_n$ is an n-optimal quantizer for P of order 2, then

\[ E f(X) = EX, \]
\[ E \langle X - f(X), f(X) \rangle = 0, \]
\[ V_{n,2}(X) = E\|X - f(X)\|^2 = E\|X\|^2 - E\|f(X)\|^2. \]

This follows from (b) and (c).
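The identities in (b), (c) and (d) can be verified exactly on a toy case. The sketch below takes P uniform on {0, 1, 2, 3} and the set {1/2, 5/2}, which is 2-stationary of order 2 and, as a comparison with the other partitions into two intervals shows, also 2-optimal. The names `data`, `alpha` and `f` are illustrative assumptions.

```python
from fractions import Fraction as F

data = [F(0), F(1), F(2), F(3)]        # P uniform on {0, 1, 2, 3}
alpha = [F(1, 2), F(5, 2)]             # cell means of {0, 1} and {2, 3}

def f(x):
    """Nearest-center quantizer; no ties occur for this data."""
    return min(alpha, key=lambda a: (x - a) ** 2)

n = len(data)
EX = sum(data) / n
Ef = sum(f(x) for x in data) / n
EX2 = sum(x * x for x in data) / n
Ef2 = sum(f(x) ** 2 for x in data) / n
err = sum((x - f(x)) ** 2 for x in data) / n
cross = sum((x - f(x)) * f(x) for x in data) / n

print(EX, Ef, err, EX2 - Ef2, cross)
```

The mean is preserved, the quantization error is orthogonal to the quantized variable, and the error equals the drop in second moment.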


The sets $S_{n,r}(X)$ and $SS_{n,r}(X)$ have the same equivariance property as $C_{n,r}(X)$.

4.7 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation. Then

\[ S_{n,r}(T(X)) = T S_{n,r}(X), \]
\[ SS_{n,r}(T(X)) = T SS_{n,r}(X). \]

Proof
Easy consequence of the equivariance properties of Voronoi regions and Cr(X) given
in Lemmas 1.6 and 2.1 (a). []
Stationary product quantizers are discussed in the following lemma.

4.8 Lemma (Product quantizers)
Let the underlying norm be the $l_r$-norm. Let $n_i \in \mathbb{N}$, $\beta_i \subset \mathbb{R}$ with $|\beta_i| \le n_i$, $1 \le i \le d$,
and $\alpha = \times_{i=1}^d \beta_i$.

(a) Suppose that $X_1, \ldots, X_d$ are independent and let $n = \prod_{i=1}^d n_i$. Then $\alpha \in S_{n,r}(X)$
if and only if $\beta_i \in S_{n_i,r}(X_i)$ for every i.

(b) If $\beta_i \in C_{n_i,r}(X_i)$ for every i, then

\[ E \min_{a \in \alpha} \|X - a\|^r = \sum_{i=1}^d V_{n_i,r}(X_i). \]

Proof
(a) Let $P_i$ denote the distribution of $X_i$, $1 \le i \le d$. For $a = (b_1, \ldots, b_d) \in \alpha$ with
$b_i \in \beta_i$ for every i, we have

\[ W(a|\alpha) = \times_{i=1}^d W(b_i|\beta_i). \]

Assume $\alpha \in S_{n,r}(X)$. Then $\prod_{i=1}^d |\beta_i| = |\alpha| = n$ and

\[ \prod_{i=1}^d P_i(W(b_i|\beta_i)) = P(W(a|\alpha)) > 0, \quad a = (b_1, \ldots, b_d) \in \alpha, \]

which gives $|\beta_i| = n_i$ and $P_i(W(b_i|\beta_i)) > 0$ for every i. Fix i and let $c = (c_1, \ldots, c_d) \in \mathbb{R}^d$
with $c_j = b_j$ for $j \ne i$. Then

\[ \sum_{j=1}^d \int_{W(a|\alpha)} |x_j - b_j|^r \, dP(x) = \int_{W(a|\alpha)} \|x - a\|^r \, dP(x) \le \int_{W(a|\alpha)} \|x - c\|^r \, dP(x) \]
\[ = \sum_{j \ne i} \int_{W(a|\alpha)} |x_j - b_j|^r \, dP(x) + \int_{W(a|\alpha)} |x_i - c_i|^r \, dP(x) \]

and hence

\[ \int_{W(a|\alpha)} |x_i - b_i|^r \, dP(x) \le \int_{W(a|\alpha)} |x_i - c_i|^r \, dP(x). \]

Since for $c_i \in \mathbb{R}$

\[ \int_{W(b_i|\beta_i)} |x_i - c_i|^r \, dP_i(x_i) / P_i(W(b_i|\beta_i)) = \int_{W(a|\alpha)} |x_i - c_i|^r \, dP(x) / P(W(a|\alpha)), \]

this yields $b_i \in C_r(P_i(\cdot\,|W(b_i|\beta_i)))$. Therefore, $\beta_i \in S_{n_i,r}(X_i)$.

Conversely, assume $\beta_i \in S_{n_i,r}(X_i)$ for every i. Then $|\alpha| = n$ and $P(W(a|\alpha)) > 0$. Let
$c = (c_1, \ldots, c_d) \in \mathbb{R}^d$. One obtains

\[ \int_{W(a|\alpha)} \|x - a\|^r \, dP(x) / P(W(a|\alpha)) = \sum_{i=1}^d \int_{W(b_i|\beta_i)} |x_i - b_i|^r \, dP_i(x_i) / P_i(W(b_i|\beta_i)) \]
\[ \le \sum_{i=1}^d \int_{W(b_i|\beta_i)} |x_i - c_i|^r \, dP_i(x_i) / P_i(W(b_i|\beta_i)) = \int_{W(a|\alpha)} \|x - c\|^r \, dP(x) / P(W(a|\alpha)) \]

and hence $a \in C_r(P(\cdot\,|W(a|\alpha)))$. Therefore, $\alpha \in S_{n,r}(X)$.
(b) We have

\[ E \min_{a \in \alpha} \|X - a\|^r = \sum_{i=1}^d E \min_{b \in \beta_i} |X_i - b|^r = \sum_{i=1}^d V_{n_i,r}(X_i). \]

□
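The first equality in part (b) holds for any product set, not only for optimal ones, because the r-th power of the $l_r$-norm is a sum over coordinates. A minimal Python sketch on an arbitrary finite sample, with illustrative helper names `err_1d` and `err_product`:

```python
from itertools import product

def err_1d(sample, beta, r):
    """Mean of min_{b in beta} |x - b|^r over a one-dimensional sample."""
    return sum(min(abs(x - b) ** r for b in beta) for x in sample) / len(sample)

def err_product(sample, betas, r):
    """Mean of min_{a in alpha} ||x - a||_r^r for alpha = beta_1 x ... x beta_d."""
    alpha = list(product(*betas))
    total = 0.0
    for x in sample:
        # the r-th power of the l_r-norm is the sum of |x_j - a_j|^r
        total += min(sum(abs(xj - aj) ** r for xj, aj in zip(x, a)) for a in alpha)
    return total / len(sample)

r = 3
sample = [(0.0, 1.0), (2.0, -1.0), (0.5, 4.0), (3.0, 0.0), (-1.0, 2.0)]
betas = [[0.0, 2.5], [-0.5, 1.5, 4.0]]

lhs = err_product(sample, betas, r)
rhs = sum(err_1d([x[i] for x in sample], betas[i], r) for i in range(2))
print(lhs, rhs)
```

The two numbers agree up to rounding, since the minimum over a product set of a coordinatewise sum splits into coordinatewise minima.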
The n-stationary sets for P are related to the stationary points of the function $\psi_{n,r}$
(see (3.4)).
4.9 Lemma
$\psi_{n,r}$ is continuous on $(\mathbb{R}^d)^n$.

Proof
Immediate consequence of the continuity of $(a_1, \ldots, a_n) \mapsto \min_{1 \le i \le n} \|x - a_i\|^r$ for
every $x \in \mathbb{R}^d$ and Lebesgue's dominated convergence theorem. □
4.10 Lemma
Let $a_1, \ldots, a_n \in \mathbb{R}^d$ with $a_i \ne a_j$ for $i \ne j$. Suppose the Voronoi diagram of
$\alpha = \{a_1, \ldots, a_n\}$ is a P-tesselation of $\mathbb{R}^d$. Then $\psi_{n,r}$ has a one-sided directional derivative
at $a = (a_1, \ldots, a_n)$ in every direction $y = (y_1, \ldots, y_n) \in (\mathbb{R}^d)^n$ given by

\[ \nabla^+ \psi_{n,r}(a, y) = r \sum_{i=1}^n \int_{W(a_i|\alpha)} \|x - a_i\|^{r-1} \nabla^+\|\cdot\|(a_i - x, y_i) \, dP(x). \]

If the underlying norm is smooth and furthermore, $r > 1$ or $P(\alpha) = 0$, then $\psi_{n,r}$ is
differentiable at the point a with derivative

\[ \nabla \psi_{n,r}(a) = \Big( r \int_{W(a_i|\alpha) \setminus \{a_i\}} \|x - a_i\|^{r-1} \nabla\|\cdot\|(a_i - x) \, dP(x) \Big)_{1 \le i \le n}. \]
Proof
Recall that

\[ W_0(a_i|\alpha) = \bigcap_{j \ne i} \{x \in \mathbb{R}^d : \|x - a_i\| < \|x - a_j\|\}. \]

For $b = (b_1, \ldots, b_n) \in (\mathbb{R}^d)^n$ set $d(x,b) = \min_{1 \le i \le n} \|x - b_i\|$. By assumption, the
Voronoi diagram of $\alpha$ is a P-tesselation of $\mathbb{R}^d$ which gives $P(\bigcup_{i=1}^n W_0(a_i|\alpha)) = 1$.
Furthermore, we have

\[ |d(x, a+b) - d(x,a)| \le \max_{1 \le i \le n} \big|\, \|x - a_i - b_i\| - \|x - a_i\| \,\big| \le \max_{1 \le i \le n} \|b_i\| \]

and since

\[ |u^r - v^r| \le r \max\{u^{r-1}, v^{r-1}\} |u - v|, \quad u, v \ge 0, \]

we obtain

\[ |d(x, a+b)^r - d(x,a)^r| \le (C_1 \|x\|^{r-1} + C_2) \max_{1 \le i \le n} \|b_i\|, \tag{4.2} \]

$x \in \mathbb{R}^d$, $b \in (\mathbb{R}^d)^n$ with $\max_{1 \le i \le n} \|b_i\| \le 1$ for numerical constants $C_1, C_2 \ge 0$ not
depending on x and b.
Let $y = (y_1, \ldots, y_n) \in (\mathbb{R}^d)^n$. Then

\[ t^{-1}(\psi_{n,r}(a + ty) - \psi_{n,r}(a)) = \sum_{i=1}^n \int_{W_0(a_i|\alpha)} t^{-1}(d(x, a+ty)^r - d(x,a)^r) \, dP(x). \]

For $x \in W_0(a_i|\alpha)$, there exists $\varepsilon > 0$ such that the $\mathbb{R}^d$-components of $a + ty$ are
pairwise different and

\[ x \in W(a_i + t y_i \,|\, \{a_1 + t y_1, \ldots, a_n + t y_n\}) \]

for every $0 < t \le \varepsilon$. This implies

\[ t^{-1}(d(x, a+ty)^r - d(x,a)^r) = t^{-1}(\|x - (a_i + t y_i)\|^r - \|x - a_i\|^r) \to \nabla^+\|\cdot\|^r(a_i - x, y_i) \quad \text{as } t \to 0+. \]

Thus the assertion about the one-sided directional derivative of $\psi_{n,r}$ follows from
Lebesgue's dominated convergence theorem in view of (4.2).
Now assume that the underlying norm is smooth. Then we have

\[ \Big(\max_{1 \le j \le n} \|b_j\|\Big)^{-1} \Big| \psi_{n,r}(a+b) - \psi_{n,r}(a) - \sum_{i=1}^n \Big\langle \int_{W(a_i|\alpha) \setminus \{a_i\}} \nabla\|\cdot\|^r(a_i - x) \, dP(x), b_i \Big\rangle \Big| \]
\[ \le \Big(\max_{1 \le j \le n} \|b_j\|\Big)^{-1} \sum_{i=1}^n \int_{W_0(a_i|\alpha) \setminus \{a_i\}} \big| d(x, a+b)^r - d(x,a)^r - \langle \nabla\|\cdot\|^r(a_i - x), b_i \rangle \big| \, dP(x), \]

where $\langle x, z \rangle = \sum_{j=1}^d x_j z_j$, $x, z \in \mathbb{R}^d$. For $x \in W_0(a_i|\alpha)$, there exists $\varepsilon > 0$ such that the
$\mathbb{R}^d$-components of $a + b$ are pairwise different and

\[ x \in W(a_i + b_i \,|\, \{a_1 + b_1, \ldots, a_n + b_n\}) \]

for every $b \in (\mathbb{R}^d)^n$ with $\max_{1 \le j \le n} \|b_j\| < \varepsilon$. This implies

\[ \Big(\max_{1 \le j \le n} \|b_j\|\Big)^{-1} \big( d(x, a+b)^r - d(x,a)^r - \langle \nabla\|\cdot\|^r(a_i - x), b_i \rangle \big) \to 0 \]

as $\max_{1 \le j \le n} \|b_j\| \to 0$ ($x \ne a_i$ if $r = 1$).

In view of (4.2), the assertion about $\nabla \psi_{n,r}(a)$ follows from Lebesgue's dominated
convergence theorem. □
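For $r = 2$, $d = 1$ and the absolute value as norm, the integrand in the derivative formula reduces to $2(a_i - x)$, so the formula can be compared with a finite difference quotient. The Python sketch below does this for an empirical distribution whose Voronoi boundary carries no mass. The sample, the centers and the step size `h` are arbitrary illustrative choices.

```python
def psi(a, data):
    """Empirical psi_{n,2} in dimension d = 1."""
    return sum(min((x - ai) ** 2 for ai in a) for x in data) / len(data)

def grad(a, data):
    """Derivative formula for r = 2, d = 1: 2 * integral over W(a_i) of (a_i - x) dP."""
    g = [0.0] * len(a)
    for x in data:
        i = min(range(len(a)), key=lambda k: (x - a[k]) ** 2)
        g[i] += 2 * (a[i] - x) / len(data)
    return g

data = [0.0, 1.0, 2.0, 3.0, 10.0]
a = [1.2, 6.0]                 # the cell boundary 3.6 carries no mass
h = 1e-6

numeric = []
for i in range(len(a)):
    up, dn = a[:], a[:]
    up[i] += h
    dn[i] -= h
    numeric.append((psi(up, data) - psi(dn, data)) / (2 * h))

print(grad(a, data), numeric)
```

Since neither partial derivative vanishes, the pair (1.2, 6.0) is not a stationary point for this sample.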
Consequently, in view of Lemma 2.5, n-stationary sets $\alpha \in SS_{n,r}(P)$ of centers provide
stationary points of $\psi_{n,r}$, i.e. $\nabla^+ \psi_{n,r}(a, y) \ge 0$ for every $y \in (\mathbb{R}^d)^n$. The following
example taken from Lloyd (1982) shows that an n-stationary set of centers does not
necessarily yield a local minimum point of $\psi_{n,r}$.

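4.11 Example
Let $P = c_1 U([-1,0]) + c_2 U([0,1])$ with $c_2 > c_1 > 0$, $c_1 + c_2 = 1$, and let $n = 2$, $r = 2$.
Then $EX = \frac12(c_2 - c_1)$, $EX^2 = \frac13$ and $\{-\frac12, \frac12\} \in S_{2,2}(P)$. For $-1 \le a_1 < a_2 \le 1$, we
have

\[ \psi(a_1, a_2) = \frac{c_1}{3}\Big[(1 + a_1)^3 + \frac{(a_2 - a_1)^3}{4} - a_2^3\Big] + \frac{c_2}{3}\big[(1 - a_2)^3 + a_2^3\big] \quad \text{if } a_1 + a_2 \le 0, \]
\[ \psi(a_1, a_2) = \frac{c_1}{3}\big[(1 + a_1)^3 - a_1^3\big] + \frac{c_2}{3}\Big[a_1^3 + \frac{(a_2 - a_1)^3}{4} + (1 - a_2)^3\Big] \quad \text{if } a_1 + a_2 > 0, \]

where $\psi = \psi_{2,2}$. One obtains $\psi(-\frac12, \frac12) = \frac{1}{12}$ and

\[ \psi\Big(-\frac12 + \varepsilon, \frac12\Big) = \frac{1}{12} + c_1 \varepsilon^2 + c_2 \Big(\frac{\varepsilon^3}{4} - \frac{\varepsilon^2}{4}\Big) < \frac{1}{12} \quad \text{for every } 0 < \varepsilon < 5 - \frac{4}{c_2} \]

provided $c_2 > \frac45$. Thus $(-\frac12, \frac12)$ is not a local minimum point of $\psi$ in case $c_2 > \frac45$. (It
is also not a local maximum point of $\psi$.) We have

\[ C_{2,2}(P) = \Big\{\Big\{-\frac12, \frac12\Big\}\Big\} \quad \text{if } c_2 \le \frac34, \]
\[ C_{2,2}(P) = \Big\{\Big\{\frac{10 c_2 - 9}{4 c_2}, \frac{6 c_2 - 3}{4 c_2}\Big\}\Big\} \quad \text{and} \quad S_{2,2}(P) = C_{2,2}(P) \cup \Big\{\Big\{-\frac12, \frac12\Big\}\Big\} \quad \text{if } c_2 > \frac34. \]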
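The behaviour described in Example 4.11 can be checked numerically. The Python sketch below integrates the piecewise constant density in closed form and compares the stationary pair (-1/2, 1/2) with a small perturbation and with the pair (0, 2/3). A direct computation (an assumption of this sketch, not a statement taken from the text) shows that (0, 2/3) satisfies the centroid condition for the weights $c_1 = 0.1$, $c_2 = 0.9$ chosen here.

```python
def psi(a1, a2, c1, c2):
    """psi_{2,2}(a1, a2) for P = c1*U([-1,0]) + c2*U([0,1]), assuming -1 <= a1 < a2 <= 1."""
    m = (a1 + a2) / 2                      # boundary between the two Voronoi cells

    def part(lo, hi, a):
        """Integral of (x - a)^2 dP(x) over [lo, hi], computed piece by piece."""
        total = 0.0
        for left, right, c in ((-1.0, 0.0, c1), (0.0, 1.0, c2)):
            l, h = max(lo, left), min(hi, right)
            if h > l:
                total += c * ((h - a) ** 3 - (l - a) ** 3) / 3
        return total

    return part(-1.0, m, a1) + part(m, 1.0, a2)

c1, c2 = 0.1, 0.9
base = psi(-0.5, 0.5, c1, c2)              # value at the stationary pair
shifted = psi(-0.4, 0.5, c1, c2)           # left center moved by eps = 0.1
other = psi(0.0, 2.0 / 3.0, c1, c2)        # second stationary pair for these weights
print(base, shifted, other)
```

Moving only the left center already lowers the error below 1/12, and the pair (0, 2/3) lowers it further.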

The next theorem ensures the existence of n-optimal quantizers. We follow the lines
of Pollard's (1982a) proof for the euclidean case and $r = 2$.

4.12 Theorem (Existence)
We have $V_{n,r}(P) < V_{n-1,r}(P)$. The level set $\{\psi_{n,r} \le c\}$ is compact for every
$0 \le c < V_{n-1,r}(P)$. In particular, $C_{n,r}(P)$ is not empty and $\bigcup \{\alpha : \alpha \in C_{n,r}(P)\}$ is a bounded
subset of $\mathbb{R}^d$.

Proof
By Lemma 4.9, the level sets of $\psi_{n,r}$ are closed. Choose $0 < s < S$ (depending on
n, r, P and c) such that

\[ P(B(0,s)) > 0, \qquad (S - s)^r P(B(0,s)) > c, \]
\[ 2^r \int_{B(0,2S)^c} \|x\|^r \, dP(x) < V_{n-1,r}(P) - c. \]

Let $(a_1, \ldots, a_n) \in \{\psi_{n,r} \le c\}$. Since $c < V_{n-1,r}(P)$, we have $a_i \ne a_j$ for $i \ne j$. Assume
without loss of generality $\|a_1\| \le \ldots \le \|a_n\|$. Then $\|a_1\| \le S$. Otherwise

\[ c \ge \int_{B(0,s)} \min_{1 \le i \le n} \|x - a_i\|^r \, dP(x) \ge (S - s)^r P(B(0,s)), \]

a contradiction. Furthermore, $\|a_n\| \le 5S$. Otherwise

\[ \|x - a_1\| \le \|x - a_n\| 1_{B(0,2S)}(x) + 2\|x\| 1_{B(0,2S)^c}(x) \]

for every $x \in \mathbb{R}^d$ because $\|a_1\| \le S$. Therefore, if $\{A_1, \ldots, A_n\}$ denotes a Voronoi
partition of $\mathbb{R}^d$ with respect to $\{a_1, \ldots, a_n\}$, one gets

\[ V_{n-1,r}(P) \le \psi_{n-1,r}(a_1, \ldots, a_{n-1}) = \sum_{j=1}^n \int_{A_j} \min_{1 \le i \le n-1} \|x - a_i\|^r \, dP(x) \]
\[ \le \sum_{j=1}^{n-1} \int_{A_j} \|x - a_j\|^r \, dP(x) + \int_{A_n} \|x - a_1\|^r \, dP(x) \]
\[ \le \sum_{j=1}^{n} \int_{A_j} \|x - a_j\|^r \, dP(x) + 2^r \int_{B(0,2S)^c} \|x\|^r \, dP(x) \]
\[ < \psi_{n,r}(a_1, \ldots, a_n) + V_{n-1,r}(P) - c \le V_{n-1,r}(P), \]

a contradiction. We thus obtain

\[ \{\psi_{n,r} \le c\} \subset B(0, 5S)^n. \]

Hence, if $V_{n,r}(P) < c < V_{n-1,r}(P)$, the level set $\{\psi_{n,r} \le c\}$ is not empty and compact.
This implies that $\{\psi_{n,r} = V_{n,r}(P)\}$ is not empty and compact and, in particular,
$C_{n,r}(P) \ne \emptyset$ provided $V_{n,r}(P) < V_{n-1,r}(P)$.
Finally, we observe that this condition holds. We have $C_r(P) \ne \emptyset$ by Lemma 2.2.
Therefore, $V_{2,r}(P) < V_r(P)$, since otherwise there exists a 2-optimal set $\alpha$ of centers
with $|\alpha| = 1$ which contradicts Theorem 4.1. Proceed inductively: if $V_{m,r}(P) < V_{m-1,r}(P)$
for some $2 \le m \le n - 1$, then $C_{m,r}(P) \ne \emptyset$ by the preceding part of the
proof and hence, $V_{m+1,r}(P) < V_{m,r}(P)$ again by Theorem 4.1. □
From Example 4.5 we know that there may be more than one n-optimal set of centers
for P. Here is another example of this fact for a univariate symmetric distribution.
4.13 Example
Let P denote the uniform distribution on $[-2,-1] \cup [1,2]$ and let $n = 3$, $r = 2$. Then

\[ C_{3,2}(P) = \Big\{\Big\{-\frac32, \frac54, \frac74\Big\}, \Big\{-\frac74, -\frac54, \frac32\Big\}\Big\}. \]

(Note that $\alpha = \{-\frac32, 0, \frac32\}$ is not a 3-stationary set, since $P(W(0|\alpha)) = P([-\frac34, \frac34]) = 0$.
However, $(-\frac32, 0, \frac32)$ is a stationary point of $\psi_{3,2}$.) Here we have $V_2(P) = \mathrm{Var}\, X = 7/3$
and $V_{3,2}(P) = 5/96$. Thus already 3-level quantization reduces the variance
considerably.
For dimensions $d \ge 2$, typically $|C_{n,r}(P)| \ge 2$ holds. This is related to the equiv-
ariance property of $C_{n,r}(P)$ (see Lemma 3.2). A uniqueness criterion for univariate
distributions is discussed in the next section.
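The values in Example 4.13 can be reproduced with exact rational arithmetic. The Python sketch below computes the order-2 quantization error of a finite set of centers for the uniform distribution on $[-2,-1] \cup [1,2]$. The function name `error` and the way cells are represented are illustrative assumptions.

```python
from fractions import Fraction as F

INTERVALS = [(F(-2), F(-1)), (F(1), F(2))]   # support of P, density 1/2 on each piece

def error(centers):
    """Integral of min_a (x - a)^2 dP(x) for P uniform on [-2,-1] U [1,2]."""
    cs = sorted(centers)
    # Voronoi cells on the line are bounded by midpoints of neighbouring centers
    cuts = [None] + [(cs[i] + cs[i + 1]) / 2 for i in range(len(cs) - 1)] + [None]
    total = F(0)
    for k, a in enumerate(cs):
        lo, hi = cuts[k], cuts[k + 1]
        for left, right in INTERVALS:
            l = left if lo is None else max(left, lo)
            h = right if hi is None else min(right, hi)
            if h > l:
                total += F(1, 2) * ((h - a) ** 3 - (l - a) ** 3) / 3
    return total

print(error([F(0)]))                              # variance of X
print(error([F(-3, 2), F(5, 4), F(7, 4)]))        # two centers on the right piece
print(error([F(-3, 2), F(0), F(3, 2)]))           # the center 0 has an empty cell
```

The symmetric triple with a center at 0 performs no better than the pair {-3/2, 3/2}, because the middle cell carries no mass.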

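4.2 The functional $V_{n,r}$

The following simple properties of the n-th quantization error functional turn out to
be useful.
4.14 Lemma
Let $P = \sum_{i=1}^m s_i P_i$, $s_i \ge 0$, $\sum_{i=1}^m s_i = 1$, $\int \|x\|^r \, dP_i(x) < \infty$.

(a) (Concavity) $V_{n,r}(P) \ge \sum_{i=1}^m s_i V_{n,r}(P_i)$.

(b) Let $n_i \in \mathbb{N}$ with $\sum_{i=1}^m n_i \le n$, let $\alpha_i \in C_{n_i,r}(P_i)$ and $\alpha = \bigcup_{i=1}^m \alpha_i$. Then

\[ V_{n,r}(P) \le \int \min_{a \in \alpha} \|x - a\|^r \, dP(x) \le \sum_{i=1}^m s_i V_{n_i,r}(P_i). \]

Proof
(a) Let $\alpha \in C_{n,r}(P)$. Then

\[ V_{n,r}(P) = \int \min_{a \in \alpha} \|x - a\|^r \, dP(x) = \sum_{i=1}^m s_i \int \min_{a \in \alpha} \|x - a\|^r \, dP_i(x) \ge \sum_{i=1}^m s_i V_{n,r}(P_i). \]

(b) Since $|\alpha| \le n$, we have

\[ V_{n,r}(P) \le \int \min_{a \in \alpha} \|x - a\|^r \, dP(x) = \sum_{i=1}^m s_i \int \min_{a \in \alpha} \|x - a\|^r \, dP_i(x) \]
\[ \le \sum_{i=1}^m s_i \int \min_{a \in \alpha_i} \|x - a\|^r \, dP_i(x) = \sum_{i=1}^m s_i V_{n_i,r}(P_i). \]

□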
Let $X = (X_1, \ldots, X_d)$.

4.15 Lemma (One-dimensional marginals)
Let the underlying norm be the $l_r$-norm. If $n_i \in \mathbb{N}$, $\prod_{i=1}^d n_i \le n$, then

\[ V_{n,r}(X) \le \sum_{i=1}^d V_{n_i,r}(X_i) \]

and equality holds if and only if there exists $\alpha \in C_{n,r}(X)$ of the type $\alpha = \times_{i=1}^d \beta_i$ with
$\beta_i \subset \mathbb{R}$ and $|\beta_i| = n_i$ for every i. Moreover, such n-optimal product sets $\alpha$ satisfy
$\beta_i \in C_{n_i,r}(X_i)$ for every i.

Proof
For $1 \le i \le d$, let $\beta_i \subset \mathbb{R}$ with $|\beta_i| \le n_i$ and let $\alpha = \times_{i=1}^d \beta_i$. Then $|\alpha| \le \prod_{i=1}^d n_i \le n$
and

\[ V_{n,r}(X) \le E \min_{a \in \alpha} \|X - a\|^r = \sum_{i=1}^d E \min_{b \in \beta_i} |X_i - b|^r. \]

Therefore, if we choose $\beta_i \in C_{n_i,r}(X_i)$ for every i, we obtain

\[ V_{n,r}(X) \le \sum_{i=1}^d V_{n_i,r}(X_i) \]

(cf. Lemma 4.8 (b)) and if equality holds, then $\alpha \in C_{n,r}(X)$. In particular, $|\alpha| = n$
by Theorem 4.1 which gives $|\beta_i| = n_i$ for every i. Conversely, assume $\alpha \in C_{n,r}(X)$.
Then

\[ V_{n,r}(X) = \sum_{i=1}^d E \min_{b \in \beta_i} |X_i - b|^r \ge \sum_{i=1}^d V_{n_i,r}(X_i) \]

implying $V_{n,r}(X) = \sum_{i=1}^d V_{n_i,r}(X_i)$ and $\beta_i \in C_{n_i,r}(X_i)$ for every i. □

4.3 Quantization error for ball packings

Ball packings consisting of n translates of a ball minimize the normalized n-th quanti-
zation error for bounded sets. This observation extends the corresponding statement
of Lemma 2.9 for balls to the case $n \ge 2$. By a $\mu$-packing in $\mathbb{R}^d$ we mean a countable
family $\{C_j : j \in \mathbb{N}\}$ of Borel sets $C_j \subset \mathbb{R}^d$ such that $\mu(C_i \cap C_j) = 0$ for $i \ne j$. A
$\lambda^d$-packing is simply called packing.

4.16 Theorem (Ball packing theorem)
Let $s > 0$ and $a_1, \ldots, a_n \in \mathbb{R}^d$ such that $\{B(a_i, s) : i = 1, \ldots, n\}$ is a packing in $\mathbb{R}^d$.
Let $B = \bigcup_{i=1}^n B(a_i, s)$. Then

\[ M_{n,r}(B) = \min\{M_{n,r}(A) : A \in \mathcal{B}(\mathbb{R}^d) \text{ bounded}, \lambda^d(A) > 0\}. \]

Moreover,

\[ M_{n,r}(B) = n^{-r/d} M_r(B(0,1)), \]
\[ \{a_1, \ldots, a_n\} \in C_{n,r}(U(B)), \]

and $f = \sum_{i=1}^n a_i 1_{B(a_i,s)}$ is ($U(B)$-a.s. equal to) an n-optimal quantizer for $U(B)$ of order
r.

Proof
Let $A \in \mathcal{B}(\mathbb{R}^d)$ be bounded with $\lambda^d(A) > 0$ and denote by Q the uniform distribution
$U(A)$. Let $\mathcal{C}$ be an n-optimal partition for Q of order r. By Corollary 4.3 we know
that $|\mathcal{C}| = n$ and $Q(C) > 0$ for every $C \in \mathcal{C}$. Note that $Q(\cdot\,|C) = U(A \cap C)$. One
obtains from Lemma 2.9

\[ V_{n,r}(Q) = \sum_{C \in \mathcal{C}} V_r(Q(\cdot\,|C)) Q(C) = \sum_{C \in \mathcal{C}} M_r(A \cap C) \lambda^d(A \cap C)^{r/d} Q(C) \]
\[ \ge M_r(B(0,1)) \lambda^d(A)^{r/d} \sum_{C \in \mathcal{C}} Q(C)^{(d+r)/d}. \]

Hölder's inequality with $p = (d+r)/d$ and $q = (d+r)/r$ gives

\[ 1 = \sum_{C \in \mathcal{C}} Q(C) \le \Big(\sum_{C \in \mathcal{C}} Q(C)^p\Big)^{1/p} n^{1/q} \]

and hence

\[ \sum_{C \in \mathcal{C}} Q(C)^{(d+r)/d} \ge n^{-r/d}. \]

This implies

\[ V_{n,r}(Q) \ge (\lambda^d(A)/n)^{r/d} M_r(B(0,1)) \]

and

\[ M_{n,r}(A) \ge n^{-r/d} M_r(B(0,1)). \]

Now let $B = \bigcup_{i=1}^n B(a_i, s)$ and denote by P the uniform distribution $U(B)$. Note
that $P = \frac{1}{n} \sum_{i=1}^n U(B(a_i, s))$. By Lemma 4.14 (b), we have

\[ V_{n,r}(P) \le \int \min_{1 \le i \le n} \|x - a_i\|^r \, dP(x) \le \frac{1}{n} \sum_{i=1}^n V_r(U(B(a_i, s))) \]
\[ = V_r(U(B(0,s))) = (\lambda^d(B)/n)^{r/d} M_r(B(0,1)). \]

The last equality follows from the scale invariance of $M_r$ (see Lemma 2.1). Thus we
obtain

\[ V_{n,r}(P) = \int \min_{1 \le i \le n} \|x - a_i\|^r \, dP(x) = V_r(U(B(0,s))) \]

and

\[ M_{n,r}(B) = n^{-r/d} M_r(B(0,1)). \]

Furthermore, let $\{A_1, \ldots, A_n\}$ be a Borel measurable partition of $\mathbb{R}^d$ with
$B \cap A_i \subset B(a_i, s)$ for every i and let $g = \sum_{i=1}^n a_i 1_{A_i}$. Then $f = g$ P-a.s. and since
$W(a_i|\{a_1, \ldots, a_n\}) \cap B = B(a_i, s)$, $\{A_1, \ldots, A_n\}$ is a Voronoi partition with respect
to $\{a_1, \ldots, a_n\}$ and P. Hence, the theorem is proved. □

4.4 Examples

We present some examples of optimal quantizers and stationary sets of centers for
dimensions $d \ge 2$. Optimal quantizers for several univariate distributions are given
in the next section.
4.17 Example (Uniform distribution on a cube and the cube quantizer)
Let $P = U([0,1]^d)$ and consider a tesselation of $[0,1]^d$ consisting of $n = k^d$ trans-
lates $C_1, \ldots, C_n$ of the cube $[0, \frac1k]^d$. Denote by $a_i$ the midpoint of $C_i$. Then
$\alpha = \{a_1, \ldots, a_n\} = \{\frac{2i-1}{2k} : i = 1, \ldots, k\}^d$ and by the symmetry of $C_i$ about $a_i$,
we have $a_i \in C_r(U(C_i)) = C_r(P(\cdot\,|C_i))$. Let $f_n = \sum_{i=1}^n a_i 1_{C_i}$; see Figure 4.2. From scale
and translation invariance of $M_r$ it follows that

\[ E\|X - f_n(X)\|^r = \sum_{i=1}^n \int_{C_i} \|x - a_i\|^r \, dx = \sum_{i=1}^n M_r(C_i) P(C_i)^{(d+r)/d} = n^{-r/d} M_r([0,1]^d). \]

Figure 4.2: Square quantizer for $U([0,1]^2)$

We see that n-level quantization for P reduces $V_r(P) = M_r([0,1]^d)$ at least by a factor
$n^{-r/d}$. Note that

\[ M_r([0,1]^d) = \int_{[-\frac12, \frac12]^d} \|x\|^r \, dx. \]

For instance, we have for the $l_r$-norm, $1 \le r < \infty$,

\[ M_r([0,1]^d) = \frac{d}{(1 + r) 2^r} \]

and for the $l_\infty$-norm

\[ M_r([0,1]^d) = M_r(B(0,1)) = \frac{d}{(d + r) 2^r} \]

(cf. Lemma 2.9). Note further that $\alpha \in S_{n,r}(P)$ and

\[ E\|X - f_n(X)\|^r = E \min_{1 \le i \le n} \|X - a_i\|^r = \rho_r^r(P, P^{f_n}) \]

provided $C_i = [0,1]^d \cap W(a_i|\alpha)$ for every i. This condition is satisfied, for instance, if
the underlying norm is the $l_p$-norm for $1 \le p \le \infty$.
The error of the cube quantizer $f_n$ is of optimal order $n^{-r/d}$ but the constant
$M_r([0,1]^d)$ is conjectured to be not optimal for common norms with one exception.
(This will be seen from the asymptotics for the n-th quantization error as $n \to \infty$
treated in Chapter II.) The exceptional case concerns the $l_\infty$-norm in arbitrary di-
mensions. In this case, we have $C_i = B(a_i, \frac{1}{2k})$ and therefore, by Theorem 4.16, $f_n$ is
(P-a.s. equal to) an n-optimal quantizer of order r and $\alpha \in C_{n,r}(P)$ for every $r \ge 1$.
In particular, for the $l_\infty$-norm we obtain

\[ V_{n,r}(P) = n^{-r/d} \frac{d}{(d + r) 2^r}. \tag{4.3} \]
4.18 Example (Spherical distributions)
Let the underlying norm be the $l_2$-norm on $\mathbb{R}^d$ and let P be a spherical probability,
that is, P is invariant under the orthogonal group $O(\mathbb{R}^d)$. Consider the case $r = 2$
and $n = 2$. Suppose $E\|X\|^2 < \infty$. If $\alpha = \{a_1, a_2\} \in SS_{2,2}(X)$ then
$0 = EX = \sum_{i=1}^2 a_i P(W(a_i|\alpha))$ by Remark 4.6 (b) and hence, one can find $T \in O(\mathbb{R}^d)$ such that
$Ta_1 = (c_1, 0, \ldots, 0)$ and $Ta_2 = (c_2, 0, \ldots, 0)$ with $c_1 < 0 < c_2$. By Lemma 4.7,
$T(\alpha) \in SS_{2,2}(X)$. Since

\[ W(Ta_i|T(\alpha)) = W(c_i|\{c_1, c_2\}) \times \mathbb{R}^{d-1}, \]

one gets $\{c_1, c_2\} \in SS_{2,2}(X_1)$. Note that $X_1$ is symmetric (about the origin). Now we
use a uniqueness result which is discussed in the next section. Suppose the distribution
of $X_1$ is strongly unimodal. Then it follows from Theorem 5.1 that
$S_{2,2}(X_1) = \{\{-E|X_1|, E|X_1|\}\}$. Therefore, $c_1 = -E|X_1|$ and $c_2 = E|X_1|$. This yields

\[ C_{2,2}(X) = SS_{2,2}(X) = \{\{-a, a\} : a \in \mathbb{R}^d, \|a\| = E|X_1|\}. \]

Since $P(W(\pm a|\{-a, a\})) = \frac12$, it follows from Remark 4.6(c) that

\[ V_{2,2}(X) = E\|X\|^2 - (E|X_1|)^2 = d\, EX_1^2 - (E|X_1|)^2 = (d - 1) EX_1^2 + V_{2,2}(X_1). \]

4.19 Example (Uniform distribution on a euclidean ball)
Let the underlying norm be the $l_2$-norm on $\mathbb{R}^d$ and let $P = U(B(0,1))$. Then P is a
spherical distribution. Consider the case $n = 2$ and $r = 2$. The distribution function
of $X_1$ is given by

\[ F(t) = P([-1, t] \times \mathbb{R}^{d-1}) = \frac{1}{\lambda^d(B_d(0,1))} \int_{-1}^t \lambda^{d-1}\big(B_{d-1}(0, \sqrt{1 - y^2})\big) \, dy \]
\[ = \frac{\lambda^{d-1}(B_{d-1}(0,1))}{\lambda^d(B_d(0,1))} \int_{-1}^t (1 - y^2)^{(d-1)/2} \, dy \]
\[ = \frac{\Gamma(1 + \frac d2)}{\sqrt{\pi}\, \Gamma(\frac12 + \frac d2)} \int_{-1}^t (1 - y^2)^{(d-1)/2} \, dy, \quad |t| \le 1. \]

We thus see that $X_1$ has a Beta distribution. Since $\log[(1 - y^2)^{(d-1)/2}]$ is concave on
$(-1, 1)$, the distribution of $X_1$ is strongly unimodal. So Example 4.18 applies. We
have

\[ E|X_1| = \frac{2 \Gamma(1 + \frac d2)}{\sqrt{\pi}\, \Gamma(\frac12 + \frac d2)} \int_0^1 y (1 - y^2)^{(d-1)/2} \, dy = \frac{2 \Gamma(1 + \frac d2)}{(d + 1) \sqrt{\pi}\, \Gamma(\frac12 + \frac d2)}, \]
\[ EX_1^2 = \frac{1}{d + 2} \]

and hence

\[ V_{2,2}(X) = \frac{d}{d + 2} - (E|X_1|)^2. \]
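The closed forms of Example 4.19 are easy to evaluate for small dimensions. The following Python sketch uses the standard library gamma function. The function names `mean_abs_x1` and `v22` are illustrative assumptions.

```python
from math import gamma, pi, sqrt

def mean_abs_x1(d):
    """E|X_1| for X uniform on the euclidean unit ball of R^d."""
    return 2 * gamma(1 + d / 2) / ((d + 1) * sqrt(pi) * gamma(0.5 + d / 2))

def v22(d):
    """V_{2,2}(X) = d/(d+2) - (E|X_1|)^2."""
    return d / (d + 2) - mean_abs_x1(d) ** 2

for d in (1, 2, 3):
    print(d, mean_abs_x1(d), v22(d))
```

For $d = 1$ this reproduces the familiar values 1/2 and 1/12 of the two-level quantizer of the uniform distribution on $[-1, 1]$.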
4.20 Example (d-dimensional standard normal distribution)
Let the underlying norm be the $l_2$-norm and let $P = N_d(0, I_d)$ where $I_d$ is the unit
matrix. Let $r = 2$. Here Example 4.18 applies and we obtain

\[ C_{2,2}(X) = SS_{2,2}(X) = \{\{-a, a\} : a \in \mathbb{R}^d, \|a\| = \sqrt{2/\pi}\} \]

and

\[ V_{2,2}(X) = d - \frac{2}{\pi}. \]

Now assume $d = 2$. For $n = 3$, we immediately find two types of 3-stationary sets
of centers. Let $\alpha_1 = \gamma \times \{0\}$, where $\gamma = \{-c, 0, c\}$ with $c > 0$ is the uniquely
determined 3-optimal set of centers for $X_1$ of order 2 (cf. Theorem 5.1). By Lemma
4.8, $\alpha_1 \in S_{3,2}(X)$ and

\[ E \min_{a \in \alpha_1} \|X - a\|^2 = V_{3,2}(X_1) + 1. \]

The numerical solution is given by $c = 1.2240$ and

\[ E \min_{a \in \alpha_1} \|X - a\|^2 = 1.1902 \]

(cf. Table 5.1). As second configuration consider

\[ \alpha_2 = \Big\{ (0, b), \Big(-\frac{\sqrt3}{2} b, -\frac b2\Big), \Big(\frac{\sqrt3}{2} b, -\frac b2\Big) \Big\} \]

with $b > 0$. Then $\mathrm{conv}\, \alpha_2$ is an equilateral triangle and $P(W(a|\alpha_2)) = 1/3$, $a \in \alpha_2$. If b
satisfies

\[ b = 3 \int_{W((0,b)|\alpha_2)} x_2 \, dN_2(0, I_2)(x) = 3 \int \int_{|x_1|/\sqrt3}^{\infty} x_2 \, dN(0,1)(x_2) \, dN(0,1)(x_1) \]
\[ = 3 \int \varphi(|x_1|/\sqrt3) \, dN(0,1)(x_1) = \frac{3\sqrt3}{2\sqrt{2\pi}} = 1.0364\ldots, \]

where $\varphi$ denotes the $\lambda$-density of $N(0,1)$, then $\alpha_2 \in S_{3,2}(X)$ and by Remark 4.6 (c)

\[ E \min_{a \in \alpha_2} \|X - a\|^2 = 2 - \sum_{a \in \alpha_2} \|a\|^2 / 3 = 2 - \frac{27}{8\pi} = 0.9257\ldots. \]

Note that $\alpha_2$ is considerably better than $\alpha_1$. Flury (1990) provides numerical evidence
that $\alpha_2 \in C_{3,2}(X)$.
For $n = 4$, we find three types of 4-stationary sets of centers. Let $\beta_1 = \gamma \times \{0\}$, where
$\gamma = \{-c_2, -c_1, c_1, c_2\}$ with $0 < c_1 < c_2$ is the uniquely determined 4-optimal set of
centers for $X_1$ of order 2 (cf. Theorem 5.1). Then by Lemma 4.8, $\beta_1 \in S_{4,2}(X)$ and

\[ E \min_{a \in \beta_1} \|X - a\|^2 = V_{4,2}(X_1) + 1. \]

The numerical solution is given by $c_1 = 0.4528$, $c_2 = 1.5104$ and

\[ E \min_{a \in \beta_1} \|X - a\|^2 = 1.1175 \]

(cf. Table 5.1). Next consider

\[ \beta_2 = \Big\{ (0, 0), (0, b), \Big(-\frac{\sqrt3}{2} b, -\frac b2\Big), \Big(\frac{\sqrt3}{2} b, -\frac b2\Big) \Big\} \]

with $b > 0$, where b solves the equation

\[ b \int_{b/2}^{\infty} (2\Phi(\sqrt3 y) - 1) \, dN(0,1)(y) = \int_{b/2}^{\infty} (2\Phi(\sqrt3 y) - 1) y \, dN(0,1)(y). \]

Here $\Phi$ denotes the distribution function of $N(0,1)$. Then for $a \in \beta_2$, $a \ne (0,0)$

\[ a P(W(a|\beta_2)) = \int_{W(a|\beta_2)} x \, dP(x), \]
\[ P(W(a|\beta_2)) = P(W((0,b)|\beta_2)) = \int_{b/2}^{\infty} (2\Phi(\sqrt3 y) - 1) \, dN(0,1)(y). \]

Since $\sum_{a \in \beta_2} a = (0,0)$, this implies

\[ \int_{W((0,0)|\beta_2)} x \, dP(x) = (0, 0). \]

Therefore, $\beta_2 \in S_{4,2}(X)$ and by Remark 4.6 (c)

\[ E \min_{a \in \beta_2} \|X - a\|^2 = 2 - \sum_{a \in \beta_2} \|a\|^2 P(W(a|\beta_2)) = 2 - 3 b^2 \int_{b/2}^{\infty} (2\Phi(\sqrt3 y) - 1) \, dN(0,1)(y). \]

The numerical solution is given by $b = 1.2791$ and

\[ E \min_{a \in \beta_2} \|X - a\|^2 = 0.8203. \]

The product quantizer $\beta_3 = \{-\sqrt{2/\pi}, \sqrt{2/\pi}\}^2$ beats $\beta_1$ and $\beta_2$. In fact, by Lemma
4.8, $\beta_3 \in S_{4,2}(X)$ and

\[ E \min_{a \in \beta_3} \|X - a\|^2 = 2 V_{2,2}(X_1) = 2 - \frac{4}{\pi} = 0.7267\ldots. \]

Gray and Karnin (1982) provide some numerical evidence for their conjecture that
$\beta_i$, $i = 1, 2, 3$ are the only 4-stationary sets of centers of order 2 (up to $l_2$-isometries).
But a formal proof of this conjecture has not yet been given. Figure 4.3 shows the
above stationary sets and the corresponding Voronoi tesselations. (Instead of $\beta_3$, a
rotated version of $\beta_3$ is used.)
In three dimensions the product quantizer $\{-\sqrt{2/\pi}, \sqrt{2/\pi}\}^3$ can be improved upon.
For $n = 8$, Gray and Karnin (1982) give three different configurations that beat
the product quantizer. The authors report simulation results to show that these
quantizers are superior. Iyengar and Solomon (1983) provide similar results based on
numerical integration.
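The numerical values quoted in Example 4.20 for $d = 2$ can be recomputed. The Python sketch below evaluates the closed forms for the triangle and the product quantizer and solves the stationarity equation for the four-point configuration (origin plus equilateral triangle) by bisection. The Simpson rule, the truncation of the integrals at 12, the bracket [0.5, 2] and all function names are assumptions of this sketch.

```python
from math import erf, exp, pi, sqrt

def phi(y):
    """Density of N(0,1)."""
    return exp(-y * y / 2) / sqrt(2 * pi)

def Phi(y):
    """Distribution function of N(0,1)."""
    return 0.5 * (1 + erf(y / sqrt(2)))

def integral(g, lo, hi=12.0, steps=2000):
    """Composite Simpson rule on [lo, hi]; the tail beyond hi is neglected."""
    h = (hi - lo) / steps
    s = g(lo) + g(hi)
    s += sum((4 if j % 2 else 2) * g(lo + j * h) for j in range(1, steps))
    return s * h / 3

def cell_mass(b):
    """Mass of the Voronoi cell of (0, b) in the four-point configuration."""
    return integral(lambda y: (2 * Phi(sqrt(3) * y) - 1) * phi(y), b / 2)

def gap(b):
    """Left side minus right side of the stationarity equation for b."""
    return b * cell_mass(b) - integral(lambda y: (2 * Phi(sqrt(3) * y) - 1) * y * phi(y), b / 2)

lo, hi = 0.5, 2.0
g_lo = gap(lo)
for _ in range(40):                        # bisection on the sign change of gap
    mid = (lo + hi) / 2
    g_mid = gap(mid)
    if g_lo * g_mid > 0:
        lo, g_lo = mid, g_mid
    else:
        hi = mid
b4 = (lo + hi) / 2
err4 = 2 - 3 * b4 ** 2 * cell_mass(b4)

b3 = 3 * sqrt(3) / (2 * sqrt(2 * pi))
err3 = 2 - b3 ** 2
err_product = 2 - 4 / pi
print(b3, err3, b4, err4, err_product)
```

The ordering of the three errors matches the comparison made in the example: the product quantizer has the smallest error of the configurations considered.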

4.5 Stability properties and empirical versions

Stability and consistency results for the quantization problem are well known. See e.g.
Pollard (1981, 1982a), Abaya and Wise (1984), Sabin and Gray (1986), Pärna (1988,
1990), Jahnke (1988), Cuesta-Albertos et al. (1988), Graf and Luschgy (1994b).
Let $\mathfrak{M}_r = \mathfrak{M}_r(\mathbb{R}^d)$ denote the set of all Borel probability measures P on $\mathbb{R}^d$ such that
$\int \|x\|^r \, dP(x) < \infty$, $1 \le r < \infty$. Recall that $\rho_r$ is a metric on $\mathfrak{M}_r$ and for $P_k, P \in \mathfrak{M}_r$

\[ \rho_r(P_k, P) \to 0 \]

if and only if

\[ P_k \xrightarrow{D} P \ \text{(weak convergence)} \quad \text{and} \quad \int \|x\|^r \, dP_k(x) \to \int \|x\|^r \, dP(x) \]

(cf. Rachev and Rüschendorf, 1998, Theorem 2.6.4).
A stability property for the n-th quantization error of order r in terms of the $L_r$-
minimal metric $\rho_r$ follows immediately from Lemma 3.4. If $P_1, P_2 \in \mathfrak{M}_r$, then

\[ |V_{n,r}(P_1)^{1/r} - V_{n,r}(P_2)^{1/r}| \le \rho_r(P_1, P_2) \tag{4.4} \]

for every $n \in \mathbb{N}$. A stability result for n-optimal quantizing measures can also be
based on $\rho_r$. The Hausdorff metric given by

\[ d_H(A, B) = \max\{\max_{a \in A} \min_{b \in B} \|a - b\|, \max_{b \in B} \min_{a \in A} \|a - b\|\} \]

for nonempty compact subsets A, B of $\mathbb{R}^d$ is convenient for formulating a stability
result for n-optimal sets of centers. Notice that

\[ \Big| \Big(\int \min_{a \in A} \|x - a\|^r \, dP(x)\Big)^{1/r} - \Big(\int \min_{b \in B} \|x - b\|^r \, dP(x)\Big)^{1/r} \Big| \tag{4.5} \]
\[ \le \Big(\int \big| \min_{a \in A} \|x - a\| - \min_{b \in B} \|x - b\| \big|^r \, dP(x)\Big)^{1/r} \]
\[ \le \sup_{x \in \mathbb{R}^d} \big| \min_{a \in A} \|x - a\| - \min_{b \in B} \|x - b\| \big| \]
\[ \le d_H(A, B). \]
Figure 4.3: 3- and 4-stationary sets of centers for $P = N_2(0, I_2)$ of order r = 2 and Voronoi diagrams with respect to the $l_2$-norm
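The chain of inequalities (4.5) can be checked numerically. The following is a minimal sketch, assuming d = 1, r = 2 and P the empirical measure of a simulated normal sample; the center sets `A` and `B` and the helper names `hausdorff` and `lr_error` are illustrative choices, not part of the text.

```python
import random

def hausdorff(A, B):
    """Hausdorff distance d_H(A, B) for finite nonempty subsets of the real line."""
    d_ab = max(min(abs(a - b) for b in B) for a in A)
    d_ba = max(min(abs(a - b) for a in A) for b in B)
    return max(d_ab, d_ba)

def lr_error(points, centers, r):
    """(integral of min_a |x - a|^r dP)^(1/r) for P uniform on `points`."""
    total = sum(min(abs(x - a) for a in centers) ** r for x in points)
    return (total / len(points)) ** (1.0 / r)

random.seed(0)
points = [random.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical sample
A = [-1.0, 0.2, 1.5]   # illustrative sets of centers
B = [-0.8, 0.0, 1.1]
r = 2

lhs = abs(lr_error(points, A, r) - lr_error(points, B, r))
rhs = hausdorff(A, B)
print(lhs, rhs)   # the first number should not exceed the second, by (4.5)
```

Since (4.5) holds for every P, the comparison does not depend on the sample; only the size of the gap does.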

Let $D_{n,r}(P)$ denote the set of n-optimal quantizing measures for $P \in \mathcal{M}_r$ of order r.

4.21 Theorem
Let $\rho_r(P_k, P) \to 0$ for $P_k, P \in \mathcal{M}_r$ and suppose $|\mathrm{supp}(P)| \ge n$, $n \in \mathbb{N}$.

(a) Let $Q_k \in D_{n,r}(P_k)$, $k \in \mathbb{N}$. Then the set of $\rho_r$-cluster points of the sequence
$(Q_k)_{k \ge 1}$ is a nonempty subset of $D_{n,r}(P)$ and

$\rho_r(Q_k, D_{n,r}(P)) \to 0$ as $k \to \infty$.

(b) Let $\alpha_k \in C_{n,r}(P_k)$, $k \in \mathbb{N}$. Then the set of $d_H$-cluster points of the sequence
$(\alpha_k)_{k \ge 1}$ is a nonempty subset of $C_{n,r}(P)$ and

$d_H(\alpha_k, C_{n,r}(P)) \to 0$ as $k \to \infty$.

The preceding theorem can be derived from a simple statement for arbitrary metric
spaces.

4.22 Lemma
Let (M, d) be a metric space and let $N \subset M$ be a nonempty subset.

(a) Let $f : N \to \mathbb{R}_+$ be a lower semicontinuous function and suppose the level set

$L(c) = \{y \in N : f(y) \le c\}$

is compact for some $c > \inf_{z \in N} f(z)$. Let

$D = \{y \in N : f(y) = \inf_{z \in N} f(z)\}$

and let $(y_k)_{k \ge 1}$ be a minimizing sequence in N for f, i.e., $f(y_k) \to \inf_{z \in N} f(z)$.
Then the set of cluster points of $(y_k)_{k \ge 1}$ is a nonempty subset of D and

$d(y_k, D) \to 0$, $k \to \infty$.

(b) For $x \in M$, set $D(x) = \{y \in N : d(x, y) = d(x, N)\}$. Suppose $x_k \to x$ and let
$y_k \in D(x_k)$, $k \in \mathbb{N}$. Then $(y_k)_{k \ge 1}$ is a minimizing sequence in N for $f = d(x, \cdot)$.

Proof
(a) Let $y \in M$ be the limit of a convergent subsequence $(y_{k_n})_{n \ge 1}$ of $(y_k)_{k \ge 1}$. Then
$y \in L(c) \subset N$ and

$\inf_{z \in N} f(z) = \lim_{n \to \infty} f(y_{k_n}) \ge f(y)$.

We deduce $y \in D$. The existence of a cluster point of $(y_k)_{k \ge 1}$ follows immediately from
the compactness of L(c) and the fact that $(y_k)_{k \ge 1}$ is eventually in L(c). This proves the
first assertion. The second assertion is a consequence of the first one. In fact, assume
$\limsup_{k \to \infty} d(y_k, D) > 0$. Then there exist $\varepsilon > 0$ and a convergent subsequence
$(y_{k_n})_{n \ge 1}$ of $(y_k)_{k \ge 1}$ satisfying $d(y_{k_n}, D) \ge \varepsilon$ for every $n \ge 1$ and $\lim_{n} y_{k_n} \in D$, a
contradiction.
(b) Since

$d(x, N) \le d(x, y_k) \le d(x, x_k) + d(x_k, y_k) = d(x, x_k) + d(x_k, N)$

and $d(x_k, N) \to d(x, N)$, one gets

$d(x, y_k) \to d(x, N)$ as $k \to \infty$.

[]

To prove the theorem, the following lemma is required.

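4.23 Lemma
Let $B \subset \mathbb{R}^d$ be nonempty and compact. Then

$\{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n, \; \alpha \subset B\}$

is $d_H$-compact and

$\{Q \in \mathcal{P}_n : Q(B) = 1\}$

is $\rho_r$-compact.

Proof
The $d_H$-compactness of $\{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n, \; \alpha \subset B\}$ follows immediately from
the $d_H$-continuity of the map $(a_1, \ldots, a_n) \mapsto \{a_1, \ldots, a_n\}$, $a_i \in \mathbb{R}^d$. Set $\mathcal{Q} = \{Q \in \mathcal{P}_n : Q(B) = 1\}$.
It is clear that $\mathcal{Q}$ is relatively $\rho_r$-compact. To show that $\mathcal{Q}$ is
$\rho_r$-closed, let $(Q_k)_{k \ge 1}$ be a sequence in $\mathcal{Q}$ and $Q \in \mathcal{M}_r$ such that $\rho_r(Q_k, Q) \to 0$. Set
$\alpha_k = \mathrm{supp}(Q_k)$ and let $\alpha \subset \mathbb{R}^d$, $1 \le |\alpha| \le n$, $\alpha \subset B$ be the limit of a $d_H$-convergent
subsequence $(\alpha_{k_j})_{j \ge 1}$ of $(\alpha_k)_{k \ge 1}$. Then for $\varepsilon > 0$ there is a $j_0 \in \mathbb{N}$ such that

$\bigcup_{j \ge j_0} \alpha_{k_j} \subset \bigcup_{a \in \alpha} B(a, \varepsilon)$.

Since

$\limsup_{j \to \infty} Q_{k_j}\Big( \bigcup_{a \in \alpha} B(a, \varepsilon) \Big) \le Q\Big( \bigcup_{a \in \alpha} B(a, \varepsilon) \Big)$,

one obtains $Q\big( \bigcup_{a \in \alpha} B(a, \varepsilon) \big) = 1$. We deduce $\mathrm{supp}(Q) \subset \alpha$ which yields $Q \in \mathcal{Q}$. []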

Proof of Theorem 4.21
(a) To show that the assertion follows from Lemma 4.22 applied to the metric space
$(\mathcal{M}_r, \rho_r)$, $N = \mathcal{P}_n$, and $f = \rho_r(P, \cdot)$, it suffices to verify that

$L(c) = \{Q \in \mathcal{P}_n : \rho_r(P, Q) \le c\}$

is $\rho_r$-compact for some $c > V_{n,r}(P)^{1/r}$. Choose c such that $V_{n,r}(P)^{1/r} < c < V_{n-1,r}(P)^{1/r}$,
where $V_{0,r}(P) = \infty$ (cf. Theorem 4.12). For $Q \in L(c)$ and $\alpha = \mathrm{supp}(Q)$,
we have

$\int \min_{a \in \alpha} \|x - a\|^r \, dP(x) \le \rho_r^r(P, Q) \le c^r$.

Hence, by Theorem 4.12 (or Lemma 2.2 in case n = 1)

$L(c) \subset \{Q \in \mathcal{P}_n : Q(B) = 1\}$

for some compact subset B of $\mathbb{R}^d$. Using Lemma 4.23 we deduce $\rho_r$-compactness of
L(c).
(b) The assertion follows from an application of Lemma 4.22 (a) to $N = \{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n\}$
equipped with the Hausdorff metric $d_H$ and $f : N \to \mathbb{R}_+$,

$f(\alpha) = \int \min_{a \in \alpha} \|x - a\|^r \, dP(x)$.

Note first that f is $d_H$-continuous. This follows from (4.5). Next, consider the level
set

$L(c) = \{\alpha \in N : f(\alpha) \le c\}$

for $V_{n,r}(P) < c < V_{n-1,r}(P)$. By Theorem 4.12 (or Lemma 2.2 in case n = 1), there
is a compact set $B \subset \mathbb{R}^d$ such that

$L(c) \subset \{\alpha \in N : \alpha \subset B\}$.

Using Lemma 4.23 we deduce $d_H$-compactness of L(c). Finally, we show that $(\alpha_k)_{k \ge 1}$
is a minimizing sequence in N for f. For $k \in \mathbb{N}$, let $\{A_{k,a} : a \in \alpha_k\}$ be a Voronoi
partition of $\mathbb{R}^d$ with respect to $\alpha_k$. Set $Q_k = \sum_{a \in \alpha_k} P_k(A_{k,a}) \delta_a$. Then $Q_k \in D_{n,r}(P_k)$
and

$V_{n,r}(P)^{1/r} \le f(\alpha_k)^{1/r} \le \rho_r(P, Q_k)$.

Moreover, $\rho_r(P, Q_k) \to V_{n,r}(P)^{1/r}$ (cf. Lemma 4.22 (b) for $(\mathcal{M}_r, \rho_r)$ and $N = \mathcal{P}_n$).
This implies

$f(\alpha_k) \to V_{n,r}(P)$, $k \to \infty$.

Thus, we see that all assumptions of Lemma 4.22 (a) are satisfied. []
Notice that $C_{n,r}(P)$ is $d_H$-compact and $D_{n,r}(P)$ is $\rho_r$-compact provided $|\mathrm{supp}(P)| \ge n$.
The stability results can be applied to the empirical analysis of the quantization
problem. Let $X_1, X_2, \ldots$ be i.i.d. $\mathbb{R}^d$-valued random variables with distribution $P \in \mathcal{M}_r$
and let $P_k = \frac{1}{k} \sum_{i=1}^{k} \delta_{X_i}$ be the empirical measure of $X_1, \ldots, X_k$. The empirical
(sample) version of $V_{n,r}(P)$ is given by

$V_{n,r}(P_k) = \frac{1}{k} \inf_{\alpha \subset \mathbb{R}^d, \, |\alpha| \le n} \sum_{i=1}^{k} \min_{a \in \alpha} \|X_i - a\|^r$.

4.24 Corollary (Consistency)
Let $P \in \mathcal{M}_r$.

(a) $V_{n,r}(P_k)^{1/r} \to V_{n,r}(P)^{1/r}$ a.s. as $k \to \infty$ uniformly in $n \in \mathbb{N}$.

(b) Let $Q_k = Q_k(X_1, \ldots, X_k) \in D_{n,r}(P_k)$, $k \in \mathbb{N}$, and suppose $|\mathrm{supp}(P)| \ge n$.
Then

$\rho_r(Q_k, D_{n,r}(P)) \to 0$ a.s., $k \to \infty$.

(c) Let $\alpha_k = \alpha_k(X_1, \ldots, X_k) \in C_{n,r}(P_k)$, $k \in \mathbb{N}$, and suppose $|\mathrm{supp}(P)| \ge n$. Then

$d_H(\alpha_k, C_{n,r}(P)) \to 0$ a.s., $k \to \infty$.

Proof
Since $\rho_r(P_k, P) \to 0$ a.s. by the Glivenko-Cantelli theorem for $\rho_r$, the assertions
follow from Theorem 4.21 and (4.4). []
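Part (a) of the corollary can be illustrated by simulation. The sketch below assumes d = 1, n = 2, r = 2 and P = N(0,1); in this setting the optimal cells are intervals, so the empirical error can be computed exactly by scanning all splits of the sorted sample. The function name `empirical_v22` and the sample sizes are illustrative assumptions.

```python
import math
import random

def empirical_v22(sample):
    """Exact V_{2,2}(P_k) for a real sample: scan all splits of the sorted data."""
    xs = sorted(sample)
    k = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    best = total_sq - total * total / k        # one center (n = 1) as fallback
    left = 0.0
    for j in range(1, k):                      # left cell xs[:j], right cell xs[j:]
        left += xs[j - 1]
        right = total - left
        # Sum of squared deviations from the two cell means.
        sse = total_sq - left * left / j - right * right / (k - j)
        best = min(best, sse)
    return best / k

random.seed(1)
for k in (100, 1000, 20000):
    sample = [random.gauss(0.0, 1.0) for _ in range(k)]
    print(k, empirical_v22(sample))
print("population value:", 1.0 - 2.0 / math.pi)   # V_{2,2}(N(0,1)), cf. Table 5.1
```

The population value $1 - 2/\pi \approx 0.3634$ follows from the symmetric 2-stationary set $\{-E|X|, E|X|\}$ and agrees with the entry for n = 2 in Table 5.1.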
Rates of convergence in empirical quantization can be found in Rhee and Talagrand
(1989a), Linder et al. (1994), Bartlett et al. (1998) and Graf and Luschgy (1999c).

Notes

Some material on the issue of this section is contained in Gersho and Gray (1992)
and Graf and Luschgy (1994a). Theorem 4.1 belongs to the folklore of this area.
Theorem 4.2 seems to be new. The characterization given in Lemma 4.4 is due to
Pollard (1982a) for the $l_2$-norm and r = 2. The Counterexample 4.5 is new. In case
the underlying norm is the $l_2$-norm, the differentiability of $\psi_{n,r}$ (cf. Lemma 4.10) has
been proved by Pollard (1982b) for r = 2 and for arbitrary r a proof is contained
in Pagès (1997). Theorem 4.16 is new. Examples 4.18-4.20 on the quantization
of spherical distributions and the d-dimensional standard normal distribution are
essentially taken from Gray and Karnin (1982), Iyengar and Solomon (1983), Flury
(1990), Tarpey et al. (1995), and Tarpey (1995). See also Tarpey (1998).
Let us mention that n-stationary sets of centers are sometimes called self-consistent
sets.
The central limit problem for n-optimal empirical centers of order r = 2 with respect
to the $l_2$-norm has been solved by Pollard (1982b) under a uniqueness condition for the
n-optimal population centers. A central limit result in a nonregular setting has been
given by Serinko and Babu (1992) for the univariate case, d = 1, and an extension
to non-i.i.d, sampling can be found in Serinko and Babu (1995) for d = 1. Hartigan
(1978) has conjectured the asymptotic distribution of the empirical quantization error
for a special population distribution where the uniqueness condition fails but has given
no proof.
Consistency results for a quantization (clustering) procedure based on a projection
pursuit technique can be found in Stute and Zhu (1995). Stability and consistency
results for a trimmed version of the quantization problem are contained in Cuesta-Albertos
et al. (1997) and a central limit theorem for trimmed quantizers has been
given by García-Escudero et al. (1999).
Theorem 4.1 provides the basis for the famous Lloyd algorithm used to design quantizers.
To construct an approximation to an n-stationary set of centers for P of order
r the iterative method proceeds as follows:
Let $\varepsilon > 0$ be given.
Step 1.
Choose an initial set $\alpha^{(0)}$ of n points in $\mathbb{R}^d$, calculate $c_0 = E \min_{a \in \alpha^{(0)}} \|X - a\|^r$.
Step 2.
Determine a Voronoi partition $\mathcal{A}^{(i)}$ with respect to $\alpha^{(i)}$.
Step 3.
For each set $A \in \mathcal{A}^{(i)}$ with P(A) > 0 choose a center $a_A$ for the conditional probability
$P(\cdot \mid A)$ of order r and set $\alpha^{(i+1)} = \{a_A : A \in \mathcal{A}^{(i)}\}$.
Step 4.
Calculate $c_{i+1} = E \min_{a \in \alpha^{(i+1)}} \|X - a\|^r$. If $(c_i - c_{i+1}) < \varepsilon c_i$ then stop. Otherwise increase
i by one and repeat Steps 2, 3 and 4.
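For a discrete probability with finite support the four steps can be carried out directly. The following is a minimal sketch, assuming d = 1 and r = 2 (so that the center of order 2 of $P(\cdot \mid A)$ is the conditional mean); the function name `lloyd`, its parameters and the tie-breaking rule are illustrative assumptions, and the stopping test uses `<=` instead of `<` so that a zero error also terminates.

```python
def lloyd(points, weights, centers, eps=1e-9, max_iter=1000):
    """Lloyd's method I for a discrete P on the real line, order r = 2."""
    def error(cs):
        return sum(w * min((x - a) ** 2 for a in cs)
                   for x, w in zip(points, weights))

    centers = sorted(centers)
    err = error(centers)                       # Step 1: initial error c_0
    for _ in range(max_iter):
        # Step 2: Voronoi partition (ties go to the first nearest center).
        mass = [0.0] * len(centers)
        moment = [0.0] * len(centers)
        for x, w in zip(points, weights):
            i = min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
            mass[i] += w
            moment[i] += w * x
        # Step 3: conditional means; cells with P(A) = 0 are dropped.
        centers = [m1 / m0 for m0, m1 in zip(mass, moment) if m0 > 0]
        # Step 4: stop when the relative decrease of the error is small.
        new_err = error(centers)
        if err - new_err <= eps * err:
            return centers, new_err
        err = new_err
    return centers, err

points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]     # hypothetical support of P
weights = [1.0 / 6.0] * 6
print(lloyd(points, weights, [0.0, 1.0]))
```

The result is only an (approximately) n-stationary set; whether it is n-optimal depends on the initial set $\alpha^{(0)}$.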
This algorithm was independently discovered by Steinhaus (1956) and Lloyd in 1957
(see Lloyd 1982). It is often called Lloyd's method I, since Lloyd developed a second
type of algorithm (Method II) to design quantizers in the one-dimensional case.
Many people rediscovered Lloyd's method later on. For a description of the history
of the algorithm we refer the reader to Gray and Neuhoff (1998). As it stands the
algorithm is hard to use in practice. But if P is a discrete probability with finite
support then the above algorithm can immediately be applied. The properties of
Lloyd's algorithm in the context of general deterministic descent algorithms have been
discussed in Sabin and Gray (1986). Recently Bouton and Pagès (1997) thoroughly
investigated a constant step stochastic gradient descent algorithm for the design of
quantizers which is closely related to the Kohonen algorithms used in the theory of
neural networks.
Mentioning just these two algorithms for the design of quantizers is an arbitrary act
since there exists a vast amount of literature concerning this subject. For a survey
we refer the reader again to Gray and Neuhoff (1998).

5 Uniqueness and optimality in one dimension

In the one-dimensional case there is a reasonable criterion for the uniqueness of n-stationary
sets. This immediately gives uniqueness of n-optimal sets of centers.
In this section let d = 1. Let X denote a real random variable with distribution P
satisfying $E|X|^r < \infty$ for some $1 \le r < \infty$. The probability P is called strongly
unimodal if $P = h\lambda$ such that $I = \{h > 0\}$ is an open (possibly unbounded) interval
and log h is concave on I. Note that such distributions have all their moments finite.
For this and further properties of (nondegenerate) strongly unimodal distributions
we refer to Dharmadhikari and Joag-Dev (1988).

5.1 Uniqueness
The following theorem is due to Kieffer (1983). See also Trushkin (1984).
5.1 Theorem (Uniqueness)
If P is strongly unimodal, then $|S_{n,r}(P)| = 1$ for every $n \in \mathbb{N}$, $1 \le r < \infty$.

Strongly unimodal distributions are unimodal about some mode $a \in \mathbb{R}$, i.e., the $\lambda$-density
h of P is increasing on $(-\infty, a)$ and decreasing on $(a, \infty)$. Example 4.11 (as
well as the subsequent Example 5.2) shows that the assertion of Theorem 5.1 may
fail for unimodal distributions.
In view of Lemma 4.7, the unique n-optimal set of centers of order r for a symmetric,
strongly unimodal distribution is symmetric. It is a surprising fact that symmetric,
2-stationary sets of centers may fail to be 2-optimal for symmetric, unimodal (absolutely
continuous) distributions. This is illustrated by the following example taken
from Abaya and Wise (1981). The same phenomenon occurs for truncated Cauchy
distributions, hyper-exponential distributions and for certain variance mixtures of
normal distributions. See Karlin (1982), Tarpey (1994) and Flury (1990).
5.2 Example
Let $P = h\lambda$ with

$h(x) = \frac{5 - 4|x|}{12}$ for $|x| < 1$,
$h(x) = \frac{7 - |x|}{72}$ for $1 \le |x| \le 7$,
$h(x) = 0$ for $|x| > 7$.

P is symmetric and unimodal about 0. Let n = 2 and r = 2. Then $V_2(P) = \mathrm{Var}\, X = \frac{101}{18} = 5.611\ldots$
and it is easily verified that $S_{2,2}(P) = \{\alpha_1, \alpha_2, \alpha_3\}$ with

$\alpha_1 = \{-1, 3\}$, $\alpha_2 = \{-3, 1\}$, $\alpha_3 = \{-\frac{61}{36}, \frac{61}{36}\}$.

One obtains

$E \min_{a \in \alpha_1} |X - a|^2 = E \min_{a \in \alpha_2} |X - a|^2 = \frac{47}{18} = 2.611\ldots$,

$E \min_{a \in \alpha_3} |X - a|^2 = \frac{3551}{1296} = 2.739\ldots$

(Use the formula of Remark 4.6 (c).) Hence, $C_{2,2}(P) = \{\alpha_1, \alpha_2\}$ and $V_{2,2}(P) = \frac{47}{18}$;
see Figure 5.1. We see that the symmetric, 2-stationary set $\alpha_3$ is not 2-optimal. It is
the sharp peak of the density which causes asymmetric optimal sets of centers (and
prevents P from being strongly unimodal).
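The numbers in Example 5.2 can be reproduced in exact rational arithmetic. The sketch below assumes the piecewise linear density $h(x) = (5 - 4|x|)/12$ for $|x| < 1$ and $h(x) = (7 - |x|)/72$ for $1 \le |x| \le 7$; this reading of the density is an assumption, chosen because it integrates to one and gives $\mathrm{Var}\, X = 101/18$. The names `PIECES`, `partial_moment` and `error` are illustrative.

```python
from fractions import Fraction as F

# Linear pieces (lo, hi, c0, c1): h(x) = c0 + c1 * x on [lo, hi] (assumed density).
PIECES = [
    (F(-7), F(-1), F(7, 72), F(1, 72)),
    (F(-1), F(0), F(5, 12), F(4, 12)),
    (F(0), F(1), F(5, 12), F(-4, 12)),
    (F(1), F(7), F(7, 72), F(-1, 72)),
]

def partial_moment(j, t):
    """Exact integral of x^j h(x) over (-infinity, t]."""
    total = F(0)
    for lo, hi, c0, c1 in PIECES:
        u = min(hi, t)
        if u > lo:
            total += c0 * (u ** (j + 1) - lo ** (j + 1)) / (j + 1)
            total += c1 * (u ** (j + 2) - lo ** (j + 2)) / (j + 2)
    return total

def error(a1, a2):
    """E min(|X - a1|^2, |X - a2|^2) for a1 < a2, split at the midpoint."""
    a1, a2 = F(a1), F(a2)
    m = (a1 + a2) / 2
    p_left, x_left = partial_moment(0, m), partial_moment(1, m)
    p_right = partial_moment(0, F(7)) - p_left
    x_right = partial_moment(1, F(7)) - x_left
    second = partial_moment(2, F(7))           # E X^2 = Var X, since E X = 0
    return (second - 2 * a1 * x_left + a1 ** 2 * p_left
            - 2 * a2 * x_right + a2 ** 2 * p_right)

print(partial_moment(2, F(7)))          # variance of X
print(error(-1, 3), error(-3, 1))       # the two asymmetric stationary sets
print(error(F(-61, 36), F(61, 36)))     # the symmetric stationary set
```

Under this density the asymmetric sets give the smaller error, in line with the conclusion $C_{2,2}(P) = \{\alpha_1, \alpha_2\}$.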

Figure 5.1: 2-optimal centers of order 2

Quantization for a symmetric distribution is related to quantization for its one-tailed
version as follows.

5.3 Remark (Symmetric distributions)
Let P be symmetric with P({0}) = 0 and $|\mathrm{supp}(P)| \ge n$. Let $Q = P(\cdot \mid [0, \infty))$, the
one-tailed version of P.
(a) For $\alpha \subset \mathbb{R}$ and $k \in \mathbb{N}$, $\alpha \in S_{k,r}(Q)$ implies $\alpha \subset (0, \infty)$. This follows from Remark
4.6(a) and Lemma 2.5. For $\alpha \subset (0, \infty)$, one obviously obtains $\alpha \cup (-\alpha) \in S_{2k,r}(P)$ if
and only if $\alpha \in S_{k,r}(Q)$. In particular, there always exists a symmetric, n-stationary
set for P of order r provided n is even. Example 4.13 shows that this may fail if n
is odd. (Zoppè (1997) proved the existence of symmetric, n-stationary sets for every
n in case r = 2 under the assumption that P is absolutely continuous and supp(P)
is convex.) Moreover, $\alpha \cup (-\alpha) \in C_{n,r}(P)$ with n = 2k implies $\alpha \in C_{k,r}(Q)$ for
$\alpha \subset (0, \infty)$. This follows from Theorem 4.1 since $\bigcup_{a \in \alpha} W(a \mid \alpha \cup (-\alpha)) = [0, \infty)$.

(b) We have $V_{n,r}(P) \le V_{k,r}(Q)$, n = 2k, and equality holds if and only if there exists
a symmetric set $\beta \in C_{n,r}(P)$. In this case, $\alpha \in C_{k,r}(Q)$ implies $\alpha \cup (-\alpha) \in C_{n,r}(P)$.
In fact, for $\alpha \subset (0, \infty)$, $|\alpha| \le k$,

$V_{n,r}(P) \le \int \min_{b \in \alpha \cup (-\alpha)} |x - b|^r \, dP(x) = \int \min_{a \in \alpha} |x - a|^r \, dQ(x)$.

Choosing $\alpha \in C_{k,r}(Q)$ gives $V_{n,r}(P) \le V_{k,r}(Q)$ and if $V_{n,r}(P) = V_{k,r}(Q)$, then
$\beta = \alpha \cup (-\alpha) \in C_{n,r}(P)$. Conversely, if $\beta \in C_{n,r}(P)$ is symmetric and $\alpha = \beta \cap (0, \infty)$,
then

$V_{n,r}(P) = \int \min_{b \in \beta} |x - b|^r \, dP(x) = \int \min_{a \in \alpha} |x - a|^r \, dQ(x) \ge V_{k,r}(Q)$

implying $V_{n,r}(P) = V_{k,r}(Q)$.

As yet only few examples of distributions P which are not strongly unimodal but
satisfy $|S_{n,r}(P)| = 1$ are known. See Fort and Pagès (1999) and for the particularly
simple case n = 2 and r = 2, Yamamoto and Shinozaki (1999).

5.2 Optimal Quantizers

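Let us describe optimal quantizers of orders r = 1 and r = 2 for some common
univariate distributions. For $n \ge 2$, consider real numbers $a_1 < \ldots < a_n$. Let
$m_i = (a_i + a_{i+1})/2$, $1 \le i \le n - 1$, and $\alpha = \{a_1, \ldots, a_n\}$. Then the Voronoi region
generated by $a_i$ takes the form

$W(a_1 \mid \alpha) = (-\infty, m_1]$,
$W(a_i \mid \alpha) = [m_{i-1}, m_i]$, $2 \le i \le n - 1$,
$W(a_n \mid \alpha) = [m_{n-1}, \infty)$.

We assume that P is continuous so that the boundaries of the Voronoi regions have
P-measure zero. Let F denote the distribution function of P. By Lemma 2.5, we
have $\alpha \in S_{n,r}(P)$ if and only if $P(W(a_i \mid \alpha)) > 0$ for every i and

(5.1)    $\int_{-\infty}^{a_1} (a_1 - x)^{r-1} \, dP(x) = \int_{a_1}^{m_1} (x - a_1)^{r-1} \, dP(x)$,
         $\int_{m_{i-1}}^{a_i} (a_i - x)^{r-1} \, dP(x) = \int_{a_i}^{m_i} (x - a_i)^{r-1} \, dP(x)$, $2 \le i \le n - 1$,
         $\int_{m_{n-1}}^{a_n} (a_n - x)^{r-1} \, dP(x) = \int_{a_n}^{\infty} (x - a_n)^{r-1} \, dP(x)$.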
In case r = 1, the equations (5.1) take the form

         $2F(a_1) = F(m_1)$,
(5.2)    $2F(a_i) = F(m_i) + F(m_{i-1})$, $2 \le i \le n - 1$,
         $2F(a_n) = 1 + F(m_{n-1})$.

One obtains, for instance, $\{F^{-1}(\frac{1}{4}), F^{-1}(\frac{3}{4})\} \in S_{2,1}(P)$ for symmetric probabilities
P. ($F^{-1}(y) = \inf\{x \in \mathbb{R} : F(x) \ge y\}$, $y \in (0, 1)$.)
In case r = 2, (5.1) takes the form

         $a_1 F(m_1) = \int_{-\infty}^{m_1} x \, dP(x)$,
(5.3)    $a_i [F(m_i) - F(m_{i-1})] = \int_{m_{i-1}}^{m_i} x \, dP(x)$, $2 \le i \le n - 1$,
         $a_n [1 - F(m_{n-1})] = \int_{m_{n-1}}^{\infty} x \, dP(x)$.

One obtains, for instance, $\{-E|X|, E|X|\} \in S_{2,2}(P)$ for symmetric probabilities P.


Now assume that $P = h\lambda$ is strongly unimodal. Then by Theorem 5.1, (5.1) has a
unique solution in $I = \{h > 0\}$ which provides the n-optimal set of centers $\alpha$ for P
of order r. If, additionally, P is symmetric, we have $\alpha = -\alpha$, that is, $a_i = -a_{n+1-i}$
for $1 \le i \le n$. Therefore, $m_k = 0$ in case n = 2k and $a_{k+1} = 0$ in case n = 2k + 1. In
both cases it is enough to solve the first (or last) k equations of (5.1).
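For r = 2 the system (5.3) says that each center is the conditional mean of its Voronoi region, which suggests a simple fixed-point iteration. The following is a minimal sketch for P = N(0,1), not the method used for the tables; the function name `solve_53_normal`, the starting point and the iteration count are illustrative assumptions. It uses the fact that for the standard normal density $\varphi$ the integral of $x \, dP(x)$ over (s, t) equals $\varphi(s) - \varphi(t)$.

```python
import math

def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def midpoints(a):
    """Cell boundaries m_0 = -inf < m_1 < ... < m_n = +inf for sorted centers a."""
    inner = [(a[i] + a[i + 1]) / 2.0 for i in range(len(a) - 1)]
    return [-math.inf] + inner + [math.inf]

def solve_53_normal(n, iterations=2000):
    """Fixed-point iteration on (5.3) for P = N(0,1); returns (centers, V_{n,2})."""
    a = [(2.0 * i - n - 1.0) / n for i in range(1, n + 1)]   # symmetric start
    for _ in range(iterations):
        m = midpoints(a)
        # Replace each center by the conditional mean of its cell.
        a = [(pdf(m[i]) - pdf(m[i + 1])) / (cdf(m[i + 1]) - cdf(m[i]))
             for i in range(n)]
    m = midpoints(a)
    # At a solution of (5.3): V_{n,2} = E X^2 - sum_i a_i^2 P(W(a_i | alpha)).
    v = 1.0 - sum(a[i] ** 2 * (cdf(m[i + 1]) - cdf(m[i])) for i in range(n))
    return a, v

for n in (2, 3, 4):
    centers, v = solve_53_normal(n)
    print(n, [round(c, 4) for c in centers], round(v, 4))
```

The output can be compared with the columns n = 2, 3, 4 of Table 5.1.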
A remarkable property appears for the exponential distribution. It is the content of
part (a) of the following proposition.

5.4 Proposition
(a) Let P = E(c) and let $\alpha = \{a_1, \ldots, a_n\} \in C_{n,r}(P)$ with $a_1 < \ldots < a_n$. Then

$V_{n,r}(P) = a_1^r$.

(b) Let P = DE(c) and let $\alpha = \{a_1, \ldots, a_n\} \in C_{n,r}(P)$ with $a_1 < \ldots < a_n$. Then

$V_{n,r}(P) = a_{k+1}^r$, if n = 2k,

$V_{n,r}(P) = r c^r \int_0^{a_{k+2}/2c} x^{r-1} e^{-x} \, dx$, if n = 2k + 1.
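Proof
We may assume without loss of generality that c = 1. By Theorem 5.1, $\alpha$ is the
unique n-stationary set of order r.
(a) We have

$V_{n,r}(P) = E \min_{1 \le i \le n} |X - a_i|^r$
$= \int_0^{m_1} |x - a_1|^r e^{-x} \, dx + \sum_{i=2}^{n-1} \int_{m_{i-1}}^{m_i} |x - a_i|^r e^{-x} \, dx + \int_{m_{n-1}}^{\infty} |x - a_n|^r e^{-x} \, dx$
$= \int_0^{a_1} (a_1 - x)^r e^{-x} \, dx + \sum_{i=2}^{n} \int_{m_{i-1}}^{a_i} (a_i - x)^r e^{-x} \, dx$
$\quad + \sum_{i=1}^{n-1} \int_{a_i}^{m_i} (x - a_i)^r e^{-x} \, dx + \int_{a_n}^{\infty} (x - a_n)^r e^{-x} \, dx$.

Integration by parts yields for b < a < c

$\int_b^a (a - x)^r e^{-x} \, dx = (a - b)^r e^{-b} - r \int_b^a (a - x)^{r-1} e^{-x} \, dx$,

$\int_a^c (x - a)^r e^{-x} \, dx = -(c - a)^r e^{-c} + r \int_a^c (x - a)^{r-1} e^{-x} \, dx$,

$\int_a^{\infty} (x - a)^r e^{-x} \, dx = r \int_a^{\infty} (x - a)^{r-1} e^{-x} \, dx = r e^{-a} \Gamma(r)$.

Therefore

$V_{n,r}(P) = a_1^r + \sum_{i=2}^{n} (a_i - m_{i-1})^r e^{-m_{i-1}} - \sum_{i=1}^{n-1} (m_i - a_i)^r e^{-m_i}$
$\quad + r \sum_{i=1}^{n-1} \int_{a_i}^{m_i} (x - a_i)^{r-1} e^{-x} \, dx + r \int_{a_n}^{\infty} (x - a_n)^{r-1} e^{-x} \, dx$
$\quad - r \int_0^{a_1} (a_1 - x)^{r-1} e^{-x} \, dx - r \sum_{i=2}^{n} \int_{m_{i-1}}^{a_i} (a_i - x)^{r-1} e^{-x} \, dx$.

The two sums of boundary terms cancel since $a_i - m_{i-1} = m_{i-1} - a_{i-1}$, and by (5.1) the
integral terms cancel as well. This gives

$V_{n,r}(P) = a_1^r$.

(b) If n = 2k, the assertion follows from (a) and Remark 5.3. Now let n = 2k + 1.
Then

$V_{n,r}(P) = \int_0^{m_{k+1}} x^r e^{-x} \, dx + \sum_{i=2}^{k} \int_{m_{k+i-1}}^{m_{k+i}} |x - a_{k+i}|^r e^{-x} \, dx + \int_{m_{n-1}}^{\infty} |x - a_n|^r e^{-x} \, dx$
$= \int_0^{m_{k+1}} x^r e^{-x} \, dx + \sum_{i=2}^{k+1} \int_{m_{k+i-1}}^{a_{k+i}} (a_{k+i} - x)^r e^{-x} \, dx$
$\quad + \sum_{i=2}^{k} \int_{a_{k+i}}^{m_{k+i}} (x - a_{k+i})^r e^{-x} \, dx + \int_{a_n}^{\infty} (x - a_n)^r e^{-x} \, dx$.

Again, integration by parts and (5.1) give the desired formula. []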

5.5 Example (Uniform distribution)
Let P = U([0, 1]) and let $\alpha = \{\frac{2i-1}{2n} : i = 1, \ldots, n\}$. By Example 4.17, $\alpha \in C_{n,r}(P)$
for every $r \ge 1$ and

$V_{n,r}(P) = \frac{1}{n^r (1 + r) 2^r}$.

Since P is strongly unimodal, $\alpha$ is the unique set of n-optimal centers of order r.

5.6 Example (Double exponential distribution)
Let P = DE(1), that is, the $\lambda$-density h of P is given by $h(x) = \frac{1}{2} \exp(-|x|)$. Then
$F(x) = \frac{1}{2} \exp(x)$ for $x \le 0$ and P is symmetric and strongly unimodal. Let r = 1 and
note that $V_1(P) = E|X| = 1$. For this distribution there exists a closed-form solution
of (5.2). Let $y_i = \exp(a_i/2)$. For n = 2k, the first k equations of (5.2) take the form

$2y_1 = y_2$,
$2y_i = y_{i+1} + y_{i-1}$, $2 \le i \le k - 1$,
$2y_k^2 = 1 + y_{k-1} y_k$ $(y_0 = 0)$.

The solution of this difference equation is given by $y_i = i y_1$, $1 \le i \le k$, with $y_1 = (k^2 + k)^{-1/2}$.
One obtains

$a_i = 2 \log\left( \frac{i}{\sqrt{k^2 + k}} \right)$, $1 \le i \le k$,

$a_i = 2 \log\left( \frac{\sqrt{k^2 + k}}{n + 1 - i} \right)$, $k + 1 \le i \le n$.

For n = 2k + 1, the first k equations of (5.2) take the form

$2y_1 = y_2$,
$2y_i = y_{i+1} + y_{i-1}$, $2 \le i \le k - 1$,
$2y_k = 1 + y_{k-1}$.

This leads to $y_i = i y_1$ with $y_1 = (k + 1)^{-1}$ and hence

$a_i = 2 \log\left( \frac{i}{k + 1} \right)$, $1 \le i \le k$,

$a_i = 2 \log\left( \frac{k + 1}{n + 1 - i} \right)$, $k + 2 \le i \le n$.

In both cases $\{a_1, \ldots, a_n\}$ is the unique set of n-optimal centers for P of order 1. For
the quantization error we have by Proposition 5.4

$V_{n,1}(P) = \log(1 + \frac{2}{n})$, if n is even,

$V_{n,1}(P) = \frac{2}{n + 1}$, if n is odd;

see Table 5.9.
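The closed-form centers of Example 5.6 can be checked against the system (5.2) and against Table 5.9. The following is a minimal sketch; the function names `de_centers`, `de_cdf`, `residuals_52` and `de_error` are illustrative, and the system (5.2) is written in the unified form $2F(a_i) = F(m_{i-1}) + F(m_i)$ with the conventions $F(m_0) = 0$ and $F(m_n) = 1$.

```python
import math

def de_centers(n):
    """Closed-form n-optimal centers of order 1 for DE(1), as in Example 5.6."""
    k = n // 2
    if n % 2 == 0:
        s = math.sqrt(k * k + k)
        lower = [2.0 * math.log(i / s) for i in range(1, k + 1)]
        return lower + [-a for a in reversed(lower)]       # a_i = -a_{n+1-i}
    lower = [2.0 * math.log(i / (k + 1.0)) for i in range(1, k + 1)]
    return lower + [0.0] + [-a for a in reversed(lower)]   # a_{k+1} = 0

def de_cdf(x):
    """Distribution function of DE(1)."""
    return 0.5 * math.exp(x) if x <= 0 else 1.0 - 0.5 * math.exp(-x)

def residuals_52(a):
    """Left side minus right side of each equation in (5.2)."""
    n = len(a)
    f_mid = [0.0] + [de_cdf((a[i] + a[i + 1]) / 2.0) for i in range(n - 1)] + [1.0]
    return [2.0 * de_cdf(a[i]) - f_mid[i] - f_mid[i + 1] for i in range(n)]

def de_error(n):
    """V_{n,1}(DE(1)) from Example 5.6."""
    return math.log(1.0 + 2.0 / n) if n % 2 == 0 else 2.0 / (n + 1.0)

for n in range(1, 9):
    a = de_centers(n)
    worst = max(abs(x) for x in residuals_52(a))
    print(n, [round(x, 4) for x in a], round(de_error(n), 4), worst)
```

The printed centers and errors can be compared column by column with Table 5.9; the last number in each row is the largest residual of (5.2).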


5.7 Example (Exponential distribution)
Let P = E(c), that is, $P = h\lambda$ with $h(x) = \frac{1}{c} \exp(-\frac{x}{c}) 1_{(0,\infty)}(x)$, c > 0. P is strongly
unimodal. Let

$a_i = 2c \log\left( \frac{\sqrt{n^2 + n}}{n + 1 - i} \right)$, $1 \le i \le n$.

Then $\{a_1, \ldots, a_n\}$ is the unique set of n-optimal centers for P of order r = 1. This is
a consequence of Example 5.6 and Remark 5.3 (and the scaling property of $C_{n,1}(P)$
given in Lemma 3.2 (a)). Furthermore, by Proposition 5.4

$V_{n,1}(P) = c \log(1 + \frac{1}{n})$.

Since $V_1(P) = E|X - c \log 2| = c \log 2$, one has to choose $c = (\log 2)^{-1}$ in order to
achieve the norming $V_1(P) = 1$; see Table 5.10.
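Proposition 5.4 (a) can be confirmed numerically for r = 1 by evaluating the quantization error of these centers directly. The sketch below assumes P = E(1) and uses the integration-by-parts formulas from the proof of Proposition 5.4 with r = 1; the function names `exp_centers` and `l1_error_exp1` are illustrative.

```python
import math

def exp_centers(n, c=1.0):
    """Centers of Example 5.7: a_i = 2c * log(sqrt(n^2 + n) / (n + 1 - i))."""
    s = math.sqrt(n * n + n)
    return [2.0 * c * math.log(s / (n + 1 - i)) for i in range(1, n + 1)]

def l1_error_exp1(a):
    """E min_i |X - a_i| for X ~ E(1) and sorted centers a, in closed form."""
    n = len(a)
    m = [0.0] + [(a[i] + a[i + 1]) / 2.0 for i in range(n - 1)]
    total = 0.0
    for i in range(n):
        lo = m[i]
        # Integral of (a_i - x) e^{-x} over [lo, a_i].
        total += (a[i] - lo) * math.exp(-lo) - (math.exp(-lo) - math.exp(-a[i]))
        if i < n - 1:
            hi = m[i + 1]
            # Integral of (x - a_i) e^{-x} over [a_i, hi].
            total += -(hi - a[i]) * math.exp(-hi) + (math.exp(-a[i]) - math.exp(-hi))
        else:
            # Integral of (x - a_n) e^{-x} over [a_n, infinity).
            total += math.exp(-a[i])
    return total

for n in range(1, 9):
    a = exp_centers(n)
    print(n, l1_error_exp1(a), a[0], math.log(1.0 + 1.0 / n))
```

In each row the three printed numbers should agree: the directly computed error, the smallest center $a_1$, and $\log(1 + \frac{1}{n})$.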

The equations (5.2) and (5.3) have been solved using MATHEMATICA for various
strongly unimodal distributions. The numerical solutions are given in the Tables
5.1-5.12. (In case P = E(1/log 2), r = 1 and P = DE(1), r = 1, we obtained
coincidence with the exact solutions given in Examples 5.6 and 5.7 up to 5 decimal
places.) The behaviour of V~,r(P) reflects the value of the r-th quantization coefficient
of P introduced in Chapter II.

Notes

In the case of smooth densities and r = 2, Theorem 5.1 is due to Fleischer (1964),
who provided the first uniqueness result. A proof of Theorem 5.1 (for r = 2) based
on the "mountain pass theorem" has been given by Lamberton and Pagès (1996).
See Cohort (1997) for a detailed exposition. The property of the n-th quantization
error for the exponential distribution described in Proposition 5.4 is new. In the
non-quantization setting n = 1 it has been noticed by Gilat (1988). Example 5.6 is
essentially contained in Williams (1967). However, Williams intends to find n-optimal
centers of order r = 2, but he deals with the equations (5.2) which correspond to r = 1.
Example 5.7 is new. Tables of n-optimal sets of centers of order r = 2 for the normal
distribution N(0,1), the double exponential distribution $DE(1/\sqrt{2})$, the exponential
distribution E(1), and the Rayleigh distribution W(x/~, 2) (cf. Tables 5.1, 5.3-5.6)
can also be found in Cox (1957), Max (1960), Lloyd (1982), Fang and Wang (1994),
Adams and Giesler (1978), and Pearlman and Senge (1979).

n               1        2        3        4        5        6        7        8
a1              0  -0.7979  -1.2240  -1.5104  -1.7241  -1.8936  -2.0334  -2.1520
a2                  0.7979        0  -0.4528  -0.7646  -1.0001  -1.1882  -1.3439
a3                           1.2240   0.4528        0  -0.3177  -0.5606  -0.7560
a4                                    1.5104   0.7646   0.3177        0  -0.2451
a5                                             1.7241   1.0001   0.5606   0.2451
a6                                                      1.8936   1.1882   0.7560
a7                                                               2.0334   1.3439
a8                                                                        2.1520
V_{n,2}         1   0.3634   0.1902   0.1175   0.0799   0.0580   0.0440   0.0345

Table 5.1: n-optimal centers and n-th quantization error for the normal distribution
N(0,1) of order r = 2

n               1        2        3        4        5        6        7        8
a1              0  -0.7643  -1.2621  -1.6382  -1.9422  -2.1978  -2.4185  -2.6129
a2                  0.7643        0  -0.4569  -0.7947  -1.0671  -1.2971  -1.4971
a3                           1.2621   0.4569        0  -0.3270  -0.5862  -0.8033
a4                                    1.6382   0.7947   0.3270        0  -0.2548
a5                                             1.9422   1.0671   0.5862   0.2548
a6                                                      2.1978   1.2971   0.8033
a7                                                               2.4185   1.4971
a8                                                                        2.6129
V_{n,2}         1   0.4158   0.2307   0.1472   0.1022   0.0752   0.0576   0.0456

Table 5.2: Logistic distribution L ( - ~ ) , r = 2


n               1        2        3        4        5        6        7        8
a1              0  -0.7071  -1.4142  -1.8340  -2.2537  -2.5535  -2.8533  -3.0867
a2                  0.7071        0  -0.4198  -0.8395  -1.1393  -1.4391  -1.6725
a3                           1.4142   0.4198        0  -0.2998  -0.5996  -0.8330
a4                                    1.8340   0.8395   0.2998        0  -0.2334
a5                                             2.2537   1.1393   0.5996   0.2334
a6                                                      2.5535   1.4391   0.8330
a7                                                               2.8533   1.6725
a8                                                                        3.0867
V_{n,2}         1   0.5000   0.2642   0.1762   0.1198   0.0899   0.0681   0.0545

Table 5.3: Double exponential distribution $DE(1/\sqrt{2})$, r = 2

n               1        2        3        4        5        6        7        8
a1              1   0.5936   0.4240   0.3301   0.2704   0.2290   0.1986   0.1753
a2                  2.5936   1.6112   1.1780   0.9305   0.7697   0.6565   0.5725
a3                           3.6112   2.3652   1.7784   1.4298   1.1972   1.0305
a4                                    4.3652   2.9657   2.2777   1.8574   1.5712
a5                                             4.9657   3.4650   2.7053   2.2313
a6                                                      5.4650   3.8926   3.0792
a7                                                               5.8926   4.2665
a8                                                                        6.2665
V_{n,2}         1   0.3524   0.1797   0.1090   0.0731   0.0524   0.0394   0.0307

Table 5.4: Exponential distribution E(1), r = 2

n               1        2        3        4        5        6        7        8
a1         1.4142   0.9271   0.7108   0.5847   0.5009   0.4407   0.3950   0.3590
a2                  2.7353   1.8420   1.4269   1.1798   1.0136   0.8932   0.8014
a3                           3.5501   2.4815   1.9577   1.6363   1.4157   1.2537
a4                                    4.1445   2.9772   2.3842   2.0119   1.7523
a5                                             4.6135   3.3829   2.7420   2.3325
a6                                                      5.0012   3.7266   3.0505
a7                                                               5.3318   4.0248
a8                                                                        5.6199
V_{n,2}         1   0.3565   0.1836   0.1120   0.0755   0.0544   0.0410   0.0321

Table 5.5: G a m m a distribution F ( ~ , 2), r = 2


n               1        2        3        4        5        6        7        8
a1         1.9131   1.2657   0.9772   0.8079   0.6947   0.6130   0.5508   0.5016
a2                  2.9313   2.1140   1.7010   1.4421   1.2615   1.1270   1.0222
a3                           3.4848   2.6325   2.1738   1.8745   1.6599   1.4966
a4                                    3.8604   3.0025   2.5237   2.2032   1.9688
a5                                             4.1425   3.2882   2.8001   2.4675
a6                                                      4.3670   3.5197   3.0277
a7                                                               4.5529   3.7136
a8                                                                        4.7111
V_{n,2}         1   0.3408   0.1724   0.1042   0.0698   0.0501   0.0377   0.0294

Table 5.6: Rayleigh distribution W (:i7~7,


2 2), r -- 2

n               1        2        3        4        5        6        7        8
a1              0  -0.8453  -1.2898  -1.5864  -1.8067  -1.9810  -2.1244  -2.2460
a2                  0.8453        0  -0.4734  -0.7974  -1.0412  -1.2353  -1.3959
a3                           1.2898   0.4734        0  -0.3303  -0.5819  -0.7838
a4                                    1.5864   0.7974   0.3303        0  -0.2540
a5                                             1.8067   1.0412   0.5819   0.2540
a6                                                      1.9810   1.2353   0.7838
a7                                                               2.1244   1.3959
a8                                                                        2.2460
V_{n,1}         1   0.5931   0.4258   0.3331   0.2739   0.2327   0.2024   0.1791

Table 5.7: Normal distribution N(0, ~), r = 1

n               1        2        3        4        5        6        7        8
a1              0  -0.7925  -1.2716  -1.6218  -1.9000  -2.1314  -2.3298  -2.5037
a2                  0.7925        0  -0.4609  -0.7925  -1.0542  -1.2716  -1.4581
a3                           1.2716   0.4609        0  -0.3265  -0.5817  -0.7925
a4                                    1.6218   0.7925   0.3265        0  -0.2531
a5                                             1.9000   1.0542   0.5817   0.2531
a6                                                      2.1314   1.2716   0.7925
a7                                                               2.3298   1.4581
a8                                                                        2.5037
V_{n,1}         1   0.6226   0.4569   0.3620   0.3001   0.2564   0.2239   0.1988

Table 5.8: Logistic distribution L ( ~ )1, r -- 1


n               1        2        3        4        5        6        7        8
a1              0  -0.6931  -1.3863  -1.7918  -2.1972  -2.4849  -2.7726  -2.9957
a2                  0.6931        0  -0.4055  -0.8109  -1.0986  -1.3863  -1.6094
a3                           1.3863   0.4055        0  -0.2877  -0.5754  -0.7985
a4                                    1.7918   0.8109   0.2877        0  -0.2231
a5                                             2.1972   1.0986   0.5754   0.2231
a6                                                      2.4849   1.3863   0.7985
a7                                                               2.7726   1.6094
a8                                                                        2.9957
V_{n,1}         1   0.6931   0.5000   0.4055   0.3333   0.2877   0.2500   0.2231

Table 5.9: Double exponential distribution DE(1), r = 1

n               1        2        3        4        5        6        7        8
a1              1   0.5850   0.4150   0.3219   0.2630   0.2224   0.1926   0.1699
a2                  2.5850   1.5850   1.1520   0.9069   0.7485   0.6374   0.5552
a3                           3.5850   2.3219   1.7370   1.3923   1.1635        1
a4                                    4.3219   2.9069   2.2224   1.8074   1.5261
a5                                             4.9069   3.3923   2.6374   2.1699
a6                                                      5.3923   3.8074        3
a7                                                               5.8074   4.1699
a8                                                                        6.1699
V_{n,1}         1   0.5850   0.4150   0.3219   0.2630   0.2224   0.1926   0.1699

Table 5.10: Exponential distribution E(1/log 2), r = 1

n               1        2        3        4        5        6        7        8
a1         1.5958   1.0650   0.8299   0.6922   0.6001   0.5344   0.4826   0.4423
a2                  2.9008   1.9829   1.5572   1.3029   1.1310   1.0058   0.9099
a3                           3.6870   2.6090   2.0824   1.7586   1.5355   1.3709
a4                                    4.2542   3.0882   2.4987   2.1281   1.8688
a5                                             4.6988   3.4774   2.8449   2.4405
a6                                                      5.0646   3.8053   3.1416
a7                                                               5.3754   4.0887
a8                                                                        5.6457
V_{n,1}         1   0.5883   0.4195   0.3265   0.2675   0.2266   0.1966   0.1737

Table 5.11: Gamma distribution $\Gamma(a, 2)$, a = 0.9508..., r = 1


n               1        2        3        4        5        6        7        8
a1         2.2501   1.5347   1.2121   1.0204   0.8906   0.7958   0.7229   0.6647
a2                  3.3040   2.4268   1.9835   1.7041   1.5079   1.3608   1.2455
a3                           3.8697   2.9631   2.4770   2.1592   1.9304   1.7555
a4                                    4.2516   3.3430   2.8389   2.5014   2.2539
a5                                             4.5375   3.6352   3.1233   2.7748
a6                                                      4.7648   3.8714   3.3567
a7                                                               4.9527   4.0688
a8                                                                        5.1124
V_{n,1}         1   0.5778   0.4091   0.3173   0.2594   0.2194   0.1902   0.1678

Table 5.12: Rayleigh distribution W(a, 2), a = 2.7027..., r = 1


Chapter II

Asymptotic quantization for nonsingular probability distributions

It is difficult to find n-optimal quantizers for a fixed number n of quantizing levels
at least in the multivariate case. This chapter is concerned with the theory of
asymptotic quantization for nonsingular probability distributions as n tends to infinity.
The asymptotic behaviour of the quantization error is derived and the asymptotic
performance of certain classes of quantizers is compared with asymptotically optimal
quantizers. We introduce quantization coefficients which provide interesting parameters
of probability distributions. They can be evaluated for univariate distributions
and some of them also for multivariate distributions. Moreover, asymptotic quantization
is related to a geometric covering problem.

6 Asymptotics for the quantization error

In this section we derive the exact asymptotic first order behaviour of $V_{n,r}(P)$ (up to
constants) as $n \to \infty$ in case P is not singular with respect to $\lambda^d$.
Let X be a $\mathbb{R}^d$-valued random variable with distribution P, let $\| \cdot \|$ denote any norm
on $\mathbb{R}^d$, and let $1 \le r < \infty$. Lemma 6.1 reflects the fact that $\bigcup_{n=1}^{\infty} \mathcal{F}_n$ is a dense subset
of the Banach space $L_r(P, \mathbb{R}^d)$.

6.1 Lemma
If $E\|X\|^r < \infty$, then

$\lim_{n \to \infty} V_{n,r}(P) = 0$.
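Proof
Let $\{a_1, a_2, a_3, \ldots\}$ be a countable dense subset of $\mathbb{R}^d$ with $a_1 = 0$. For $\varepsilon > 0$,
$\{B(a_k, (\varepsilon/2)^{1/r}) : k \in \mathbb{N}\}$ is a covering of $\mathbb{R}^d$. Therefore, one can find a Borel measurable
partition $\{A_k : k \in \mathbb{N}\}$ of $\mathbb{R}^d$ satisfying $A_k \subset B(a_k, (\varepsilon/2)^{1/r})$ for every k. Choose
$n \in \mathbb{N}$ such that

$\sum_{k > n} \int_{A_k} \|x\|^r \, dP(x) < \frac{\varepsilon}{2}$

and let $f_n = \sum_{k=1}^{n} a_k 1_{A_k}$. Then $f_n \in \mathcal{F}_n$ and, since $f_n = a_1 = 0$ on $A_k$ for k > n,

$V_{n,r}(P) \le E\|X - f_n(X)\|^r = \sum_{k=1}^{n} \int_{A_k} \|x - a_k\|^r \, dP(x) + \sum_{k > n} \int_{A_k} \|x\|^r \, dP(x) \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$.

[]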
The following theorem in its present general form is stated in Bucklew and Wise
(1982) for the $l_2$-norm. Under some additional assumptions the result is due to Zador
(1963, 1982) (who is also dealing with the $l_2$-norm). See also Fejes Tóth (1959) for a
special case.
For a Borel measurable function $h : \mathbb{R}^d \to \mathbb{R}$ and $0 < p < \infty$ let

(6.1)    $\|h\|_p = \left( \int |h|^p \, d\lambda^d \right)^{1/p}$.

Furthermore, let $P = P_a + P_s$ be the Lebesgue decomposition of P with respect to
$\lambda^d$, where $P_a$ denotes the absolutely continuous part and $P_s$ the singular part of P.

6.2 Theorem (Asymptotic quantization error)
Suppose $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Let

(6.2)    $Q_r([0,1]^d) = \inf_{n \ge 1} n^{r/d} M_{n,r}([0,1]^d)$.

Then $Q_r([0,1]^d) > 0$ and

(6.3)    $\lim_{n \to \infty} n^{r/d} V_{n,r}(P) = Q_r([0,1]^d) \left\| \frac{dP_a}{d\lambda^d} \right\|_{d/(d+r)}$.

The proof is given below. For singular distributions, (6.3) only yields $V_{n,r}(P) = o(n^{-r/d})$
provided the above moment condition holds. An investigation of the exact
order of $V_{n,r}(P)$ for several classes of singular (continuous) distributions P is contained
in Chapter III.
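The limit in (6.3) can be evaluated numerically in a simple case. The sketch below assumes d = 1, r = 2 and P = N(0,1), and it assumes that $M_{n,r}([0,1])$ is the n-th quantization error of the uniform distribution on [0,1], so that Example 5.5 gives $Q_r([0,1]) = \frac{1}{(1+r)2^r}$; the helper `lp_quasi_norm`, the integration range and the step count are illustrative choices.

```python
import math

def lp_quasi_norm(h, p, lo, hi, steps=200000):
    """(integral of |h|^p over [lo, hi])^(1/p) by the trapezoidal rule."""
    dx = (hi - lo) / steps
    total = 0.5 * (abs(h(lo)) ** p + abs(h(hi)) ** p)
    total += sum(abs(h(lo + i * dx)) ** p for i in range(1, steps))
    return (total * dx) ** (1.0 / p)

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

d, r = 1, 2
q_cube = 1.0 / ((1 + r) * 2 ** r)        # n^r V_{n,r}(U([0,1])), cf. Example 5.5
limit = q_cube * lp_quasi_norm(normal_pdf, d / (d + r), -40.0, 40.0)
print(limit, math.sqrt(3.0) * math.pi / 2.0)
```

Under these assumptions the limit equals $\sqrt{3}\pi/2 \approx 2.72$. For comparison, Table 5.1 gives $8^2 \cdot 0.0345 \approx 2.21$ for n = 8, so the normalized error is still well below its limit at that point.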
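6.3 Remark
(a) The moment condition $E\|X\|^{r+\delta} < \infty$ ensures that the limit in (6.3) is finite.
In fact, $h \in L_1(\lambda^d)$, $h \ge 0$, and $\int \|x\|^{r+\delta} h(x) \, dx < \infty$ for some $\delta > 0$, implies
$h \in L_{d/(d+r)}(\lambda^d)$. To see this, let $s = \frac{d}{d+r}$, $t = \frac{(r+\delta)d}{d+r}$, $p = \frac{d+r}{d}$ and $q = \frac{d+r}{r}$. Then

$\int_{B(0,1)} h(x)^s \, dx < \infty$

and by Hölder's inequality

$\int_{B(0,1)^c} h(x)^s \, dx = \int_{B(0,1)^c} h(x)^s \|x\|^t \|x\|^{-t} \, dx$
$\le \left( \int_{B(0,1)^c} h(x) \|x\|^{tp} \, dx \right)^{1/p} \left( \int_{B(0,1)^c} \|x\|^{-tq} \, dx \right)^{1/q} < \infty$

since $tp = r + \delta$ and $tq = \frac{(r+\delta)d}{r} > d$.
(b) The converse of the above implication does not hold. Consider, for instance,

$h(x) = \frac{1}{2^{n(1+r)} n^{2+r}}$ if $x \in [2^n, 2^{n+1})$, $n \in \mathbb{N}$.

Then $h \in L_1(\lambda)$ and

$\int h^{1/(1+r)} \, d\lambda = \sum_{n=1}^{\infty} n^{-(2+r)/(1+r)} < \infty$

but

$\int |x|^{r+\delta} h(x) \, dx = \sum_{n=1}^{\infty} \frac{1}{2^{n(1+r)} n^{2+r}} \int_{2^n}^{2^{n+1}} |x|^{r+\delta} \, dx \ge \sum_{n=1}^{\infty} \frac{2^{n(r+\delta)} 2^n}{2^{n(1+r)} n^{2+r}} = \sum_{n=1}^{\infty} \frac{2^{n\delta}}{n^{2+r}} = \infty$

for every $\delta > 0$. Note that $\int |x|^r h(x) \, dx < \infty$.

(c) Without any moment condition we still have

$\liminf_{n \to \infty} n^{r/d} V_{n,r}(P) \ge Q_r([0,1]^d) \left\| \frac{dP_a}{d\lambda^d} \right\|_{d/(d+r)}$.

This is contained in the subsequent proof of Theorem 6.2 (Step 5).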


The following example shows that the moment condition in Theorem 6.2 cannot be
dropped.

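6.4 Example
Let $x_k = 3 \cdot 2^{k-1}$ and

$P(X = x_k) = \frac{c}{2^{rk} k \log^2 k}$, $k \ge 2$

with norming constant $c = \left( \sum_{k=2}^{\infty} \frac{1}{2^{rk} k \log^2 k} \right)^{-1}$. Then

$E X^r = \frac{3^r c}{2^r} \sum_{k=2}^{\infty} \frac{1}{k \log^2 k} < \infty$,

$E X^{r+\delta} = \frac{3^{r+\delta} c}{2^{r+\delta}} \sum_{k=2}^{\infty} \frac{2^{k\delta}}{k \log^2 k} = \infty$, $\delta > 0$.

For $\alpha \subset \mathbb{R}$ with $|\alpha| = n$, let $I = \{k \ge 2 : \alpha \cap [2^k, 2^{k+1}) = \emptyset\}$. Then for $k \in I$

$\min_{a \in \alpha} |x_k - a|^r \ge (x_k - 2^k)^r = 2^{(k-1)r}$

and hence

$E \min_{a \in \alpha} |X - a|^r = \sum_{k=2}^{\infty} \min_{a \in \alpha} |x_k - a|^r \frac{c}{2^{kr} k \log^2 k}$
$\ge \frac{c}{2^r} \sum_{k \in I} \frac{1}{k \log^2 k}$
$\ge \frac{c}{2^r} \sum_{k=n+2}^{\infty} \frac{1}{k \log^2 k}$
$\ge \frac{c}{2^r} \int_{n+2}^{\infty} \frac{1}{x \log^2 x} \, dx$
$= \frac{c}{2^r \log(n+2)}$.

This gives

$\lim_{n \to \infty} n^r V_{n,r}(X) = \infty$.

Here the order of convergence to zero of $V_{n,r}(X)$ is $(\log n)^{-1}$. In fact, for $\beta = \{x_2, \ldots, x_{n+1}\}$
one obtains

$\min_{b \in \beta} |x_k - b|^r = 0$, $k \le n + 1$,

$\min_{b \in \beta} |x_k - b|^r \le x_k^r = \frac{3^r 2^{kr}}{2^r}$, $k \ge n + 2$

and hence

$E \min_{b \in \beta} |X - b|^r \le \frac{3^r c}{2^r} \sum_{k=n+2}^{\infty} \frac{1}{k \log^2 k} \le \frac{3^r c}{2^r \log(n+1)}$.

It follows that

$\frac{c}{2^r} \le \liminf_{n \to \infty} \log n \, V_{n,r}(X) \le \limsup_{n \to \infty} \log n \, V_{n,r}(X) \le \frac{3^r c}{2^r}$.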

In case $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$, the number

(6.4)    $Q_r(P) = Q_r([0,1]^d) \left\| \frac{dP_a}{d\lambda^d} \right\|_{d/(d+r)}$

is called r-th quantization coefficient of the probability P on $\mathbb{R}^d$. (Notice that
$Q_r(P)$ depends on the underlying norm.) By Remark 6.3(a), $0 < Q_r(P) < \infty$. We
also write $Q_r(X)$ instead of $Q_r(P)$. For $A \in \mathcal{B}(\mathbb{R}^d)$ bounded with $\lambda^d(A) > 0$, define
the r-th quantization coefficient of A by

(6.5)    $Q_r(A) = Q_r(U(A))$.

Quantization coefficients provide interesting parameters of a probability distribution.
They are discussed in Sections 8-10.

We shall need the following lemmas for the proof of Theorem 6.2.

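6.5 Lemma
Let $P = sP_1 + (1 - s)P_2$, $0 < s < 1$, $\int \|x\|^r \, dP_i(x) < \infty$. Suppose

$n^{r/d} V_{n,r}(P_1) \to c \in [0, \infty]$ as $n \to \infty$.

Then

(a)
$\liminf_{n \to \infty} n^{r/d} V_{n,r}(P) \ge sc + (1 - s) \liminf_{n \to \infty} n^{r/d} V_{n,r}(P_2)$,

$\limsup_{n \to \infty} n^{r/d} V_{n,r}(P) \le s(1 - \varepsilon)^{-r/d} c + (1 - s) \varepsilon^{-r/d} \limsup_{n \to \infty} n^{r/d} V_{n,r}(P_2)$

for every $0 < \varepsilon < 1$.

(b) If $\lim_{n \to \infty} n^{r/d} V_{n,r}(P_2) = 0$, then

$\lim_{n \to \infty} n^{r/d} V_{n,r}(P) = sc$.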
r~-~ o o
82 II. Asymptotic quantization for nonsingular probability distributions

Proof
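(a) The first inequality follows immediately from Lemma 4.14 (a). For $0 < \varepsilon < 1$, let $n_1 = n_1(n,\varepsilon) = [(1-\varepsilon)n]$ and $n_2 = n_2(n,\varepsilon) = [\varepsilon n]$. ($[x]$ denotes the integer part of $x \in \mathbb{R}$.) Then by Lemma 4.14 (b)
$$n^{r/d}V_{n,r}(P) \le s\,n^{r/d}V_{n_1,r}(P_1) + (1-s)\,n^{r/d}V_{n_2,r}(P_2)
= s\Big(\frac{n}{n_1}\Big)^{r/d} n_1^{r/d}V_{n_1,r}(P_1) + (1-s)\Big(\frac{n}{n_2}\Big)^{r/d} n_2^{r/d}V_{n_2,r}(P_2)$$
for every $n \ge \max\{\frac{1}{\varepsilon}, \frac{1}{1-\varepsilon}\}$. Since $\frac{n}{n_1} \to \frac{1}{1-\varepsilon}$ and $\frac{n}{n_2} \to \frac{1}{\varepsilon}$ as $n \to \infty$, this gives the second inequality.
(b) follows from (a). □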
The following result is due to Pierce (1970).
6.6 Lemma
Let $d = 1$. Then
$$V_{n,r}(P) \le n^{-r}\big(C_1 E|X|^{r+\delta} + C_2\big), \quad \delta > 0,\ n \ge C_3,$$
for numerical constants $C_1, C_2, C_3 > 0$ depending on $\delta$ and $r$ (but not on $P$ and $n$).

Proof
First, assume $P((1,\infty)) = 1$. We use a random quantizer argument. Consider i.i.d. Pareto-distributed random variables $Y_1,\ldots,Y_n$ independent of $X$ with distribution function
$$G(y) = \begin{cases} 1 - y^{-\delta/r}, & y > 1,\\ 0, & y \le 1.\end{cases}$$
Set $b = \delta/r$. Then
$$V_{n,r}(P) \le E\min_{1\le i\le n}|X - Y_i|^r = \int E\min_{1\le i\le n}|x - Y_i|^r\,dP(x)$$
and for $x > 1$
$$E\min_{1\le i\le n}|x - Y_i|^r = r\int_0^{\infty} P\Big(\min_{1\le i\le n}|x - Y_i| > t\Big)t^{r-1}\,dt
= r\int_0^{\infty}\big[1 - G(x+t) + G(x-t)\big]^n t^{r-1}\,dt$$
$$= r\int_0^{x-1}\big[1 - (x-t)^{-b} + (x+t)^{-b}\big]^n t^{r-1}\,dt + r\int_{x-1}^{\infty}(x+t)^{-bn}t^{r-1}\,dt =: J_1(x) + J_2(x).$$

Since
$$(x-t)^{-b} - (x+t)^{-b} \ge 2bt\,x^{-(1+b)}, \quad 0 \le t \le x-1,$$
and thus
$$\big[1 - (x-t)^{-b} + (x+t)^{-b}\big]^n \le \big[1 - 2bt\,x^{-(1+b)}\big]^n \le \exp\big(-2nbt\,x^{-(1+b)}\big), \quad 0 \le t \le x-1,$$
one gets
$$J_1(x) \le r\int_0^{\infty}\exp\big(-2nbt\,x^{-(1+b)}\big)t^{r-1}\,dt = \frac{\Gamma(r+1)\,x^{r(1+b)}}{(2nb)^r}
= \frac{\Gamma(r+1)\,r^r x^{r+\delta}}{n^r(2\delta)^r} =: \frac{c_1 x^{r+\delta}}{n^r}, \quad n \ge 1.$$

Furthermore,
$$J_2(x) \le r\int_0^{\infty}(1+t)^{-bn}t^{r-1}\,dt = \frac{\Gamma(r+1)\Gamma(bn-r)}{\Gamma(bn)} \le \frac{c_2}{n^r}, \quad n \ge c_3,$$
where the constants $c_2 \ge 1$ and $c_3 > 0$ depend only on $\delta$ and $r$. This gives
$$V_{n,r}(P) \le n^{-r}\big(c_1 E|X|^{r+\delta} + c_2\big), \quad \delta > 0,\ n \ge c_3.$$


Since $V_{n,r}(-X) = V_{n,r}(X)$ by Lemma 3.2, the above inequality also holds in case $P((-\infty,-1)) = 1$. If $P([-1,1]) = 1$, choose $\alpha = \{-1 + \frac{2i-1}{n} : i = 1,\ldots,n\}$. Then
$$V_{n,r}(P) \le E\min_{a\in\alpha}|X - a|^r \le n^{-r}, \quad n \ge 1.$$

Now let $P$ be arbitrary. We may write
$$P = s_1 P(\cdot\,|\,(-\infty,-1)) + s_2 P(\cdot\,|\,[-1,1]) + s_3 P(\cdot\,|\,(1,\infty))$$
with $s_1 = P((-\infty,-1))$, $s_2 = P([-1,1])$, and $s_3 = P((1,\infty))$. Let $n_1 = [n/3]$. From Lemma 4.14 (b) and the above inequalities we deduce
$$V_{n,r}(P) \le n_1^{-r}\big(c_1 E|X|^{r+\delta} + c_2\big), \quad n_1 \ge c_3.$$

This implies
$$V_{n,r}(P) \le n^{-r}\big(5^r c_1 E|X|^{r+\delta} + 5^r c_2\big), \quad n \ge 5c_3.$$
□
Lemma 6.6 is easily generalized to arbitrary dimensions d.
6.7 Corollary
We have
$$V_{n,r}(P) \le n^{-r/d}\big(C_1 E\|X\|^{r+\delta} + C_2\big), \quad \delta > 0,\ n \ge C_3,$$
for numerical constants $C_1, C_2, C_3 > 0$ depending only on $\delta$, $r$, $d$ and the underlying norm.

Proof
For $p \ge 1$, let $\|\cdot\|_p$ denote the $l_p$-norm on $\mathbb{R}^d$. Recall that all norms on $\mathbb{R}^d$ are equivalent. Hence
$$V_{n,r}(P) \le c_0 V_{n,r}(P, \|\cdot\|_r)$$
where $V_{n,r}(P, \|\cdot\|_r)$ denotes the n-th quantization error for $P$ of order $r$ with respect to the $l_r$-norm and $c_0 > 0$ is a constant depending only on $r$ and the norm $\|\cdot\|$. Let $n_1 = [n^{1/d}]$. By Lemma 4.15 and Lemma 6.6
$$V_{n,r}(P, \|\cdot\|_r) \le \sum_{i=1}^{d}V_{n_1,r}(X_i)
\le n_1^{-r}\Big(c_1\sum_{i=1}^{d}E|X_i|^{r+\delta} + dc_2\Big), \quad \delta > 0,\ n_1 \ge c_3,$$
$$\le n^{-r/d}\Big(2^r c_1\sum_{i=1}^{d}E|X_i|^{r+\delta} + 2^r dc_2\Big), \quad \delta > 0,\ n \ge (2c_3)^d,$$
where the constants $c_1, c_2, c_3 > 0$ depend only on $\delta$ and $r$. Furthermore,
$$\sum_{i=1}^{d}E|X_i|^{r+\delta} = E\|X\|_{r+\delta}^{r+\delta} \le c_4 E\|X\|^{r+\delta}$$
for a constant $c_4 > 0$ depending only on $r$, $\delta$ and $\|\cdot\|$. This proves the assertion. □
We further need an elementary lemma.
6.8 Lemma
For $m \in \mathbb{N}$ and numbers $s_i > 0$, let
$$B = \Big\{(v_1,\ldots,v_m) \in (0,\infty)^m : \sum_{i=1}^{m}v_i \le 1\Big\}$$
and
$$t_i = \frac{s_i^{d/(d+r)}}{\sum_{j=1}^{m}s_j^{d/(d+r)}}, \quad 1 \le i \le m.$$

Then the function
$$F : B \to \mathbb{R}_+, \quad F(v_1,\ldots,v_m) = \sum_{i=1}^{m}s_i v_i^{-r/d}$$
satisfies
$$F(t_1,\ldots,t_m) = \min_{(v_1,\ldots,v_m)\in B}F(v_1,\ldots,v_m) = \Big(\sum_{i=1}^{m}s_i^{d/(d+r)}\Big)^{(d+r)/d}.$$

Moreover, $(t_1,\ldots,t_m)$ is the unique minimizer of $F$.

Proof
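By Hölder's inequality (for exponents less than 1) with $p = d/(d+r)$ and $q = -d/r$, one obtains
$$F(v_1,\ldots,v_m) \ge \Big(\sum_{i=1}^{m}s_i^{p}\Big)^{1/p}\Big(\sum_{i=1}^{m}\big(v_i^{-r/d}\big)^{q}\Big)^{1/q}
= \Big(\sum_{i=1}^{m}s_i^{d/(d+r)}\Big)^{(d+r)/d}\Big(\sum_{i=1}^{m}v_i\Big)^{-r/d}
\ge F(t_1,\ldots,t_m)$$
for $(v_1,\ldots,v_m) \in B$. To get equality $F(v_1,\ldots,v_m) = F(t_1,\ldots,t_m)$ it is necessary (and sufficient) that $\sum_{i=1}^{m}v_i = 1$ and $s_i = c\,v_i^{1/p}$, $1 \le i \le m$, for some constant $c > 0$. This implies $t_i = v_i$ for every $i$. □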
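The minimizer in Lemma 6.8 is easy to check numerically. The following is a minimal sketch, not part of the original text: the function names, the sample values of $s_i$, $d$, $r$ and the random sampler are illustrative assumptions. It compares $F(t_1,\ldots,t_m)$ with the closed form and with values of $F$ at random points of $B$.

```python
import random


def optimal_weights(s, d, r):
    """Weights t_i = s_i^{d/(d+r)} / sum_j s_j^{d/(d+r)} from Lemma 6.8."""
    p = d / (d + r)
    powers = [si ** p for si in s]
    total = sum(powers)
    return [x / total for x in powers]


def objective(s, v, d, r):
    """F(v) = sum_i s_i * v_i^{-r/d}."""
    return sum(si * vi ** (-r / d) for si, vi in zip(s, v))


def closed_form_minimum(s, d, r):
    """(sum_i s_i^{d/(d+r)})^{(d+r)/d}, the claimed minimum of F on B."""
    p = d / (d + r)
    return sum(si ** p for si in s) ** (1.0 / p)


def random_simplex_point(m, rng):
    """A random point of B with v_1 + ... + v_m = 1 (illustrative sampler)."""
    raw = [rng.random() + 1e-9 for _ in range(m)]
    total = sum(raw)
    return [x / total for x in raw]


# Hypothetical example data.
s, d, r = [0.5, 0.3, 0.2], 2, 2
t = optimal_weights(s, d, r)
f_at_t = objective(s, t, d, r)

rng = random.Random(0)
sampled = [objective(s, random_simplex_point(len(s), rng), d, r) for _ in range(1000)]
smallest_sampled = min(sampled)
```

No sampled value should fall below `f_at_t`, in line with the uniqueness statement of the lemma; random sampling of course illustrates the claim and does not prove it.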

Proof of Theorem 6.2
The proof is given by a sequence of steps from the uniform case to the general case.
Step 1. Let $P = U([0,1]^d)$. Let $m, n \in \mathbb{N}$, $m < n$ and let $k = k(n,m) = [(\frac{n}{m})^{1/d}]$. Choose a tesselation of the unit cube $[0,1]^d$ consisting of $k^d$ translates $C_1,\ldots,C_{k^d}$ of the cube $[0,\frac{1}{k}]^d$. Then $P = k^{-d}\sum_{i=1}^{k^d}U(C_i)$. By Lemma 4.14 (b) and the translation and scale invariance of $M_{m,r}$ (see Lemma 3.2 (b)),
$$V_{n,r}(P) \le k^{-d}\sum_{i=1}^{k^d}V_{m,r}(U(C_i)) = k^{-d}\sum_{i=1}^{k^d}M_{m,r}(C_i)\,k^{-r} = k^{-r}M_{m,r}([0,1]^d) = k^{-r}V_{m,r}(P)$$
and hence
$$n^{r/d}V_{n,r}(P) \le \Big(\frac{k+1}{k}\Big)^{r}m^{r/d}V_{m,r}(P).$$

This implies
$$\limsup_{n\to\infty} n^{r/d}V_{n,r}(P) \le m^{r/d}V_{m,r}(P)$$
for every $m \in \mathbb{N}$. Therefore, $\lim_{n\to\infty} n^{r/d}V_{n,r}(P)$ exists in $[0,\infty)$ and
$$\lim_{n\to\infty} n^{r/d}V_{n,r}(P) = Q_r([0,1]^d). \tag{6.6}$$

From Theorem 4.16 it follows that $Q_r([0,1]^d) > 0$.
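For $d = 1$ the behaviour in Step 1 can be seen numerically. The sketch below is an illustration under stated assumptions: it evaluates the error of the $n$ cell midpoints for $U([0,1])$ with a midpoint rule (the function name and grid size are hypothetical). The rescaled error $n^r E\min_a|X-a|^r$ comes out as $1/((1+r)2^r)$ for every $n$, which is the value that formula (7.7) below gives for the uniform density on $[0,1]$.

```python
def midpoint_quantizer_error(n, r, grid=20_000):
    """Approximate E min_a |X - a|^r for X ~ U[0,1] and the n cell midpoints."""
    centers = [(2 * i + 1) / (2 * n) for i in range(n)]
    total = 0.0
    for j in range(grid):
        x = (j + 0.5) / grid
        i = min(int(x * n), n - 1)  # index of the cell [i/n, (i+1)/n) containing x
        total += abs(x - centers[i]) ** r
    return total / grid


# Rescaled errors n^r * error for r = 2; each should be close to 1/12.
rescaled = {n: n ** 2 * midpoint_quantizer_error(n, 2) for n in (1, 2, 5, 10)}
```

The computation only gives the error of one particular quantizer, hence an upper bound for $V_{n,r}$; that the midpoints are in fact optimal for the uniform distribution is taken from the text's reference to Example 5.5.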


Step 2. Let $P = \sum_{i=1}^{m}s_i U(C_i)$, where $\{C_1,\ldots,C_m\}$ is a packing in $\mathbb{R}^d$ consisting of closed cubes whose edges are parallel to the coordinate axes and with common length of the edges $l(C_i) = l > 0$, $s_i \ge 0$, $\sum_{i=1}^{m}s_i = 1$. Set $h = dP/d\lambda^d = \sum_{i=1}^{m}s_i l^{-d}1_{C_i}$. We may assume without loss of generality that $s_i > 0$ for every $i$. For $n \in \mathbb{N}$, let
$$t_i = \frac{s_i^{d/(d+r)}}{\sum_{j=1}^{m}s_j^{d/(d+r)}} \quad\text{and}\quad n_i = n_i(n) = [t_i n], \quad 1 \le i \le m.$$

Then by Lemma 4.14 (b) and Lemma 3.2
$$V_{n,r}(P) \le \sum_{i=1}^{m}s_i V_{n_i,r}(U(C_i)) = \sum_{i=1}^{m}s_i V_{n_i,r}(U([0,1]^d))\,l^r, \quad n \ge \max_{1\le i\le m}(1/t_i).$$

From Step 1 it follows that
$$n^{r/d}V_{n_i,r}(U([0,1]^d)) = \Big(\frac{n}{n_i}\Big)^{r/d}n_i^{r/d}V_{n_i,r}(U([0,1]^d)) \to t_i^{-r/d}Q_r([0,1]^d) \quad\text{as } n \to \infty.$$

This implies
$$\limsup_{n\to\infty} n^{r/d}V_{n,r}(P) \le Q_r([0,1]^d)\sum_{i=1}^{m}s_i t_i^{-r/d}l^r = Q_r([0,1]^d)\,\|h\|_{d/(d+r)}. \tag{6.7}$$

To prove that $Q_r([0,1]^d)\|h\|_{d/(d+r)}$ is a lower bound for $L = \liminf_{n\to\infty} n^{r/d}V_{n,r}(P)$, let $\beta = \beta(n) \in C_{n,r}(P)$ for $n \in \mathbb{N}$, $\beta_i = \beta_i(n) = \beta \cap \operatorname{int}C_i$, and $n_i = n_i(n) = |\beta_i|$, $1 \le i \le m$. For $0 < \varepsilon < l/2$, let $C_{i,\varepsilon} \subset C_i$ be a parallel closed cube with the same midpoint as $C_i$ and edge-length $l(C_{i,\varepsilon}) = l - 2\varepsilon$. Choose a finite set $\gamma_i = \gamma_i(\varepsilon) \subset C_{i,\varepsilon}$, $|\gamma_i| = k = k(\varepsilon)$, such that
$$\min_{a\in\gamma_i}\|x - a\| \le \inf_{y\in C_i^c}\|x - y\| \quad\text{for every } x \in C_{i,\varepsilon},\ 1 \le i \le m.$$

Then
$$V_{n,r}(P) = \sum_{i=1}^{m}s_i\int_{C_i}\min_{b\in\beta}\|x - b\|^r\,dx\;l^{-d}
\ge \sum_{i=1}^{m}s_i\int_{C_{i,\varepsilon}}\min_{b\in\beta\cup\gamma_i}\|x - b\|^r\,dx\;l^{-d}$$
$$= \sum_{i=1}^{m}s_i\int_{C_{i,\varepsilon}}\min_{b\in\beta_i\cup\gamma_i}\|x - b\|^r\,dx\;l^{-d}
\ge \sum_{i=1}^{m}s_i V_{n_i+k,r}(U(C_{i,\varepsilon}))\,(l-2\varepsilon)^d l^{-d}$$
$$= \sum_{i=1}^{m}s_i V_{n_i+k,r}(U([0,1]^d))\,(l-2\varepsilon)^{d+r}l^{-d}, \quad n \in \mathbb{N}.$$

Choose a subsequence (also denoted by $(n)$) such that
$$\frac{n_i}{n} \to v_i \in [0,1],\ 1 \le i \le m, \quad\text{and}\quad n^{r/d}V_{n,r}(P) \to L \quad\text{as } n \to \infty.$$

Since $\sum_{i=1}^{m}n_i \le n$, we have $\sum_{i=1}^{m}v_i \le 1$. Furthermore, $v_i > 0$ for every $i$. Otherwise, Step 1 yields $L = \infty$, which contradicts (6.7). By taking a further subsequence we can assume without loss of generality that
$$\lim_{n\to\infty}(n_i+k)^{r/d}V_{n_i+k,r}(U([0,1]^d)) = Q_r([0,1]^d), \quad 1 \le i \le m.$$

This implies
$$L \ge Q_r([0,1]^d)\sum_{i=1}^{m}s_i v_i^{-r/d}(l-2\varepsilon)^{d+r}l^{-d}.$$

Since $0 < \varepsilon < l/2$ is arbitrary and by Lemma 6.8, one obtains
$$L \ge Q_r([0,1]^d)\sum_{i=1}^{m}s_i v_i^{-r/d}l^r \ge Q_r([0,1]^d)\Big(\sum_{i=1}^{m}s_i^{d/(d+r)}\Big)^{(d+r)/d}l^r = Q_r([0,1]^d)\,\|h\|_{d/(d+r)}.$$
Hence, (6.3) holds in this setting.
Step 3. Let $P$ be absolutely continuous with respect to $\lambda^d$ and assume that $P$ has a compact support. Let $\operatorname{supp}(P) \subset C$ for some closed cube $C$ whose edges are parallel to the coordinate axes with edge-length $l(C) = l$. For $k \in \mathbb{N}$, consider a tesselation of $C$ consisting of $k^d$ closed cubes $C_1,\ldots,C_{k^d}$ of common edge-length $l(C_i) = l/k$. Set
$$P_k = \sum_{i=1}^{k^d}P(C_i)\,U(C_i), \qquad h_k = \frac{dP_k}{d\lambda^d} = \sum_{i=1}^{k^d}\frac{P(C_i)}{\lambda^d(C_i)}1_{C_i},$$
where $\lambda^d(C_i) = (l/k)^d$, and $h = dP/d\lambda^d$. By differentiation of measures
$$h_k \to h \quad \lambda^d\text{-a.s. as } k \to \infty$$
(cf. Cohn, 1980, Theorem 6.2.3). Therefore, by Scheffé's lemma
$$\lim_{k\to\infty}\|h_k - h\|_1 = 0.$$

Since
$$\|h_k - h\|_{d/(d+r)} \le \lambda^d(C)^{r/d}\|h_k - h\|_1,$$
$$\Big|\,\|h_k\|_{d/(d+r)}^{d/(d+r)} - \|h\|_{d/(d+r)}^{d/(d+r)}\Big| \le \|h_k - h\|_{d/(d+r)}^{d/(d+r)},$$
this implies
$$\lim_{k\to\infty}\|h_k\|_{d/(d+r)} = \|h\|_{d/(d+r)}.$$

Furthermore, by Step 2
$$\lim_{n\to\infty} n^{r/d}V_{n,r}(P_k) = Q_r([0,1]^d)\,\|h_k\|_{d/(d+r)}, \quad k \in \mathbb{N}. \tag{6.9}$$

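For $n \in \mathbb{N}$ and $0 < \varepsilon < 1$, let $n_1 = n_1(n,\varepsilon) = [(1-\varepsilon)n]$ and $n_2 = n_2(n,\varepsilon) = [(\varepsilon n)^{1/d}]^d$. Consider a tesselation of $C$ consisting of $n_2$ closed cubes of edge-length $l/n_2^{1/d}$ and let $\gamma = \gamma(n_2)$ denote the set of its $n_2$ midpoints, that is, $\gamma$ corresponds to the cube $n_2$-quantizer for $C$. Then
$$\min_{a\in\gamma}\|x - a\| \le \frac{c_1 l}{2n_2^{1/d}} \le \frac{c_1 l}{(\varepsilon n)^{1/d}}, \quad x \in C,\ n \ge 1/\varepsilon,$$
for some constant $c_1 > 0$ depending only on the underlying norm. Let $\alpha(n_1,k) \in C_{n_1,r}(P_k)$ and $\delta = \delta(n,k,\varepsilon) = \alpha(n_1,k) \cup \gamma(n_2)$. Then $|\delta| \le n$ and
$$n^{r/d}\Big|\int\min_{a\in\delta(n,k,\varepsilon)}\|x - a\|^r\,dP_k(x) - \int\min_{a\in\delta(n,k,\varepsilon)}\|x - a\|^r\,dP(x)\Big|
\le n^{r/d}\int\min_{a\in\delta(n,k,\varepsilon)}\|x - a\|^r\,|h_k - h|\,d\lambda^d$$
$$\le \frac{c_1^r l^r}{\varepsilon^{r/d}}\|h_k - h\|_1 =: c_2\|h_k - h\|_1, \quad k \in \mathbb{N},\ n \ge \max\Big\{\frac{1}{\varepsilon}, \frac{1}{1-\varepsilon}\Big\}.$$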

This implies

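$$n^{r/d}V_{n,r}(P) \le n^{r/d}\int\min_{a\in\delta(n,k,\varepsilon)}\|x - a\|^r\,dP(x)
\le n^{r/d}\int\min_{a\in\delta(n,k,\varepsilon)}\|x - a\|^r\,dP_k(x) + c_2\|h_k - h\|_1$$
$$\le n^{r/d}\int\min_{a\in\alpha(n_1,k)}\|x - a\|^r\,dP_k(x) + c_2\|h_k - h\|_1
= n^{r/d}V_{n_1,r}(P_k) + c_2\|h_k - h\|_1, \quad k \in \mathbb{N},\ n \ge \max\Big\{\frac{1}{\varepsilon}, \frac{1}{1-\varepsilon}\Big\},$$
and hence by (6.9)
$$\limsup_{n\to\infty} n^{r/d}V_{n,r}(P) \le (1-\varepsilon)^{-r/d}Q_r([0,1]^d)\,\|h_k\|_{d/(d+r)} + c_2\|h_k - h\|_1, \quad k \in \mathbb{N}.$$

Letting $k$ tend to infinity and then letting $\varepsilon$ tend to zero yields
$$\limsup_{n\to\infty} n^{r/d}V_{n,r}(P) \le Q_r([0,1]^d)\,\|h\|_{d/(d+r)}. \tag{6.10}$$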

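To prove the converse estimate, let $\beta(n_1) \in C_{n_1,r}(P)$ and $\tau = \tau(n,\varepsilon) = \beta(n_1) \cup \gamma(n_2)$. Then $|\tau| \le n$ and as above
$$n^{r/d}\Big|\int\min_{a\in\tau(n,\varepsilon)}\|x - a\|^r\,dP_k(x) - \int\min_{a\in\tau(n,\varepsilon)}\|x - a\|^r\,dP(x)\Big| \le c_2\|h_k - h\|_1, \quad k \in \mathbb{N},\ n \ge \max\Big\{\frac{1}{\varepsilon}, \frac{1}{1-\varepsilon}\Big\}.$$

This implies
$$n^{r/d}V_{n_1,r}(P) = n^{r/d}\int\min_{a\in\beta(n_1)}\|x - a\|^r\,dP(x)
\ge n^{r/d}\int\min_{a\in\tau(n,\varepsilon)}\|x - a\|^r\,dP(x)$$
$$\ge n^{r/d}\int\min_{a\in\tau(n,\varepsilon)}\|x - a\|^r\,dP_k(x) - c_2\|h_k - h\|_1
\ge n^{r/d}V_{n,r}(P_k) - c_2\|h_k - h\|_1, \quad k \in \mathbb{N},\ n \ge \max\Big\{\frac{1}{\varepsilon}, \frac{1}{1-\varepsilon}\Big\}.$$

Therefore
$$(1-\varepsilon)^{-r/d}\liminf_{n\to\infty} n_1^{r/d}V_{n_1,r}(P) \ge Q_r([0,1]^d)\,\|h_k\|_{d/(d+r)} - c_2\|h_k - h\|_1, \quad k \in \mathbb{N}.$$
Letting $k$ tend to infinity and then letting $\varepsilon$ tend to zero yields
$$\liminf_{n\to\infty} n^{r/d}V_{n,r}(P) = \liminf_{n\to\infty} n_1^{r/d}V_{n_1,r}(P) \ge Q_r([0,1]^d)\,\|h\|_{d/(d+r)}. \tag{6.11}$$
Hence, (6.3) holds in this setting.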
Step 4. Let $P$ be singular with respect to $\lambda^d$ and assume that $P$ has a compact support. Let $\operatorname{supp}(P) \subset C$ for some open cube $C$ whose edges are parallel to the coordinate axes with edge-length $l(C) = l$. For any $\varepsilon > 0$, there is an open set $A \subset C$ such that $P(A) = 1$ and $\lambda^d(A) \le \varepsilon$. Moreover, there exists a countable partition $\{C_i : i \in \mathbb{N}\}$ of $A$ consisting of half-open cubes $C_i \ne \emptyset$ with edges parallel to the coordinate axes (cf. Cohn, 1980, Lemma 1.4.2). Choose $m \in \mathbb{N}$ such that
$$\sum_{i\ge m+1}P(C_i) \le \varepsilon^{r/d}.$$
For $n \in \mathbb{N}$, let
$$s_i = \frac{\lambda^d(C_i)}{\sum_{j=1}^{m}\lambda^d(C_j)}, \quad n_i = n_i(n) = [(s_i n/2)^{1/d}]^d, \quad 1 \le i \le m,$$
$$n_{m+1} = n_{m+1}(n) = [(n/2)^{1/d}]^d.$$
For $1 \le i \le m$, consider a partition of $C_i$ into $n_i$ cubes of common edge-length $(\lambda^d(C_i)/n_i)^{1/d}$ and let $\alpha_i = \alpha_i(n_i)$ denote the set of its $n_i$ midpoints. Furthermore, consider a partition of $C$ into $n_{m+1}$ cubes of common edge-length $l/n_{m+1}^{1/d}$ and let $\alpha_{m+1} = \alpha_{m+1}(n_{m+1})$ be the set of its $n_{m+1}$ midpoints. Let $\alpha = \alpha(n) = \bigcup_{i=1}^{m+1}\alpha_i$. Then $|\alpha| \le n$ and
$$\min_{a\in\alpha(n)}\|x - a\| \le \frac{c\,\lambda^d(C_i)^{1/d}}{2n_i^{1/d}} \le \frac{c\,\lambda^d(C_i)^{1/d}}{(s_i n/2)^{1/d}} \le \frac{c\,\varepsilon^{1/d}}{(n/2)^{1/d}}, \quad x \in C_i,\ n \ge 2/s_i,$$
$$\min_{a\in\alpha(n)}\|x - a\| \le \frac{c\,l}{(n/2)^{1/d}}, \quad x \in C,\ n \ge 2,$$
for some constant $c > 0$ depending only on the underlying norm. This implies
$$n^{r/d}V_{n,r}(P) \le n^{r/d}\int\min_{a\in\alpha(n)}\|x - a\|^r\,dP(x)
= \sum_{i=1}^{m}n^{r/d}\int_{C_i}\min_{a\in\alpha(n)}\|x - a\|^r\,dP(x) + n^{r/d}\int_{A\setminus\bigcup_{i=1}^{m}C_i}\min_{a\in\alpha(n)}\|x - a\|^r\,dP(x)$$
$$\le c^r 2^{r/d}\varepsilon^{r/d}\sum_{i=1}^{m}P(C_i) + c^r 2^{r/d}l^r\sum_{i\ge m+1}P(C_i)
\le (1 + l^r)\,c^r 2^{r/d}\varepsilon^{r/d}, \quad n \ge \max_{1\le i\le m}(2/s_i),$$
and hence
$$\limsup_{n\to\infty} n^{r/d}V_{n,r}(P) \le (1 + l^r)\,c^r 2^{r/d}\varepsilon^{r/d}.$$

Since $\varepsilon > 0$ is arbitrary, one obtains
$$\lim_{n\to\infty} n^{r/d}V_{n,r}(P) = 0. \tag{6.12}$$

Step 5. Let $P$ be arbitrary. Set $h = dP_a/d\lambda^d$. For $k \in \mathbb{N}$, let $C_k = [-k,k]^d$. Then
$$P(\cdot\,|\,C_k) = \frac{h1_{C_k}}{P(C_k)}\lambda^d + \frac{P_s(\cdot\cap C_k)}{P(C_k)}$$
is the Lebesgue decomposition of $P(\cdot\,|\,C_k)$ with respect to $\lambda^d$. From Steps 3 and 4 and Lemma 6.5(b) it follows
$$\lim_{n\to\infty} n^{r/d}V_{n,r}(P(\cdot\,|\,C_k)) = Q_r([0,1]^d)\,\Big\|\frac{h1_{C_k}}{P(C_k)}\Big\|_{d/(d+r)}. \tag{6.13}$$

Since $V_{n,r}(P) \ge P(C_k)V_{n,r}(P(\cdot\,|\,C_k))$, this gives
$$\liminf_{n\to\infty} n^{r/d}V_{n,r}(P) \ge Q_r([0,1]^d)\,\|h1_{C_k}\|_{d/(d+r)}.$$
Therefore, by the monotone convergence theorem as $k \to \infty$
$$\liminf_{n\to\infty} n^{r/d}V_{n,r}(P) \ge Q_r([0,1]^d)\,\|h\|_{d/(d+r)}. \tag{6.14}$$

Step 6. Now suppose $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Set $h = dP_a/d\lambda^d$. For $k \in \mathbb{N}$, let $C_k = [-k,k]^d$. Let $0 < \varepsilon < 1$. Using the decomposition $P = P(C_k)P(\cdot\,|\,C_k) + P(C_k^c)P(\cdot\,|\,C_k^c)$, it follows from (6.13) and Lemma 6.5 (a) that
$$\limsup_{n\to\infty} n^{r/d}V_{n,r}(P) \le (1-\varepsilon)^{-r/d}Q_r([0,1]^d)\,\|h1_{C_k}\|_{d/(d+r)} + P(C_k^c)\,\varepsilon^{-r/d}\limsup_{n\to\infty} n^{r/d}V_{n,r}(P(\cdot\,|\,C_k^c)). \tag{6.15}$$

By Corollary 6.7,
$$P(C_k^c)\,n^{r/d}V_{n,r}(P(\cdot\,|\,C_k^c)) \le c_1\int_{C_k^c}\|x\|^{r+\delta}\,dP(x) + c_2 P(C_k^c), \quad n \ge c_3,$$
(with constants $c_1, c_2, c_3$ independent of $k$) and the above moment condition implies
$$\lim_{k\to\infty}\int_{C_k^c}\|x\|^{r+\delta}\,dP(x) = 0.$$

Therefore, letting $k$ tend to infinity in (6.15) and then letting $\varepsilon$ tend to zero yields
$$\limsup_{n\to\infty} n^{r/d}V_{n,r}(P) \le Q_r([0,1]^d)\,\|h\|_{d/(d+r)}. \tag{6.16}$$

In view of (6.14), the proof is complete. □

Notes

The approximation (6.3) of the n-th quantization error apparently occurred for the first time in Panter and Dite (1951) for univariate absolutely continuous distributions and r = 2.
The proof of Theorem 6.2 for distributions with compact support is a simplified version (and an extension to arbitrary norms) of Bucklew and Wise (1982). The crucial point in their treatment of distributions with unbounded support is a "compander" result whose proof is not complete (cf. Linder, 1991). As shown above, the unbounded case can be resolved via the Pierce Lemma 6.6 and its generalization to arbitrary dimensions (Corollary 6.7).
The asymptotics for empirical versions of the quantization problem (or related location problems) when both the level n and the sample size tend to infinity were studied in Hochbaum and Steele (1982), Wong (1984), Zemel (1985), Rhee and Talagrand (1989a), McGivney and Yukich (1997), Yukich (1998), Pötzelberger (1998b), and Graf and Luschgy (1999c).

7 Asymptotically optimal quantizers


Let $X$ be a $\mathbb{R}^d$-valued random variable with distribution $P$, let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$ and let $1 \le r < \infty$. In the light of Theorem 6.2 a sequence $(\alpha_n)_{n\ge 1}$ with $\alpha_n \subset \mathbb{R}^d$, $|\alpha_n| \le n$, is called an asymptotically n-optimal set of centers for P of order r if
$$\lim_{n\to\infty} n^{r/d}E\min_{a\in\alpha_n}\|X - a\|^r = Q_r(P) \tag{7.1}$$
provided $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Here $Q_r(P)$ denotes the r-th quantization coefficient of $P$ as defined in (6.4). Notice that if $(\alpha_n)_{n\ge 1}$ is asymptotically n-optimal of order r and $\{A_a : a \in \alpha_n\}$ denotes a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha_n$, then $(f_n)_{n\ge 1}$ with $f_n = \sum_{a\in\alpha_n}a1_{A_a} \in \mathcal{F}_n$ is an asymptotically n-optimal quantizer of order r, that is
$$\lim_{n\to\infty} n^{r/d}E\|X - f_n(X)\|^r = Q_r(P). \tag{7.2}$$

7.1 Mixtures and partitions

The following lemma is related to Lemma 6.8.

7.1 Lemma
Let $P = \sum_{i=1}^{m}s_i P_i$, $s_i > 0$, $\sum_{i=1}^{m}s_i = 1$, $\int\|x\|^{r+\delta}\,dP_i(x) < \infty$ for some $\delta > 0$, $P_{i,a} \ne 0$ for every $i$.

(a) $Q_r(P) \le \Big(\sum_{i=1}^{m}(s_i Q_r(P_i))^{d/(d+r)}\Big)^{(d+r)/d}$.

(b) Suppose
$$Q_r(P) = \Big(\sum_{i=1}^{m}(s_i Q_r(P_i))^{d/(d+r)}\Big)^{(d+r)/d}. \tag{7.3}$$

For $n \in \mathbb{N}$, let
$$t_i = \frac{(s_i Q_r(P_i))^{d/(d+r)}}{\sum_{j=1}^{m}(s_j Q_r(P_j))^{d/(d+r)}}, \tag{7.4}$$
$n_i = n_i(n) = [t_i n]$, $\alpha_{i,n} \in C_{n_i,r}(P_i)$, $\alpha_n = \bigcup_{i=1}^{m}\alpha_{i,n}$. Then $(\alpha_n)_{n\ge 1}$ is an asymptotically n-optimal set of centers for $P$ of order $r$.

Proof
(a) and (b). By Lemma 4.14 (b)
$$V_{n,r}(P) \le \int\min_{a\in\alpha_n}\|x - a\|^r\,dP(x) \le \sum_{i=1}^{m}s_i V_{n_i,r}(P_i), \quad n \ge \max_{1\le i\le m}(1/t_i),$$
and by Theorem 6.2
$$\lim_{n\to\infty} n^{r/d}V_{n_i,r}(P_i) = t_i^{-r/d}Q_r(P_i), \quad 1 \le i \le m.$$

This implies
$$Q_r(P) \le \liminf_{n\to\infty} n^{r/d}\int\min_{a\in\alpha_n}\|x - a\|^r\,dP(x)
\le \limsup_{n\to\infty} n^{r/d}\int\min_{a\in\alpha_n}\|x - a\|^r\,dP(x)$$
$$\le \sum_{i=1}^{m}s_i Q_r(P_i)\,t_i^{-r/d} = \Big(\sum_{i=1}^{m}(s_i Q_r(P_i))^{d/(d+r)}\Big)^{(d+r)/d}.$$
□
For $P$ with $P_a = h\lambda^d \ne 0$ and $h \in L_{d/(d+r)}(\lambda^d)$, define
$$h_r = \frac{h^{d/(d+r)}}{\int h^{d/(d+r)}\,d\lambda^d}, \qquad P_r = h_r\lambda^d. \tag{7.5}$$

Recall that by Remark 6.3, the condition $h \in L_{d/(d+r)}(\lambda^d)$ is satisfied in case $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. The probability distribution $P_r$ will play a central role.
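As a concrete instance of (7.5), the density $h_r$ can be computed numerically and compared with the entry for the normal distribution in Table 7.1. The sketch below takes $d = 1$ and $P = N(0, \sigma^2)$, for which the table gives $P_r = N(0, (1+r)\sigma^2)$. The function names, the integration window and the chosen variance are illustrative assumptions, not part of the text.

```python
import math


def normal_pdf(x, variance):
    """Density of N(0, variance) at x."""
    return math.exp(-x * x / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def normalizer(h, d, r, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of h^{d/(d+r)} over [lo, hi]."""
    p = d / (d + r)
    width = (hi - lo) / steps
    return sum(h(lo + (j + 0.5) * width) ** p for j in range(steps)) * width


def h_r(x, h, d, r, z):
    """h_r(x) = h(x)^{d/(d+r)} / z, with z the normalizing integral from (7.5)."""
    return h(x) ** (d / (d + r)) / z


# Hypothetical example: P = N(0, 1.5), d = 1, r = 2, so P_r should be N(0, 4.5).
variance, d, r = 1.5, 1, 2
h = lambda x: normal_pdf(x, variance)
z = normalizer(h, d, r, -60.0, 60.0)
differences = [abs(h_r(x, h, d, r, z) - normal_pdf(x, (d + r) / d * variance))
               for x in (0.0, 0.7, 2.0)]
```

Raising the density to the power $d/(d+r) < 1$ flattens it, so $P_r$ puts relatively more mass in the tails than $P$ does.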

7.2 Lemma
Suppose $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Let $\{A_1,\ldots,A_m\}$ be a P-packing in $\mathcal{B}(\mathbb{R}^d)$ (i.e., $P(A_i \cap A_j) = 0$ for $i \ne j$) such that $P_a(A_i) > 0$ for every $i$ and $P(\bigcup_{i=1}^{m}A_i) = 1$. Then for the mixture $P = \sum_{i=1}^{m}P(A_i)P(\cdot\,|\,A_i)$, (7.3) is satisfied and (7.4) takes the form $t_i = P_r(A_i)$ for every $i$.

Proof
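Let $P_a = h\lambda^d$. We have $P(\cdot\,|\,A_i)_a = h1_{A_i}\lambda^d/P(A_i) \ne 0$ and thus
$$Q_r(P(\cdot\,|\,A_i)) = Q_r([0,1]^d)\Big(\int_{A_i}h^{d/(d+r)}\,d\lambda^d\Big)^{(d+r)/d}\Big/P(A_i)
= Q_r([0,1]^d)\Big(P_r(A_i)\int h^{d/(d+r)}\,d\lambda^d\Big)^{(d+r)/d}\Big/P(A_i)$$
$$= Q_r(P)\,P_r(A_i)^{(d+r)/d}/P(A_i).$$
Therefore
$$\big(P(A_i)\,Q_r(P(\cdot\,|\,A_i))\big)^{d/(d+r)} = Q_r(P)^{d/(d+r)}P_r(A_i).$$

Since $P_a$ and $P_r$ are mutually absolutely continuous and hence
$$\sum_{i=1}^{m}P_r(A_i) = P_r\Big(\bigcup_{i=1}^{m}A_i\Big) = 1,$$
this gives the assertions. □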


An interesting consequence concerns univariate symmetric and nonsingular distri-
butions. While there are not necessarily symmetric n-optimal sets of centers (see
Example 5.2 and Remark 5.3), there are symmetric asymptotically n-optimal sets of
centers. More generally, we have:

7.3 Corollary (Invariant distributions)
Let $G$ be a finite group of bijective isometries on $\mathbb{R}^d$. Suppose further $P$ is G-invariant, $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Suppose there exists $A \in \mathcal{B}(\mathbb{R}^d)$ such that $\{T(A) : T \in G\}$ is a P-packing in $\mathbb{R}^d$ and $P(\bigcup_{T\in G}T(A)) = 1$. Let $n_1 = n_1(n) = [n/|G|]$ and $\alpha_n \in C_{n_1,r}(P(\cdot\,|\,A))$. Then $(\bigcup_{T\in G}T(\alpha_n))_{n\ge 1}$ is an asymptotically n-optimal set of centers for $P$ of order $r$.

Proof
For $T \in G$ we have
$$P = P^T = P_a^T + P_s^T,$$
where $P = P_a + P_s$ denotes the Lebesgue decomposition of $P$ with respect to $\lambda^d$. Since $\lambda^d$ is G-invariant, $P_a^T$ is absolutely continuous and $P_s^T$ is singular with respect to $\lambda^d$. So by the uniqueness of the Lebesgue decomposition, $P_a$ is G-invariant. Let $P_a = h\lambda^d$. Then $h \circ T = h$ $\lambda^d$-a.s., $T \in G$. Therefore, $P_r$ is also G-invariant. This implies $P_a(A) = P_a(T(A)) > 0$ and
$$P_r(A) = P_r(T(A)) = 1/|G|, \quad T \in G.$$

Furthermore, since $P(\cdot\,|\,T(A)) = P(\cdot\,|\,A)^T$, $T \in G$, we obtain from Lemma 3.2 that
$$T(\alpha_n) \in C_{n_1,r}(P(\cdot\,|\,T(A))), \quad T \in G,\ n \in \mathbb{N}.$$

The assertion now follows from Lemmas 7.1 and 7.2. □

7.4 Example (Sign-symmetric distributions)
Let $X$ be sign-symmetric, that is, for every choice of signs $\varepsilon_1 = \pm 1,\ldots,\varepsilon_d = \pm 1$, $X = (X_1,\ldots,X_d)$ has the same distribution as $(\varepsilon_1 X_1,\ldots,\varepsilon_d X_d)$. The corresponding group $G$ satisfies $|G| = 2^d$ and consists of isometries if, for instance, the underlying norm is the $l_p$-norm, $1 \le p \le \infty$. Suppose $P(\{x \in \mathbb{R}^d : x_i = 0\}) = 0$ for every $i$ and let $A = [0,\infty)^d$. Then $\{T(A) : T \in G\}$ is a P-tesselation of $\mathbb{R}^d$.
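The construction of Corollary 7.3 in the sign-symmetric case amounts to reflecting a set of centers chosen in the orthant $A = [0,\infty)^d$ through all coordinate sign changes. A minimal sketch follows; the function name and the sample centers are hypothetical, and the centers are placeholders rather than optimal centers for any particular distribution.

```python
from itertools import product


def sign_orbit(centers):
    """Union of T(centers) over all 2^d coordinate sign changes T."""
    d = len(centers[0])
    orbit = set()
    for signs in product((1, -1), repeat=d):
        for a in centers:
            orbit.add(tuple(e * x for e, x in zip(signs, a)))
    return sorted(orbit)


# Hypothetical centers in the open orthant (0, inf)^2; |G| = 2^2 = 4.
base = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5)]
symmetric = sign_orbit(base)
```

With $n_1 = [n/|G|]$ centers in the orthant, the reflected set has at most $|G|\,n_1 \le n$ points, which is why the corollary uses that choice of $n_1$.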

7.2 Empirical measures

If $(\alpha_n)_{n\ge 1}$ is an asymptotically n-optimal set of centers for $P$ of order $r$ and $\{A_a : a \in \alpha_n\}$ denotes a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha_n$, then $(\sum_{a\in\alpha_n}P(A_a)\delta_a)_{n\ge 1}$ is an asymptotically n-optimal quantizing measure of order r, that is
$$\lim_{n\to\infty} n^{r/d}\rho_r^r\Big(P, \sum_{a\in\alpha_n}P(A_a)\delta_a\Big) = Q_r(P),$$
where $\rho_r$ is the $L_r$-minimal metric defined in (3.3). In particular, $\sum_{a\in\alpha_n}P(A_a)\delta_a$ converges weakly to $P$.
On the other hand, $\alpha_n$ is asymptotically $P_r$-distributed in the sense that the empirical measure of $\alpha_n$ converges weakly to $P_r$. This result, which is suggested by Lemma 7.2, is the content of the following theorem. A remarkable property is that $P_r$ does not depend on the underlying norm.

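7.5 Theorem
Suppose $P$ is absolutely continuous with respect to $\lambda^d$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Let $(\alpha_n)_{n\ge 1}$ be an asymptotically n-optimal set of centers for $P$ of order $r$. Then
$$\frac{1}{n}\sum_{a\in\alpha_n}\delta_a \to P_r \quad\text{weakly as } n \to \infty. \tag{7.6}$$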

Proof
The proof relies on Theorem 6.2 and the equality case of Hölder's inequality (cf. Lemma 6.8). Since certainly $\lim_{n\to\infty}|\alpha_n|/n = 1$, we may assume without loss of generality that $|\alpha_n| = n$ for every $n \ge 1$. Let
$$\mu_n = \frac{1}{n}\sum_{a\in\alpha_n}\delta_a.$$

It suffices to prove that the limiting measure of any vaguely convergent subsequence of $(\mu_n)_{n\ge 1}$ coincides with $P_r$. Suppose for a subsequence (also denoted by $(\mu_n)$)
$$\mu_n \to \mu \quad\text{vaguely}$$
for some finite Borel measure $\mu$ on $\mathbb{R}^d$. Then $\mu(\mathbb{R}^d) \le 1$. Consider a d-dimensional interval $A = (b,c]$ with $b, c \in \mathbb{R}^d$ such that $\mu(\partial A) = 0$. By vague convergence
$$\mu_n(A) \to \mu(A)$$
and hence
$$\mu_n(A^c) \to 1 - \mu(A).$$

Assume $0 < P(A) < 1$. Since $P$ and $P_r$ are mutually absolutely continuous, this is equivalent to $0 < P_r(A) < 1$. Write $A_1 = A$, $A_2 = A^c$, $s_i = P(A_i)$, $v_1 = \mu(A_1)$, $v_2 = 1 - \mu(A_1)$, $P_i = P(\cdot\,|\,A_i)$ and $\alpha_{i,n} = \alpha_n \cap A_i$. For $0 < \varepsilon < \min_{i=1,2}P_r(A_i)$, choose $b_i, c_i \in \mathbb{R}^d$, $b < b_1 < c_1 < c$, $b_2 < b < c < c_2$ such that the sets $B_1 = [b_1,c_1]$ and $B_2 = [b_2,c_2]^c$ satisfy $P(B_i) > 0$ and
$$P_r(B_i) \ge P_r(A_i) - \varepsilon, \quad i = 1, 2.$$

Then choose a finite set $\gamma_i$ (on the boundary of $B_i$) so that
$$\min_{a\in\gamma_i}\|x - a\| \le \inf_{y\in A_i^c}\|x - y\| \quad\text{for every } x \in B_i,\ i = 1, 2.$$

Say $|\gamma_i| = k$. Then we obtain
$$\int\min_{a\in\alpha_n}\|x - a\|^r\,dP(x) = \sum_{i=1}^{2}s_i\int\min_{a\in\alpha_n}\|x - a\|^r\,dP_i(x)
\ge \sum_{i=1}^{2}s_i\int_{B_i}\min_{a\in\alpha_n\cup\gamma_i}\|x - a\|^r\,dP_i(x)$$
$$= \sum_{i=1}^{2}s_i\int_{B_i}\min_{a\in\alpha_{i,n}\cup\gamma_i}\|x - a\|^r\,dP_i(x)
\ge \sum_{i=1}^{2}s_i V_{n_i+k,r}(P(\cdot\,|\,B_i))\,P(B_i)/P(A_i),$$
where $n_i = n\mu_n(A_i) = |\alpha_{i,n}|$. This implies $v_i > 0$, $i = 1, 2$. If not, then $Q_r(P) = \infty$, a contradiction. Using Theorem 6.2 we deduce
$$Q_r(P) = \lim_{n\to\infty} n^{r/d}\int\min_{a\in\alpha_n}\|x - a\|^r\,dP(x) \ge \sum_{i=1}^{2}s_i v_i^{-r/d}Q_r(P(\cdot\,|\,B_i))\,P(B_i)/P(A_i).$$
We have
$$Q_r(P(\cdot\,|\,B_i))\,P(B_i) = Q_r(P)\,P_r(B_i)^{(d+r)/d} \ge Q_r(P)\,(P_r(A_i) - \varepsilon)^{(d+r)/d}.$$

Since $0 < \varepsilon < \min_{i=1,2}P_r(A_i)$ is arbitrary and by Lemmas 6.8 and 7.2, one obtains
$$Q_r(P) \ge \sum_{i=1}^{2}s_i v_i^{-r/d}Q_r(P)\,P_r(A_i)^{(d+r)/d}/P(A_i) = \sum_{i=1}^{2}s_i Q_r(P_i)\,v_i^{-r/d}
\ge \sum_{i=1}^{2}s_i Q_r(P_i)\,P_r(A_i)^{-r/d} = Q_r(P).$$

Using Lemmas 6.8 and 7.2 again, this yields $v_i = P_r(A_i)$, $i = 1, 2$. Thus $\mu(A) = P_r(A)$.
If $P(A) = 0$, then omit the first summands in the above considerations. One gets $v_2 = P_r(A_2) = 1$ and thus we have $\mu(A) = P_r(A) = 0$. If $P(A) = 1$, then omit the second summands. One obtains $v_1 = P_r(A_1) = 1$.
Now we have $\mu(A) = P_r(A)$ for every (bounded) d-dimensional interval $A = (b,c]$ with $\mu(\partial A) = 0$. This implies $\mu = P_r$. □
Computations of $P_r$ can be found in Tables 7.1 and 7.2.

P                                                        P_r
d-dimensional Normal
  $N_d(0,\Sigma)$, $\Sigma$ positive definite            $N_d(0, \frac{d+r}{d}\Sigma)$
Uniform
  $U(B)$, $B \in \mathcal{B}(\mathbb{R}^d)$ bounded, $\lambda^d(B) > 0$      $U(B)$

Table 7.1: Probability distributions $P_r$

d
P (~P)r
1
Logistic
d
L(a) ® aL(a,
i
Double exponential
d
DE(a) N~ DE(~(~+O~
1
Double Gamma
d
Dr(a, b) ® Dr(~(~r), d--~-
~+~ J
1
Hyper-exponential
d
HE(a, b) N HE(a( lib, b)
1
Exponential
d a d+r
E (a ) ® E ( ~ ~-J~-~~2d)
1
Gamma
d
r(a, b) ® r(~(d+r
--~,
b~+r~
d ) ' d+r /
1
Weibull
d
W(a,b) ¢¢~p(~:d+r~llb
"o'~--',~L d I
b~+r b)
,-y~-,
1
Pareto
d
P(a,b),bd>r ® P ( a , bd-~
d÷r /
1

Table 7.2: Probability distributions $(\bigotimes_1^d P)_r$

7.3 Asymptotic optimality in one dimension

For univariate distributions, the necessary condition of Theorem 7.5 can be turned
around and used to construct asymptotically n-optimal sets of centers.
Let $d = 1$ and let $P = h\lambda$ such that $I = \{h > 0\}$ is an open (possibly unbounded) interval and $h$ is continuous on $I$. Suppose $E|X|^{r+\delta} < \infty$ for some $\delta > 0$. For $n \in \mathbb{N}$, let $a_i$ denote the $\frac{2i-1}{2n}$-quantile of $P_r$, $1 \le i \le n$, and let $m_i = (a_i + a_{i+1})/2$, $1 \le i \le n-1$. Then $a_i \in I$ and

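$$E\min_{1\le i\le n}|X - a_i|^r = \int_{-\infty}^{a_1}(a_1 - x)^r h(x)\,dx + \sum_{i=2}^{n}\int_{m_{i-1}}^{a_i}(a_i - x)^r h(x)\,dx
+ \sum_{i=1}^{n-1}\int_{a_i}^{m_i}(x - a_i)^r h(x)\,dx + \int_{a_n}^{\infty}(x - a_n)^r h(x)\,dx.$$

The mean value theorem yields
$$\int_{m_{i-1}}^{a_i}(a_i - x)^r h(x)\,dx = h(u_i)\int_{m_{i-1}}^{a_i}(a_i - x)^r\,dx = \frac{1}{(1+r)2^{r+1}}h(u_i)(a_i - a_{i-1})^{r+1},$$
$a_{i-1} < m_{i-1} \le u_i \le a_i$, $2 \le i \le n$,
$$\int_{a_i}^{m_i}(x - a_i)^r h(x)\,dx = h(v_{i+1})\int_{a_i}^{m_i}(x - a_i)^r\,dx = \frac{1}{(1+r)2^{r+1}}h(v_{i+1})(a_{i+1} - a_i)^{r+1},$$
$a_i \le v_{i+1} \le m_i < a_{i+1}$, $1 \le i \le n-1$,
$$\frac{1}{n} = \frac{2i-1}{2n} - \frac{2(i-1)-1}{2n} = \int_{a_{i-1}}^{a_i}h_r(x)\,dx = h_r(w_i)(a_i - a_{i-1}),$$
$a_{i-1} < w_i < a_i$, $2 \le i \le n$.

This gives
$$E\min_{1\le i\le n}|X - a_i|^r = \int_{-\infty}^{a_1}(a_1 - x)^r h(x)\,dx + \int_{a_n}^{\infty}(x - a_n)^r h(x)\,dx
+ \frac{1}{(1+r)2^{r+1}n^r}\Big[\sum_{i=2}^{n}\frac{h(u_i)}{h_r(w_i)^r}(a_i - a_{i-1}) + \sum_{i=2}^{n}\frac{h(v_i)}{h_r(w_i)^r}(a_i - a_{i-1})\Big].$$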
i-=2
Note that $a_i, u_i, v_i, w_i$ depend on $n$. Under suitable assumptions on the density $h$, we have
$$\lim_{n\to\infty}\sum_{i=2}^{n}\frac{h(y_i)}{h_r(w_i)^r}(a_i - a_{i-1}) = \int\frac{h}{h_r^r}\,d\lambda = \|h\|_{1/(1+r)},$$
$y_i \in \{u_i, v_i\}$, and the two remaining summands are of order $o(n^{-r})$. One obtains
$$\lim_{n\to\infty} n^r E\min_{1\le i\le n}|X - a_i|^r = \frac{1}{(1+r)2^r}\|h\|_{1/(1+r)} = Q_r(P). \tag{7.7}$$

For the latter equation see Example 5.5. Thus $(\{a_1,\ldots,a_n\})_{n\ge 1}$ is asymptotically n-optimal for $P$ of order $r$. The same result holds for the $\frac{i}{n+1}$-quantiles of $P_r$, $1 \le i \le n$.
Sufficient conditions for the above result to be valid can be found in Cambanis and Gerr (1983) (with a gap in the proof), Linder (1991), Pötzelberger and Felsenstein (1994).
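The quantile construction can be tried out for $P = N(0,1)$, where Table 7.1 gives $P_r = N(0, 1+r)$ in dimension one. The sketch below is an illustration only: the function names, the integration window and the step count are assumptions. It computes the error of the $\frac{2i-1}{2n}$-quantiles and of the $\frac{i}{n+1}$-quantiles of $P_r$ for $n = 2$, $r = 2$ by a midpoint rule; the results can be compared with the $N(0,1)$ row of Table 7.3.

```python
import math
from statistics import NormalDist


def quantile_centers(r, levels):
    """Centers a_i = quantiles of P_r = N(0, 1 + r) for P = N(0, 1), d = 1."""
    pr = NormalDist(0.0, math.sqrt(1 + r))
    return [pr.inv_cdf(u) for u in levels]


def quantization_error(centers, r, half_width=12.0, steps=120_000):
    """Midpoint-rule approximation of E min_i |X - a_i|^r for X ~ N(0, 1)."""
    p = NormalDist()
    width = 2 * half_width / steps
    total = 0.0
    for j in range(steps):
        x = -half_width + (j + 0.5) * width
        total += min(abs(x - a) for a in centers) ** r * p.pdf(x) * width
    return total


n, r = 2, 2
first = quantization_error(
    quantile_centers(r, [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]), r)
second = quantization_error(
    quantile_centers(r, [i / (n + 1) for i in range(1, n + 1)]), r)
```

For small $n$ the two families of quantiles give noticeably different errors even though both are asymptotically n-optimal; the asymptotic statement says nothing about which is better at a fixed level.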

7.6 Example (Double exponential distribution)


Let $P = DE(c)$. The $\frac{i}{n+1}$-quantiles of $P_r = DE((1+r)c)$ are given by
$$a_i = (1+r)c\log\Big(\frac{2i}{n+1}\Big), \quad 1 \le i \le \frac{n+1}{2},$$
$$a_i = (1+r)c\log\Big(\frac{n+1}{2n+2-2i}\Big), \quad \frac{n+1}{2} \le i \le n.$$

In case $n$ odd and $r = 1$, they coincide with the n-optimal set of centers for $P$ of order 1 (cf. Example 5.6).
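The quantile formulas of Example 7.6 can be evaluated directly. The following sketch assumes that $DE(c)$ denotes the law with density $e^{-|x|/c}/(2c)$, which is consistent with the formulas for $a_i$ above; the function names and the numerical integration parameters are hypothetical. It computes the error of order 1 for $c = 1$ and $n = 3, 5$, which can be compared with the second line of the $DE(1)$ row in Table 7.4.

```python
import math


def de_quantile(u, scale):
    """Quantile of the double exponential law with density exp(-|x|/scale) / (2*scale)."""
    if u <= 0.5:
        return scale * math.log(2 * u)
    return -scale * math.log(2 * (1 - u))


def de_centers(n, r, c):
    """The i/(n+1)-quantiles of P_r = DE((1 + r) c), i = 1, ..., n."""
    return [de_quantile(i / (n + 1), (1 + r) * c) for i in range(1, n + 1)]


def de_error(centers, r, c, half_width=40.0, steps=200_000):
    """Midpoint-rule approximation of E min_i |X - a_i|^r for X ~ DE(c)."""
    width = 2 * half_width / steps
    total = 0.0
    for j in range(steps):
        x = -half_width + (j + 0.5) * width
        density = math.exp(-abs(x) / c) / (2 * c)
        total += min(abs(x - a) for a in centers) ** r * density * width
    return total


error_n3 = de_error(de_centers(3, 1, 1.0), 1, 1.0)
error_n5 = de_error(de_centers(5, 1, 1.0), 1, 1.0)
```

A direct calculation with the piecewise integrals gives the values $1/2$ for $n = 3$ and $1/3$ for $n = 5$, which the numerical integration should reproduce up to discretization error.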
The error of quantizers of the above type for various distributions is evaluated in
Tables 7.3 and 7.4. These values should be compared with the n-th quantization
error given in Tables 5.1 - 5.12.

P\n 2 3 4 5 6 7 8
N(0,1) 0.5006 0.2466 0.1456 0.0960 0.0680 0.0506 0.0392
0.3661 0.1913 0.1180 0.0802 0.0581 0.0441 0.0346
L(~) 0.7332 0.3373 0.1978 0.1304 0.0925 0.0690 0.0535
0.4188 0.2319 0.1478 0.1026 0.0754 0.0578 0.0457
DE(,-~) 1.0826 0.3657 0.2418 0.1508 0.1112 0.0811 0.0641
0.5234 0.2648 0.1782 0.1200 0.0903 0.0682 0.0546
E(1) 0.4836 0.2223 0.1282 0.0834 0.0586 0.0434 0.0335
0.6112 0.2763 0.1551 0.0987 0.0680 0.0497 0.0378
r(~,2) 0.4625 0.2232 0.1311 0.0861 0.0609 0.0453 0.0350
0.4761 0.2269 0.1323 0.0866 0.0610 0.0453 0.0350
W(~,2) 0.4214 0.2040 0.1196 0.0784 0.0554 0.0411 0.0318
0.3896 0.1935 0.1151 0.0762 0.0541 0.0404 0.0313

Table 7.3: $r = 2$, $V_2(P) = 1$. Quantization error for $\frac{2i-1}{2n}$-quantiles (first line) and $\frac{i}{n+1}$-quantiles (second line) of $P_2$, $1 \le i \le n$

P\n 2 3 4 5 6 7 8
N(0,2 ) 0.6512 0.4613 0.3565 0.2904 0.2450 0.2119 0.1866
0.5965 0.4279 0.3345 0.2748 0.2334 0.2029 0.1795
l
L(21-ET-~2) 0.7284 0.5151 0.3987 0.3253 0.2749 0.2380 0.2099
0.6226 0.4569 0.3620 0.3001 0.2564 0.2239 0.1988
DE(1) 0.8863 0.5556 0.4504 0.3600 0.3091 0.2653 0.2358
0.6998 0.5000 0.4063 0.3333 0.2879 0.2500 0.2232
E(lo-~) 0.6497 0.4459 0.3402 0.2752 0.2310 0.1991 0.1749
0.6890 0.4694 0.3553 0.2856 0.2387 0.2050 0.1796
F(a, 2) 0.6396 0.4476 0.3443 0.2797 0.2356 0.2035 0.1791
a---- 0.9508... 0.6351 0.4434 0.3409 0.2771 0.2335 0.2017 0.1776
W(a, 2) 0.6177 0.4324 0.3323 0.2698 0.2270 0.1960 0.1724
a - - 2.7027... 0.5990 0.4222 0.3260 0.2655 0.2240 0.1937 0.1706

Table 7.4: $r = 1$, $V_1(P) = 1$. Quantization error for $\frac{2i-1}{2n}$-quantiles (first line) and $\frac{i}{n+1}$-quantiles (second line) of $P_1$, $1 \le i \le n$

7.4 Product quantizers

We will compare the asymptotic performance of n-optimal product quantizers with n-optimal quantizers. The comparison is based on the relation between $Q_r(X)$ and $Q_r(X_i)$, $1 \le i \le d$. For this, the following lemma is useful.

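7.7 Lemma
Let
$$B = \Big\{(v_1,\ldots,v_d) \in (0,\infty)^d : \prod_{i=1}^{d}v_i \le 1\Big\}$$
and for $s_i > 0$ let
$$t_i = \frac{s_i^{1/r}}{\prod_{j=1}^{d}s_j^{1/rd}}, \quad 1 \le i \le d.$$

Then the function
$$F : B \to \mathbb{R}_+, \quad F(v_1,\ldots,v_d) = \sum_{i=1}^{d}s_i v_i^{-r}$$
satisfies
$$F(t_1,\ldots,t_d) = \min_{(v_1,\ldots,v_d)\in B}F(v_1,\ldots,v_d) = d\prod_{i=1}^{d}s_i^{1/d}.$$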

Proof
By the arithmetic-geometric mean inequality one obtains
$$F(v_1,\ldots,v_d) \ge d\Big(\prod_{i=1}^{d}s_i v_i^{-r}\Big)^{1/d} \ge d\prod_{i=1}^{d}s_i^{1/d} = F(t_1,\ldots,t_d)$$
for $(v_1,\ldots,v_d) \in B$. □
7.8 Lemma
Let the underlying norm be the $l_r$-norm with $1 \le r < \infty$. Suppose $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$ and $Q_r(X_i) > 0$ for $1 \le i \le d$.

(a) $Q_r(X) \le d\prod_{i=1}^{d}Q_r(X_i)^{1/d}$.

(b) If $t_i = Q_r(X_i)^{1/r}\big/\prod_{j=1}^{d}Q_r(X_j)^{1/rd}$, $n_i = [t_i n^{1/d}]$ for $n \in \mathbb{N}$, $\beta_{i,n} \in C_{n_i,r}(X_i)$ and $\alpha_n = \times_{i=1}^{d}\beta_{i,n}$, then
$$\lim_{n\to\infty} n^{r/d}E\min_{a\in\alpha_n}\|X - a\|^r = d\prod_{i=1}^{d}Q_r(X_i)^{1/d}.$$

Proof
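The choice of $t_i$ comes from Lemma 7.7. We have
$$V_{n,r}(X) \le E\min_{a\in\alpha_n}\|X - a\|^r = \sum_{i=1}^{d}E\min_{b\in\beta_{i,n}}|X_i - b|^r = \sum_{i=1}^{d}V_{n_i,r}(X_i)$$
provided $n_i \ge 1$ for every $i$. Since by Theorem 6.2
$$n^{r/d}V_{n_i,r}(X_i) = \Big(\frac{n^{1/d}}{n_i}\Big)^{r}n_i^{r}V_{n_i,r}(X_i) \to t_i^{-r}Q_r(X_i)$$
as $n \to \infty$, one obtains
$$Q_r(X) \le \lim_{n\to\infty} n^{r/d}E\min_{a\in\alpha_n}\|X - a\|^r = \sum_{i=1}^{d}t_i^{-r}Q_r(X_i) = d\prod_{i=1}^{d}Q_r(X_i)^{1/d}.$$
□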
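The level allocation of Lemma 7.8 (b) can be written out as a short computation. The sketch below is an illustration under stated assumptions: the function name and the coefficient values $Q_r(X_i)$ are hypothetical placeholders, not values from the text. It computes the $t_i$, the integer levels $n_i = [t_i n^{1/d}]$, and the limit value $\sum_i t_i^{-r}Q_r(X_i)$.

```python
import math


def product_allocation(q, n, r):
    """t_i and n_i = floor(t_i * n^{1/d}) from Lemma 7.8(b), given q_i = Q_r(X_i)."""
    d = len(q)
    geometric = math.prod(qi ** (1.0 / (r * d)) for qi in q)
    t = [qi ** (1.0 / r) / geometric for qi in q]
    levels = [math.floor(ti * n ** (1.0 / d)) for ti in t]
    return t, levels


# Hypothetical marginal coefficients Q_r(X_i) for d = 3, r = 2.
q = [4.0, 1.0, 0.25]
t, levels = product_allocation(q, n=1000, r=2)
limit_value = sum(qi * ti ** (-2) for qi, ti in zip(q, t))
```

Because $\prod_i t_i = 1$, the product grid has at most $n$ points; coordinates with a larger coefficient $Q_r(X_i)$ receive proportionally more levels.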

In view of Lemmas 7.7 and 7.8, the number
$$\frac{d\prod_{i=1}^{d}Q_r(X_i)^{1/d}}{Q_r(X_1,\ldots,X_d)} \tag{7.8}$$
represents the "vector quantizer advantage" provided the underlying norm is the $l_r$-norm and the assumptions of Lemma 7.8 are satisfied. For the d-asymptotics of the vector quantizer advantage see Remark 9.6 (a).

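7.9 Remark
Suppose $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Suppose further that the one-dimensional marginal distributions $P_i$ of $P$ are absolutely continuous with respect to $\lambda$, $1 \le i \le d$. Let $n_i = n_i(n) \in \mathbb{N}$ such that $\prod_{i=1}^{d}n_i \le n$, let $(\beta_{i,n})_{n\ge 1}$ be an asymptotically $n_i$-optimal set of centers for $P_i$ of order $r$ and let $\alpha_n = \times_{i=1}^{d}\beta_{i,n}$. Then as $n \to \infty$
$$\frac{1}{|\alpha_n|}\sum_{a\in\alpha_n}\delta_a \to \bigotimes_{i=1}^{d}P_{i,r} \quad\text{weakly}.$$

In fact, we have
$$\frac{1}{|\alpha_n|}\sum_{a\in\alpha_n}\delta_a = \bigotimes_{i=1}^{d}\frac{1}{|\beta_{i,n}|}\sum_{b\in\beta_{i,n}}\delta_b$$
and by Theorem 7.5
$$\frac{1}{|\beta_{i,n}|}\sum_{b\in\beta_{i,n}}\delta_b \to P_{i,r} \quad\text{weakly},\ 1 \le i \le d.$$

Note that in case $d \ge 2$
$$\bigotimes_{i=1}^{d}P_{i,r} \ne \Big(\bigotimes_{i=1}^{d}P_i\Big)_r$$
unless $P_i$ is a uniform distribution for every $i$.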

Notes

The observation in Corollary 7.3 seems to be new. Theorem 7.5 extends considerably a corresponding result of McClure (1975) for one-dimensional distributions with compact support. See also the review article by McClure (1980). Rates of convergence in Theorem 7.5 for some one-dimensional distributions P (Pareto, exponential and power-function distributions) with respect to various local distances have been computed by Fort and Pagès (1999). A discussion of the vector quantizer advantage as defined in (7.8) can be found in Lookabaugh and Gray (1989). Na and Neuhoff (1995) provide a nice result about the asymptotic performance of suboptimal quantizers (like product quantizers).

8 Regular quantizers and quantization coefficients


As before, let $X$ be a $\mathbb{R}^d$-valued random variable with distribution $P$, let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$, and let $1 \le r < \infty$. The r-th quantization coefficient of $P$ as defined in (6.4) consists of a geometric part $Q_r([0,1]^d)$ which depends only on $r$ and the dimension $d$ (and on the underlying norm) and a second part $\|\frac{dP_a}{d\lambda^d}\|_{d/(d+r)}$ which is related to the distribution $P$. For instance, the r-th quantization coefficient of a d-dimensional normal distribution $N_d(0,\Sigma)$ with positive definite covariance matrix $\Sigma$ is given by
$$Q_r(N_d(0,\Sigma)) = Q_r([0,1]^d)\,(2\pi)^{r/2}\Big(\frac{d+r}{d}\Big)^{(d+r)/2}(\det\Sigma)^{r/2d}.$$
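The distribution-dependent factor in this formula can be checked numerically in dimension one, where it reads $\|h\|_{1/(1+r)} = (2\pi)^{r/2}(1+r)^{(1+r)/2}\sigma^r$ for the $N(0,\sigma^2)$ density $h$. The following sketch is illustrative; the function names, the value of $\sigma$ and the integration window are assumptions made for the example.

```python
import math


def normal_density_quasi_norm(sigma, r, half_width=80.0, steps=200_000):
    """(integral of h^{1/(1+r)})^{1+r} for h the N(0, sigma^2) density, d = 1."""
    p = 1.0 / (1 + r)
    width = 2 * half_width / steps
    integral = 0.0
    for j in range(steps):
        x = -half_width + (j + 0.5) * width
        h = math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        integral += h ** p * width
    return integral ** (1 + r)


def closed_form(sigma, r):
    """(2 pi)^{r/2} (1 + r)^{(1+r)/2} sigma^r, the d = 1 case of the displayed formula."""
    return (2 * math.pi) ** (r / 2) * (1 + r) ** ((1 + r) / 2) * sigma ** r


numeric = normal_density_quasi_norm(1.5, 2)
exact = closed_form(1.5, 2)
```

Multiplying this factor by $Q_r([0,1]) = 1/((1+r)2^r)$, the value appearing in (7.7), would give the quantization coefficient of the univariate normal distribution.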

The constants $Q_r([0,1]^d)$ are only known for $d = 1$, $d = 2$ ($l_1$-norm, $l_2$-norm), and in "trivial" cases for $d \ge 3$. It appears to be a rather challenging problem to determine the values of these constants.
First, note that the scaling property of $V_{n,r}(P)$ carries over to $Q_r(P)$.
8.1 Lemma
Let $T: \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation with scaling number $c > 0$.

(a) If $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$, then
\[ Q_r(T(X)) = c^r Q_r(X). \]

(b) If $A \in \mathcal{B}(\mathbb{R}^d)$ is bounded with $\lambda^d(A) > 0$, then
\[ Q_r(T(A)) = c^r Q_r(A). \]

Proof
Immediate consequence of Lemma 3.2 (or of (6.4)). □
The following lemma shows that the largest quantization error among distributions concentrated on a fixed bounded set appears (asymptotically) for the uniform distribution.
8.2 Lemma
Let $A \in \mathcal{B}(\mathbb{R}^d)$ be bounded with $\lambda^d(A) > 0$. Then
\[ \max\{Q_r(P) : P(A) = 1,\ P_a \ne 0\} = Q_r(A) = Q_r([0,1]^d)\,\lambda^d(A)^{r/d}. \]

Proof
Immediate consequence of Hölder's inequality. □
Since one may not expect to be able to find the precise values of $Q_r([0,1]^d)$ for all dimensions $d$ (and all norms), it is of great interest to find bounds. These bounds immediately yield bounds for $Q_r(P)$.

8.1 Ball lower bound

The following lower bound for $Q_r([0,1]^d)$ indicates that the members of an $n$-optimal partition for a uniform distribution tend to look like a ball. For the $l_2$-norm it is due to Zador (1963, 1982) and for arbitrary norms this bound is contained in Yamada et al. (1980).

8.3 Proposition

\[ Q_r([0,1]^d) \ge M_r(B(0,1)) = \frac{d}{(d+r)\,\lambda^d(B(0,1))^{r/d}}. \]

Proof
By (6.2) and Theorem 4.16
\[ Q_r([0,1]^d) = \inf_{n \ge 1} n^{r/d} M_{n,r}([0,1]^d) \ge M_r(B(0,1)). \]

For the formula for $M_r(B(0,1))$ see Lemma 2.9. □
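The ball lower bound is easy to evaluate once the volume of the unit ball is known. The following sketch assumes the standard closed form $\lambda^d(B(0,1)) = (2\Gamma(1+1/p))^d/\Gamma(1+d/p)$ for the unit $l_p$-ball, which is not quoted in this passage; the function names are hypothetical.

```python
import math

def lp_ball_volume(d, p):
    """Lebesgue measure of the unit l_p-ball in R^d (p = math.inf allowed).
    Assumption: standard closed form via the Gamma function."""
    if math.isinf(p):
        return 2.0 ** d
    return (2 * math.gamma(1 + 1 / p)) ** d / math.gamma(1 + d / p)

def ball_lower_bound(d, r, p=2):
    """M_r(B(0,1)) = d / ((d + r) * vol(B(0,1))^(r/d))."""
    return d / ((d + r) * lp_ball_volume(d, p) ** (r / d))

# l_2-norm, r = 2, low dimensions
for d in (1, 2, 3, 4):
    print(d, ball_lower_bound(d, 2, p=2))
```

With $p = 1$, $d = 2$ and with $p = \infty$ the same function reduces to the two "trivial" cases in which the ball itself is space-filling.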

8.2 Space-filling figures, regular quantizers and upper bounds

For low dimensions good upper bounds for $Q_r([0,1]^d)$ can be obtained by the normalized $r$-th moment of space-filling figures in $\mathbb{R}^d$. Here a set $A \subset \mathbb{R}^d$ is called space-filling if $A$ is compact with $\lambda^d(A) > 0$ and there is a countable family $\mathcal{T}$ of bijective isometries on $\mathbb{R}^d$ such that $\{T(A) : T \in \mathcal{T}\}$ is a tesselation of $\mathbb{R}^d$. This notion depends on the underlying norm. In case the isometries $T \in \mathcal{T}$ can be chosen of the form $T(x) = x + t$, $t \in \mathbb{R}^d$, then $A$ is called space-filling by translation. Note that if $S: \mathbb{R}^d \to \mathbb{R}^d$ is a similarity transformation and $A$ is space-filling (by translation), then $S(A)$ is also space-filling (by translation).
Let us mention some properties of space-filling sets.

8.4 Lemma
Let $A \subset \mathbb{R}^d$ be space-filling and let $\mathcal{T}$ be a corresponding family of bijective isometries. Choose $a \in C_r(U(A))$ and let $\alpha_A = \{T(a) : T \in \mathcal{T}\}$.

(a) $\{T(A) : T \in \mathcal{T}\}$ is a locally finite tesselation of $\mathbb{R}^d$.

(b) $\alpha_A$ is locally finite.

(c) $\lambda^d(\partial A) = 0$ and $\operatorname{int}(A) \ne \emptyset$.



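Proof
Set $u = \operatorname{diam}(A)$.
(a) Let $I = \{T \in \mathcal{T} : T(A) \cap B(0,s) \ne \emptyset\}$ for some $s > 0$. Then
\[ \bigcup_{T \in I} T(A) \subset B(0, s+u). \]
Therefore
\[ |I|\,\lambda^d(A) = \sum_{T \in I} \lambda^d(T(A)) = \lambda^d\Big(\bigcup_{T \in I} T(A)\Big) \le \lambda^d(B(0, s+u)) < \infty, \]
which implies $|I| < \infty$.
(b) Let $s > 0$ and let $I = \{T \in \mathcal{T} : T(A) \cap B(0, s+u) \ne \emptyset\}$. By (a), $I$ is finite. Let $T \in \mathcal{T} \setminus I$. Then $\|y\| > s + u$ for every $y \in T(A)$. Since $T(a) \in C_r(U(T(A)))$ by Lemma 2.1, it follows from Lemma 2.6 (b) that there exists $x \in T(A)$ such that $\|x - T(a)\| \le u$. This gives
\[ \|T(a)\| \ge \|x\| - \|x - T(a)\| > s + u - u = s. \]
Hence $\{T \in \mathcal{T} : T(a) \in B(0,s)\} \subset I$. This implies that $\alpha_A \cap B(0,s)$ is finite.
(c) Let $S \in \mathcal{T}$. Then $\{S^{-1}T(A) : T \in \mathcal{T}\}$ is a tesselation of $\mathbb{R}^d$. We have
\[ \partial A \subset \bigcup_{T \in \mathcal{T},\, T \ne S} \big(S^{-1}T(A) \cap A\big) \]
and thus
\[ \lambda^d(\partial A) \le \sum_{T \in \mathcal{T},\, T \ne S} \lambda^d\big(S^{-1}T(A) \cap A\big) = 0. \]
This implies $\operatorname{int}(A) \ne \emptyset$. □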


Quantizers for $U([0,1]^d)$ can be built with space-filling sets $A$ as follows. For $n \in \mathbb{N}$, consider a rescaled version $cA$ of $A$, $c = c(n) > 0$, such that $\lambda^d(cA) = 1/n$. Observe that $\{cT(A) : T \in \mathcal{T}\}$ is a tesselation of $\mathbb{R}^d$ consisting of isometric copies of $cA$. In fact, $T = L + T(0)$, where $L = T - T(0)$ is a bijective isometry with $L(0) = 0$ and hence, $L$ is linear. The bijective isometry $\tilde{T} = L + cT(0)$ then satisfies $\tilde{T}(cA) = cT(A)$. Let
\[ I = I(n) = \{T \in \mathcal{T} : cT(A) \subset [0,1]^d\} \]
and $m = m(n) = |I|$. Then $m \le n$. We thus obtain a tesselation of the cube $[0,1]^d$ consisting of $m$ isometric copies $cT(A)$, $T \in I$, of $cA$ and a region $D$ near the boundary,
\[ D = D(n) = [0,1]^d \setminus \bigcup_{T \in I} cT(A); \]

see Figure 8.1. Choose $a \in C_r(U(A))$. Define a regular $n$-quantizer $f_{n,A}$ with respect to $A$ by

(8.1) \[ f_{n,A} = \sum_{T \in I} cT(a)\, 1_{B_T \cup \left(\left(\bigcup_{S \in I} B_S\right)^c \cap \tilde{B}_T\right)} \]

provided $I \ne \emptyset$, where $\{B_T : T \in I\}$ is a Borel measurable partition of $\bigcup_{T \in I} cT(A)$ with $B_T \subset cT(A)$ for every $T \in I$ and $\{\tilde{B}_T : T \in I\}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\{cT(a) : T \in I\}$.

Figure 8.1: Tesselation of [0,1] 2 into m = 6 regular hexagons and a boundary region,
n = 10

8.5 Theorem
Let $A \subset \mathbb{R}^d$ be space-filling. Then
\[ \lim_{n \to \infty} n^{r/d} \int \|x - f_{n,A}(x)\|^r \, dU([0,1]^d)(x) = M_r(A). \]

In particular
\[ Q_r([0,1]^d) \le M_r(A). \]



Proof
By Lemma 2.1, $cT(a) \in C_r(U(cT(A)))$ and $M_r(cT(A)) = M_r(A)$. Therefore
\begin{align*}
\int \|x - f_{n,A}(x)\|^r \, dU([0,1]^d)(x)
&= \sum_{T \in I} \int_{cT(A)} \|x - cT(a)\|^r \, dx + \int_D \min_{T \in I} \|x - cT(a)\|^r \, dx \\
&= n^{-(d+r)/d}\, m\, M_r(A) + \int_D \min_{T \in I} \|x - cT(a)\|^r \, dx \\
&= n^{-r/d}\, \frac{m}{n}\, M_r(A) + \int_D \min_{T \in I} \|x - cT(a)\|^r \, dx.
\end{align*}

Since $A$ is space-filling, we have $\lambda^d(D) = \lambda^d(D(n)) = 1 - m(n)/n \to 0$. Let $u$ denote the diameter of $A$. Then $cu$ is equal to the diameter of $cT(A)$ and
\[ cu = c(n)u = n^{-1/d} \lambda^d(A)^{-1/d} u = O(n^{-1/d}). \]

There exists a constant $\gamma > 1$ such that for every $x \in [0,1]^d$ and sufficiently small $s > 0$ one can find $y \in B(x, \gamma s)$ satisfying
\[ B(y, s) \subset B(x, \gamma s) \cap [0,1]^d. \]

This is clearly true for the $l_\infty$-norm and hence for any norm. For $x \in D$ and $n$ large we deduce
\[ B(x, \gamma cu) \cap \Big(\bigcup_{T \in I} cT(A)\Big) \ne \emptyset. \]

Otherwise, choose $y \in B(x, \gamma cu)$ such that $B(y, cu) \subset B(x, \gamma cu) \cap [0,1]^d$ and then choose $S \in \mathcal{T}$ such that $y \in cS(A)$. One obtains
\[ cS(A) \subset B(y, cu) \subset D, \]
a contradiction. In view of Lemma 2.6 (b) it follows that
\[ \min_{T \in I} \|x - cT(a)\| \le (\gamma + 1)cu, \quad n \text{ large}. \]

Therefore
\[ \int_D \min_{T \in I} \|x - cT(a)\|^r \, dx \le ((\gamma + 1)cu)^r \lambda^d(D) = o(n^{-r/d}). \]

This implies the assertion. □

Let the $r$-th regular quantization coefficient of $[0,1]^d$ be defined by

(8.2) \[ Q_r^{(R)}([0,1]^d) = \inf\{M_r(A) : A \subset \mathbb{R}^d \text{ space-filling}\}. \]



By the preceding theorem, we get

(8.3) \[ Q_r([0,1]^d) \le Q_r^{(R)}([0,1]^d). \]

A basic question is whether equality holds in (8.3). The regular quantizer problem consists in finding a space-filler $A \subset \mathbb{R}^d$ such that $Q_r^{(R)}([0,1]^d) = M_r(A)$. Both problems are unsolved for $d \ge 3$.
One technique for obtaining upper bounds is to select space-fillers in higher dimensional spaces by forming products of two (or more) lower dimensional space-fillers.

8.6 Lemma
Let the underlying norm be the $l_r$-norm, $1 \le r < \infty$. Let $A \subset \mathbb{R}^d$ and $B \subset \mathbb{R}^k$ be space-filling. Then $A \times B \subset \mathbb{R}^{d+k}$ is space-filling and
\[ M_r(A \times B) = \frac{M_r(A)\lambda^d(A)^{r/d} + M_r(B)\lambda^k(B)^{r/k}}{\lambda^d(A)^{r/(d+k)} \lambda^k(B)^{r/(d+k)}}. \]

Proof
Clearly $A \times B$ is space-filling in $\mathbb{R}^{d+k}$. Furthermore,
\[ V_r(U(A \times B)) = V_r(U(A)) + V_r(U(B)) = M_r(A)\lambda^d(A)^{r/d} + M_r(B)\lambda^k(B)^{r/k}. \]

This implies the assertion. □
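The product formula of Lemma 8.6 can be applied repeatedly to build cube quantizer bounds from the unit interval. The sketch below assumes $M_2([0,1]) = 1/12$ for the interval; `product_moment` is a hypothetical helper name.

```python
def product_moment(m_a, vol_a, d, m_b, vol_b, k, r):
    """Normalized r-th moment of A x B (l_r-norm) from the normalized
    moments and volumes of A in R^d and B in R^k, as in Lemma 8.6."""
    numerator = m_a * vol_a ** (r / d) + m_b * vol_b ** (r / k)
    denominator = (vol_a * vol_b) ** (r / (d + k))
    return numerator / denominator

# Build the unit cube [0,1]^4 from four unit intervals, r = 2.
r = 2
interval_moment = 1 / 12          # assumption: M_2([0,1]) = 1/12
moment, volume, dim = interval_moment, 1.0, 1
for _ in range(3):
    moment = product_moment(moment, volume, dim, interval_moment, 1.0, 1, r)
    volume *= 1.0
    dim += 1
print(dim, moment)

# A non-cubic example: the rectangle [0,2] x [0,1].
rectangle_moment = product_moment(1 / 12, 2.0, 1, 1 / 12, 1.0, 1, r)
print(rectangle_moment)
```

For unit cubes the result grows linearly in the dimension, which is the cube quantizer value $d^{-1}M_2([0,1]^d) = 1/12$ used later in the comparison with the random quantizer bound.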

8.3 Lattice quantizers


The Voronoi regions of lattices in $\mathbb{R}^d$ provide an interesting class of regular quantizers. A ($d$-dimensional) lattice in $\mathbb{R}^d$ is a locally finite additive subgroup of $\mathbb{R}^d$ which spans $\mathbb{R}^d$. Equivalently, a lattice is a subset of the form $\Lambda = \mathbb{Z}y_1 + \ldots + \mathbb{Z}y_d$ for some (vector space) basis $\{y_1, \ldots, y_d\}$ of $\mathbb{R}^d$; such a basis is called a basis of $\Lambda$. The volume of a fundamental parallelotope $\{\sum_{i=1}^d t_i y_i : 0 \le t_i \le 1 \text{ for } 1 \le i \le d\}$ of a lattice $\Lambda$ does not depend on the choice of the basis $\{y_1, \ldots, y_d\}$ of $\Lambda$ and is denoted by $\det(\Lambda)$. Then $\det(\Lambda)$ is the absolute value of the determinant of the matrix with rows $y_1, \ldots, y_d$. Fundamental parallelotopes of $\Lambda$ are space-filling by translation with $\Lambda$ as set of translation vectors and any space-filler $A \subset \mathbb{R}^d$ of this type satisfies $\lambda^d(A) = \det(\Lambda)$.
Now consider the Voronoi diagram $\{W(a|\Lambda) : a \in \Lambda\}$ of a lattice $\Lambda \subset \mathbb{R}^d$. Obviously
\[ W(a|\Lambda) = W(0|\Lambda) + a, \quad a \in \Lambda. \]

If the Voronoi diagram of $\Lambda$ is a tesselation of $\mathbb{R}^d$ we say that $\Lambda$ is admissible. For strictly convex norms every lattice is admissible (see (1.7) and Theorem 1.5).

8.7 Lemma
Let $\Lambda \subset \mathbb{R}^d$ be a lattice.

(a) $W(0|\Lambda)$ is compact and $\lambda^d(W(0|\Lambda)) \ge \det(\Lambda)$.

(b) $\Lambda$ is admissible if and only if $\lambda^d(W(0|\Lambda)) = \det(\Lambda)$ and $\lambda^d(\partial W(0|\Lambda)) = 0$.

Proof
(a) Choose $s > 0$ such that $B(0,s)$ contains a fundamental parallelotope of $\Lambda$. Then $\{B(a,s) : a \in \Lambda\}$ is a covering of $\mathbb{R}^d$. This implies $W(0|\Lambda) \subset B(0,s)$ and hence, $W(0|\Lambda)$ is compact. Furthermore, if $B$ denotes a fundamental parallelotope of $\Lambda$, then
\[ \lambda^d(W) = \sum_{a \in \Lambda} \lambda^d(W \cap (B + a)) = \sum_{a \in \Lambda} \lambda^d((W - a) \cap B) \ge \lambda^d(B) = \det(\Lambda), \]
where $W = W(0|\Lambda)$. Here the inequality follows from the fact that the Voronoi diagram of $\Lambda$ is a covering of $\mathbb{R}^d$; see Proposition 1.1.
(b) If $\Lambda$ is admissible, then in view of (a), $W(0|\Lambda)$ is space-filling by translations with $\Lambda$ as set of translation vectors. This implies $\lambda^d(W(0|\Lambda)) = \det(\Lambda)$. If $\Lambda$ is not admissible and $\lambda^d(\partial W(0|\Lambda)) = 0$, then
\[ \operatorname{int} W(a_1|\Lambda) \cap \operatorname{int} W(a_2|\Lambda) \ne \emptyset \]
for some $a_1, a_2 \in \Lambda$, $a_1 \ne a_2$. Hence, there exist $x_1, x_2 \in \operatorname{int} W(0|\Lambda)$, $x_1 \ne x_2$, such that $x_1 - x_2 \in \Lambda$. Let $b = x_1 - x_2$. Choose $\varepsilon > 0$ such that $B(x_i, \varepsilon) \subset W(0|\Lambda)$ and $B(x_1, \varepsilon) \cap B(x_2, \varepsilon) = \emptyset$. Set $A = W(0|\Lambda) \setminus B(x_1, \varepsilon)$. Then
\[ B(x_1, \varepsilon) = B(x_2, \varepsilon) + b \subset A + b \]
which yields
\[ W(0|\Lambda) \subset A \cup (A + b). \]
This implies that $\{A + a : a \in \Lambda\}$ is a covering of $\mathbb{R}^d$, hence $\lambda^d(A) \ge \det(\Lambda)$. We obtain
\[ \lambda^d(W(0|\Lambda)) > \lambda^d(A) \ge \det(\Lambda). \]
□
It is remarkable that by part (b) of the preceding lemma, the volume of $W(0|\Lambda)$ does not depend on the underlying norm as long as admissibility holds.
There are lattices which are not admissible. This is illustrated by the following example.

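8.8 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_1$-norm and let $\Lambda = \mathbb{Z}(-1,1) + \mathbb{Z}(4,0)$. Then
\[ \Lambda = \{a \in \mathbb{Z}^2 : a_1 + a_2 \in 4\mathbb{Z}\}, \quad \det(\Lambda) = 4, \]
\begin{align*}
W(0|\Lambda) = {} & \{x \in \mathbb{R}^2 : |x_1| + |x_2| \le 1\} \\
& \cup \big([-2,0]^2 \cap \{x \in \mathbb{R}^2 : x_1 + x_2 \ge -2\}\big) \\
& \cup \big([0,2]^2 \cap \{x \in \mathbb{R}^2 : x_1 + x_2 \le 2\}\big)
\end{align*}
and $\lambda^2(W(0|\Lambda)) = 5$; see Figure 8.2. It follows from Lemma 8.7 that $\Lambda$ is not admissible (for the $l_1$-norm).

Figure 8.2: Voronoi region $W(0|\Lambda)$ with respect to the $l_1$-norm for a nonadmissible lattice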
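The area claim in Example 8.8 can be checked by brute force. The sketch below counts grid cells whose midpoint is at least as close (in the $l_1$-norm) to the origin as to every other lattice point; all coordinates are scaled to integers so that the ties, which fill a region of positive area here, are detected exactly. The function name, the grid resolution and the search window are illustrative choices, not part of the text.

```python
def voronoi_area_l1(scale=50, half_width=3, reach=6):
    """Approximate the area of W(0|Lambda) for the lattice
    Lambda = {a in Z^2 : a1 + a2 divisible by 4} under the l_1-norm.
    Coordinates are measured in units of 1/scale, so all arithmetic is exact."""
    lattice = [(a1 * scale, a2 * scale)
               for a1 in range(-reach, reach + 1)
               for a2 in range(-reach, reach + 1)
               if (a1 + a2) % 4 == 0 and (a1, a2) != (0, 0)]
    # Odd integers are the midpoints of grid cells of width 2 units.
    coords = range(-half_width * scale + 1, half_width * scale, 2)
    count = 0
    for x1 in coords:
        for x2 in coords:
            dist_origin = abs(x1) + abs(x2)
            if all(dist_origin <= abs(x1 - a1) + abs(x2 - a2)
                   for a1, a2 in lattice):
                count += 1
    cell_area = (2 / scale) ** 2
    return count * cell_area

area = voronoi_area_l1()
print(area)
```

The count comes out close to 5 and clearly above $\det(\Lambda) = 4$, which is the criterion of Lemma 8.7 for non-admissibility. The estimate is only as fine as the grid, since cells cut by a slanted boundary edge are counted whole.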

As concerns the convexity of $W(0|\Lambda)$ one can modify Remark 1.9 as follows: if $W(0|\Lambda)$ is convex for every lattice $\Lambda \subset \mathbb{R}^d$, then the underlying norm is euclidean (cf. Gruber, 1974, Theorem 2).
If $\Lambda \subset \mathbb{R}^d$ is an admissible lattice, then we know from Lemma 8.7 that the Voronoi region $W(0|\Lambda)$ is space-filling by translation with $\Lambda$ as set of translation vectors. Thus Theorem 8.5 applies to the $n$-quantizer $f_{n,\Lambda} = f_{n,W(0|\Lambda)}$ for $U([0,1]^d)$. Note that $W(0|\Lambda)$ is symmetric (about the origin) and hence

(8.4) \[ M_r(W(0|\Lambda)) = \frac{\int_{W(0|\Lambda)} \|x\|^r \, dx}{\det(\Lambda)^{(d+r)/d}}; \]

cf. Example 2.3. For $n \in \mathbb{N}$, let

(8.5) \[ \alpha_{n,\Lambda} = \{ca : a \in \Lambda,\ cW(a|\Lambda) \subset [0,1]^d\}, \qquad c = c(n) = (n \det(\Lambda))^{-1/d}. \]

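8.9 Theorem
Let $\Lambda \subset \mathbb{R}^d$ be an admissible lattice and let $X$ be $U([0,1]^d)$-distributed. Then
\[ \lim_{n \to \infty} n^{r/d} \int \min_{b \in \alpha_{n,\Lambda}} \|x - b\|^r \, dU([0,1]^d)(x) = M_r(W(0|\Lambda)) \]
and
\[ n^{1/d} \min_{b \in \alpha_{n,\Lambda}} \|X - b\| \xrightarrow{D} \mu \quad \text{as } n \to \infty, \]
where $\mu$ is a probability on $\mathbb{R}_+$ with distribution function
\[ F_\mu(t) = \lambda^d\big((\det(\Lambda)^{-1/d} W(0|\Lambda)) \cap B(0,t)\big). \]

In particular
\[ Q_r([0,1]^d) \le M_r(W(0|\Lambda)). \]

Proof
We have
\[ E \min_{b \in \alpha_{n,\Lambda}} \|X - b\|^r = E\|X - f_{n,\Lambda}(X)\|^r \]
(where $f_{n,\Lambda}$ is defined in (8.1) with respect to the center $a = 0$). So the first assertion follows from Theorem 8.5. Now observe that $\alpha_{n,\Lambda}$ does not depend on $r$ and
\[ M_r(W(0|\Lambda)) = \int z^r \, d\mu(z). \]

Since $\operatorname{supp}(\mu)$ is compact, the convergence of moments,
\[ \lim_{n \to \infty} n^{r/d} E \min_{b \in \alpha_{n,\Lambda}} \|X - b\|^r = \int z^r \, d\mu(z) \quad \text{for every } 1 \le r < \infty, \]
implies the desired distributional convergence (cf. Hoffmann-Jørgensen, 1994, 5.13). □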
Let the $r$-th lattice quantization coefficient of $[0,1]^d$ be defined by

(8.6) \[ Q_r^{(L)}([0,1]^d) = \inf\{M_r(W(0|\Lambda)) : \Lambda \subset \mathbb{R}^d \text{ admissible lattice}\}. \]

Then

(8.7) \[ Q_r([0,1]^d) \le Q_r^{(R)}([0,1]^d) \le Q_r^{(L)}([0,1]^d). \]

The lattice quantizer problem consists in finding an admissible lattice $\Lambda$ such that $Q_r^{(L)}([0,1]^d) = M_r(W(0|\Lambda))$.

8.10 Remark
(a) Suppose $A \subset \mathbb{R}^d$ is space-filling by translation, where the corresponding set of translation vectors is an admissible lattice $\Lambda$. Then
\[ M_r(W(0|\Lambda)) \le M_r(A). \]

In fact, since
\[ E \min_{b \in \alpha_{n,\Lambda}} \|X - b\|^r \le E\|X - f_{n,A}(X)\|^r, \]
where $X$ is $U([0,1]^d)$-distributed, the above inequality follows from Theorems 8.5 and 8.9.
(b) Suppose $A \subset \mathbb{R}^d$ is a convex space-filler by translation. Then $A$ is a centrally symmetric polytope (i.e. $-(A - x) = A - x$ for some $x \in \mathbb{R}^d$) and admits as set of translation vectors a lattice $\Lambda$ (cf. McMullen, 1980). By (a) we have
\[ M_r(W(0|\Lambda)) \le M_r(A) \]
provided $\Lambda$ is admissible. Thus we obtain
\[ Q_r^{(L)}([0,1]^d) = \inf\{M_r(A) : A \subset \mathbb{R}^d \text{ space-filling polytope by translation}\} \]
for euclidean norms.

(c) Suppose the ball $B(0,1)$ is space-filling. Then the ball is obviously space-filling by translation. By (b), there exists a lattice $\Lambda$ as set of translation vectors. Then $W(0|\Lambda) = B(0,1)$ and hence $\Lambda$ is admissible. In view of Proposition 8.3 we obtain
\[ M_r(B(0,1)) = Q_r([0,1]^d) = Q_r^{(R)}([0,1]^d) = Q_r^{(L)}([0,1]^d). \]

Conversely, if $M_r(B(0,1)) = Q_r^{(L)}([0,1]^d)$ holds (for some $r$) and if the lattice quantizer problem has a solution $\Lambda$, then $B(0,1)$ is space-filling. To see this, choose $s > 0$ such that $\lambda^d(B(0,s)) = \det(\Lambda)$. By (the proof of) Lemma 2.9, we have $B(0,s) \subset W(0|\Lambda)$. To verify the converse inclusion, assume that there exists $x \in W(0|\Lambda)$ with $s < \|x\|$. Since the distance function $d(\cdot, \Lambda)$ is continuous on $\mathbb{R}^d$, one obtains $B(x, \varepsilon) \subset \big(\bigcup_{a \in \Lambda} B(a,s)\big)^c$ for some $\varepsilon > 0$. Choose $a \in \Lambda$ such that $\lambda^d(B(x,\varepsilon) \cap W(a|\Lambda)) > 0$. Then
\[ \lambda^d(B(0,s)) < \lambda^d(B(a,s)) + \lambda^d(B(x,\varepsilon) \cap W(a|\Lambda)) \le \lambda^d(W(a|\Lambda)) = \det(\Lambda), \]
a contradiction. Hence $B(0,s) = W(0|\Lambda)$ and so $B(0,1)$ is space-filling.

The lattices in the following examples are related to optimality results.


8.11 Example (Standard lattice $\mathbb{Z}^d$)
Let $\Lambda = \mathbb{Z}^d$ and let the underlying norm be the $l_p$-norm, $1 \le p \le \infty$. Then $\det(\Lambda) = 1$ and $W(0|\Lambda) = [-\tfrac{1}{2}, \tfrac{1}{2}]^d$. In particular, $\Lambda$ is admissible. For computations of the normalized $r$-th moments of $W(0|\Lambda)$ see Example 4.17. In case $p = \infty$, the limiting measure $\mu$ in Theorem 8.9 is given by
\[ F_\mu(t) = (2t)^d, \quad 0 \le t \le 1/2. \]

For $d = 1$, one obtains $\mu = U([0, \tfrac{1}{2}])$.

8.12 Example (Hexagonal lattice in $\mathbb{R}^2$)
Let $d = 2$ and let $\Lambda = \mathbb{Z}(1,0) + \mathbb{Z}(1/2, \sqrt{3}/2)$. Here we have $\det(\Lambda) = \sqrt{3}/2$. If the underlying norm is the $l_2$-norm, then $W(0|\Lambda)$ is a regular hexagon,
\begin{align*}
W(0|\Lambda) &= \{x \in \mathbb{R}^2 : |x_1| \le 1/2,\ |x_1| + \sqrt{3}\,|x_2| \le 1\} \\
&= \operatorname{conv}\Big\{\pm\big(\tfrac{1}{2}, \tfrac{1}{2\sqrt{3}}\big),\ \pm\big(0, \tfrac{1}{\sqrt{3}}\big),\ \pm\big(\tfrac{1}{2}, -\tfrac{1}{2\sqrt{3}}\big)\Big\};
\end{align*}
see Figure 8.3.

Figure 8.3: Voronoi region $W(0|\Lambda)$ with respect to the $l_2$-norm for the hexagonal lattice

We have
\[ M_r(W(0|\Lambda)) = \frac{8 \cdot 2^{r/2}}{3^{(2+r)/4}} \int_0^{1/2} \int_0^{(1 - x_1)/\sqrt{3}} (x_1^2 + x_2^2)^{r/2} \, dx_2 \, dx_1. \]
In case $r = 1$ and $r = 2$ one obtains
\[ M_1(W(0|\Lambda)) = \frac{2 + 3\log(\sqrt{3})}{3^{7/4}\sqrt{2}} = 0.3771\ldots \]
and
\[ M_2(W(0|\Lambda)) = \frac{5}{18\sqrt{3}} = 0.1603\ldots. \]
If the underlying norm is the $l_1$-norm, then $W(0|\Lambda)$ is the (nonregular) hexagon
\[ W(0|\Lambda) = \{x \in \mathbb{R}^2 : |x_1| \le 1/2,\ |x_1| + |x_2| \le (1 + \sqrt{3})/4\}. \]

Since $\lambda^2(W(0|\Lambda)) = \sqrt{3}/2$, $\Lambda$ is admissible by Lemma 8.7.
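The value of the normalized second moment of the regular hexagon can be reproduced exactly, without numerical integration, by splitting the hexagon into six triangles with a common vertex at the origin. The sketch below relies on the elementary identity $\int_T \|x\|^2 dx = \frac{|T|}{6}(\|p\|^2 + \|q\|^2 + p \cdot q)$ for a triangle $T$ with vertices $0, p, q$; the function name is a hypothetical helper.

```python
import math

def polygon_second_moment(vertices):
    """Return (area, integral of |x|^2) for a polygon that is star-shaped
    about the origin, summing over the triangles (0, p, q) spanned by
    consecutive vertices p, q."""
    area = 0.0
    moment = 0.0
    n = len(vertices)
    for i in range(n):
        (px, py), (qx, qy) = vertices[i], vertices[(i + 1) % n]
        tri = 0.5 * abs(px * qy - py * qx)
        area += tri
        # integral of |x|^2 over the triangle (0, p, q)
        moment += tri / 6 * (px * px + py * py + qx * qx + qy * qy
                             + px * qx + py * qy)
    return area, moment

s = 1 / (2 * math.sqrt(3))
hexagon = [(0.5, s), (0.0, 2 * s), (-0.5, s),
           (-0.5, -s), (0.0, -2 * s), (0.5, -s)]
hex_area, hex_moment = polygon_second_moment(hexagon)
# (8.4) with d = 2, r = 2: divide by det(Lambda)^((d+r)/d) = area^2
m2_hexagon = hex_moment / hex_area ** 2
print(hex_area, m2_hexagon)
```

The area equals $\det(\Lambda) = \sqrt{3}/2$, as admissibility requires, and the normalized moment agrees with $5/(18\sqrt{3})$.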

8.13 Example (Lattices $D_d$)
Let $\Lambda = \{a \in \mathbb{Z}^d : \sum_{i=1}^d a_i \text{ even}\}$. This lattice is usually called of type $D_d$ or checkerboard lattice. Here we have $\det(\Lambda) = 2$. If the underlying norm is the $l_2$-norm, we obtain
\begin{align*}
W(0|\Lambda) &= [-1, 1], \quad d = 1, \\
W(0|\Lambda) &= \Big\{x \in \mathbb{R}^d : \sum_{i \in I} |x_i| \le 1 \text{ for every } I \subset \{1, \ldots, d\},\ |I| = 2\Big\}, \quad d \ge 2.
\end{align*}
In case $d = 2$, $W(0|\Lambda)$ is the unit $l_1$-ball and in dimension $d = 3$, $W(0|\Lambda)$ is a rhombic dodecahedron. Let $r = 2$ and write $W = W(0|\Lambda)$. Since $W$ is invariant under permutations of the coordinates, one gets
\[ \int_W \|x\|^2 \, dx = d \int_W x_1^2 \, dx = d \int_{-1}^{1} x_1^2\, \lambda^{d-1}(W_{x_1}) \, dx_1, \]
where $W_{x_1}$ denotes the $x_1$-section of $W$,
\[ W_{x_1} = \Big\{y \in \mathbb{R}^{d-1} : \sum_{i \in I} |y_i| \le 1 \text{ for every } I \subset \{1, \ldots, d-1\},\ |I| = 2,\ |y_i| \le 1 - |x_1| \text{ for } 1 \le i \le d-1\Big\}, \quad |x_1| \le 1. \]

Using
\begin{align*}
\lambda^{d-1}(W_{x_1}) &= 2 - 2^{d-1}|x_1|^{d-1}, \quad |x_1| \le \tfrac{1}{2}, \\
\lambda^{d-1}(W_{x_1}) &= 2^{d-1}(1 - |x_1|)^{d-1}, \quad \tfrac{1}{2} \le |x_1| \le 1
\end{align*}
we deduce
\[ \int_W \|x\|^2 \, dx = \frac{d}{6} + \frac{1}{d+1}. \]

This implies the formula
\[ M_2(W(0|\Lambda)) = \frac{1}{2^{2/d}}\Big(\frac{d}{12} + \frac{1}{2(d+1)}\Big), \quad d \ge 1 \]
(cf. Conway and Sloane, 1993, p. 462). In particular
\begin{align*}
M_2(W(0|\Lambda)) &= \frac{3}{4^{1/3} \cdot 8} = 0.2362\ldots, \quad d = 3, \\
M_2(W(0|\Lambda)) &= \frac{13}{2^{1/2} \cdot 30} = 0.3064\ldots, \quad d = 4.
\end{align*}

If the underlying norm is the $l_1$-norm, then $W(0|\Lambda)$ coincides with the above $l_2$-Voronoi region and $\Lambda$ is thus admissible. Here we obtain for $r = 1$
\[ M_1(W(0|\Lambda)) = \frac{1}{2^{1/d}}\Big(\frac{d}{4} + \frac{1}{2(d+1)}\Big), \quad d \ge 1. \]

In particular,
\[ M_1(W(0|\Lambda)) = \frac{7}{2^{1/3} \cdot 8} = 0.6944\ldots, \quad d = 3. \]
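The derivation in Example 8.13 can be retraced numerically: integrate $x_1^2$ against the section volumes, compare with the closed form $d/6 + 1/(d+1)$, and then normalize by $\det(\Lambda)^{(d+2)/d}$ with $\det(\Lambda) = 2$. This is only a sketch; the function names and the midpoint rule are illustrative choices.

```python
def section_volume(x1, d):
    """(d-1)-dimensional volume of the x1-section of W(0|D_d)."""
    t = abs(x1)
    if t <= 0.5:
        return 2 - 2 ** (d - 1) * t ** (d - 1)
    return 2 ** (d - 1) * (1 - t) ** (d - 1)

def second_moment_integral(d, steps=20_000):
    """Midpoint-rule value of d * int_{-1}^{1} x^2 * vol(W_x) dx."""
    h = 2 / steps
    total = 0.0
    for i in range(steps):
        x = -1 + (i + 0.5) * h
        total += x * x * section_volume(x, d) * h
    return d * total

def dd_second_moment(d):
    """Closed form of M_2(W(0|D_d)) for the l_2-norm."""
    return 2 ** (-2 / d) * (d / 12 + 1 / (2 * (d + 1)))

def dd_first_moment(d):
    """Closed form of M_1(W(0|D_d)) for the l_1-norm."""
    return 2 ** (-1 / d) * (d / 4 + 1 / (2 * (d + 1)))

for d in (2, 3, 4):
    numeric = second_moment_integral(d) / 2 ** ((d + 2) / d)
    print(d, numeric, dd_second_moment(d), dd_first_moment(d))
```

For $d = 2$ the region is a square of side $\sqrt{2}$, so the closed form must give $1/6$, which is a convenient sanity check.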

8.14 Example (Dual lattices $D_d^*$)
The dual lattice of the lattice $D_d$ is defined by
\[ D_d^* = \Big\{x \in \mathbb{R}^d : \sum_{i=1}^d a_i x_i \in \mathbb{Z} \text{ for every } a \in D_d\Big\}. \]

Then
\[ D_d^* = \mathbb{Z}^d + \mathbb{Z}\big(\tfrac{1}{2}, \ldots, \tfrac{1}{2}\big) \]
and
\[ 2D_d^* = (2\mathbb{Z})^d \cup (2\mathbb{Z} + 1)^d. \]

Note that $2D_1^* = \mathbb{Z}$ and $2D_2^* = D_2$. Let $\Lambda = 2D_d^*$. Then $\det(\Lambda) = 2^d/\det(D_d) = 2^{d-1}$. If the underlying norm is the $l_2$-norm, it is not difficult to verify that
\[ W(0|\Lambda) = \Big\{x \in \mathbb{R}^d : \sum_{i=1}^d |x_i| \le \frac{d}{2}\Big\} \cap [-1, 1]^d. \]

For $d = 3$, this Voronoi region is a truncated octahedron; see Figure 8.4. It is more difficult to compute the normalized second moment of $W(0|\Lambda)$. We obtain
\[ M_2(W(0|\Lambda)) = \frac{19}{2^{1/3} \cdot 64} = 0.2356\ldots, \quad d = 3. \]

Figure 8.4: Truncated octahedron

Note that this moment is slightly smaller than $M_2(W(0|D_3))$. For $d = 4$, $\Lambda$ and $D_4$ are similar. In fact, the similarity transformation
\[ T: \mathbb{R}^4 \to \mathbb{R}^4, \quad T(x) = (x_1 + x_2,\ x_1 - x_2,\ x_3 + x_4,\ x_3 - x_4) \]
with scaling factor $\sqrt{2}$ satisfies $T(D_4) = \Lambda$. Therefore, $W(0|\Lambda) = TW(0|D_4)$ which yields
\[ M_2(W(0|\Lambda)) = M_2(W(0|D_4)) = \frac{13}{2^{1/2} \cdot 30} = 0.3064\ldots, \quad d = 4. \]

A general formula for $M_2(W(0|\Lambda))$ can be found in Conway and Sloane, 1993, pp. 470-471. The above upper bounds for $Q_r([0,1]^d)$, $d = 3, 4$, are close to the ball lower bounds given in Proposition 8.3. We have
\begin{align*}
M_2(B(0,1)) &= \Big(\frac{3}{4\pi}\Big)^{2/3} \frac{3}{5} = 0.2309\ldots, \quad d = 3, \\
M_2(B(0,1)) &= \frac{2\sqrt{2}}{3\pi} = 0.3001\ldots, \quad d = 4.
\end{align*}

If the underlying norm is the $l_1$-norm, the Voronoi region generated by 0 does not change and so $\Lambda$ is admissible. For $d = 3$ and $r = 1$, we obtain
\[ M_1(W(0|\Lambda)) = \frac{35}{4^{1/3} \cdot 32} = 0.6890\ldots, \quad d = 3. \]

This moment is smaller than $M_1(W(0|D_3))$. The corresponding ball lower bound is given by
\[ M_1(B(0,1)) = \Big(\frac{3}{4}\Big)^{4/3} = 0.6814\ldots, \quad d = 3. \]



If the underlying norm is the $l_2$-norm, $Q_r([0,1]^d)$ is only known for $d = 1$ and $d = 2$. We will see below that for $d = 2$, $Q_r([0,1]^2) = M_r(W(0|\Lambda))$ holds, where $\Lambda$ is the hexagonal lattice described in Example 8.12. In particular, $\Lambda$ solves the lattice quantizer problem. For dimension $d = 3$, Example 8.14 shows that the normalized second moment of the lattice $D_3^*$ (truncated octahedron) is very close to the ball lower bound. It is known that $D_3^*$ is a solution of the lattice quantizer problem for $r = 2$ (cf. Barnes and Sloane, 1983), so

(8.8) \[ Q_2^{(L)}([0,1]^3) = M_2(W(0|D_3^*)) = \frac{19}{2^{1/3} \cdot 64} = 0.2356\ldots, \quad l_2\text{-norm}. \]

For $d \ge 4$, solutions of the lattice quantizer problem are not known. Conway and Sloane (1993) give a comprehensive survey of the best known lattice quantizers for $r = 2$, among them $D_4^*$ (or $D_4$) and $D_5^*$. For recent improvements see Agrell and Eriksson (1998). (Note that these authors present the value of $M_2(A)/d$ for $A \subset \mathbb{R}^d$.)
If the underlying norm is the $l_1$-norm, Example 8.14 shows that the normalized first moment of $D_3^*$ is close to the ball lower bound and hence, $D_3^*$ provides a good quantizer for $U([0,1]^3)$ in case $r = 1$. However, optimality results are not known for $d \ge 3$. A trivial case occurs for $d = 2$, where the ball $B(0,1) = W(0|D_2)$ is space-filling. Therefore

(8.9) \[ Q_r([0,1]^2) = M_r(B(0,1)) = \frac{2}{(2 + r)2^{r/2}}, \quad l_1\text{-norm}. \]

A further trivial case concerns the $l_\infty$-norm. Here $B(0,1) = 2\,W(0|\mathbb{Z}^d)$ is again space-filling and so

(8.10) \[ Q_r([0,1]^d) = M_r(B(0,1)) = \frac{d}{(d + r)2^r}, \quad l_\infty\text{-norm} \]

(cf. Example 4.17).


The following result is due to Fejes Tóth (1959, 1972).

8.15 Theorem
Let $d = 2$ and suppose the underlying norm is the $l_2$-norm. Let $\Lambda$ be the hexagonal lattice. Then
\[ Q_r([0,1]^2) = M_r(W(0|\Lambda)). \]

Proof
Set $A = W(0|\Lambda)$ and recall that $A$ is a regular hexagon. For every $n \in \mathbb{N}$ and every $\alpha \subset A$ with $|\alpha| = n$, we have
\[ \int_A \min_{a \in \alpha} \|x - a\|^r \, dx \ge n \int_{n^{-1/2}A} \|x\|^r \, dx \]
(cf. Fejes Tóth, 1972, p. 81). Let $\alpha \in C_{n,r}(U(A))$. It follows from Theorem 4.1 and Lemma 2.6 (a) that $\alpha \subset A$. Therefore
\begin{align*}
\det(\Lambda) V_{n,r}(U(A)) &= \int_A \min_{a \in \alpha} \|x - a\|^r \, dx \\
&\ge n \int_{n^{-1/2}A} \|x\|^r \, dx \\
&= n^{-r/2} M_r(A) \det(\Lambda)^{(2+r)/2}
\end{align*}
which yields
\[ n^{r/2} V_{n,r}(U(A)) \ge M_r(A) \det(\Lambda)^{r/2}, \quad n \in \mathbb{N}. \]
This implies
\[ Q_r(A) \ge M_r(A) \det(\Lambda)^{r/2} \]
and hence
\[ Q_r([0,1]^2) \ge M_r(A). \]
This together with Theorem 8.9 gives the assertion. □
8.16 Remark
The above results allow one to prove by a quantization argument that $\alpha_{n,\Lambda}$ is uniformly distributed in $[0,1]^d$ for every admissible lattice $\Lambda \subset \mathbb{R}^d$ with convex Voronoi region $W(0|\Lambda)$ in the sense that
\[ \frac{1}{|\alpha_{n,\Lambda}|} \sum_{b \in \alpha_{n,\Lambda}} \delta_b \xrightarrow{D} U([0,1]^d) \quad \text{as } n \to \infty. \]

First, observe that $A = W(0|\Lambda)$ is the unit ball of some norm $\|\cdot\|_0$. Then forget the underlying norm which was only used to form the Voronoi region $W(0|\Lambda)$ and through this $\alpha_{n,\Lambda}$ and proceed with the norm $\|\cdot\|_0$. The Voronoi region $W(0|\Lambda, \|\cdot\|_0)$ with respect to $\|\cdot\|_0$ coincides with $A$. Therefore, by Theorem 8.9 and Proposition 8.3
\[ \lim_{n \to \infty} n^{r/d} \int \min_{b \in \alpha_{n,\Lambda}} \|x - b\|_0^r \, dU([0,1]^d)(x) = M_r(A, \|\cdot\|_0) = Q_r([0,1]^d, \|\cdot\|_0). \]
The assertion now follows from Theorem 7.5.

8.4 Quantization coefficients of one-dimensional distributions

In Tables 8.1-8.3 one can find the quantization coefficients
\[ Q_r(P) = \frac{1}{(1 + r)2^r} \Big(\int \Big(\frac{dP}{d\lambda}\Big)^{1/(1+r)} d\lambda\Big)^{1+r} \]
of several univariate absolutely continuous distributions. As an illustration, Figure 8.5 shows the densities of three hyper-exponential distributions with variance equal to one and small, moderate and large second quantization coefficient, respectively.

Figure 8.5: Densities of hyper-exponential distributions $P = H(a, b)$ with variance equal to one and $Q_2(P) = 1.8470$ (top), $Q_2(P) = 3.3106$ (center), $Q_2(P) = 8.1000$ (bottom).
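The one-dimensional formula for $Q_r(P)$ lends itself to a direct numerical check. The sketch below evaluates the integral with a plain midpoint rule on a truncated interval; the truncation bounds, step count and function names are assumptions made for illustration, and the two test densities (standard normal, and exponential with rate one) are written out explicitly rather than taken from the tables' parametrization.

```python
import math

def quantization_coefficient_1d(density, r, lo, hi, steps=200_000):
    """Q_r(P) = (int h^(1/(1+r)) dx)^(1+r) / ((1+r) * 2^r) for a density h,
    with the integral approximated by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    integral = sum(density(lo + (i + 0.5) * h) ** (1 / (1 + r))
                   for i in range(steps)) * h
    return integral ** (1 + r) / ((1 + r) * 2 ** r)

def std_normal(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def exponential(x):
    return math.exp(-x)

q2_normal = quantization_coefficient_1d(std_normal, 2, -30.0, 30.0)
q2_exponential = quantization_coefficient_1d(exponential, 2, 0.0, 150.0)
print(q2_normal, q2_exponential)
```

Both distributions have variance one, and the computed values can be compared with the entries $\sqrt{3}\,\pi/2$ and $9/4$ given for $N(0,1)$ and $E(1)$ in Table 8.2.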

P Q~(P)
Normal
N(0, a 2) ( ~ ) r / a ( 1 + r) (~-')/2
Logistic
p[ I X2+2r
L(a)
'2"" (I + r)p(:_::)'+~
Double Exponential
DE(a) (a(1 + r)) r
Double Gamma
p/b+r~l+r
Dr(a, b) ar(l+r)b+r-1 -~V~J
p(b)
Hyper-exponential
HE(a,b) (~)~(1 + r) O+r-b)/b
Uniform
1 .b-a, r
U([~,b]) 1+~ ( - ~ )
Triangular
2 +T)(b--
T(a, b; c) (2+r)'+'((1 2 a))~
Exponential
E(a)
Gamma

r(a, b) (a)r(1 + r)~r-, -,l+r,


r(b)
Weibul!
(a(l+r)Ub)rp. b-t-r 1+~
w(a,b)
Pareto
b a 1 + r).r
P(a,b),b > r b-r (::b-~)

Table 8.1: Quantization coefficients



P Q2(P)

N(0, 1)
V•- -- 2.7206...
2
L(v/'3/~r) F(~)~ - 3.7709...
47r2r(~) 3
9
DE(1/x/~) - -- 4.5
2
3b+ip(b+2~3
Dr(a, b) --~ 3 J
r(b + 21
a 2 - _
1
_
range: (0,3F(2) 3) ----(0, 7.4488...)
b(1 + b)
HE(a, b)
a~_ r(~) r(~)~ 3(~-b)/b
r(~)
range: (1, oo)
U([a, b]), 1
b = a + 2x/3
T(a, b; ~),
27
b=a+2v~ 1.6875 - -

16
9
E(1) 2.25
- -

4
3b+l r(-r)
bd-2 3
r(~, b)
4r(b + 1)
a2 _ 1
range:
( 3r(~)
~- ~ , ~ . 1 = (1.8622...,2.7206...)
W(a,b)
a 2
1 91/br(~) 3
r(~)- r(~F 452[r(~_ _ r(~_~)2]
range: [2.1555, c~)
9b-1 2
P(a, b), b > 2,
a2 = ( b - 2)(b- 1) 2
range: (9, c<))
b

Table 8.2: r = 2. Quantization coefficients of distributions P with V2(P) = 1



P QI(P)
7[
N(O, ~) - = 1.5707...
2
L ( ~ )1 - 1.7798...
8 log 2
DE(l) 2
Dr(a, b)
2br( )
r(b + 1)
1
range: (0,r)
b
HE(a,b)
br( )
r(~)
a = p(}) range : (1, co)

v([a, hi), 1
b=a+4
4
T(a, b; a+b~
2 Y'
- ----1.3333...
3
b=a+6

E 1 1
---- 1.4426...
log 2
P(a, b), b > 1,
b-1
a - b(21/b _ 1) (b--1)(21/b--1)
1
range: ( ~ , c o )

Table 8.3: r -- 1. Quantization coefficients of distributions P with VI(P) = 1

Notes

The issue of space-filling sets in the quantization setting was raised by Gersho (1979) for the $l_2$-norm. Expositions concerning tesselations (tilings) can be found in Grünbaum and Shephard (1986) and Schulte (1993). For a background on lattices, we refer to the books of Cassels (1959) and Gruber and Lekkerkerker (1987).
Gersho (1979) contains upper bounds for $Q_r([0,1]^d)$ of the type (8.3) for the $l_2$-norm. Theorems 8.5 and 8.9 provide a rigorous derivation. The observation in Theorem 8.9 concerning the distributional convergence seems to be new. Different proofs for the Hexagon-Theorem 8.15 can be found in Newman (1982) (for $r = 2$), Wong (1982) (also for $r = 2$) and Haimovich and Magnanti (1988). Discussions and applications of this theorem are contained in Bollobás (1972, 1973). The quantization coefficient $Q_1([0,1]^d)$ appears in upper bounds for limiting constants in the euclidean traveling salesman problem; see Goddyn (1990).
According to Theorems 8.9 and 8.15, the hexagon quantizer is asymptotically $n$-optimal for $P = U([0,1]^2)$ of every order $r$. This result can be extended to bivariate nonuniform distributions $P$ with a continuous density using a piecewise hexagon quantizer depending on $P$ and $r$ in the spirit of Lemma 7.2 (but now with $m \to \infty$, $m/n \to 0$ and the $n_i$-optimal quantizer for $P(\cdot|A_i)$ replaced by a hexagon quantizer). See McClure (1980, p. 197) for $r = 2$ with an unpublished proof. Su (1997) showed that this design yields an asymptotically $n$-optimal quantizer for every $r$.

8.17 Conjecture
$Q_r([0,1]^3) = M_r(W(0|D_3^*))$ for every $r \in [1, \infty)$ when the underlying norm on $\mathbb{R}^3$ is the $l_2$-norm (cf. Example 8.14, (8.8), and Remark 10.11(c)).

9 Random quantizers and quantization coefficients

In this section we determine the asymptotics of a stochastic version of the quantization problem and derive further upper bounds and the $d$-asymptotics for the quantization coefficients.

9.1 Asymptotics for random quantizers

Let $X$ be an $\mathbb{R}^d$-valued random variable with distribution $P$, let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$, and let $1 \le r < \infty$.

9.1 Theorem
Suppose $P = P_a$. Let $Y_1, Y_2, \ldots$ be i.i.d. $\mathbb{R}^d$-valued random variables with distribution $Q$ independent of $X$ and let $g = dQ_a/d\lambda^d$.

(a) Assume $P(g > 0) = 1$. Then
\[ n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \xrightarrow{D} \nu, \]
where $\nu$ is a scale mixture of Weibull distributions with distribution function
\[ F_\nu(t) = \begin{cases} 0, & t \le 0, \\ 1 - \int \exp\big(-\lambda^d(B(0,1))\, g(x)\, t^d\big) \, dP(x), & t > 0. \end{cases} \]

If additionally $E\|X - Y_1\|^r < \infty$, then $\int g^{-r/d} \, dP < \infty$ and
\[ \lim_{n \to \infty} n^{r/d} E \min_{1 \le i \le n} \|X - Y_i\|^r = \Gamma\Big(2 + \frac{r}{d}\Big) M_r(B(0,1)) \int g^{-r/d} \, dP \]
if and only if $\big(n^{r/d} \min_{1 \le i \le n} \|X - Y_i\|^r\big)_{n \ge 1}$ is uniformly integrable.

(b) Assume $P(g = 0) > 0$. Then the sequence $\big(n^{1/d} \min_{1 \le i \le n} \|X - Y_i\|\big)_{n \ge 1}$ is stochastically unbounded.

Proof
(a) Let $t > 0$. For $x \in \mathbb{R}^d$, we have
\[ \mathbb{P}\Big(n^{1/d} \min_{1 \le i \le n} \|x - Y_i\| \le t\Big) = 1 - \big[1 - Q(B(x, tn^{-1/d}))\big]^n. \]

By differentiation of measures
\[ \frac{nQ(B(x, tn^{-1/d}))}{t^d \lambda^d(B(0,1))} = \frac{Q(B(x, tn^{-1/d}))}{\lambda^d(B(x, tn^{-1/d}))} \to g(x) \]
as $n \to \infty$ for $\lambda^d$-almost all $x$ and hence for $P$-almost all $x \in \mathbb{R}^d$ (cf. Chatterji, 1973, Chapitre V). Therefore, by Lebesgue's dominated convergence theorem
\[ \mathbb{P}\Big(n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \le t\Big) = \int \mathbb{P}\Big(n^{1/d} \min_{1 \le i \le n} \|x - Y_i\| \le t\Big) \, dP(x) \to F_\nu(t). \]
Obviously, $\mathbb{P}(n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| = 0) = 0 = F_\nu(0)$. Since $P(g > 0) = 1$, $F_\nu$ is a distribution function and we obtain
\[ n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \xrightarrow{D} \nu. \]
This, together with
\begin{align*}
\int t^r \, d\nu(t) &= \Gamma\Big(1 + \frac{r}{d}\Big) \lambda^d(B(0,1))^{-r/d} \int g^{-r/d} \, dP \\
&= \Gamma\Big(2 + \frac{r}{d}\Big) M_r(B(0,1)) \int g^{-r/d} \, dP
\end{align*}
implies the second assertion (cf. Hoffmann-Jørgensen, 1994, 5.2).
(b) As above (but now $F_\nu$ is not a distribution function)
\[ \mathbb{P}\Big(n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| > t\Big) \to 1 - F_\nu(t) \ge P(g = 0) > 0 \]
for every $t \ge 0$. □
For $P = h\lambda^d$ such that $h \in L_{d/(d+r)}(\lambda^d)$, let $P_r = h_r\lambda^d$ be defined as in (7.5). Set
\[ B = \Big\{g \in L_1(\lambda^d) : g \ge 0,\ \int g \, d\lambda^d \le 1\Big\}. \]

Then the function
\[ F: B \to [0, \infty], \quad F(g) = \int g^{-r/d} \, dP \]
satisfies

(9.1) \[ F(h_r) = \min_{g \in B} F(g) = \|h\|_{d/(d+r)} \]

(cf. Lemma 6.8). In fact, by Hölder's inequality with $p = d/(d + r)$ and $q = -d/r$, one obtains
\[ F(g) \ge \|h\|_p \Big(\int (g^{-r/d})^q \, d\lambda^d\Big)^{1/q} \ge \|h\|_p = F(h_r) \]
for $g \in B$. This shows that an asymptotically optimal random quantizer for $P$ of order $r$ is given by the distribution $Q = P_r$ provided the uniform integrability condition holds for $P_r$. In this case Theorem 9.1 yields

(9.2) \[ \lim_{n \to \infty} n^{r/d} E \min_{1 \le i \le n} \|X - Y_i\|^r = \Gamma\Big(2 + \frac{r}{d}\Big) M_r(B(0,1)) \|h\|_{d/(d+r)}. \]

We will not discuss uniform integrability of $\big(n^{r/d} \min_{1 \le i \le n} \|X - Y_i\|^r\big)_{n \ge 1}$ in general but consider only the case of uniformly distributed $Y_i$.

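9.2 Theorem
Let $A \subset \mathbb{R}^d$ be a compact set with $\lambda^d(A) > 0$ and let $Y_1, Y_2, \ldots$ be i.i.d. $U(A)$-distributed random variables independent of $X$. Assume that $P = P_a$ and $\operatorname{supp}(P) \subset A$, or that $P(\operatorname{int} A) = 1$. Assume further that there are constants $c > 0$ and $t_0 > 0$ such that

(9.3) \[ \lambda^d(B(x,t) \cap A) \ge ct^d \quad \text{for every } x \in \operatorname{supp}(P),\ t \in (0, t_0). \]

Then
\[ n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \xrightarrow{D} \nu \]
and
\[ \lim_{n \to \infty} n^{r/d} E \min_{1 \le i \le n} \|X - Y_i\|^r = \Gamma\Big(2 + \frac{r}{d}\Big) M_r(B(0,1)) \lambda^d(A)^{r/d}, \]
where $\nu$ denotes the Weibull distribution with distribution function
\[ F_\nu(t) = \begin{cases} 0, & t \le 0, \\ 1 - \exp\big(-\lambda^d(B(0,1))\, t^d / \lambda^d(A)\big), & t > 0. \end{cases} \]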
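Proof
Let $Q = U(A)$. In case $P = P_a$ and $\operatorname{supp}(P) \subset A$, the first assertion follows from Theorem 9.1. In case $P(\operatorname{int} A) = 1$, we have for $t > 0$ and $x \in \operatorname{int} A$
\[ \mathbb{P}\Big(n^{1/d} \min_{1 \le i \le n} \|x - Y_i\| \le t\Big) = 1 - \Big[1 - \frac{\lambda^d(B(0,1))\, t^d}{n \lambda^d(A)}\Big]^n \]
for sufficiently large $n$. This implies
\[ n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \xrightarrow{D} \nu. \]
Furthermore

(9.4) \[ \sup_{n \ge 1} n^{s/d} E \min_{1 \le i \le n} \|X - Y_i\|^s < \infty \quad \text{for every } s \in [1, \infty). \]

This property implies that $\big(n^{r/d} \min_{1 \le i \le n} \|X - Y_i\|^r\big)_{n \ge 1}$ is uniformly integrable which yields the second assertion.
To prove (9.4) note that $\operatorname{supp}(P) \subset A$. Let $x \in \operatorname{supp}(P)$. We have
\begin{align*}
E \min_{1 \le i \le n} \|x - Y_i\|^s &= \int_0^\infty \mathbb{P}\Big(\min_{1 \le i \le n} \|x - Y_i\| > t^{1/s}\Big) \, dt \\
&= s \int_0^{\operatorname{diam}(A)} [1 - Q(B(x,t))]^n t^{s-1} \, dt.
\end{align*}

Observe that (9.3) holds (with a different constant $c$) for every $t \in (0, \operatorname{diam}(A))$ if $\operatorname{diam}(A) > t_0$. In fact, choose $t_1 \in (0, t_0)$. Then for every $t_0 \le t < \operatorname{diam}(A)$
\[ \lambda^d(B(x,t) \cap A) \ge \lambda^d(B(x,t_1) \cap A) \ge ct_1^d \ge c\,(t_1/\operatorname{diam}(A))^d\, t^d. \]

Using the inequality $(1 - z)^n \le e^{-nz}$, $0 \le z \le 1$, one obtains with $c_1 = c/\lambda^d(A)$
\begin{align*}
E \min_{1 \le i \le n} \|x - Y_i\|^s &\le s \int_0^{\operatorname{diam}(A)} (1 - c_1 t^d)^n t^{s-1} \, dt \\
&\le s \int_0^\infty \exp(-nc_1 t^d)\, t^{s-1} \, dt \\
&= \frac{s\,\Gamma(s/d)}{d\,(nc_1)^{s/d}} =: c_2 n^{-s/d}.
\end{align*}
This gives
\[ \sup_{n \ge 1} n^{s/d} E \min_{1 \le i \le n} \|X - Y_i\|^s = \sup_{n \ge 1} n^{s/d} \int E \min_{1 \le i \le n} \|x - Y_i\|^s \, dP(x) \le c_2. \]
□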
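The limit in Theorem 9.2 can be observed in a small simulation. The sketch below takes $d = 1$, $A = [0,1]$, $P = U([0,1])$ and $r = 2$, where the limit is $\Gamma(4) \cdot \frac{1}{12} \cdot 1 = 0.5$; the sample sizes, the seed and the function name are arbitrary illustrative choices, and a finite-$n$ simulation only approximates the limit.

```python
import random

def random_quantizer_error(n, r, trials, seed=0):
    """Monte Carlo estimate of n^(r/d) * E min_i |X - Y_i|^r for d = 1,
    with X and Y_1, ..., Y_n independent and uniform on [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.random()
        nearest = min(abs(x - rng.random()) for _ in range(n))
        total += nearest ** r
    return n ** r * total / trials   # d = 1, so n^(r/d) = n^r

estimate = random_quantizer_error(n=100, r=2, trials=10_000)
print(estimate)
```

The estimate is noticeably larger than the optimal value $Q_2([0,1]) = 1/12$, which illustrates how much is lost by placing the codebook points at random.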
The regularity condition (9.3) with supp(P) replaced by supp(U(A)) is discussed in
Section 12. Compact convex subsets A of R d with Ad(A) > 0 satisfy this condition.
Next we will deal with consequences for the quantization coefficients.

9.2 Random quantizer upper bound

Clearly, random quantizers cannot be better than optimal quantizers, which gives the following upper bound for $Q_r([0,1]^d)$. For the $l_2$-norm, it is due to Zador (1963, 1982).

9.3 Proposition

\[ Q_r([0,1]^d) \le \Gamma\Big(2 + \frac{r}{d}\Big) M_r(B(0,1)). \]

Proof
Choose $A = [0,1]^d$ and $P = U([0,1]^d)$ in Theorem 9.2. Then
\[ M_{n,r}([0,1]^d) \le E \min_{1 \le i \le n} \|X - Y_i\|^r, \quad n \ge 1 \]
which yields the assertion. □



A comparison of the random quantizer and the cube quantizer upper bound shows for the $l_2$-norm and $r = 2$
\[ \Gamma\Big(2 + \frac{2}{d}\Big) M_2(B(0,1)) < M_2([0,1]^d), \quad d \ge 7. \]

For the $l_1$-norm and $r = 1$ one obtains
\[ \Gamma\Big(2 + \frac{1}{d}\Big) M_1(B(0,1)) < M_1([0,1]^d), \quad d \ge 5. \]

A comparison of the random quantizer upper bound and the ball lower bound given in Proposition 8.3 can be found in the Tables 9.1 and 9.2.
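The crossover dimensions quoted above can be recomputed from the closed forms. The sketch below assumes the standard volume formula for unit $l_p$-balls and uses $M_r([0,1]^d) = d/((1+r)2^r)$ for the cube under the $l_r$-norm (the unit-interval value added up $d$ times via the product formula of Lemma 8.6); all function names are hypothetical.

```python
import math

def lp_ball_volume(d, p):
    """Volume of the unit l_p-ball in R^d (standard closed form, assumed)."""
    return (2 * math.gamma(1 + 1 / p)) ** d / math.gamma(1 + d / p)

def ball_lower_bound(d, r, p):
    """M_r(B(0,1)) = d / ((d + r) * vol^(r/d))."""
    return d / ((d + r) * lp_ball_volume(d, p) ** (r / d))

def random_upper_bound(d, r, p):
    """Gamma(2 + r/d) * M_r(B(0,1)), the bound of Proposition 9.3."""
    return math.gamma(2 + r / d) * ball_lower_bound(d, r, p)

def cube_upper_bound(d, r):
    """M_r([0,1]^d) for the l_r-norm."""
    return d / ((1 + r) * 2 ** r)

def first_dimension_where_random_wins(r, p, max_d=50):
    """Smallest d at which the random bound drops below the cube bound."""
    for d in range(1, max_d + 1):
        if random_upper_bound(d, r, p) < cube_upper_bound(d, r):
            return d
    return None

print(first_dimension_where_random_wins(r=2, p=2))
print(first_dimension_where_random_wins(r=1, p=1))
```

Calling `ball_lower_bound` and `random_upper_bound` over a range of dimensions gives the two columns compared in Tables 9.1 and 9.2.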

d    $M_2(B(0,1))$    $\Gamma(2 + \frac{2}{d}) M_2(B(0,1))$


1 0.0833 0.5
2 0.1592 0.3183
3 0.2309 0.3471
4 0.3001 0.3989
5 0.3676 0.4566
6 0.4338 0.5165
7 0.4991 0.5774
8 0.5636 0.6386
9 0.6276 0.7000
10 0.6910 0.7614
20 1.3105 1.3714
30 1.9169 1.9744
40 2.5175 2.5733
50 3.1149 3.1696
100 6.0801 6.1325

Table 9.1: $l_2$-norm, $r = 2$. Ball lower bound and random quantizer upper bound for $Q_2([0,1]^d)$

d    $M_1(B(0,1))$    $\Gamma(2 + \frac{1}{d}) M_1(B(0,1))$


1 0.25 0.5
2 0.4714 0.6267
3 0.6814 0.8113
4 0.8853 1.0031
5 1.0855 1.1960
6 1.2831 1.3887
7 1.4788 1.5809
8 1.6730 1.7725
9 1.8662 1.9636
10 2.0585 2.1542
20 3.9545 4.0422
30 5.8280 5.9128
40 7.6920 7.7526
50 9.5506 9.6330
100 18.8083 18.8886

Table 9.2: $l_1$-norm, $r = 1$
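
The two bounds tabulated above can be recomputed numerically. The following is only an illustrative sketch: the function names are ours, and it assumes the standard volume of the $l_p$ unit ball, $\lambda^d(B(0,1)) = (2\Gamma(1 + 1/p))^d / \Gamma(1 + d/p)$, together with $M_r(B(0,1)) = d / ((d + r)\lambda^d(B(0,1))^{r/d})$. The cube quantizer values $d/12$ ($l_2$-norm, $r = 2$) and $d/4$ ($l_1$-norm, $r = 1$) are passed in as a per-dimension constant.

```python
import math

def ball_lower_bound(d: int, r: float, p: float) -> float:
    """M_r(B(0,1)) for the l_p unit ball in R^d (ball lower bound)."""
    # log of lambda^d(B(0,1)) = (2*Gamma(1+1/p))^d / Gamma(1+d/p)
    log_volume = d * math.log(2.0 * math.gamma(1.0 + 1.0 / p)) - math.lgamma(1.0 + d / p)
    return d / (d + r) * math.exp(-(r / d) * log_volume)

def random_upper_bound(d: int, r: float, p: float) -> float:
    """Gamma(2 + r/d) * M_r(B(0,1)) (random quantizer upper bound)."""
    return math.gamma(2.0 + r / d) * ball_lower_bound(d, r, p)

def first_dimension_beating_cube(r: float, p: float, cube_per_dim: float) -> int:
    """Smallest d whose random quantizer bound is below the cube bound cube_per_dim * d."""
    d = 1
    while random_upper_bound(d, r, p) >= cube_per_dim * d:
        d += 1
    return d

# A few rows in the layout of Table 9.1 (l_2-norm, r = 2).
for d in (1, 2, 3, 10, 100):
    print(d, round(ball_lower_bound(d, 2, 2), 4), round(random_upper_bound(d, 2, 2), 4))
```

Calling `first_dimension_beating_cube(2, 2, 1/12)` and `first_dimension_beating_cube(1, 1, 1/4)` gives the dimensions from which the random quantizer bound improves on the cube quantizer bound.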

9.3 d-asymptotics and entropy

From the ball lower bound and the random quantizer upper bound for $Q_r([0,1]^d)$ we
deduce the following approximation for large $d$.

9.4 Corollary
Let the underlying norm be the $l_p$-norm, $1 \le p < \infty$. Then

$$\lim_{d \to \infty} d^{-r/p}\, Q_r([0,1]^d) = \frac{p^r}{2^r (ep)^{r/p}\, \Gamma(1/p)^r}.$$

Proof
By (2.5) and (2.6)

$$M_r(B(0,1)) = \frac{d\, \Gamma(1 + \frac{d}{p})^{r/d}}{(d + r)\, 2^r\, \Gamma(1 + \frac{1}{p})^r}.$$

From Stirling's formula for the $\Gamma$-function, i.e.

$$\Gamma(x) \sim \sqrt{2\pi}\, x^{x - 1/2} e^{-x} \quad \text{as } x \to \infty,$$

we deduce

$$\lim_{d \to \infty} d^{-r/p}\, \Gamma(1 + \tfrac{d}{p})^{r/d} = (ep)^{-r/p}.$$

Since $\Gamma(1 + \frac{1}{p}) = \Gamma(\frac{1}{p})/p$ and $\lim_{d \to \infty} \Gamma(2 + \frac{r}{d}) = \Gamma(2) = 1$, the assertion follows from
Propositions 8.3 and 9.3. □

For the $l_2$-norm and $r = 2$, one obtains from the preceding corollary

$$\lim_{d \to \infty} d^{-1} Q_2([0,1]^d) = \frac{1}{2\pi e} = 0.0585\ldots$$

In this case the cube quantizer yields

$$d^{-1} M_2([0,1]^d) = \frac{1}{12} = 0.0833\ldots, \quad d \ge 1,$$

and the lattice quantizer based on the lattice $D_d$ gives

$$\lim_{d \to \infty} d^{-1} M_2(W(0|D_d)) = \frac{1}{12}$$

(cf. Example 8.13). For the $l_1$-norm and $r = 1$, one gets

$$\lim_{d \to \infty} d^{-1} Q_1([0,1]^d) = \frac{1}{2e} = 0.1839\ldots,$$

$$d^{-1} M_1([0,1]^d) = \frac{1}{4}, \quad d \ge 1,$$

$$\lim_{d \to \infty} d^{-1} M_1(W(0|D_d)) = \frac{1}{4}.$$

The $d$-asymptotics for the quantization coefficients of arbitrary product measures
with identical one-dimensional marginals now follows from the preceding corollary
and the well known fact that the Rényi entropy of order $s$ approaches the Shannon
differential entropy as $s \to 1$. For $P = h\lambda^d$, the Rényi entropy of order $s$ is defined
by

$$H_s(P) = \frac{1}{1 - s} \log \int h^s\, d\lambda^d, \quad 0 < s < 1. \tag{9.5}$$

The differential entropy is defined by

$$H(P) = -\int h \log h\, d\lambda^d = -\int \log h\, dP \tag{9.6}$$

provided the integral exists. Note that in (9.5) and (9.6) the entropies are calculated
in nats and not in bits.
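
The convergence of the Rényi entropy to the differential entropy as $s \to 1$ can be observed numerically. The sketch below is illustrative only: it takes the standard normal density as an example, approximates the integral in (9.5) by a midpoint rule on a truncated interval (both choices are ours), and compares the result with the known value $H(N(0,1)) = \frac{1}{2}\log(2\pi) + \frac{1}{2}$.

```python
import math

def normal_density(x: float, sigma: float = 1.0) -> float:
    """Density of N(0, sigma^2)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def renyi_entropy(density, s: float, lo: float, hi: float, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of H_s(P) = log(integral of h^s) / (1 - s), in nats."""
    width = (hi - lo) / steps
    integral = width * sum(density(lo + (i + 0.5) * width) ** s for i in range(steps))
    return math.log(integral) / (1.0 - s)

shannon_normal = 0.5 * math.log(2.0 * math.pi) + 0.5  # differential entropy of N(0, 1)

for s in (0.5, 0.9, 0.99, 0.999):
    print(s, renyi_entropy(normal_density, s, -40.0, 40.0))
print("H(N(0,1)) =", shannon_normal)
```

For the normal distribution the integral is also available in closed form, $H_s(N(0,1)) = \frac{1}{2}\log(2\pi) + \frac{\log s}{2(s-1)}$, which can serve as a check on the numerical values.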
For the $l_2$-norm, the following result is contained in Zador (1963).

9.5 Proposition
Let $X_1, X_2, \ldots$ be i.i.d. real random variables with distribution $P$. Suppose $P_a \neq 0$
and $E|X_1|^{r+\delta} < \infty$ for some $\delta > 0$. Let the underlying norm be the $l_p$-norm, $1 \le p < \infty$.
Then

$$\lim_{d \to \infty} Q_r(X_1, \ldots, X_d) = 0 \quad \text{if } P_a \neq P,$$

$$\lim_{d \to \infty} d^{-r/p}\, Q_r(X_1, \ldots, X_d) = \frac{p^r e^{rH(P)}}{2^r (ep)^{r/p}\, \Gamma(1/p)^r} \quad \text{if } P_a = P.$$

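Proof
It follows from the assumptions that

$$\Big(\bigotimes_1^d P\Big)_a = \bigotimes_1^d P_a \quad \text{and} \quad E\|(X_1, \ldots, X_d)\|^{r+\delta} < \infty.$$

The $r$-th quantization coefficient of $(X_1, \ldots, X_d)$ is given by

$$Q_r(X_1, \ldots, X_d) = Q_r([0,1]^d)\, P_a(\mathbb{R})^d\, \Big\| \bigotimes_1^d h \Big\|_{d/(d+r)},$$

where $h = d\tilde{P}_a/d\lambda$ and $\tilde{P}_a = P_a/P_a(\mathbb{R})$. Since $h \in L^{1/(1+r)}(\lambda)$ by Remark 6.3, the
differential entropy $H(\tilde{P}_a)$ is well defined and $H(\tilde{P}_a) \in [-\infty, \infty)$ (cf. Vajda, 1989, p.
316). We have

$$\log \Big\| \bigotimes_1^d h \Big\|_{d/(d+r)} = d \log \|h\|_{d/(d+r)} = (d + r) \log \int h^{d/(d+r)}\, d\lambda = r H_{d/(d+r)}(\tilde{P}_a).$$

Therefore

$$\lim_{d \to \infty} \log \Big\| \bigotimes_1^d h \Big\|_{d/(d+r)} = r H(\tilde{P}_a).$$

Furthermore

$$\lim_{d \to \infty} d^{r/p}\, P_a(\mathbb{R})^d = 0 \quad \text{if } P_a(\mathbb{R}) < 1.$$

Thus both assertions follow from Corollary 9.4. □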


We see that $d^{r/p}$ is the correct order of convergence of $Q_r(X_1, \ldots, X_d)$ to infinity
(under the $l_p$-norm) provided $P = P_a$ and $H(P) \in \mathbb{R}$.

9.6 Remark
(a) Proposition 9.5 immediately yields the $d$-asymptotics for the vector quantizer
advantage as defined in (7.8) in the i.i.d. case:

$$\lim_{d \to \infty} \frac{d\, Q_r(X_1)}{Q_r(X_1, \ldots, X_d)} = \frac{2^r e\, r\, \Gamma(1/r)^r\, Q_r(X_1)}{r^r e^{rH(X_1)}} \ge 1. \tag{9.7}$$

(Here the underlying norm is the $l_r$-norm, $E|X_1|^{r+\delta} < \infty$ for some $\delta > 0$, and the
distribution of $X_1$ is absolutely continuous with respect to $\lambda$.)
(b) We know from Table 8.2 that $\inf Q_2(P) = 0$ and $\sup Q_2(P) = \infty$, where the
infimum and the supremum are taken over all univariate symmetric (absolutely
continuous) distributions $P$ with variance equal to one. On the other hand, the normal
distribution $N(0,1)$ is the unique maximizer of the differential entropy among all
probabilities $P$ with mean zero, variance equal to one and $\operatorname{supp}(P) = \mathbb{R}$. For such
distributions $P$ it follows from Proposition 9.5 that

$$Q_2\Big(\bigotimes_1^d P\Big) < Q_2\Big(\bigotimes_1^d N(0,1)\Big) \tag{9.8}$$

for sufficiently large $d$. Consider, for instance, the hyper-exponential distribution
$P = HE(a,b)$ with $a^2 = \Gamma(1/b)/\Gamma(3/b)$ and $b = 1/10$. Then $Q_2(P) = 37.0908\ldots$ while
$Q_2(N(0,1)) = 2.7206\ldots$ (cf. Table 8.2). The inequality (9.8) holds for $d \ge 2$ (cf.
Table 9.4).
Analogous statements are valid for distributions which maximize the differential
entropy among other classes of univariate distributions.
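
The numbers quoted in part (b) can be reproduced with a short computation. This is a sketch under stated assumptions, not the authors' procedure: we assume that $HE(a,b)$ has the density $\frac{b}{2a\Gamma(1/b)}\exp(-|x/a|^b)$, for which a direct integration gives $\exp(rH_{d/(d+r)}(P)) = (2a\Gamma(1/b)/b)^r\,((d+r)/d)^{(d+r)/b}$, and that $N(0,1)$ is the case $b = 2$, $a = \sqrt{2}$. The function names are ours, and $Q_2([0,1]) = 1/12$ is used for $d = 1$.

```python
import math

def he_factor(a: float, b: float, d: int, r: float) -> float:
    """exp(r * H_{d/(d+r)}(P)) for the assumed density b/(2a Gamma(1/b)) * exp(-|x/a|^b)."""
    log_value = (r * math.log(2.0 * a * math.gamma(1.0 / b) / b)
                 + ((d + r) / b) * math.log((d + r) / d))
    return math.exp(log_value)

def normal_factor(d: int, r: float) -> float:
    """Same factor for N(0, 1), viewed as the case b = 2, a = sqrt(2)."""
    return he_factor(math.sqrt(2.0), 2.0, d, r)

b = 0.1
a = math.sqrt(math.gamma(1.0 / b) / math.gamma(3.0 / b))  # scale giving variance one

# d = 1: multiplying by Q_2([0,1]) = 1/12 gives the one-dimensional coefficients.
print(he_factor(a, b, 1, 2) / 12.0)
print(normal_factor(1, 2) / 12.0)

# (9.8) compares the two factors only, because Q_2([0,1]^d) cancels on both sides.
print([d for d in range(1, 11) if he_factor(a, b, d, 2) < normal_factor(d, 2)])
```

Under these assumptions the first two printed values agree with $Q_2(P)$ and $Q_2(N(0,1))$ as quoted above, and the list shows that the comparison fails for $d = 1$ and holds from $d = 2$ on.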

Notes

A discussion of the uniform integrability condition in Theorem 9.1 can be found in
Zador (1963) and Stadje (1995). Gersho (1979) contains an extension of Proposition
9.5 for the $l_2$-norm to stationary ergodic sequences.

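$P$                $H(P)$
$N(0,\sigma^2)$    $\frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2}$
$L(a)$             $\log a + 2$
$DE(a)$            $\log(2a) + 1$
$D\Gamma(a,b)$     $\log(2a\Gamma(b)) + (1 - b)\psi(b) + b$
$HE(a,b)$          $\log(2a\Gamma(1/b)/b) + 1/b$
$U([a,b])$         $\log(b - a)$
$E(a)$             $\log a + 1$
$\Gamma(a,b)$      $\log(a\Gamma(b)) + (1 - b)\psi(b) + b$
$W(a,b)$           $\log(a/b) + \frac{b-1}{b}\gamma + 1$
$P(a,b)$           $\log(a/b) + \frac{b+1}{b}$

Table 9.3: Differential entropies. $\psi = \Gamma'/\Gamma$, $\gamma$ = Euler's constant = 0.5772...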

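$P$                $Q_r(\bigotimes_1^d P)/Q_r([0,1]^d) = \exp(rH_{d/(d+r)}(P))$
$N(0,\sigma^2)$    $(2\pi\sigma^2)^{r/2} \left(\frac{d+r}{d}\right)^{(d+r)/2}$
$L(a)$             $a^r \left(\Gamma\!\left(\frac{d}{d+r}\right)^2 \Big/ \Gamma\!\left(\frac{2d}{d+r}\right)\right)^{d+r}$
$DE(a)$            $(2a)^r \left(\frac{d+r}{d}\right)^{d+r}$
$D\Gamma(a,b)$     $\frac{(2a)^r}{\Gamma(b)^d} \left(\frac{d+r}{d}\right)^{bd+r} \Gamma\!\left(\frac{bd+r}{d+r}\right)^{d+r}$
$HE(a,b)$          $\left(\frac{2a\Gamma(1/b)}{b}\right)^r \left(\frac{d+r}{d}\right)^{(d+r)/b}$

Table 9.4: Quantization coefficients for product probability measures up to $Q_r([0,1]^d)$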


10 Asymptotics for the covering radius

Let $P$ be a Borel probability measure on $\mathbb{R}^d$ and let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$. For
$1 \le r \le \infty$ and $g: \mathbb{R}^d \to \mathbb{R}$ Borel measurable, define

$$\|g\|_{P,r} = \|g\|_r = \begin{cases} \left(\int |g|^r\, dP\right)^{1/r}, & 1 \le r < \infty, \\ \inf\{c \ge 0 : |g| \le c \ \ P\text{-a.s.}\}, & r = \infty, \end{cases}$$

and

$$e_{n,r}(P) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le n} \|d_\alpha\|_r, \tag{10.1}$$

where $d_\alpha(x) = d(x, \alpha) = \inf_{a \in \alpha} \|x - a\|$. It follows from Lemma 3.1 that, for $1 \le r < \infty$,

$$V_{n,r}(P)^{1/r} = e_{n,r}(P). \tag{10.2}$$

In this section we consider the case $r = \infty$ and discuss its relation to the quantization
problem ($r < \infty$).

10.1 Basic properties

Note that $e_{n,\infty}(P) < \infty$ if $\operatorname{supp}(P)$ is compact. It follows from the continuity of $d_\alpha$
that $\|d_\alpha\|_\infty = \sup_{x \in \operatorname{supp}(P)} d_\alpha(x)$ and hence

$$e_{n,\infty}(P) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le n}\ \sup_{x \in \operatorname{supp}(P)}\ \min_{a \in \alpha} \|x - a\|.$$

So, if we define for a nonempty compact set $A \subset \mathbb{R}^d$,

$$e_{n,\infty}(A) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le n}\ \max_{x \in A}\ \min_{a \in \alpha} \|x - a\|, \tag{10.3}$$

then $e_{n,\infty}(P) = e_{n,\infty}(A)$ for every probability $P$ with $\operatorname{supp}(P) = A$. A set $\alpha \subset \mathbb{R}^d$
with $|\alpha| \le n$ for which the above infimum is attained is called an n-optimal set
of centers for $A$ of order $\infty$. Let $C_{n,\infty}(A)$ denote the set of all n-optimal sets of
centers for $A$ of order $\infty$. Since

$$\max_{x \in A} \min_{a \in \alpha} \|x - a\| = \min\Big\{s \ge 0 : \bigcup_{a \in \alpha} B(a, s) \supset A\Big\},$$

searching for $\alpha \in C_{n,\infty}(A)$ is equivalent to the geometric problem of finding the most
economical covering of $A$ by at most $n$ balls of equal radius. The number $e_{n,\infty}(A)$ is
called the n-th covering radius for $A$. Recall that the Hausdorff metric is given by

$$d_H(A, B) = \max\Big\{\max_{a \in A} \min_{b \in B} \|a - b\|,\ \max_{b \in B} \min_{a \in A} \|a - b\|\Big\}$$

for nonempty compact sets $A, B \subset \mathbb{R}^d$.
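
For finite sets both the max-min expression in (10.3) (for a fixed set of centers) and the Hausdorff metric can be evaluated directly. The following sketch is purely illustrative: it fixes the Euclidean norm, and the function names and the example sets are our own choices.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]

def covering_error(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """max over x in points of min over a in centers of ||x - a|| (Euclidean norm)."""
    return max(min(math.dist(x, a) for a in centers) for x in points)

def hausdorff_distance(a_set: Sequence[Point], b_set: Sequence[Point]) -> float:
    """d_H(A, B) for finite nonempty sets A and B."""
    return max(covering_error(a_set, b_set), covering_error(b_set, a_set))

# Corners of the unit square, covered by one ball around the midpoint.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center = [(0.5, 0.5)]
print(covering_error(square, center))      # smallest radius s with B(center, s) covering the corners
print(hausdorff_distance(square, center))
print(hausdorff_distance(square, [(0.0, 0.0), (3.0, 0.0)]))
```

Minimizing `covering_error` over all sets of at most $n$ centers would give $e_{n,\infty}$ of the finite set; the sketch only evaluates the objective for given centers.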



10.1 Lemma
Let $n \in \mathbb{N}$.

(a) If $1 \le r \le s \le \infty$, then $e_{n,r}(P) \le e_{n,s}(P)$.

(b) If $\operatorname{supp}(P)$ is compact, then $\lim_{r \to \infty} e_{n,r}(P) = e_{n,\infty}(P)$.

(c) Let $A$ denote the support of $P$ and suppose $A$ is compact and $|A| \ge n$. Let
$\alpha_r \in C_{n,r}(P)$, $1 \le r < \infty$, and let $(r_k)_{k \ge 1}$ be a sequence in $[1, \infty)$ converging to
infinity. Then the set of $d_H$-cluster points of the sequence $(\alpha_{r_k})_{k \ge 1}$ is a nonempty
subset of $C_{n,\infty}(A)$ and

$$\lim_{r \to \infty} d_H(\alpha_r, C_{n,\infty}(A)) = 0.$$

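Proof
(a) follows from the fact that $\|\cdot\|_r \le \|\cdot\|_s$ for $r \le s$.
(b) and (c). Let $u = \operatorname{diam}(A)$ and choose $s > 0$ such that $A \subset B(0, s)$. It follows
from Theorem 4.1 and Lemma 2.6 that

$$C_{n,r}(P) \subset \{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n,\ \alpha \subset B(0, s + u)\}$$

for every $r \in [1, \infty)$ provided $|A| \ge n$. Using Lemma 4.23 we deduce the existence
of a $d_H$-cluster point for $(\alpha_{r_k})_{k \ge 1}$. Now let $\alpha$ be the $d_H$-limit of a subsequence of
$(\alpha_{r_k})_{k \ge 1}$ which is again denoted by $(\alpha_{r_k})_{k \ge 1}$. We have

$$\|d_{\alpha_{r_k}} - d_\alpha\|_{r_k} \le \|d_{\alpha_{r_k}} - d_\alpha\|_\infty \le \sup_{x \in \mathbb{R}^d} |d(x, \alpha_{r_k}) - d(x, \alpha)| = d_H(\alpha_{r_k}, \alpha)$$

and hence

$$\|d_{\alpha_{r_k}}\|_{r_k} \ge \|d_\alpha\|_{r_k} - \|d_{\alpha_{r_k}} - d_\alpha\|_{r_k} \ge \|d_\alpha\|_{r_k} - d_H(\alpha_{r_k}, \alpha).$$

By (a), $\lim_{r \to \infty} e_{n,r}(P)$ exists and is less than or equal to $e_{n,\infty}(P)$. Therefore

$$e_{n,\infty}(P) \ge \lim_{k \to \infty} e_{n,r_k}(P) = \lim_{k \to \infty} \|d_{\alpha_{r_k}}\|_{r_k} \ge \lim_{k \to \infty} \|d_\alpha\|_{r_k} = \|d_\alpha\|_\infty \ge e_{n,\infty}(P).$$

This gives (b) and the first assertion of (c). The second assertion of (c) follows from
the first one. □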
For $A \subset \mathbb{R}^d$ compact with $\lambda^d(A) > 0$, set

$$M_{n,\infty}(A) = \frac{e_{n,\infty}(A)}{\lambda^d(A)^{1/d}}. \tag{10.4}$$

In spite of the slight inconsistency in notation we continue to write $M_\infty(A)$, $M_{n,\infty}(A)$,
$Q_\infty(P)$ etc. for the corresponding notions in case $r = \infty$.

10.2 Lemma
Let $A \subset \mathbb{R}^d$ be a nonempty compact set and let $T: \mathbb{R}^d \to \mathbb{R}^d$ be a similarity
transformation with scaling number $c > 0$.

(a) $C_{n,\infty}(T(A)) = T C_{n,\infty}(A)$.

(b) $e_{n,\infty}(T(A)) = c\, e_{n,\infty}(A)$.

(c) $M_{n,\infty}(T(A)) = M_{n,\infty}(A)$ if $\lambda^d(A) > 0$.

Proof
Obvious. □
The existence of n-optimal sets of centers of order $\infty$ can be derived from the existence
of n-optimal sets of centers of order $r < \infty$ (cf. Theorem 4.12) and Lemma 10.1(c).

10.3 Lemma (Existence)
If $A \subset \mathbb{R}^d$ is a nonempty compact set, then

$$C_{n,\infty}(A) \neq \emptyset.$$

Proof
We assume without loss of generality that $|A| \ge n$. To show that the assertion follows
from Lemma 10.1(c) it suffices to note that $A = \operatorname{supp}(P)$ for some Borel probability
measure $P$ on $\mathbb{R}^d$. If $A$ is finite, set $P = \sum_{a \in A} \delta_a / |A|$. Otherwise let $B = \{b_1, b_2, \ldots\}$
be a countable dense subset of $A$ and set $P = \sum_{n=1}^{\infty} 2^{-n} \delta_{b_n}$. Then $A = \operatorname{supp}(P)$. □

The covering problem can be formulated in terms of the Hausdorff metric and the
$L_\infty$-minimal metric.
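10.4 Lemma
Let $A \subset \mathbb{R}^d$ be a nonempty compact set. Then

$$e_{n,\infty}(A) = \inf_{|\alpha| \le n} d_H(\alpha, A).$$

If $e_{n,\infty}(A) < e_{n-1,\infty}(A)$ ($e_{0,\infty}(A) := \infty$), then

$$C_{n,\infty}(A) = \{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n,\ d_H(\alpha, A) = e_{n,\infty}(A)\}.$$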
Proof
Let $\alpha \in C_{n,\infty}(A)$ and set

$$\beta = \{a \in \alpha : d(a, A) \le e_{n,\infty}(A)\}.$$

Then $\beta \neq \emptyset$ and

$$e_{n,\infty}(A) = \max_{x \in A} d(x, \alpha) = \max_{x \in A} d(x, \beta) = d_H(\beta, A).$$

This yields

$$e_{n,\infty}(A) \ge \inf_{|\alpha| \le n} d_H(\alpha, A).$$

The converse inequality is obvious. Furthermore, the inclusion

$$C_{n,\infty}(A) \supset C := \{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n,\ d_H(\alpha, A) = e_{n,\infty}(A)\}$$

is also obvious and $\beta \in C$. Now assume $e_{n,\infty}(A) < e_{n-1,\infty}(A)$. Since $\beta \in C_{n,\infty}(A)$,
one gets $|\beta| = n$. This gives $\alpha = \beta$ and thus $\alpha \in C$. □
The $L_\infty$-minimal metric $\rho_\infty$ is given by

$$\rho_\infty(P_1, P_2) = \inf\{\varepsilon > 0 : P_1(B) \le P_2(d_B \le \varepsilon) \text{ for all } B \in \mathcal{B}(\mathbb{R}^d)\}$$

for Borel probability measures $P_1, P_2$ on $\mathbb{R}^d$ with compact support.

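10.5 Lemma
Suppose $\operatorname{supp}(P)$ is compact and let $X$ be an $\mathbb{R}^d$-valued random variable with
distribution $P$. Then

$$e_{n,\infty}(P) = \inf_{Q \in \mathcal{P}_n} \rho_\infty(P, Q) = \inf_{f \in \mathcal{F}_n} \rho_\infty(P, P^f) = \inf_{f \in \mathcal{F}_n} \operatorname{ess\,sup} \|X - f(X)\|.$$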

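Proof
Let $A$ denote the support of $P$. If $Q \in \mathcal{P}_n$ with $Q(\alpha) = 1$, $|\alpha| \le n$, let $\varepsilon > 0$ such
that $Q(B) \le P(d_B \le \varepsilon)$ for all $B \in \mathcal{B}(\mathbb{R}^d)$. Then

$$1 = Q(\alpha) = P(d_\alpha \le \varepsilon)$$

which gives $A \subset \{d_\alpha \le \varepsilon\}$. Therefore

$$\|d_\alpha\|_\infty = \max_{x \in A} d(x, \alpha) \le \rho_\infty(P, Q).$$

This implies

$$e_{n,\infty}(P) \le \inf_{Q \in \mathcal{P}_n} \rho_\infty(P, Q) \le \inf_{f \in \mathcal{F}_n} \rho_\infty(P, P^f).$$

If $\alpha \subset \mathbb{R}^d$ with $1 \le |\alpha| \le n$, let $\varepsilon = \max_{x \in A} d(x, \alpha)$. Choose a Voronoi partition
$\{A_a : a \in \alpha\}$ of $\mathbb{R}^d$ with respect to $\alpha$ and let $f = \sum_{a \in \alpha} a 1_{A_a}$. Since $A_a \cap A \subset B(a, \varepsilon)$ for
every $a \in \alpha$, one obtains for $\beta \subset \alpha$

$$P^f(\beta) = P\Big(\bigcup_{a \in \beta} A_a \cap A\Big) \le P\Big(\bigcup_{a \in \beta} B(a, \varepsilon)\Big) = P(d_\beta \le \varepsilon).$$

Therefore

$$\rho_\infty(P, P^f) \le \varepsilon = \max_{x \in A} d(x, \alpha).$$

This implies

$$\inf_{f \in \mathcal{F}_n} \rho_\infty(P, P^f) \le e_{n,\infty}(P).$$

For $f \in \mathcal{F}_n$, let $\alpha = f(\mathbb{R}^d)$ and $A_a = \{f = a\}$, $a \in \alpha$. Then

$$\operatorname{ess\,sup} \|X - f(X)\| = \inf\Big\{c \ge 0 : \sum_{a \in \alpha} P(A_a \cap \{x \in \mathbb{R}^d : \|x - a\| > c\}) = 0\Big\}$$

$$\ge \inf\Big\{c \ge 0 : \sum_{a \in \alpha} P(A_a \cap \{d_\alpha > c\}) = 0\Big\} = \|d_\alpha\|_\infty,$$

and if $\{A_a : a \in \alpha\}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$, then

$$\operatorname{ess\,sup} \|X - f(X)\| = \|d_\alpha\|_\infty.$$

This implies

$$e_{n,\infty}(P) = \inf_{f \in \mathcal{F}_n} \operatorname{ess\,sup} \|X - f(X)\|.$$

□
Since $\rho_\infty(P_1, P_2) \ge d_H(\operatorname{supp}(P_1), \operatorname{supp}(P_2))$, $\rho_\infty$-convergence implies weak
convergence and $d_H$-convergence of the supports.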

10.2 Asymptotic covering radius

Clearly, if $A \subset \mathbb{R}^d$ is nonempty compact then $e_{n,\infty}(A)$ decreases to zero as $n \to \infty$.
We need the following simple lemma.

10.6 Lemma
(a) If $A, B \subset \mathbb{R}^d$ are nonempty compact sets with $A \subset B$, then $e_{n,\infty}(A) \le e_{n,\infty}(B)$.

(b) If $A_i \subset \mathbb{R}^d$ are nonempty compact sets and $n_i \in \mathbb{N}$ with $\sum_{i=1}^m n_i \le n$, then

$$e_{n,\infty}\Big(\bigcup_{i=1}^m A_i\Big) \le \max_{1 \le i \le m} e_{n_i,\infty}(A_i).$$
Proof
(a) Let $\beta \in C_{n,\infty}(B)$. Then

$$e_{n,\infty}(A) \le \max_{x \in A} \min_{b \in \beta} \|x - b\| \le e_{n,\infty}(B).$$

(b) Let $\alpha_i \in C_{n_i,\infty}(A_i)$ and let $\alpha = \bigcup_{i=1}^m \alpha_i$. Then $|\alpha| \le n$ and therefore,

$$e_{n,\infty}\Big(\bigcup_{i=1}^m A_i\Big) \le \max_{x \in \bigcup_{i=1}^m A_i} \min_{a \in \alpha} \|x - a\| = \max_{1 \le i \le m} \max_{x \in A_i} \min_{a \in \alpha} \|x - a\|$$

$$\le \max_{1 \le i \le m} \max_{x \in A_i} \min_{a \in \alpha_i} \|x - a\| = \max_{1 \le i \le m} e_{n_i,\infty}(A_i).$$

□
Now we can derive the exact asymptotic first order behaviour of the covering radius
$e_{n,\infty}(A)$ for compact Jordan measurable sets $A$ with $\lambda^d(A) > 0$.

10.7 Theorem (Asymptotic covering radius)
Let $A \subset \mathbb{R}^d$ be a nonempty compact set with $\lambda^d(\partial A) = 0$. Let

$$Q_\infty([0,1]^d) = \inf_{n \ge 1} n^{1/d} e_{n,\infty}([0,1]^d).$$

Then $Q_\infty([0,1]^d) > 0$ and

$$\lim_{n \to \infty} n^{1/d} e_{n,\infty}(A) = Q_\infty([0,1]^d)\, \lambda^d(A)^{1/d}.$$

Proof
The proof is given in three steps.
Step 1. Let $A = [0,1]^d$. Let $m, n \in \mathbb{N}$, $m < n$ and let $k = k(n,m) = [(n/m)^{1/d}]$.
Choose a tesselation of the unit cube $[0,1]^d$ consisting of $k^d$ translates $C_1, \ldots, C_{k^d}$ of
the cube $[0, \frac{1}{k}]^d$. Then by Lemmas 10.2 and 10.6,

$$e_{n,\infty}([0,1]^d) \le \max_{1 \le i \le k^d} e_{m,\infty}(C_i) = \max_{1 \le i \le k^d} M_{m,\infty}(C_i)\, k^{-1} = k^{-1} M_{m,\infty}([0,1]^d) = k^{-1} e_{m,\infty}([0,1]^d)$$

and hence

$$n^{1/d} e_{n,\infty}([0,1]^d) \le \frac{n^{1/d}}{k\, m^{1/d}}\, m^{1/d} e_{m,\infty}([0,1]^d).$$

This implies

$$\limsup_{n \to \infty} n^{1/d} e_{n,\infty}([0,1]^d) \le m^{1/d} e_{m,\infty}([0,1]^d)$$

for every $m \in \mathbb{N}$. Therefore, $\lim_{n \to \infty} n^{1/d} e_{n,\infty}([0,1]^d)$ exists in $[0, \infty)$ and

$$\lim_{n \to \infty} n^{1/d} e_{n,\infty}([0,1]^d) = Q_\infty([0,1]^d). \tag{10.5}$$

From the subsequent Proposition 10.10 (a) it follows that $Q_\infty([0,1]^d) > 0$.

Step 2. Let $A = \bigcup_{i=1}^m C_i$, where $\{C_1, \ldots, C_m\}$ is a packing in $\mathbb{R}^d$ consisting of closed
cubes whose edges are parallel to the coordinate axes and with common length of the
edges $l(C_i) = l > 0$. Let $n_1 = n_1(n) = [\frac{n}{m}]$. Then by Lemmas 10.2 and 10.6

$$e_{n,\infty}(A) \le \max_{1 \le i \le m} e_{n_1,\infty}(C_i) = \max_{1 \le i \le m} M_{n_1,\infty}(C_i)\, l = e_{n_1,\infty}([0,1]^d)\, l, \quad n \ge m.$$

From Step 1 it follows that

$$n^{1/d} e_{n_1,\infty}([0,1]^d) = \Big(\frac{n}{n_1}\Big)^{1/d} n_1^{1/d} e_{n_1,\infty}([0,1]^d) \to m^{1/d} Q_\infty([0,1]^d) \quad \text{as } n \to \infty.$$

This implies

$$\limsup_{n \to \infty} n^{1/d} e_{n,\infty}(A) \le Q_\infty([0,1]^d)\, m^{1/d} l = Q_\infty([0,1]^d)\, \lambda^d(A)^{1/d}. \tag{10.6}$$

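To prove that $Q_\infty([0,1]^d)\lambda^d(A)^{1/d}$ is a lower bound for $L = \liminf_{n \to \infty} n^{1/d} e_{n,\infty}(A)$,
let $\beta = \beta(n) \in C_{n,\infty}(A)$ for $n \in \mathbb{N}$, $\beta_i = \beta_i(n) = \beta \cap \operatorname{int} C_i$ and $n_i = n_i(n) = |\beta_i|$,
$1 \le i \le m$. For $0 < \varepsilon < l/2$, let $C_{i,\varepsilon} \subset C_i$ be a parallel closed cube with the same
midpoint as $C_i$ and edge-length $l(C_{i,\varepsilon}) = l - 2\varepsilon$. Choose a finite set $\gamma_i = \gamma_i(\varepsilon) \subset C_{i,\varepsilon}$,
$|\gamma_i| = k = k(\varepsilon)$ say, such that

$$\min_{a \in \gamma_i} \|x - a\| \le \inf_{y \in C_i^c} \|x - y\| \quad \text{for every } x \in C_{i,\varepsilon},\ 1 \le i \le m.$$

Then

$$e_{n,\infty}(A) = \max_{x \in A} \min_{b \in \beta} \|x - b\| \ge \max_{1 \le i \le m} \max_{x \in C_i} \min_{b \in \beta \cup \gamma_i} \|x - b\| \ge \max_{1 \le i \le m} \max_{x \in C_{i,\varepsilon}} \min_{b \in \beta \cup \gamma_i} \|x - b\|$$

$$= \max_{1 \le i \le m} \max_{x \in C_{i,\varepsilon}} \min_{b \in \beta_i \cup \gamma_i} \|x - b\| \ge \max_{1 \le i \le m} e_{n_i + k,\infty}(C_{i,\varepsilon}) = \max_{1 \le i \le m} e_{n_i + k,\infty}([0,1]^d)\,(l - 2\varepsilon).$$

Choose a subsequence (also denoted by $(n)$) such that

$$\frac{n_i}{n} \to v_i \in [0,1],\ 1 \le i \le m, \quad \text{and} \quad n^{1/d} e_{n,\infty}(A) \to L \quad \text{as } n \to \infty.$$

Since $\sum_{i=1}^m n_i \le n$, we have $\sum_{i=1}^m v_i \le 1$. Furthermore, $v_i > 0$ for every $i$. Otherwise, Step
1 yields $L = \infty$, which contradicts (10.6). By taking a further subsequence we can
assume without loss of generality that

$$\lim_{n \to \infty} (n_i + k)^{1/d} e_{n_i + k,\infty}([0,1]^d) = Q_\infty([0,1]^d).$$

This implies

$$L \ge \max_{1 \le i \le m} Q_\infty([0,1]^d)\, v_i^{-1/d} (l - 2\varepsilon).$$

Since $0 < \varepsilon < l/2$ is arbitrary and $\max_{1 \le i \le m} v_i^{-1/d} \ge m^{1/d}$, one obtains

$$L \ge \max_{1 \le i \le m} Q_\infty([0,1]^d)\, v_i^{-1/d}\, l \ge Q_\infty([0,1]^d)\, m^{1/d} l = Q_\infty([0,1]^d)\, \lambda^d(A)^{1/d}. \tag{10.7}$$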
Step 3. Let $A$ be an arbitrary compact subset of $\mathbb{R}^d$. Let $A \subset C$ for some closed
cube $C$ whose edges are parallel to the coordinate axes with edge-length $l(C) = l$. For
$k \in \mathbb{N}$ consider a tesselation of $C$ consisting of closed cubes $C_1, \ldots, C_{k^d}$ of common
edge-length $l(C_i) = l/k$. Set

$$A_k = \bigcup \{C_i : A \cap C_i \neq \emptyset,\ i \le k^d\}.$$

Since $A \subset A_k$, it follows from Step 2 and Lemma 10.6 (a) that

$$\limsup_{n \to \infty} n^{1/d} e_{n,\infty}(A) \le \lim_{n \to \infty} n^{1/d} e_{n,\infty}(A_k) = Q_\infty([0,1]^d)\, \lambda^d(A_k)^{1/d}, \quad k \in \mathbb{N}.$$

If $U \subset \mathbb{R}^d$ is an open set with $A \subset U$, then $A_k \subset U$ for sufficiently large $k$. Therefore

$$\lambda^d(A) \le \inf_{k \ge 1} \lambda^d(A_k) \le \inf\{\lambda^d(U) : U \subset \mathbb{R}^d \text{ open},\ A \subset U\} = \lambda^d(A)$$

which yields

$$\limsup_{n \to \infty} n^{1/d} e_{n,\infty}(A) \le Q_\infty([0,1]^d)\, \lambda^d(A)^{1/d}. \tag{10.8}$$

Now suppose $\lambda^d(\partial A) = 0$. To prove that $Q_\infty([0,1]^d)\lambda^d(A)^{1/d}$ is a lower bound for
$\liminf_{n \to \infty} n^{1/d} e_{n,\infty}(A)$ we may assume that $\lambda^d(A) > 0$. Set

$$B_k = \bigcup \{C_i : C_i \subset A,\ i \le k^d\}.$$

Since $B_k \subset A$, it follows from Step 2 and Lemma 10.6 (a) that

$$\liminf_{n \to \infty} n^{1/d} e_{n,\infty}(A) \ge \lim_{n \to \infty} n^{1/d} e_{n,\infty}(B_k) = Q_\infty([0,1]^d)\, \lambda^d(B_k)^{1/d}, \quad k \in \mathbb{N} \text{ with } B_k \neq \emptyset.$$

Since $\lambda^d(\partial A) = 0$, $A$ is Jordan measurable and hence $\sup_{k \ge 1} \lambda^d(B_k) = \lambda^d(A)$. Thus
we obtain

$$\liminf_{n \to \infty} n^{1/d} e_{n,\infty}(A) \ge Q_\infty([0,1]^d)\, \lambda^d(A)^{1/d}. \tag{10.9}$$

Combining (10.8) and (10.9) the theorem is proved. □


For compact sets $A$ with $\lambda^d(A) = 0$ the preceding theorem only yields $e_{n,\infty}(A) =
o(n^{-1/d})$. An investigation of the exact order of $e_{n,\infty}(A)$ for several classes of compact
sets $A$ with $\lambda^d(A) = 0$ is contained in Chapter III.
For $A \subset \mathbb{R}^d$ compact with $\lambda^d(A) > 0$ and $\lambda^d(\partial A) = 0$, define the covering
coefficient (with respect to coverings of $A$ by balls of equal radius) by

$$Q_\infty(A) = Q_\infty([0,1]^d)\, \lambda^d(A)^{1/d}. \tag{10.10}$$

$Q_\infty(A)$ is sometimes called quantization coefficient of order $\infty$. It follows from
Lemma 10.1(a), Theorem 6.2 and Theorem 10.5 that $Q_r(P)^{1/r}$ is increasing in $r$ and

$$\lim_{r \to \infty} Q_r(P)^{1/r} = \lim_{r \to \infty} Q_r(A)^{1/r} \le Q_\infty(A) \tag{10.11}$$

provided $\operatorname{supp}(P) = A$ and $P$ is absolutely continuous with respect to $\lambda^d$.


10.8 Remark
(a) We conjecture that equality holds in (10.11) (cf. the special cases given in (10.17),
(10.19) and (10.20)). This would show in a precise manner that the covering problem
is a limiting case of the quantization problem.
(b) Possibly the condition $\lambda^d(\partial A) = 0$ can be dropped in Theorem 10.7. This is true
if the conjecture in (a) can be resolved for the unit cube. In fact, we have for an
arbitrary nonempty and compact set $A \subset \mathbb{R}^d$ with $\lambda^d(A) > 0$

$$Q_r([0,1]^d)^{1/r} \lambda^d(A)^{1/d} = Q_r(A)^{1/r} = \lim_{n \to \infty} n^{1/d} e_{n,r}(U(A)) \le \liminf_{n \to \infty} n^{1/d} e_{n,\infty}(A)$$

$$\le \limsup_{n \to \infty} n^{1/d} e_{n,\infty}(A) \le Q_\infty([0,1]^d)\, \lambda^d(A)^{1/d}, \quad 1 \le r < \infty.$$

Here the first inequality follows from the Lemmas 10.1 and 10.6 (a) while the last
inequality follows from (10.8). Since by assumption $\lim_{r \to \infty} Q_r([0,1]^d)^{1/r} = Q_\infty([0,1]^d)$,
this implies

$$\lim_{n \to \infty} n^{1/d} e_{n,\infty}(A) = Q_\infty([0,1]^d)\, \lambda^d(A)^{1/d}.$$

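10.9 Remark
Let $A \subset \mathbb{R}^d$ be an infinite compact set with $\lambda^d(\partial A) = 0$. For $\varepsilon > 0$, let $N(\varepsilon, A)$ be
the minimal number of balls of radius $\varepsilon > 0$ which are necessary to cover $A$, i.e.,

$$N(\varepsilon) = N(\varepsilon, A) = \min\{n \ge 1 : e_{n,\infty}(A) \le \varepsilon\} = \min\Big\{n \ge 1 : \exists \alpha \subset \mathbb{R}^d,\ |\alpha| \le n,\ A \subset \bigcup_{a \in \alpha} B(a, \varepsilon)\Big\}.$$

Then

$$\varepsilon \ge e_{N(\varepsilon),\infty}(A),$$

hence by Theorem 10.7

$$\liminf_{\varepsilon \to 0} N(\varepsilon)\varepsilon^d \ge \lim_{\varepsilon \to 0} N(\varepsilon)\, e_{N(\varepsilon),\infty}(A)^d = Q_\infty([0,1]^d)^d\, \lambda^d(A).$$

The definition of $N(\varepsilon)$ implies

$$\varepsilon < e_{N(\varepsilon)-1,\infty}(A),$$

hence again by Theorem 10.7

$$\limsup_{\varepsilon \to 0} N(\varepsilon)\varepsilon^d \le \lim_{\varepsilon \to 0} N(\varepsilon)\, e_{N(\varepsilon)-1,\infty}(A)^d = Q_\infty([0,1]^d)^d\, \lambda^d(A).$$

We obtain

$$\lim_{\varepsilon \to 0} N(\varepsilon, A)\varepsilon^d = Q_\infty([0,1]^d)^d\, \lambda^d(A). \tag{10.12}$$

(Actually, this limit result is equivalent to the assertion of Theorem 10.7.) From
(10.12) we deduce that $\lambda^d(B(0, Q_\infty([0,1]^d)))$ coincides with the density of the thinnest
covering of the whole space by translates of $B(0,1)$ (cf. Gruber and Lekkerkerker,
1987, p. 237, Definition 6). The existence of the limit $\lim_{\varepsilon \to 0} N(\varepsilon, A)\varepsilon^d$ appears in Gruber
and Lekkerkerker (1987, p. 237, Theorem 7) for convex compact bodies $A$.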
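
In dimension one the quantities in this remark are explicit, which makes the limit (10.12) easy to check. The sketch below is an illustration under elementary assumptions that are ours: for an interval $A = [0, L]$, $n$ balls (intervals) of radius $\varepsilon$ cover $A$ exactly when $2n\varepsilon \ge L$, so that $e_{n,\infty}(A) = L/(2n)$, $N(\varepsilon) = \lceil L/(2\varepsilon) \rceil$ and $Q_\infty([0,1]) = 1/2$. Exact rational arithmetic avoids rounding issues in the ceiling.

```python
from fractions import Fraction
import math

def covering_radius_interval(n: int, length: Fraction) -> Fraction:
    """e_{n,infinity}([0, length]) = length / (2n) in dimension one."""
    return length / (2 * n)

def covering_number_interval(eps: Fraction, length: Fraction) -> int:
    """N(eps) = min{n >= 1 : e_{n,infinity}([0, length]) <= eps}."""
    return max(1, math.ceil(length / (2 * eps)))

length = Fraction(3)
for k in (1, 2, 4, 8):
    # A radius for which length / (2 * eps) is not an integer.
    eps = Fraction(1, 10 ** k) + Fraction(1, 7 * 10 ** k)
    n_eps = covering_number_interval(eps, length)
    # The last column approaches Q_infinity([0,1]) * length = length / 2.
    print(float(eps), n_eps, float(n_eps * eps))
```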

10.3 Covering radius of lattices and bounds

As for the quantization coefficients, the covering coefficient $Q_\infty([0,1]^d)$ is only known
for $d = 1$, $d = 2$ ($l_1$-norm, $l_2$-norm), and in "trivial" cases for $d \ge 3$. Lower and
upper bounds for $Q_\infty([0,1]^d)$ which correspond to those given in Proposition 8.3 and
Theorem 8.9 can easily be derived.
For $A \subset \mathbb{R}^d$ compact with $\lambda^d(A) > 0$, set

$$M_\infty(A) = M_{1,\infty}(A).$$

Note that if $\Lambda \subset \mathbb{R}^d$ is an admissible lattice, then

$$M_\infty(W(0|\Lambda)) = \frac{\max\{\|x\| : x \in W(0|\Lambda)\}}{\det(\Lambda)^{1/d}} = \frac{\sup_{x \in \mathbb{R}^d} \min_{a \in \Lambda} \|x - a\|}{\det(\Lambda)^{1/d}}, \tag{10.13}$$

that is, $M_\infty(W(0|\Lambda))$ is the normalized covering radius of $\Lambda$ (with respect to the
whole space).

10.10 Proposition

(a) $\lim_{r \to \infty} Q_r([0,1]^d)^{1/r} \ge M_\infty(B(0,1)) = \dfrac{1}{\lambda^d(B(0,1))^{1/d}}$.

(b) If $\Lambda \subset \mathbb{R}^d$ is an admissible lattice then

$$Q_\infty([0,1]^d) \le M_\infty(W(0|\Lambda)).$$

Proof
(a) By Proposition 8.3

$$\lim_{r \to \infty} Q_r([0,1]^d)^{1/r} \ge \lim_{r \to \infty} M_r(B(0,1))^{1/r} = M_\infty(B(0,1)).$$

(b) For $n \in \mathbb{N}$, let

$$c = c(n) = (n \det(\Lambda))^{-1/d},$$
$$\beta_{n,\Lambda} = \{ca : a \in \Lambda,\ cW(a|\Lambda) \cap [0,1]^d \neq \emptyset\},$$
$$k = k(n) = |\beta_{n,\Lambda}|$$

and

$$A_n = \bigcup_{b \in \beta_{n,\Lambda}} W(b|c\Lambda).$$

Then $1 \le \lambda^d(A_n) = k/n$. If $U \subset \mathbb{R}^d$ is an open subset with $[0,1]^d \subset U$, then $A_n \subset U$
for sufficiently large $n$. Therefore

$$1 \le \inf_{n \ge 1} \frac{k(n)}{n} = \inf_{n \ge 1} \lambda^d(A_n) \le \inf\{\lambda^d(U) : U \subset \mathbb{R}^d \text{ open},\ [0,1]^d \subset U\} = \lambda^d([0,1]^d) = 1.$$

Furthermore,

$$e_{k,\infty}([0,1]^d) \le \max_{x \in [0,1]^d} \min_{b \in \beta_{n,\Lambda}} \|x - b\| \le \max_{x \in A_n} \min_{b \in \beta_{n,\Lambda}} \|x - b\|$$

$$= \max\{\|x\| : x \in W(0|c\Lambda)\} = c \max\{\|x\| : x \in W(0|\Lambda)\} = n^{-1/d} M_\infty(W(0|\Lambda)).$$

This implies

$$Q_\infty([0,1]^d) = \inf_{n \ge 1} k^{1/d} e_{k,\infty}([0,1]^d) \le \inf_{n \ge 1} \Big(\frac{k(n)}{n}\Big)^{1/d} M_\infty(W(0|\Lambda)) = M_\infty(W(0|\Lambda)).$$

□
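Let the lattice covering coefficient of $[0,1]^d$ be defined by

$$Q_\infty^{(L)}([0,1]^d) = \inf\{M_\infty(W(0|\Lambda)) : \Lambda \subset \mathbb{R}^d \text{ admissible lattice}\}. \tag{10.14}$$

Then

$$Q_\infty([0,1]^d) \le Q_\infty^{(L)}([0,1]^d). \tag{10.15}$$

Note that $Q_r^{(L)}([0,1]^d)^{1/r}$ is increasing in $r$ and

$$\lim_{r \to \infty} Q_r^{(L)}([0,1]^d)^{1/r} \le Q_\infty^{(L)}([0,1]^d). \tag{10.16}$$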

If the underlying norm is the $l_2$-norm, $Q_\infty([0,1]^d)$ is known for $d = 1$ and $d = 2$. For
$d = 2$ and the hexagonal lattice $\Lambda \subset \mathbb{R}^2$, it follows from Theorem 8.15 that

$$\lim_{r \to \infty} Q_r([0,1]^2)^{1/r} = \lim_{r \to \infty} M_r(W(0|\Lambda))^{1/r} = M_\infty(W(0|\Lambda))$$

and hence, by (10.11) and Proposition 10.10 (b)

$$Q_\infty([0,1]^2) = \lim_{r \to \infty} Q_r([0,1]^2)^{1/r} = M_\infty(W(0|\Lambda)) = \Big(\frac{2}{3\sqrt{3}}\Big)^{1/2} = 0.6204\ldots, \quad l_2\text{-norm}. \tag{10.17}$$

This result is due to Kershner (1939). Solutions of the lattice covering problem are
known for dimensions $1 \le d \le 5$, among them the hexagonal lattice for $d = 2$ and $D_3^*$.
We have

$$Q_\infty^{(L)}([0,1]^d) = (d+1)^{1/(2d)} \Big(\frac{d(d+2)}{12(d+1)}\Big)^{1/2}, \quad 1 \le d \le 5, \quad l_2\text{-norm},$$

$$Q_\infty^{(L)}([0,1]^3) = \frac{2^{1/3}\, 5^{1/2}}{4} = 0.7043\ldots, \tag{10.18}$$

$$Q_\infty^{(L)}([0,1]^4) = \frac{2^{1/2}}{5^{3/8}} = 0.7733\ldots,$$

$$Q_\infty^{(L)}([0,1]^5) = \frac{35^{1/2}}{2^{1/2}\, 6^{9/10}} = 0.8340\ldots$$

(cf. Conway and Sloane, 1993, p. 12 and Chapter 2, Section 1.3 and the subsequent
Remark 10.11 (a)).
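
The numerical values in (10.17) and (10.18) can be checked against the general formula. The sketch below is illustrative; it reads the first line of (10.18) as $(d+1)^{1/(2d)}\,(d(d+2)/(12(d+1)))^{1/2}$, and it uses the Euclidean unit ball volume $\pi^{d/2}/\Gamma(1 + d/2)$ for the ball bound $M_\infty(B(0,1))$ of Proposition 10.10 (a). The function names are ours.

```python
import math

def lattice_covering_coefficient(d: int) -> float:
    """(d+1)^(1/(2d)) * (d(d+2) / (12(d+1)))^(1/2), the l_2 value for 1 <= d <= 5."""
    if not 1 <= d <= 5:
        raise ValueError("the formula is stated only for 1 <= d <= 5")
    return (d + 1) ** (1.0 / (2 * d)) * math.sqrt(d * (d + 2) / (12.0 * (d + 1)))

def ball_bound(d: int) -> float:
    """M_infinity(B(0,1)) = lambda^d(B(0,1))^(-1/d) for the Euclidean unit ball."""
    log_volume = 0.5 * d * math.log(math.pi) - math.lgamma(1.0 + d / 2.0)
    return math.exp(-log_volume / d)

# Lower bound (ball) next to upper bound (best lattice covering) for Q_infinity([0,1]^d).
for d in range(1, 6):
    print(d, round(ball_bound(d), 4), round(lattice_covering_coefficient(d), 4))
```

For $d = 1$ both columns equal $1/2$; for $d \ge 2$ the gap between them is the range in which $Q_\infty([0,1]^d)$ can lie when only these two bounds are used.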
A trivial case occurs for the $l_1$-norm and $d = 2$, where $B(0,1) = W(0|D_2)$ (cf. Example
8.13). Therefore

$$Q_\infty([0,1]^2) = \lim_{r \to \infty} Q_r([0,1]^2)^{1/r} = M_\infty(B(0,1)) = \frac{1}{\sqrt{2}} = 0.7071\ldots, \quad l_1\text{-norm}. \tag{10.19}$$

A further trivial case concerns the $l_\infty$-norm. Then $B(0,1) = W(0|2\mathbb{Z}^d)$ and so

$$Q_\infty([0,1]^d) = \lim_{r \to \infty} Q_r([0,1]^d)^{1/r} = M_\infty(B(0,1)) = \frac{1}{2}, \quad l_\infty\text{-norm} \tag{10.20}$$

(cf. (8.8) and (8.9)).


10.11 Remark
(a) The lattice covering coefficient $Q_\infty^{(L)}([0,1]^d)$ coincides with the so called lower
absolute inhomogeneous minimum of the ball $B(0,1)$ and $\lambda^d(B(0, Q_\infty^{(L)}([0,1]^d)))$ coincides
with the density of the thinnest lattice covering of the whole space with $B(0,1)$
provided every lattice is admissible (cf. Gruber and Lekkerkerker, 1987, p. 230 and p.
236). Note that

$$Q_\infty^{(L)}([0,1]^d) = \frac{1}{\sup \det(\Lambda)^{1/d}},$$

where the supremum is taken over all admissible lattices $\Lambda$ such that $\{B(a,1) : a \in \Lambda\}$
is a covering of $\mathbb{R}^d$. For arbitrary (not necessarily admissible) lattices $\Lambda$ one can show
that

$$Q_\infty([0,1]^d) \le \frac{\max\{\|x\| : x \in W(0|\Lambda)\}}{\det(\Lambda)^{1/d}}$$

(cf. Gruber and Lekkerkerker, 1987, Theorem 6, p. 235).

(b) (d-asymptotics) If the underlying norm is the $l_p$-norm, with $1 \le p < \infty$, then

$$\liminf_{d \to \infty} d^{-1/p}\, Q_\infty([0,1]^d) \ge \frac{p}{2(ep)^{1/p}\, \Gamma(1/p)}.$$

This follows from Corollary 9.4. The upper bound

$$Q_\infty([0,1]^d) \le \frac{(d \log d + d \log\log d + 5d)^{1/d}}{\lambda^d(B(0,1))^{1/d}}, \quad d \ge 3,$$

which is due to Rogers (1957) gives

$$\limsup_{d \to \infty} d^{-1/p}\, Q_\infty([0,1]^d) \le \frac{p}{2(ep)^{1/p}\, \Gamma(1/p)}.$$

Therefore

$$\lim_{d \to \infty} d^{-1/p}\, Q_\infty([0,1]^d) = \frac{p}{2(ep)^{1/p}\, \Gamma(1/p)}, \quad l_p\text{-norm}. \tag{10.21}$$

Thus we find for the covering coefficient $Q_\infty([0,1]^d)$ exactly the same $d$-asymptotics
as for the $r$-th roots $Q_r([0,1]^d)^{1/r}$ of the $r$-th quantization coefficients, $1 \le r < \infty$
(cf. Corollary 9.4).
(c) Let the underlying norm be the $l_2$-norm. For $d = 2$, the solution of the lattice
quantizer problem, given by the hexagonal lattice, does not depend on $r$ (cf.
Theorem 8.15). For $d = 3$, the lattice $D_3^*$ solves the lattice quantizer problem for
$r = 2$ (cf. (8.8)) and the lattice covering problem ($r = \infty$). So possibly $D_3^*$ solves the
lattice quantizer problem for every $r$. For $d = 4$, the solution of the lattice covering
problem is unique up to linear similarity transformations (cf. Baranovskii, 1965) and
differs from the best known lattice quantizer $D_4$ for $r = 2$ (cf. Conway and Sloane,
1993, p. 12 and p. 61). Therefore, solutions of the lattice quantizer problem must
depend on $r$.

One may use n-optimal (or asymptotically n-optimal) sets of centers of order $r = \infty$
as quantizers. Their asymptotic performance depends on the covering density of the
unit ball (cf. Remark 10.8).
10.12 Proposition
Let $A \subset \mathbb{R}^d$ be a nonempty compact set with $\lambda^d(A) > 0$, let $(\alpha_n)_{n \ge 1}$ be an
asymptotically n-optimal set of centers for $A$ of order $\infty$, i.e., $|\alpha_n| \le n$ and

$$\lim_{n \to \infty} n^{1/d} \max_{x \in A} \min_{a \in \alpha_n} \|x - a\| = Q_\infty([0,1]^d)\, \lambda^d(A)^{1/d},$$

and let $1 \le r < \infty$. Then

$$\limsup_{n \to \infty} n^{r/d} \int \min_{a \in \alpha_n} \|x - a\|^r\, dU(A)(x) \le \vartheta_d^{(d+r)/d}\, M_r(B(0,1))\, \lambda^d(A)^{r/d},$$

where $\vartheta_d = \lambda^d(B(0, Q_\infty([0,1]^d)))$. In particular

$$Q_r([0,1]^d) \le \vartheta_d^{(d+r)/d}\, M_r(B(0,1)).$$

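Proof
Let $s_n = \max_{x \in A} d(x, \alpha_n)$. Then

$$s_n = \max_{a \in \alpha_n}\ \max_{x \in W(a|\alpha_n) \cap A} \|x - a\|$$

which gives

$$W(a|\alpha_n) \cap A \subset B(a, s_n), \quad a \in \alpha_n.$$

This implies

$$\int \min_{a \in \alpha_n} \|x - a\|^r\, dU(A)(x) \le \sum_{a \in \alpha_n} \int_{W(a|\alpha_n) \cap A} \|x - a\|^r\, dx / \lambda^d(A) \le \sum_{a \in \alpha_n} \int_{B(a, s_n)} \|x - a\|^r\, dx / \lambda^d(A)$$

$$\le n\, M_r(B(0,1))\, \lambda^d(B(0, s_n))^{(d+r)/d} / \lambda^d(A) = n\, s_n^{d+r}\, \lambda^d(B(0,1))^{(d+r)/d}\, M_r(B(0,1)) / \lambda^d(A).$$

Therefore

$$n^{r/d} \int \min_{a \in \alpha_n} \|x - a\|^r\, dU(A)(x) \le \big(n\, s_n^d\, \lambda^d(B(0,1))\big)^{(d+r)/d}\, M_r(B(0,1)) / \lambda^d(A)$$

for every $n \in \mathbb{N}$. This yields the assertion. □
The above covering density upper bound for $Q_r([0,1]^d)$ is better than the "trivial"
bound $Q_\infty([0,1]^d)^r$ as long as $\vartheta_d < (d + r)/d$ while for the $r$-th root

$$\lim_{r \to \infty} \big(\vartheta_d^{(d+r)/d}\, M_r(B(0,1))\big)^{1/r} = Q_\infty([0,1]^d).$$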


1"~00

10.4 Stability properties and empirical versions


A stability property for the $n$-th covering radius in terms of the metric $d_H$ follows
immediately from Lemma 10.4. If $A, B \subset \mathbb{R}^d$ are nonempty compact sets, then

$$|e_{n,\infty}(A) - e_{n,\infty}(B)| \le d_H(A, B) \tag{10.22}$$

for every $n \in \mathbb{N}$. A stability result for n-optimal sets of centers of order $r = \infty$ can
be derived from Lemma 4.22.

10.13 Theorem
Let $d_H(A_k, A) \to 0$ for nonempty compact sets $A_k, A \subset \mathbb{R}^d$ and let $\alpha_k \in C_{n,\infty}(A_k)$,
$k \in \mathbb{N}$. Suppose

$$e_{n,\infty}(A) < e_{n-1,\infty}(A). \tag{10.23}$$

Then the set of $d_H$-cluster points of the sequence $(\alpha_k)_{k \ge 1}$ is a nonempty subset of
$C_{n,\infty}(A)$ and

$$d_H(\alpha_k, C_{n,\infty}(A)) \to 0 \quad \text{as } k \to \infty.$$
Proof
To show that the assertion follows from Lemma 4.22 applied to the space of all
nonempty compact subsets of $\mathbb{R}^d$ equipped with the Hausdorff metric $d_H$, the subset
$N = \{\alpha \subset \mathbb{R}^d : |\alpha| \le n,\ \alpha \neq \emptyset\}$, and $f = d_H(A, \cdot)$, it suffices to verify that

$$L(c) = \{\alpha \in N : d_H(\alpha, A) \le c\}$$

is $d_H$-compact for some $c > e_{n,\infty}(A)$. By Lemma 10.4, this setting meets the covering
problem because the assumptions imply $e_{n,\infty}(A_k) < e_{n-1,\infty}(A_k)$ for all large $k$. Choose
$s > 0$ such that $A \subset B(0, s)$. Then

$$L(c) \subset \{\alpha \in N : \alpha \subset B(0, c + s)\}.$$

Using Lemma 4.23 we deduce the $d_H$-compactness of $L(c)$. □
If in the preceding theorem the sets $\alpha_k \in C_{n,\infty}(A_k)$ satisfy $\max_{a \in \alpha_k} d(a, A_k) \le e_{n,\infty}(A_k)$
for all large $k$ (such a choice is always possible), then the assumption (10.23) can be
dropped.
Under suitable conditions, weak convergence of probability distributions implies the
$d_H$-convergence of their (compact) supports. The following special case will be needed.
10.14 Lemma
Let $P_k \xrightarrow{D} P$ for Borel probability measures on $\mathbb{R}^d$ with compact supports $A_k$ and
$A$, respectively. Then

$$\max_{x \in A} \min_{y \in A_k} \|x - y\| \to 0, \quad k \to \infty.$$

Hence, if $A_k \subset A$ for every $k \in \mathbb{N}$, then

$$\lim_{k \to \infty} d_H(A_k, A) = 0.$$

Proof
For $\varepsilon > 0$, choose a finite subset $\alpha$ of $A$ such that $A \subset \bigcup_{a \in \alpha} B(a, \varepsilon)$. For $a \in \alpha$, define
a bounded continuous function $f_a: \mathbb{R}^d \to \mathbb{R}_+$ by

$$f_a(x) = \max\{0,\ 1 - \|x - a\|/\varepsilon\}.$$

Then

$$\max_{a \in \alpha} \Big| \int f_a\, dP_k - \int f_a\, dP \Big| \to 0, \quad k \to \infty.$$

Since $\min_{a \in \alpha} \int f_a\, dP > 0$, one gets

$$\min_{a \in \alpha} P_k(B(a, \varepsilon)) \ge \min_{a \in \alpha} \int f_a\, dP_k > 0$$

and therefore

$$\max_{a \in \alpha} \min_{y \in A_k} \|a - y\| \le \varepsilon$$

for sufficiently large $k$. This implies the assertion. □


From the stability properties one immediately obtains consistency results for empirical
versions of the covering problem. Let $X_1, X_2, \ldots$ be i.i.d. $\mathbb{R}^d$-valued random variables
with distribution $P$. The empirical version of $e_{n,\infty}(P)$ is given by

$$e_{n,\infty}(P_k) = e_{n,\infty}(\{X_1, \ldots, X_k\}) = \inf_{|\alpha| \le n}\ \max_{1 \le i \le k}\ \min_{a \in \alpha} \|X_i - a\|,$$

where $P_k$ denotes the empirical measure of $X_1, \ldots, X_k$.

10.15 Corollary (Consistency)
Let $A$ denote the support of $P$ and suppose $A$ is compact.

(a) $d_H(\{X_1, \ldots, X_k\}, A) = \max_{y \in A} \min_{1 \le i \le k} \|y - X_i\| \to 0$ a.s., $k \to \infty$.

(b) $e_{n,\infty}(\{X_1, \ldots, X_k\}) \to e_{n,\infty}(A)$ a.s. as $k \to \infty$ uniformly in $n$.

(c) Let $\alpha_k = \alpha_k(X_1, \ldots, X_k) \in C_{n,\infty}(\{X_1, \ldots, X_k\})$, $k \in \mathbb{N}$. Suppose (10.23) for
$A = \operatorname{supp}(P)$. Then

$$d_H(\alpha_k, C_{n,\infty}(A)) \to 0 \quad \text{a.s.}, \quad k \to \infty.$$

Proof
Since $P_k \xrightarrow{D} P$ a.s., the assertions follow from Theorem 10.13, Lemma 10.14, and
(10.22). □

Notice that uniqueness $|C_{n,\infty}(A)| = 1$ implies (10.23). Under this uniqueness
condition, Corollary 10.15 (c) is contained in Cuesta-Albertos et al. (1988, Theorem 12).
Part (a) has been observed by Wagner (1971).
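
Part (a) of the corollary can be watched in a small simulation. The sketch below is only an illustration under our own choices: $P$ is the uniform distribution on $A = [0,1]$, the seed and the sample sizes are arbitrary, and in dimension one the Hausdorff distance between the sample and $[0,1]$ is the largest of the distance from 0 to the smallest sample point, the distance from 1 to the largest sample point, and half the largest gap between neighbouring sample points.

```python
import random

def hausdorff_to_unit_interval(sample):
    """d_H({X_1..X_k}, [0,1]) = max over y in [0,1] of min_i |y - X_i|, for points in [0,1]."""
    xs = sorted(sample)
    largest_half_gap = max(((b - a) / 2.0 for a, b in zip(xs, xs[1:])), default=0.0)
    return max(xs[0], 1.0 - xs[-1], largest_half_gap)

rng = random.Random(0)                        # fixed seed, for reproducibility only
draws = [rng.random() for _ in range(20_000)]
for k in (10, 100, 1_000, 20_000):
    # Nested samples: the distance can only decrease as more points are added.
    print(k, hausdorff_to_unit_interval(draws[:k]))
```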

Notes

Some material about the issue of this section for the $l_2$-norm and $l_\infty$-norm may be
found in Niederreiter (1992), Chapter 6. In particular, the exact order $n^{-1/d}$ of
$e_{n,\infty}(A)$ is well known if $\lambda^d(A) > 0$. However, we are not aware of a reference
concerning Theorem 10.7. Examples of n-optimal sets of centers for $[0,1]^2$ of order $\infty$
can be found in Johnson et al. (1990) for the $l_1$-norm and the $l_2$-norm. The covering
density upper bound for the $r$-th quantization coefficients given in Proposition 10.12
seems to be new. A discussion of the relation between the quantization problem for
$r = 2$, the covering problem and the packing problem can be found in Forney (1993)
for the $l_2$-norm. For general treatments of the covering problem we refer to Gruber
and Lekkerkerker (1987) and Conway and Sloane (1993).

Consistency and central limit results for a trimmed version of the covering problem
have been proved by Cuesta-Albertos et al. (1998) and Cuesta-Albertos et al. (1999).
Empirical versions of related covering problems and their asymptotics when both the
level $n$ and the sample size $k$ tend to infinity were studied e.g. by Zemel (1985) and
Rhee and Talagrand (1989b) for the $l_2$-norm.
Let us mention that n-optimal sets of centers of order $\infty$ are often called best n-nets
and Chebyshev-centers in case $n = 1$ (cf. Garkavi, 1964, and Singer, 1970, Section
II.6.4).

10.16 Conjecture
$\lim_{r \to \infty} Q_r([0,1]^d)^{1/r} = Q_\infty([0,1]^d)$ (cf. (10.11) and Remark 10.8).
If Conjecture 8.17 can be resolved, then Conjecture 10.16 is true for $d = 3$ and
$l_2$-norm. Furthermore, if Conjecture 8.17 can be resolved, then the lattice $D_3^*$
provides a solution of the covering problem in $\mathbb{R}^3$ for the $l_2$-norm, i.e., $Q_\infty([0,1]^3) =
M_\infty(W(0|D_3^*))$. This is a long standing conjecture in geometry.
Chapter III

Asymptotic quantization for singular probability distributions

In this chapter we consider some classes of continuous singular distributions on $\mathbb{R}^d$
and determine the asymptotic first order behaviour of their quantization errors.

11 The quantization dimension


Here we determine the order of convergence for the sequence of quantization errors
of a given distribution. X is an $\mathbb{R}^d$-valued random variable and P is its distribution.
In some cases we abbreviate $e_{n,r}(P)$ by $e_{n,r}$, $V_{n,r}(P)$ by $V_{n,r}$, and $C_{n,r}(P)$ by
$C_{n,r}$. In this section we always assume either that $1 \le r < \infty$ and
$E(\|X\|^r) < +\infty$ or that $r = \infty$ and supp(P) is compact.

11.1 Definition and elementary properties

11.1 Definition
\[ \underline{D}_r := \underline{D}_r(P) = \liminf_{n \to \infty} \frac{\log n}{-\log e_{n,r}} \]
is called the lower quantization dimension of P of order r.
\[ \overline{D}_r := \overline{D}_r(P) = \limsup_{n \to \infty} \frac{\log n}{-\log e_{n,r}} \]
is called the upper quantization dimension of P of order r.
If the two numbers $\underline{D}_r$ and $\overline{D}_r$ agree then their common value is denoted by
$D_r$ (= $D_r(P)$) and called the quantization dimension of P of order r.
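
As a numerical illustration of Definition 11.1, the following sketch evaluates the ratio
$\log n / (-\log e_{n,r})$ for the uniform distribution on $[0,1]$. It assumes the standard fact
that the midpoints $(2i-1)/(2n)$, $i = 1, \dots, n$, form an n-optimal set of centers, which
gives $e_{n,r} = \frac{1}{2n}(r+1)^{-1/r}$. The function names are hypothetical and chosen only
for this example.

```python
import math

def midpoint_error(n, r):
    """e_{n,r} for the uniform distribution on [0,1], assuming the n midpoints
    (2i-1)/(2n) are n-optimal centers (an assumption of this sketch)."""
    return (1.0 / (2 * n)) * (r + 1) ** (-1.0 / r)

def dimension_ratio(n, r):
    """The quantity log n / (-log e_{n,r}) whose limit defines D_r."""
    return math.log(n) / (-math.log(midpoint_error(n, r)))

for n in (10, 10**3, 10**6, 10**12):
    print(n, dimension_ratio(n, 2))
```

The printed ratios increase slowly towards 1, in line with $D_r = d = 1$ for this
distribution; the slow convergence comes from the constant factor in $e_{n,r}$.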

11.2 Remark
$\underline{D}_r$ and $\overline{D}_r$ do not depend on the underlying norm. $\underline{D}_\infty$ and
$\overline{D}_\infty$ depend only on the support of P. Using the definition of $e_{n,\infty}(K)$ in
(10.3) we also define $\underline{D}_\infty(K)$, $\overline{D}_\infty(K)$, and $D_\infty(K)$ for an arbitrary
(nonempty) compact set $K \subset \mathbb{R}^d$.

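11.3 Proposition
(a) If $0 \le t < \underline{D}_r < s$ then
\[ \lim_{n \to \infty} n e_{n,r}^t = +\infty \quad\text{and}\quad \liminf_{n \to \infty} n e_{n,r}^s = 0. \]

(b) If $0 \le t < \overline{D}_r < s$ then
\[ \limsup_{n \to \infty} n e_{n,r}^t = +\infty \quad\text{and}\quad \lim_{n \to \infty} n e_{n,r}^s = 0. \]

Proof
Let us first prove (a). If $e_{n,r} = 0$ for some $n \in \mathbb{N}$ then $\underline{D}_r = 0$ and (a) is
obvious. Suppose $e_{n,r} > 0$ for all $n \in \mathbb{N}$. For $0 \le t < \underline{D}_r$ choose
$t' \in (t, \underline{D}_r)$. Then there exists an $n_0 \in \mathbb{N}$ with
\[ e_{n,r} < 1 \quad\text{and}\quad \frac{\log n}{-\log e_{n,r}} > t' \]
for all $n \ge n_0$. This implies
\[ n e_{n,r}^{t'} > 1 \]
and, hence,
\[ n e_{n,r}^{t} > e_{n,r}^{t - t'} \]
for all $n \ge n_0$. Since $\lim_{n \to \infty} e_{n,r} = 0$ we deduce
\[ \lim_{n \to \infty} n e_{n,r}^{t} = +\infty. \]
For $\underline{D}_r < s$ there is an $s' \in (\underline{D}_r, s)$ and a subsequence $(e_{n_k,r})$ of
$(e_{n,r})$ with
\[ e_{n_k,r} < 1 \quad\text{and}\quad \frac{\log n_k}{-\log e_{n_k,r}} \le s' \]
for all $k \in \mathbb{N}$. This implies
\[ n_k e_{n_k,r}^{s'} \le 1 \]
and, hence,
\[ n_k e_{n_k,r}^{s} \le e_{n_k,r}^{s - s'}. \]
Since $\lim_{k \to \infty} e_{n_k,r} = 0$ this leads to
\[ \liminf_{n \to \infty} n e_{n,r}^{s} \le \lim_{k \to \infty} n_k e_{n_k,r}^{s} = 0. \]
Part (b) can be proved in a similar way. □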

11.4 Corollary

(a) If $1 \le r \le s \le \infty$ then $\underline{D}_r \le \underline{D}_s$ and $\overline{D}_r \le \overline{D}_s$.

(b) If $D \in (0, +\infty)$ is such that
\[ 0 < \liminf_{n \to \infty} n e_{n,r}^D \le \limsup_{n \to \infty} n e_{n,r}^D < +\infty \]
then $D_r = D$.

(c) Let $1 \le r < \infty$ and suppose $E(\|X\|^{r+\delta}) < +\infty$ for some $\delta > 0$. Then
$\overline{D}_r \le d$. If the absolutely continuous part $P_a$ of P does not vanish then $D_r = d$.

(d) Let $r = \infty$. Then $\overline{D}_\infty \le d$. If $\lambda^d(\mathrm{supp}(P)) > 0$ then $D_\infty = d$.

Proof
(a) follows from Lemma 10.1 (a) and Proposition 11.3.
(b) follows immediately from Proposition 11.3.
(c) and (d) follow from Proposition 11.3, Theorem 6.2, (10.8), and the fact that
$n e_{n,\infty}(P)^d \ge \lambda^d(\mathrm{supp}(P)) / \lambda^d(B(0,1))$.
□

11.2 Comparison to the Hausdorff dimension

Next we will investigate the connection of the quantization dimension to other types of
dimension. Let us first consider the relationship of $\underline{D}_r(P)$ to the Hausdorff dimension
of the support of P.
For a set $A \subset \mathbb{R}^d$ and $\varepsilon > 0$ an $\varepsilon$-cover of A is a cover of A by sets $U_i$
each of diameter at most $\varepsilon$, i.e., $\mathrm{diam}(U_i) = \sup\{\|x - y\| : x, y \in U_i\} \le \varepsilon$.
For $s \ge 0$ let
\[ \mathcal{H}^s_\varepsilon(A) = \inf\Big\{ \sum_{i \in I} \mathrm{diam}(U_i)^s : (U_i)_{i \in I} \text{ is an } \varepsilon\text{-cover of } A \Big\}. \]
Then $\mathcal{H}^s(A) = \lim_{\varepsilon \to 0} \mathcal{H}^s_\varepsilon(A)$ is the s-dimensional Hausdorff measure
of A. It is easy to check that $\mathcal{H}^s(A)$ is non-increasing with s and that $\mathcal{H}^t(A) > 0$
implies $\mathcal{H}^s(A) = \infty$ for all $s < t$. The Hausdorff dimension of A is defined as
\[ \dim_H(A) = \sup\{ s \ge 0 : \mathcal{H}^s(A) = \infty \} = \inf\{ s \ge 0 : \mathcal{H}^s(A) = 0 \}. \]
While the definition of the Hausdorff measure depends on the underlying norm the
Hausdorff dimension has the same value for all norms on $\mathbb{R}^d$.
11.5 Proposition
Let $K \subset \mathbb{R}^d$ be compact and let P be any probability measure with supp(P) = K.
Then, for every $t \ge 0$,
\[ \mathcal{H}^t(K) \le 2^t \liminf_{n \to \infty} n e_{n,\infty}(P)^t, \tag{11.1} \]
in particular
\[ \dim_H(K) \le \underline{D}_\infty(P). \]

Proof
Let $\alpha_n \in C_{n,\infty}$. Then $(B(a, e_{n,\infty}))_{a \in \alpha_n}$ is a cover of K. Let
$\varepsilon > 0$ be arbitrary. Since $(e_{n,\infty})_{n \in \mathbb{N}}$ converges to 0 there is an
$n_\varepsilon \in \mathbb{N}$ with $e_{n,\infty} \le \varepsilon/2$ for all $n \ge n_\varepsilon$. This implies
\[ \mathcal{H}^t_\varepsilon(K) \le \inf_{n \ge n_\varepsilon} \sum_{a \in \alpha_n} \mathrm{diam}(B(a, e_{n,\infty}))^t
   \le \inf_{n \ge n_\varepsilon} n (2 e_{n,\infty})^t. \]
By letting $\varepsilon$ tend to 0 we obtain
\[ \mathcal{H}^t(K) \le 2^t \liminf_{n \to \infty} n e_{n,\infty}^t. \]
The remaining claim in the proposition follows from Proposition 11.3. □


The Hausdorff dimension of a (probability) measure P is defined to be
\[ \dim_H(P) = \inf\{ \dim_H(A) : A \in \mathcal{B}(\mathbb{R}^d),\ P(\mathbb{R}^d \setminus A) = 0 \}. \tag{11.2} \]

11.6 Theorem
For all $r \ge 1$,
\[ \dim_H(P) \le \underline{D}_r(P). \]
Proof
The proof will be given in Corollary 12.16. □

Since $\dim_H(P) \le d$, the above inequality can be strict (see Example 6.4). In
the case that r = 2 and supp(P) is compact the above theorem was proved by
Pötzelberger (1998a).

11.3 Comparison to the box dimension


Now we will consider the relationship between $\underline{D}_\infty(P)$, $\overline{D}_\infty(P)$ and the
upper and lower box dimension of supp(P).
For our purposes the box dimension (entropy dimension, Minkowski dimension) of a
compact subset K of $\mathbb{R}^d$ is most conveniently defined in the following way:
For $\varepsilon > 0$ let
\[ N(\varepsilon) = N(\varepsilon, K) = \min\{ n \in \mathbb{N} : e_{n,\infty}(K) \le \varepsilon \}. \]
Then
\[ \underline{\dim}_B(K) = \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} \tag{11.3} \]
is called the lower box dimension of K and
\[ \overline{\dim}_B(K) = \limsup_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} \tag{11.4} \]
is called the upper box dimension of K.
If $\underline{\dim}_B(K) = \overline{\dim}_B(K)$ this value is denoted by $\dim_B(K)$ and called the
box dimension of K.
This definition suggests that there is a close relationship between $\underline{D}_\infty(K)$ and
$\underline{\dim}_B(K)$ and between $\overline{D}_\infty(K)$ and $\overline{\dim}_B(K)$. We have the following result.
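
The covering numbers $N(\varepsilon)$ can be computed exactly for finite subsets of the real line,
which gives a small sketch of the box dimension for the middle-third Cantor set. The code
below is an illustration under stated assumptions: it works with the endpoints of the
level-m construction intervals (scaled by $3^m$ so that all arithmetic is on integers) and
uses the greedy left-to-right sweep, which yields the minimal number of closed intervals of
a given length covering a sorted finite point set. The function names are hypothetical.

```python
import math

def cantor_endpoints(m):
    """Endpoints of the 2**m level-m Cantor intervals, scaled by 3**m to integers."""
    lefts = [0]
    for i in range(1, m + 1):
        step = 2 * 3 ** (m - i)          # shift to the right-hand copy at level i
        lefts = lefts + [a + step for a in lefts]
    return sorted(set(lefts) | {a + 1 for a in lefts})

def covering_number(points, width):
    """Minimal number of closed intervals of length `width` covering sorted `points`."""
    count, reach = 0, None
    for p in points:
        if reach is None or p > reach:   # p is not covered yet: open a new interval at p
            count += 1
            reach = p + width
    return count

m = 10
pts = cantor_endpoints(m)
for j in range(1, m + 1):
    eps = 0.5 * 3.0 ** (-j)              # ball radius; interval length 2*eps = 3**(-j)
    n_eps = covering_number(pts, 3 ** (m - j))
    print(j, n_eps, math.log(n_eps) / (-math.log(eps)))
```

For radius $\varepsilon = \tfrac12 3^{-j}$ the sweep needs $2^j$ intervals, so the printed ratio
$j \log 2 / (j \log 3 + \log 2)$ approaches $\log 2 / \log 3$ as j grows.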

11.7 Theorem
Let $K \subset \mathbb{R}^d$ be compact. Then

(i) $\underline{\dim}_B(K) = \underline{D}_\infty(K)$,

(ii) $\overline{\dim}_B(K) = \overline{D}_\infty(K)$.

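Proof
(i) To prove $\underline{\dim}_B(K) \le \underline{D}_\infty(K)$ let $n \ge 1$ be a natural number. Then
\[ N_n := N(e_{n,\infty}) \le n \quad\text{and}\quad e_{N_n,\infty} = e_{n,\infty}. \]
We deduce
\[ \underline{D}_\infty(K) = \liminf_{n \to \infty} \frac{\log n}{-\log e_{n,\infty}}
   \ge \liminf_{n \to \infty} \frac{\log N_n}{-\log e_{n,\infty}}
   \ge \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} = \underline{\dim}_B(K). \]
Next we show $\underline{\dim}_B(K) \ge \underline{D}_\infty(K)$. For $\varepsilon > 0$ the definition of
$N(\varepsilon)$ implies
\[ e_{N(\varepsilon),\infty} \le \varepsilon, \]
hence
\[ \underline{D}_\infty(K) = \liminf_{n \to \infty} \frac{\log n}{-\log e_{n,\infty}}
   \le \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log e_{N(\varepsilon),\infty}}
   \le \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} = \underline{\dim}_B(K). \]

(ii) First we will prove that there is a $k \ge 1$ such that
\[ e_{kn,\infty} \le \tfrac12 e_{n,\infty} \]
for all $n \ge 1$. For $n \ge 1$ choose $\alpha \in C_{n,\infty}$. Let $\beta \subset \mathbb{R}^d$ be of minimum
cardinality with
\[ d_\beta(x) \le \tfrac12 \]
for all $x \in B(0,1)$. Let $k = |\beta|$. For $y \in \mathbb{R}^d$, $\varepsilon > 0$ and
$\beta(y, \varepsilon) = \varepsilon\beta + y$ we have
\[ d_{\beta(y,\varepsilon)}(x) \le \tfrac12 \varepsilon \]
for all $x \in B(y, \varepsilon)$. Let $\alpha' = \bigcup_{y \in \alpha} \beta(y, e_{n,\infty})$.
Then $|\alpha'| \le kn$ and for every $x \in K$ there is a $y \in \alpha$ with
$\|x - y\| \le e_{n,\infty}$ and hence a $z \in \beta(y, e_{n,\infty})$ with
\[ \|x - z\| \le \tfrac12 e_{n,\infty}. \]
Thus we obtain $d(x, \alpha') \le \tfrac12 e_{n,\infty}$.
This implies
\[ e_{kn,\infty} \le \sup_{x \in K} d_{\alpha'}(x) \le \tfrac12 e_{n,\infty}. \]
Next we will show that, for $n \in \mathbb{N}$ with $e_{n,\infty} > 0$, we have
\[ N(e_{n,\infty}) \le n < k N(e_{n,\infty}). \tag{11.5} \]
Let $N_n = N(e_{n,\infty})$. Then $N_n \le n$ and since $e_{N_n,\infty} = e_{n,\infty} > 0$, we have
\[ e_{kN_n,\infty} \le \tfrac12 e_{N_n,\infty} < e_{n,\infty}. \]
This implies $n < kN_n$.

Using (11.5) and $\lim_{n \to \infty} e_{n,\infty} = 0$ we get
\begin{align*}
\overline{\dim}_B(K) &= \limsup_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon}
   \ge \limsup_{n \to \infty} \frac{\log N(e_{n,\infty})}{-\log e_{n,\infty}} \\
 &\ge \limsup_{n \to \infty} \frac{\log(\tfrac1k n)}{-\log e_{n,\infty}} \\
 &= \limsup_{n \to \infty} \Big[ \frac{-\log k}{-\log e_{n,\infty}} + \frac{\log n}{-\log e_{n,\infty}} \Big] \\
 &= \overline{D}_\infty(K).
\end{align*}
To prove the converse inequality observe that, for small $\varepsilon > 0$,
\[ e_{N(\varepsilon)-1,\infty} > \varepsilon. \]
This leads to
\begin{align*}
\overline{\dim}_B(K) &= \limsup_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon}
   \le \limsup_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log e_{N(\varepsilon)-1,\infty}} \\
 &= \limsup_{\varepsilon \to 0} \frac{\log(N(\varepsilon) - 1)}{-\log e_{N(\varepsilon)-1,\infty}}
    \cdot \frac{\log N(\varepsilon)}{\log(N(\varepsilon) - 1)} \\
 &= \limsup_{\varepsilon \to 0} \frac{\log(N(\varepsilon) - 1)}{-\log e_{N(\varepsilon)-1,\infty}} \\
 &\le \overline{D}_\infty(K)
\end{align*}
and completes the proof of the theorem. □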


11.8 Corollary
Let $K \subset \mathbb{R}^d$ be compact. If the box dimension of K exists then the quantization
dimension of K of order $\infty$ also exists and equals the box dimension.

11.9 Proposition
Let P be a probability on $\mathbb{R}^d$ with compact support K. Then, for $1 \le r \le s < \infty$,
\[ \overline{D}_r(P) \le \overline{D}_s(P) \le \overline{D}_\infty(P) = \overline{D}_\infty(K) = \overline{\dim}_B(K) \]
and
\[ \underline{D}_r(P) \le \underline{D}_s(P) \le \underline{D}_\infty(P) = \underline{D}_\infty(K) = \underline{\dim}_B(K). \]

Proof
The result follows immediately from Corollary 11.4 and Theorem 11.7. □

11.4 Comparison to the rate distortion dimension

T. Kawabata and A. Dembo (1994) introduced the concept of rate distortion
dimension for probability distributions. They showed that for norms on $\mathbb{R}^d$ the rate
distortion dimension is the same as Rényi's information dimension. Here we will
compare the quantization dimension to the rate distortion dimension.
Let us recall its definition. For x = 0 set $x \log(x) = 0$. For a probability Q on the
Borel $\sigma$-field of $\mathbb{R}^d \times \mathbb{R}^d$ denote by $Q_1$ and $Q_2$ the marginals on the first
and second component, respectively. If P is a probability on $\mathbb{R}^d$ and Q is a probability
on $\mathbb{R}^d \times \mathbb{R}^d$ with $P = Q_1$ then the average mutual information I(P, Q) of Q is
equal to
\[ \int h(x,y) \log h(x,y) \, dQ_1 \otimes Q_2(x,y) \]
if Q is absolutely continuous with respect to $Q_1 \otimes Q_2$ and h is the corresponding
Radon-Nikodym derivative, and equal to $\infty$ otherwise.
Let $1 \le r < \infty$. The rate distortion function of order r, $R_{P,r} : (0, +\infty) \to \mathbb{R}$, is
defined by
\[ R_{P,r}(t) = \inf\Big\{ I(P,Q) : Q \text{ probability on } \mathbb{R}^d \times \mathbb{R}^d \text{ with } Q_1 = P
   \text{ and } \int \|x - y\|^r \, dQ(x,y) \le t \Big\}. \]
The upper rate distortion dimension (of order r) of P is defined to be
\[ \overline{\dim}_R(P) = \limsup_{\varepsilon \to 0} \frac{R_{P,r}(\varepsilon^r)}{-\log \varepsilon}. \]
The lower rate distortion dimension (of order r) of P is
\[ \underline{\dim}_R(P) = \liminf_{\varepsilon \to 0} \frac{R_{P,r}(\varepsilon^r)}{-\log \varepsilon} \]
and, if the two values agree, it is called the rate distortion dimension (of order r)
of P and denoted by $\dim_R(P)$.
It is shown in Kawabata and Dembo (1994, Proposition 3.3) that the (upper,
respectively lower) rate distortion dimension does not depend on r and equals the
corresponding (upper, respectively lower) information dimension introduced by Rényi
(1959).

11.10 Theorem
If $1 \le r < \infty$ then
\[ \underline{\dim}_R(P) \le \underline{D}_r(P). \]

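Proof
Let $\varepsilon > 0$ be given. Let $n \in \mathbb{N}$ satisfy $e_{n,r} \le \varepsilon$. Let
$f : \mathbb{R}^d \to \mathbb{R}^d$ be an n-optimal quantizer of order r and Q the image of P on
$\mathbb{R}^d \times \mathbb{R}^d$ under the map $x \mapsto (x, f(x))$.
Then $Q_1 = P$, $Q_2 = P^f = P \circ f^{-1}$ and
\[ e_{n,r}^r = \int \|x - f(x)\|^r \, dP(x) = \int \|x - y\|^r \, dQ(x,y) \le \varepsilon^r. \]
Set $\alpha = f(\mathbb{R}^d)$ and define $h : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ by
\[ h(x,y) = \begin{cases} 0, & y \notin \alpha, \\[2pt] \dfrac{1}{P(f = y)} 1_{\{f = y\}}(x), & y \in \alpha. \end{cases} \]
For a Borel set $A \subset \mathbb{R}^d \times \mathbb{R}^d$ we obtain
\begin{align*}
\int_A h(x,y) \, dQ_1 \otimes Q_2(x,y)
 &= \iint 1_A(x,y) h(x,y) \, dQ_2(y) \, dP(x) \\
 &= \int \sum_{a \in \alpha} P(f = a) 1_A(x,a) h(x,a) \, dP(x) \\
 &= \sum_{a \in \alpha} \int P(f = a) 1_A(x,a) \frac{1}{P(f = a)} 1_{\{f = a\}}(x) \, dP(x) \\
 &= \sum_{a \in \alpha} \int_{\{f = a\}} 1_A(x,a) \, dP(x) \\
 &= \sum_{a \in \alpha} P(\{x : f(x) = a \text{ and } (x, f(x)) \in A\}) \\
 &= P(\{x : (x, f(x)) \in A\}) \\
 &= Q(A).
\end{align*}
Thus Q is absolutely continuous with respect to $Q_1 \otimes Q_2$ and h is the corresponding
Radon-Nikodym derivative. By the definition of $R_{P,r}$ we get
\begin{align*}
R_{P,r}(\varepsilon^r) &\le I(P, Q) \\
 &= \int h(x,y) \log h(x,y) \, dQ_1 \otimes Q_2(x,y) \\
 &= \int \log h(x,y) \, dQ(x,y) \\
 &= \int \log h(x, f(x)) \, dP(x) \\
 &= \sum_{a \in \alpha} \int_{\{f = a\}} \log h(x,a) \, dP(x) \\
 &= \sum_{a \in \alpha} \int_{\{f = a\}} \log \frac{1}{P(f = a)} \, dP(x) \\
 &= -\sum_{a \in \alpha} P(f = a) \log P(f = a) \\
 &\le \log |\alpha|.
\end{align*}
Since f is n-optimal and $\alpha = f(\mathbb{R}^d)$ we know that $|\alpha| = n$ (cf. Theorem 4.1). Thus
we have shown that
\[ e_{n,r} \le \varepsilon \quad\text{implies}\quad R_{P,r}(\varepsilon^r) \le \log n. \]
Now let $n_\varepsilon \in \mathbb{N}$ be the smallest natural number with $e_{n_\varepsilon,r} \le \varepsilon$.
Then we get
\[ R_{P,r}(V_{n_\varepsilon,r}) \le \log n_\varepsilon, \]
hence
\[ \frac{R_{P,r}(V_{n_\varepsilon,r})}{-\log e_{n_\varepsilon,r}} \le \frac{\log n_\varepsilon}{-\log e_{n_\varepsilon,r}}. \]
This implies
\[ \liminf_{\varepsilon \to 0} \frac{R_{P,r}(\varepsilon^r)}{-\log \varepsilon}
   \le \liminf_{\varepsilon \to 0} \frac{\log n_\varepsilon}{-\log e_{n_\varepsilon,r}}. \]
With $\varepsilon_n = e_{n,r}$ we know that $n_{\varepsilon_n} \le n$ and
$e_{n_{\varepsilon_n},r} = e_{n,r}$ and get
\[ \liminf_{\varepsilon \to 0} \frac{\log n_\varepsilon}{-\log e_{n_\varepsilon,r}}
   \le \liminf_{n \to \infty} \frac{\log n}{-\log e_{n,r}} = \underline{D}_r(P). \]
Thus the theorem is proved. □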
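
The step that closes the estimate of the mutual information is the elementary bound
$-\sum_{a} p_a \log p_a \le \log n$ for a probability vector with n entries. The following
sketch checks it on a small example; the cell probabilities are made-up illustration
values, not taken from the text, and `entropy` is a hypothetical helper name.

```python
import math

def entropy(probabilities):
    """Shannon entropy -sum p log p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)

# Hypothetical probabilities P(f = a) of the cells of a 4-level quantizer.
cells = [0.5, 0.25, 0.125, 0.125]
print(entropy(cells), "<=", math.log(len(cells)))

# Equality holds when all cells carry the same mass.
print(entropy([0.25] * 4), "==", math.log(4))
```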

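11.11 Remark
It follows from the preceding theorem that, for $1 < r < \infty$,
\[ \underline{\dim}_R(P) \le \underline{D}_1(P) \le \underline{D}_r(P) \le \underline{D}_\infty(P) = \underline{\dim}_B(\mathrm{supp}(P)). \]
It remains an open question whether $\overline{\dim}_R(P) \le \overline{D}_1(P)$.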


Notes

The concept of quantization dimension was introduced by Zador (1982). Hausdorff
and box dimension are classical mathematical notions which play a central role in
fractal geometry. A good survey can be found in the books of Falconer (1985, 1990,
1997). The rate distortion dimension is introduced and thoroughly discussed by
Kawabata and Dembo (1994) where the identity with Rényi's information dimension
is also pointed out (cf. Rényi, 1959).

12 Regular sets and measures of dimension D

The notion of regular sets and measures of dimension D is an obvious modification of
a concept of regularity used by David and Semmes (1993) and attributed to Ahlfors
by these authors. The class of regular sets contains, for instance, convex sets in $\mathbb{R}^d$,
surfaces of these sets, compact $C^1$-manifolds, and self-similar sets (satisfying the open
set condition). Examples of regular measures of dimension D are certain measures
which are absolutely continuous with respect to the Hausdorff measure on a regular
set of dimension D. Here we give a detailed discussion of the asymptotic behaviour
of the quantization errors for regular measures of dimension D. In this section
$\|\cdot\|$ is an arbitrary norm on $\mathbb{R}^d$ and D is a nonnegative real number. By
$\mathring{B}(a, r)$ we denote the open ball of center a and radius r.

12.1 Definition and examples

12.1 Definition
Let $\mu$ be a finite Borel measure on $\mathbb{R}^d$.

(a) $\mu$ is called regular of dimension D if $\mu$ has compact support and satisfies
\[ \exists c > 0 \ \exists r_0 > 0 \ \forall r \in (0, r_0) \ \forall a \in \mathrm{supp}(\mu) : \quad
   \tfrac1c r^D \le \mu(\mathring{B}(a, r)) \le c r^D. \]

(b) $M \subset \mathbb{R}^d$ is called regular of dimension D if M is compact,
$0 < \mathcal{H}^D(M) < \infty$, and the restriction $\mathcal{H}^D_{|M} = \mathcal{H}^D(\cdot \cap M)$ of
$\mathcal{H}^D$ to M is a regular measure of dimension D with support M.
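
To make the two-sided bound in part (a) concrete, the following sketch checks it for the
Lebesgue measure restricted to $[0,1]$ with D = 1. The choice of example, the grid of
centers, and the function names are assumptions made for illustration only; in this case
the ratio $\mu(\mathring{B}(a, r)) / r$ should lie between 1 and 2 for small r, so that c = 2 works.

```python
def ball_mass(a, r):
    """Lebesgue measure of the open interval (a - r, a + r) intersected with [0, 1]."""
    return max(0.0, min(1.0, a + r) - max(0.0, a - r))

def ratio_range(radii, grid=101):
    """Smallest and largest value of mu(B(a, r)) / r over centers a on a grid in [0, 1]."""
    ratios = [ball_mass(i / (grid - 1), r) / r for r in radii for i in range(grid)]
    return min(ratios), max(ratios)

lo, hi = ratio_range([0.4, 0.1, 0.01, 0.001])
print(lo, hi)
```

The lower value is attained at the endpoints of the interval, where only half of the ball
meets the support, and the upper value at interior centers.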

12.2 Remark
A set or a measure which is regular of dimension D in $\mathbb{R}^d$ with one given norm is
also regular of dimension D in $\mathbb{R}^d$ with any other norm. This follows from the
well-known fact that any two norms on $\mathbb{R}^d$ are equivalent, i.e., if $\|\cdot\|$ and
$|||\cdot|||$ are norms on $\mathbb{R}^d$ then there is a constant c > 0 with
$\tfrac1c |||\cdot||| \le \|\cdot\| \le c\, |||\cdot|||$. The notion of
regularity of dimension D remains unchanged if one uses closed balls instead of open
balls in the definition.

Next we will study the elementary properties of regular sets of dimension D.

12.3 Lemma
Let $\mu$ be a finite measure on $\mathbb{R}^d$ such that there is a c > 0 and an $r_0 > 0$ with
$\mu(\mathring{B}(a, r)) \le c r^D$ for all $a \in \mathrm{supp}(\mu)$ and all $r \in (0, r_0)$. Then
there is a c' > 0 with $\mu(\mathring{B}(a, r)) \le c' r^D$ for all $a \in \mathbb{R}^d$ and all r > 0.

Proof
First we will show that there is a $\tilde{c} > 0$ with $\mu(\mathring{B}(a, r)) \le \tilde{c} r^D$ for all
$a \in \mathrm{supp}(\mu)$ and all r > 0. To this end let $a \in \mathrm{supp}(\mu)$ be arbitrary and define
\[ \tilde{c} = \max(c, \mu(\mathbb{R}^d) r_0^{-D}). \]
If $r \in (0, r_0)$ then by assumption we have
\[ \mu(\mathring{B}(a, r)) \le c r^D \le \tilde{c} r^D. \]
If $r \ge r_0$ then
\[ \tilde{c} r^D \ge \mu(\mathbb{R}^d) r_0^{-D} r^D \ge \mu(\mathbb{R}^d) \ge \mu(\mathring{B}(a, r)). \]
We claim that, for arbitrary $a \in \mathbb{R}^d$ and r > 0,
\[ \mu(\mathring{B}(a, r)) \le 2^D \tilde{c} r^D = c' r^D. \]
If $0 < r \le d(a, \mathrm{supp}(\mu))$ then $\mu(\mathring{B}(a, r)) = 0$ and the claim is true.
If $d(a, \mathrm{supp}(\mu)) < r$ choose $b \in \mathrm{supp}(\mu)$ with $\|a - b\| = d(a, \mathrm{supp}(\mu))$.
Then we have
\begin{align*}
\mu(\mathring{B}(a, r)) &\le \mu(\mathring{B}(b, r + \|b - a\|)) \\
 &\le \tilde{c} (r + \|b - a\|)^D \\
 &\le \tilde{c} r^D \Big(1 + \frac{\|b - a\|}{r}\Big)^D \le 2^D \tilde{c} r^D.
\end{align*}
□
12.4 Lemma
A finite union of regular sets of dimension D is regular of dimension D.

Proof
Let $M_1, \dots, M_n \subset \mathbb{R}^d$ be regular of dimension D and $M = M_1 \cup \dots \cup M_n$.
Then M is compact and we have
\[ 0 < \mathcal{H}^D(M) \le \sum_{i=1}^n \mathcal{H}^D(M_i) < \infty. \]
Let $c_i > 0$, $r_{i,0} > 0$ be such that
\[ \forall r \in (0, r_{i,0}) \ \forall a \in M_i : \quad
   \tfrac{1}{c_i} r^D \le \mathcal{H}^D(M_i \cap \mathring{B}(a, r)) \le c_i r^D. \]
By Lemma 12.3 there is a constant $c_i' > 0$ with
\[ \mathcal{H}^D(M_i \cap \mathring{B}(a, r)) \le c_i' r^D \]
for all $a \in \mathbb{R}^d$ and all r > 0. Without loss of generality we may assume
$c_i' \ge c_i$. Set $c = c_1' + \dots + c_n'$ and $r_0 = \min(r_{1,0}, \dots, r_{n,0})$. It follows
that for all $a \in M$ and all $r \in (0, r_0)$
\begin{align*}
\tfrac1c r^D \le \min\Big(\tfrac{1}{c_1}, \dots, \tfrac{1}{c_n}\Big) r^D
 &\le \mathcal{H}^D(M \cap \mathring{B}(a, r)) \\
 &\le \sum_{i=1}^n \mathcal{H}^D(M_i \cap \mathring{B}(a, r)) \\
 &\le \sum_{i=1}^n c_i' r^D \\
 &= c r^D.
\end{align*}
□

12.5 Lemma
Let $M \subset \mathbb{R}^d$ be compact. Then M is regular of dimension D if and only if every
point x of M has a regular neighbourhood of dimension D in M.

Proof
Since M is compact, M can be covered by finitely many regular sets of dimension D
and the lemma follows from Lemma 12.4. □
12.6 Lemma
Let $M \subset \mathbb{R}^d$ be regular of dimension D, $U \subset \mathbb{R}^d$ open with $M \subset U$, and
$g : U \to \mathbb{R}^d$ a bi-Lipschitz map, i.e., there is a constant c' > 0 with
\[ \tfrac{1}{c'} \|x - y\| \le \|g(x) - g(y)\| \le c' \|x - y\| \]
for all $x, y \in U$.
Then g(M) is regular of dimension D.

Proof
Obviously g(U) is open and g(M) is a compact subset of g(U). It follows (from
Falconer (1990, p. 28, 2.9)) that
\[ 0 < \Big(\tfrac{1}{c'}\Big)^D \mathcal{H}^D(M) \le \mathcal{H}^D(g(M)) \le c'^D \mathcal{H}^D(M) < \infty. \]
Let $r_1 = \min_{z \in g(M)} d(z, g(U)^c)$. Then we have $r_1 > 0$. Let $r_0 > 0$ and c > 0 be
such that
\[ \tfrac1c r^D \le \mathcal{H}^D(M \cap \mathring{B}(a, r)) \le c r^D \]
for all $a \in M$ and $r \in (0, r_0)$. Define $r_0' = \min(r_1, \tfrac{1}{c'} r_0)$. For
$y \in g(M)$ and $r \in (0, r_0')$ we obtain
\[ \mathring{B}(g^{-1}(y), \tfrac{1}{c'} r) \subset g^{-1}(\mathring{B}(y, r)) \subset \mathring{B}(g^{-1}(y), c' r) \]
and, hence,
\begin{align*}
\tfrac1c \Big(\tfrac{1}{c'}\Big)^D r^D &\le \mathcal{H}^D(M \cap \mathring{B}(g^{-1}(y), \tfrac{1}{c'} r)) \\
 &\le \mathcal{H}^D(M \cap g^{-1}(\mathring{B}(y, r))) \tag{12.1} \\
 &\le \mathcal{H}^D(M \cap \mathring{B}(g^{-1}(y), c' r)) \\
 &\le c (c' r)^D.
\end{align*}
Since $g(M \cap g^{-1}(\mathring{B}(y, r))) = g(M) \cap \mathring{B}(y, r)$ an elementary property of
Hausdorff measures (see Falconer (1990, p. 28, 2.9)) implies
\begin{align*}
\frac{1}{c'^D} \mathcal{H}^D(M \cap g^{-1}(\mathring{B}(y, r))) &\le \mathcal{H}^D(g(M) \cap \mathring{B}(y, r)) \tag{12.2} \\
 &\le c'^D \mathcal{H}^D(M \cap g^{-1}(\mathring{B}(y, r))).
\end{align*}
Combining (12.1) and (12.2) yields
\[ \frac{1}{c\, c'^{2D}} r^D \le \mathcal{H}^D(g(M) \cap \mathring{B}(y, r)) \le c\, c'^{2D} r^D. \]
Thus $\mathcal{H}^D_{|g(M)}$ has g(M) as its support and g(M) is regular of dimension D. □

We will now give some examples of regular sets of dimension D.

12.7 Example (Convex sets)
Let $K \subset \mathbb{R}^d$ be a nonempty compact convex set. The dimension D of K is defined
as the dimension of the affine subspace of $\mathbb{R}^d$ spanned by K. We will show that K is
a regular set of dimension D.
Without loss of generality we may assume that K spans $\mathbb{R}^d$ (otherwise we take the
affine subspace generated by K and transform it by an affine isometry onto $\mathbb{R}^D$).
In this situation D = d, $\mathcal{H}^d_{|K}$ is just a non-zero multiple of the Lebesgue measure
$\lambda^d_{|K}$, and, obviously,
\[ 0 < \lambda^d(K) < \infty \]
since $\mathrm{int}\, K \ne \emptyset$ (cf. Webster, 1994, p. 61, Theorem 2.3.1). By Remark 12.2 we may
assume that $\mathbb{R}^d$ carries the $l_2$-norm. To prove that K is regular of dimension d it is,
therefore, enough to show that $\lambda^d_{|K}$ is regular of dimension d. Let $r_0 > 0$ be arbitrary.
The map $K \to \mathbb{R}$, $x \mapsto \lambda^d(K \cap \mathring{B}(x, r_0))$ is continuous. Thus
$c = \min_{x \in K} \lambda^d(K \cap \mathring{B}(x, r_0))$ exists. Since
$\mathring{B}(x, r_0) \cap \mathrm{int}\, K \ne \emptyset$ we know that c > 0. For 0 < t < 1 and $x \in K$ the
convexity of K yields
\[ x + t(\mathring{B}(0, r_0) \cap (K - x)) \subset K \cap \mathring{B}(x, t r_0). \]
We deduce
\begin{align*}
\lambda^d(K \cap \mathring{B}(x, t r_0)) &\ge t^d \lambda^d(\mathring{B}(0, r_0) \cap (K - x)) \\
 &= t^d \lambda^d(\mathring{B}(x, r_0) \cap K) \\
 &\ge \frac{c}{r_0^d} (t r_0)^d.
\end{align*}
Thus we obtain, for $r \in (0, r_0)$,
\[ \frac{c}{r_0^d} r^d \le \lambda^d(K \cap \mathring{B}(x, r)) \le \lambda^d(\mathring{B}(x, r))
   \le \lambda^d(\mathring{B}(0, 1)) r^d. \]
12.8 Example (Surfaces of convex sets)
Let $K \subset \mathbb{R}^d$ be nonempty, convex, and compact. Let bd K denote the relative
boundary of K, i.e., the boundary of K relative to the affine subspace spanned by
K. Let D + 1 be the dimension of K. We will show that bd K is regular of dimension
D.
As above we may assume that D + 1 = d and that $\mathbb{R}^d$ carries the $l_2$-norm.
For $x, y \in \mathbb{R}^d$ the number $\langle x, y \rangle \in \mathbb{R}$ is the standard scalar product of x
and y, i.e., if $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_d)$ then
\[ \langle x, y \rangle = \sum_{i=1}^d x_i y_i. \tag{12.3} \]
Our claim is that, for every $x \in \partial K = \mathrm{bd}\, K$, there is an r > 0, a bi-Lipschitz map
$f : \mathring{B}(x, r) \to \mathbb{R}^d$ and a hyperplane $H \subset \mathbb{R}^d$ with
\[ f(\mathring{B}(x, r) \cap \partial K) = f(\mathring{B}(x, r)) \cap H. \]
Once this has been proved the argument is finished as follows. The set $f(\mathring{B}(x, r))$ is
open. Hence there is an s > 0 with $B(f(x), s) \subset f(\mathring{B}(x, r))$. By Example 12.7 the
set $B(f(x), s) \cap H$ is regular of dimension d - 1. The map $g = f^{-1}$ from
$f(\mathring{B}(x, r))$ to $\mathring{B}(x, r)$ maps $B(f(x), s) \cap H$ onto
$f^{-1}(B(f(x), s)) \cap \partial K$. Since g is bi-Lipschitz, Lemma 12.6 implies that
$f^{-1}(B(f(x), s)) \cap \partial K$ is regular of dimension d - 1. Moreover,
$f^{-1}(B(f(x), s)) \cap \partial K$ is a neighbourhood of x in $\partial K$. By Lemma 12.5,
$\partial K$ is a regular set of dimension d - 1.
To prove the claim let $x \in \partial K$ be arbitrary. Then there exists a $\delta > 0$ and a unit
vector $u \in \mathbb{R}^d$ with $x + \delta u \in \mathrm{int}\, K$. Moreover there is an r > 0 with
$B(x + \delta u, r) \subset \mathrm{int}\, K$. Since $x \notin \mathrm{int}\, K$ we know that $r < \delta$. Define
$\varphi : \mathring{B}(x, r) \to \mathbb{R}$ by $\varphi(y) = \min\{t \in \mathbb{R} : y + tu \in K\}$. Then
$\varphi$ is well defined since for an arbitrary $y \in \mathring{B}(x, r)$ we have
$y + \delta u \in \mathring{B}(x + \delta u, r) \subset \mathrm{int}\, K$.
For $y \in \partial K$ the point $y + 0 \cdot u$ belongs to K so that $\varphi(y) \le 0$. Since the open
line segment between $y + \varphi(y)u$ and $y + \delta u$ lies in int K the fact that y belongs to
$\partial K$ implies $\varphi(y) \ge 0$. Thus $y \in \partial K$ yields $\varphi(y) = 0$. Obviously every
$y \in \mathring{B}(x, r)$ with $\varphi(y) = 0$ belongs to $\partial K$.
Next we will show that $\varphi$ is convex. Let $y, y' \in \mathring{B}(x, r)$ and $t \in [0, 1]$ be
arbitrary. Then
\[ ty + (1 - t)y' + (t\varphi(y) + (1 - t)\varphi(y'))u = t(y + \varphi(y)u) + (1 - t)(y' + \varphi(y')u) \in K, \]
hence
\[ \varphi(ty + (1 - t)y') \le t\varphi(y) + (1 - t)\varphi(y'). \]
A convex function on an open set is locally Lipschitz (see Webster, 1994, p. 224/225,
proof of Theorem 5.51). Thus, by making r a bit smaller if necessary we may assume
that $\varphi$ is Lipschitz, i.e., there is a c > 0 with $|\varphi(y) - \varphi(y')| \le c\|y - y'\|$ for all
$y, y' \in \mathring{B}(x, r)$. Define $H = \{y \in \mathbb{R}^d : \langle y, u \rangle = 0\}$ and let $P_H$ be the
orthogonal projection onto H. Define $f : \mathring{B}(x, r) \to \mathbb{R}^d$ by
$f(y) = P_H(y) - \varphi(y)u$. Then f is a Lipschitz map, since
\begin{align*}
\|f(y) - f(y')\| &\le \|P_H(y) - P_H(y')\| + |\varphi(y) - \varphi(y')| \\
 &\le \|y - y'\| + c\|y - y'\| \tag{12.4} \\
 &= (1 + c)\|y - y'\|.
\end{align*}
Now we will show that there is a constant c' > 0 such that
\[ \|f(y) - f(y')\| \ge c'\|y - y'\| \tag{12.5} \]
for all $y, y' \in \mathring{B}(x, r)$.
Let $y, y' \in \mathring{B}(x, r)$ be arbitrary, set $t = \langle x, u \rangle$ and define
$z = P_H(y) + tu$, $z' = P_H(y') + tu$. Since u is orthogonal to H we deduce
\[ \|z - x\|^2 = \|P_H(z - x)\|^2 + \|tu - \langle x, u \rangle u\|^2 = \|P_H(z) - P_H(x)\|^2. \]
Since $P_H(z) = P_H(y)$ we get
\[ \|z - x\|^2 \le \|y - x\|^2 < r^2, \]
hence $z \in \mathring{B}(x, r)$.
Similarly, $z' \in \mathring{B}(x, r)$.
By the definition of $\varphi$ we have
\begin{align*}
\varphi(z) &= \min\{s \in \mathbb{R} : z + su \in K\} \\
 &= \min\{s \in \mathbb{R} : P_H(y) + tu + su \in K\} \\
 &= \min\{s \in \mathbb{R} : P_H(y) + \langle y, u \rangle u + (t + s - \langle y, u \rangle)u \in K\}.
\end{align*}
Since $y = P_H(y) + \langle y, u \rangle u$ we obtain
\begin{align*}
\varphi(z) &= \min\{s \in \mathbb{R} : y + (t + s - \langle y, u \rangle)u \in K\} \tag{12.6} \\
 &= \langle y, u \rangle - t + \varphi(y).
\end{align*}
For the same reason
\[ \varphi(z') = \langle y', u \rangle - t + \varphi(y'). \tag{12.7} \]
Using (12.4), (12.6) and (12.7) and the definition of z and z' leads to
\begin{align*}
|\varphi(y) - \varphi(y')| &= |\varphi(z) + t - \langle y, u \rangle - \varphi(z') - t + \langle y', u \rangle| \\
 &\ge |\langle y - y', u \rangle| - (1 + c)\|z - z'\| \\
 &= |\langle y - y', u \rangle| - (1 + c)\|P_H(y) - P_H(y')\|.
\end{align*}
If $|\langle y - y', u \rangle| \ge 2(1 + c)\|P_H(y) - P_H(y')\|$ then
\[ |\varphi(y) - \varphi(y')| \ge \tfrac12 |\langle y - y', u \rangle|, \]
hence
\begin{align*}
\|y - y'\|^2 &= \|P_H y - P_H y'\|^2 + |\langle y - y', u \rangle|^2 \\
 &\le 4\|P_H y - P_H y'\|^2 + 4|\varphi(y) - \varphi(y')|^2 \tag{12.8} \\
 &= 4\|f(y) - f(y')\|^2.
\end{align*}
If $|\langle y - y', u \rangle| < 2(1 + c)\|P_H y - P_H y'\|$ then
\begin{align*}
\|y - y'\|^2 &= \|P_H y - P_H y'\|^2 + |\langle y - y', u \rangle|^2 \\
 &\le (4(1 + c)^2 + 1)\|P_H y - P_H y'\|^2 \tag{12.9} \\
 &\le 4((1 + c)^2 + 1)\|f(y) - f(y')\|^2.
\end{align*}
Combining (12.8) and (12.9) yields (12.5), so that f is bi-Lipschitz.

Moreover we have
\[ f(\mathring{B}(x, r) \cap \partial K) = H \cap f(\mathring{B}(x, r)). \tag{12.10} \]
This can be seen as follows:
If $y \in \partial K \cap \mathring{B}(x, r)$ then $\varphi(y) = 0$ and, hence,
$f(y) = P_H(y) \in H \cap f(\mathring{B}(x, r))$.
If $y \in \mathring{B}(x, r)$ and $f(y) \in H$ then $\varphi(y) = 0$, hence,
$y \in \partial K \cap \mathring{B}(x, r)$.

12.9 Example (Compact differentiable manifolds)
Let $D \in \{1, \dots, d - 1\}$ and M be a compact D-dimensional $C^1$-submanifold of $\mathbb{R}^d$.
Then M is regular of dimension D.
To prove this we will show that every point in M has a regular neighbourhood of
dimension D in M. The claim then follows from Lemma 12.5. Let $x \in M$ be
arbitrary. By Federer (1969, p. 231, 3.1.19) there exists an open neighbourhood U
of x, an open set V in $\mathbb{R}^d$, a D-dimensional vector subspace W of $\mathbb{R}^d$, and a
$C^1$-diffeomorphism $f : U \to V$ with $f(M \cap U) = W \cap V$. Let r > 0 be such that
$B(f(x), r) \subset V$. By Example 12.7 the set $B(f(x), r) \cap W$ is regular of dimension
D, $g = f^{-1}$ is bi-Lipschitz and maps $B(f(x), r) \cap W$ onto $f^{-1}(B(f(x), r)) \cap M$. By
Lemma 12.6, $f^{-1}(B(f(x), r)) \cap M$ is regular of dimension D. Since
$f^{-1}(B(f(x), r)) \cap M$ is obviously a neighbourhood of x in M the argument is finished.

12.10 Example (Self-similar sets)
Let $S_1, \dots, S_N : \mathbb{R}^d \to \mathbb{R}^d$ be contracting similarity transformations with scaling
numbers $s_1, \dots, s_N \in (0, 1)$, i.e., for $i \in \{1, \dots, N\}$ and all $x, y \in \mathbb{R}^d$,
\[ \|S_i x - S_i y\| = s_i \|x - y\|. \]
It was shown by Hutchinson (1981) that there is a unique non-empty compact set A
in $\mathbb{R}^d$ with
\[ A = S_1(A) \cup \dots \cup S_N(A). \]
This set is called the self-similar set corresponding to $S_1, \dots, S_N$. It is easy to see
that there is a unique real number D > 0 with
\[ \sum_{i=1}^N s_i^D = 1, \]
the similarity dimension of $(S_1, \dots, S_N)$.
$(S_1, \dots, S_N)$ is said to satisfy the open set condition (OSC) if there exists a nonempty
open set $U \subset \mathbb{R}^d$ with $S_i(U) \subset U$ and $S_i(U) \cap S_j(U) = \emptyset$ for $i \ne j$,
$i, j = 1, \dots, N$.
Recovering an older result of Moran (1946), Hutchinson (1981, Theorem (1)) proved
that
\[ 0 < \mathcal{H}^D(A) < \infty \]
if $(S_1, \dots, S_N)$ satisfies the open set condition. Hutchinson (1981, p. 737/738) also
showed that there are constants $c_1, c_2 > 0$ and $r_0 > 0$ with
\[ c_1 r^D \le \mathcal{H}^D(A \cap \mathring{B}(x, r)) \le c_2 r^D \]
for all $x \in A$ and all $r \in (0, r_0)$.
In our language this means that A is regular of dimension D.
Classical examples of self-similar sets satisfying the OSC are the Cantor set, the
Sierpinski gasket, and the von Koch curve. The Cantor set is the self-similar set
corresponding to the two contractions $S_1, S_2 : \mathbb{R} \to \mathbb{R}$ with $S_1(x) = \tfrac13 x$ and
$S_2(x) = \tfrac13 x + \tfrac23$.
The Sierpinski gasket corresponds to the three contractions
$S_1, S_2, S_3 : \mathbb{R}^2 \to \mathbb{R}^2$ with $S_1 x = \tfrac12 x$, $S_2 x = \tfrac12 x + (\tfrac12, 0)$ and
$S_3 x = \tfrac12 x + (\tfrac14, \tfrac{\sqrt{3}}{4})$. The von Koch curve is
generated by the four contractions $S_1, S_2, S_3, S_4 : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
$S_1(x) = \tfrac13 x$, $S_2(x) = \tfrac13 D_{\pi/3} x + (\tfrac13, 0)$,
$S_3(x) = \tfrac13 D_{-\pi/3} x + (\tfrac12, \tfrac{\sqrt{3}}{6})$, $S_4(x) = \tfrac13 x + (\tfrac23, 0)$, where
$D_\theta$ is the counter-clockwise rotation about the origin with angle $\theta$. By Hutchinson's
result these sets are regular of dimension D, for $D = \frac{\log 2}{\log 3}$,
$D = \frac{\log 3}{\log 2}$ and $D = \frac{\log 4}{\log 3}$ respectively. For more information about
these and other self-similar sets we refer the reader to the book of Falconer (1990).
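
The similarity dimension is defined only implicitly, as the root of $\sum_i s_i^D = 1$. A
minimal sketch of how one might compute it numerically is given below; it uses plain
bisection, relying on the fact that $D \mapsto \sum_i s_i^D$ is strictly decreasing for scaling
numbers in (0, 1). The function name and the tolerance are assumptions of this example.

```python
import math

def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(s**D for s in ratios) == 1 for D >= 0 by bisection."""
    if not ratios or any(not 0.0 < s < 1.0 for s in ratios):
        raise ValueError("scaling numbers must lie in (0, 1)")

    def excess(D):
        return sum(s ** D for s in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:      # enlarge the bracket until the sum drops below 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(similarity_dimension([1 / 3, 1 / 3]), math.log(2) / math.log(3))   # Cantor set
print(similarity_dimension([1 / 2] * 3), math.log(3) / math.log(2))      # Sierpinski gasket
print(similarity_dimension([1 / 3] * 4), math.log(4) / math.log(3))      # von Koch curve
```

For equal scaling numbers the root has the closed form $\log N / (-\log s)$, which the
three printed pairs can be compared against; bisection is only needed when the $s_i$ differ.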

Before we deal with the quantization of regular measures of dimension D let us men-
tion the following characterization of these measures.

12.11 Proposition
Let $\mu$ be a finite measure on $\mathbb{R}^d$ with compact support K. Then the following
properties are equivalent:

(i) $\mu$ is regular of dimension D.

(ii) K is regular of dimension D and there exists a c > 0 with
$\tfrac1c \mathcal{H}^D_{|K} \le \mu \le c \mathcal{H}^D_{|K}$.

Proof
That (i) implies (ii) is an immediate consequence of a proposition in Falconer (1990,
Proposition 4.9) and the definition of regularity of dimension D.
The converse implication (ii) $\Rightarrow$ (i) follows from the definition of regularity of
dimension D. □

12.2 Asymptotics for the quantization error

12.12 Proposition
Let P be a probability on $\mathbb{R}^d$. Assume that there is a c > 0 and an $r_0 > 0$ such that
\[ P(\mathring{B}(a, r)) \le c r^D \tag{12.11} \]
for all $a \in \mathrm{supp}(P)$ and all $r \in (0, r_0)$.
Then there exists a constant c' > 0 with
\[ \int_B \|x - a\| \, dP(x) \ge c' P(B)^{1 + \frac1D} \tag{12.12} \]
for all $a \in \mathbb{R}^d$ and all Borel sets $B \subset \mathbb{R}^d$.

Proof
By Lemma 12.3 there is a $\tilde{c} > 0$ with $P(\mathring{B}(a, r)) \le \tilde{c} r^D$ for all
$a \in \mathbb{R}^d$ and all r > 0.
Let $a \in \mathbb{R}^d$ and let B be a Borel subset of $\mathbb{R}^d$. If P(B) = 0 then the conclusion
(12.12) obviously holds. Let us assume P(B) > 0 and set
\[ r_B = \inf\{r > 0 : P(\mathring{B}(a, r)) \ge \tfrac12 P(B)\}. \]
Since $\lim_{r \to \infty} P(\mathring{B}(a, r)) = 1 \ge P(B)$ there is an r > 0 with
$P(\mathring{B}(a, r)) \ge \tfrac12 P(B)$. Hence, $r_B < \infty$. For $r > r_B$ we have
\[ \tilde{c} r^D \ge P(\mathring{B}(a, r)) \ge \tfrac12 P(B) \]
which implies
\[ \tilde{c} r_B^D \ge \tfrac12 P(B). \tag{12.13} \]
For $r < r_B$ we have
\[ P(\mathring{B}(a, r)) < \tfrac12 P(B). \]
Since P(B) > 0 there is an r > 0 with
\[ P(\mathring{B}(a, r)) \le \tilde{c} r^D < \tfrac12 P(B). \]
If $(r_n)_{n \in \mathbb{N}}$ is any increasing sequence with $r_n < r_B$ and $\lim r_n = r_B$ we deduce
from $\mathring{B}(a, r_B) = \bigcup_{n \in \mathbb{N}} \mathring{B}(a, r_n)$ that
\[ P(\mathring{B}(a, r_B)) = \lim_{n \to \infty} P(\mathring{B}(a, r_n)) \le \tfrac12 P(B). \tag{12.14} \]
Using (12.14) and (12.13) we get
\begin{align*}
\int_B \|x - a\| \, dP(x) &\ge \int_{B \setminus \mathring{B}(a, r_B)} \|x - a\| \, dP(x) \\
 &\ge r_B P(B \setminus \mathring{B}(a, r_B)) \\
 &\ge r_B (P(B) - P(\mathring{B}(a, r_B))) \\
 &\ge \tfrac12 r_B P(B) \\
 &\ge \tfrac12 \Big(\frac{1}{2\tilde{c}}\Big)^{\frac1D} P(B)^{1 + \frac1D}.
\end{align*}
□
12.13 Corollary
Let P be a probability on $\mathbb{R}^d$. Then the following conditions are equivalent:

(i) There exists a c > 0 with
\[ P(\mathring{B}(a, r)) \le c r^D \]
for all r > 0 and all $a \in \mathbb{R}^d$.

(ii) There exists a c' > 0 with
\[ \int_{\mathring{B}(a, r)} \|x - a\| \, dP(x) \ge c' P(\mathring{B}(a, r))^{1 + \frac1D} \]
for all r > 0 and all $a \in \mathbb{R}^d$.

(iii) There exists a c'' > 0 with
\[ \int_B \|x - a\| \, dP(x) \ge c'' P(B)^{1 + \frac1D} \]
for all $a \in \mathbb{R}^d$ and all Borel sets $B \subset \mathbb{R}^d$.

Proof
That (i) implies (iii) is Proposition 12.12 and that (iii) implies (ii) follows by setting
$B = \mathring{B}(a, r)$. It remains to show that (ii) implies (i). Obviously, (ii) yields
\[ r P(\mathring{B}(a, r)) \ge \int_{\mathring{B}(a, r)} \|x - a\| \, dP(x) \ge c' P(\mathring{B}(a, r))^{1 + \frac1D} \]
and, hence,
\[ \Big(\frac{1}{c'}\Big)^D r^D \ge P(\mathring{B}(a, r)). \]
□

12.14 Corollary
Let P be a probability on $\mathbb{R}^d$. Suppose that there is a c > 0 and an $r_0 > 0$ with
\[ P(\mathring{B}(a, r)) \le c r^D \]
for all $a \in \mathrm{supp}(P)$ and all $r \in (0, r_0)$. Then there is a constant b > 0 such that, for
every packing $\{B_1, \dots, B_n\}$ in $\mathbb{R}^d$ consisting of Borel sets with
$P(\mathbb{R}^d \setminus \bigcup_{i=1}^n B_i) = 0$ and all $a_1, \dots, a_n \in \mathbb{R}^d$,
\[ n \Big( \sum_{i=1}^n \int_{B_i} \|x - a_i\| \, dP(x) \Big)^D \ge b. \]

Proof
Without loss of generality D > 0. Set $p = 1 + \frac1D$ and $q = 1 + D$. Then we have
$\frac1p + \frac1q = 1$ and p > 1. Hölder's inequality yields
\[ S := \sum_{i=1}^n \Big( \int_{B_i} \|x - a_i\| \, dP(x) \Big)^{\frac{D}{D+1}}
   \le n^{\frac1q} \Big( \sum_{i=1}^n \int_{B_i} \|x - a_i\| \, dP(x) \Big)^{\frac1p}. \]
This implies
\[ S^q \le n \Big( \sum_{i=1}^n \int_{B_i} \|x - a_i\| \, dP(x) \Big)^D. \]
Using Proposition 12.12 we get for the constant c' > 0 of that proposition
\[ S = \sum_{i=1}^n \Big( \int_{B_i} \|x - a_i\| \, dP(x) \Big)^{\frac{D}{D+1}}
   \ge (c')^{\frac{D}{D+1}} \sum_{i=1}^n P(B_i) = (c')^{\frac{D}{D+1}}. \]
Thus, the corollary holds, if we set $b = (c')^D$. □

12.15 Proposition
Let P be a probability on $\mathbb{R}^d$. Suppose that there are constants c > 0 and $r_0 > 0$
with
\[ P(\mathring{B}(a, r)) \le c r^D \]
for every $a \in \mathrm{supp}(P)$ and every $r \in (0, r_0)$. Then
\[ \liminf_{n \to \infty} n e_{n,1}^D > 0. \]

Proof
Let $\alpha_n \in C_{n,1}$ and let $\{A_a : a \in \alpha_n\}$ be a Voronoi partition of $\mathbb{R}^d$ with
respect to $\alpha_n$. By Corollary 12.14 we have
\[ n e_{n,1}^D = n \Big( \sum_{a \in \alpha_n} \int_{A_a} \|x - a\| \, dP(x) \Big)^D \ge b > 0, \]
where b is as in Corollary 12.14. This implies the proposition. □

12.16 Corollary
Let P be a probability on $\mathbb{R}^d$. Then
\[ \dim_H(P) \le \underline{D}_r(P) \]
for all $r \ge 1$.

Proof
By Corollary 11.4 it suffices to prove $\dim_H(P) \le \underline{D}_1(P)$. If $\dim_H(P) = 0$ then
there is nothing to show. So let $\dim_H(P) > 0$ and let t with $0 < t < \dim_H(P)$ be
arbitrary. By Falconer (1997, Prop. 10.3) we have
\[ \dim_H(P) = \inf\Big\{ s \in \mathbb{R} : \liminf_{r \to 0} \frac{\log P(\mathring{B}(x, r))}{\log r} \le s
   \text{ for } P\text{-a.e. } x \Big\}. \]
This implies
\[ P\Big(\Big\{ x \in \mathbb{R}^d : \liminf_{r \to 0} \frac{\log P(\mathring{B}(x, r))}{\log r} > t \Big\}\Big) > 0 \]
and, hence,
\[ P(\{ x \in \mathbb{R}^d : \exists r_x > 0 \ \forall r \le r_x : P(\mathring{B}(x, r)) < r^t \}) > 0. \]
Thus, there exists a compact set $K \subset \mathbb{R}^d$ with P(K) > 0 and an $r_0 > 0$ such that
\[ P(\mathring{B}(x, r)) \le r^t \]
for all $x \in K$ and all $r \le r_0$.
Set $Q = \frac{1}{P(K)} P_{|K}$. Then it follows that
\[ Q(\mathring{B}(x, r)) \le \frac{1}{P(K)} r^t \]
for all $x \in \mathrm{supp}(Q)$ and all $r \le r_0$ and Proposition 12.15 yields
\[ \liminf_{n \to \infty} n e_{n,1}(Q)^t > 0. \]
Since $e_{n,1}(Q) \le \frac{1}{P(K)} e_{n,1}(P)$ this leads to
\[ \liminf_{n \to \infty} n e_{n,1}(P)^t > 0. \]
Using Proposition 11.3 we deduce
\[ t \le \underline{D}_1. \]
Since $t < \dim_H(P)$ was arbitrary this implies
\[ \dim_H(P) \le \underline{D}_1. \]
□

12.17 Proposition
Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Assume that there is a $c > 0$
and an $r_0 > 0$ with

\[ c\,r^D \le P(\mathring{B}(a,r)) \]

for every $a \in K$ and every $r \in (0, r_0)$.

Then

\[ \limsup_{n\to\infty} n\,e_{n,\infty}(P)^D < \infty. \]

Proof
If there is an $n_0 \in \mathbb{N}$ with $e_{n_0,\infty} = 0$ then $e_{n,\infty} = 0$ for all $n \ge n_0$ and the assertion
of the proposition is obvious. So let us assume that $e_{n,\infty} > 0$ for all $n \in \mathbb{N}$. Since
$K$ is compact there exists a finite set $\alpha_n \subset K$ of maximum cardinality satisfying
$\|x - y\| \ge e_{n,\infty}$ for all $x, y \in \alpha_n$ with $x \ne y$.

We will show that $|\alpha_n| \ge n$. Assume the contrary. Then we know that

\[ e_{n,\infty} \le \sup_{x \in K} d_{\alpha_n}(x). \]

Hence there exists a $y \in K$ with $\|y - a\| \ge e_{n,\infty}$ for all $a \in \alpha_n$, which contradicts the
maximality of $\alpha_n$.
For $x, y \in \alpha_n$ with $x \ne y$ we have

\[ \mathring{B}(x, \tfrac{1}{2} e_{n,\infty}) \cap \mathring{B}(y, \tfrac{1}{2} e_{n,\infty}) = \emptyset, \]

hence

\[ 1 = P(K) \ge P\Big( \bigcup_{a \in \alpha_n} \mathring{B}(a, \tfrac{1}{2} e_{n,\infty}) \Big) = \sum_{a \in \alpha_n} P(\mathring{B}(a, \tfrac{1}{2} e_{n,\infty})). \]

Due to $\lim_{n\to\infty} e_{n,\infty} = 0$ there is an $n_1 \in \mathbb{N}$ with

\[ \tfrac{1}{2} e_{n,\infty} < r_0 \]

for all $n \ge n_1$. Thus, for $n \ge n_1$,

\[ 1 \ge \sum_{a \in \alpha_n} c\,(\tfrac{1}{2} e_{n,\infty})^D = c\,|\alpha_n|\,(\tfrac{1}{2} e_{n,\infty})^D \ge c\,n\,2^{-D} e_{n,\infty}^D \]

and, therefore,

\[ n\,e_{n,\infty}^D \le \frac{2^D}{c}. \]

This proves the proposition. []


Now we can formulate and prove the main result concerning the asymptotics of quan-
tization errors for regular probabilities.

12.18 Theorem
Let $P$ be a regular probability of dimension $D$ on $\mathbb{R}^d$. Then, for $1 \le r \le \infty$,

\[ 0 < \liminf_{n\to\infty} n\,e_{n,r}(P)^D \le \limsup_{n\to\infty} n\,e_{n,r}(P)^D < \infty. \tag{12.15} \]

In particular the quantization dimension $D_r(P)$ agrees with $D$ which is also the
Hausdorff dimension of the support of $P$.

Proof
The inequality (12.15) follows immediately from Propositions 12.15, 12.17 and Lemma
10.1 (a). The remaining statements follow from Corollary 11.4 and Proposition 12.11.
[]

12.19 Remark
It remains an open question for which regular probabilities $P$ of dimension $D$ the
limit

\[ \lim_{n\to\infty} n\,e_{n,r}^D \]

exists in $(0, \infty)$. Recall from (6.4) that in this situation, for $1 \le r < \infty$,

\[ Q_r(P) = \lim_{n\to\infty} n^{r/D}\, V_{n,r}(P) = \Big( \lim_{n\to\infty} n\,e_{n,r}(P)^D \Big)^{r/D} \]

is called the r-th quantization coefficient of $P$. It follows from Theorem 6.2 that
for the normalized volume measure $P$ of a convex compact set the r-th quantization
coefficient exists. We conjecture that the same is true for the normalized surface
measures on convex compact sets and compact $C^1$-manifolds. For the natural Hausdorff
measure on a self-similar set the quantization coefficients exist in some cases while
in other cases (like the classical Cantor set) they need not exist. We will discuss
measures on self-similar sets in Section 14.

Notes

The concept of regularity for sets and measures can be found in several books on
geometric measure theory (see, for instance, David and Semmes (1993, 1997) and Mattila
(1995)). Since there are several different notions of regularity for sets and measures
the above regularity is sometimes called Ahlfors-David regularity (see Mattila, 1995,
p. 92). The elementary results on regular sets of dimension $D$ (Lemmas 12.4-12.6)
and the results concerning the regularity of convex sets and their boundaries as well
as that of compact $C^1$-manifolds are probably well-known. We just could not find an
explicit reference. To our knowledge the results concerning the quantization of regular
sets and measures of dimension $D$ as stated above are new. A good introduction to
the theory of convex sets is Webster (1994). The basic theory concerning self-similar
sets as well as many examples can be found in Barnsley (1988). For the canonical
normalized Hausdorff measure $P$ on a self-similar set with OSC the inequalities in
(12.15) were first proved in Graf and Luschgy (1996). After this book had essentially
been finished Pötzelberger (1998a) gave different conditions for a probability $P$ to
ensure that $0 < \liminf_{n\to\infty} n\,e_{n,2}(P)^D$ or $\limsup_{n\to\infty} n\,e_{n,2}(P)^D < \infty$ or $\lim_{n\to\infty} n\,e_{n,2}(P)^D$
exists (for the $l_2$-norm), where $D$ is suitably chosen.

13 Rectifiable curves

Here we consider the length measures on rectifiable curves. These measures can be
obtained by restricting the one-dimensional Hausdorff measure to the given rectifiable
curve. In this way we get an elementary class of singular measures of quantization
dimension 1 for which the quantization coefficients exist and will be calculated. Nev-
ertheless there are simple examples that show that the length measure on a rectifiable
curve need not be regular of dimension 1 (see below).

In this section $\|\ \|$ will always denote a Euclidean norm on $\mathbb{R}^d$. First we will collect
some basic results about rectifiable curves.
13.1 Definition
Let $a, b \in \mathbb{R}$ with $a < b$. A curve (more exactly, a Jordan curve) $\Gamma$ is the image of a
continuous injection $\gamma : [a, b] \to \mathbb{R}^d$. $\gamma$ is called a parametrization of $\Gamma$. A curve is
called rectifiable if

\[ L = L(\Gamma) = \sup\Big\{ \sum_{i=1}^{n} \|\gamma(t_i) - \gamma(t_{i-1})\| : n \in \mathbb{N},\ a = t_0 < \dots < t_n = b \Big\} < \infty. \]

$L$ is called the length of the curve $\Gamma$.
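
The supremum in the definition of the length can be approached numerically by polygonal sums over finer and finer partitions. The following is a minimal sketch, not taken from the text: the helper name `polygonal_length` and the choice of the unit upper semicircle (of length $\pi$) as test curve are illustrative assumptions.

```python
import math

def polygonal_length(gamma, a, b, n):
    """Sum of chord lengths of gamma over the uniform partition a = t_0 < ... < t_n = b."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    pts = [gamma(t) for t in ts]
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, n + 1))

def semicircle(t):
    # Hypothetical test curve: a continuous injection of [0, pi] into R^2.
    return (math.cos(t), math.sin(t))

# Each partition refines the previous one, so the sums increase towards L = pi.
sums = [polygonal_length(semicircle, 0.0, math.pi, 2 ** k) for k in range(1, 11)]
```

Because every partition in the list refines its predecessor, the triangle inequality forces the polygonal sums to be non-decreasing, which is the mechanism behind taking a supremum in the definition.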

13.2 Lemma
If $\Gamma \subset \mathbb{R}^d$ is a curve then $\mathcal{H}^1(\Gamma) = L(\Gamma)$, in particular $L(\Gamma)$ does not depend on the
parametrization $\gamma$.

Proof
See Falconer (1985, p. 29, Lemma 3.2). []

13.3 Definition
Let $\Gamma$ be a rectifiable curve of length $L$. A continuous injection $\gamma : [0, L] \to \Gamma$ is called
a parametrization by arc length if $L(\gamma([0, t])) = t$ for all $t \in [0, L]$.

13.4 Remark
Every rectifiable curve admits a parametrization by arc length (see Falconer, 1985,
p. 29).

13.5 Lemma
Let $\Gamma$ be a rectifiable curve of length $L$ and $\gamma : [0, L] \to \Gamma$ a parametrization by arc
length. Let $\mu$ be 1-dimensional Lebesgue measure restricted to $[0, L]$. Then

\[ \|\gamma(t) - \gamma(s)\| \le |t - s| \ \text{ for all } s, t \in [0, L] \quad\text{and}\quad \mathcal{H}^1_{|\Gamma} = \mu \circ \gamma^{-1}. \tag{13.1} \]

Proof
The lemma follows immediately from Falconer (1985, p. 29, (3.2) and p. 30, Corollary
3.3). []

13.6 Lemma
Let $\Gamma$ be a rectifiable curve, $x \in \Gamma$ and $r \in (0, \frac{1}{2}\operatorname{diam}\Gamma)$. Then

\[ \mathcal{H}^1(\Gamma \cap \mathring{B}(x, r)) \ge r. \tag{13.2} \]

Proof
This follows from Falconer (1985, p. 30, Lemma 3.4). []
Inequality (13.2) is one half of the condition for regularity of dimension 1 of a recti-
fiable curve. That, in general, a rectifiable curve need not be regular of dimension 1
is shown by the following example.

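13.7 Example
Let $m \in \mathbb{N}$, $m \ge 1$. Then there exists a continuous injection $\gamma_m : [\frac{1}{m+1}, \frac{1}{m}] \to \mathbb{R}^2$ with
$\gamma_m(\frac{1}{m+1}) = (\frac{1}{m+1}, 0)$, $\gamma_m(\frac{1}{m}) = (\frac{1}{m}, 0)$, $\frac{1}{m+1} < \|\gamma_m(t)\| < \frac{1}{m}$ for all $t \in (\frac{1}{m+1}, \frac{1}{m})$, and

\[ l_m := L\big(\gamma_m([\tfrac{1}{m+1}, \tfrac{1}{m}])\big) = \max\Big( \frac{1}{m} - \frac{1}{m+1},\ \frac{1}{\sqrt{m}} - \frac{1}{\sqrt{m+1}} \Big). \]

Define $\gamma : [0, 1] \to \mathbb{R}^2$ by

\[ \gamma(t) = \begin{cases} 0, & t = 0, \\ \gamma_m(t), & \text{if } m \in \mathbb{N} \text{ with } t \in [\frac{1}{m+1}, \frac{1}{m}]. \end{cases} \]

Then $\Gamma = \gamma([0, 1])$ is a rectifiable curve since

\[ L(\Gamma) \le \sum_{m=1}^{\infty} l_m < \infty. \]

Moreover, for $m \ge 2$,

\[ \mathcal{H}^1(\mathring{B}(0, \tfrac{1}{m}) \cap \Gamma) = \sum_{k=m}^{\infty} l_k. \]

Since there exists an $m_0 \in \mathbb{N}$ with

\[ \frac{1}{\sqrt{m}} - \frac{1}{\sqrt{m+1}} \ge \frac{1}{m} - \frac{1}{m+1} \]

for $m \ge m_0$, we have, for these $m$,

\[ \sum_{k=m}^{\infty} l_k = \frac{1}{\sqrt{m}} \]

so that

\[ \mathcal{H}^1(\mathring{B}(0, \tfrac{1}{m}) \cap \Gamma) = \frac{1}{\sqrt{m}}. \]

This shows that

\[ \sup_{r > 0} \frac{1}{r}\, \mathcal{H}^1(\mathring{B}(0, r) \cap \Gamma) = +\infty \]

and $\Gamma$ is not regular of dimension 1.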

Before we come to the quantization of rectifiable curves we will prove a result
concerning the distance of the n-optimal set of centers of order r from the support of the
probability in question.

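13.8 Lemma
Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Let $1 \le r < \infty$ and let $\alpha_n$
be an n-optimal set of centers for $P$ of order $r$. Define

\[ \delta_n = \max_{a \in \alpha_n}\ \max_{x \in W(a|\alpha_n) \cap K} \|x - a\| = \|d_{\alpha_n}\|_\infty. \]

Then

\[ \frac{\delta_n}{2}\, \min_{x \in K} P\big(\mathring{B}(x, \tfrac{\delta_n}{2})\big)^{1/r} \le e_{n,r}(P). \tag{13.3} \]

Proof
Let $a \in \alpha_n$ satisfy $\delta_n = \max_{x \in W(a|\alpha_n) \cap K} \|x - a\|$. Then there exists an $x \in W(a|\alpha_n) \cap K$
with $\delta_n = \|x - a\|$. For every $b \in \alpha_n$ we have

\[ \|x - b\| \ge \|x - a\|. \]

For every $y \in \mathring{B}(x, \frac{1}{2}\delta_n)$ and every $b \in \alpha_n$ this yields

\[ \|y - b\| \ge \|x - b\| - \|x - y\| \ge \|x - a\| - \|x - y\| = \delta_n - \|x - y\| \ge \tfrac{1}{2}\delta_n. \]

Using this inequality we deduce

\[ e_{n,r}^r = V_{n,r} = \int d_{\alpha_n}(z)^r\,dP(z) \ge \int_{K \cap \mathring{B}(x, \frac{1}{2}\delta_n)} d_{\alpha_n}(z)^r\,dP(z) \ge (\tfrac{1}{2}\delta_n)^r\, P\big(K \cap \mathring{B}(x, \tfrac{1}{2}\delta_n)\big) \]

and the lemma is proved. []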

13.9 Corollary
Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Let $n \le |K|$, $1 \le r < \infty$ and
let $\alpha_n$ be an n-optimal set of centers for $P$ of order $r$. Then, for every $a \in \alpha_n$,

\[ \tfrac{1}{2} d(a, K)\, \min_{y \in K} P\big(\mathring{B}(y, \tfrac{1}{2} d(a, K))\big)^{1/r} \le e_{n,r}(P). \tag{13.4} \]

Proof
The corollary follows from Lemma 13.8 if one observes that

\[ \delta_n \ge d(a, K) \]

for all $a \in \alpha_n$, since $W(a|\alpha_n) \cap K \ne \emptyset$ for all $a \in \alpha_n$ by (4.1). []
First we give a quantization result for line segments. For $x, y \in \mathbb{R}^d$ let $[x, y]$ be the
line segment from $x$ to $y$, i.e.

\[ [x, y] = \{(1 - t)x + ty : t \in [0, 1]\}. \]

It is a well-known fact that for $f : [0, 1] \to \mathbb{R}^d$ with $f(t) = (1 - t)x + ty$ we have

\[ \frac{1}{\|x - y\|}\, \mathcal{H}^1_{|[x,y]} = U([0, 1]) \circ f^{-1} \tag{13.5} \]

(see Lemma 13.5).
13.10 Lemma
Let $x, y \in \mathbb{R}^d$ with $x \ne y$ be given. Let $P = \frac{1}{\|x - y\|}\, \mathcal{H}^1_{|[x,y]}$. Then, for $1 \le r < \infty$ and
$n \ge 1$,

\[ e_{n,r}(P) = \Big(\frac{1}{1 + r}\Big)^{1/r} \frac{\|x - y\|}{2n}. \tag{13.6} \]

Proof
By the remark preceding the lemma we have $P = U([0, 1]) \circ f^{-1}$. Let
$1 \le r < \infty$.

"$\le$": For $\alpha \in C_{n,r}(U([0, 1]))$ the set $\beta = f(\alpha)$ has $n$ points. Hence we deduce

\[ V_{n,r}(P) \le \int \min_{b \in \beta} \|z - b\|^r\,dP(z) = \int \min_{a \in \alpha} \|f(t) - f(a)\|^r\,dU([0, 1])(t). \]

Since $\|f(t) - f(a)\| = |t - a|\,\|x - y\|$ we obtain

\[ V_{n,r}(P) \le \|x - y\|^r \int \min_{a \in \alpha} |t - a|^r\,dU([0, 1])(t) = \|x - y\|^r\, V_{n,r}(U([0, 1])). \]

According to Example 5.5 we have

\[ V_{n,r}(U([0, 1])) = \frac{1}{(1 + r)(2n)^r}. \]

This yields

\[ e_{n,r}(P) \le \Big(\frac{1}{1 + r}\Big)^{1/r} \frac{\|x - y\|}{2n}. \]

"$\ge$": Let $\beta \in C_{n,r}(P)$. Since $\operatorname{supp}(P) = [x, y]$ is convex and since the underlying
norm is Euclidean, Remark 4.6 yields $\beta \subset [x, y]$. Let $\alpha \subset [0, 1]$ equal $f^{-1}(\beta)$.
Then $\alpha$ has $n$ points and we get

\[ V_{n,r}(P) = \int \min_{b \in \beta} \|z - b\|^r\,dP(z) = \int \min_{a \in \alpha} \|f(t) - f(a)\|^r\,dU([0, 1])(t) \]
\[ = \|x - y\|^r \int \min_{a \in \alpha} |t - a|^r\,dU([0, 1])(t) \ge \|x - y\|^r\, V_{n,r}(U([0, 1])) = \|x - y\|^r \frac{1}{(1 + r)(2n)^r} \]

and, hence,

\[ e_{n,r}(P) \ge \Big(\frac{1}{1 + r}\Big)^{1/r} \frac{\|x - y\|}{2n}. \]

[]
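
The closed form used in the proof can be checked numerically. The sketch below is illustrative only: it assumes (as in Example 5.5, cited above) that the midpoints $(2i-1)/(2n)$ are optimal for $U([0,1])$, approximates the distortion integral by a midpoint rule, and evaluates the right-hand side of (13.6). The function names and the grid size are hypothetical choices, not part of the text.

```python
def distortion(centers, r, grid=20000):
    """Midpoint-rule approximation of the integral of min_a |t - a|^r over [0, 1]."""
    total = 0.0
    for k in range(grid):
        t = (k + 0.5) / grid
        total += min(abs(t - a) for a in centers) ** r
    return total / grid

def midpoint_centers(n):
    """Centers (2i - 1)/(2n), i = 1..n, i.e. the midpoints of n equal subintervals."""
    return [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]

def segment_error(n, r, length):
    """Right-hand side of (13.6) for a segment with ||x - y|| = length."""
    return (1.0 / (1.0 + r)) ** (1.0 / r) * length / (2 * n)

# Distortion of the midpoint configuration for n = 4, r = 2; compare with 1/((1+r)(2n)^r).
opt = distortion(midpoint_centers(4), 2)
```

Scaling the unit-interval value by $\|x-y\|^r$ and taking the r-th root gives `segment_error`, mirroring the two inequalities of the proof.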

Now we will give a first lower bound for the quantization errors for a normalized
one-dimensional Hausdorff measure on a rectifiable curve.
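13.11 Lemma
Let $\gamma : [a, b] \to \mathbb{R}^d$ be a continuous injection which is a parametrization of the
rectifiable curve $\Gamma$ with length $L > 0$. Let $P = \frac{1}{L}\mathcal{H}^1_{|\Gamma}$. Then, for $1 \le r < \infty$,

\[ e_{n,r}(P) \ge \frac{1}{2n} \Big( \frac{\|\gamma(b) - \gamma(a)\|^{1+r}}{(1 + r)L} \Big)^{1/r}. \tag{13.7} \]

Proof
Let $G = \{(1 - t)\gamma(a) + t\gamma(b) : t \in \mathbb{R}\}$ be the line through $\gamma(a)$ and $\gamma(b)$. Let $p_G$ be
the orthogonal projection onto $G$. By $[\gamma(a), \gamma(b)] = \{(1 - t)\gamma(a) + t\gamma(b) : t \in [0, 1]\}$ we
denote the line segment from $\gamma(a)$ to $\gamma(b)$. Let $Q$ denote the image of $P$ with respect
to $p_G$. First we will show that

\[ \mathcal{H}^1_{|p_G(\Gamma)} \le \mathcal{H}^1_{|\Gamma} \circ p_G^{-1}. \tag{13.8} \]

Let $B$ be a Borel set in $\mathbb{R}^d$. Then using the fact that

\[ \|p_G(x) - p_G(y)\| \le \|x - y\| \]

for all $x, y \in \mathbb{R}^d$ and Falconer (1985, p. 27, Proposition 2.2) we get

\[ \mathcal{H}^1_{|\Gamma}(p_G^{-1}(B)) = \mathcal{H}^1(p_G^{-1}(B) \cap \Gamma) \ge \mathcal{H}^1(p_G(p_G^{-1}(B) \cap \Gamma)) = \mathcal{H}^1(B \cap p_G(\Gamma)). \]

Thus, (13.8) is proved.

Now let $\alpha \in C_{n,r}(P)$. Using (13.8), the fact that $[\gamma(a), \gamma(b)] \subset p_G(\Gamma)$, and Lemma
13.10 we obtain

\[ e_{n,r}^r = V_{n,r} = \int_\Gamma d(x, \alpha)^r\,dP(x) = \frac{1}{L} \int_\Gamma d(x, \alpha)^r\,d\mathcal{H}^1_{|\Gamma}(x) \]
\[ \ge \frac{1}{L} \int_\Gamma d(p_G(x), p_G(\alpha))^r\,d\mathcal{H}^1_{|\Gamma}(x) = \frac{1}{L} \int_{p_G(\Gamma)} d(y, p_G(\alpha))^r\,d\big(\mathcal{H}^1_{|\Gamma} \circ p_G^{-1}\big)(y) \]
\[ \ge \frac{1}{L} \int_{p_G(\Gamma)} d(y, p_G(\alpha))^r\,d\mathcal{H}^1_{|p_G(\Gamma)}(y) \ge \frac{1}{L} \int_{[\gamma(a), \gamma(b)]} d(y, p_G(\alpha))^r\,d\mathcal{H}^1(y) \]
\[ \ge \frac{1}{L}\, \|\gamma(b) - \gamma(a)\|\; V_{n,r}\Big( \frac{1}{\|\gamma(b) - \gamma(a)\|}\, \mathcal{H}^1_{|[\gamma(a), \gamma(b)]} \Big) = \frac{1}{L}\, \frac{\|\gamma(b) - \gamma(a)\|}{1 + r} \Big( \frac{\|\gamma(b) - \gamma(a)\|}{2n} \Big)^r. \]

Thus, the lemma is proved. []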

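13.12 Theorem
Let $\Gamma \subset \mathbb{R}^d$ be a rectifiable curve with length $L > 0$. Set $P = \frac{1}{L}\mathcal{H}^1_{|\Gamma}$. Then

(i) for $1 \le r < \infty$, $\lim_{n\to\infty} n\,e_{n,r}(P) = Q_r([0, 1])^{1/r}\, \mathcal{H}^1(\Gamma) = \big(\frac{1}{1+r}\big)^{1/r} \frac{L}{2}$,
(ii) $\lim_{n\to\infty} n\,e_{n,\infty}(P) = Q_\infty([0, 1])\, \mathcal{H}^1(\Gamma) = \frac{L}{2}$.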

Proof
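Let $\gamma : [0, L] \to \Gamma$ be a parametrization of $\Gamma$ by arc length.
First we will show (i).

"$\ge$": Let $0 = t_0 < t_1 < \dots < t_m = L$ and choose $t_0 = t_0^- = t_0^+$, $t_m = t_m^- = t_m^+$ and,
for $i \in \{1, \dots, m - 1\}$,

\[ t_{i-1}^+ < t_i^- < t_i < t_i^+ < t_{i+1}^-. \]

Let $\Gamma_i = \gamma([t_{i-1}^+, t_i^-])$. Then we know that $\Gamma_i \cap \Gamma_j = \emptyset$ for $i \ne j$. Let $\delta =
\min\{d(\Gamma_i, \Gamma_j) : i \ne j\}$, where $d(B, C) = \inf\{d(b, C) : b \in B\}$ for $B, C \subset \mathbb{R}^d$.
Then we have $\delta > 0$ and $\delta \le \operatorname{diam}(\Gamma)$. By Lemma 13.6 this implies

\[ \mathcal{H}^1\big(\Gamma \cap \mathring{B}(x, \tfrac{\delta}{4})\big) \ge \frac{\delta}{4} \]

for all $x \in \Gamma$, and hence,

\[ \frac{\delta}{4}\, \min_{x \in \Gamma} P\big(\Gamma \cap \mathring{B}(x, \tfrac{\delta}{4})\big)^{1/r} \ge \Big(\frac{1}{L}\Big)^{1/r} \Big(\frac{\delta}{4}\Big)^{1 + 1/r}. \tag{13.9} \]

Let $\alpha_n$ be an n-optimal set of centers of order $r$. For $i = 1, \dots, m$ set

\[ \alpha_{n,i} = \{a \in \alpha_n : W(a|\alpha_n) \cap \Gamma_i \ne \emptyset\}. \]

We will show that there is an $n_0 \in \mathbb{N}$ such that

\[ \alpha_{n,i} \cap \alpha_{n,j} = \emptyset \tag{13.10} \]

for $i \ne j$ and all $n \ge n_0$. Since $\lim_{n\to\infty} e_{n,r} = 0$ there exists an $n_0 \in \mathbb{N}$ with

\[ e_{n_0,r} < \Big(\frac{1}{L}\Big)^{1/r} \Big(\frac{\delta}{4}\Big)^{1 + 1/r}. \tag{13.11} \]

Using Lemma 13.8 we have

\[ \tfrac{1}{2}\|d_{\alpha_n}\|_\infty\, \min_{y \in \Gamma} P\big(\mathring{B}(y, \tfrac{1}{2}\|d_{\alpha_n}\|_\infty)\big)^{1/r} \le e_{n,r} \le e_{n_0,r} \tag{13.12} \]

for all $n \ge n_0$. Observing that $t \mapsto t \min_{y \in \Gamma} P(\mathring{B}(y, t))^{1/r}$ is non-decreasing and
using (13.12), (13.11), and (13.9) we deduce

\[ \|d_{\alpha_n}\|_\infty < \frac{\delta}{2} \]

for all $n \ge n_0$. By the definition of $\delta$ and $\alpha_{n,i}$ this yields (13.10). It follows from
(13.10) that, for $n \ge n_0$,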

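\[ n^r V_{n,r} = n^r \int_\Gamma d(x, \alpha_n)^r\,dP(x) \ge n^r \sum_{i=1}^{m} \int_{\Gamma_i} d(x, \alpha_n)^r\,dP(x) = n^r \sum_{i=1}^{m} \int_{\Gamma_i} d(x, \alpha_{n,i})^r\,dP(x). \]

Setting $n_i = |\alpha_{n,i}|$, $L_i = \mathcal{H}^1(\Gamma_i)$, and $P_i = \frac{1}{L_i}\mathcal{H}^1_{|\Gamma_i}$, we get

\[ n^r V_{n,r} \ge n^r \sum_{i=1}^{m} \frac{L_i}{L} \int_{\Gamma_i} d(x, \alpha_{n,i})^r\,dP_i(x) \ge n^r \sum_{i=1}^{m} \frac{L_i}{L}\, V_{n_i,r}(P_i). \]

By Lemma 13.11 this leads to

\[ n^r V_{n,r} \ge n^r \sum_{i=1}^{m} \frac{L_i}{L}\, \frac{1}{(1 + r)L_i}\, \|\gamma(t_{i-1}^+) - \gamma(t_i^-)\|^{1+r} \Big(\frac{1}{2n_i}\Big)^r = \frac{1}{2^r(1 + r)L} \sum_{i=1}^{m} \|\gamma(t_{i-1}^+) - \gamma(t_i^-)\|^{1+r} \Big(\frac{n}{n_i}\Big)^r. \tag{13.13} \]

Set $s_i = \|\gamma(t_{i-1}^+) - \gamma(t_i^-)\|^{1+r}$. It follows from Lemma 6.8 that

\[ \sum_{i=1}^{m} s_i \Big(\frac{n_i}{n}\Big)^{-r} \ge \Big( \sum_{i=1}^{m} s_i^{\frac{1}{1+r}} \Big)^{1+r}, \tag{13.14} \]

hence

\[ (n\,e_{n,r})^r = n^r V_{n,r} \ge \frac{1}{2^r(1 + r)L} \Big( \sum_{i=1}^{m} \|\gamma(t_{i-1}^+) - \gamma(t_i^-)\| \Big)^{1+r}. \]

For $t_i^- \to t_i$, $t_i^+ \to t_i$ we deduce

\[ \liminf_{n\to\infty} n\,e_{n,r} \ge \frac{1}{2} \Big( \frac{1}{(1 + r)L} \Big)^{1/r} \Big( \sum_{i=1}^{m} \|\gamma(t_{i-1}) - \gamma(t_i)\| \Big)^{\frac{1+r}{r}}. \]

Since $L = \sup\{\sum_{i=1}^{m} \|\gamma(t_{i-1}) - \gamma(t_i)\| : 0 = t_0 < \dots < t_m = L\}$ we obtain

\[ \liminf_{n\to\infty} n\,e_{n,r} \ge \frac{1}{2} \Big( \frac{1}{1 + r} \Big)^{1/r} L. \tag{13.15} \]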

"$\le$": Now let $\beta \subset [0, L]$ be of cardinality less than or equal to $n$ and set $\alpha = \gamma(\beta)$.
Using Lemma 13.5 we deduce

\[ n^r V_{n,r} \le n^r \int \min_{a \in \alpha} \|x - a\|^r\,dP(x) = n^r \int_{[0,L]} \min_{b \in \beta} \|\gamma(t) - \gamma(b)\|^r\,dU([0, L])(t) \le n^r \int_{[0,L]} \min_{b \in \beta} |t - b|^r\,dU([0, L])(t). \]

Since $\beta$ with $|\beta| \le n$ was arbitrary in $[0, L]$ we deduce

\[ n^r V_{n,r} \le n^r V_{n,r}(U([0, L])). \]

By Example 5.5 we have

\[ V_{n,r}(U([0, L])) = \frac{L^r}{n^r (1 + r) 2^r}. \]

Thus we get

\[ \limsup_{n\to\infty} n\,e_{n,r} \le \Big( \frac{1}{1 + r} \Big)^{1/r} \frac{L}{2}. \tag{13.17} \]

Combining (13.15) and (13.17) yields the first part of the theorem.

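Now we will prove part (ii).

"$\le$": Let $\beta \subset [0, L]$ with $|\beta| \le n$ and $\alpha = \gamma(\beta)$ be as above.

Then using Lemma 13.5 we have

\[ n\,e_{n,\infty}(P) \le n \max_{x \in \Gamma} \min_{a \in \alpha} \|x - a\| = n \max_{t \in [0,L]} \min_{b \in \beta} \|\gamma(t) - \gamma(b)\| \le n \max_{t \in [0,L]} \min_{b \in \beta} |t - b|. \]

Since $\beta$ with $|\beta| \le n$ was arbitrary in $[0, L]$ we get

\[ n\,e_{n,\infty}(P) \le n\,e_{n,\infty}(U([0, L])) = \frac{L}{2}. \]

Hence

\[ \limsup_{n\to\infty} n\,e_{n,\infty}(P) \le \frac{L}{2}. \tag{13.18} \]

"$\ge$": On the other hand (11.1) yields

\[ L = \mathcal{H}^1(\Gamma) \le 2 \liminf_{n\to\infty} n\,e_{n,\infty}(P) \]

so that

\[ \liminf_{n\to\infty} n\,e_{n,\infty}(P) \ge \frac{L}{2}. \tag{13.19} \]

Combining (13.18) and (13.19) implies part (ii) of the theorem. []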
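
The upper-bound halves of the proof place the centers at the images of the interval midpoints under an arc-length parametrization. The following sketch illustrates this on an assumed example, the unit upper semicircle with $L = \pi$. It computes the errors of that particular configuration, which are only upper bounds for $e_{n,r}(P)$ and $e_{n,\infty}(P)$, not the optimal errors themselves. The function names and the grid size are hypothetical.

```python
import math

def gamma(t):
    # Arc-length parametrization of the unit upper semicircle (assumed example), L = pi.
    return (math.cos(t), math.sin(t))

def errors_from_arclength_centers(n, r, grid=4000):
    """L^r and sup-norm errors of the centers gamma(L(2i-1)/(2n)) for the length measure."""
    L = math.pi
    centers = [gamma(L * (2 * i - 1) / (2 * n)) for i in range(1, n + 1)]
    acc, worst = 0.0, 0.0
    for k in range(grid):
        x = gamma(L * (k + 0.5) / grid)          # sample point on the curve
        d = min(math.dist(x, c) for c in centers)
        acc += d ** r
        worst = max(worst, d)
    return (acc / grid) ** (1.0 / r), worst

e_r, e_inf = errors_from_arclength_centers(20, 2)
bound_r = (1.0 / 3.0) ** 0.5 * math.pi / 2       # (1/(1+r))^(1/r) * L/2 for r = 2
bound_inf = math.pi / 2                          # L/2
```

Since chords are shorter than arcs, `20 * e_r` and `20 * e_inf` stay below the two limits of the theorem, and for this smooth curve they are already close to them at $n = 20$.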



13.13 Remark
In geometric measure theory a measure $\mu$ on $\mathbb{R}^d$ is called m-rectifiable if $m \in
\{1, \dots, d\}$, $\mu$ is absolutely continuous with respect to $\mathcal{H}^m$, and $\mu$ is supported on
a countable union of m-dimensional $C^1$-manifolds. A length measure on a rectifiable
curve is 1-rectifiable in this sense.
We conjecture that, for all m-rectifiable measures $\mu$ on $\mathbb{R}^d$ and all $1 \le r < \infty$,

\[ \lim_{n\to\infty} n\,e_{n,r}(\mu)^m \]

exists in $(0, +\infty)$, i.e., every m-rectifiable measure has an r-th quantization coefficient.

Notes

The basic notions about rectifiable curves can, for instance, be found in the book of
Falconer (1985, Chapter 3). A good introduction to the theory of rectifiable measures
is given by Mattila (1995, §15-20).

14 Self-similar sets and measures

The class of self-similar measures has been a central object of studies in fractal geom-
etry during the last two decades. Most self-similar measures are singular with respect
to Lebesgue measure, but the restriction of Lebesgue measure to the d-dimensional
cube is, for instance, also self-similar. In this section we determine the quantization
dimension of all self-similar measures that satisfy a certain separation condition. As
it turns out many self-similar measures have the property that their quantization
dimension of order r is strictly increasing with r. In this respect they are different
from all measures considered so far in this volume.

14.1 Basic notion and facts

Let $\|\ \|$ denote a norm on $\mathbb{R}^d$. A contracting similarity transformation $S$ on $\mathbb{R}^d$
is a map $S : \mathbb{R}^d \to \mathbb{R}^d$ such that there is a constant $s \in (0, 1)$ with

\[ \|S(x) - S(y)\| = s\,\|x - y\| \tag{14.1} \]

for all $x, y \in \mathbb{R}^d$. $s$ is called the scaling number or contraction number of $S$.

In what follows $N$ is always a natural number greater than 1, $(S_1, \dots, S_N)$ is an
N-tuple of contracting similarity transformations of $\mathbb{R}^d$, and $(s_1, \dots, s_N)$ is the
corresponding N-tuple of scaling numbers.
Hutchinson (1981) has shown that there is always a unique nonempty compact subset
$A$ of $\mathbb{R}^d$ with

\[ A = S_1(A) \cup \dots \cup S_N(A). \tag{14.2} \]

$A$ is called the attractor or invariant set of $(S_1, \dots, S_N)$. Sometimes $A$ is also
called the self-similar set corresponding to $(S_1, \dots, S_N)$.
It is easy to see that there exists a unique real number $D \ge 0$ with $\sum_{i=1}^{N} s_i^D = 1$, the
similarity dimension of $(S_1, \dots, S_N)$. Let $(p_1, \dots, p_N)$ be a probability vector,
i.e., $p_i \ge 0$ and $\sum_{i=1}^{N} p_i = 1$. Then Hutchinson (1981) showed that there is a unique
probability measure $P$ on $\mathbb{R}^d$ with

\[ P = \sum_{i=1}^{N} p_i\, P \circ S_i^{-1}. \tag{14.3} \]

If $p_i > 0$ for all $i \in \{1, \dots, N\}$ then the support of $P$ equals the attractor $A$.

$P$ is called the self-similar measure corresponding to $(S_1, \dots, S_N; p_1, \dots, p_N)$.
$(S_1, \dots, S_N)$ satisfies the strong separation condition if

\[ S_i(A) \cap S_j(A) = \emptyset \]

for $i \ne j$.
$(S_1, \dots, S_N)$ satisfies the open set condition (OSC) if there exists a nonempty open
set $U \subset \mathbb{R}^d$ with $S_i(U) \subset U$ and $S_i(U) \cap S_j(U) = \emptyset$ for $i \ne j$.
Schief (1994) has shown that the open set $U$ in the above definition can always be
chosen to be bounded and satisfy $U \cap A \ne \emptyset$.
If $(S_1, \dots, S_N)$ satisfies the strong separation condition then it also satisfies the open
set condition.
If $(S_1, \dots, S_N)$ satisfies the OSC and $P$ is the self-similar measure corresponding to
$(S_1, \dots, S_N; s_1^D, \dots, s_N^D)$ then $P$ is the normalized D-dimensional Hausdorff measure
restricted to the attractor $A$, i.e. $0 < \mathcal{H}^D(A) < \infty$ and

\[ P = \frac{1}{\mathcal{H}^D(A)}\, \mathcal{H}^D_{|A} \tag{14.4} \]

(see Hutchinson, 1981, p. 737/738).
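
Since $D \mapsto \sum_i s_i^D$ is continuous and strictly decreasing, the similarity dimension can be found by bisection. The sketch below is one hedged way to do this; the function name `similarity_dimension`, the tolerance, and the two example scaling vectors are illustrative assumptions. The natural weights $s_i^D$ then form the probability vector appearing in (14.4).

```python
import math

def similarity_dimension(scalings, tol=1e-12):
    """Solve sum(s**D for s in scalings) == 1 for D >= 0 by bisection."""
    f = lambda D: sum(s ** D for s in scalings) - 1.0
    lo, hi = 0.0, 1.0
    while f(hi) > 0:          # f is strictly decreasing; enlarge the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Classical Cantor set: two maps with scaling number 1/3, so D = log 2 / log 3.
D_cantor = similarity_dimension([1 / 3, 1 / 3])
# Unequal scaling numbers (hypothetical example) and the natural weights s_i^D.
D_mixed = similarity_dimension([0.5, 0.25])
natural_weights = [s ** D_mixed for s in (0.5, 0.25)]
```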


By $\{1, \dots, N\}^*$ we denote the set of all words on the alphabet $\{1, \dots, N\}$ including
the empty word $\emptyset$.
If $(q_1, \dots, q_N)$ is an N-tuple of real numbers and $\sigma = \sigma_1 \dots \sigma_n$ belongs to $\{1, \dots, N\}^*$
then define

\[ q_\sigma = \begin{cases} 1, & \sigma = \emptyset, \\ \prod_{i=1}^{n} q_{\sigma_i}, & \text{otherwise.} \end{cases} \]

For a nonempty word $\sigma = \sigma_1 \dots \sigma_n \in \{1, \dots, N\}^*$ set

\[ \sigma^- = \begin{cases} \emptyset, & n = 1, \\ \sigma_1 \dots \sigma_{n-1}, & n > 1. \end{cases} \]

If $\sigma$ is a word then the length of $\sigma$, denoted by $|\sigma|$, is 0 if $\sigma = \emptyset$ and equal to $n$ if
$\sigma = \sigma_1 \dots \sigma_n$. For $m \le |\sigma|$ let

\[ \sigma_{|m} = \begin{cases} \emptyset, & m = 0, \\ \sigma_1 \dots \sigma_m, & m \ge 1 \end{cases} \]

be the restriction of $\sigma$ to $m$. A natural order for words is defined by

\[ \sigma \le \tau \iff |\sigma| \le |\tau| \text{ and } \tau_{||\sigma|} = \sigma. \]

For an infinite sequence $\eta \in \{1, \dots, N\}^{\mathbb{N}}$ the restriction $\eta_{|m}$ is defined in an analogous
way. A word $\sigma \in \{1, \dots, N\}^*$ is a predecessor of $\eta$ iff

\[ \eta_{||\sigma|} = \sigma. \]

A finite set $\Gamma \subset \{1, \dots, N\}^*$ is called a finite antichain iff any two elements of $\Gamma$
are incomparable with respect to the order given above. A finite antichain $\Gamma$ is called
maximal iff, for every finite antichain $\Gamma' \subset \{1, \dots, N\}^*$ with $\Gamma \subset \Gamma'$, we have $\Gamma = \Gamma'$.
A finite antichain $\Gamma$ is maximal if and only if every sequence in $\{1, \dots, N\}^{\mathbb{N}}$ has
a predecessor in $\Gamma$. If $(q_1, \dots, q_N)$ is a probability vector and $\Gamma$ is a maximal finite
antichain then

\[ \sum_{\sigma \in \Gamma} q_\sigma = 1. \tag{14.5} \]

If $0 < \varepsilon < \min\{q_1, \dots, q_N\}$ then

\[ \Gamma(\varepsilon) = \{\sigma \in \{1, \dots, N\}^* : q_{\sigma^-} \ge \varepsilon > q_\sigma\} \]

is a maximal finite antichain.

For $\sigma \in \{1, \dots, N\}^*$ set

\[ S_\sigma = \begin{cases} \mathrm{id}, & \sigma = \emptyset, \\ S_{\sigma_1} \circ \dots \circ S_{\sigma_n}, & \sigma = \sigma_1 \dots \sigma_n, \end{cases} \]

where id is the identity on $\mathbb{R}^d$.
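
The antichain $\Gamma(\varepsilon)$ can be enumerated by walking down the tree of words and cutting each branch at the first word whose weight drops below $\varepsilon$. The following is a minimal sketch under the assumption that all $q_i$ lie in $(0,1)$; the function name `antichain`, the 0-based letters, and the example vector are illustrative choices. It also lets one check identity (14.5) numerically.

```python
def antichain(q, eps):
    """Return [(word, q_word)] for all words with q_{word^-} >= eps > q_word.

    Words are tuples over the letters 0..N-1 (0-based, an illustrative convention)."""
    assert 0 < eps < min(q)
    result, stack = [], [((), 1.0)]          # the empty word has weight 1 >= eps
    while stack:
        word, weight = stack.pop()           # invariant: weight >= eps
        for i, qi in enumerate(q):
            child = (word + (i,), weight * qi)
            if child[1] < eps:
                result.append(child)         # parent weight >= eps > child weight
            else:
                stack.append(child)
    return result

# Hypothetical example: a probability vector on three letters.
gamma_eps = antichain((0.5, 0.3, 0.2), 0.035)
total_weight = sum(weight for _, weight in gamma_eps)
```

Every infinite sequence passes through exactly one cut point, which is why the weights in `gamma_eps` add up to 1.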

14.2 An upper bound for the quantization dimension

We use the notation introduced in 14.1. In the following $(p_1, \dots, p_N)$ is always a
probability vector with $p_i > 0$ for $i = 1, \dots, N$ and $P$ is the self-similar probability
measure on $\mathbb{R}^d$ corresponding to $(S_1, \dots, S_N; p_1, \dots, p_N)$.

14.1 Lemma
For every $n \ge N$ and every $r \in [1, \infty)$,

\[ V_{n,r}(P) \le \min\Big\{ \sum_{i=1}^{N} p_i s_i^r V_{n_i,r}(P) : 1 \le n_i,\ \sum_{i=1}^{N} n_i \le n \Big\}. \tag{14.6} \]

Proof
Since $S_i$ is a similarity transformation it follows immediately (see Lemma 3.2) that

\[ V_{m,r}(P \circ S_i^{-1}) = s_i^r V_{m,r}(P) \tag{14.7} \]

for all $i \in \{1, \dots, N\}$ and all $m \in \mathbb{N}$. Using (14.3), Lemma 4.14 (b) implies, for
$n_1, \dots, n_N \in \mathbb{N}$ with $n_i \ge 1$ and $\sum_{i=1}^{N} n_i \le n$,

\[ V_{n,r}(P) \le \sum_{i=1}^{N} p_i V_{n_i,r}(P \circ S_i^{-1}). \]

Using (14.7) to substitute each summand on the right hand side yields the assertion
of the lemma. []

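14.2 Corollary
Let $\Gamma \subset \{1, \dots, N\}^*$ be a finite maximal antichain, $n \in \mathbb{N}$ with $n \ge |\Gamma|$ and $r \in
[1, +\infty)$. Then

\[ V_{n,r}(P) \le \min\Big\{ \sum_{\sigma \in \Gamma} p_\sigma s_\sigma^r V_{n_\sigma,r}(P) : 1 \le n_\sigma,\ \sum_{\sigma \in \Gamma} n_\sigma \le n \Big\}. \tag{14.8} \]

Proof
The corollary follows from Lemma 14.1 by induction on $\max\{|\sigma| : \sigma \in \Gamma\}$. []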

14.3 Lemma
For every $n \ge N$,

\[ e_{n,\infty}(P) \le \min\Big\{ \max_{1 \le i \le N} s_i\, e_{n_i,\infty}(P) : 1 \le n_i,\ \sum_{i=1}^{N} n_i \le n \Big\}. \tag{14.9} \]

Proof
Since $\operatorname{supp}(P \circ S_i^{-1}) = S_i(A)$ and $A = \bigcup_{i=1}^{N} S_i(A)$ the lemma is an immediate
consequence of Lemma 10.2(b) and Lemma 10.6(b). []

14.4 Lemma
Let $r \in [1, +\infty)$ be fixed. Then there exists exactly one number $\kappa_r \in (0, +\infty)$ with

\[ \sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}} = 1. \tag{14.10} \]

Proof
Since $0 < p_i s_i^r < 1$ the function $t \mapsto \sum_{i=1}^{N} (p_i s_i^r)^t$ is strictly decreasing and continuous.
Since this function tends to $N$ as $t$ tends to 0 and takes a value less than 1 at $t = 1$
the intermediate value theorem implies the existence of a unique $t \in (0, 1)$ with
$\sum_{i=1}^{N} (p_i s_i^r)^t = 1$. Then $\kappa_r = \frac{rt}{1 - t}$ satisfies the conclusions of the lemma. []
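
The proof is constructive enough to compute $\kappa_r$: solve $\sum_i (p_i s_i^r)^t = 1$ for $t \in (0,1)$ by bisection and set $\kappa_r = rt/(1-t)$. The sketch below follows that recipe; the function name `kappa`, the tolerance, and the example vectors are assumptions made for illustration.

```python
import math

def kappa(p, s, r, tol=1e-13):
    """Solve sum((p_i * s_i**r)**t) == 1 for t in (0, 1) and return r*t/(1 - t)."""
    g = lambda t: sum((pi * si ** r) ** t for pi, si in zip(p, s)) - 1.0
    lo, hi = 0.0, 1.0                 # g(0) = N - 1 > 0 and g(1) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return r * t / (1.0 - t)

D = math.log(2) / math.log(3)         # similarity dimension for s = (1/3, 1/3)
# Natural weights p_i = s_i^D: kappa_r should not depend on r.
k_natural = [kappa((0.5, 0.5), (1 / 3, 1 / 3), r) for r in (1, 2, 5)]
# Other weights (hypothetical example): kappa_r varies with r.
k1 = kappa((0.3, 0.7), (1 / 3, 1 / 3), 1)
k2 = kappa((0.3, 0.7), (1 / 3, 1 / 3), 2)
```

For the natural weights $p_i = s_i^D$ one has $p_i s_i^r = s_i^{D+r}$, so $t = D/(r+D)$ solves the equation and $\kappa_r = D$ for every $r$.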

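14.5 Proposition
Let $r \in [1, +\infty)$ and let $\kappa_r$ satisfy (14.10). Then

\[ \limsup_{n\to\infty} n\,e_{n,r}^{\kappa_r} < \infty, \]

in particular, the upper quantization dimension $\overline{D}_r(P)$ of $P$ is less than or equal to
$\kappa_r$.

Proof
Let $q_i = (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}}$ and $\varepsilon_0 = \min\{q_1, \dots, q_N\}$. Then we have $\varepsilon_0 > 0$. Let $m, n \in \mathbb{N}$ be
arbitrary with $\frac{m}{n} < \varepsilon_0^2$ and set $\varepsilon = \varepsilon_0^{-1} \frac{m}{n}$. For $\Gamma(\varepsilon) = \{\sigma \in \{1, \dots, N\}^* : q_{\sigma^-} \ge \varepsilon >
q_\sigma\}$ it follows by (14.5) that

\[ 1 = \sum_{\sigma \in \Gamma(\varepsilon)} q_\sigma = \sum_{\sigma \in \Gamma(\varepsilon)} q_{\sigma^-}\, q_{\sigma_{|\sigma|}} \ge \varepsilon\,\varepsilon_0\, |\Gamma(\varepsilon)|, \]

hence

\[ |\Gamma(\varepsilon)| \le (\varepsilon\,\varepsilon_0)^{-1} = \frac{n}{m}. \]

Using Corollary 14.2 we deduce

\[ V_{n,r}(P) \le \sum_{\sigma \in \Gamma(\varepsilon)} p_\sigma s_\sigma^r V_{m,r}(P) = \sum_{\sigma \in \Gamma(\varepsilon)} (p_\sigma s_\sigma^r)^{\frac{\kappa_r}{r + \kappa_r}} (p_\sigma s_\sigma^r)^{\frac{r}{r + \kappa_r}} V_{m,r}(P) \]
\[ \le \varepsilon^{\frac{r}{\kappa_r}}\, V_{m,r}(P) \sum_{\sigma \in \Gamma(\varepsilon)} q_\sigma = \varepsilon_0^{-\frac{r}{\kappa_r}} \Big(\frac{m}{n}\Big)^{\frac{r}{\kappa_r}} V_{m,r}(P). \]

Thus we obtain

\[ n^{\frac{r}{\kappa_r}} V_{n,r}(P) \le \varepsilon_0^{-\frac{r}{\kappa_r}}\, m^{\frac{r}{\kappa_r}} V_{m,r}(P) \]

which implies

\[ n\,e_{n,r}(P)^{\kappa_r} \le \varepsilon_0^{-1}\, m\,e_{m,r}(P)^{\kappa_r}. \]

Since, for fixed $m$, this inequality holds for all but a finite number of $n$ we get

\[ \limsup_{n\to\infty} n\,e_{n,r}(P)^{\kappa_r} \le \varepsilon_0^{-1}\, m\,e_{m,r}(P)^{\kappa_r} < +\infty \]

and the proposition is proved. []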

14.6 Remark
The proof of the preceding proposition shows that

\[ \limsup_{n\to\infty} n\,e_{n,r}(P)^{\kappa_r} \le \max\Big\{ (p_1 s_1^r)^{-\frac{\kappa_r}{r + \kappa_r}}, \dots, (p_N s_N^r)^{-\frac{\kappa_r}{r + \kappa_r}} \Big\}\, \inf_{m \in \mathbb{N}} m\,e_{m,r}(P)^{\kappa_r}. \tag{14.11} \]

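14.7 Proposition
Let $D$ be the similarity dimension of $(S_1, \dots, S_N)$. Then

\[ \limsup_{n\to\infty} n\,e_{n,\infty}(P)^D < +\infty, \]

in particular, the upper quantization dimension $\overline{D}_\infty(P)$ of $P$ is less than or equal to
$D$.

Proof
Set $q_i = s_i^D$ and $\varepsilon_0 = \min\{q_1, \dots, q_N\}$. Let $m, n \in \mathbb{N}$ be arbitrary with $\frac{m}{n} < \varepsilon_0^2$. Set
$\varepsilon = \varepsilon_0^{-1} \frac{m}{n}$ and $\Gamma(\varepsilon) = \{\sigma \in \{1, \dots, N\}^* : q_{\sigma^-} \ge \varepsilon > q_\sigma\}$. Since $\Gamma(\varepsilon)$ is a maximal
finite antichain it follows by Lemma 14.3 and an induction argument that

\[ e_{n,\infty}(P)^D \le \min\Big\{ \max_{\sigma \in \Gamma(\varepsilon)} s_\sigma^D\, e_{n_\sigma,\infty}(P)^D : 1 \le n_\sigma,\ \sum_{\sigma \in \Gamma(\varepsilon)} n_\sigma \le n \Big\}. \]

As in the proof of Proposition 14.5 we see that $|\Gamma(\varepsilon)| \le \frac{n}{m}$, so that

\[ e_{n,\infty}(P)^D \le \max_{\sigma \in \Gamma(\varepsilon)} s_\sigma^D\, e_{m,\infty}(P)^D \le \varepsilon\, e_{m,\infty}(P)^D = \varepsilon_0^{-1} \frac{m}{n}\, e_{m,\infty}(P)^D \]

and, hence,

\[ n\,e_{n,\infty}(P)^D \le \varepsilon_0^{-1}\, m\,e_{m,\infty}(P)^D. \]

For fixed $m$ this holds for all but finitely many $n$ and yields

\[ \limsup_{n\to\infty} n\,e_{n,\infty}(P)^D < +\infty. \]

The remaining statement of the proposition follows from Proposition 11.3. []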

14.8 Remark
The proof of the preceding proposition shows that

\[ \limsup_{n\to\infty} n\,e_{n,\infty}(P)^D \le \max(s_1^{-D}, \dots, s_N^{-D})\, \inf_{m \in \mathbb{N}} m\,e_{m,\infty}(P)^D. \tag{14.12} \]

14.3 A lower bound for the quantization dimension

The general assumptions in this section are the same as those in the preceding section.

14.9 Lemma
For every $\varepsilon > 0$,

\[ \inf\{P(B(z, \varepsilon)) : z \in A\} > 0. \tag{14.13} \]

Proof
Set $t = \big(\frac{\varepsilon}{\operatorname{diam}(A)}\big)^D$ and $\Gamma(t) = \{\sigma \in \{1, \dots, N\}^* : s_{\sigma^-}^D \ge t > s_\sigma^D\}$. Then $\Gamma(t)$ is a finite
maximal antichain. Thus $a = \min\{p_\sigma : \sigma \in \Gamma(t)\} > 0$. Let $z \in A$ be arbitrary. Since
$A = \bigcup_{\sigma \in \Gamma(t)} S_\sigma(A)$ there is a $\tau \in \Gamma(t)$ with $z \in S_\tau(A)$. Since $S_\tau$ is a similarity we have

\[ \operatorname{diam}(S_\tau(A)) = s_\tau \operatorname{diam}(A) \]

and, hence,

\[ \operatorname{diam}(S_\tau(A)) < \varepsilon. \]

Thus, we have $S_\tau(A) \subset B(z, \varepsilon)$. From (14.3) it follows that

\[ P(S_\tau(A)) = \sum_{\sigma \in \Gamma(t)} p_\sigma\, P \circ S_\sigma^{-1}(S_\tau(A)) \ge p_\tau\, P(S_\tau^{-1}(S_\tau(A))) = p_\tau. \]

Combining the last two results we obtain

\[ P(B(z, \varepsilon)) \ge p_\tau \ge a > 0 \]

and the lemma is proved. []

14.10 Lemma
Let $(S_1, \dots, S_N)$ satisfy the strong separation condition and let $r \in [1, +\infty)$ be given.
Then

\[ V_{n,r}(P) = \min\Big\{ \sum_{i=1}^{N} p_i s_i^r V_{n_i,r}(P) : 1 \le n_i,\ \sum_{i=1}^{N} n_i \le n \Big\} \tag{14.14} \]

for all but finitely many $n \in \mathbb{N}$.

Proof

"$\le$": That $V_{n,r}(P)$ is less than or equal to the right hand side for all but finitely many
$n$ is the statement of Lemma 14.1.

"$\ge$": To show the converse inequality let $\delta = \min\{d(S_i(A), S_j(A)) : i \ne j\}$. Then we
have $\delta > 0$. From Lemma 14.9 we deduce

By Lemma 6.1 we know that

\[ \lim_{n\to\infty} V_{n,r}(P) = 0. \]

Hence there exists an $n_0 \in \mathbb{N}$ with

\[ V_{n,r}(P) < \beta \]

for all $n \ge n_0$. Let $\alpha_n$ be an n-optimal set of centers for $P$ of order $r$. Then
Lemma 13.8 implies that
1 r

for all $n \ge n_0$ and all $a \in \alpha_n$. Since the function $t \mapsto t^r \min_{y \in A} P(B(y, t))$ is
non-decreasing it follows that

i.e.
6

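for all $n \ge n_0$ and all $a \in \alpha_n$. By the definition of $\delta$ we deduce that, for $n \ge n_0$
and $i, j \in \{1, \dots, N\}$ with $i \ne j$, the sets $\alpha_{n,i} = \{a \in \alpha_n : W(a|\alpha_n) \cap S_i(A) \ne \emptyset\}$
and $\alpha_{n,j} = \{a \in \alpha_n : W(a|\alpha_n) \cap S_j(A) \ne \emptyset\}$ are disjoint.
Using (14.3) we obtain

\[ V_{n,r}(P) = \int \min_{a \in \alpha_n} \|x - a\|^r\,dP(x) = \sum_{i=1}^{N} p_i \int \min_{a \in \alpha_n} \|S_i(x) - a\|^r\,dP(x) \]
\[ = \sum_{i=1}^{N} p_i \int \min_{a \in \alpha_{n,i}} \|S_i(x) - a\|^r\,dP(x) = \sum_{i=1}^{N} p_i s_i^r \int \min_{b \in S_i^{-1}(\alpha_{n,i})} \|x - b\|^r\,dP(x) \ge \sum_{i=1}^{N} p_i s_i^r V_{n_i,r}(P), \]

where $n_i = |\alpha_{n,i}| \ge 1$.
Since

\[ \sum_{i=1}^{N} n_i = |\alpha_n| \le n \]

we deduce

\[ V_{n,r}(P) \ge \min\Big\{ \sum_{i=1}^{N} p_i s_i^r V_{n_i,r}(P) : 1 \le n_i,\ \sum_{i=1}^{N} n_i \le n \Big\}. \]

[]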

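14.11 Proposition
Let $(S_1, \dots, S_N)$ satisfy the strong separation condition and let $r \in [1, +\infty)$ be given.
Moreover, let $\kappa_r$ satisfy (14.10). Then

\[ \liminf_{n\to\infty} n\,e_{n,r}^{\kappa_r} > 0, \]

in particular, the lower quantization dimension $\underline{D}_r(P)$ is greater than or equal to $\kappa_r$.

Proof
Since $|A| = \infty$ we have $V_{n,r}(P) > 0$ for all $n \in \mathbb{N}$. Let $n_0 \in \mathbb{N}$ be such that (14.14)
holds for all $n \ge n_0$. Choose $c > 0$ with

\[ V_{n,r}(P) \ge c\,n^{-\frac{r}{\kappa_r}} \]

for all $n < n_0$.

We will show by induction on $n$ that

\[ V_{n,r}(P) \ge c\,n^{-\frac{r}{\kappa_r}} \tag{14.15} \]

for all $n \in \mathbb{N}$. Let $n \ge n_0$ be such that

\[ V_{k,r}(P) \ge c\,k^{-\frac{r}{\kappa_r}} \]

for all $k < n$.

Using this assumption and (14.14) we obtain

\[ V_{n,r}(P) = \min\Big\{ \sum_{i=1}^{N} p_i s_i^r V_{n_i,r}(P) : 1 \le n_i,\ \sum_{i=1}^{N} n_i \le n \Big\} \]
\[ \ge \min\Big\{ \sum_{i=1}^{N} p_i s_i^r\, c\, n_i^{-\frac{r}{\kappa_r}} : 1 \le n_i,\ \sum_{i=1}^{N} \frac{n_i}{n} \le 1 \Big\} \]
\[ = c\,n^{-\frac{r}{\kappa_r}} \min\Big\{ \sum_{i=1}^{N} p_i s_i^r \Big(\frac{n_i}{n}\Big)^{-\frac{r}{\kappa_r}} : 1 \le n_i,\ \sum_{i=1}^{N} \frac{n_i}{n} \le 1 \Big\}. \]

By Lemma 6.8 we have

\[ \min\Big\{ \sum_{i=1}^{N} p_i s_i^r \Big(\frac{n_i}{n}\Big)^{-\frac{r}{\kappa_r}} : 1 \le n_i,\ \sum_{i=1}^{N} \frac{n_i}{n} \le 1 \Big\} \ge \Big( \sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}} \Big)^{\frac{r + \kappa_r}{\kappa_r}} = 1. \]

Thus we get

\[ V_{n,r}(P) \ge c\,n^{-\frac{r}{\kappa_r}}, \]

and (14.15) is proved.

It follows that

\[ n\,e_{n,r}(P)^{\kappa_r} \ge c^{\frac{\kappa_r}{r}} > 0 \]

for all $n \in \mathbb{N}$, in particular

\[ \liminf_{n\to\infty} n\,e_{n,r}(P)^{\kappa_r} > 0. \]

The remaining statement in the proposition is an immediate consequence of
Proposition 11.3. []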
14.12 Problem
Does the conclusion of Proposition 14.11 remain true under the weaker assumption
that $(S_1, \dots, S_N)$ satisfies the open set condition?

14.13 Proposition
Let $(S_1, \dots, S_N)$ satisfy the open set condition and let $D$ be the similarity dimension
of $(S_1, \dots, S_N)$. Then

\[ \liminf_{n\to\infty} n\,e_{n,\infty}(P)^D > 0, \]

in particular, the lower quantization dimension $\underline{D}_\infty(P)$ is greater than or equal to $D$.

Proof
Since $\operatorname{supp}(P) = A = \operatorname{supp}(\mathcal{H}^D_{|A})$ the statement of the proposition follows immediately
from Example 12.10, Theorem 12.18 and Remark 11.2. []

14.4 The quantization dimension

The general assumptions are the same as in Section 14.2. We denote the similarity
dimension of $(S_1, \dots, S_N)$ by $D$. In this section we will show that, for most self-similar
measures, the quantization dimensions of different orders are different.

14.14 Theorem
Let $r \in [1, +\infty)$, let $\kappa_r \in (0, \infty)$ be defined by

\[ \sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}} = 1, \tag{14.16} \]

and let $(S_1, \dots, S_N)$ satisfy the strong separation condition. Then

\[ 0 < \liminf_{n\to\infty} n\,e_{n,r}(P)^{\kappa_r} \le \limsup_{n\to\infty} n\,e_{n,r}(P)^{\kappa_r} < +\infty. \]

In particular, the quantization dimension $D_r(P)$ of order $r$ exists and equals $\kappa_r$.

Proof
The result follows from Proposition 14.5 and Proposition 14.11. []
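Next we will prove some auxiliary results concerning the function $K : [1, +\infty) \to
(0, +\infty)$, $r \mapsto \kappa_r$. Define $F : [1, +\infty) \times (0, +\infty) \to \mathbb{R}$ by

\[ F(r, t) = \sum_{i=1}^{N} (p_i s_i^r)^{\frac{t}{r + t}} - 1. \]

By definition $K$ is the unique function on $[1, +\infty)$ with

\[ F(r, K(r)) = 0 \]

for all $r \in [1, \infty)$.
Since

\[ \frac{\partial F}{\partial r}(r, t) = \sum_{i=1}^{N} (p_i s_i^r)^{\frac{t}{r + t}}\, \frac{t}{(t + r)^2}\, (-\log p_i + t \log s_i) \]

and

\[ \frac{\partial F}{\partial t}(r, t) = \sum_{i=1}^{N} (p_i s_i^r)^{\frac{t}{r + t}}\, \frac{r}{(t + r)^2}\, (\log p_i + r \log s_i) \]

implicit differentiation yields

\[ K'(r) = \frac{\kappa_r}{r}\; \frac{\sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}} (\log p_i - \kappa_r \log s_i)}{\sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}} (\log p_i + r \log s_i)}. \tag{14.17} \]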

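14.15 Lemma
If there exists an $r_0 \ge 1$ with $K'(r_0) = 0$ then $p_i = s_i^D$ for $i = 1, \dots, N$ and $\kappa_r = D$
for all $r \in [1, +\infty)$.

Proof
Set $q_i = (p_i s_i^{r_0})^{\frac{\kappa_{r_0}}{r_0 + \kappa_{r_0}}}$. Since $K'(r_0) = 0$ we derive from (14.17) and $\kappa_{r_0} > 0$ that

\[ \sum_{i=1}^{N} q_i (\log p_i - \kappa_{r_0} \log s_i) = 0. \tag{14.18} \]

By the definition of $q_i$ we have

\[ s_i = \Big( \frac{q_i^{\frac{r_0 + \kappa_{r_0}}{\kappa_{r_0}}}}{p_i} \Big)^{\frac{1}{r_0}} \tag{14.19} \]

and hence

\[ \kappa_{r_0} \log s_i = \frac{\kappa_{r_0}}{r_0} \Big( -\log p_i + \frac{r_0 + \kappa_{r_0}}{\kappa_{r_0}} \log q_i \Big) = -\frac{\kappa_{r_0}}{r_0} \log p_i + \frac{r_0 + \kappa_{r_0}}{r_0} \log q_i. \]

Substituting this value into (14.18) yields

\[ \sum_{i=1}^{N} q_i \Big( \frac{r_0 + \kappa_{r_0}}{r_0} \log p_i - \frac{r_0 + \kappa_{r_0}}{r_0} \log q_i \Big) = 0. \]

Since $\frac{r_0 + \kappa_{r_0}}{r_0} > 0$ this implies

\[ \sum_{i=1}^{N} q_i \log \frac{p_i}{q_i} = 0. \]

Since the logarithm is a strictly concave function this implies

\[ p_i = q_i \]

for $i = 1, \dots, N$. Hence (14.19) yields

\[ s_i = p_i^{\frac{1}{\kappa_{r_0}}}, \]

i.e.

\[ p_i = s_i^{\kappa_{r_0}}. \]

Since $\sum_{i=1}^{N} p_i = 1$ the definition of the similarity dimension of $(S_1, \dots, S_N)$ yields

\[ \kappa_{r_0} = D \]

and $p_i = s_i^D$ for $i = 1, \dots, N$.
Using this identity in (14.16) we obtain

\[ \sum_{i=1}^{N} \big( s_i^{D + r} \big)^{\frac{\kappa_r}{r + \kappa_r}} = 1. \]

This implies

\[ D = \kappa_r\, \frac{D + r}{\kappa_r + r} \]

and, hence

\[ \kappa_r = D \]

for all $r \in [1, +\infty)$. []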

14.16 L e m m a
If (p~,... ,p~) = ( s f , . . . , s~) then ~, = D for ~tI r • [1, + o o ) .
I f (Pl,... ,P~) ~ ( s ~ , . . . , sg) then K : (0, +oo) --+ R, r --+ ~;r is strictly increasing
with
lim nr --= D.
202 IIL Asymptotic quantization for singular probability distributions

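Proof
If $(p_1,\dots,p_N) = (s_1^D,\dots,s_N^D)$ then the last part of the proof of Lemma 14.15 shows
that $\kappa_r = D$ for all $r \in [1, +\infty)$.
If $(p_1,\dots,p_N) \ne (s_1^D,\dots,s_N^D)$ then it follows from Lemma 14.15 that $K'(r) \ne 0$ for
all $r \in [1, +\infty)$. Since, as a consequence of the definition of $\kappa_r$, the function $K$ is
increasing this implies that $K$ is strictly increasing. In particular $\kappa_\infty = \lim_{r\to\infty} \kappa_r$ exists
in $[0, +\infty]$. Since
\[ 1 = \sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r+\kappa_r}} \le \sum_{i=1}^{N} s_i^{\frac{r\kappa_r}{r+\kappa_r}} \]

we obtain
\[ \frac{r\kappa_r}{r+\kappa_r} \le D, \]
hence
\[ \kappa_r \le \frac{Dr}{r-D} \]
for $r > D$. Thus we deduce
\[ \kappa_\infty \le D. \]
Since
\[ 1 = \lim_{r\to\infty} \sum_{i=1}^{N} (p_i s_i^r)^{\frac{\kappa_r}{r+\kappa_r}} = \lim_{r\to\infty} \sum_{i=1}^{N} p_i^{\frac{\kappa_r}{r+\kappa_r}}\, s_i^{\frac{r\kappa_r}{r+\kappa_r}} = \sum_{i=1}^{N} s_i^{\kappa_\infty} \]

we deduce
\[ \kappa_\infty = D. \]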
[]
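The behaviour described in Lemma 14.16 can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes that (14.16) is the defining equation $\sum_i (p_i s_i^r)^{\kappa_r/(r+\kappa_r)} = 1$, and the function names `similarity_dimension` and `kappa` are hypothetical. The equation is solved by bisection in the variable $x = \kappa_r/(r+\kappa_r) \in (0,1)$.

```python
import math

def similarity_dimension(s):
    """Return D with sum(s_i**D) == 1 (bisection; assumes 0 < s_i < 1, len(s) >= 2)."""
    lo, hi = 0.0, 1.0
    while sum(x ** hi for x in s) > 1.0:      # enlarge the bracket until the sum is <= 1
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(x ** mid for x in s) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kappa(r, p, s):
    """Return kappa_r solving sum((p_i * s_i**r) ** (k / (r + k))) == 1."""
    # Substitute x = k / (r + k) in (0, 1); the left-hand side is strictly
    # decreasing in x, equals N > 1 at x = 0 and is < 1 at x = 1.
    # Logarithms avoid underflow of s_i**r for large r.
    logs = [math.log(pi) + r * math.log(si) for pi, si in zip(p, s)]
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(math.exp(mid * v) for v in logs) > 1.0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return r * x / (1.0 - x)

if __name__ == "__main__":
    s = [1 / 3, 1 / 3]                         # contraction numbers of the Cantor system
    print("D =", similarity_dimension(s))
    for r in (1, 2, 5, 20, 100, 1000):
        print(r, kappa(r, [0.5, 0.5], s), kappa(r, [0.3, 0.7], s))
```

For the weights $(1/2, 1/2)$ the printed values of $\kappa_r$ agree with $D$ for every $r$, while for the illustrative weights $(0.3, 0.7)$ they increase with $r$ and approach $D$ from below.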
14.17 Theorem
Let $q, r \in [1, +\infty]$ and let $(S_1,\dots,S_N)$ satisfy the strong separation condition.

(i) If $(p_1,\dots,p_N) = (s_1^D,\dots,s_N^D)$ then the quantization dimension $D_r(P)$ of order
$r$ exists and equals $D$.
(ii) If $(p_1,\dots,p_N) \ne (s_1^D,\dots,s_N^D)$ then the quantization dimension $D_r(P)$ exists
and
\[ D_r(P) = \begin{cases} D, & r = +\infty, \\ \kappa_r, & r < +\infty. \end{cases} \]

Moreover,
\[ q < r \;\Rightarrow\; D_q(P) < D_r(P) \]
and
\[ \lim_{r\to\infty} D_r(P) = D. \]

Proof
That $D_\infty(P)$ exists and equals $D$ follows from Proposition 14.7 and Proposition 14.13.
The remaining statements follow from Theorem 14.14 and Lemma 14.16. □

14.18 Problem
Does Theorem 14.17 hold under the weaker assumption that $(S_1,\dots,S_N)$ satisfies
the open set condition?
14.19 Remark
It is shown by Kawabata and Dembo (1994, Theorem 4.1) that under the assumptions
of Theorem 14.17 the rate distortion dimension of $P$ equals
\[ \frac{\sum_{i=1}^{N} p_i \log p_i}{\sum_{i=1}^{N} p_i \log s_i} \]

and, therefore, equals the Hausdorff dimension $\dim_H(P)$ of $P$ by Cawley and Mauldin
(1992, Theorem 2.1).
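As a small worked illustration of the quotient in Remark 14.19 (a sketch only; the function name `entropy_quotient` and the sample weights are assumptions made for this example), the value can be computed directly and compared with the similarity dimension $D = \log 2/\log 3$ of the Cantor system:

```python
import math

def entropy_quotient(p, s):
    """Return sum(p_i log p_i) / sum(p_i log s_i) for weights p and contraction numbers s."""
    numerator = sum(pi * math.log(pi) for pi in p)
    denominator = sum(pi * math.log(si) for pi, si in zip(p, s))
    return numerator / denominator

if __name__ == "__main__":
    s = [1 / 3, 1 / 3]
    D = math.log(2) / math.log(3)
    print("D                :", D)
    print("weights (1/2,1/2):", entropy_quotient([0.5, 0.5], s))   # coincides with D
    print("weights (0.3,0.7):", entropy_quotient([0.3, 0.7], s))   # strictly smaller than D
```

For $p_i = s_i^D$ the quotient equals $D$; for the unequal illustrative weights it drops below $D$.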

14.5 The quantization coefficient

In the preceding sections we have shown that, for many self-similar probabilities $P$,
the inequality

\[ 0 < \liminf_{n\to\infty} n\, e_{n,r}(P)^{D_r} \le \limsup_{n\to\infty} n\, e_{n,r}(P)^{D_r} < +\infty \]

holds for $r \in [1, \infty]$ and the quantization dimension $D_r$ of order $r$ for $P$. It is, therefore,
natural to investigate under what conditions the above sequence
has a finite and positive limit. For $r < \infty$, generalizing Theorem 6.2 and (6.4),
the $\frac{r}{D_r}$-th power of this limit, if it exists, is called the r-th quantization coefficient
$Q_r(P)$. For $r = \infty$, $\lim_{n\to\infty} n^{1/D_\infty} e_{n,\infty}$, if it exists, is called the covering coefficient or
quantization coefficient of order $\infty$ and depends only on supp(P) (see (10.10) and
Theorem 10.7). Little seems to be known about the above problem. We will first
state a positive result concerning the quantization coefficient of order $\infty$. To this end
we need the following definition.
14.20 Definition
An N-tuple $(s_1,\dots,s_N)$ of real numbers is called arithmetic if there is a positive
$s \in \mathbb{R}$ with $s_1,\dots,s_N \in s\mathbb{Z} := \{sn : n \in \mathbb{Z}\}$.

14.21 Theorem
Let $(S_1,\dots,S_N)$ be an N-tuple of contracting similarity transformations of $\mathbb{R}^d$ satisfying
the open set condition and let the corresponding N-tuple $(s_1,\dots,s_N)$ of contraction
numbers be such that $(\log s_1,\dots,\log s_N)$ is not arithmetic. Let $A$ be the
attractor of $(S_1,\dots,S_N)$ and $D$ its similarity dimension. Then $(n\, e_{n,\infty}(A)^D)_{n\in\mathbb{N}}$ has
a finite and positive limit, hence the quantization coefficient $Q_\infty(A)$ of $A$ exists in
$(0, +\infty)$.

Proof
With an argument similar to that given in Remark 10.9 one can show that
$\lim_{n\to\infty} n\, e_{n,\infty}(A)^D$ exists if and only if $\lim_{\varepsilon\to 0} N(\varepsilon)\varepsilon^D$ exists, where $N(\varepsilon)$ is the minimal
number of balls of radius $\varepsilon > 0$ that cover $A$. If one of the limits exists then so does
the other and they agree. Due to a result of Lalley (1988) combined with a result
of Schief (1994), $\lim_{\varepsilon\to 0} N(\varepsilon)\varepsilon^D$ exists in $(0, +\infty)$ under the assumptions of the theorem
(see also Falconer, 1997, p. 123, Proposition 7.4). □
The following proposition shows that the quantization coefficient $Q_\infty(A)$ need not
exist if the assumption is dropped that $(\log s_1,\dots,\log s_N)$ is not arithmetic.

14.22 Proposition
Let $N \ge 2$, $(S_1,\dots,S_N)$, $A$, and $D$ be as above but assume that $(S_1,\dots,S_N)$ satisfies
the strong separation condition and that all $S_i$ have the same contraction number $s$.
Then
\[ 0 < \liminf_{n\to\infty} n\, e_{n,\infty}(A)^D < \limsup_{n\to\infty} n\, e_{n,\infty}(A)^D < +\infty. \]

Hence the quantization coefficient $Q_\infty(A)$ does not exist.

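Proof
According to Proposition 14.7 and 14.13 we know that

\[ 0 < \liminf_{n\to\infty} n\, e_{n,\infty}(A)^D \le \limsup_{n\to\infty} n\, e_{n,\infty}(A)^D < \infty. \]

Therefore, it remains to show that $(n\, e_{n,\infty}(A)^D)_{n\in\mathbb{N}}$ does not converge. Let $n_0 \in \mathbb{N}$
satisfy $n_0 \ge N$ and

\[ e_{n_0,\infty}(A) < \tfrac{1}{2} \min\{d(S_i(A), S_j(A)) : i \ne j\}. \]

Using Lemma 10.2(b) it follows immediately that, for $n \ge n_0$,

\[ e_{n,\infty}(A) = \min\Bigl\{\max_{1\le i\le N} s\, e_{n_i,\infty}(A) : n_i \ge 1,\ \sum_{i=1}^{N} n_i \le n\Bigr\}. \tag{14.20} \]

We claim that

\[ e_{n,\infty}(A) = s\, e_{[\frac{n}{N}],\infty}(A) \tag{14.21} \]

for all $n \ge n_0$, where $[\frac{n}{N}]$ denotes the greatest integer less than or equal to $\frac{n}{N}$. Setting
$n_i = [\frac{n}{N}]$ it follows from (14.20) that

\[ e_{n,\infty}(A) \le s\, e_{[\frac{n}{N}],\infty}(A). \]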


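Now let $n_1,\dots,n_N \in \mathbb{N}$ satisfy $1 \le n_i$, $\sum_{i=1}^{N} n_i \le n$, and

\[ e_{n,\infty}(A) = \max_{1\le i\le N} s\, e_{n_i,\infty}(A). \]

Without loss of generality we assume $n_1 \le n_2 \le \dots \le n_N$ so that

\[ e_{n,\infty}(A) = s\, e_{n_1,\infty}(A) \]

and $n_1 \le [\frac{n}{N}]$. Thus we deduce

\[ e_{n,\infty}(A) = s\, e_{n_1,\infty}(A) \ge s\, e_{[\frac{n}{N}],\infty}(A) \]

and our claim is proved.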


Since $N \ge 2$ the equality (14.21) implies

\[ e_{2n_0+1,\infty}(A) = s\, e_{n_0,\infty}(A) = e_{2n_0,\infty}(A) \tag{14.22} \]

and, therefore,

\[ e_{N^k(2n_0+1),\infty}(A) = e_{N^k(2n_0),\infty}(A) \tag{14.23} \]

for all $k \in \mathbb{N}$. Now assume that $(n\, e_{n,\infty}(A)^D)_{n\in\mathbb{N}}$ converges to some constant
$c \in (0, \infty)$. Using (14.23) this implies

\[ c = \lim_{k\to\infty} N^k(2n_0+1)\, e_{N^k(2n_0+1),\infty}(A)^D
     = \lim_{k\to\infty} N^k(2n_0+1)\, e_{N^k(2n_0),\infty}(A)^D
     = \frac{2n_0+1}{2n_0} \lim_{k\to\infty} N^k(2n_0)\, e_{N^k(2n_0),\infty}(A)^D
     = \frac{2n_0+1}{2n_0}\, c, \]
which yields a contradiction and finishes the proof of the proposition. □

14.23 Remark
It remains an open problem to characterize those self-similar sets $A$ for which the
quantization coefficient $Q_\infty(A)$ exists by a natural condition on the generating N-tuple
$(S_1,\dots,S_N)$. For $1 \le r < \infty$ and general self-similar probabilities $P$ almost
nothing is known about the existence of the quantization coefficients $Q_r(P)$. The
only self-similar probability $P$ for which the existence of all quantization coefficients
$Q_r(P)$, $1 \le r < \infty$, is known seems to be the restriction of Lebesgue measure to
the unit cube in $\mathbb{R}^d$. The classical Cantor distribution $P$ on $\mathbb{R}$ has no quantization
coefficient $Q_2(P)$ (cf. Graf and Luschgy, 1997).

Below we will summarize the known results for the Cantor distribution.
Let $S_1, S_2\colon \mathbb{R} \to \mathbb{R}$ be defined by $S_1 x = \frac{1}{3}x$ and $S_2 x = \frac{1}{3}x + \frac{2}{3}$. Then $S_1$ and $S_2$
are similarity transformations with contraction number $\frac{1}{3}$. The attractor of the pair
$(S_1, S_2)$ is the classical Cantor set $C \subset [0,1]$. The similarity dimension of $(S_1, S_2)$
equals $D = \frac{\log 2}{\log 3}$. Let $P$ be the self-similar probability corresponding to $(S_1, S_2; \frac{1}{2}, \frac{1}{2})$
(see (14.3)). According to (14.4) $P$ is the normalized D-dimensional Hausdorff measure
on $C$. This distribution is called the (classical) Cantor distribution. Since
$(S_1, S_2)$ satisfies the strong separation condition and since $(s_1^D, s_2^D) = (\frac{1}{2}, \frac{1}{2})$ we know
from Theorem 14.17 that $D$ is the quantization dimension of $P$ of order $r$ for all
$r \in [1, +\infty]$. In the following theorem we will describe all optimal sets of n-centers,
the quantization errors $V_{n,r}(P)$, and all limit points of the sequence $(n^{2/D} V_{n,r}(P))_{n\in\mathbb{N}}$
for $r = 2$. In particular we show that the quantization coefficient $Q_2(P)$ does not
exist.
To do this we need some more notation. For $\sigma \in \{1,2\}^*$ let $a_\sigma = S_\sigma(\frac{1}{2})$. For $n \ge 1$
let $l(n) = [\log_2 n]$. For $I \subset \{1,2\}^{l(n)}$ with $|I| = n - 2^{l(n)}$ let

\[ \alpha_n(I) = \{a_\sigma : \sigma \in \{1,2\}^{l(n)} \setminus I\} \cup \bigcup_{\sigma\in I} \{a_{\sigma 1}, a_{\sigma 2}\}. \]

Define $f\colon [1,2] \to \mathbb{R}$ by
\[ f(x) = \frac{1}{72}\, x^{2/D} (17 - 8x). \]

14.24 Theorem
Let $P$ be the Cantor distribution and let $D$, $l(n)$, $\alpha_n(I)$, and $f$ be defined as above.

(a) For every natural number $n \ge 1$ the following conditions are equivalent:

(i) $\alpha$ is an n-optimal set of centers of order 2 for $P$.

(ii) There exists an $I \subset \{1,2\}^{l(n)}$ with $\alpha = \alpha_n(I)$.

(b) For every natural number $n \ge 1$,

\[ V_{n,2}(P) = \frac{1}{18^{l(n)}} \cdot \frac{1}{8} \Bigl(2^{l(n)+1} - n + \frac{1}{9}\bigl(n - 2^{l(n)}\bigr)\Bigr). \]

(c) The set of all accumulation points of $(n^{2/D} V_{n,2}(P))_{n\in\mathbb{N}}$ is the interval

\[ \Bigl[\frac{1}{8},\ f\Bigl(\frac{17}{8+4D}\Bigr)\Bigr]. \]

(Notice that =
In particular $P$ has no second quantization coefficient.

The proof is given in Graf and Luschgy (1997) and will be omitted here.
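The oscillation described in part (c) can be made visible numerically. The following is a minimal sketch, assuming the closed form $V_{n,2}(P) = 18^{-l(n)} \cdot \frac{1}{8}\bigl(2^{l(n)+1} - n + \frac{1}{9}(n - 2^{l(n)})\bigr)$ for part (b) and $f(x) = \frac{1}{72}x^{2/D}(17 - 8x)$; the function names are hypothetical. It evaluates $n^{2/D} V_{n,2}(P)$ and compares it with $f(n/2^{l(n)})$.

```python
import math
from fractions import Fraction

D = math.log(2) / math.log(3)          # similarity dimension of the Cantor set

def level(n):
    """l(n) = floor(log2(n)) for an integer n >= 1."""
    return n.bit_length() - 1

def quantization_error(n):
    """V_{n,2}(P) as an exact fraction, following the formula assumed for part (b)."""
    k = level(n)
    return Fraction(1, 8 * 18 ** k) * (2 ** (k + 1) - n + Fraction(n - 2 ** k, 9))

def f(x):
    """The function f on [1, 2] from the definition preceding the theorem."""
    return x ** (2 / D) * (17 - 8 * x) / 72

def normalized_error(n):
    """n^(2/D) * V_{n,2}(P)."""
    return n ** (2 / D) * float(quantization_error(n))

if __name__ == "__main__":
    x_max = 17 / (8 + 4 * D)           # maximiser of f on [1, 2]
    print("lower end 1/8 =", 0.125, " upper end f(x_max) =", f(x_max))
    for n in (1, 2, 3, 4, 6, 8, 13, 16, 26, 32):
        print(n, normalized_error(n), f(n / 2 ** level(n)))
```

Under these assumptions the two printed columns coincide, so the normalized errors trace out the values of $f$ on the dense set of ratios $n/2^{l(n)}$ in $[1,2]$ and cannot converge.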

Notes

The definition of self-similarity as used in this section was introduced by Hutchinson
(1981). His paper also contains the basic results about self-similar sets and measures.
sures. Other references concerned with this subject are the books of Barnsley (1988),
Falconer (1990, 1997), and Mattila (1995). The book of Barnsley (1988) describes
many interesting examples of self-similar sets and measures. The idea of studying the
quantization of self-similar probabilities goes back to Zador (1982). But his results
are not formulated in a rigorous way. Since then nobody seems to have dealt with
the problem. Thus, all the quantization results in this Section 14 seem to be new.
Appendix

Univariate distributions

The following univariate distributions served as examples. Recall that the r-th (absolute)
moment about the center of a real random variable $X$ is given by
\[ V_r(X) = \inf_{a\in\mathbb{R}} E|X - a|^r. \]

Normal distribution $N(0, \sigma^2)$
The normal distribution is strongly unimodal. If $X$ is $N(0, \sigma^2)$-distributed, then

\[ V_r(X) = E|X|^r = \sqrt{\frac{2^r \sigma^{2r}}{\pi}}\; \Gamma\Bigl(\frac{r+1}{2}\Bigr), \quad r \ge 1. \]

In particular
\[ V_1(X) = \sigma \sqrt{\frac{2}{\pi}}. \]

Logistic distribution L(a)

The density (with respect to $\lambda$) is given by

\[ h(x) = \frac{e^{x/a}}{a(1 + e^{x/a})^2}, \quad x \in \mathbb{R}, \]

where $a > 0$ is a scale parameter. The logistic distribution is symmetric about the
origin and strongly unimodal. The distribution function takes the form
\[ F(x) = \frac{1}{1 + e^{-x/a}} = \frac{e^{x/a}}{1 + e^{x/a}}. \]
Suppose that $X$ has distribution $L(a)$. Then
\[ V_r(X) = E|X|^r = 2a^r \Gamma(r+1) \sum_{j=1}^{\infty} (-1)^{j-1} j^{-r}, \quad r \ge 1, \]
\[ \phantom{V_r(X)} = 2a^r \Gamma(r+1)\bigl(1 - 2^{-(r-1)}\bigr)\zeta(r), \quad r > 1, \]

where $\zeta$ denotes the Riemann zeta function. In particular

\[ V_1(X) = 2a \log 2, \qquad V_2(X) = \frac{a^2\pi^2}{3}. \]

Generalized Logistic distribution GL(a, b)

The density is defined by

\[ h(x) = \frac{\Gamma(2/b)\, e^{x/ab}}{a\,\Gamma(1/b)^2\,(1 + e^{x/a})^{2/b}}, \quad x \in \mathbb{R}, \]

where $a > 0$, $b > 0$. We have GL(a, 1) = L(a).

Double Exponential distribution DE(a)

The density is given by

\[ h(x) = \frac{1}{2a}\, e^{-|x|/a}, \quad x \in \mathbb{R}, \]

where $a > 0$ is a scale parameter. The double exponential distribution is strongly
unimodal. The distribution function takes the form

\[ F(x) = \begin{cases} \frac{1}{2} e^{x/a}, & x \le 0, \\ 1 - \frac{1}{2} e^{-x/a}, & x > 0. \end{cases} \]
If $X$ is DE(a)-distributed, then
\[ V_r(X) = E|X|^r = a^r \Gamma(1 + r), \quad r \ge 1. \]
In particular
\[ V_1(X) = a, \qquad V_2(X) = 2a^2. \]

Double Gamma distribution $D\Gamma(a, b)$

The density is given by

\[ h(x) = \frac{1}{2a^b\,\Gamma(b)}\, |x|^{b-1} e^{-|x|/a}, \quad x \in \mathbb{R}, \]

where $a > 0$ is a scale parameter and $b > 0$ is a shape parameter. We have $D\Gamma(a, 1) =$
DE(a). If $X$ is $D\Gamma(a, b)$-distributed, then

\[ V_r(X) = E|X|^r = \frac{a^r\, \Gamma(b + r)}{\Gamma(b)}, \quad r \ge 1. \]

In particular
\[ V_1(X) = ab, \qquad V_2(X) = a^2 b(b + 1). \]

Hyper-exponential distribution HE(a, b)

The density is given by

\[ h(x) = \frac{b}{2a\,\Gamma(1/b)} \exp\Bigl(-\Bigl|\frac{x}{a}\Bigr|^b\Bigr), \quad x \in \mathbb{R}, \]

where $a > 0$ is a scale parameter, $b > 0$. The hyper-exponential distribution is
strongly unimodal if $b > 1$. We have HE(a, 1) = DE(a) and HE(a, 2) = $N(0, a^2/2)$.
Let $X$ be HE(a, b)-distributed. Then

\[ V_r(X) = E|X|^r = \frac{a^r\, \Gamma\bigl(\frac{r+1}{b}\bigr)}{\Gamma\bigl(\frac{1}{b}\bigr)}, \quad r \ge 1. \]

Uniform distribution U([a, b])

The density is given by

\[ h(x) = \frac{1}{b-a}\, 1_{[a,b]}(x), \]
where $a, b \in \mathbb{R}$, $a < b$. The uniform distribution is symmetric about $(a+b)/2$ and
strongly unimodal. Let $X$ be U([a, b])-distributed. Then
\[ \mathrm{Med}(X) = \{(a+b)/2\}, \qquad EX = (a+b)/2, \]
\[ V_r(X) = E\Bigl|X - \frac{a+b}{2}\Bigr|^r = \frac{(b-a)^r}{(1+r)\,2^r}, \quad r \ge 1. \]

Triangular distribution T(a, b; c)

The density is given by

\[ h(x) = \frac{2(x-a)}{(c-a)(b-a)}\, 1_{[a,c]}(x) + \frac{2(b-x)}{(b-c)(b-a)}\, 1_{[c,b]}(x), \]
where $a < c < b$. Consider the case $c = (a+b)/2$. Then the triangular distribution
is symmetric about $(a+b)/2$ and
\[ h(x) = \frac{4}{(b-a)^2} \bigl((x-a)\, 1_{[a,(a+b)/2]}(x) + (b-x)\, 1_{((a+b)/2,\,b]}(x)\bigr). \]

If $X$ is $T(a, b; \frac{a+b}{2})$-distributed, then

\[ V_r(X) = E\Bigl|X - \frac{a+b}{2}\Bigr|^r = \frac{(b-a)^r}{(r+1)(r+2)\,2^{r-1}}, \quad r \ge 1. \]
In particular
\[ V_1(X) = \frac{b-a}{6}, \qquad V_2(X) = \frac{(b-a)^2}{24}. \]

Exponential distribution E(a)

The density is given by

\[ h(x) = \frac{1}{a}\, e^{-x/a}\, 1_{(0,\infty)}(x), \]

where $a > 0$ is a scale parameter. The exponential distribution is strongly unimodal.
The distribution function takes the form
\[ F(x) = \begin{cases} 1 - e^{-x/a}, & x > 0, \\ 0, & x \le 0. \end{cases} \]

If $X$ is E(a)-distributed, then

\[ \mathrm{Med}(X) = \{a \log 2\}, \qquad EX = a, \]
\[ V_1(X) = E|X - a \log 2| = a \log 2, \]
\[ V_2(X) = \mathrm{Var}\,X = a^2. \]

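Weibull distribution W(a, b)

The density is given by

\[ h(x) = \frac{b}{a^b}\, x^{b-1} \exp\Bigl(-\Bigl(\frac{x}{a}\Bigr)^b\Bigr)\, 1_{(0,\infty)}(x), \]

where $a > 0$ is a scale parameter and $b > 0$ is a shape parameter. The Weibull
distribution is strongly unimodal for $b \ge 1$. The distribution function takes the form

\[ F(x) = \begin{cases} 1 - \exp\bigl(-(\frac{x}{a})^b\bigr), & x > 0, \\ 0, & x \le 0. \end{cases} \]
We have W(a, 1) = E(a). The distribution W(a, 2) is called Rayleigh distribution.
Let $X$ be W(a, b)-distributed. Then
\[ \mathrm{Med}(X) = \{a(\log 2)^{1/b}\}, \]
\[ EX^r = a^r\, \Gamma\Bigl(1 + \frac{r}{b}\Bigr), \quad r \ge 0, \]
\[ V_2(X) = \mathrm{Var}\,X = a^2\Bigl(\Gamma\Bigl(\frac{2}{b} + 1\Bigr) - \Gamma\Bigl(\frac{1}{b} + 1\Bigr)^2\Bigr). \]

Let $b = 2$. Then
\[ V_1(X) = E\bigl|X - a\sqrt{\log 2}\bigr| = a\Bigl[\sqrt{\log 2} + \sqrt{\pi}\Bigl(\frac{3}{2} - 2\Phi\bigl(\sqrt{2\log 2}\bigr)\Bigr)\Bigr], \]
\[ V_2(X) = a^2\Bigl(1 - \frac{\pi}{4}\Bigr), \]
where $\Phi$ denotes the distribution function of $N(0,1)$. For
\[ a = \Bigl[\sqrt{\log 2} + \sqrt{\pi}\Bigl(\frac{3}{2} - 2\Phi\bigl(\sqrt{2\log 2}\bigr)\Bigr)\Bigr]^{-1} = 2.7027\ldots \]
one obtains $V_1(X) = 1$.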
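The constant $2.7027\ldots$ for the Rayleigh case can be reproduced with a short computation. This is a sketch under the assumption that, for $a = 1$ and $b = 2$, $V_1(X) = \sqrt{\log 2} + \sqrt{\pi}\,(\frac{3}{2} - 2\Phi(\sqrt{2\log 2}))$ with $\Phi$ the standard normal distribution function; the function names are hypothetical. The closed form is cross-checked against a midpoint-rule integration of $|x - \sqrt{\log 2}|$ against the density $2x e^{-x^2}$.

```python
import math

def std_normal_cdf(u):
    """Distribution function of N(0, 1), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def rayleigh_v1_closed_form():
    """Assumed closed form of E|X - Med(X)| for X ~ W(1, 2)."""
    root = math.sqrt(math.log(2.0))
    return root + math.sqrt(math.pi) * (1.5 - 2.0 * std_normal_cdf(math.sqrt(2.0 * math.log(2.0))))

def rayleigh_v1_numeric(steps=200_000, upper=10.0):
    """Midpoint-rule approximation of the same expectation (tail beyond `upper` is negligible)."""
    median = math.sqrt(math.log(2.0))
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += abs(x - median) * 2.0 * x * math.exp(-x * x)   # density of W(1, 2)
    return total * h

if __name__ == "__main__":
    c = rayleigh_v1_closed_form()
    print("closed form :", c)
    print("quadrature  :", rayleigh_v1_numeric())
    print("scale a with V_1 = 1:", 1.0 / c)
```

Since $V_1$ scales linearly in $a$, the reciprocal of the value for $a = 1$ is the scale parameter that normalizes $V_1(X)$ to 1.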

Gamma distribution $\Gamma(a, b)$

The density is given by

\[ h(x) = \frac{1}{a^b\,\Gamma(b)}\, x^{b-1} e^{-x/a}\, 1_{(0,\infty)}(x), \]

where $a > 0$ is a scale parameter and $b > 0$ is a shape parameter. The Gamma
distribution is strongly unimodal for $b > 1$. We have $\Gamma(a, 1)$ = E(a). If $X$ is $\Gamma(a, b)$-distributed,
then

\[ EX^r = \frac{a^r\, \Gamma(b + r)}{\Gamma(b)}, \quad r \ge 0, \]
\[ EX = ab, \qquad V_2(X) = \mathrm{Var}\,X = a^2 b. \]
Let $b = 2$. Then

\[ \mathrm{Med}(X) = \{a \cdot 1.6783\ldots\}, \]

\[ V_1(X) = a \cdot 1.0517\ldots \]

and for $a = 0.9508\ldots$ one gets $V_1(X) = 1$.
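The three numerical constants for $b = 2$ can be recovered as follows. This is a sketch assuming scale $a = 1$, so that the density is $x e^{-x}$ on $(0, \infty)$ and the distribution function is $1 - (1 + x)e^{-x}$; the function names are hypothetical. The median is found by bisection, and $V_1$ follows from $E|X - m| = EX - 2\int_0^m x \cdot x e^{-x}\,dx$, which holds because $P(X < m) = \frac{1}{2}$.

```python
import math

def gamma2_median():
    """Median of the Gamma distribution with a = 1, b = 2: solve 1 - (1 + m) e^{-m} = 1/2."""
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - (1.0 + mid) * math.exp(-mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma2_v1(m):
    """E|X - m| for the median m, using E X = 2 and the closed-form partial second moment."""
    partial = 2.0 - math.exp(-m) * (m * m + 2.0 * m + 2.0)   # integral of x^2 e^{-x} over [0, m]
    return 2.0 - 2.0 * partial

if __name__ == "__main__":
    m = gamma2_median()
    v1 = gamma2_v1(m)
    print("median / a :", m)
    print("V_1 / a    :", v1)
    print("a with V_1 = 1:", 1.0 / v1)
```

Both the median and $V_1$ scale linearly in $a$, which is why the values for $a = 1$ appear as the factors multiplying $a$.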

Generalized Gamma distribution $G\Gamma(a, b, c)$

The density is given by

\[ h(x) = \frac{c}{a^b\,\Gamma(b/c)}\, x^{b-1} \exp\Bigl(-\Bigl(\frac{x}{a}\Bigr)^c\Bigr)\, 1_{(0,\infty)}(x), \]

where $a > 0$, $b > 0$, $c > 0$. We have $G\Gamma(a, b, 1) = \Gamma(a, b)$.

Pareto distribution P(a, b)

The density is given by

\[ h(x) = b a^b x^{-(b+1)}\, 1_{(a,\infty)}(x), \]
where $a > 0$ is a scale parameter, $b > 0$. The distribution function takes the form

\[ F(x) = \begin{cases} 1 - \bigl(\frac{a}{x}\bigr)^b, & x > a, \\ 0, & x \le a. \end{cases} \]

Let $X$ be P(a, b)-distributed. Then

\[ \mathrm{Med}(X) = \{a 2^{1/b}\}, \]
\[ EX^r = \frac{a^r b}{b - r}, \quad b > r \ge 0, \]
\[ V_1(X) = E|X - a 2^{1/b}| = \frac{ab\,(2^{1/b} - 1)}{b - 1}, \quad b > 1, \]
\[ V_2(X) = \mathrm{Var}\,X = \frac{a^2 b}{(b-2)(b-1)^2}, \quad b > 2. \]

Cantor distribution

Let $C \subset [0, 1]$ be the (classical) Cantor set and let $D = \frac{\log 2}{\log 3}$ be the Hausdorff
dimension of $C$. Then the Cantor distribution $P$ is the normalized D-dimensional
Hausdorff measure on $C$. Define $S_1, S_2\colon \mathbb{R} \to \mathbb{R}$ by $S_1 x = \frac{1}{3}x$ and $S_2 x = \frac{1}{3}x + \frac{2}{3}$.
Then $P$ is the unique Borel probability on $\mathbb{R}$ with

\[ P = \frac{1}{2}\bigl(P^{S_1} + P^{S_2}\bigr). \]

Let $X$ be P-distributed. Then

\[ E(X) = \frac{1}{2}, \qquad V_2(X) = \mathrm{Var}\,X = \frac{1}{8}. \]
Bibliography

Abaya, E.F. and Wise, G.L. (1981). Some notes on optimal quantization. Pro-
ceedings of the International Conference on Communications (Denver, Colorado),
30.7.1-10.7.5. IEEE Press, New York.
Abaya, E.F. and Wise, G.L. (1984). Convergence of vector quantizers with applica-
tions to optimal quantizers. SIAM J. Appl. Math. 44, 183-189.
Abut, H., editor (1990). Vector Quantization. IEEE Press, New York.
Adams Jr., W.C. and Giesler, C.E. (1978). Quantizing characteristics for signals hav-
ing Laplacian amplitude probability density function. IEEE Trans. Communications
26, 1295-1297.
Agrell, E. and Eriksson, T. (1998). Optimization of lattices for quantization. IEEE
Trans. Inform. Theory 44, 1814-1828.
Anderberg, M.R. (1973). Cluster Analysis for Applications. Academic Press, San
Diego.
Aurenhammer, F. (1991). Voronoi diagrams: A survey of a fundamental geometric
data structure. ACM Computing Surveys 23, 345-405.
Baranovskii, E.P. (1965). Local density minima of a lattice covering of a four-
dimensional Euclidean space by equal spheres. Soviet Math. Dokl. 6, 1131-1133.
Barnes, E.S. and Sloane, N.J.A. (1983). The optimal lattice quantizer in three di-
mensions. SIAM J. Algebraic Discrete Methods 4, 30-41.
Barnsley, M. (1988). Fractals Everywhere. Academic Press, London.
Bartlett, P.L., Linder, T., and Lugosi, G. (1998). The minimax distortion redundancy
in empirical quantizer design. IEEE Trans. Inform. Theory 44, 1802-1813.
Benhenni, K. and Cambanis, S. (1996). The effect of quantization on the performance
of sampling designs. Techn. Report No. 481, Center for Stoch. Processes, Univ. of
North Carolina, Chapel Hill.
Bennett, W.R. (1948). Spectra of quantized signals. Bell Systems Tech. J. 27,
446-472.
Bock, H.H. (1974). Automatische Klassifikation. Vandenhoeck and Ruprecht,
Göttingen.

Bock, H.H. (1992). A clustering technique for maximizing φ-divergence, noncentrality
and discriminating power. Analyzing and Modeling Data, 19-36 (ed., M. Schader).
Springer, Berlin.
Bollobás, B. (1972). The optimal structure of market areas. J. Economic Theory 4,
174-179.
Bollobás, B. (1973). The optimal arrangement of producers. J. London Math. Soc.
6, 605-613.
Bouton, C. and Pagès, G. (1997). About the multidimensional competitive learning
vector quantization algorithm with constant gain. Ann. Appl. Probab. 7, 679-710.
Bucklew, J.A. and Cambanis, S. (1988). Estimating random integrals from noisy
observations: Sampling designs and their performance. IEEE Trans. Inform. Theory
34, 111-127.
Bucklew, J.A. and Wise, G.L. (1982). Multidimensional asymptotic quantization
theory with r-th power distortion measures. IEEE Trans. Inform. Theory 28, 239-
247.
Calderbank, R., Forney Jr., G.D., and Moayeri, N., editors (1993). Coding and
Quantization. DIMACS Vol. 14, American Mathematical Society.
Cawley, R. and Mauldin, R.D. (1992). Multifractal decomposition of Moran fractals.
Adv. Math. 92, 196-236.
Cambanis, S. and Gerr, N.L. (1983). A simple class of asymptotically optimal quan-
tizers. IEEE Trans. Inform. Theory 29, 664-676.
Cassels, J.W.S. (1971). An Introduction to the Geometry of Numbers. Second Print-
ing. Springer, Berlin.
Chatterji, S.D. (1973). Les martingales et leurs applications analytiques. Lecture
Notes in Math. 307 (École d'Été de Probabilités: Processus Stochastiques), 27-135.
Springer, Berlin.
Cohn, D.L. (1980). Measure Theory. Birkhäuser, Boston.
Cohort, P. (1997). Unicité d'un quantifieur localement optimal par le théorème du
col. Technical Report, Labo. Probab., Univ. Paris 6.
Conway, J.H. and Sloane, N.J.A. (1993). Sphere Packings, Lattices and Groups.
Second Edition. Springer, New York.
Cox, D.R. (1957). Note on grouping. J. Amer. Statist. Assoc. 52, 543-547.
Cuesta-Albertos, J.A. and Matrán, C. (1988). The strong law of large numbers for
k-means and best possible nets of Banach valued random variables. Probab. Theory
Related Fields 78, 523-534.
Cuesta-Albertos, J.A., Gordaliza, A., and Matrán, C. (1997). Trimmed k-means: an
attempt to robustify quantizers. Ann. Statist. 25, 553-576.

Cuesta-Albertos, J.A., Gordaliza, A., and Matrán, C. (1998). Trimmed best k-nets:
a robustified version of an L∞-based clustering method. Statist. Probab. Letters 36,
401-413.
Cuesta-Albertos, J.A., García-Escudero, L.A., and Gordaliza, A. (1999). Trimmed
best k-nets: asymptotics and applications. Preprint.
Dalenius, T. (1950). The problem of optimum stratification. Scandinavisk Aktuari-
etidskrift 33, 203-213.
David, G. and Semmes, S. (1993). Analysis of and on Uniformly Rectifiable Sets.
Mathematical Surveys and Monographs, Vol. 38, American Mathematical Society,
Rhode Island.
David, G. and Semmes, S. (1997). Fractured Fractals and Broken Dreams. Clarendon
Press, Oxford.
Dharmadhikari, S. and Joag-Dev, K. (1988). Unimodality, Convexity and Applica-
tions. Academic Press, Boston.
Diday, E. and Simon, J.C. (1976). Clustering analysis. Digital Pattern Recognition,
47-94 (ed., K.S. Fu). Springer, New York.
Elias, P. (1970). Bounds and asymptotes for the performance of multivariate quan-
tizers. Ann. Math. Statist. 41, 1249-1259.
Eubank, R.L. (1988). Optimal grouping, spacing, stratification, and piecewise con-
stant approximation. SIAM Review 30, 404-420.
Falconer, K.J. (1985). The Geometry of Fractal Sets. Cambridge University Press,
Cambridge.
Falconer, K.J. (1990). Fractal Geometry. Wiley, Chichester.
Falconer, K.J. (1997). Techniques in Fractal Geometry. Wiley, Chichester.
Fang, K.-T. and Wang, Y. (1994). Number-theoretic Methods in Statistics. Chapman
and Hall, London.
Federer, H. (1969). Geometric Measure Theory. Springer, Berlin-Heidelberg-New
York.
Fejes Tóth, L. (1959). Sur la représentation d'une population infinie par un nombre
fini d'éléments. Acta Math. Acad. Sci. Hung. 10, 299-304.
Fejes Tóth, L. (1972). Lagerungen in der Ebene, auf der Kugel und im Raum. Second
Edition. Springer, Berlin.
Fleischer, P.E. (1964). Sufficient conditions for achieving minimum distortion in a
quantizer. IEEE Int. Conv. Rec., part 1, 104-111.
Flury, B.A. (1990). Principal points. Biometrika 77, 33-41.

Forney Jr., G.D. (1993). On the duality of coding and quantization. Coding and
Quantization, 1-14 (eds., R. Calderbank et al.). DIMACS Vol. 14, American Mathe-
matical Society.
Fort, J.C. and Pagès, G. (1999). Asymptotics of optimal quantizers for some scalar
distributions. Preprint.
García-Escudero, L.A., Gordaliza, A., and Matrán, C. (1999). A central limit theorem
for multivariate generalized trimmed k-means. Ann. Statist. 27, 1061-1079.
Gardner, W.R. and Rao, B.D. (1995). Theoretical analysis of the high rate vector
quantization of LPC parameter. IEEE Trans. Speech Audio Processing 3, 367-381.
Garkavi, A.L. (1964). The best possible net and the best possible cross-section of a
set in a normed space. Amer. Math. Soc. Translations 39, 111-132.
Gersho, A. (1979). Asymptotically optimal block quantization. IEEE Trans. Inform.
Theory 25, 373-380.
Gersho, A. and Gray, R.M. (1992). Vector Quantization and Signal Compression.
Kluwer, Boston.
Gilat, D. (1988). On the ratio of the expected maximum of a martingale and the
Lp-norm of its last term. Israel J. Math. 63, 270-280.
Goddyn, L.A. (1990). Quantizers and the worst-case Euclidean traveling salesman
problem. J. Combinatorial Theory Series B 50, 65-81.
Graf, S. and Luschgy, H. (1994a). Foundations of quantization for random vectors.
Research Report No. 16, Applied Mathematics and Computer Science, University of
Münster.
Graf, S. and Luschgy, H. (1994b). Consistent estimation in the quantization problem
for random vectors. Trans. Twelfth Prague Conf. Inform. Theory, Stat. Decision
Functions, Random Processes, 84-87.
Graf, S. and Luschgy, H. (1996). The quantization dimension of self-similar sets.
Research Report No. 9, Dept. of Mathematics and Computer Science, University of
Passau.
Graf, S. and Luschgy, H. (1997). The quantization of the Cantor distribution. Math.
Nachrichten 183, 113-133.
Graf, S. and Luschgy, H. (1999a). Quantization for random vectors with respect to
the Ky Fan metric. Submitted.
Graf, S. and Luschgy, H. (1999b). Quantization for probability measures with respect
to the geometric mean error. Submitted.
Graf, S. and Luschgy, H. (1999c). Rates of convergence for the empirical quantization
error. Submitted.
Gray, R.M. (1990). Source Coding Theory. Kluwer, Boston.

Gray, R.M., Neuhoff, D.L., and Shields, P.C. (1975). A generalization of Ornstein's
distance with applications to information theory. Ann. Probab. 3, 315-328.
Gray, R.M. and Davisson, L.D. (1975). Quantizer mismatch. IEEE Trans. Commu-
nications 23, 439-443.
Gray, R.M. and Karnin, E.D. (1982). Multiple local optima in vector quantizers.
IEEE Trans. Inform. Theory 28, 256-261.
Gray, R.M. and Neuhoff, D.L. (1998). Quantization. IEEE Trans. Inform. Theory
44, 2325-2383.
Gruber, P. (1974). Über kennzeichnende Eigenschaften von euklidischen Räumen und
Ellipsoiden I. J. Reine Angew. Math. 265, 61-83.
Gruber, P.M. and Lekkerkerker, C.G. (1987). Geometry of Numbers. Second Edition.
North-Holland, Amsterdam.
Grünbaum, B. and Shephard, G.C. (1986). Tilings and Patterns. Freeman and
Company, New York.
Haimovich, M. and Magnanti, T.L. (1988). Extremum properties of hexagonal partitioning
and the uniform distribution in euclidean location. SIAM J. Discrete Math.
1, 50-64.
Hartigan, J.A. (1978). Asymptotic distributions for clustering criteria. Ann. Statist.
6, 117-131.
Hochbaum, D. and Steele, J.M. (1982). Steinhaus's geometric location problem for
random samples in the plane. Adv. Appl. Probab. 14, 56-67.
Hoffmann-Jørgensen, J. (1994). Probability with a View Toward Statistics. Vol. 1.
Chapman and Hall, New York.
Hutchinson, J.E. (1981). Fractals and self-similarity. Indiana Univ. Math. J. 30,
713-747.
Iyengar, S. and Solomon, H. (1983). Selecting representative points in normal pop-
ulations. Recent Advances in Statistics, Papers in Honor of H. Chernoff, 579-591.
Academic Press.
Jahnke, H. (1988). Clusteranalyse als Verfahren der schließenden Statistik. Vandenhoeck
and Ruprecht, Göttingen.
Johnson, M.E., Moore, L.M., and Ylvisaker, D. (1990). Minimax and maximin dis-
tance designs. J. Statist. Plann. Inference 26, 131-148.
Karlin, S. (1982). Some results on optimal partitioning of variance and monotonicity
with truncation level. Statistics and Probability: Essays in Honor of C. R. Rao,
375-382 (eds., G. Kallianpur et al.). North-Holland, Amsterdam.
Kawabata, T. and Dembo, A. (1994). The rate distortion dimension of sets and
measures. IEEE Trans. Inform. Theory 40, 1564-1572.

Kemperman, J.H.B. (1987). The median of a finite measure on a Banach space.
Statistical Data Analysis based on the L1-Norm and related Methods, 217-230 (ed.,
Y. Dodge). North-Holland, Amsterdam.
Kershner, R. (1939). The number of circles covering a set. Amer. J. Math. 61,
665-671.
Klein, R. (1989). Concrete and Abstract Voronoi Diagrams. Lecture Notes in Com-
puter Science 400. Springer, New York.
Lalley, S. (1988). The packing and covering functions of some self-similar fractals.
Indiana Univ. Math. J. 37, 699-709.
Lamberton, D. and Pagès, G. (1996). On the critical points of the 1-dimensional
competitive learning vector quantization algorithm. Proceedings of the ESANN'96
(Bruges, Belgium), 97-101.
Li, J., Chaddha, N., and Gray, R.M. (1999). Asymptotic performance of vector
quantizers with a perceptual distortion measure. IEEE Trans. Inform. Theory 45,
1082-1091.
Linder, T. (1991). On asymptotically optimal companding quantization. Problems
of Control and Information Theory 20, 475-484.
Linder, T., Lugosi, G., and Zeger, K. (1994). Rates of convergence in the source
coding theorem, in empirical quantizer design, and in universal lossy source coding.
IEEE Trans. Inform. Theory 40, 1728-1740.
Linder, T., Zamir, R., and Zeger, K. (1999). High-resolution source coding for non-
difference distortion measures: multidimensional companding. IEEE Trans. Inform.
Theory 45, 548-561.
Lloyd, S.P. (1982). Least squares quantization in PCM. IEEE Trans. Inform. Theory
28, 129-137.
Lookabaugh, T.D. and Gray, R.M. (1989). High resolution quantization theory and
the vector quantizer advantage. IEEE Trans. Inform. Theory 35, 1020-1033.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate
observations. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 281-297, Univ.
California Press, Berkeley.
Mann, H. (1935). Untersuchungen über Wabenzellen bei allgemeiner Minkowski
Metrik. Monatsh. Math. Physik 42, 417-424.
Mattila, P. (1995). Geometry of Sets and Measures in Euclidean Spaces. Cambridge
Univ. Press, Cambridge.
Max, J. (1960). Quantizing for minimum distortion. IEEE Trans. Inform. Theory 6,
7-12.
McClure, D.E. (1975). Nonlinear segmented function approximation and analysis of
line patterns. Quart. Appl. Math. 33, 1-37.

McClure, D.E. (1980). Optimized grouping methods. Part 1 and part 2. Statistik
Tidskrift 18, 101-110, 189-198.
McGivney, K. and Yukich, J.E. (1997). Asymptotics for geometric location problems
over random samples. Preprint.
McMullen, P. (1980). Convex bodies which tile space by translation. Mathematika
27, 113-121. (Acknowledgement of priority: Mathematika 28, 191.)
Milasevic, P. and Ducharme, G.R. (1987). Uniqueness of the spatial median. Ann.
Statist. 15, 1332-1333.
Møller, J. (1994). Lectures on Random Voronoi Tessellations. Lecture Notes in Statistics
87. Springer, New York.
Moran, P.A.P. (1946). Additive functions of intervals and Hausdorff measure. Proc.
Cambridge Phil. Soc. 42, 15-23.
Na, S. and Neuhoff, D.L. (1995). Bennett's integral for vector quantizers. IEEE
Trans. Inform. Theory 41, 886-900.
Newman, D.J. (1982). The Hexagon theorem. IEEE Trans. Inform. Theory 28,
137-139.
Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Meth-
ods. CBMS-NSF Regional Conference Series in Applied Math. Vol. 63. SIAM.
Okabe, A., Boots, B. and Sugihara, K. (1992). Spatial Tessellations: Concepts and
Applications of Voronoi Diagrams. Wiley, Chichester.
Pagès, G. (1997). A space quantization method for numerical integration. J. Comput.
Appl. Math. 89, 1-38.
Pärna, K. (1988). On the stability of k-means clustering in metric spaces. Tartu
Riikliku Ülikooli Toimetised 798, 19-36.
Pärna, K. (1990). On the existence and weak convergence of k-centres in Banach
spaces. Tartu Ülikooli Toimetised 893, 17-28.
Panter, P.F. and Dite, W. (1951). Quantization distortion in pulse-count modulation
with nonuniform spacing of levels. Proc. Inst. Radio Eng. 39, 44-48.
Pearlman, W.A. and Senge, G.H. (1979). Optimal quantization of the Rayleigh prob-
ability distribution. IEEE Trans. Communications 27, 101-112.
Pierce, J.N. (1970). Asymptotic quantizing error for unbounded random variables.
IEEE Trans. Inform. Theory 16, 81-83.
Pisier, G. (1989). The Volume of Convex Bodies and Banach Space Geometry. Cam-
bridge University Press, Cambridge.
Pollard, D. (1981). Strong consistency of k-means clustering. Ann. Statist. 9,
135-140.

Pollard, D. (1982a). Quantization and the method of k-means. IEEE Trans. Inform.
Theory 28, 199-205.
Pollard, D. (1982b). A central limit theorem for k-means clustering. Ann. Probab.
10, 919-926.
Pötzelberger, K. and Felsenstein, K. (1994). An asymptotic result on principal points
for univariate distributions. Optimization 28, 397-406.
Pötzelberger, K. (1998a). Asymptotik des Quantisierungsfehlers. Quantisierungsdimension,
Verallgemeinerung des Satzes von Zador und Verteilung der Prototypen.
Preprint.
Pötzelberger, K. (1998b). Asymptotik des empirischen Quantisierungsfehlers und
Konsistenz des Schätzers oder Quantisierungsdimension. Preprint.
Pötzelberger, K. and Strasser, H. (1999). Clustering and quantization by MSP-partitions.
Preprint.
Rachev, S.T. (1991). Probability Metrics and the Stability of Stochastic Models.
Wiley, Chichester.
Rachev, S.T. and Rüschendorf, L. (1998). Mass Transportation Problems. Vol. 1
and Vol. 2. Springer, New York.
Rényi, A. (1959). On the dimension and entropy of probability distributions. Acta
Math. Sci. Hung. 10, 193-215.
Rhee, W.T. and Talagrand, M. (1989a). A concentration inequality for the k-median
problem. Math. Oper. Res. 14, 189-202.
Rhee, W.T. and Talagrand, M. (1989b). On the k-center problem with many centers.
Oper. Res. Letters 8, 309-314.
Rogers, C.A. (1957). A note on coverings. Mathematika 4, 1-6.
Sabin, M.J. and Gray, R.M. (1986). Global convergence and empirical consistency of
the generalized Lloyd algorithm. IEEE Trans. Inform. Theory 32, 148-155.
Schief, A. (1994). Separation properties for self-similar sets. Proc. Amer. Math.
Soc. 122, 111-115.
Schulte, E. (1993). Tilings. Handbook of Convex Geometry, 899-932 (eds., P.M.
Gruber and J.M. Wills). Elsevier Science Publishers.
Semadeni, Z. (1971). Banach Spaces of Continuous Functions. Polish Scientific Pub-
lishers, Warszawa.
Serinko, R.J. and Babu, G.J. (1992). Weak limit theorems for univariate k-mean
clustering under a nonregular condition. J. Multivariate Anal. 41, 273-296.
Serinko, R.J. and Babu, G.J. (1995). Asymptotics of k-mean clustering under non-
i.i.d, sampling. Statist. Probab. Letters 24, 57-66.

Shannon, C.E. (1959). Coding theorems for a discrete source with a fidelity criterion.
IRE National Convention Record, Part 4, 142-163.
Singer, I. (1970). Best Approximation in Normed Linear Spaces by Elements of Linear
Subspaces. Springer, Berlin.
Small, C.G. (1990). A survey of multidimensional medians. Int. Statist. Review 58,
263-277.
Späth, H. (1985). Cluster Dissection and Analysis. Ellis Horwood Limited,
Chichester.
Stadje, W. (1995). Two asymptotic inequalities for the stochastic traveling salesman
problem. Sankhyā 57, Series A, 33-40.
Steinhaus, H. (1956). Sur la division des corps matériels en parties. Bull. Acad.
Polon. Sci. 4, 801-804.
Stute, W. and Zhu, L.X. (1995). Asymptotics of k-means clustering based on
projection pursuit. Sankhyā 57, Series A, 462-471.
Su, Y. (1997). On the asymptotics of quantizers in two dimensions. J. Multivariate
Anal. 61, 67-85.
Tarpey, T. (1994). Two principal points of symmetric, strongly unimodal distribu-
tions. Statist. Probab. Letters 20, 253-257.
Tarpey, T. (1995). Principal points and self-consistent points of symmetric multivari-
ate distributions. J. Multivariate Anal. 53, 39-51.
Tarpey, T. (1998). Self-consistent patterns for symmetric multivariate distributions.
J. Classification 15, 57-79.
Tarpey, T., Li, L., and Flury, B.D. (1995). Principal points and self-consistent points
of elliptical distributions. Ann. Statist. 23, 103-112.
Tou, J.T. and Gonzalez, R.C. (1974). Pattern Recognition Principles.
Addison-Wesley, Reading.
Trushkin, A.V. (1984). Monotony of Lloyd's method II for log-concave density and
convex error weighting function. IEEE Trans. Inform. Theory 30, 380-383.
Vajda, I. (1989). Theory of Statistical Inference and Information. Kluwer, Dordrecht.
Wagner, T.J. (1971). Convergence of the nearest neighbor rule. IEEE Trans. Inform.
Theory 17, 566-571.
Webster, R. (1994). Convexity. Oxford University Press, Oxford.
Williams, G. (1967). Quantization for minimum error with particular reference to
speech. Electronics Letters 3, 134-135.
Wong, M.A. (1982). Asymptotic properties of bivariate k-means clusters. Comm.
Statist. Theory Methods. 11, 1155-1171.
Wong, M.A. (1984). Asymptotic properties of univariate sample k-means clusters. J.
Classification 1, 255-270.
Yamada, Y., Tazaki, S., and Gray, R.M. (1980). Asymptotic performance of block
quantizers with difference distortion measure. IEEE Trans. Inform. Theory 26, 6-14.
Yamamoto, W. and Shinozaki, N. (1999). On uniqueness of two principal points for
univariate location mixtures. Statist. Probab. Letters 46, 33-42.
Yang, M.-S. and Yu, K.F. (1991). On a class of fuzzy c-means clustering procedures.
Proceedings of the 1990 Taipei Symposium in Statistics, 635-647, (eds., M.T. Chao
and P.E. Cheng). Institute of Statistical Science, Academia Sinica, Taipei.
Yukich, J.E. (1998). Probability Theory of Classical Euclidean Optimization Prob-
lems. Lecture Notes in Math. 1675. Springer, New York.
Zador, P.L. (1963). Development and evaluation of procedures for quantizing multi-
variate distributions. Ph.D. dissertation, Stanford Univ.
Zador, P.L. (1982). Asymptotic quantization error of continuous signals and the
quantization dimension. IEEE Trans. Inform. Theory 28, 139-149.
Zemel, E. (1985). Probabilistic analysis of geometric location problems. SIAM J.
Algebraic Discrete Methods 6, 189-200.
Zoppè, A. (1997). On uniqueness and symmetry of self-consistent points of univariate
continuous distributions. J. Classification 14, 147-158.
Symbols

$B(a,r)$   closed ball with center a and radius r, 8
$\mathring{B}(a,r)$   open ball with center a and radius r, 165
~(~)   Borel sets, 20
cl   closure, 11
$C_{n,r}(P)$, $C_{n,r}(X)$   set of all n-optimal sets of centers for P (for X), 31
$C_{n,\infty}(A)$   set of all n-optimal sets of centers for A of order $\infty$, 137
$\operatorname{conv}\alpha$   convex hull of $\alpha$, 17
$C_r(P)$, $C_r(X)$   set of all centers of P (of the random variable X) of order r, 20, 20
$D_d$, $D_d^*$   lattices, 117, 118
$DE(c)$   double exponential distribution, 67
$\det(\Lambda)$   111
$D\Gamma(a,b)$   double Gamma distribution, 99
$d_H$   Hausdorff metric, 57
diam(A)   diameter of A, 24
$\underline{\dim}_B(K)$   lower box dimension of K, 158
$\overline{\dim}_B(K)$   upper box dimension of K, 158
$\dim_B(K)$   box dimension of K, 159
$\dim_H(A)$   Hausdorff dimension of A, 157
$\dim_H(P)$   Hausdorff dimension of P, 158
$\overline{\dim}_R(P)$   upper rate distortion dimension, 161
$\underline{\dim}_R(P)$   lower rate distortion dimension, 161
$\dim_R(P)$   rate distortion dimension, 162
$D_{n,r}(P)$   set of all n-optimal quantizing measures for P of order r, 59
$\underline{D}_r$, $\underline{D}_r(P)$   lower quantization dimension (of P) of order r, 155
$\overline{D}_r$, $\overline{D}_r(P)$   upper quantization dimension (of P) of order r, 155
$D_r$, $D_r(P)$   quantization dimension of P of order r, 155
$\underline{D}_\infty(K)$, $\overline{D}_\infty(K)$, $D_\infty(K)$   (lower, upper) quantization dimension of K of order $\infty$, 155
$d(x,A)$   distance from x to A, 8
$E(c)$   exponential distribution, 67
$e_{n,r}(P)$   $= V_{n,r}(P)^{1/r}$, 137
$e_{n,\infty}(A)$   n-th covering radius for A, 137
$e_{n,\infty}(P)$   138
$\mathcal{F}_n$   set of n-quantizers, 30
$GL(a,b)$   generalized logistic distribution, 99
$G\Gamma(a,b)$   generalized Gamma distribution, 99
$H(a,b)$   Leibnitz halfspace, 9
$HE(a,b)$   hyper-exponential distribution, 99
$h_r$   94
$\mathcal{H}^s(A)$   s-dimensional Hausdorff measure of A, 157
   restriction of a (Hausdorff) measure to M, 165
H.(P)   Rényi entropy of P, 133
$H(P)$, $H(X)$   differential entropy of P (of X), 133, 134
$I_d$   unit matrix, 54
int   interior, 9
$I(P,Q)$   average mutual information of P and Q, 161
$L(a)$   logistic distribution, 71
Med(X)   set of medians of a real random variable X, 22
$M_{n,r}(A)$   normalized n-th quantization error for A of order r, 31
$M_{n,\infty}(A)$   138
   57
$M_r(A)$   normalized r-th moment of A, 20
$M_\infty(A)$   146
$N_d(0,\Sigma)$   d-dimensional normal distribution, 54, 106
$N(0,1)$   normal distribution, 55
$N(\varepsilon,A)$   146
$P_a$   absolutely continuous part of P, 78
$P(a,b)$   Pareto distribution, 99
$P^f$, $P \circ f^{-1}$   image measure, 33, 162
   set of discrete probabilities with at most n points in the support, 33
$P_r$   94
$P_s$   singular part of P, 78
$Q_r(A)$   r-th quantization coefficient of A, 78, 81
$Q_r^{(L)}([0,1]^d)$   r-th lattice quantization coefficient of $[0,1]^d$, 114
$Q_r(P)$, $Q_r(X)$   r-th quantization coefficient of P (of X), 81
$Q_r^{(R)}([0,1]^d)$   r-th regular quantization coefficient of $[0,1]^d$, 110
$Q_\infty(A)$   covering coefficient of A (or quantization coefficient of order $\infty$), 145
$Q_\infty^{(L)}([0,1]^d)$   lattice covering coefficient of $[0,1]^d$, 148
q~   191
$R_{P,r}$   rate distortion function, 161
$S(a,b)$   separator of a and b, 11
$S_{n,r}(P)$, $S_{n,r}(X)$   set of all n-stationary sets of centers for P (for X) of order r, 39
$SS_{n,r}(P)$, $SS_{n,r}(X)$   set of all elements of $S_{n,r}(P)$ (of $S_{n,r}(X)$) whose Voronoi diagram is a P-tesselation, 39
o%   192
$\operatorname{supp}(\mu)$   topological support of a finite measure $\mu$, 24, 165
$T(a,b;c)$   triangular distribution, 123
$U(A)$   uniform distribution on A, 20
Var X   variance of a real random variable X, 22
$V_{n,r}(P)$, $V_{n,r}(X)$   n-th quantization error for P (for X) of order r, 30
$V_r(P)$, $V_r(X)$   r-th moment of P (of the random variable X), 20, 20
$W(a|\alpha)$   Voronoi region generated by $a \in \alpha$, 8
$W(a,b)$   Weibull distribution, 73, 127
$W_0(a|\alpha)$   open Voronoi region generated by $a \in \alpha$, 9

$\Delta$   symmetric difference, 27
$\delta_x$   point mass at x, 25
$\Gamma(a,b)$   Gamma distribution, 72
$\Gamma(\varepsilon)$   maximal finite antichain generated by $\varepsilon$, 192
$\lambda$   1-dimensional Lebesgue measure, 55
$\lambda^d$   d-dimensional Lebesgue measure, 13
$\mu(\cdot|A)$   26
$\rho_r$   $L_r$-minimal metric, 33, 140
$\sigma^-$   immediate predecessor of $\sigma$, 191
$|\sigma|$   length of $\sigma$, 191
$\sigma|_m$   restriction of $\sigma$ to m, 191
$\sigma \le \tau$   $\sigma$ is a predecessor of $\tau$, 191

$|A|$   cardinality of A, 13
$1_A$   indicator function of the set A, 31
$\xrightarrow{D}$   weak convergence, 57
$\partial$   boundary, 10
   $L_r(P)$-norm of g, 137
$\|h\|_p$   $L_p(\lambda^d)$-(quasi-)norm of h, 78
$\nabla_+ f(x,y)$   one-sided directional derivative, 23
   gradient, 23
$[x]$   integer part of the number x, 82
$\langle x,y \rangle$   scalar product, 16
$\|\cdot\|$   norm on $\mathbb{R}^d$, 7
Index

admissible, 111
arithmetic, 203
asymptotic covering radius, 142
asymptotic quantization error, 78
asymptotically n-optimal, 93, 96
attractor, 190
average mutual information, 161

Ball packing theorem, 50
Boundary theorem, 13
box dimension, 159

Cantor distribution, 206
Cantor set, 172, 206
center of P of order r, 20
checkerboard, 117
cluster analysis, 34
compact differentiable manifolds, 171
consistency, 62, 153
contracting similarity transformation, 190
contraction number, 190
covering, 9
covering coefficient, 145
cube quantizer, 52
curve, 180

d-asymptotics, 150
density of the thinnest covering, 146
density of the thinnest lattice covering, 149
diameter, 24
differential entropy, 133
directional derivative, 23
double exponential distribution, 69
double Gamma distribution, 99
dual lattice, 118

empirical measure, 34, 61
empirical version, 34, 57, 151
euclidean norm, 16
Existence theorem, 47, 139
exponential distribution, 67, 70

finite antichain, 191
fundamental parallelotope, 111

Gamma distribution, 72
generalized Gamma distribution, 213
generalized logistic distribution, 210

Hausdorff dimension, 157
Hausdorff dimension of a measure, 158
Hausdorff metric, 57
hexagonal lattice, 116
hyper-exponential distribution, 99

invariant distributions, 95
invariant set, 190

$l_p$-norms, 8
$L_r$-Kantorovich metric, 33
$L_r$-minimal metric, 33, 140
$L_r$-Wasserstein metric, 33
lattice, 111
lattice covering coefficient, 148
length of a curve, 180
length of $\sigma$, 191
locally finite, 8
logistic distribution, 71
lower box dimension, 158
lower quantization dimension of order r, 155
lower rate distortion dimension, 161

$\mu$-packing, 50
$\mu$-tesselation, 15

n-optimal partition of order r, 32
n-optimal quantizer of order r, 30
n-optimal quantizing measure of order r, 34
n-optimal set of centers, 31
n-optimal set of centers for A of order $\infty$, 137
n-quantizer, 30
n-stationary set of centers of order r, 39
n-th covering radius, 137
n-th quantization error of order r, 30
necessary conditions for optimality, 37, 38
normal distribution, 54
normalized n-th quantization error of order r, 31
normalized r-th moment, 20
normalized r-th moment of balls, 27

one-dimensional marginals, 49
one-tailed version, 65
open set condition, 172, 191
open Voronoi region, 9

packing, 50
parametrization, 180
parametrization by arc length, 180
Pareto-distribution, 99
polyhedral set, 17
polytope, 17
predecessor, 191
product quantizer, 42

quantization coefficient of order $\infty$, 145
quantization dimension of order r, 155

r-th (absolute) moment of P, 20
r-th lattice quantization coefficient, 114
r-th quantization coefficient, 81
r-th regular quantization coefficient, 110
rate distortion function of order r, 161
Rayleigh distribution, 73
rectifiable, 180
regular hexagon, 116
regular n-quantizer, 109
regular of dimension D, 165
Rényi entropy, 133

s-dimensional Hausdorff measure, 157
scaling number, 15, 190
self-similar measure, 190
self-similar set, 172
separator, 11
Sierpinski gasket, 172
sign-symmetric distributions, 96
similarity dimension, 172, 190
similarity transformation, 15
smooth norm, 23
space-filling, 107
space-filling by translation, 107
spherical distribution, 53
standard lattice, 116
star-shaped, 9
strictly convex norm, 11
strong separation condition, 190
strongly unimodal distribution, 64
surfaces of convex sets, 169
symmetric distributions, 65

tesselation, 15
triangular distribution, 123
truncated octahedron, 118

uniform distribution, 20
unimodal distribution, 64
Uniqueness theorem, 22, 64
upper box dimension, 159
upper quantization dimension of order r, 155
upper rate distortion dimension, 161

vector quantizer advantage, 104
volume of $l_p$-balls, 28
von Koch curve, 172
Voronoi diagram, 8
Voronoi partition, 9
Voronoi region, 7

Weibull distribution, 73, 123
