

THE CONCENTRATION OF MEASURE PHENOMENON

Michel Ledoux, Département de Mathématiques, Université Paul-Sabatier, 31062 Toulouse (France). ledoux@cict.fr — http://www.lsp.ups-tlse.fr/Ledoux/

Preliminary version (September 2000). First revision (December 2000).

TABLE OF CONTENTS

INTRODUCTION

1. CONCENTRATION FUNCTIONS AND DEVIATION INEQUALITIES
1.1 First examples
1.2 Concentration functions
1.3 Deviation inequalities
1.4 Observable diameters
1.5 Expansion coefficients
1.6 Laplace bounds and infimum-convolutions
Notes and remarks

2. ISOPERIMETRIC AND FUNCTIONAL EXAMPLES
2.1 Isoperimetric examples
2.2 Brunn-Minkowski inequalities
2.3 Semigroup tools
Notes and remarks

3. CONCENTRATION AND GEOMETRY
3.1 Spectrum and concentration
3.2 Spectral and diameter bounds
3.3 Levy families
3.4 Topological applications
3.5 Euclidean sections of convex bodies
Notes and remarks

4. CONCENTRATION IN PRODUCT SPACES
4.1 Martingale methods
4.2 Convex hull approximation
4.3 Control by several points
4.4 Convex infimum-convolution
4.5 The exponential measure
Notes and remarks

5. ENTROPY AND CONCENTRATION
5.1 Concentration and logarithmic Sobolev inequalities
5.2 Product measures
5.3 Modified logarithmic Sobolev inequalities
5.4 Discrete settings
5.5 Covariance identities
Notes and remarks

6. TRANSPORTATION COST INEQUALITIES
6.1 Information inequalities and concentration
6.2 Quadratic transportation cost inequalities
6.3 Transportation for product and non-product measures
Notes and remarks

7. SHARP BOUNDS ON GAUSSIAN AND EMPIRICAL PROCESSES
7.1 Gaussian processes
7.2 Bounds on empirical processes
7.3 Sharper bounds via the entropic method
Notes and remarks

8. FURTHER ILLUSTRATIONS
8.1 Concentration of harmonic measures
8.2 Concentration for independent permutations
8.3 Subsequences, percolation, assignment
8.4 The spin glass free energy
8.5 Concentration of random matrices
Notes and remarks

REFERENCES

INDEX

INTRODUCTION

The aim of this book is to present the basic aspects of the concentration of measure phenomenon. The concentration of measure phenomenon was put forward in the early seventies by V. Milman in the asymptotic geometry of Banach spaces. Of isoperimetric inspiration, it is of powerful interest in applications, in various areas such as geometry, functional analysis and infinite dimensional integration, discrete mathematics and complexity theory, and especially probability theory. This book is concerned with the basic techniques and examples of the concentration phenomenon, with no claim to be exhaustive. A particular emphasis has been put on geometric, functional and probabilistic tools to reach and describe measure concentration in a number of settings. As mentioned by M. Gromov, the concentration of measure phenomenon is an elementary, yet non-trivial, observation. A first illustration of this property is suggested by the example of the standard $n$-sphere $S^n$ in $\mathbb R^{n+1}$ when the dimension $n$ is large. By a standard computation, uniform measure $\sigma^n$ on $S^n$ is, when the dimension $n$ is large, almost concentrated around the diameter. More generally, as a consequence of spherical isoperimetry, given any measurable set $A$ with, say, $\sigma^n(A) \ge \tfrac12$, almost all points (in the sense of the measure $\sigma^n$) on $S^n$ are within (geodesic) distance $\frac{1}{\sqrt n}$ from $A$, a distance which becomes infinitesimal as $n \to \infty$. Precisely, for every $r > 0$,

\[ \sigma^n(A_r) \ge 1 - e^{-(n-1)r^2/2}, \]

where $A_r = \{x \in S^n;\ d(x,A) < r\}$.

This concentration property on the sphere may be described equivalently on functions, an idea going back to P. Lévy. Namely, if $F$ is a continuous function on $S^n$ with modulus of continuity $\sup\{|F(x) - F(y)|;\ d(x,y) < \varepsilon\}$ less than $r > 0$, then

\[ \sigma^n\big(\{|F - m| \ge r\}\big) \le 2\, e^{-(n-1)\varepsilon^2/2}, \]

where $m$ is a median of $F$ for $\sigma^n$. Therefore, functions on high dimensional spheres with small local oscillations are strongly concentrated around a mean value, and are thus almost constant on almost all of the space! This high dimensional concentration phenomenon was extensively used and emphasized by V. Milman in his investigation of asymptotic geometric analysis. As yet another interpretation, the concept of observable diameter as considered by M. Gromov is a "visual" description of the concentration of measure phenomenon. We view the sphere with a naked eye which cannot distinguish a part of $S^n$ of measure (luminosity) less than $\delta > 0$ (small but fixed). A Lipschitz function $F$ may be interpreted as an observable, that is, an observation device giving us the visual image measure of $\sigma^n$ by $F$. In this language, Lévy's inequality on Lipschitz functions expresses that the "observable diameter of $S^n$" is of the order of $\frac{1}{\sqrt n}$ as $n$ is large, in strong contrast with the diameter of $S^n$ as a metric space. The concentration of measure phenomenon is often a high dimensional effect or, as in laws of large numbers in probability theory, a property of a large number of variables. A probabilistic interpretation of the concentration phenomenon namely goes back to E. Borel, who suggested the following geometric interpretation of the law of large numbers for sums of independent random variables uniformly distributed on the interval $[-1,+1]$. Let $\lambda^n$ be uniform measure on the $n$-dimensional cube $[-1,+1]^n$. Let $H$ be a hyperplane that is orthogonal to a principal diagonal of $[-1,+1]^n$ at the origin. Then, if $H_r$ is the neighborhood of order $r \ge 0$ of $H$, for every $\varepsilon > 0$, $\lambda^n(H_{\varepsilon\sqrt n}) \to 1$ as $n \to \infty$. A related description is the following. Let $X_1, X_2, \ldots$ be a sequence of independent random variables taking the values $\pm 1$ with equal probability, and set, for every $n \ge 1$, $S_n = X_1 + \cdots + X_n$.

We think of $S_n$ as a function of the individual variables $X_i$, and we state the classical law of large numbers by saying that $S_n$ is essentially constant (equal to 0). Of course, by the central limit theorem, the fluctuations of $S_n$ are of order $\sqrt n$, which is hardly zero. But as $S_n$ can take values as large as $n$, this is the scale at which one should measure $S_n$, in which case $S_n/n$ is indeed essentially zero. In this context, according to M. Talagrand, one probabilistic aspect of measure concentration is that a random variable that depends (in a smooth way) on the influence of many independent variables (but not too much on any of them) is essentially constant. Measure concentration is surprisingly shared by a number of cases that generalize the previous examples, both by replacing linear functionals (such as sums of independent random variables) by arbitrary Lipschitz functions of the samples and by considering measures that are not product measures. It was indeed again the merit of V. Milman to emphasize the difference between the concentration phenomenon and the standard probabilistic view on probability inequalities and law of large numbers theorems, by the extension to Lipschitz (and even Hölder type) functions and more general measures. His enthusiasm and persuasion eventually convinced M. Talagrand of the importance of this simple, yet fundamental, concept. It will be one of the purposes of this book to describe some of the basic examples and applications of the concentration of measure phenomenon. While the first applications were mainly developed in the context of asymptotic geometric analysis, they have now spread to a wide range of frameworks, covering areas in geometry, discrete and combinatorial mathematics, and in particular probability theory. Classical probabilistic inequalities on sums of independent random variables have namely been used over the years in limit theorems and discrete algorithmic mathematics.
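The scale argument above is easy to check numerically. The following short computation (ours, not from the book) evaluates the exact tail probability $P(|S_n/n| \ge \varepsilon)$ for the symmetric random signs described above, using the fact that $S_n = 2B - n$ with $B$ binomial $(n, \tfrac12)$; the probability collapses as $n$ grows, which is precisely the law of large numbers read as a concentration statement.

```python
from math import comb

def tail_prob(n, eps):
    """Exact P(|S_n / n| >= eps) for S_n = X_1 + ... + X_n,
    X_i independent signs +/-1 with equal probability.
    With B = #{i : X_i = +1} ~ Bin(n, 1/2), we have S_n = 2B - n."""
    total = 0
    for k in range(n + 1):
        if abs(2 * k - n) >= eps * n:
            total += comb(n, k)
    return total / 2 ** n

for n in (10, 100, 1000):
    print(n, tail_prob(n, 0.2))  # decreases rapidly with n
```

At the scale $n$ the sum is essentially the constant 0, while at the scale $\sqrt n$ it still fluctuates, matching the discussion above.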
They provide quantitative illustrations of measure concentration by so-called exponential inequalities (mostly of Gaussian type, as on the sphere above). Recent developments on the concentration of measure phenomenon describe far reaching extensions that provide dimension free concentration properties in product spaces which, due to the work of M. Talagrand during the last

decade, will form a main part of these notes. The book is divided into 8 chapters. The first one introduces the notions and elementary properties of concentration functions, deviation inequalities and their more geometric counterparts such as observable diameters. We also briefly indicate a few useful tools to investigate concentration properties. The second chapter describes some of the basic and classical isoperimetric inequalities at the origin of the concentration of measure phenomenon. However, we do not concentrate on the usually rather delicate extremal statements, but rather develop some self-contained convexity and semigroup arguments to reach the concentration properties originally deduced from isoperimetry. Chapter 3 is a first view towards geometric and topological applications of measure concentration. In particular, we describe there Milman's proof of Dvoretzky's theorem on almost spherical sections of convex bodies, upon which V. Milman most vigorously emphasized the usefulness of concentration ideas. Chapter 4 investigates measure concentration in product spaces, mostly based on the recent developments by M. Talagrand. After a brief view of the more classical martingale bounded difference method, we cover there convex hull and finite point approximations, of powerful use in applications to both empirical processes and discrete mathematics. We also discuss the special concentration property of the exponential measure. The next two chapters emphasize functional inequalities stable under products towards a new approach to the results of Chapter 4. Chapter 5 is devoted to the entropic and logarithmic Sobolev inequality approach. We present there the Herbst method to deduce concentration from a logarithmic Sobolev inequality and describe the various applications to product measures and related topics.
Chapter 6 is yet another form of concentration, relying on information and transportation cost inequalities, with which one may reach several of the conclusions of the preceding chapters. Chapter 7 is devoted to the probabilistic applications of concentration in product spaces to sharp bounds on sums of independent random vectors and empirical processes, at the origin of M. Talagrand's investigation. The last chapter is a selection of (recent) applications of the concentration of measure phenomenon to various areas such as statistical mechanics, geometric probabilities, discrete and algorithmic

mathematics. While we describe in this work a number of concentration properties put forward in several contexts, from more geometric to functional and probabilistic settings, we almost never discuss sharp constants, although the orders of magnitude will usually be the correct ones. This book is strongly inspired by early references on the subject. In particular, the lecture notes by V. Milman and G. Schechtman that describe the concentration of measure phenomenon and its applications to the asymptotic theory of finite dimensional normed spaces were a basic source of inspiration during the preparation of this book. We also used the recent survey by G. Schechtman in the Handbook of the Geometry of Banach Spaces. (The latter handbook contains further contributions that illustrate the use of concentration in various functional analytic problems.) The memoir by M. Talagrand on isoperimetric and concentration inequalities in product spaces is at the basis of most of the material presented starting with Chapter 4, and the ideas developed there gave a strong impetus to recent developments in various areas of probability theory and its applications. Several of the neat arguments presented in these references have been reproduced here. The already famous Chapter 3½ of the recent book by M. Gromov served as a useful source of geometric examples. While many geometric invariants are introduced and analyzed there, our point of view is perhaps a bit more quantitative, and motivated by a number of recent probabilistic questions. Each chapter is followed by some Notes and Remarks where credit is to be found. We apologize for inaccuracies and omissions. The notations used throughout this book are standard. Although we keep some consistency, we did not try to unify all the notations, and often use the classical notation in a given context even though it might have been used before in another.
I am grateful to Michel Talagrand for numerous discussions over the years on the topic of concentration and for explaining to me his work on concentration in product spaces. Parts of several joint works with Sergey Bobkov on concentration and related matters are reproduced here. I sincerely thank him for numerous

corrections and comments on the first draft of the manuscript.

Toulouse, December 2000

Michel Ledoux


1. CONCENTRATION FUNCTIONS AND DEVIATION INEQUALITIES

In this chapter, we introduce, with the first examples of spherical and Gaussian isoperimetry, the concept of measure concentration as put forward by V. Milman [M-S] and discuss its first properties. We define the notion of a concentration function and connect it with Lévy's deviation and concentration inequalities for Lipschitz functions, which provide a main tool in applications. The notion of observable diameter is another, more geometric, view of concentration. The last two sections of this chapter are devoted to the useful tools of expansion coefficients, Laplace bounds and infimum-convolution inequalities to explore concentration properties.

1.1 First examples


To introduce the concept of measure concentration, we first briefly discuss a few examples that will be further analyzed later on. A first illustration is suggested by the example of the standard $n$-sphere $S^n$ in $\mathbb R^{n+1}$ when the dimension $n$ is large. By a standard computation, uniform measure $\sigma^n$ on $S^n$ is, when the dimension $n$ is large, almost concentrated around the (every!) diameter. Actually, the isoperimetric inequality on $S^n$ expresses that spherical caps (geodesic balls) minimize the boundary measure at fixed volume. In its integrated form (see Section 2.1), given a Borel set $A$

on $S^n$ with the same measure as a spherical cap $B$, then for every $r > 0$, $\sigma^n(A_r) \ge \sigma^n(B_r)$, where $A_r = \{x \in S^n;\ d(x,A) < r\}$ is the (open) neighborhood of order $r$ for the geodesic distance on $S^n$. One main feature of concentration with respect to isoperimetry is to analyze this inequality for the large values of $r$ rather than the small ones. The explicit evaluation of the measure of spherical caps (performed below in Section 2.1) then implies that, given any measurable set $A$ with, say, $\sigma^n(A) \ge \tfrac12$, for every $r > 0$,

\[ \sigma^n(A_r) \ge 1 - e^{-(n-1)r^2/2}. \]  (1.1)

Therefore, almost all points on $S^n$ are within (geodesic) distance $\frac{1}{\sqrt n}$ from $A$, which is of particular interest when the dimension $n$ is large. From a "tomographic" point of view (developed further in Section 1.4 below), the visual diameter of $S^n$ (for $\sigma^n$) is of the order of $\frac{1}{\sqrt n}$ as $n \to \infty$, in contrast with the diameter of $S^n$ as a metric space. This example is a first, and main, instance of the concentration of measure phenomenon, for which nice patterns develop as the dimension is large. It furthermore suggests the introduction of a concentration function in order to evaluate the decay in (1.1). Setting namely

\[ \alpha_{\sigma^n}(r) = \sup\big\{1 - \sigma^n(A_r);\ A \subset S^n,\ \sigma^n(A) \ge \tfrac12\big\}, \quad r > 0, \]

the bound (1.1) amounts to saying that

\[ \alpha_{\sigma^n}(r) \le e^{-(n-1)r^2/2}, \quad r > 0. \]  (1.2)

Note that $r > 0$ in (1.2) actually ranges up to the diameter of $S^n$, and that (1.2) is thus mainly of interest when $n$ is large. By rescaling of the metric, the preceding results apply similarly to uniform measure $\sigma^n_R$ on the $n$-sphere $S^n_R$ of radius $R > 0$. In particular,

\[ \alpha_{\sigma^n_R}(r) \le e^{-(n-1)r^2/2R^2}, \quad r > 0. \]  (1.3)
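To see the $\frac{1}{\sqrt n}$ visual diameter concretely, here is a small Monte Carlo experiment (our illustration, not part of the text): the first coordinate $x \mapsto x_1$ is a 1-Lipschitz function on $S^n$, and under $\sigma^n$ its $L^2$-spread is exactly $1/\sqrt{n+1}$, so the values it takes collapse as the dimension grows. Points are sampled by normalizing standard Gaussian vectors, a standard way to simulate $\sigma^n$.

```python
import math
import random

def coordinate_spread(n, samples=2000, seed=0):
    """Empirical L^2 spread of the 1-Lipschitz coordinate function
    x -> x_1 under uniform measure on S^n in R^(n+1), simulated by
    normalizing standard Gaussian vectors; the exact value is 1/sqrt(n+1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
        norm = math.sqrt(sum(v * v for v in g))
        total += (g[0] / norm) ** 2
    return math.sqrt(total / samples)

for n in (10, 100, 1000):
    print(n, coordinate_spread(n))  # shrinks like 1/sqrt(n)
```

The observed spread at $n = 1000$ is an order of magnitude smaller than at $n = 10$, while the metric diameter of the sphere has not changed.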

Now, it is well-known that when $n$ tends to infinity, the measures $\sigma^n_{\sqrt n}$ converge to the canonical Gaussian measure on $\mathbb R^{\mathbb N}$. The isoperimetric inequality on spheres may then be transferred to an isoperimetric inequality for Gaussian measures. Precisely, if $\gamma = \gamma^k$ is the canonical Gaussian measure on $\mathbb R^k$ with density $(2\pi)^{-k/2} e^{-|x|^2/2}$ with respect to Lebesgue measure, and if $A$ is a Borel set in $\mathbb R^k$ with $\gamma(A) = \Phi(a)$ for some $a \in [-\infty, +\infty]$, where $\Phi(t) = (2\pi)^{-1/2} \int_{-\infty}^t e^{-x^2/2} dx$ is the distribution function of the standard normal distribution on the line, then for every $r > 0$,

\[ \gamma(A_r) \ge \Phi(a + r). \]

Here $A_r$ denotes the $r$-neighborhood with respect to the standard Euclidean metric on $\mathbb R^k$. Unless otherwise specified, $\mathbb R^k$ (or subsets of $\mathbb R^k$) will be equipped with the standard Euclidean structure induced by the norm $|x| = (\sum_{i=1}^k x_i^2)^{1/2}$, $x = (x_1, \ldots, x_k) \in \mathbb R^k$. Defining similarly the concentration function for $\gamma$ as

\[ \alpha_\gamma(r) = \sup\big\{1 - \gamma(A_r);\ A \subset \mathbb R^k,\ \gamma(A) \ge \tfrac12\big\}, \]

we get in particular, since $\Phi(0) = \tfrac12$ and $1 - \Phi(r) \le e^{-r^2/2}$, that

\[ \alpha_\gamma(r) \le e^{-r^2/2}, \quad r > 0. \]  (1.4)

One may also think of (1.4) as the limit of (1.3) as $n \to \infty$ with $R = \sqrt n$. One may again interpret (1.4) by saying that, given a set $A$ with $\gamma(A) \ge \tfrac12$, almost all points in $\mathbb R^k$ are within distance 5 or 10, say, from the set $A$, whereas of course $\mathbb R^k$ is unbounded. We have thus here a second instance of measure concentration, with the particular feature that the concentration function of (1.4) does not depend on the dimension of the underlying state space $\mathbb R^k$ for the product measure $\gamma = \gamma^k$.

Our third example will be discrete. Consider the $n$-dimensional discrete cube $X = \{0,1\}^n$ and equip $X$ with the Hamming metric

\[ d(x,y) = \mathrm{Card}\,\{x_i \neq y_i;\ i = 1, \ldots, n\}, \]

$x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n) \in \{0,1\}^n$. Let $\mu$ be uniform (product) measure on $\{0,1\}^n$ defined by $\mu(A) = 2^{-n}\, \mathrm{Card}(A)$, $A \subset X$. Identifying the extremal sets $A$ in $X$ for which the infimum $\inf\{\mu(A_r);\ \mu(A) \ge \tfrac12\}$ is attained may be used to show here that

\[ \alpha_\mu(r) \le e^{-2r^2/n}, \quad r > 0, \]  (1.5)

where the concentration function $\alpha_\mu$ for $\mu$ on $\{0,1\}^n$ equipped with the Hamming metric is defined as above. In this case therefore, neighborhoods of order $\sqrt n$ of sets with half of the measure will almost cover $\{0,1\}^n$ (which is of metric diameter $n$). These first examples of concentration properties all follow from more refined isoperimetric inequalities. They will be detailed in the next chapter, in which full proofs of the concentration (rather than isoperimetric) results will be presented. These examples will serve as guidelines for the further developments. They motivate and justify in particular the analysis of the concept of concentration function performed in the next section.
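As a numerical sanity check of (1.5) (ours, not from the book), one can compute the enlargement of the discrete half-cube exactly: for $A = \{x;\ \sum_i x_i \le \lfloor n/2 \rfloor\}$ and integer $r$, the complement of $A_r$ is a binomial tail, which indeed stays below $e^{-2r^2/n}$.

```python
from math import comb, exp

def enlargement_gap(n, r):
    """1 - mu(A_r) for the half-cube A = {x in {0,1}^n : sum(x) <= n//2}
    under the Hamming metric.  A point with s ones is at distance
    max(0, s - n//2) from A, so the open neighborhood A_r misses exactly
    the points with s >= n//2 + r: a binomial tail."""
    k = n // 2
    tail = sum(comb(n, s) for s in range(k + r, n + 1))
    return tail / 2 ** n

n = 100
for r in (5, 10, 20):
    print(r, enlargement_gap(n, r), exp(-2 * r * r / n))  # gap stays below the bound
```

Already at $r = 20 = 2\sqrt n$, the half-cube's neighborhood covers all but a few $10^{-5}$ of the cube.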

1.2 Concentration functions


Motivated by the early examples of the preceding section, we introduce and formalize the concept of the concentration function of a probability measure on, say, a metric space. The concentration examples of Section 1.1 rely namely on two main ingredients, a (probability) measure and a notion of (isoperimetric) enlargement with respect to which concentration is evaluated. Let thus $(X,d)$ be a metric space equipped with a probability measure $\mu$ on the Borel sets $\mathcal B$ of $(X,d)$. The concentration function $\alpha_{(X,d,\mu)}$ (denoted more simply $\alpha_{(X,\mu)}$, or even $\alpha_\mu$, when the metric $d$, or the underlying metric space $(X,d)$, is implicit) is defined as

\[ \alpha_{(X,d,\mu)}(r) = \sup\big\{1 - \mu(A_r);\ A \subset X,\ \mu(A) \ge \tfrac12\big\}, \quad r > 0. \]  (1.6)

Here $A_r = \{x \in X;\ d(x,A) < r\}$ is the (open) $r$-neighborhood of $A$ (with respect to $d$). A concentration function is less than or equal to $\tfrac12$. When $(X,d)$ is bounded, the enlargements $r > 0$ in (1.6) actually range up to the diameter

\[ \mathrm{Diam}(X,d) = \sup\{d(x,y);\ x, y \in X\} \]

of $(X,d)$, the concentration function being 0 when $r$ is larger than the diameter. This will however usually not be specified. In any case, the concentration function decreases to 0 as $r \to \infty$. Indeed, fix a point $x$ in $X$. Given $0 < \varepsilon < \tfrac12$, choose $r$ such that the measure of the complement of the ball $B$ with center $x$ and radius $r$ is less than $\varepsilon$. Then, any Borel set $A$ such that $\mu(A) \ge \tfrac12$ intersects $B$. Hence $A_{2r}$ covers $B$, and thus $1 - \mu(A_{2r}) \le 1 - \mu(B) < \varepsilon$. By definition of the concentration function $\alpha = \alpha_{(X,d,\mu)}$, given a set $A$ with measure $\mu(A) \ge \tfrac12$, the set of points which are within distance $r > 0$ from a point in $A$ has measure larger than or equal to $1 - \alpha(r)$. Concentration is concerned with the behavior of $\alpha(r)$ as $r$ is large. If necessary, we agree in the following that $\alpha(0) = \tfrac12$. The idea of the concentration of measure phenomenon is that, in a number of basic examples, $\alpha_{(X,d,\mu)}(r)$ decreases rapidly as $r$ is large. In particular, we say that $\mu$ has normal concentration on $(X,d)$ if there are constants $C, c > 0$ such that, for every $r > 0$,
\[ \alpha_{(X,d,\mu)}(r) \le C e^{-cr^2}. \]  (1.7)

As emphasized in Section 1.1, important examples share this normal concentration, and we will often be concerned with this property throughout these notes. In particular, as we have seen with (1.2), the normalized invariant measure $\sigma^n$ on the standard $n$-sphere $S^n$, $n \ge 2$, satisfies normal concentration with $C = 1$ and $c = (n-1)/2$, which thus yields strong concentration when the dimension $n$ is large. By (1.4), the canonical Gaussian measure $\gamma$ on Euclidean space satisfies this concentration (with $C = 1$ and $c = \tfrac12$). Concentration (1.5) on the cube $\{0,1\}^n$ also belongs to this family (although the effect of dimension has to be analyzed differently in this example). Throughout these notes, we do not pay much attention to sharp constants in normal concentration, although when possible we try to reach the correct exponent $c > 0$. While the concentration function bounds $1 - \mu(A_r)$ for any set with $\mu(A) \ge \tfrac12$, it also does so when $\mu(A) \ge \varepsilon > 0$. This is the content of the following easy and useful consequence of the definition of a concentration function.

Lemma 1.1. If $\mu(A) \ge \varepsilon > 0$, then $1 - \mu(A_{r_0 + r}) \le \alpha(r)$ for any $r > 0$ and any $r_0$ such that $\alpha(r_0) < \varepsilon$.

Proof. Let $r_0 > 0$ be such that $\alpha(r_0) < \varepsilon$. Denote by $B$ the complement of $A_{r_0}$, so that $A$ is included in the complement of $B_{r_0}$. If $\mu(B) \ge \tfrac12$, then

\[ \mu(A) \le 1 - \mu(B_{r_0}) \le \alpha(r_0) < \varepsilon, \]

which is impossible. Thus $\mu(A_{r_0}) \ge \tfrac12$, so that for every $r > 0$,

\[ 1 - \mu(A_{r_0 + r}) \le \alpha(r). \]

The lemma is proved.

The following simple contraction property shows that concentration functions are decreasing under 1-Lipschitz mappings.

Proposition 1.2. Let $\varphi$ be a Lipschitz map between two metric spaces $(X,d)$ and $(Y,\delta)$ such that

\[ \delta\big(\varphi(x), \varphi(x')\big) \le c_\varphi\, d(x,x') \quad \text{for all } x, x' \in X. \]

Let $\mu$ be a probability measure on the Borel sets of $X$ and denote by $\varphi(\mu)$ the measure pushed forward by $\varphi$. Then, for every $r > 0$,

\[ \alpha_{(Y,\delta,\varphi(\mu))}(r) \le \alpha_{(X,d,\mu)}\big(r/c_\varphi\big). \]

For the proof, simply note that if $A$ is a Borel set in $Y$, for every $r > 0$,

\[ \varphi^{-1}(A_r) \supset \big(\varphi^{-1}(A)\big)_{r/c_\varphi}, \]

where the enlargements are understood with respect to $d$ and $\delta$ respectively. A typical example of application of Proposition 1.2 arises when $X$ is a group and $Y$ is a quotient $X/G$ equipped with the quotient metric

\[ \delta(y, y') = \inf\big\{d(x, x');\ \varphi(x) = y,\ \varphi(x') = y'\big\}, \]

where $\varphi : X \to X/G$ is the quotient map, for which clearly $c_\varphi = 1$.

1.3 Deviation inequalities


In this paragraph, we discuss equivalent descriptions of concentration properties in terms of deviation and concentration inequalities for Lipschitz functions. Let as before $(X,d)$ be a metric space with a Borel probability measure $\mu$. We defined in the preceding section the concentration function $\alpha_{(X,d,\mu)}(r)$, $r > 0$, as the supremum over all Borel sets $A$ with $\mu(A) \ge \tfrac12$ of $1 - \mu(A_r)$, where we recall that $A_r$ is the enlargement of order $r$ of $A$ in the metric $d$. One could also define, for every $\varepsilon > 0$,

\[ \alpha^\varepsilon_{(X,d,\mu)}(r) = \sup\big\{1 - \mu(A_r);\ A \subset X,\ \mu(A) \ge \varepsilon\big\}, \quad r > 0, \]

leading essentially to the same concept by Lemma 1.1. The value $\varepsilon = \tfrac12$ is however of particular interest through its connection with medians. If $\mu$ is a probability measure on the Borel sets of $(X,d)$, and if $F$ is a measurable real-valued function on $(X,d)$, we say that $m_F$ is a median of $F$ for $\mu$ if

\[ \mu\big(\{F \ge m_F\}\big) \ge \tfrac12 \quad \text{and} \quad \mu\big(\{F \le m_F\}\big) \ge \tfrac12. \]

A median $m_F$ may not be unique. For a continuous function $F$ on $(X,d)$, we denote by

\[ \omega_F(\eta) = \sup\big\{|F(x) - F(y)|;\ d(x,y) < \eta\big\}, \quad \eta > 0, \]

its modulus of continuity. If $m_F$ is a median of $F$ for $\mu$, and if $A = \{F \le m_F\}$, note that whenever $x$ is such that $d(x,y) < \eta$ for some $y \in A$, then

\[ F(x) \le F(y) + \omega_F(\eta) \le m_F + \omega_F(\eta). \]

Hence, since $\mu(A) \ge \tfrac12$, by definition of the concentration function $\alpha = \alpha_{(X,d,\mu)}$,

\[ \mu\big(\{F > m_F + \omega_F(\eta)\}\big) \le \alpha(\eta). \]  (1.8)

Similarly, with $A = \{F \ge m_F\}$,

\[ \mu\big(\{F < m_F - \omega_F(\eta)\}\big) \le \alpha(\eta). \]

In particular,

\[ \mu\big(\{|F - m_F| > \omega_F(\eta)\}\big) \le 2\,\alpha(\eta). \]  (1.9)

On the $n$-dimensional sphere $S^n$, for which $\alpha_{\sigma^n}(\eta)$ is small when $n$ is large, it follows that functions with small local oscillation are nearly constant on almost all of the space (with respect to $\sigma^n$). Inequalities (1.8) and (1.9) on the sphere are sometimes called Lévy's inequalities. It will be more convenient in the sequel to work with Lipschitz functions and Lipschitz coefficients rather than with moduli of continuity. Let us detail again (1.8) and (1.9) for Lipschitz functions. A real-valued function $F$ on $(X,d)$ is said to be Lipschitz if

\[ \|F\|_{\mathrm{Lip}} = \sup_{x \neq y} \frac{|F(x) - F(y)|}{d(x,y)} < \infty. \]

Clearly $\omega_F(\eta) \le \eta\,\|F\|_{\mathrm{Lip}}$ for every $\eta > 0$. We say that $F$ is 1-Lipschitz if $\|F\|_{\mathrm{Lip}} \le 1$. The class of Lipschitz functions is stable under the operations min and max. If $F$ is Lipschitz on $(X,d)$ and if $A = \{F \le m\}$, then for every $r > 0$, $A_{r/\|F\|_{\mathrm{Lip}}} \subset \{F < m + r\}$. Therefore, if $m = m_F$ is a median of $F$ for $\mu$, we get as for (1.8) that for every $r > 0$,

\[ \mu\big(\{F \ge m_F + r\}\big) \le \alpha\big(r/\|F\|_{\mathrm{Lip}}\big). \]  (1.10)

We speak of (1.10) (and (1.8)) as a deviation inequality. By Lemma 1.1, if $\mu(\{F \le m\}) \ge \varepsilon > 0$, then for every $r > 0$,

\[ \mu\big(\{F \ge m + r_0 + r\}\big) \le \alpha\big(r/\|F\|_{\mathrm{Lip}}\big) \]  (1.11)

where $\alpha(r_0) < \varepsilon$. Similarly, inequality (1.10) would hold with $\alpha^\varepsilon_{(X,d,\mu)}$ as soon as $m_F$ is such that $\mu(\{F \le m_F\}) \ge \varepsilon > 0$. But, as before, the particular choice of $\varepsilon = \tfrac12$ allows us to repeat the same argument with $-F$ to get that

\[ \mu\big(\{F \le m_F - r\}\big) \le \alpha\big(r/\|F\|_{\mathrm{Lip}}\big). \]  (1.12)

Therefore, together with (1.10), we deduce that for every $r > 0$,

\[ \mu\big(\{|F - m_F| \ge r\}\big) \le 2\,\alpha\big(r/\|F\|_{\mathrm{Lip}}\big). \]  (1.13)

This inequality (as well as (1.9)) describes a concentration inequality of $F$ around (one of) its medians with rate $\alpha$. According to the relative sizes of $\alpha$ and $\|F\|_{\mathrm{Lip}}$, the Lipschitz function $F$ "concentrates" around one constant value on a portion of the space of big measure. Moreover, it should already be emphasized that $m_F$ and $\|F\|_{\mathrm{Lip}}$ might be of rather different scales, an observation of fundamental importance in applications. By homogeneity, it is enough to consider (1.10) and (1.13) for 1-Lipschitz functions. The deviation or concentration inequalities on Lipschitz functions (1.10) and (1.13) are actually equivalent to the corresponding statements on sets. Let $A$ be a Borel set in $(X,d)$ with $\mu(A) \ge \tfrac12$. Set $F(x) = d(x,A)$, $x \in X$. Clearly $\|F\|_{\mathrm{Lip}} \le 1$, while

\[ \mu\big(\{F = 0\}\big) \ge \mu(A) \ge \tfrac12. \]

Hence, since $F \ge 0$, 0 is a median of $F$ for $\mu$, and thus, by (1.10), for every $r > 0$,

\[ 1 - \mu(A_r) = \mu\big(\{F \ge r\}\big) \le \alpha(r). \]
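As a quick empirical illustration of (1.13) in the Gaussian setting of Section 1.1 (our experiment, not part of the text): the function $F(x) = \max_{1 \le i \le k} x_i$ is 1-Lipschitz on $\mathbb R^k$, so by (1.4) and (1.13) it deviates from its median by more than $r$ with probability at most $2e^{-r^2/2}$, independently of $k$, even though the median itself grows like $\sqrt{2 \log k}$.

```python
import math
import random

def max_gauss_concentration(k, r, trials=4000, seed=1):
    """Empirical check: F(x) = max_i x_i is 1-Lipschitz on R^k, and under
    the canonical Gaussian measure the probability of |F - median| >= r
    should stay below 2*exp(-r^2/2), whatever k is.
    Returns (empirical median, empirical deviation probability)."""
    rng = random.Random(seed)
    vals = sorted(max(rng.gauss(0.0, 1.0) for _ in range(k)) for _ in range(trials))
    median = vals[trials // 2]
    frac = sum(1 for v in vals if abs(v - median) >= r) / trials
    return median, frac

for k in (20, 200, 2000):
    median, frac = max_gauss_concentration(k, 1.5)
    print(k, round(median, 2), frac, 2 * math.exp(-1.5 ** 2 / 2))
```

The median drifts upward with $k$ while the deviation probability stays far below the dimension-free bound, the separation of scales between $m_F$ and $\|F\|_{\mathrm{Lip}}$ emphasized above.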

We may summarize these conclusions in a statement.

Proposition 1.3. Let $\mu$ be a Borel probability measure on a metric space $(X,d)$. Let $F$ be a real continuous function on $(X,d)$ and let $m_F$ be a median of $F$ for $\mu$. Then, for every $\eta > 0$,

\[ \mu\big(\{F > m_F + \omega_F(\eta)\}\big) \le \alpha(\eta). \]

In particular, if $F$ is Lipschitz, for any $r > 0$,

\[ \mu\big(\{F \ge m_F + r\}\big) \le \alpha\big(r/\|F\|_{\mathrm{Lip}}\big) \]

and

\[ \mu\big(\{|F - m_F| \ge r\}\big) \le 2\,\alpha\big(r/\|F\|_{\mathrm{Lip}}\big). \]

Conversely, if for some non-negative function $\beta$ on $\mathbb R_+$,

\[ \mu\big(\{F \ge m_F + r\}\big) \le \beta(r) \]

for any 1-Lipschitz function $F$ with median $m_F$ and any $r > 0$, then $\alpha \le \beta$.

The previous proposition has the following interesting consequences.

Corollary 1.4. If $\mu$ on $(X,d)$ has concentration function $\alpha = \alpha_{(X,d,\mu)}$, then for any two non-empty Borel sets $A$ and $B$ in $X$,

\[ \mu(A)\,\mu(B) \le 4\,\alpha\big(d(A,B)/2\big), \]

where $d(A,B) = \inf\{d(x,y);\ x \in A,\ y \in B\}$.

Proof. Let $2r = d(A,B) > 0$. Consider the 1-Lipschitz function $F(y) = d(y,B)$ and denote by $m_F$ a median of $F$ for $\mu$. Since $F = 0$ on $B$ and $F \ge 2r$ on $A$,

\[ \mu(A)\,\mu(B) \le (\mu \otimes \mu)\big(\{(x,y);\ |F(x) - F(y)| \ge 2r\}\big) \le 2\,\mu\big(\{|F - m_F| \ge r\}\big) \le 4\,\alpha(r), \]

which is the desired result.
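To make Corollary 1.4 concrete, here is our own specialization to the sphere, combining it with the bound (1.2); it shows that two sets that both carry measure at least $\delta$ cannot be far apart in high dimension:

```latex
% Corollary 1.4 combined with (1.2) on (S^n, \sigma^n):
\sigma^n(A)\,\sigma^n(B)
   \le 4\,\alpha_{\sigma^n}\!\Big(\frac{d(A,B)}{2}\Big)
   \le 4\, e^{-(n-1)\, d(A,B)^2/8} .
% Hence, if \sigma^n(A) \ge \delta and \sigma^n(B) \ge \delta, then
d(A,B) \le \Big( \frac{8}{n-1}\,\log\frac{4}{\delta^2} \Big)^{1/2},
% which tends to 0 as n \to \infty for fixed \delta.
```

This is the "visual diameter" of Section 1.1 again: sets of fixed measure exhaust almost the whole sphere at distance of order $1/\sqrt n$.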

Corollary 1.5. If $\mu$ on $(X,d)$ has concentration function $\alpha = \alpha_{(X,d,\mu)}$, then for any 1-Lipschitz function $F$ on $(X,d)$ and any $r > 0$,

\[ (\mu \otimes \mu)\big(\{(x,y) \in X \times X;\ F(x) - F(y) \ge r\}\big) \le 2\,\alpha(r/2). \]

Conversely, if for some non-negative function $\beta$ on $\mathbb R_+$, all 1-Lipschitz functions $F$ and all $r > 0$,

\[ (\mu \otimes \mu)\big(\{(x,y) \in X \times X;\ F(x) - F(y) \ge r\}\big) \le \beta(r), \]

then $\alpha \le 2\beta$.

Proof. The first assertion follows from the fact, as already used in Corollary 1.4, that

\[ \big\{(x,y);\ F(x) - F(y) \ge 2r\big\} \subset \big\{F(x) \ge m_F + r\big\} \cup \big\{F(y) \le m_F - r\big\}, \]

together with the deviation inequalities (1.10) and (1.12). Conversely, let $A$ be such that $\mu(A) \ge \tfrac12$. Applying the hypothesis to $F(x) = d(x,A)$, $x \in X$, we get as in Corollary 1.4 that for every $r > 0$,

\[ \mu(A)\big(1 - \mu(A_r)\big) \le (\mu \otimes \mu)\big(\{(x,y);\ F(x) - F(y) \ge r\}\big) \le \beta(r), \]
from which the desired claim follows.

A particular situation for deviation inequalities under some level occurs for convex functions, and the next statement is a short digression on this theme. Assume for simplicity that $X = \mathbb R^n$ is equipped with its standard Euclidean metric $|\cdot|$, and denote by $\nabla F$ the gradient of a smooth function $F$ on $\mathbb R^n$.

Proposition 1.6. Let $\mu$ be a probability measure on the Borel sets of $\mathbb R^n$, and let $F$ be smooth and convex on $\mathbb R^n$ such that for some $m \in \mathbb R$ and $c > 0$,

\[ \mu\big(\{F \ge m,\ |\nabla F| \le c\}\big) \ge \varepsilon > 0. \]

Then, for every $r > 0$,

\[ \mu\big(\{F \le m - c(r_0 + r)\}\big) \le \alpha(r), \]

where $\alpha(r_0) < \varepsilon$.

Proof. Applying Lemma 1.1 to $r_1 = r_0 + r$, it is enough to show that whenever $A = \{F \ge m,\ |\nabla F| \le c\}$, then

\[ A_{r_1} \subset \{F > m - cr_1\}. \]

But since $F$ is (smooth and) convex, for any $x, y \in \mathbb R^n$,

\[ F(x) \ge F(y) + \langle \nabla F(y), x - y \rangle. \]

Hence, if $y \in A$ and $|x - y| < r_1$,

\[ F(x) \ge F(y) - |\nabla F(y)|\,|x - y| > m - cr_1. \]

The proposition is proved.

When $m$ is a median of $F$ in Proposition 1.6, it is usually impossible to expect that $\mu(A) \ge \tfrac12$, where $A = \{F \ge m,\ |\nabla F| \le c\}$. To estimate $\mu(A)$ from below, we may however write

\[ \mu(A) \ge \mu\big(\{F \ge m\}\big) - \mu\big(\{|\nabla F| > c\}\big). \]  (1.14)

If $F$ is Lipschitz on $\mathbb R^n$, then by Rademacher's theorem $F$ is almost everywhere differentiable and $\||\nabla F|\|_\infty = \|F\|_{\mathrm{Lip}}$. Proposition 1.6 thus shows that for Lipschitz convex functions, the deviation inequalities under some level $m$ are governed, using (1.14), by the $L^0$-norm of the gradient of $F$ rather than by its $L^\infty$-norm as in (1.12). This is a useful observation in applications.

Inequality (1.13) describes a concentration property of the Lipschitz function $F$ around some median value $m_F$. The median $m_F$ may actually be replaced by the mean of $F$. We first show a converse result in this direction.

Proposition 1.7. Let $\mu$ be a Borel probability measure on a metric space $(X,d)$. Assume that for some non-negative function $\beta$ on $\mathbb R_+$ and any bounded 1-Lipschitz function $F$ on $(X,d)$,

\[ \mu\Big(\big\{F \ge \textstyle\int F d\mu + r\big\}\Big) \le \beta(r) \]  (1.15)

for every $r > 0$. Then

\[ 1 - \mu(A_r) \le \beta\big(\mu(A)\, r\big) \]

for every Borel set $A$ with $\mu(A) > 0$ and every $r > 0$. In particular,

\[ \alpha_{(X,d,\mu)}(r) \le \beta\big(2^{-1}r\big), \quad r > 0. \]

Moreover, if $\beta$ is such that $\lim_{r \to \infty} \beta(r) = 0$, any 1-Lipschitz function $F$ is integrable with respect to $\mu$ and, provided $\beta$ is continuous, satisfies (1.15).

Proof. Let $A$ be such that $\mu(A) > 0$ and fix $r > 0$. Consider $F(x) = \min\big(d(x,A), r\big)$. Clearly $\|F\|_{\mathrm{Lip}} \le 1$, while

\[ \int F d\mu \le \big(1 - \mu(A)\big)\, r. \]

By the hypothesis,

\[ 1 - \mu(A_r) = \mu\big(\{F \ge r\}\big) \le \mu\Big(\big\{F \ge \textstyle\int F d\mu + \mu(A)\, r\big\}\Big) \le \beta\big(\mu(A)\, r\big). \]

In particular, if $\mu(A) \ge \tfrac12$, then $1 - \mu(A_r) \le \beta(\tfrac r2)$, so that the first claim follows. Let now $F$ be a 1-Lipschitz function on $(X,d)$. For every $n \ge 0$, $F_n = \min(|F|, n)$ is again 1-Lipschitz and bounded. Applying (1.15) to $-F_n$, for every $r > 0$,

\[ \mu\Big(\big\{F_n \le \textstyle\int F_n d\mu - r\big\}\Big) \le \beta(r). \]  (1.16)

Choose $m$ such that $\mu(\{|F| \le m\}) \ge \tfrac12$ and $r_0$ such that $\beta(r_0) < \tfrac12$. Since for every $n$,

\[ \mu\big(\{F_n \le m\}\big) \ge \tfrac12, \]

intersecting with (1.16) for $r = r_0$, we get that, independently of $n$,

\[ \int F_n d\mu \le m + r_0, \]

and thus $\int |F|\, d\mu < \infty$ by monotone convergence. Apply then (1.15) to $\min(\max(F, -n), n)$ and let $n \to \infty$. Proposition 1.7 is established.

In Proposition 1.7, we lose a factor 2 in the concentration function. This may be improved by an iteration of the argument.

In the same spirit, observe that (1.15), applied to both $F$ and $-F$, contains a concentration inequality around the mean:

\[ \mu\Big(\big\{\big|F - \textstyle\int F d\mu\big| \ge r\big\}\Big) \le 2\,\beta(r). \]  (1.17)

Now, if $r_0$ is chosen so that $2\beta(r_0) < \tfrac12$, any median $m_F$ of $F$ is such that

\[ \Big| m_F - \textstyle\int F d\mu \Big| \le r_0. \]

It follows, together with (1.17), that

\[ \mu\big(\{|F - m_F| \ge r + r_0\}\big) \le 2\,\beta(r), \quad r > 0. \]  (1.18)

In the case of normal concentration $\beta(r) = C e^{-cr^2}$, for example, we then get via Proposition 1.3 a concentration function asymptotically of the same order as $r \to \infty$, namely

\[ \alpha(r) \le C' e^{-cr^2 + c'r}, \quad r > 0. \]
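For completeness (our remark, not in the text), the extra linear term in the exponent can always be absorbed into the Gaussian decay at the cost of constants:

```latex
% Since c' r \le \tfrac{c}{2} r^2 + \tfrac{c'^2}{2c} for all r > 0 (AM-GM),
C' e^{-c r^2 + c' r} \;\le\; C' e^{c'^2/(2c)}\, e^{-c r^2 / 2},
```

so normal concentration is retained, with the exponent $c$ degraded to $c/2$ and the constant $C'$ enlarged.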

We will not be concerned in this work with sharp constants in concentration functions, and usually present bounds on concentration functions using the simple Proposition 1.7, with its factor $\tfrac12$. Moreover, in normal concentration functions $\beta(r) = C e^{-cr^2}$, we will never describe sharp values of $C$. However, when sharp concentration inequalities for Lipschitz functions around the median or the mean are available, we present them simultaneously. As we will see in the next chapter, isoperimetric inequalities do usually provide optimal concentration functions. The next proposition formalizes the argument leading to (1.18), with the mean replaced by any constant value. We may actually work at the level of one single function $F$.

Proposition 1.8. Let $F$ be a measurable function on some probability space $(X, \mathcal B, \mu)$. Assume that for some $a_F \in \mathbb R$ and some non-negative function $\beta$ on $\mathbb R_+$ such that $\beta(0) \le \tfrac12$ and $\lim_{r \to \infty} \beta(r) = 0$,

\[ \mu\big(\{|F - a_F| \ge r\}\big) \le \beta(r) \]

for all r > 0. Then


;

j F ; m F j r + r0

(r)

where mF is a median r0 > 0 is such that R 1 of F for and where R 1 (r0 ) < 2 . If = 0 (r)dr < 1, then jaF ; Fd j , and for every r > 0,
;

jF ; Fd j r +
C 0 e;c rp
0

(r):

In particular, if (r) C e;crp , 0 < p < 1, r > 0, then


;

jF ; M j r

r>0

where M is either the mean or a median of F for and where C 0 c0 > 0 anly depend on C c and p. Proof. The rst part follows from the argument leading to (1.18). For the second, note that
Z

jF ; aF jd =
R

jF ; aF j r dr
R

Therefore $\int |F|\,d\mu < \infty$ and $|a_F - \int F\,d\mu| \le E$, from which the desired claim easily follows.

Normal concentration implies strong integrability properties of Lipschitz functions. This is the content of the simple proposition that immediately follows by integration in $r > 0$.

Proposition 1.9. Let $F$ be a measurable function on some probability space $(X, \mathcal{B}, \mu)$ such that for some $a_F \in \mathbb{R}$ and some constants $C, c > 0$,

$$\mu\big(\{|F - a_F| \ge r\}\big) \le C\mathrm{e}^{-cr^2}$$

for every $r \ge 0$. Then

$$\int \mathrm{e}^{\lambda F^2} d\mu < \infty$$

for every $\lambda < c$.

Proof. From the hypothesis, for every $r \ge |a_F|$,

$$\mu\big(\{|F| \ge r\}\big) \le \mu\big(\{|F - a_F| \ge r - |a_F|\}\big) \le C\mathrm{e}^{-c(r - |a_F|)^2}.$$

Now, by Fubini's theorem,

$$\int \mathrm{e}^{\lambda F^2} d\mu = 1 + \int_0^\infty 2\lambda r\,\mathrm{e}^{\lambda r^2}\mu\big(\{|F| \ge r\}\big)\,dr \le \mathrm{e}^{\lambda a_F^2} + \int_{|a_F|}^\infty 2C\lambda r\,\mathrm{e}^{-c(r - |a_F|)^2}\mathrm{e}^{\lambda r^2}\,dr$$

from which the conclusion follows.

The following proposition is a further description of normal concentration.

Proposition 1.10. Let $\mu$ be a Borel probability measure on a metric space $(X,d)$. Then $(X,d,\mu)$ has normal concentration if and only if there is a constant $C > 0$ such that for every $q \ge 1$ and every 1-Lipschitz function $F$ on $(X,d)$,

$$\Big\|F - \int F\,d\mu\Big\|_q \le C\sqrt{q}$$

where $\|\cdot\|_q$ is the $L^q$-norm with respect to $\mu$.

Proof. If $\alpha(r) \le C\mathrm{e}^{-cr^2}$, $r \ge 0$, we know from Proposition 1.7 that for every 1-Lipschitz function $F$, and every $r \ge 0$,

$$\mu\Big(\Big\{\Big|F - \int F\,d\mu\Big| \ge r\Big\}\Big) \le 2C\,\mathrm{e}^{-cr^2}.$$

Then, for $q \ge 1$,

$$\Big\|F - \int F\,d\mu\Big\|_q^q = \int_0^\infty q r^{q-1}\,\mu\Big(\Big\{\Big|F - \int F\,d\mu\Big| \ge r\Big\}\Big)\,dr \le \int_0^\infty 2Cq\,r^{q-1}\mathrm{e}^{-cr^2}\,dr,$$

from which the implication follows since

$$\Big(\int_0^\infty q\,r^{q-1}\mathrm{e}^{-cr^2}\,dr\Big)^{1/q} = O\big(\sqrt{q}\,\big)$$

as $q \to \infty$. Conversely, by Chebyshev's inequality, for every $r > 0$ and $q \ge 1$,

$$\mu\Big(\Big\{\Big|F - \int F\,d\mu\Big| \ge r\Big\}\Big) \le \big(C\sqrt{q}\,\big)^q r^{-q},$$

from which normal concentration follows by optimization in $q \ge 1$.

As a consequence of Proposition 1.10, if $(X,d,\mu)$ has normal concentration, there exists $C > 0$ such that for every $q \ge 1$ and every Lipschitz function $F$ on $(X,d)$,

$$\|F\|_q \le \|F\|_1 + C\sqrt{q}\,\|F\|_{\mathrm{Lip}}. \qquad (1.19)$$
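The $\sqrt{q}$ growth in Proposition 1.10 can be checked in closed form for the canonical one-dimensional Gaussian measure and the 1-Lipschitz function $F(x) = x$, using the classical moment formula $\mathbb{E}|Z|^q = 2^{q/2}\Gamma((q+1)/2)/\sqrt{\pi}$ for $Z$ standard normal. The following Python sketch (an illustration, not part of the text) verifies that $\|F\|_q/\sqrt{q}$ stays bounded and approaches $1/\sqrt{e}$:

```python
import math

def gaussian_lq_norm(q):
    # ||Z||_q for Z ~ N(0,1): E|Z|^q = 2^(q/2) * Gamma((q+1)/2) / sqrt(pi)
    moment = 2 ** (q / 2) * math.gamma((q + 1) / 2) / math.sqrt(math.pi)
    return moment ** (1.0 / q)

ratios = [gaussian_lq_norm(q) / math.sqrt(q) for q in range(1, 51)]
assert max(ratios) <= 1.0                        # ||F||_q <= C sqrt(q) with C = 1 here
assert abs(ratios[-1] - math.exp(-0.5)) < 0.02   # the ratio tends to 1/sqrt(e)
```

The bound $\|F\|_q \le \sqrt{q}$ for this particular observable is consistent with (1.19), since $F$ here has mean zero and Lipschitz constant 1.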

Propositions 1.9 and 1.10 clearly extend to concentration functions of the type $C\mathrm{e}^{-cr^p}$, $p > 0$ (or more general sufficiently small concentration functions). The growth rate in $q \ge 1$ is then $q^{1/p}$ in Proposition 1.10.

Proposition 1.7 is a convenient tool to handle concentration in product spaces, with which we conclude this section. If $(X,d)$ and $(Y,\delta)$ are metric spaces, we equip the Cartesian product space $X \times Y$ with the $\ell^1$-metric

$$d(x,x') + \delta(y,y'), \qquad x, x' \in X,\ y, y' \in Y. \qquad (1.20)$$

Proposition 1.11. Let $\mu$ and $\nu$ be Borel probability measures on metric spaces $(X,d)$ and $(Y,\delta)$ respectively. Assume that there are non-negative functions $\alpha$ and $\beta$ on $\mathbb{R}_+$ such that whenever $F : X \to \mathbb{R}$ and $G : Y \to \mathbb{R}$ are bounded and 1-Lipschitz on their respective spaces, then for every $r > 0$,

$$\mu\Big(\Big\{F \ge \int F\,d\mu + r\Big\}\Big) \le \alpha(r) \quad \text{and} \quad \nu\Big(\Big\{G \ge \int G\,d\nu + r\Big\}\Big) \le \beta(r).$$

Then, if $\pi$ is the product measure of $\mu$ and $\nu$ on $X \times Y$ equipped with the $\ell^1$-metric (1.20), for any bounded 1-Lipschitz function $F$ on the product space $X \times Y$ and any $r > 0$,

$$\pi\Big(\Big\{F \ge \int F\,d\pi + 2r\Big\}\Big) \le \alpha(r) + \beta(r).$$

Proof. Set, for every $x \in X$, $y \in Y$, $F^y(x) = F(x,y)$ and $G(y) = \int F^y\,d\mu$, and observe that $F^y$ and $G$ are 1-Lipschitz on their respective spaces. Therefore, for every $r > 0$,

$$\pi\Big(\Big\{F \ge \int F\,d\pi + 2r\Big\}\Big) \le (\mu\otimes\nu)\Big(\Big\{(x,y) \in X \times Y;\ F^y(x) \ge \int F^y\,d\mu + r\Big\}\Big) + \nu\Big(\Big\{G \ge \int G\,d\nu + r\Big\}\Big) \le \alpha(r) + \beta(r).$$

The proposition is established.

While Proposition 1.11 describes concentration results in product spaces, these are not well suited to concentration bounds which are independent of the number of spaces in the product (dimension free concentration). It will be one main question addressed in Chapters 4, 5 and 6 to develop tools to reach such dimension free bounds, of fundamental importance in applications.

1.4 Observable diameters


The notion of observable diameter is somewhat dual to the concentration function. It describes the diameter of a metric space $(X,d)$ viewed through a given probability measure $\mu$ on the Borel sets of $(X,d)$. We fix $\kappa > 0$, to be thought of as small. According to M. Gromov [Grom2], define first the partial diameter $\mathrm{PartDiam}_\mu(X,d)$ of $(X,d)$ with respect to $\mu$ as the infimal $D$ such that there exists a subset $A$ of $X$ with diameter less than or equal to $D$ and measure $\mu(A) \ge 1 - \kappa$. This diameter is clearly monotone for the Lipschitz ordering: if $\varphi : (X,d) \to (Y,\delta)$ is 1-Lipschitz, and if $\varphi_\sharp\mu$ is the measure pushed forward by $\varphi$, then $\mathrm{PartDiam}_{\varphi_\sharp\mu}(Y,\delta) \le \mathrm{PartDiam}_\mu(X,d)$ (for all $\kappa > 0$). What is not obvious is that the partial diameter may dramatically decrease under all 1-Lipschitz maps from $X$ to a certain $Y$, that we always take to be $\mathbb{R}$ below. We then define the observable diameter $\mathrm{ObsDiam}_\mu(X,d)$ of $(X,d)$ with respect to $\mu$ as the supremum of $\mathrm{PartDiam}_{F_\sharp\mu}(\mathbb{R})$ over all image measures $F_\sharp\mu$ of $\mu$ by 1-Lipschitz maps $F : X \to \mathbb{R}$. Following [Grom2], we think of $\mu$ as a state on the configuration $(X,d)$, and a Lipschitz map $F : X \to \mathbb{R}$ is interpreted as an observable giving the tomographic image $F_\sharp\mu$ on $\mathbb{R}$. We watch $F_\sharp\mu$ and can only distinguish a part of its support of measure $1 - \kappa$.

The next simple statement connects the observable diameter $\mathrm{ObsDiam}_\mu(X,d)$ with the concentration function $\alpha = \alpha_{(X,d,\mu)}$. Let $\alpha^{-1}$ be the generalized inverse function of the non-increasing function $\alpha$, that is

$$\alpha^{-1}(\varepsilon) = \inf\{r > 0;\ \alpha(r) \le \varepsilon\}, \qquad \varepsilon > 0.$$

Proposition 1.12. Let $\mu$ be a probability measure on the Borel sets of a metric space $(X,d)$. Then

$$\mathrm{ObsDiam}_\mu(X,d) \le 2\,\alpha^{-1}\big(\tfrac{\kappa}{2}\big).$$

Proof. It is an easy consequence of the concentration inequalities for Lipschitz functions of the preceding sections. Namely, if $F : X \to \mathbb{R}$ is 1-Lipschitz, we know from (1.13) that for every $r > 0$,

$$\mu\big(\{|F - m_F| \ge r\}\big) \le 2\,\alpha(r)$$

where $m_F$ is a median of $F$ for $\mu$. Hence, if $F_\sharp\mu$ denotes the image measure of $\mu$ by $F$, the interval $]m_F - r, m_F + r[$ has length $2r$ and $F_\sharp\mu$-measure larger than or equal to $1 - 2\alpha(r)$. The proposition then follows from the definition of the inverse function $\alpha^{-1}$ of $\alpha$.

As an example, if $\mu$ has normal concentration $\alpha(r) \le C\mathrm{e}^{-cr^2}$, $r > 0$, on $(X,d)$, then

$$\mathrm{ObsDiam}_\mu(X,d) \le 2\sqrt{\frac{1}{c}\log\frac{2C}{\kappa}}. \qquad (1.21)$$

The important parameter in (1.21) is the rate $c$ in the exponential decay of the concentration function, the value of $C > 0$ being usually a numerical constant $C = 1$ or $2$ that simply modifies the numerical value of $\kappa$ by a factor. For example, by (1.2), (1.21) for the standard $n$-sphere $S^n$ may loosely be described by saying that

$$\mathrm{ObsDiam}_{\sigma^n}(S^n) = O\Big(\frac{1}{\sqrt{n}}\Big)$$

as $n$ is large, which is of course in strong contrast with the diameter of $S^n$ itself as a metric space. Similarly, the observable diameter of Euclidean space with respect to Gaussian measures is bounded.
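A Monte Carlo sketch of this contrast (illustrative code, not from the text; it uses the standard fact that a normalized Gaussian vector is uniformly distributed on the sphere): for the 1-Lipschitz observable $F(x) = x_1$ on $S^{n-1}$, the central interval carrying mass $1 - \kappa$ of the image measure shrinks like $1/\sqrt{n}$, while the diameter of the sphere remains $\pi$.

```python
import math, random

random.seed(0)

def sphere_coordinate_spread(n, samples=4000, kappa=0.05):
    # Sample uniform points on S^(n-1) as normalized Gaussian vectors and
    # return the length of the central interval carrying mass 1 - kappa
    # for the 1-Lipschitz observable F(x) = x_1.
    vals = []
    for _ in range(samples):
        g = [random.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        vals.append(g[0] / norm)
    vals.sort()
    lo = vals[int(samples * kappa / 2)]
    hi = vals[int(samples * (1 - kappa / 2))]
    return hi - lo

for n in (50, 200):
    spread = sphere_coordinate_spread(n)
    # roughly 2 * 1.96 / sqrt(n), since x_1 is approximately N(0, 1/n)
    assert 3.0 / math.sqrt(n) < spread < 4.8 / math.sqrt(n)
```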

1.5 Expansion coefficients


Expansion coefficients are a natural multiplicative description of exponential concentration. As before, let $\mu$ be a probability measure on the Borel sets of a metric space $(X,d)$. Define the expansion coefficient of $\mu$ on $(X,d)$ of order $\varepsilon > 0$ as

$$\mathrm{Exp}_\mu(\varepsilon) = \inf\Big\{\frac{\mu(B_\varepsilon)}{\mu(B)};\ B \subset X,\ \mu(B_\varepsilon) \le \frac{1}{2}\Big\}$$

where we recall that $B_\varepsilon$ is the (open) $\varepsilon$-neighborhood of $B$ with respect to $d$. The definition of $\mathrm{Exp}_\mu(\varepsilon)$ shows that whenever $B$ is such that $\mu(B_{k\varepsilon}) \le \frac{1}{2}$ for some integer $k \ge 1$, then

$$\mu(B) \le \mathrm{Exp}_\mu(\varepsilon)^{-k}\mu(B_{k\varepsilon}) \le \frac{1}{2}\,\mathrm{Exp}_\mu(\varepsilon)^{-k}. \qquad (1.22)$$

In particular, if $\mathrm{Exp}_\mu(\varepsilon) > 1$, $B$ has very small measure. Thinking of $B$ as the complement of some large $r$-neighborhood of a set $A$ with measure $\mu(A) \ge \frac{1}{2}$ immediately leads to the exponential decay of the concentration function of $\mu$ on $(X,d)$.

Proposition 1.13. If for some $\varepsilon > 0$, $\mathrm{Exp}_\mu(\varepsilon) \ge e > 1$, then

$$\alpha(r) \le \frac{e}{2}\,\mathrm{e}^{-r(\log e)/\varepsilon}, \qquad r > 0.$$

Proof. Let $A$ be a Borel set in $(X,d)$ with $\mu(A) \ge \frac{1}{2}$. If $B$ is the complement of $A_{k\varepsilon}$, then $B_{k\varepsilon}$ is contained in the complement of $A$, and thus $\mu(B_{k\varepsilon}) \le \frac{1}{2}$. Therefore, by (1.22),

$$1 - \mu(A_{k\varepsilon}) \le \frac{1}{2}\,e^{-k}.$$

We then simply interpolate between $k\varepsilon$ and $(k+1)\varepsilon$ to get the desired result.

1.6 Laplace bounds and infimum-convolutions


In this section, we provide some simple and useful tools to establish concentration properties, either through exponential deviation inequalities for Lipschitz functions or by infimum-convolution arguments.

Let $(X,d)$ be a metric space and let $\mu$ be a probability measure on the Borel sets of $(X,d)$. Define, for $\lambda \in \mathbb{R}$, the Laplace functional of $\mu$ on $(X,d)$ as

$$E_{(X,d,\mu)}(\lambda) = \sup \int \mathrm{e}^{\lambda F}\,d\mu$$

where the supremum runs over all (bounded) mean zero 1-Lipschitz functions $F$ on $(X,d)$. We often write more simply $E = E_{(X,d,\mu)}$. Replacing $F$ by $-F$, it is enough to consider $E(\lambda)$ for $\lambda \ge 0$. The following elementary proposition bounds the concentration function $\alpha_{(X,d,\mu)}$ of $\mu$ on $(X,d)$ by $E_{(X,d,\mu)}$.

Proposition 1.14. Under the preceding notation,

$$\alpha_{(X,d,\mu)}(r) \le \inf_{\lambda \ge 0} \mathrm{e}^{-\lambda r/2} E_{(X,d,\mu)}(\lambda), \qquad r > 0.$$

In particular, if $E_{(X,d,\mu)}(\lambda) \le \mathrm{e}^{\lambda^2/2c}$, $\lambda \ge 0$, then, for every 1-Lipschitz function $F : X \to \mathbb{R}$ and every $r \ge 0$,

$$\mu\Big(\Big\{F \ge \int F\,d\mu + r\Big\}\Big) \le \mathrm{e}^{-cr^2/2},$$

and $(X,d,\mu)$ has normal concentration

$$\alpha_{(X,d,\mu)}(r) \le \mathrm{e}^{-cr^2/8}, \qquad r > 0.$$

For the proof, simply note that by Chebyshev's exponential inequality, for any mean zero 1-Lipschitz function $F : X \to \mathbb{R}$, and any $r$ and $\lambda > 0$,

$$\mu\big(\{F \ge r\}\big) \le \mathrm{e}^{-\lambda r} E_{(X,d,\mu)}(\lambda).$$

Optimizing in $\lambda$, the various conclusions immediately follow from Proposition 1.7.

As in Proposition 1.2, the Laplace functional is decreasing under 1-Lipschitz maps. Moreover, the Laplace functional $E_{(X,d,\mu)}$ is a convenient tool to handle concentration in product spaces with respect to the $\ell^1$-metric. If $(X,d)$ and $(Y,\delta)$ are two metric spaces, we equip the product space $X \times Y$ with the metric (1.20)

$$d(x,x') + \delta(y,y'), \qquad x, x' \in X,\ y, y' \in Y.$$

Proposition 1.15. Under the preceding notation,

$$E_{(X \times Y,\,d+\delta,\,\mu\otimes\nu)} \le E_{(X,d,\mu)}\,E_{(Y,\delta,\nu)}.$$

Proof. It is similar to the proof of Proposition 1.11. Let $F$ be a mean zero 1-Lipschitz function on the product space. Set, for every $x \in X$, $y \in Y$, $F^y(x) = F(x,y)$ and $G(y) = \int F^y\,d\mu$, and observe that $F^y$ and $G$ are 1-Lipschitz on their respective spaces. We then write, for every $\lambda \ge 0$,

$$\int \mathrm{e}^{\lambda F}\,d(\mu\otimes\nu) = \int \mathrm{e}^{\lambda G(y)}\Big(\int \mathrm{e}^{\lambda[F^y(x) - \int F^y d\mu]}\,d\mu(x)\Big)\,d\nu(y) \le E_{(X,d,\mu)}(\lambda)\int \mathrm{e}^{\lambda G}\,d\nu,$$

from which the claim follows since $\int G\,d\nu = 0$.
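The Chebyshev-plus-optimization step of Proposition 1.14 can be made concrete on a toy example: $X = [0,1]$ with Lebesgue measure and the mean zero 1-Lipschitz function $F(x) = x - \frac{1}{2}$, whose Laplace transform is explicit. The sketch below is illustrative (the sharper constant $\lambda^2/8$ is Hoeffding's lemma, not a statement from the text) and anticipates the diameter bound of Proposition 1.16.

```python
import math

def laplace_fun(lam):
    # E e^{lam * F} for F(x) = x - 1/2 under Lebesgue measure on [0,1];
    # F is mean-zero and 1-Lipschitz, and Diam([0,1]) = 1.
    if lam == 0:
        return 1.0
    return math.sinh(lam / 2) / (lam / 2)

for lam in (0.5, 1.0, 2.0, 4.0, 8.0):
    assert laplace_fun(lam) <= math.exp(lam * lam / 2)  # Proposition 1.16 (D = 1)
    assert laplace_fun(lam) <= math.exp(lam * lam / 8)  # sharper Hoeffding constant

# Chernoff step of Proposition 1.14: minimize e^{-lam*r} E(lam) over a grid.
r = 0.4
best = min(math.exp(-lam * r) * laplace_fun(lam)
           for lam in [k / 100 for k in range(1, 2001)])
assert best <= math.exp(-2 * r * r) + 1e-9  # the optimized tail bound e^{-2r^2}
```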

The next statement is a simple illustration of the use of the Laplace functional. It will describe a basic concentration property in product spaces with respect to the $\ell^1$-metric.

Proposition 1.16. If $\mathrm{Diam}(X,d) = D < \infty$, then, for any probability measure $\mu$ on the Borel sets of $(X,d)$,

$$E_{(X,d,\mu)}(\lambda) \le \mathrm{e}^{D^2\lambda^2/2}, \qquad \lambda \ge 0.$$

Proof. Let $F$ be a mean zero 1-Lipschitz function on $(X,d)$. By Jensen's inequality, for every $\lambda \ge 0$,

$$\int \mathrm{e}^{\lambda F}\,d\mu \le \iint \mathrm{e}^{\lambda[F(x) - F(y)]}\,d\mu(x)\,d\mu(y) \le \sum_{i=0}^\infty \frac{(D\lambda)^{2i}}{(2i)!} \le \mathrm{e}^{D^2\lambda^2/2}$$

(the odd terms in the expansion of the double integral vanish by symmetry).

As a consequence of Propositions 1.14, 1.15 and 1.16, we get the following important corollary.

Corollary 1.17. Let $P = \mu_1 \otimes \cdots \otimes \mu_n$ be a product probability measure on the Cartesian product $X = X_1 \times \cdots \times X_n$ of metric spaces $(X_i, d_i)$ with finite diameters $D_i$, $i = 1, \ldots, n$, equipped with the $\ell^1$-metric $d = \sum_{i=1}^n d_i$. Then, if $F$ is a 1-Lipschitz function on $(X,d)$, for every $r \ge 0$,

$$P\Big(\Big\{F \ge \int F\,dP + r\Big\}\Big) \le \mathrm{e}^{-r^2/2D^2}$$

where $D^2 = \sum_{i=1}^n D_i^2$. In particular,

$$\alpha_P(r) \le \mathrm{e}^{-r^2/8D^2}, \qquad r > 0.$$

Applied to sums $S = Y_1 + \cdots + Y_n$ of real-valued independent bounded random variables on some probability space $(\Omega, \mathcal{A}, \mathbb{P})$, Corollary 1.17 yields a Hoeffding type inequality [Hoe] (see [Sto], [MD2])

$$\mathbb{P}\big(S \ge \mathbb{E}(S) + r\big) \le \mathrm{e}^{-r^2/2D^2} \qquad (1.23)$$

for every $r \ge 0$, where $D^2 = \sum_{i=1}^n \|Y_i\|_\infty^2$. Inequalities such as (1.23) have actually a long run in the study of limit theorems in classical probability theory, going back to S. Bernstein, A. N. Kolmogorov, Y. Prokhorov etc. (cf. e.g. [Sto]).

In Corollary 1.17, we may consider in particular the trivial metric on each factor, so that any product probability measure $P$ on a product $X = X_1 \times \cdots \times X_n$ equipped with the Hamming metric

$$d(x,y) = \sum_{i=1}^n 1_{\{x_i \ne y_i\}} = \mathrm{Card}\,\{1 \le i \le n;\ x_i \ne y_i\}$$

(where we denote by $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ the coordinates of the points $x, y$ in the product space $X$) has concentration

$$\alpha_P(r) \le \mathrm{e}^{-r^2/8n}, \qquad r > 0, \qquad (1.24)$$

in accordance with (1.5). As already mentioned next to Proposition 1.11, this approach is however not well suited to concentration bounds with respect to $\ell^2$-metrics (such as the Euclidean metric) which are independent of the number of spaces in the product (dimension free concentration), such as those for Gaussian measures. In terms of the observable diameter of the product space $X = X_1 \times \cdots \times X_n$ with respect to the Hamming metric and any product probability measure $P$,

$$\mathrm{ObsDiam}_P(X) = O\big(\sqrt{n}\,\big). \qquad (1.25)$$
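The Hoeffding type inequality (1.23) can be sanity-checked exactly (an illustrative sketch, not from the text): for fair $\pm 1$ coins $Y_i$ one has $\|Y_i\|_\infty = 1$, hence $D^2 = n$, and the tail of $S$ is an explicit binomial sum.

```python
import math

def rademacher_tail(n, r):
    # Exact P(S >= r) for S = Y_1 + ... + Y_n with fair +-1 coins Y_i,
    # via S = 2B - n where B is binomial(n, 1/2).
    k_min = math.ceil((n + r) / 2)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

n = 100  # here D^2 = sum ||Y_i||_inf^2 = n
for r in (10, 20, 30, 40):
    assert rademacher_tail(n, r) <= math.exp(-r * r / (2 * n))
```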

In Chapter 4, we establish normal concentration by showing that for every Borel set $A$ in $X$,

$$\int \mathrm{e}^{c\,d(x,A)^2}\,d\mu(x) \le \frac{1}{\mu(A)} \qquad (1.26)$$

where $d(x,A)$ is the distance from the point $x$ to the set $A$. Indeed, under (1.26), for every $r > 0$,

$$1 - \mu(A_r) = \mu\big(\{d(\cdot,A) \ge r\}\big) \le \frac{1}{\mu(A)}\,\mathrm{e}^{-cr^2},$$

so that $\alpha(r) \le 2\,\mathrm{e}^{-cr^2}$, $r > 0$.

In the same spirit, one may investigate infimum-convolution inequalities. Given a space $X$, consider a non-negative cost function $c : X \times X \to \mathbb{R}_+$. A typical example is the quadratic cost $c(x,y) = c(x-y) = \frac{1}{2}|x-y|^2$ on $\mathbb{R}^n \times \mathbb{R}^n$. Given a real-valued measurable function $f$ on $X$, denote by $Q_c f$ the infimum-convolution of $f$ with respect to the cost $c$ defined by

$$Q_c f(x) = \inf_{y \in X}\big[f(y) + c(x,y)\big], \qquad x \in X.$$

If $\mu$ is a probability measure on the Borel sets of $X$, and $c$ a non-negative measurable cost function on $X \times X$, we say that $\mu$ satisfies an infimum-convolution inequality with respect to the cost $c$ if for all bounded measurable functions $f$ on $X$,

$$\int \mathrm{e}^{Q_c f}\,d\mu \int \mathrm{e}^{-f}\,d\mu \le 1. \qquad (1.27)$$

If we adopt the convention that $+\infty \cdot 0 = 1$, the above inequality extends to all $\mathbb{R} \cup \{+\infty\}$-valued functions $f$. The preceding definition is motivated by its connection to concentration.

Proposition 1.18. If $\mu$ satisfies an infimum-convolution inequality with respect to the cost $c$, then, for every Borel set $A$ and every $r > 0$,

$$1 - \mu\Big(\Big\{x;\ \inf_{y \in A} c(x,y) < r\Big\}\Big) \le \frac{1}{\mu(A)}\,\mathrm{e}^{-r}.$$

The proof simply follows by applying (1.27) to the function $f$ that is equal to $0$ on $A$ and $+\infty$ outside. Then $Q_c f(x) \ge r$ if and only if $\inf_{y \in A} c(x,y) \ge r$. Conclude with Chebyshev's exponential inequality.

A typical application is of course the cost given by the distance $c(x,y) = d(x,y)$ on a metric space $(X,d)$. One may also consider powers of the distance function. The example of

$$c(x,y) = \frac{c}{2}\,d(x,y)^2, \qquad x, y \in X, \qquad (1.28)$$

for some $c > 0$ is of particular interest. Indeed, it should be noted that by Jensen's inequality, inequality (1.27) implies that for every bounded measurable function $f$ on $X$,

$$\int \mathrm{e}^{Q_c f}\,d\mu \le \mathrm{e}^{\int f\,d\mu}. \qquad (1.29)$$

Now, for the choice (1.28) of the cost $c$, whenever $F$ is Lipschitz on $(X,d)$ with Lipschitz coefficient $\|F\|_{\mathrm{Lip}}$, for every $x \in X$,

$$Q_c F(x) \ge F(x) + \inf_{y \in X}\Big[-\|F\|_{\mathrm{Lip}}\,d(x,y) + \frac{c}{2}\,d(x,y)^2\Big] \ge F(x) - \frac{1}{2c}\,\|F\|_{\mathrm{Lip}}^2.$$

It thus follows from (1.29) that

$$\int \mathrm{e}^{F}\,d\mu \le \mathrm{e}^{\int F\,d\mu + \|F\|_{\mathrm{Lip}}^2/2c}.$$

We thus recover from Proposition 1.18 the concentration results deduced from Laplace bounds in Proposition 1.14. We will come back to this aspect in Chapter 6.

Another instance of interest is the case of a topological vector space $X$ with its Borel $\sigma$-field. Given some measurable non-negative function $c$ on $X$, consider the cost $c(x-y)$, $x, y \in X$. The concentration result of Proposition 1.18 then amounts to the inequality

$$1 - \mu\big(A + \{c < r\}\big) \le \frac{1}{\mu(A)}\,\mathrm{e}^{-r}, \qquad r > 0, \qquad (1.30)$$

where we denote by

$$A + B = \{x + y;\ x \in A,\ y \in B\}$$

the Minkowski sum of two sets $A$ and $B$ in $X$.

As in Proposition 1.2 and for the above Laplace bounds, the infimum-convolution property (1.27) satisfies a useful contraction property. Indeed, let $\mu$ satisfy an infimum-convolution inequality with respect to a cost function $c$ on $X \times X$. Let $\tilde{c}$ be a cost function on $Y \times Y$ and let $\varphi : X \to Y$ be such that

$$\tilde{c}\big(\varphi(x), \varphi(y)\big) \le c(x,y)$$

for all $x, y \in X$. Then the image measure of $\mu$ by $\varphi$ satisfies an infimum-convolution inequality with cost $\tilde{c}$ on $Y$. Indeed, it is enough to observe that for every $f : Y \to \mathbb{R}$, $Q_c(f \circ \varphi) \ge (Q_{\tilde{c}} f) \circ \varphi$.

One basic aspect of the infimum-convolution inequality is its stability by products.

Proposition 1.19. If $\mu$ and $\nu$ satisfy infimum-convolution inequalities with respect to costs $c$ and $\tilde{c}$ on $X$ and $Y$ respectively, then $\mu \otimes \nu$ satisfies an infimum-convolution inequality with respect to the cost $c + \tilde{c}$.

Proof. Given $f = f(x,y)$ on the product space, set $f^y(x) = f(x,y)$ for every $y \in Y$ and apply (1.27) for $\nu$ to $g(y) = \log \int \mathrm{e}^{Q_c f^y}\,d\mu$. Since

$$\int \mathrm{e}^{Q_{c+\tilde{c}} f(x,y)}\,d\mu(x) \le \mathrm{e}^{Q_{\tilde{c}}\,g(y)},$$

the conclusion follows.

As a consequence of Proposition 1.19, each time a measure $\mu$ satisfies the infimum-convolution inequality (1.27), the product measure $\mu^n$ on $X^n$ satisfies the concentration inequality of Proposition 1.18 with the sum of the costs along each coordinate. With respect to Proposition 1.15, this result allows us to deal with more general metrics than the $\ell^1$-metric on product spaces, such as for example the $\ell^2$-metric related to the quadratic cost. In particular, if $P = \mu_1 \otimes \cdots \otimes \mu_n$ is any product measure on $\mathbb{R}^n$, in order that $P$ has normal concentration with respect to the Euclidean metric, it suffices to know that each one-dimensional marginal $\mu_i$ satisfies an infimum-convolution inequality (1.27) with the same cost $c(x) = cx^2$, $x \in \mathbb{R}$, for some $c > 0$. We develop applications of this principle in Chapters 4 and 6.
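For a concrete instance of (1.27): the canonical Gaussian measure on $\mathbb{R}$ satisfies an infimum-convolution inequality with the quadratic cost $c(x,y) = |x-y|^2/4$ (Maurey's result [Mau2], revisited in Chapter 6). For the particular test function $f(x) = x^2/2$ everything is explicit, since $Q_c f(x) = \inf_y [y^2/2 + (x-y)^2/4] = x^2/6$ (minimum at $y = x/3$), and both integrals are Gaussian. The following sketch (illustrative, not from the text) checks the inequality in closed form:

```python
import math

def gauss_quad_moment(a):
    # E e^{a Z^2} = (1 - 2a)^(-1/2) for Z ~ N(0,1), valid for a < 1/2.
    return 1.0 / math.sqrt(1.0 - 2.0 * a)

# Cost c(x, y) = (x - y)^2 / 4 and f(x) = x^2 / 2 give Q_c f(x) = x^2 / 6.
lhs = gauss_quad_moment(1.0 / 6.0)   # E e^{Q_c f}  = sqrt(3/2)
rhs = gauss_quad_moment(-1.0 / 2.0)  # E e^{-f}     = 1/sqrt(2)
assert lhs * rhs <= 1.0              # infimum-convolution inequality (1.27)
assert abs(lhs * rhs - math.sqrt(3.0) / 2.0) < 1e-12
```

The product equals $\sqrt{3}/2 \approx 0.866$, comfortably below 1, illustrating the slack available in (1.27) for smooth test functions.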

Notes and Remarks


The definition of the concentration function was first introduced in [A-M]. It is formalized in [Gr-M1] and further analyzed in [M-S], references from where the elementary properties of Section 1.1 are taken. Concentration and deviation inequalities for Lipschitz functions have been emphasized by P. Levy [Le] and are sometimes called Levy's inequalities. Their equivalence with concentration inequalities is also part of the folklore of the subject. Sharper constants in normal deviation inequalities are discussed in [Bob6]. Many properties discussed in Section 1.2 of this chapter may be found in the reference [M-S]. Further introductions to the concentration of measure phenomenon and its applications are [Mi5] and [Sche5]. Observable diameters are described by M. Gromov in [Grom2], where further geometric invariants related to concentration are analyzed. Expansion coefficients are also discussed there and used in [Gr-M1] (see Chapter 3). Inequalities on Laplace transforms are a traditional tool to produce (normal) concentration. Infimum-convolution inequalities were introduced in this way by B. Maurey in [Mau2], inspired by investigations by M. Talagrand [Tal3]. The various results on concentration under infimum-convolution inequalities of Section 1.6 are taken from [Mau2] and will be revisited in Chapter 6.


2. ISOPERIMETRIC AND FUNCTIONAL EXAMPLES

As already illustrated in the first section of the first chapter, isoperimetric inequalities are a basic source of examples of concentration functions. We present in the first sections of this chapter the basic isoperimetric and Brunn-Minkowski inequalities leading to measure concentration (and cover in particular more carefully the early examples of Section 1.1). We however do not present proofs of these isoperimetric inequalities, but rather provide in Section 2.3 complete functional arguments (mainly from semigroup theory) to reach the concentration examples initially derived from isoperimetry. Namely, while the first concern of isoperimetry is extremal sets, the concentration phenomenon, dealing with large neighborhoods, is a milder property that may be established far outside the isoperimetric context, as will be amply demonstrated throughout these notes. Geometric examples involve lower bounds on the Ricci curvature as parts of Riemannian comparison theorems. These are treated here with rather elementary functional tools (Bochner's formula and convexity criteria) that do not require any deep knowledge of Riemannian geometry.

2.1 Isoperimetric examples


Isoperimetric inequalities are a basic source of examples of the concentration principle. However, their main interest lies in extremal sets and surface measures, whereas concentration deals with big enlargements. As such, concentration covers situations far outside isoperimetric considerations. This section thus only describes the concentration properties that follow from isoperimetry, rather than the isoperimetric inequalities themselves.

Let $(X,d)$ be a metric space equipped with a (not necessarily finite) Borel measure $\mu$. The boundary measure or Minkowski content of a Borel set $A$ in $X$ with respect to $\mu$ is defined as

$$\mu^+(A) = \liminf_{r \to 0}\frac{1}{r}\,\mu(A_r \setminus A) \qquad (2.1)$$

where we recall that $A_r = \{x \in X;\ d(x,A) < r\}$ is the (open) $r$-neighborhood of $A$ (with respect to $d$). The isoperimetric function $I_\mu$ of $\mu$ is the largest function on $[0, \mu(X)]$ such that

$$\mu^+(A) \ge I_\mu\big(\mu(A)\big) \qquad (2.2)$$

holds for every Borel set $A \subset X$. When $A$ is such that $\mu^+(A) = I_\mu(\mu(A))$, $A$ has minimal boundary measure among sets of the same measure, and $A$ is said to be extremal. Isoperimetry thus expresses that whenever $A$ is a measurable set with the same volume as an extremal set $B$, then the boundary measure of $A$ is larger than or equal to the boundary measure of $B$. One of the main interests in isoperimetric inequalities lies in explicit expressions for extremal sets. Unfortunately, extremal sets are rather difficult to determine. One motivation for the concentration phenomenon is that methods and results are available for large families of examples for which the characterization of isoperimetric extremal sets is simply hopeless.

The isoperimetric function $I_\mu$ is explicitly known only in a few cases. The most notable ones are the constant curvature spaces, going back to the works of P. Levy [Le] and E. Schmidt [Schmi]. Namely, let $X$ be either the Euclidean $n$-space $\mathbb{R}^n$, the standard $n$-sphere $S^n \subset \mathbb{R}^{n+1}$ with its geodesic metric, or the $n$-dimensional hyperbolic space $H^n$ with its hyperbolic metric. Equip these spaces with their Riemannian volume element $dv = d\mu$ (normalized in case of the sphere) and denote by $v(r)$ the measure of a ball with radius $r > 0$. Then

$$I_\mu = v' \circ v^{-1}. \qquad (2.3)$$

In particular, $I_\mu(a) = n\,\omega_n^{1/n} a^{(n-1)/n}$ in case of Lebesgue measure on $\mathbb{R}^n$, where $\omega_n$ is the volume of the Euclidean unit ball. Extremal sets are given in each case by geodesic balls. Further examples will be discussed later. One most fruitful such example is Gaussian isoperimetry, which will be described below as limiting spherical isoperimetry.

The first statement is the bridge between isoperimetric and concentration properties. We assume there for simplicity that $\mu$ is a Borel measure on a metric space $(X,d)$ such that the liminf in the definition (2.1) of $\mu^+(A)$ is a true limit for $A$ given by a finite union of open balls, and such that the family of these subsets is a determining class for $\mu$. These assumptions are easily seen to cover the preceding classical examples, as well as for example measures which are absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^n$.

Proposition 2.1. Let $\mu$ be as above. Assume that $I_\mu \ge v' \circ v^{-1}$ for some strictly increasing differentiable function $v : \mathbb{R} \to [0, \mu(X)]$. Then, for every $r > 0$,

$$v^{-1}\big(\mu(A_r)\big) \ge v^{-1}\big(\mu(A)\big) + r.$$

Proof. By the hypotheses, it is enough to assume that $A$ is given by a finite union of open balls. The family of such sets is closed under the operation $A \mapsto A_r$, $r > 0$. Now, the function $h(r) = v^{-1}(\mu(A_r))$ satisfies

$$h'(r) = \frac{\mu^+(A_r)}{v'\big(v^{-1}(\mu(A_r))\big)} \ge 1,$$

so that $h(r) = h(0) + \int_0^r h'(s)\,ds \ge v^{-1}(\mu(A)) + r$, which is the desired claim.

Note that conversely, if

$$v^{-1}\big(\mu(A_r)\big) \ge v^{-1}\big(\mu(A)\big) + r, \qquad r > 0,$$

then, for every subset $A$ with $\mu(A) < \infty$,

$$\mu^+(A) \ge \liminf_{r \to 0}\frac{1}{r}\Big[v\big(v^{-1}(\mu(A)) + r\big) - \mu(A)\Big] = v'\big(v^{-1}(\mu(A))\big). \qquad (2.4)$$
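As a one-dimensional illustration of Proposition 2.1 (an illustrative sketch, anticipating the Gaussian case treated later in this chapter): on $\mathbb{R}$ with the standard Gaussian measure, half-lines $A = (-\infty, t]$ satisfy $A_r = (-\infty, t+r)$, so the conclusion $v^{-1}(\mu(A_r)) \ge v^{-1}(\mu(A)) + r$ holds with equality for $v = \Phi$, the Gaussian distribution function.

```python
import math

def Phi(t):
    # Standard normal distribution function via the error function.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    # Bisection inverse of Phi on [lo, hi].
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# A = (-inf, t] has A_r = (-inf, t + r), so Phi^{-1}(mu(A_r)) equals
# Phi^{-1}(mu(A)) + r exactly: half-lines saturate Proposition 2.1.
t, r = -0.3, 1.25
assert abs(Phi_inv(Phi(t + r)) - (Phi_inv(Phi(t)) + r)) < 1e-9
```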

For the constant curvature spaces, $v(s)$ is the measure of a geodesic ball with radius $s > 0$, and $I_\mu = v' \circ v^{-1}$, so that Proposition 2.1 may be expressed equivalently as the comparison result

$$\mu(A_r) \ge \mu(B_r), \qquad r > 0, \qquad (2.5)$$

as soon as $A$ is a set with $\mu(A) = \mu(B)$ where $B$ is a ball with radius $s_0$. This geometric description emphasizes the interest in explicit knowledge of extremal sets to bound below $\mu(A_r)$ for any set $A$. The difference in perspective between isoperimetry and concentration is however that the latter makes use of (2.5) for the large values of $r$, with no emphasis on surface measure and extremal sets.

If $\mu$ is a probability measure on the Borel sets of $(X,d)$, recall its concentration function $\alpha = \alpha_{(X,d,\mu)}$.

Corollary 2.2. Let $\mu$ be a probability measure on $(X,d)$ and assume as in Proposition 2.1 that $I_\mu \ge v' \circ v^{-1}$. Then,

$$\alpha_{(X,d,\mu)}(r) \le 1 - v\big(v^{-1}\big(\tfrac{1}{2}\big) + r\big), \qquad r > 0.$$

Denote for example by $\sigma^n$ the normalized volume element on the unit sphere $S^n$ equipped with its geodesic distance, $n \ge 2$. Then, for every $0 < r \le \pi$,

$$v(r) = s_n^{-1}\int_0^r \sin^{n-1}\theta\,d\theta$$

where $s_n = \int_0^\pi \sin^{n-1}\theta\,d\theta$ is the normalizing constant (proportional to the total volume of $S^n$). Since $v^{-1}(\frac{1}{2}) = \frac{\pi}{2}$, let us evaluate $1 - v(\frac{\pi}{2} + r)$ for $0 < r \le \frac{\pi}{2}$. We have

$$1 - v\Big(\frac{\pi}{2} + r\Big) = s_n^{-1}\int_{(\pi/2)+r}^{\pi}\sin^{n-1}\theta\,d\theta = s_n^{-1}\int_r^{\pi/2}\cos^{n-1}\theta\,d\theta.$$

By the change of variables $\theta = t/\sqrt{n-1}$ and the inequality $\cos t \le \mathrm{e}^{-t^2/2}$, $0 \le t \le \frac{\pi}{2}$,

$$\int_r^{\pi/2}\cos^{n-1}\theta\,d\theta = \frac{1}{\sqrt{n-1}}\int_{r\sqrt{n-1}}^{(\pi/2)\sqrt{n-1}}\cos^{n-1}\Big(\frac{t}{\sqrt{n-1}}\Big)\,dt \le \frac{1}{\sqrt{n-1}}\int_{r\sqrt{n-1}}^{\infty}\mathrm{e}^{-t^2/2}\,dt \le \sqrt{\frac{\pi}{2(n-1)}}\;\mathrm{e}^{-(n-1)r^2/2}$$

where we used that $1 - \Phi(r) \le \frac{1}{2}\,\mathrm{e}^{-r^2/2}$, $r \ge 0$. To evaluate $s_n$ from below, note that by integration by parts $s_n = [(n-2)/(n-1)]\,s_{n-2}$, $n \ge 2$, so that $\sqrt{n-1}\,s_n \ge \sqrt{n-3}\,s_{n-2}$. Hence $\sqrt{n-1}\,s_n \ge \sqrt{2}$, $n \ge 2$. As a consequence, we may state the following important statement.

Theorem 2.3. On the standard $n$-sphere $S^n$, $n \ge 2$, equipped with its geodesic metric $d$ and normalized volume element $\sigma^n$,

$$\alpha_{(S^n,d,\sigma^n)}(r) \le \mathrm{e}^{-(n-1)r^2/2}, \qquad r > 0$$
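The quantitative heart of the proof, the bound $s_n^{-1}\int_{(\pi/2)+r}^{\pi}\sin^{n-1}\theta\,d\theta \le \mathrm{e}^{-(n-1)r^2/2}$, can be checked numerically (an illustrative sketch; the midpoint rule and the parameter grid are arbitrary choices, not from the text):

```python
import math

def cap_measure(n, r, steps=5000):
    # sigma^n-measure of the complement of the r-neighborhood of a
    # half-sphere: s_n^{-1} Integral_{pi/2 + r}^{pi} sin^{n-1}(t) dt,
    # computed by the midpoint rule.
    def integral(a, b):
        h = (b - a) / steps
        return h * sum(math.sin(a + (i + 0.5) * h) ** (n - 1)
                       for i in range(steps))
    return integral(math.pi / 2 + r, math.pi) / integral(0.0, math.pi)

for n in (10, 50, 200):
    for r in (0.1, 0.3, 0.6):
        assert cap_measure(n, r) <= math.exp(-(n - 1) * r * r / 2)
```

As expected from the proof, the true cap measure sits below the bound with a margin factor of at least $\sqrt{8/\pi}$.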

(actually $0 < r \le \pi$). Notice that Theorem 2.3 applied to sets the diameter of which tends to zero implies the classical isoperimetric inequality in Euclidean space.

As already emphasized in Section 1.1, this property is of special interest when the dimension $n$ is large, in which case a very small enlargement, of the order of $\frac{1}{\sqrt{n}}$, yields a set of almost full measure. In other words, given a set $A$ with $\sigma^n(A) \ge \frac{1}{2}$, most of the points on the sphere are within (geodesic) distance $\frac{1}{\sqrt{n}}$ from $A$. By Proposition 1.3, for every continuous function $F$ on $S^n$ and every $\eta > 0$,

$$\sigma^n\big(\{|F - m_F| > \omega_F(\eta)\}\big) \le 2\,\mathrm{e}^{-(n-1)\eta^2/2} \qquad (2.6)$$

where $m_F$ is a median of $F$ and $\omega_F$ the modulus of continuity of $F$. When $n$ is large, functions with small oscillations are thus almost constant. Inequality (2.6) is sometimes referred to as Levy's inequality. In terms of the observable diameter,

$$\mathrm{ObsDiam}_{\sigma^n}(S^n) = O\Big(\frac{1}{\sqrt{n}}\Big). \qquad (2.7)$$

For the $n$-sphere $S^n_R$ with radius $R > 0$, equipped with the normalized invariant measure $\sigma^n_R$,

$$\alpha_{\sigma^n_R}(r) \le \mathrm{e}^{-(n-1)r^2/2R^2}, \qquad r > 0. \qquad (2.8)$$

The isoperimetric inequality on spheres has been extended by M. Gromov [Grom1], using ideas going back to P. Levy, as a comparison theorem for Riemannian manifolds with strictly positive curvature. Let $(X,g)$ be a compact connected smooth Riemannian manifold of dimension $n$ ($\ge 2$) with Riemannian metric $g$, equipped with the normalized Riemannian volume element $d\mu = \frac{dv}{V}$ where $V$ is the total volume of $X$. Denote by $c(X)$ the infimum of the Ricci curvature tensor over all unit tangent vectors, and assume that $c(X) > 0$. The $n$-sphere $S^n_R$ with radius $R > 0$ is of constant curvature $c(S^n_R) = \frac{n-1}{R^2}$. Ricci curvature is a way to describe the variations of the Riemannian measure with respect to the Euclidean one. We refer to [C-E], [G-H-L], [Cha2]... for standard introductions to curvature in Riemannian geometry as well as for some classical examples (see also below). While Ricci curvature appears as a crucial geometric parameter in the subsequent statements, we provide in Section 2.3 a simple functional approach to these results relying only on the use of Bochner's formula (2.27), which does not require any deep understanding of Riemannian geometry. Furthermore, the log-concavity conditions of Theorem 2.6 and Proposition 2.16 below may be used to get some insight on the geometric content of curvature in this context.

Theorem 2.4. Let $(X,g)$ be a compact connected smooth Riemannian manifold of dimension $n$ ($\ge 2$) equipped with the normalized Riemannian volume element $d\mu = \frac{dv}{V}$ such that $c(X) > 0$. Then

$$I_\mu \ge I_{\sigma^n_R}$$

where $R > 0$ is such that $c(S^n_R) = \frac{n-1}{R^2} = c(X)$.

Such a comparison theorem strongly emphasizes the importance of a model space, here the sphere, to which manifolds may be compared. It does not yield any information on the extremal sets for $I_\mu$ itself. Equality in the theorem occurs only if $X$ is a sphere. As a consequence of (2.8), under the hypothesis of Theorem 2.4,

$$\alpha_{(X,g,\mu)}(r) \le \mathrm{e}^{-cr^2/2}, \qquad r > 0, \qquad (2.9)$$

where $c = c(X) > 0$ is the infimum of the Ricci curvature tensor over all unit tangent vectors.

Theorem 2.4 includes a number of geometric examples of interest for which a lower bound on the Ricci curvature is known, some of which we discuss now. It is clear that (Riemannian) products of spheres, or of Riemannian manifolds with a common lower bound $c > 0$ on the Ricci curvatures, will satisfy Theorem 2.4, and therefore measure concentration holds independently of the number of factors in the product. This is in contrast with the $\ell^1$-products as analyzed in Proposition 1.11 and Corollary 1.17.

Let $O^n$ be the group of all $n \times n$ real orthogonal matrices. For $1 \le k \le n$, let

$$W^n_k = \big\{e = (e_1, \ldots, e_k);\ e_i \in \mathbb{R}^n,\ \langle e_i, e_j\rangle = \delta_{ij},\ 1 \le i, j \le k\big\}$$

be the so-called Stiefel manifolds. Equip $W^n_k$ with the metric $d(e,f) = \big(\sum_{i=1}^k |e_i - f_i|^2\big)^{1/2}$. Note that $W^n_n = O^n$, $W^n_1 = S^{n-1}$ and $W^n_{n-1}$ may be identified with $SO^n = \{T \in O^n;\ \det(T) = 1\}$. In general, $W^n_k$ may be identified with $O^n/O^{n-k}$ via the map $\varphi : O^n \to W^n_k$, $\varphi(e_1, \ldots, e_n) = (e_1, \ldots, e_k)$.

The value of $c(SO^n)$ is known and equal to $\frac{n-1}{4}$ [C-E]. Therefore, the normalized Haar measure on $SO^n$ has a normal concentration function $\mathrm{e}^{-(n-1)r^2/8}$. By Proposition 1.2, this property is inherited by the quotients of $SO^n$. In particular, all the Stiefel manifolds have normal concentration.

Similar conclusions hold for Grassmann manifolds. Recall that the Grassmann manifold $G_{n,k}$, $1 \le k \le n$, is the metric space of all $k$-dimensional subspaces of $\mathbb{R}^n$ with the Hausdorff distance between the unit spheres of the subspaces $E$ and $F$:
$$d(E,F) = \sup\big\{ d(x, S^{n-1}\cap E)\ ;\ x \in S^{n-1}\cap F \big\}.$$
Since $G_{n,k}$ is again a quotient of $SO(n)$, (normalized Haar) measure $\mu_k$ on $G_{n,k}$ has normal concentration.

A wealth of further geometric examples is presented and discussed in [Grom2] on the basis of Theorem 2.4, including subvarieties in $S^n$, real and complex projective spaces, etc. For example, the (Lipschitz) Hopf fibration $S^{2n+1} \to \mathbb{C}P^n$ indicates that
$$\mathrm{ObsDiam}(\mathbb{C}P^n) = O\Big(\frac{1}{\sqrt{n}}\Big).$$
Of more delicate algebraic and metric geometry, the observable diameter of a complex algebraic submanifold $X \subset \mathbb{C}P^n$ of fixed codimension and degree and normalized Riemannian volume with the induced path metric satisfies
$$\mathrm{ObsDiam}(X) = O\Big(\frac{1}{\log n}\Big).$$
We refer to [Grom2] for further details.

It is well known that uniform measures on $n$-dimensional spheres with radius $\sqrt{n}$ approximate (when projected onto a finite number of coordinates) Gaussian measures (Poincare's lemma) [MK], [Str2], [Le3]. In this sense, the isoperimetric inequality on spheres gives rise to an isoperimetric inequality for Gaussian measures. Extremal sets are then half-spaces (which may be considered as balls with centers at infinity). Let, more precisely, $\gamma = \gamma^k$ be the canonical Gaussian measure on the Borel sets of $\mathbb{R}^k$ with density $(2\pi)^{-k/2}\,e^{-|x|^2/2}$ with respect to Lebesgue measure. Equip $\mathbb{R}^k$ with its usual Euclidean metric induced by the norm $|x| = \big(\sum_{i=1}^k x_i^2\big)^{1/2}$, $x = (x_1,\dots,x_k) \in \mathbb{R}^k$.

Theorem 2.5. For any $k \ge 1$ and any Borel set $A$ in $\mathbb{R}^k$,
$$\gamma^+(A) \ge \mathcal{I}\big(\gamma(A)\big)$$
where $\mathcal{I} = \Phi' \circ \Phi^{-1}$ and
$$\Phi(t) = (2\pi)^{-1/2}\int_{-\infty}^{t} e^{-x^2/2}\,dx, \quad t \in [-\infty, +\infty],$$
is the distribution function of the canonical Gaussian measure in dimension one. Moreover, the equality $\gamma^+(A) = \mathcal{I}(\gamma(A))$ holds if and only if $A$ is a half-space in $\mathbb{R}^k$.

Using that $1 - \Phi(r) \le e^{-r^2/2}$, $r > 0$, as a consequence of Theorem 2.5 and Corollary 2.2,
$$\alpha_{(\mathbb{R}^k,\gamma)}(r) \le e^{-r^2/2}, \quad r > 0. \qquad (2.10)$$
Alternatively, one may perform the Poincare limit on (2.8). In particular, by Proposition 1.3, for every 1-Lipschitz function $F$ on $\mathbb{R}^k$ and every $r \ge 0$,
$$\gamma\big(\{F \ge m_F + r\}\big) \le e^{-r^2/2} \qquad (2.11)$$
where $m_F$ is a median of $F$ for $\gamma$. In terms of observable diameters,
$$\mathrm{ObsDiam}_\gamma(\mathbb{R}^k) = O(1) \qquad (2.12)$$
independently of $k \ge 1$. One important feature of both (2.10) and Theorem 2.5 is indeed the dimension-free character of the isoperimetric and concentration functions of Gaussian measures. This observation opens the extension to infinite-dimensional Gaussian analysis. The preceding results for canonical Gaussian measures immediately extend to arbitrary Gaussian measures, replacing the standard Euclidean structure by the covariance structure.

As for Theorem 2.4, Theorem 2.5 may be turned into a comparison theorem. Due to the dimension-free character of Gaussian isoperimetry, this is better presented in terms of (strictly) log-concave probability measures (on a finite-dimensional state space). The following statement is moreover a concrete illustration of the geometric content of Ricci curvature in the preceding Riemannian language.

Theorem 2.6. Let $\mu$ be a probability measure on $\mathbb{R}^n$ with smooth strictly positive density $e^{-U}$ with respect to Lebesgue measure such that, for some $c > 0$, $\mathrm{Hess}\,U(x) \ge c\,\mathrm{Id}$ as symmetric matrices uniformly in $x \in \mathbb{R}^n$. Then,
$$I_\mu \ge \sqrt{c}\;\mathcal{I},$$
where $\mathbb{R}^n$ is equipped as usual with the standard Euclidean metric.

The convexity condition in Theorem 2.6 is stable under various operations. For example, image measures of log-concave measures on convex domains in $\mathbb{R}^n$ under linear projections $\mathbb{R}^n \to \mathbb{R}^m$ are log-concave with the same log-concavity bound. See further [Grom2] for log-concave measures in codimension.

Although Theorem 2.3 may be properly modified so as to yield similarly the case of dimension one, that is of the circle, it is simpler to deduce this case by contraction of the Gaussian measure following Proposition 1.2. We may even state the result on the product space. Let $\lambda^n$ be uniform measure on $[0,1]^n$ (with the induced Euclidean metric), that is, the product measure of Lebesgue measure on $[0,1]$ in each direction. Then $\lambda^n$ is the image of $\gamma = \gamma^n$ by the map $\varphi = \Phi^{\otimes n}$ where we recall that $\Phi$ is the distribution function of the standard normal distribution in dimension one. Since $\|\varphi\|_{\mathrm{Lip}} = (2\pi)^{-1/2}$, we may state the following consequence. The constant is not sharp.

Proposition 2.7. Let $\lambda = \lambda^n$ be uniform measure on $[0,1]^n$. Then
$$\alpha_{([0,1]^n,\lambda)}(r) \le e^{-\pi r^2}, \quad r > 0.$$
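The dimension-free bound (2.11) can be illustrated numerically: for a 1-Lipschitz function of a canonical Gaussian vector, the deviation probability above the median stays below $e^{-r^2/2}$ whatever the dimension. A minimal Monte Carlo sketch, with the illustrative choice $F(x) = |x|$ and arbitrary sample sizes and dimensions:

```python
import math, random

random.seed(0)

def tail_prob(k, r, n_samples=4000):
    """Empirical gamma({F >= m_F + r}) for the 1-Lipschitz function
    F(x) = |x| under the canonical Gaussian measure on R^k."""
    vals = sorted(
        math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k)))
        for _ in range(n_samples)
    )
    median = vals[n_samples // 2]
    return sum(1 for v in vals if v >= median + r) / n_samples

r = 1.0
bound = math.exp(-r * r / 2)   # the dimension-free bound of (2.11)
p_small = tail_prob(5, r)
p_large = tail_prob(400, r)
print(p_small, p_large, bound)
```

The two empirical tails are of the same (small) order, well below the bound, reflecting the fact that (2.11) does not degrade with $k$.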
In particular, Theorem 2.6 yields
$$\alpha_{(\mathbb{R}^n,\mu)}(r) \le e^{-cr^2/2}, \quad r > 0.$$

In the same spirit, one may transfer concentration for Gaussian measures to spheres. This is the content of the following proposition, which we state quite informally.

Proposition 2.8. Concentration for Gaussian measures implies concentration on spheres.

Proof. The image measure of the canonical Gaussian measure $\gamma^{n+1}$ on $\mathbb{R}^{n+1}$ under the map $x \mapsto \frac{x}{|x|}$ is $\sigma^n$. Given then a 1-Lipschitz function $F : S^n \to \mathbb{R}$, and $x_0$ a fixed point on $S^n$, define $G : \mathbb{R}^{n+1} \to \mathbb{R}$ by $G(x) = |x|\,\big(F\big(\frac{x}{|x|}\big) - F(x_0)\big)$ ($G(0) = 0$). Then $G = F - F(x_0)$ on $S^n$ and $\|G\|_{\mathrm{Lip}} \le 2$. Let $M_n$ be a median of $x \mapsto |x|$ with respect to $\gamma$ on $\mathbb{R}^{n+1}$. For every $r > 0$,
$$\gamma\otimes\gamma\Big(\Big\{(x,y)\in\mathbb{R}^{n+1}\times\mathbb{R}^{n+1}\ ;\ \Big|G\Big(\frac{x}{|x|}\Big) - G\Big(\frac{y}{|y|}\Big)\Big| \ge 3r\Big\}\Big)$$
$$\le\ \gamma\otimes\gamma\Big(\Big\{(x,y)\ ;\ \Big|G\Big(\frac{x}{M_n}\Big) - G\Big(\frac{y}{M_n}\Big)\Big| \ge r\Big\}\Big) + 2\,\gamma\Big(\Big\{x\ ;\ \Big|G\Big(\frac{x}{|x|}\Big) - G\Big(\frac{x}{M_n}\Big)\Big| \ge r\Big\}\Big).$$
Now,
$$\Big|G\Big(\frac{x}{|x|}\Big) - G\Big(\frac{x}{M_n}\Big)\Big| \le 2\,\Big|\frac{|x|}{M_n} - 1\Big|,$$
and $G\big(\frac{\cdot}{M_n}\big)$ is $\frac{2}{M_n}$-Lipschitz on $\mathbb{R}^{n+1}$. Apply now Corollary 1.5 together with (2.10) to $G\big(\frac{\cdot}{M_n}\big)$, and Proposition 1.3 together with (2.10) to $x \mapsto \frac{|x|}{M_n}$. We get
$$\gamma\otimes\gamma\Big(\Big\{(x,y)\ ;\ \Big|G\Big(\frac{x}{|x|}\Big) - G\Big(\frac{y}{|y|}\Big)\Big| \ge 3r\Big\}\Big) \le 2\,e^{-M_n^2 r^2/32} + 4\,e^{-M_n^2 r^2/8}.$$
A simple computation shows that $M_n \ge \sqrt{n}$. The claim easily follows by a further use of Corollary 1.5.
We close this section with some discrete examples. We start with a further isoperimetric inequality, on the discrete cube, and with the corresponding concentration phenomenon. Equip namely the $n$-dimensional discrete cube $X = \{0,1\}^n$ with the Hamming metric
$$d(x,y) = \mathrm{Card}\big(\{x_i \ne y_i,\ i = 1,\dots,n\}\big), \quad x = (x_1,\dots,x_n),\ y = (y_1,\dots,y_n) \in \{0,1\}^n,$$
and with the uniform (product) measure $\mu(A) = 2^{-n}\,\mathrm{Card}(A)$, $A \subset X$. Identifying the extremal sets $A$ in $X$ for which the infimum $\inf\big\{\mu(A_r)\ ;\ \mu(A) \ge \frac12\big\}$ is attained, [Ha] shows the following concentration result. With a weaker numerical constant, this was already observed in (1.24).

Theorem 2.9. For uniform measure $\mu = \mu^n$ on $\{0,1\}^n$,
$$\alpha_{(\{0,1\}^n,\mu)}(r) \le e^{-2r^2/n}, \quad r > 0.$$
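Theorem 2.9 can be checked exactly for the 1-Lipschitz function $F(x) = x_1 + \dots + x_n$, whose upper tail under the uniform measure is a binomial tail. A short computation (the choice $n = 30$ is arbitrary):

```python
from math import comb, exp

# F(x) = x_1 + ... + x_n is 1-Lipschitz for the Hamming metric on {0,1}^n,
# with median n/2 under the uniform measure; its upper tail is a binomial tail.
n = 30

def tail(n, r):
    """mu({F >= n/2 + r}) computed exactly."""
    return sum(comb(n, k) for k in range(n + 1) if k >= n / 2 + r) / 2 ** n

for r in (2, 4, 6):
    print(r, tail(n, r), exp(-2 * r * r / n))  # tail <= e^{-2 r^2 / n}
```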

Expander graphs may also satisfy some measure concentration. Let $X = (V, E)$ be a finite graph with set of vertices $V$ and set of edges $E$. Assume that each vertex has at most a fixed number $k_0$ of adjacent edges. A subset $A$ of $V$ has boundary $\partial A$ consisting of all vertices in the complement of $A$ that are adjacent to $A$. Assume now that $X$ satisfies the isoperimetric inequality
$$\mathrm{Card}(\partial A) \ge C^{-1}\,\mathrm{Card}(A) \qquad (2.13)$$
for all subsets $A$ of $V$ such that $\mathrm{Card}(A) \le \frac12\,\mathrm{Card}(V)$. Then (2.13) amounts to the expansion property
$$\mathrm{Exp}_\mu(1) \ge 1 + \frac{1}{C} > 1.$$
Here $\mu$ is normalized counting measure on $V$ and we give the path metric to $X$ having all edges of unit length. By Proposition 1.13, each such expander graph thus has exponential concentration.

The isoperimetric inequality (2.13) is actually related to the so-called Cheeger isoperimetric constant of useful interest in Riemannian geometry (see Section 3.1). Assume for simplicity that $\mu$ is a probability measure on the Borel sets of a metric space $(X,d)$ such that, for some constant $c > 0$ and all Borel sets $A$ in $X$,
$$\mu^+(A) \ge c\,\min\big(\mu(A),\, 1 - \mu(A)\big). \qquad (2.14)$$
This isoperimetric inequality is of weaker type than the spherical and Gaussian type isoperimetric inequalities. As an easy consequence of Proposition 2.1 with $v(r) = \frac{c}{2}\int_{-\infty}^{r} e^{-c|x|}\,dx$, $r \in \mathbb{R}$, $\mu$ has an exponential concentration function.

Proposition 2.10. Let $\mu$ be a probability measure on $(X,d)$ satisfying (2.14). Then
$$\alpha_{(X,d,\mu)}(r) \le e^{-cr}, \quad r > 0.$$
A functional description of this result and its relation with spectrum will be provided in Section 3.1.
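A concrete instance of (2.14) and Proposition 2.10 is the two-sided exponential measure on $\mathbb{R}$ (density $\frac12 e^{-|x|}$), for which the Cheeger-type inequality holds with $c = 1$, with equality on half-lines. A small numerical check, restricted for simplicity to half-line sets (a sketch, not a proof):

```python
import math

def nu_cdf(t):
    """Distribution function of the measure with density e^{-|x|}/2."""
    return 0.5 * math.exp(t) if t <= 0 else 1.0 - 0.5 * math.exp(-t)

def nu_surface(t):
    """Surface measure nu^+ of the half-line A = (-inf, t]: the density at t."""
    return 0.5 * math.exp(-abs(t))

# (2.14) with c = 1, with equality, on half-lines:
gap = max(abs(nu_surface(t) - min(nu_cdf(t), 1.0 - nu_cdf(t)))
          for t in (-2.0, -0.5, 0.0, 1.0, 3.0))
# Proposition 2.10 on half-lines with nu(A) >= 1/2: 1 - nu(A_r) <= e^{-r}.
worst = max((1.0 - nu_cdf(t + r)) - math.exp(-r)
            for t in (0.0, 0.7) for r in (0.5, 1.0, 2.0))
print(gap, worst)
```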

2.2 Brunn-Minkowski inequalities


Geometric and functional Brunn-Minkowski inequalities may be used to provide simple but powerful concentration results. The classical Brunn-Minkowski inequality indicates that for all bounded measurable sets $A, B$ in $\mathbb{R}^n$,
$$\mathrm{vol}_n(A+B)^{1/n} \ge \mathrm{vol}_n(A)^{1/n} + \mathrm{vol}_n(B)^{1/n}, \qquad (2.15)$$
where $A+B = \{x+y\ ;\ x\in A,\ y\in B\}$ is the Minkowski sum of $A$ and $B$ and where $\mathrm{vol}_n(\cdot)$ is the volume element in $\mathbb{R}^n$ (Lebesgue measure). In its equivalent multiplicative form, for every $\theta \in [0,1]$,
$$\mathrm{vol}_n\big(\theta A + (1-\theta)B\big) \ge \mathrm{vol}_n(A)^{\theta}\,\mathrm{vol}_n(B)^{1-\theta}. \qquad (2.16)$$
Indeed, under (2.15),
$$\mathrm{vol}_n\big(\theta A + (1-\theta)B\big)^{1/n} \ge \theta\,\mathrm{vol}_n(A)^{1/n} + (1-\theta)\,\mathrm{vol}_n(B)^{1/n} \ge \mathrm{vol}_n(A)^{\theta/n}\,\mathrm{vol}_n(B)^{(1-\theta)/n}$$
by the arithmetic-geometric mean inequality. Conversely, if $A' = \mathrm{vol}_n(A)^{-1/n}A$ and $B' = \mathrm{vol}_n(B)^{-1/n}B$, then (2.16) implies that $\mathrm{vol}_n(\theta A' + (1-\theta)B') \ge 1$ for every $\theta \in [0,1]$. Since, for
$$\theta = \frac{\mathrm{vol}_n(A)^{1/n}}{\mathrm{vol}_n(A)^{1/n} + \mathrm{vol}_n(B)^{1/n}}\,,$$
$$\theta A' + (1-\theta)B' = \frac{A+B}{\mathrm{vol}_n(A)^{1/n} + \mathrm{vol}_n(B)^{1/n}}\,,$$
(2.15) immediately follows by homogeneity.

The Brunn-Minkowski inequality may be used to produce a simple proof of Euclidean isoperimetry by taking $B$ to be the ball with center the origin and radius $r > 0$: then (2.15) shows that
$$\mathrm{vol}_n(A_r)^{1/n} = \mathrm{vol}_n(A+B)^{1/n} \ge \mathrm{vol}_n(A)^{1/n} + v(r)^{1/n}$$
where we recall that $v(r)$ is the volume of the Euclidean ball of radius $r > 0$. Since $v^{1/n}$ is linear,
$$\mathrm{vol}_n(A)^{1/n} + v(r)^{1/n} = v\big(v^{-1}\big(\mathrm{vol}_n(A)\big) + r\big)^{1/n},$$
so that
$$v^{-1}\big(\mathrm{vol}_n(A_r)\big) \ge v^{-1}\big(\mathrm{vol}_n(A)\big) + r,$$
which amounts to isoperimetry by Proposition 2.1.

The multiplicative form of the Brunn-Minkowski inequality admits a functional version.

Theorem 2.11. Let $\theta \in [0,1]$ and let $u$, $v$, $w$ be non-negative measurable functions on $\mathbb{R}^n$ such that for all $x, y \in \mathbb{R}^n$,
$$w\big(\theta x + (1-\theta)y\big) \ge u(x)^{\theta}\,v(y)^{1-\theta}. \qquad (2.17)$$
Then
$$\int w\,dx \ge \Big(\int u\,dx\Big)^{\theta}\Big(\int v\,dx\Big)^{1-\theta}. \qquad (2.18)$$
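Theorem 2.11 with $\theta = \frac12$ can be tested numerically on a simple triple of functions; the specific $u$, $v$, $w$ below are illustrative choices satisfying hypothesis (2.17) by convexity of $t \mapsto t^2$:

```python
import math

# u(x) = e^{-x^2}, v(y) = e^{-(y-1)^2}, w(z) = e^{-(z-1/2)^2}: hypothesis
# (2.17) with theta = 1/2 holds since ((x+y)/2 - 1/2)^2 <= (x^2 + (y-1)^2)/2
# by convexity of t -> t^2.
u = lambda x: math.exp(-x * x)
v = lambda y: math.exp(-(y - 1.0) ** 2)
w = lambda z: math.exp(-(z - 0.5) ** 2)

def integral(f, lo=-10.0, hi=10.0, n=4000):
    """Trapezoidal rule; the integrands decay fast enough for this range."""
    h = (hi - lo) / n
    return h * (sum(f(lo + i * h) for i in range(1, n)) + (f(lo) + f(hi)) / 2)

Iu, Iv, Iw = integral(u), integral(v), integral(w)
print(Iw, math.sqrt(Iu * Iv))   # (2.18): here both sides equal sqrt(pi)
```

In this Gaussian example (2.18) holds with equality, all three integrals being $\sqrt{\pi}$.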

Applied to the characteristic functions of bounded measurable sets $A$ and $B$ in $\mathbb{R}^n$, it yields the multiplicative form (2.16) of the geometric Brunn-Minkowski inequality. For the sake of completeness, we present a proof of the functional Brunn-Minkowski theorem.

Proof. We start with $n = 1$ and then perform induction on the dimension. By homogeneity, we may assume that $\int u\,dx = \int v\,dx = 1$ and by approximation that $u$ and $v$ are continuous with strictly positive values. Define then $x, y : [0,1] \to \mathbb{R}$ by
$$\int_{-\infty}^{x(t)} u(q)\,dq = t, \qquad \int_{-\infty}^{y(t)} v(q)\,dq = t.$$
Therefore $x$ and $y$ are increasing and differentiable, and
$$x'(t)\,u\big(x(t)\big) = y'(t)\,v\big(y(t)\big) = 1.$$
Set then $z(t) = \theta x(t) + (1-\theta)y(t)$, $t \in [0,1]$. By the arithmetic-geometric mean inequality, for every $t$,
$$z'(t) = \theta x'(t) + (1-\theta)y'(t) \ge x'(t)^{\theta}\,y'(t)^{1-\theta}. \qquad (2.19)$$
Now, since $z$ is injective, by the hypothesis (2.17) on $w$ and by (2.19),
$$\int w\,dx \ge \int_0^1 w\big(z(t)\big)\,z'(t)\,dt \ge \int_0^1 u\big(x(t)\big)^{\theta}\,v\big(y(t)\big)^{1-\theta}\,x'(t)^{\theta}\,y'(t)^{1-\theta}\,dt = \int_0^1 \big(u(x(t))\,x'(t)\big)^{\theta}\big(v(y(t))\,y'(t)\big)^{1-\theta}\,dt = 1.$$
This proves the case $n = 1$. It is then easy to deduce the case of $\mathbb{R}^n$ by induction on $n$. Suppose $n > 1$ and assume the Brunn-Minkowski theorem holds in $\mathbb{R}^{n-1}$. Let $u, v, w$ be non-negative measurable functions on $\mathbb{R}^n$ satisfying (2.17) for some $\theta \in [0,1]$. Let $q \in \mathbb{R}$ be fixed and define $u_q : \mathbb{R}^{n-1} \to [0,\infty)$ by $u_q(x) = u(x,q)$, and similarly for $v_q$ and $w_q$. Clearly, if $q = \theta q_0 + (1-\theta)q_1$, $q_0, q_1 \in \mathbb{R}$,
$$w_q\big(\theta x + (1-\theta)y\big) \ge u_{q_0}(x)^{\theta}\,v_{q_1}(y)^{1-\theta}$$
for all $x, y \in \mathbb{R}^{n-1}$. Therefore, by the induction hypothesis,
$$\int_{\mathbb{R}^{n-1}} w_q\,dx \ge \Big(\int_{\mathbb{R}^{n-1}} u_{q_0}\,dx\Big)^{\theta}\Big(\int_{\mathbb{R}^{n-1}} v_{q_1}\,dx\Big)^{1-\theta}.$$
Finally, applying the one-dimensional case once more shows that
$$\int w\,dx = \int\Big(\int_{\mathbb{R}^{n-1}} w_q\,dx\Big)dq \ge \Big(\int u\,dx\Big)^{\theta}\Big(\int v\,dx\Big)^{1-\theta},$$
which is the desired result. Theorem 2.11 is established.

Brunn-Minkowski inequalities may be used to produce concentration-type inequalities, and to recover in particular the application to concentration of Theorem 2.6. We however start with a milder result for log-concave measures. On $\mathbb{R}^n$ (for simplicity), say that a Borel probability measure $\mu$ is log-concave if for every Borel sets $A$ and $B$ in $\mathbb{R}^n$, and every $\theta \in [0,1]$,
$$\mu\big(\theta A + (1-\theta)B\big) \ge \mu(A)^{\theta}\,\mu(B)^{1-\theta}. \qquad (2.20)$$
If $\mu$ has strictly positive density $e^{-U}$, (2.20) amounts to $U$ convex. If $K$ is any convex body of finite volume in $\mathbb{R}^n$, the probability measure
$$\mu_K(A) = \frac{\mathrm{vol}_n(A \cap K)}{\mathrm{vol}_n(K)}$$
is log-concave by the Brunn-Minkowski inequality. The following is a concentration result for homothetics.

Proposition 2.12. Let $\mu$ be log-concave. Then, for all convex symmetric sets $A$ in $\mathbb{R}^n$ with measure $\mu(A) = a > 0$,
$$1 - \mu(rA) \le a\Big(\frac{1-a}{a}\Big)^{(r+1)/2}$$
for every $r > 1$. In particular, if $a > \frac12$,
$$\limsup_{r\to\infty}\ \frac{1}{r}\,\log\big(1 - \mu(rA)\big) < 0.$$

Proof. Since $A$ is convex and symmetric, one may check that
$$\mathbb{R}^n \setminus A \supset \frac{2}{r+1}\,\big(\mathbb{R}^n \setminus rA\big) + \frac{r-1}{r+1}\,A.$$
Then, by (2.20),
$$1 - \mu(A) \ge \big(1 - \mu(rA)\big)^{2/(r+1)}\,\mu(A)^{(r-1)/(r+1)},$$
from which the result follows.
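Proposition 2.12 can be checked in closed form for a one-dimensional log-concave measure: for the two-sided exponential measure (density $\frac12 e^{-|x|}$) and the symmetric convex sets $A = [-t,t]$, both sides of the inequality are explicit. A quick verification, with arbitrary values of $t$ and $r$:

```python
import math

# For nu with density e^{-|x|}/2 (log-concave) and A = [-t, t]:
# nu(A) = 1 - e^{-t} and 1 - nu(rA) = e^{-rt}, so both sides of
# Proposition 2.12 are explicit.
def borell_gap(t, r):
    a = 1.0 - math.exp(-t)
    lhs = math.exp(-r * t)                              # 1 - nu(rA)
    rhs = a * ((1.0 - a) / a) ** ((r + 1.0) / 2.0)      # the bound
    return lhs, rhs

for t in (0.8, 1.0, 2.0):
    for r in (1.5, 2.0, 4.0, 8.0):
        lhs, rhs = borell_gap(t, r)
        print(t, r, lhs, rhs)
```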

If $F$ is linear on $\mathbb{R}^n$, and if $A = \{|F| \le a\}$ with $\mu(A) \ge \frac34$ for example, then we get from Proposition 2.12 that for every $r > 1$,
$$\mu\big(|F| \ge ar\big) \le e^{-r/2}.$$
Integrating in $r$ shows that for some numerical constant $C > 0$, $\|F\|_q \le Caq$ for every $q \ge 1$ where $\|\cdot\|_q$ is the $L^q$-norm with respect to $\mu$. Since we may choose $a = 4\|F\|_1$, it follows that, for some $C > 0$ and all $q \ge 1$,
$$\|F\|_q \le Cq\,\|F\|_1. \qquad (2.21)$$
This result applies in particular to normalized Lebesgue measure $\mu_K$ on a convex body $K$. This reverse Holder inequality is rather useful in the geometry of convex bodies.

We now consider a strict log-concavity condition. Assume indeed that $\mu$ is a probability measure with smooth strictly positive density $e^{-U}$ with respect to Lebesgue measure such that for some $c > 0$ and all $x, y \in \mathbb{R}^n$,
$$U(x) + U(y) - 2U\Big(\frac{x+y}{2}\Big) \ge \frac{c}{4}\,|x-y|^2. \qquad (2.22)$$
The typical example is of course the canonical Gaussian measure on $\mathbb{R}^n$ for which $c = 1$. Given a bounded measurable function $f$ on $\mathbb{R}^n$, apply then the functional Brunn-Minkowski Theorem 2.11 to
$$u(x) = e^{Q_{c/2}f(x) - U(x)}, \quad v(y) = e^{-f(y) - U(y)}, \quad w(z) = e^{-U(z)},$$
where $Q_{c/2}f$ is the infimum-convolution
$$Q_{c/2}f(x) = \inf_{y\in\mathbb{R}^n}\Big[f(y) + \frac{c}{4}\,|x-y|^2\Big], \quad x \in \mathbb{R}^n.$$
By definition of $Q_{c/2}f$ and the convexity hypothesis (2.22) on $U$, condition (2.17) is satisfied with $\theta = \frac12$, so that
$$\int e^{Q_{c/2}f}\,d\mu \int e^{-f}\,d\mu \le 1.$$
We then make use of Proposition 1.18 to obtain a first alternate argument for the measure concentration of the isoperimetric Theorems 2.5 and 2.6.

Theorem 2.13. Let $d\mu = e^{-U}dx$ where $U$ satisfies (2.22). Then,
$$\alpha_\mu(r) \le 2\,e^{-cr^2/4}, \quad r > 0.$$
In particular,
$$\alpha_\gamma(r) \le 2\,e^{-r^2/4}, \quad r > 0,$$
for the canonical Gaussian measure $\gamma$ on $\mathbb{R}^n$. Note that the constants provided by Theorem 2.13 are not quite optimal. Theorem 2.13 applies similarly to arbitrary norms $\|\cdot\|$ on $\mathbb{R}^n$ instead of the Euclidean norm in (2.22). Concentration then takes place with respect to the metric given by the norm.

In the last part of this section, we discuss how Brunn-Minkowski inequalities may be further used to determine concentration properties for the surface measure on the unit ball of uniformly convex spaces, extending the case of the standard sphere. A normed space $X = (\mathbb{R}^n, \|\cdot\|)$ is said to be uniformly convex if for each $\varepsilon > 0$ there is $\delta(\varepsilon) > 0$ such that whenever $x, y \in X$ with $\|x\| = \|y\| = 1$ and $\|x - y\| = \varepsilon$, then
$$\Big\|\frac{x+y}{2}\Big\| \le 1 - \delta(\varepsilon).$$
The modulus of convexity is thus defined as the function of $\varepsilon$, $0 < \varepsilon \le 2$,
$$\delta_X(\varepsilon) = \inf\Big\{ 1 - \Big\|\frac{x+y}{2}\Big\|\ ;\ \|x\| = \|y\| = 1,\ \|x - y\| \ge \varepsilon \Big\}.$$
$L^p$-spaces, $1 < p < \infty$, are uniformly convex with modulus of convexity of power type, $\delta(\varepsilon) \ge C\varepsilon^{\max(p,2)}$ for every $\varepsilon$. Further examples are discussed in [Li-T].

Let now $X$ be uniformly convex and let $B$ be the unit ball of $(X, \|\cdot\|)$. Denote by $\mu_B$ the normalized uniform volume element

on $B$. Given two non-empty sets $A, A' \subset B$ at distance $\varepsilon \in (0,1)$, we have by definition of the modulus of convexity that
$$\frac{1}{2}\,(A + A') \subset \big(1 - \delta_X(\varepsilon)\big)B.$$
Hence, by the Brunn-Minkowski inequality,
$$\mu_B(A)^{1/2}\,\mu_B(A')^{1/2} \le \big(1 - \delta_X(\varepsilon)\big)^n.$$
Taking for $A'$ the complement of $A_\varepsilon$, we get, when $\mu_B(A) \ge \frac12$,
$$\mu_B(A_\varepsilon) \ge 1 - 2\big(1 - \delta_X(\varepsilon)\big)^{2n} \ge 1 - 2\,e^{-2n\,\delta_X(\varepsilon)}. \qquad (2.23)$$
Following the argument developed for the standard sphere in Proposition 2.8, this measure concentration result for $\mu_B$ may be transferred to the normalized surface measure $\mu_{\partial B}$ of $B$ with respect to itself, defined by
$$\mu_{\partial B}(A) = \mu_B\Big(\bigcup_{0 \le t \le 1} tA\Big), \quad A \subset \partial B. \qquad (2.24)$$
Given namely a 1-Lipschitz function $F$ on $\partial B$, and $x_0 \in \partial B$ fixed, set $G(x) = \|x\|\,\big(F\big(\frac{x}{\|x\|}\big) - F(x_0)\big)$, $x \in B$ ($G(0) = 0$). Then $\|G\|_{\mathrm{Lip}} \le 4$ and $G = F - F(x_0)$ on $\partial B$. Moreover, the distribution of $x \mapsto G\big(\frac{x}{\|x\|}\big)$ with respect to $\mu_B$ is equal to the distribution of $F - F(x_0)$ under $\mu_{\partial B}$. On the basis of (2.24), the proof of Proposition 2.8 yields similarly the following concentration property for the surface measure $\mu_{\partial B}$ on the boundary of the unit ball of a uniformly convex space.

Theorem 2.14. Denote by $\mu_{\partial B}$ surface measure as defined in (2.24) on the boundary of the unit ball $B$ of a uniformly convex space $X = (\mathbb{R}^n, \|\cdot\|)$ with modulus of convexity $\delta_X$. Then,
$$\alpha_{(X,\|\cdot\|,\mu_{\partial B})}(r) \le 2\,e^{-cn\,\delta_X(r)}, \quad r > 0,$$
where $c > 0$ is numerical.

Now $\delta(\varepsilon) \ge C\varepsilon^{\max(p,2)}$ on $L^p$-spaces, $1 < p < \infty$. Therefore, if $B^n_p$ is the ball in $\mathbb{R}^n$ for the norm
$$|x|_p = \Big(\sum_{i=1}^n |x_i|^p\Big)^{1/p}, \quad x = (x_1,\dots,x_n) \in \mathbb{R}^n,$$
and if $\mu$ is normalized Lebesgue measure on $B^n_p$, or its boundary, then
$$\mathrm{ObsDiam}_\mu(B^n_p) = O\big(n^{-1/\max(p,2)}\big).$$
In particular, if $p \ge 2$,
$$\mathrm{ObsDiam}_\mu(B^n_p) = O\big(n^{-1/p}\big). \qquad (2.25)$$
This covers in particular the standard sphere $S^n$ itself. We will see at the end of Chapter 4 that (2.25) actually holds in this form for any $1 \le p < \infty$.

2.3 Semigroup tools


In this section, we present some rather elementary semigroup arguments to reach the full concentration properties on spheres and manifolds with strictly positive curvature, as well as for Gaussian measures and strictly log-concave distributions, derived in Section 2.1 from sharp isoperimetric inequalities. The functional approach we adopt, relying only on Bochner's formula below, allows a neat treatment of the geometric Ricci curvature lower bound.

We start with the case of the sphere, or more generally of a Riemannian manifold with a strictly positive lower bound on the Ricci curvature. We then describe how the formal argument extends. Let thus, as in Section 2.1, $(X,g)$ be a compact connected smooth Riemannian manifold of dimension $n$ ($\ge 2$) equipped with the normalized Riemannian volume element $d\mu = \frac{dv}{V}$ where $V$ is the total volume of $X$. Denote by $c = c(X)$ the infimum of the Ricci curvature tensor over all unit tangent vectors, and assume that $c > 0$. We use Laplace bounds to establish the following result. By Proposition 1.7, it shows that
$$\alpha_{(X,g,\mu)}(r) \le 2\,e^{-cr^2/8}, \quad r > 0.$$

Proposition 2.15. Let $(X,g)$ be a compact connected smooth Riemannian manifold of dimension $n$ ($\ge 2$) equipped with the normalized Riemannian volume element $d\mu = \frac{dv}{V}$ such that $c = c(X) > 0$. For any 1-Lipschitz function $F$ on $X$, and any $r \ge 0$,
$$\mu\Big(\Big\{F \ge \int F\,d\mu + r\Big\}\Big) \le e^{-cr^2/2}.$$

Proof. By Proposition 1.14, it is enough to show that
$$E_{(X,g,\mu)}(\lambda) \le e^{\lambda^2/2c} \qquad (2.26)$$
for every $\lambda \ge 0$, where we recall that $E_{(X,g,\mu)}$ is the Laplace functional of $\mu$ defined in Section 1.6.

Let $\nabla$ be the gradient on $(X,g)$ and $\Delta$ be the Laplace-Beltrami operator. As a consequence of Bochner's formula (see [G-H-L], [Cha2]), for every smooth function $f$ on $X$,
$$\frac{1}{2}\,\Delta\big(|\nabla f|^2\big) - \nabla f \cdot \nabla(\Delta f) = \|\mathrm{Hess}\,f\|_2^2 + \mathrm{Ric}(\nabla f, \nabla f) \qquad (2.27)$$
where $\|\mathrm{Hess}\,f\|_2$ is the Hilbert-Schmidt norm of the Hessian of $f$ and $\mathrm{Ric}$ is the Ricci tensor. By the hypothesis,
$$\frac{1}{2}\,\Delta\big(|\nabla f|^2\big) - \nabla f \cdot \nabla(\Delta f) \ge c\,|\nabla f|^2. \qquad (2.28)$$
This is the only place where Ricci curvature is really used for the concentration result of Proposition 2.15. We now take (2.28) into account by functional semigroup tools. Consider the heat semigroup $P_t = e^{t\Delta}$, $t \ge 0$, with generator the Laplace operator $\Delta$ on $X$ (cf. [Yo], [Davie2], [F-O-T], [Bak1], [Bak2]). Given a (regular) real-valued function $f$ on $X$,
$$u = u(x,t) = P_t f(x), \quad x \in X,\ t \ge 0,$$
is thus the solution of the initial value partial differential (heat) equation
$$\frac{\partial u}{\partial t} = \Delta u \qquad (2.29)$$

with $u(\cdot,0) = f$. Let $f$ be smooth on $X$. It is an immediate consequence of (2.28) that, for every $t \ge 0$,
$$|\nabla P_t f|^2 \le e^{-2ct}\,P_t\big(|\nabla f|^2\big). \qquad (2.30)$$
Indeed, let $t \ge 0$ be fixed and set $\psi(s) = e^{-2cs}\,P_s\big(|\nabla P_{t-s}f|^2\big)$, $0 \le s \le t$. By (2.29),
$$\psi'(s) = 2\,e^{-2cs}\Big[-c\,P_s\big(|\nabla P_{t-s}f|^2\big) + P_s\Big(\frac{1}{2}\,\Delta\big(|\nabla P_{t-s}f|^2\big) - \nabla P_{t-s}f \cdot \nabla(\Delta P_{t-s}f)\Big)\Big].$$
Hence, by (2.28) applied to $P_{t-s}f$ for every $s$, $\psi$ is non-decreasing and (2.30) follows.

Now, let $F$ be a mean zero 1-Lipschitz function on $(X,g)$ and let $\lambda \ge 0$. We may and do assume that $F$ is smooth enough for what follows. In particular $|\nabla F| \le 1$ almost everywhere. Set $\Lambda(t) = \int e^{\lambda P_t F}\,d\mu$, $t \ge 0$. Since $\Lambda(\infty) = 1$ and since, by (2.30),

$$|\nabla P_t F|^2 \le e^{-2ct},$$
we can write for every $t \ge 0$,
$$\Lambda(t) = 1 - \int_t^{\infty} \Lambda'(s)\,ds = 1 - \lambda\int_t^{\infty}\!\!\int \Delta P_s F\ e^{\lambda P_s F}\,d\mu\,ds = 1 + \lambda^2\int_t^{\infty}\!\!\int |\nabla P_s F|^2\,e^{\lambda P_s F}\,d\mu\,ds \le 1 + \lambda^2\int_t^{\infty} e^{-2cs}\,\Lambda(s)\,ds,$$
where we used integration by parts in the space variable. By Gronwall's lemma,
$$\Lambda(0) = \int e^{\lambda F}\,d\mu \le e^{\lambda^2/2c},$$

which is (2.26), at least for a smooth function $F$. The general case follows from a standard regularization procedure. The proof is complete.

The principle of proof of Proposition 2.15 extends to further examples of analytic interest that will cover the concentration results of Theorem 2.6. Consider namely, on $\mathbb{R}^n$, a Borel probability measure $\mu$ with density $e^{-U}$ with respect to Lebesgue measure where $U$ is a smooth function on $\mathbb{R}^n$. We assume on $U$ a convexity assumption that corresponds to the Ricci curvature lower bound in the Riemannian context. Namely, assume that for some $c > 0$, $\mathrm{Hess}\,U(x) \ge c\,\mathrm{Id}$ uniformly in $x \in \mathbb{R}^n$. The typical example is the Gaussian density $U(x) = |x|^2/2$ (up to the normalization factor) for which $c = 1$. Consider then the second order differential operator
$$L = \Delta - \nabla U \cdot \nabla$$
on $\mathbb{R}^n$ with invariant measure $\mu$. Integration by parts for $L$ reads as
$$\int f(-Lg)\,d\mu = \int \langle \nabla f, \nabla g\rangle\,d\mu$$
for smooth $f, g$. It is a mere exercise to check the analogues of (2.28) and (2.30), namely
$$\frac{1}{2}\,L\big(|\nabla f|^2\big) - \nabla f \cdot \nabla(Lf) = \|\mathrm{Hess}\,f\|_2^2 + \mathrm{Hess}\,U(\nabla f, \nabla f)$$
and, by the convexity assumption on $U$,
$$\frac{1}{2}\,L\big(|\nabla f|^2\big) - \nabla f \cdot \nabla(Lf) \ge c\,|\nabla f|^2. \qquad (2.31)$$
Following then the standard theory of self-adjoint operators by Hille and Yosida [Yo], we may consider, under mild growth conditions on $U$, the invariant and time reversible semigroup $(P_t)_{t\ge0}$ with infinitesimal generator $L$ (cf. also [Du-S], [Fr], [Yo], [Davie1], [Roy]... and below for a probabilistic outline). In particular, therefore, $\frac{\partial P_t f}{\partial t} = L P_t f$. Strict convexity of $U$ easily enters this framework. For example, in the case of the canonical Gaussian measure $\gamma$ on $\mathbb{R}^n$, $Lf(x) = \Delta f(x) - \langle x, \nabla f(x)\rangle$ for $f$ smooth enough, and the semigroup $(P_t)_{t\ge0}$ with generator $L$ admits the representation
$$P_t f(x) = \int f\big(e^{-t}x + (1 - e^{-2t})^{1/2}y\big)\,d\gamma^n(y), \quad x \in \mathbb{R}^n,\ t \ge 0.$$
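This Mehler-type representation of the Ornstein-Uhlenbeck semigroup can be probed by Monte Carlo against the closed forms $P_t(\mathrm{id})(x) = e^{-t}x$ and $P_t(x^2)(x) = e^{-2t}x^2 + 1 - e^{-2t}$, here in dimension one; the sample size and the point $(x,t)$ are arbitrary choices:

```python
import math, random

random.seed(1)

def P(t, f, x, n=20000):
    """Monte Carlo evaluation of the Ornstein-Uhlenbeck semigroup
    P_t f(x) = E f(e^{-t} x + sqrt(1 - e^{-2t}) Z),  Z ~ N(0,1)."""
    e = math.exp(-t)
    s = math.sqrt(1.0 - e * e)
    return sum(f(e * x + s * random.gauss(0.0, 1.0)) for _ in range(n)) / n

x, t = 2.0, 0.7
m1 = P(t, lambda y: y, x)
m2 = P(t, lambda y: y * y, x)
print(m1, math.exp(-t) * x)
print(m2, math.exp(-2 * t) * x * x + 1 - math.exp(-2 * t))
```

As $t \to \infty$ both moments tend to those of $\gamma$, reflecting the invariance of $\gamma$ for $L$.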

On the basis of (2.31), exactly the same proof as the one of Proposition 2.15 thus leads to the following result.

Proposition 2.16. Let $d\mu = e^{-U}dx$ where, for some $c > 0$, $\mathrm{Hess}\,U(x) \ge c\,\mathrm{Id}$ uniformly in $x \in \mathbb{R}^n$. Then, for every bounded 1-Lipschitz function $F : \mathbb{R}^n \to \mathbb{R}$ and every $r \ge 0$,
$$\mu\Big(\Big\{F \ge \int F\,d\mu + r\Big\}\Big) \le e^{-cr^2/2}.$$

Together with Proposition 1.7, we thus find again the concentration properties of $\mu$ derived from isoperimetry in Theorem 2.6 or from Brunn-Minkowski inequalities in Theorem 2.13. As a consequence, if $\gamma$ is the canonical Gaussian measure on $\mathbb{R}^n$ with density $(2\pi)^{-n/2}e^{-|x|^2/2}$ with respect to Lebesgue measure, for every 1-Lipschitz function $F$ on $\mathbb{R}^n$ and every $r \ge 0$,
$$\gamma\Big(\Big\{F \ge \int F\,d\gamma + r\Big\}\Big) \le e^{-r^2/2}. \qquad (2.32)$$

This inequality has to be compared with the corresponding one with the median in (2.11). It extends similarly the case of linear functionals. An alternate proof of (2.32) may be provided by a beautifully simple argument using Brownian integration. For every smooth enough function $f$ on $\mathbb{R}^n$, write namely
$$f\big(W(1)\big) - \mathbb{E}\,f\big(W(1)\big) = \int_0^1 \big\langle \nabla P_{1-t}f\big(W(t)\big),\, dW(t)\big\rangle \qquad (2.33)$$
where $(W(t))_{t\ge0}$ is Brownian motion on $\mathbb{R}^n$ starting at the origin, defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and where $(P_t)_{t\ge0}$ denotes its associated semigroup (the heat semigroup), with the probabilistic normalization (that is, with generator $\frac{1}{2}\Delta$). Note then that the above stochastic integral has the same distribution as $\beta(T)$ where $(\beta(t))_{t\ge0}$ is a one-dimensional Brownian motion and where
$$T = \int_0^1 \big|\nabla P_{1-t}f\big(W(t)\big)\big|^2\,dt.$$

Therefore, for every 1-Lipschitz function $F$, $T \le 1$ almost surely, so that, for all $r \ge 0$,
$$\mathbb{P}\big(F(W(1)) - \mathbb{E}\,F(W(1)) \ge r\big) \le \mathbb{P}\Big(\max_{0\le t\le 1}\beta(t) \ge r\Big) = \frac{2}{\sqrt{2\pi}}\int_r^{\infty} e^{-x^2/2}\,dx \le e^{-r^2/2}.$$
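The last step rests on the reflection principle, $\mathbb{P}(\max_{0\le t\le1}\beta(t) \ge r) = 2\,\mathbb{P}(\beta(1) \ge r) \le e^{-r^2/2}$. A crude random-walk discretization of Brownian motion (step count and sample size are arbitrary) reproduces it within Monte Carlo and discretization error:

```python
import math, random

random.seed(2)

# Reflection principle: P(max_{0<=t<=1} beta(t) >= r) = 2 P(beta(1) >= r)
#                     = erfc(r / sqrt(2)) <= e^{-r^2/2}.
r = 1.0
exact = math.erfc(r / math.sqrt(2.0))

n_steps, n_paths, hits = 500, 5000, 0
dt = 1.0 / n_steps
for _ in range(n_paths):
    b, running_max = 0.0, 0.0
    for _ in range(n_steps):
        b += random.gauss(0.0, math.sqrt(dt))
        running_max = max(running_max, b)
    if running_max >= r:
        hits += 1
emp = hits / n_paths
print(emp, exact, math.exp(-r * r / 2))
```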

Since $W(1)$ has distribution $\gamma$, this is thus simply (2.32).

As time changes of Brownian motion, continuous martingales satisfy similarly sharp deviation inequalities. Assume for simplicity that $M = (M_t)_{0\le t\le 1}$ is a real-valued continuous martingale on some filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ satisfying the usual conditions (cf. [R-Y]). Denote by $(\langle M,M\rangle_t)_{0\le t\le 1}$ its quadratic increasing process and assume that $\langle M,M\rangle_1 \le D^2$ almost surely. Then, for every $r \ge 0$,
$$\mathbb{P}\big(M_1 \ge \mathbb{E}(M_1) + r\big) \le \mathbb{P}\Big(\max_{0\le t\le \langle M,M\rangle_1}\beta(t) \ge r\Big) \le \mathbb{P}\Big(\max_{0\le t\le D^2}\beta(t) \ge r\Big),$$
so that
$$\mathbb{P}\big(M_1 \ge \mathbb{E}(M_1) + r\big) \le e^{-r^2/2D^2}. \qquad (2.34)$$
Alternatively, and assuming that $M_0 = 0$ for simplicity,


$$e^{\lambda M_t - \frac{\lambda^2}{2}\langle M,M\rangle_t}, \quad \lambda \in \mathbb{R},\ t \ge 0,$$
is a martingale with expectation 1. Hence, for every $r \ge 0$,
$$\mathbb{P}\big(\{M_1 \ge r\}\big) \le \mathbb{P}\Big(M_1 - \frac{\lambda}{2}\,\langle M,M\rangle_1 \ge r - \frac{\lambda}{2}\,D^2\Big) \le e^{-\lambda r + \lambda^2 D^2/2},$$
from which we recover (2.34) by optimizing in $\lambda$.

Related to the representation formula (2.33), a somewhat analogous method may be developed in the setting of general

Markov processes under the technique of forward and backward martingales. For simplicity, suppose that $(\xi_t)_{t\ge0}$ is a strong Markov process on some probability space with continuous paths taking its values in a locally compact space $E$ with countable base, that it is time homogeneous, and that for each $x \in E$, there is a probability measure $\mathbb{P}_x$ on $\Omega = C([0,\infty), E)$ corresponding to the law of $\xi$ conditioned to start with $\xi_0 = x$. More generally, let $\mathbb{P}_\mu$ be the law of $\xi$ where the law of $\xi_0$ is a Radon measure $\mu$. Such a process induces a Markov semigroup $(P_t)_{t\ge0}$ on bounded Borel functions defined by
$$P_t f(x) = \mathbb{E}\big(f(\xi_t)\,\big|\,\xi_0 = x\big) = \mathbb{E}_x f(\xi_t), \quad t \ge 0,\ x \in E.$$

Because of the Markov property, $P_s P_t = P_{s+t}$. Suppose for regularity purposes that $P_t : C_b(E) \to C_b(E)$. If $(P_t)_{t\ge0}$ has stationary (not necessarily finite) measure $\mu$ (i.e. $\mathbb{P}_\mu(\{\xi_t \in A\}) = \mu(A)$ for all $t$, or equivalently $\int P_t f\,d\mu = \int f\,d\mu$), then $P_t$ maps $L^p(\mu)$ into itself for every $1 \le p \le \infty$. We can thus express $P_t = e^{tL}$, where $L$ is a uniquely defined closed (unbounded) operator from $L^p \to L^p$ as well as from $C_b(E) \to C_b(E)$ known as the infinitesimal generator of $\xi$ (cf. [Yo], [S-V], [I-W], [F-O-T]). Let now $f \in D(L) \cap C_b(E)$ where $D(L)$ is the domain of $L$. Ito's formula of martingale characterization expresses that
$$f(\xi_t) - f(\xi_0) = M_t + \int_0^t Lf(\xi_s)\,ds \qquad (2.35)$$
where $M_t = M_t^f$ is a $\mathbb{P}_\mu$-continuous local martingale whenever $Lf \in L^1(\mu)$ and a $\mathbb{P}_x$-local martingale for any $f$ satisfying $Lf \in C_b(E)$. In concrete situations, the operator $L$ is frequently the closure of a second order elliptic operator. This is the situation we encountered above with the Laplace-Beltrami operator on a manifold $(X,g)$. In these geometric cases, one can usually identify the quadratic variation process $\langle M,M\rangle_t$ of $M$ as
$$\langle M,M\rangle_t = \int_0^t \big|\nabla f(\xi_s)\big|^2\,ds.$$

When the operator $L$ is self-adjoint, the adjoint semigroup is the same as the original one. This analytic remark has stochastic counterparts. Namely, the processes $\xi_{1-t}$ and $\xi_t$ are identical in law when started with initial law $\mu$. We say the process is reversible with respect to $\mu$. In particular, the time-reversed process $\hat M_t = M^f_{1-t}$ is also a martingale (in the filtration $\hat{\mathcal{F}}_t = \sigma(\xi_{1-s},\, s < t)$). So, by (2.35),
$$f(\xi_t) = f(\xi_0) + M_t + \int_0^t Lf(\xi_s)\,ds$$
and
$$f(\xi_t) = f(\xi_1) + \hat M_{1-t} + \int_0^{1-t} Lf(\xi_{1-s})\,ds.$$
Taking the two expressions together,
$$2f(\xi_t) = f(\xi_0) + f(\xi_1) + M_t + \hat M_{1-t} + \int_0^1 Lf(\xi_s)\,ds,$$
and thus
$$df(\xi_t) = \frac{1}{2}\big(dM_t + d\hat M_{1-t}\big).$$
Here the most useful form is
$$f(\xi_1) - f(\xi_0) = \frac{1}{2}\big(M_1 - \hat M_1\big)$$
where $M$ and $\hat M$ are both martingales with
$$d\langle M,M\rangle_t = d\langle \hat M,\hat M\rangle_{1-t} = \big|\nabla f(\xi_t)\big|^2\,dt.$$
It then follows from (2.34) that whenever $F$ is 1-Lipschitz, for every $r \ge 0$,
$$\mathbb{P}_\mu\big(\big|F(\xi_1) - F(\xi_0)\big| \ge r\big) \le 2\,e^{-r^2/8}.$$
To make use of this result, taking for example for $F$ the distance function to some fixed set $A$ shows that
$$\mathbb{P}_\mu\big(\xi_0 \in A,\ d(\xi_1, A) \ge r\big) \le 2\,e^{-r^2/8} \qquad (2.36)$$
for every $r \ge 0$. While such a bound does not require any geometric assumptions on curvature as in previous analogous results, its application is conditioned by the initial information $\xi_0 \in A$, which is handled by geometric arguments involving volumes of balls and curvature. Further applications in probability and potential theory are discussed in [L-Z], [Tak], [Ly].
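A bound of the shape (2.36) can be examined on a concrete reversible diffusion. The sketch below takes the stationary Ornstein-Uhlenbeck process $d\xi = -\xi\,dt + \sqrt{2}\,dW$ (an illustrative choice whose normalization differs slightly from the text) with $A = (-\infty, 0]$, for which $\xi_1 \mid \xi_0 = x \sim N(xe^{-1}, 1 - e^{-2})$, and checks the stated inequality by quadrature:

```python
import math

# Check of a (2.36)-type bound for the stationary reversible OU process
# d xi = -xi dt + sqrt(2) dW with stationary law N(0,1);
# A = (-inf, 0], so that d(xi_1, A) = max(xi_1, 0).
def Phi_bar(t):
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def prob(r, n=2000):
    """P(xi_0 in A, d(xi_1, A) >= r) by quadrature over xi_0 = x < 0."""
    sigma = math.sqrt(1.0 - math.exp(-2.0))
    lo, hi = -8.0, 0.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        phi = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
        total += weight * phi * Phi_bar((r - x * math.exp(-1.0)) / sigma)
    return h * total

for r in (0.5, 1.0, 2.0, 3.0):
    print(r, prob(r), 2.0 * math.exp(-r * r / 8.0))
```

For this process the computed probabilities lie well below $2e^{-r^2/8}$, as expected.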

Notes and Remarks


The isoperimetric inequality on the sphere goes back to P. Levy [Le] and E. Schmidt [Schmi]. A short self-contained proof is presented in the appendix of [F-L-M]. An alternate method relying on two-point symmetrization is performed in [B-T] and presented in [Beny], [Sche5]. Isoperimetric inequalities are discussed in [BZ], [Os]... Levy's original idea [Le] was extended by M. Gromov [Grom1] (see [M-S], [Grom2], [G-H-L]) as a comparison theorem for Riemannian manifolds with a strictly positive lower bound on the Ricci curvature (Theorem 2.4). Further geometric applications of Theorem 2.4 are discussed in [Grom2]. Concentration for Grassmann and Stiefel manifolds, the orthogonal group, and first spectrum applications go back to the pioneering works [Mi1], [Mi3] by V. Milman. The Gaussian isoperimetric inequality (Theorem 2.5) is due independently to C. Borell [Bor2] and V. N. Sudakov and B. Tsirel'son [S-T] as limiting spherical isoperimetry. An intrinsic proof using Gaussian symmetrization is due to A. Ehrhard [Eh] while S. Bobkov gave in [Bob4] a functional argument based on a two-point inequality and the central limit theorem. The paper [Bor2] develops the infinite-dimensional aspects of the Gaussian isoperimetric inequality (see [Le3] for applications). Theorem 2.6 is established in a more general context in [Ba-L] via the functional formulation of [Bob4] (see also [Bob3]) with the semigroup tools of Section 2.3. This result may also be proved more simply by the localization lemma of L. Lovasz and M. Simonovits [LS] as developed in [K-L-S] and explained in [Grom2] (using the separation ideas of [Gr-M2]). See also [Ale]. Proposition 2.7 was observed in [I-S-T] and [Pis1] and Proposition 2.8 in [M-S] (see also [Grom2]). Theorem 2.9 is a direct consequence of the solution of the isoperimetric problem on the discrete cube by L. H. Harper [Ha] (for a simple proof see [F-F] and for a far-reaching generalization [W-W]).

Historical aspects of the Brunn-Minkowski theorem are discussed in [DG]. To its functional version, the names of R. Henstock and A. MacBeath, A. Prekopa, L. Leindler, H. Brascamp and E. Lieb, C. Borell etc. should be associated. The pertinence of Brunn-Minkowski inequalities to measure concentration has a long run (see [M-S]). Proposition 2.12 is due to C. Borell [Bor1]. Improved (and optimal) estimates are obtained in [L-S] by means of the localization method for the uniform distribution on the Euclidean ball, and extended in [Gu] to all log-concave measures. Inequality (2.21) is fruitful in the analysis of high-dimensional convex sets and was extended to multilinear functions in [Bou] (see also [Bob6]). Recent developments on the level of concentration with respect to a given class of functions, with new geometric applications to concentration of random sets in $\mathbb{R}^n$ with respect to linear functions, are initiated in [Gi-M2] (see also [Mi7]). Cheeger type isoperimetric inequalities for log-concave measures are studied in [Bob5]. The proof of Theorem 2.13 is essentially due to B. Maurey [Mau2] (see also [Schmu1], [B-G], [Bo-L3]). Extensions of the functional Brunn-Minkowski theorem to Riemannian manifolds were described recently in [CE-MC-S] and may be used to recover the concentration results of Theorem 2.4. Theorem 2.14 is due to M. Gromov and V. Milman [Gr-M2], the simple proof taken from [A-B-V]. See also [Schmu1], [Bo-L3]. More on concentration in $L^p$-spaces may be found in [Sche3], [Sche5], [S-Z1], [S-S1] (see Chapter 4). The semigroup arguments of Section 2.3 were developed originally for logarithmic Sobolev inequalities by D. Bakry and M. Emery [B-E] (see [Bak1], [Bak2]) and their adaptation to concentration inequalities was noticed in [Le1]. Propositions 2.15 and 2.16 are taken from [Le1] (see also [Le5]). The discrete analogue for non-local generators on graphs is studied in [Schmu3] by an appropriate notion of curvature of a graph (with illustrations). Further semigroup arguments relying on logarithmic Sobolev inequalities and covariance identities are developed in the same spirit in Chapter 5.

The Brownian proof of (2.32) is due to I. Ibragimov, B. Tsirel'son and V. Sudakov [I-T-S], and was unfortunately ignored for a long time. The representation formula (2.33) is a particular form of the important Clark-Ocone formula in stochastic analysis [Mal], [Nu]. The so-called forward and backward martingale method was put forward in the paper [L-Z], and further developed by M. Takeda [Tak]. The exposition here is taken from the notes [Ly] by T. Lyons, to which we refer for applications to geometric potential theory.


3. CONCENTRATION AND GEOMETRY

In this chapter, we discuss the concept of Levy families and its geometric counterparts, motivated by the examples presented in the preceding chapter. Following the sphere example, Levy families describe asymptotic concentration as dimension goes to infinity. In the first section, we describe how spectral properties entail exponential concentration, both in continuous and discrete settings. Applications to spectral and diameter bounds are the topic of Section 3.2. We recover in particular, with the tool of concentration, Cheng's upper bound on the diameter of a Riemannian manifold with non-negative Ricci curvature. We then present and study Levy families. Topological applications to fixed point theorems emphasized by M. Gromov and V. Milman are a further motivation for this geometric investigation. The last application is the historical starting point of the development of the concentration of measure phenomenon. V. Milman [Mi3] namely used in the early seventies concentration on high-dimensional spheres to produce an inspiring proof of Dvoretzky's theorem about almost spherical sections of convex bodies. We present this proof below with the help of Gaussian rather than spherical concentration as put forward in [Pis2].

3.1 Spectrum and concentration


In this section, we realize how spectral properties may yield concentration properties. Assume we are given a smooth compact Riemannian manifold X with Riemannian metric g. Denote by V 69

the total volume of X and by d = dv V the normalized Riemannian volume element on (X g). In Chapter 2, we described normal concentration properties of under the rather stringent assumption that the Ricci curvatures of (X g) are strictly positive. In this section, we bound the concentration function (X g ) by the spectrum of the Laplace operator on (X g). Denote more precisely by 1 = 1 (X ) > 0 the rst non-trivial eigenvalue of . It is well-known (cf. Cha1], G-H-L]) that 1 is characterized by the variational inequality
1 Var

(f )

f (; f )d = jrf j2d
Z

(3:1)

for all smooth real-valued f 's on (X g) where Var (f ) =

f 2d

fd

is the variance of f with respect to and where jrf j is the Riemannian length of the gradient of f . Theorem 3.1. Let (X g) be a compact Riemannian manifold with normalized Riemannian measure . Then,

r>0 where 1 > 0 is the rst non-trivial eigenvalue of the Laplace operator on (X g). Proof. Let A and B be (open) subsets of X such that d(A B) = " > 0. Set (A) = a, (B) = b. Consider the function f given by 1 ; 1 1 + 1 min ;d(x A) " x 2 X f (x) = a " a b (so that f = 1=a on A and f = 1=b on B). By a simple regularization argument, we may apply (3.1) to this f . We need simply evaluate appropriately the various terms of this inequality. Clearly, f is Lipschitz and 1+1 jrf j 1 " a b 70
(X g )(r)

e;r

p =3 1

almost everywhere. Therefore
$$\int |\nabla f|^2\,d\mu \le \frac{1}{\varepsilon^2}\Big(\frac{1}{a} + \frac{1}{b}\Big)^2 (1 - a - b).$$
On the other hand, denoting by $M$ the mean of $f$,
$$\mathrm{Var}_\mu(f) = \int (f - M)^2\,d\mu \ge \int_A (f - M)^2\,d\mu + \int_B (f - M)^2\,d\mu \ge \frac{1}{a} + \frac{1}{b}.$$
It thus follows from (3.1) that
$$\lambda_1\Big(\frac{1}{a} + \frac{1}{b}\Big) \le \frac{1}{\varepsilon^2}\Big(\frac{1}{a} + \frac{1}{b}\Big)^2 (1 - a - b),$$
that is,
$$\lambda_1\varepsilon^2 \le \Big(\frac{1}{a} + \frac{1}{b}\Big)(1 - a - b) = \frac{(a+b)(1-a-b)}{ab} \le \frac{1-a-b}{ab},$$
so that
$$b \le \frac{1-a}{1 + \lambda_1\varepsilon^2 a}\,.$$
Choosing for $A$ the complement of $B_\varepsilon$ (so that $a = 1 - \mu(B_\varepsilon)$), and assuming that $\mu(B_\varepsilon) \ne 1$, shows that, when $\mu(B_\varepsilon) \le \frac12$,
$$\mu(B_\varepsilon) \ge \Big(1 + \frac{\lambda_1\varepsilon^2}{2}\Big)\mu(B).$$
In other words, the expansion coefficient of $\mu$ on $(X,g)$ satisfies
$$\mathrm{Exp}_\mu(\varepsilon) \ge 1 + \frac{\lambda_1\varepsilon^2}{2} > 1.$$
Set then $\varepsilon > 0$ so that $\lambda_1\varepsilon^2 = 2$ and the conclusion follows from Proposition 1.13. The numerical constant 3 is not sharp at all. The proof of Theorem 3.1 is complete.

On the standard $n$-sphere $S^n$, $\lambda_1 = n$ [Cha1] so that Theorem 3.1 yields a weaker concentration than Theorem 2.3. By Lichnerowicz's lower bound $\lambda_1 \ge \frac{n\,c(X)}{n-1}$ ([G-H-L], [Cha2]) on a compact Riemannian manifold $(X,g)$ with dimension $n$ and strictly positive lower bound $c(X)$ on its Ricci curvature, the same comment applies with respect to Theorem 2.4. However, Theorem 3.1 holds true on any compact manifold.

It is a simple yet non-trivial observation (see Corollary 5.6 below) that $\lambda_1(X \times Y) = \min\big(\lambda_1(X), \lambda_1(Y)\big)$ for Riemannian manifolds $X$ and $Y$. Theorem 3.1 therefore provides a useful tool for concentration in product spaces. In particular, if $X^n$ is the $n$-fold Riemannian product of a compact Riemannian manifold $(X,g)$ equipped with the product measure $\mu^n$, then
$$\mathrm{ObsDiam}_{\mu^n}(X^n) = O(1).$$
This is in contrast with the cartesian product, for which we observed in (1.25) that $\mathrm{ObsDiam}_{\mu^n}(X^n)$ is essentially of the order of $\sqrt{n}$. This observation is the first step towards dimension-free measure concentration for $\ell^2$-metrics that will be investigated deeply in Chapter 4 and in subsequent chapters.

The proof of Theorem 3.1 generalizes to probability measures $\mu$ on a metric space $(X,d)$ that satisfy a Poincare inequality
$$\mathrm{Var}_\mu(f) \le C\int |\nabla f|^2\,d\mu \qquad (3.2)$$
with respect to some (generalized) length of gradient $|\nabla f|$. One may for example consider, given a locally Lipschitz function $f$ on the space $(X,d)$, the length of the gradient of $f$ at the point $x \in X$ as
$$|\nabla f|(x) = \limsup_{y\to x}\ \frac{|f(x) - f(y)|}{d(x,y)}\,.$$
The next statement is an immediate adaptation of Theorem 3.1.

Corollary 3.2. Assume that $\mu$ is a probability measure on the Borel sets of a metric space $(X,d)$ such that for some $C > 0$ and all locally Lipschitz real-valued functions $f$ on $(X,d)$,
$$\mathrm{Var}_\mu(f) \le C\int |\nabla f|^2\,d\mu.$$
Then
p ; r= 3 C ( r ) e (X g )
Z

r > 0:

72

Theorem 3.1 has an analogue on graphs, to which we turn now. It is convenient to deal with finite state Markov chains. Let $X$ be a finite or countable set. Let $K(x,y) \ge 0$ satisfy

$$\sum_{y\in X} K(x,y) = 1$$

for every $x \in X$. Assume furthermore that there is a symmetric invariant probability measure $\mu$ on $X$, that is, $K(x,y)\mu(\{x\})$ is symmetric in $x$ and $y$ and $\sum_x K(x,y)\mu(\{x\}) = \mu(\{y\})$ for every $y \in X$. In other words, $(K,\mu)$ is a symmetric Markov chain. Define

$$\mathcal{E}(f,f) = \frac12 \sum_{x,y\in X} \big(f(x) - f(y)\big)^2 K(x,y)\,\mu(\{x\}).$$

We may speak of the spectral gap, or rather the Poincaré constant, of the chain as the largest $\lambda_1 > 0$ such that for all $f$'s,

$$\lambda_1\,\mathrm{Var}_\mu(f) \le \mathcal{E}(f,f). \eqno(3.3)$$

Set also

$$|||f|||_\infty^2 = \frac12 \sup_{x\in X} \sum_{y\in X} \big(f(x) - f(y)\big)^2 K(x,y).$$

The triple norm $|||\cdot|||_\infty$ may be thought of as a discrete version of the Lipschitz norm. Although it may not be well adapted to all discrete structures, it behaves similarly as far as spectrum and exponential concentration are concerned. Theorem 3.3 below is the analogue of Theorem 3.1 in this discrete setting. We however adopt a different strategy of proof, better adapted to this case. The argument may be applied similarly to yield alternate proofs of Theorem 3.1 and Corollary 3.2.

Theorem 3.3. Let $(K,\mu)$ be a symmetric Markov chain on $X$ as before with spectral gap $\lambda_1 > 0$. Then, whenever $|||F|||_\infty \le 1$, $F$ is integrable with respect to $\mu$ and for every $r \ge 0$,

$$\mu\Big(F \ge \int F\,d\mu + r\Big) \le 3\,\mathrm{e}^{-r\sqrt{\lambda_1}/2}.$$

Proof. Let $F$ be a bounded mean-zero function on $X$ with $|||F|||_\infty \le 1$. Set $\Lambda(\lambda) = \int \mathrm{e}^{\lambda F}\,d\mu$, $\lambda \ge 0$. We apply the Poincaré inequality (3.3) to $\mathrm{e}^{\lambda F/2}$. The main observation is that

$$\mathcal{E}\big(\mathrm{e}^{\lambda F/2}, \mathrm{e}^{\lambda F/2}\big) \le \frac{\lambda^2}{2}\,|||F|||_\infty^2 \int \mathrm{e}^{\lambda F}\,d\mu. \eqno(3.4)$$

Indeed, for $\lambda \ge 0$, by symmetry,

$$\mathcal{E}\big(\mathrm{e}^{\lambda F/2}, \mathrm{e}^{\lambda F/2}\big) = \frac12 \sum_{x,y\in X} \big(\mathrm{e}^{\lambda F(x)/2} - \mathrm{e}^{\lambda F(y)/2}\big)^2 K(x,y)\,\mu(\{x\}) = \sum_{F(x) < F(y)} \big(\mathrm{e}^{\lambda F(x)/2} - \mathrm{e}^{\lambda F(y)/2}\big)^2 K(x,y)\,\mu(\{x\}) \le \frac{\lambda^2}{4} \sum_{x,y\in X} \big(F(x) - F(y)\big)^2\,\mathrm{e}^{\lambda F(y)}\,K(x,y)\,\mu(\{x\}),$$

where we used that $\mathrm{e}^{t/2} - \mathrm{e}^{s/2} \le \frac{t-s}{2}\,\mathrm{e}^{t/2}$ for $s \le t$; (3.4) then follows from the symmetry of $K(x,y)\mu(\{x\})$ and the definition of $|||F|||_\infty$. Therefore,

$$\lambda_1\Big(\Lambda(\lambda) - \Lambda\Big(\frac{\lambda}{2}\Big)^2\Big) \le \frac{\lambda^2}{2}\,\Lambda(\lambda);$$

that is, for every $\lambda < \sqrt{2\lambda_1}$,

$$\Lambda(\lambda) \le \Big(1 - \frac{\lambda^2}{2\lambda_1}\Big)^{-1}\Lambda\Big(\frac{\lambda}{2}\Big)^2.$$

Applying the same inequality for $\lambda/2$ and iterating yields, after $n$ steps,

$$\Lambda(\lambda) \le \prod_{k=0}^{n-1}\Big(1 - \frac{\lambda^2}{2\lambda_1 4^k}\Big)^{-2^k}\Lambda\Big(\frac{\lambda}{2^n}\Big)^{2^n}.$$

Since $\Lambda(\lambda) = 1 + o(\lambda)$ as $\lambda \to 0$ ($F$ has mean zero), $\Lambda(\lambda/2^n)^{2^n} \to 1$ as $n \to \infty$. Therefore,

$$\Lambda(\lambda) \le \prod_{k=0}^{\infty}\Big(1 - \frac{\lambda^2}{2\lambda_1 4^k}\Big)^{-2^k}$$

where the product converges whenever $\lambda < \sqrt{2\lambda_1}$. The proof of the theorem is easily completed. Namely, setting for example $\lambda = \sqrt{\lambda_1}/2$ yields that

$$\int \mathrm{e}^{\sqrt{\lambda_1}\,F/2}\,d\mu \le 3.$$

By Chebyshev's inequality,

$$\mu(F \ge r) \le 3\,\mathrm{e}^{-r\sqrt{\lambda_1}/2}$$

for every $r \ge 0$. As in Proposition 1.7, the result is easily extended to arbitrary $F$ with $|||F|||_\infty \le 1$, which completes the proof.

We may define a distance on $X$ associated with $|||\cdot|||_\infty$ as

$$d(x,y) = \sup_{|||f|||_\infty \le 1}\big(f(x) - f(y)\big), \qquad x,y \in X.$$

Then, together with Proposition 1.7, Theorem 3.3 indicates that

$$\alpha_{(X,d,\mu)}(r) \le 3\,\mathrm{e}^{-r\sqrt{\lambda_1}/4}, \qquad r > 0. \eqno(3.5)$$

The distance most often used is however not $d$ but the combinatoric distance $d_c$ associated with the graph with vertex-set $X$ and edge-set $\{(x,y)\,:\,K(x,y) > 0\}$. This distance can be defined as the minimal number of edges one has to cross to go from $x$ to $y$. Equivalently,

$$d_c(x,y) = \sup_{\|\nabla f\|_\infty \le 1}\big(f(x) - f(y)\big)$$

where

$$\|\nabla f\|_\infty = \sup\big\{|f(x) - f(y)|\,;\ K(x,y) > 0\big\}.$$

Now, since $\sum_y K(x,y) = 1$,

$$|||f|||_\infty^2 \le \frac12\,\|\nabla f\|_\infty^2\,.$$

In particular, $d_c \le d/\sqrt{2}$, and thus (3.5) also holds for $d_c$.
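The numerical constant 3 in Theorem 3.3 can be checked directly: at $\lambda = \sqrt{\lambda_1}/2$ (so that $\lambda^2/2\lambda_1 = 1/8$), the infinite product bounding $\Lambda(\lambda)$ in its proof is a fixed constant well below 3. A minimal numerical sketch (Python, not part of the original text):

```python
import math

def product_bound(ratio, terms=60):
    """Evaluate prod_{k=0}^{terms-1} (1 - ratio/4**k)**(-2**k),
    the bound on Lambda(lam) when ratio = lam**2/(2*lam1) < 1."""
    log_total = 0.0
    for k in range(terms):
        x = ratio / 4 ** k
        # (1 - x)**(-2**k), accumulated in log-space to avoid overflow
        log_total += -(2 ** k) * math.log1p(-x)
    return math.exp(log_total)

# lam = sqrt(lam1)/2  =>  ratio = 1/8
bound = product_bound(1 / 8)
print(bound)  # about 1.30, so int e^{sqrt(lam1) F/2} dmu <= 3 indeed holds
```

The product converges because the exponent contributions decay like $\mathrm{ratio}/2^k$; the value 3 in the statement is thus generous.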

As an example, let $X = (V,E)$ be a finite connected oriented graph with set of vertices $V$ and symmetric set of edges $E$, and equip $V$ with the normalized uniform measure $\mu$. We may consider $K(x,y) = \frac{1}{k(x)}$ whenever $x$ and $y$ are adjacent in $V$, and $0$ otherwise, where $k(x)$ is the number of neighbors of $x$. Consider the quadratic form

$$(Qf,f) = \sum_{x\sim y}\big(f(x) - f(y)\big)^2$$

where the sum runs over all neighbors $x \sim y$ in $X$. Since $X$ is connected, $(Qf,f) \ge 0$ for all $f$ and vanishes only when $f$ is constant. One may then speak of the first non-trivial eigenvalue $\lambda_1 > 0$ of $Q$, that is, of the Laplace operator on $X$. As a consequence of Theorem 3.3, we have the following result.

Corollary 3.4. Let $k_0 = \max\{k(x)\,;\ x \in V\} < \infty$. Then

$$\alpha_{(X,d_c,\mu)}(r) \le 3\,\mathrm{e}^{-r\sqrt{\lambda_1/16k_0}}, \qquad r > 0.$$
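As a concrete illustration of the objects entering Corollary 3.4 (a hypothetical small example, not from the text), the following sketch computes the spectral gap of the simple random walk on the discrete cube $\{0,1\}^d$, a $d$-regular graph whose gap is known to equal $2/d$:

```python
import itertools
import numpy as np

d = 4
verts = list(itertools.product([0, 1], repeat=d))
index = {v: i for i, v in enumerate(verts)}
n = len(verts)

# Transition kernel of the simple random walk on the d-cube: flip one coordinate
K = np.zeros((n, n))
for v in verts:
    for j in range(d):
        w = list(v); w[j] ^= 1
        K[index[v], index[tuple(w)]] = 1.0 / d

# The uniform measure is invariant and K is symmetric, so the spectral gap
# lambda_1 is the smallest nonzero eigenvalue of I - K
eigs = np.sort(np.linalg.eigvalsh(np.eye(n) - K))
lam1 = eigs[1]
print(lam1)  # 2/d = 0.5 for d = 4
```

The eigenvalues of $K$ on the cube are $(d-2j)/d$, $j = 0,\ldots,d$, which the computation recovers numerically.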

The most important examples of application of the preceding corollary are the Cayley graphs. If $V$ is a finite group and $S \subset V$ a symmetric set of generators of $V$, we may join $x$ and $y$ in $V$ by an edge if $x = s^{-1}y$ for some $s \in S$. Here $k_0 = \mathrm{Card}(S)$ and the path distance on $X = (V,E)$ is the word distance in $V$ induced by $S$.

Spectral and Poincaré inequalities have an $L^1$ counterpart related to Cheeger's isoperimetric constant that describes the correspondence between the previous concentration results and the isoperimetric approach of Proposition 2.10. Namely, if for example $(X,g)$ is a compact Riemannian manifold with normalized Riemannian measure $\mu$, denote by $h > 0$ the best constant such that

$$h \int \Big|f - \int f\,d\mu\Big|\,d\mu \le \int |\nabla f|\,d\mu \eqno(3.6)$$

for all smooth functions $f$ on $(X,g)$. Cheeger's inequality indicates that $\lambda_1 \ge \frac{h^2}{4}$ [Chee]. Applying (3.6) to characteristic functions of sets actually amounts to

$$\mu^+(A) \ge 2h\,\mu(A)\big(1 - \mu(A)\big),$$

which is close to the isoperimetric inequality (2.14). By Cheeger's inequality, Theorem 3.1 thus covers the isoperimetric approach in this case, producing similar exponential concentration under weaker geometric invariants. The same comments are in order on graphs.
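A small numerical sanity check of Cheeger's inequality $\lambda_1 \ge h^2/4$ in the discrete setting (an illustrative example, not from the text; $h$ is computed here as the minimal boundary-flow-to-volume ratio of the chain, one common discrete analogue of (3.6)):

```python
import itertools
import numpy as np

n = 8  # cycle C_8, uniform measure, nearest-neighbour walk
K = np.zeros((n, n))
for x in range(n):
    K[x, (x + 1) % n] = K[x, (x - 1) % n] = 0.5
mu = np.full(n, 1.0 / n)

# spectral gap: smallest nonzero eigenvalue of I - K
lam1 = np.sort(np.linalg.eigvalsh(np.eye(n) - K))[1]

# discrete Cheeger constant: minimize Q(A, A^c)/mu(A) over mu(A) <= 1/2
h = min(
    sum(mu[x] * K[x, y] for x in A for y in range(n) if y not in A) / (len(A) / n)
    for r in range(1, n // 2 + 1)
    for A in itertools.combinations(range(n), r)
)
print(lam1, h)  # about 0.293 and exactly 0.25: lam1 >= h**2/4 holds comfortably
```

The optimal set is a half-arc of the cycle; sharper discrete constants exist, but the quadratic relation between $h$ and $\lambda_1$ is already visible.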

3.2 Spectral and diameter bounds


On the basis of the results of Section 3.1, we investigate here some relationships between spectral and diameter bounds. Assume we are given a smooth complete Riemannian manifold $(X,g)$, not necessarily compact but with finite volume $V$. We denote as usual by $d\mu = \frac{dv}{V}$ the normalized Riemannian volume element. Denote by $\lambda_1 = \lambda_1(X)$ the first non-trivial eigenvalue of the Laplace operator on $(X,g)$. If $B(x,r)$ is the (open) ball with center $x$ and radius $r > 0$ in $(X,g)$, it follows from Theorem 3.1 applied to the 1-Lipschitz function $d(x,\cdot)$ that $\lambda_1 = \lambda_1(X) = 0$ as soon as

$$\limsup_{r\to\infty}\frac1r\,\log\Big(1 - \mu\big(B(x,r)\big)\Big) = 0 \eqno(3.7)$$

for some (all) $x$ in $X$. The following is a kind of converse.

Theorem 3.5. Let $(X,g)$ be a smooth complete Riemannian manifold with dimension $n$ and finite volume. Let $\mu$ be the normalized Riemannian volume on $(X,g)$. Assume that the Ricci curvature of $(X,g)$ is bounded below. Then $(X,g)$ is compact as soon as

$$\liminf_{r\to\infty}\frac1r\,\log\Big(1 - \mu\big(B(x,r)\big)\Big) = -\infty$$

for some (or all) $x \in X$. In particular $\lambda_1 > 0$ under this condition. Furthermore, if $(X,g)$ has non-negative Ricci curvature,

$$\lambda_1 \le \frac{C_n}{D^2} \eqno(3.8)$$

where $D$ is the diameter of $(X,g)$ and $C_n > 0$ only depends on the dimension $n$ of $X$.

The upper bound (3.8) goes back to the work of S.-Y. Cheng [Chen] in Riemannian geometry (see also [Cha1]).

Proof. We proceed by contradiction and assume that $X$ is not compact. Choose a geodesic ball $B(x,r_0)$ in $(X,g)$ with center $x$ and radius $r_0 > 0$ such that $\mu(B(x,r_0)) \ge \frac12$. By non-compactness (and completeness), for every $r > 0$, we can take $z$ at distance $r_0 + 2r$ from $x$. In particular, $B(x,r_0) \subset B(z, 2(r_0+r))$. By the Riemannian volume comparison theorem [C-E], [Cha2], for every $y \in X$ and $0 < s < t$,

$$\frac{\mu\big(B(y,t)\big)}{\mu\big(B(y,s)\big)} \le \Big(\frac{t}{s}\Big)^n\,\mathrm{e}^{t\sqrt{(n-1)K}} \eqno(3.9)$$

where $-K$, $K \ge 0$, is the lower bound on the Ricci curvature of $(X,g)$. Therefore,

$$\mu\big(B(z,r)\big) \ge \mu\big(B(z,2(r_0+r))\big)\Big(\frac{r}{2(r_0+r)}\Big)^n\,\mathrm{e}^{-2(r_0+r)\sqrt{(n-1)K}} \ge \frac12\Big(\frac{r}{2(r_0+r)}\Big)^n\,\mathrm{e}^{-2(r_0+r)\sqrt{(n-1)K}}$$

where we used that $\mu(B(z,2(r_0+r))) \ge \mu(B(x,r_0)) \ge \frac12$. Since $B(z,r)$ is included in the complement of $B(x,r_0+r)$,

$$1 - \mu\big(B(x,r+r_0)\big) \ge \frac12\Big(\frac{r}{2(r_0+r)}\Big)^n\,\mathrm{e}^{-2(r_0+r)\sqrt{(n-1)K}} \eqno(3.10)$$

which is impossible as $r \to \infty$ under the assumption. The first part of the theorem is established.

Thus $(X,g)$ is compact. Denote by $D$ its diameter. Assume that $(X,g)$ has non-negative Ricci curvature, that is, we may take $K = 0$ in (3.9). By Theorem 3.1, for every subset $A$ of $X$ such that $\mu(A) \ge \frac12$, and every $r > 0$,

$$1 - \mu(A_r) \le \mathrm{e}^{-r\sqrt{\lambda_1}/3}. \eqno(3.11)$$

Let $B(x,\frac{D}{8})$ be the ball with center $x$ and radius $\frac{D}{8}$. We distinguish between two cases. If $\mu(B(x,\frac{D}{8})) \ge \frac12$, apply (3.11) to $A = B(x,\frac{D}{8})$. By definition of $D$, we may choose $r = r_0 = \frac{D}{8}$ in (3.10) to get

$$\frac{1}{2\cdot 4^n} \le 1 - \mu(A_{D/8}) \le \mathrm{e}^{-\sqrt{\lambda_1}\,D/24}.$$

If $\mu(B(x,\frac{D}{8})) < \frac12$, apply (3.11) to $A$ the complement of $B(x,\frac{D}{8})$. Since the ball $B(x,\frac{D}{16})$ is included in the complement of $A_{D/16}$, and since by (3.9) with $t = D$ and $K = 0$,

$$\mu\Big(B\Big(x,\frac{D}{16}\Big)\Big) \ge \frac{1}{16^n}\,,$$

we get from (3.11) with $r = \frac{D}{16}$ that

$$\frac{1}{16^n} \le \mathrm{e}^{-\sqrt{\lambda_1}\,D/48}.$$

The conclusion easily follows from either case. Theorem 3.5 is established.

In the last part of this section, we describe analogous conclusions in the discrete case. As in Section 3.1, let $K(x,y)$ be a Markov chain on a finite state space $X$ with symmetric invariant probability measure $\mu$. Denote by $\lambda_1 > 0$ the spectral gap of $(K,\mu)$ defined by $\lambda_1\,\mathrm{Var}_\mu(f) \le \mathcal{E}(f,f)$ for every $f$ on $X$. Recall that here

$$\mathcal{E}(f,f) = \frac12 \sum_{x,y\in X}\big(f(x) - f(y)\big)^2 K(x,y)\,\mu(\{x\}).$$

If $d$ is the distance defined from the norm

$$|||f|||_\infty^2 = \frac12 \sup_{x\in X}\sum_{y\in X}\big(f(x) - f(y)\big)^2 K(x,y),$$

recall from Theorem 3.3 that

$$\alpha_{(X,d,\mu)}(r) \le 3\,\mathrm{e}^{-r\sqrt{\lambda_1}/4}, \qquad r > 0. \eqno(3.12)$$

Denote by $D$ the diameter of $X$ for the distance $d$.

Proposition 3.6. If $\mu$ is nearly constant, that is, if there exists $C$ such that, for every $x$, $\mu(\{x\}) \le C\min_{y\in X}\mu(\{y\})$, then

$$\lambda_1 \le \frac{8\big(\log(12\,C\,\mathrm{Card}(X))\big)^2}{D^2}\,.$$

Proof. Consider two points $x,y \in X$ such that $d(x,y) = D$. By Corollary 1.4 and (3.12),

$$\min\big(\mu(\{x\}),\mu(\{y\})\big) \le 12\,\mathrm{e}^{-D\sqrt{\lambda_1}/2\sqrt{2}}\,.$$

Since, by the hypothesis on $\mu$, $\min_{z\in X}\mu(\{z\}) \ge (C\,\mathrm{Card}(X))^{-1}$, the conclusion follows.

Recall that the combinatoric diameter $D_c$ satisfies $D_c \le D/\sqrt{2}$.

3.3 Lévy families


In this section, we deal with families of metric spaces $(X^n,d^n)$ equipped with Borel probability measures $\mu^n$, $n \ge 1$. Denote by $D^n$ the diameter of $(X^n,d^n)$ and assume that $1 \le D^n < \infty$, $n \ge 1$. According to the definition of the concentration function, we say that the family $(X^n,d^n,\mu^n)_{n\ge1}$ is a Lévy family if for every $r > 0$,

$$\lim_{n\to\infty}\alpha_{(X^n,d^n,\mu^n)}(D^n r) = 0. \eqno(3.13)$$

More quantitatively, we say that $(X^n,d^n,\mu^n)_{n\ge1}$ is a normal Lévy family if there are constants $C,c > 0$ such that for every $n \ge 1$ (or only $n$ large enough) and $r > 0$,

$$\alpha_{(X^n,d^n,\mu^n)}(r) \le C\,\mathrm{e}^{-cnr^2}. \eqno(3.14)$$

Since we assume $D^n \ge 1$, any normal Lévy family is a Lévy family. The omission of $D^n$ in the definition of a normal Lévy family is

justified by the fact that many examples become in this way Lévy families with their natural metrics. A particular situation occurs when $\mu^n$, $n \ge 1$, is a family of probability measures on a given fixed metric space $(X,d)$. We then say similarly that $(X,d,\mu^n)_{n\ge1}$ is a Lévy family if for every $r > 0$,

$$\lim_{n\to\infty}\alpha_{(X,d,\mu^n)}(r) = 0. \eqno(3.15)$$

The study of Chapter 2 provides a number of geometric examples of (normal) Lévy families. Let us review a few of them. The family $(S^n_{R_n},\sigma^n_{R_n})$ of the Euclidean spheres with radii $R_n$ equipped with the normalized Haar measures is a Lévy family as soon as $n^{-1/2}R_n \to 0$ as $n \to \infty$. It is a normal Lévy family if $R_n = 1$. More generally, by Theorem 2.4, a family $(X^n,g^n)_{n\ge1}$ of compact Riemannian manifolds equipped with the normalized volume elements $\mu^n$ such that $c(X^n) \to \infty$ with $n$ is a Lévy family. Recall that we denote by $c(X)$ the infimum of the Ricci curvature tensor over all unit tangent vectors of a Riemannian manifold $(X,g)$. This includes $S^n$, $SO_n$, for which $c(SO_n) = \frac{n-1}{4}$, the Stiefel manifolds $W^n_k$, etc. (cf. [C-E], [Cha2]). If $(X^n,g^n)_{n\ge1}$ are compact Riemannian manifolds such that $\lambda_1(X^n) \to \infty$ as $n \to \infty$, where $\lambda_1(X^n)$ is the first non-trivial eigenvalue of the Laplace operator on $X^n$, then $(X^n,g^n)_{n\ge1}$ is a Lévy family by Theorem 3.1, however never a normal one. There are many known families of $k_0$-regular graphs $X$ ($k_0$ fixed) such that $\mathrm{Card}(X) \to \infty$ whereas $\lambda_1 > 0$ stays bounded away from zero (the so-called expander graphs), and moreover graphs with this property are "generic" amongst $k_0$-regular graphs [Alo]. These examples thus give rise to Lévy families. Further combinatorial examples, such as the uniform measure on the symmetric group $\Sigma_n$, will be described in the next chapter.

The following simple proposition is an alternate description of Lévy families. Recall that if $A$ is a subset of a metric space $(X,d)$ and if $r > 0$, we let $A_r = \{x \in X\,;\ d(x,A) < r\}$.

Proposition 3.7. Let $(X^n,d^n,\mu^n)_{n\ge1}$ be a Lévy family. Then, for any sequence $A^n \subset X^n$, $n \ge 1$, such that $\liminf_{n\to\infty}\mu^n(A^n) > 0$, we have for every $\varepsilon > 0$,

$$\lim_{n\to\infty}\mu^n\big((A^n)_{\varepsilon D^n}\big) = 1.$$

When $(X^n,d^n,\mu^n)_{n\ge1}$ is a normal Lévy family, the same result holds with $\varepsilon D^n$ replaced by $\varepsilon$.

Proof. Let $\liminf_{n\to\infty}\mu^n(A^n) > \delta > 0$. Since $(X^n,d^n,\mu^n)_{n\ge1}$ is a Lévy family, for every $r > 0$, $\alpha_{(X^n,d^n,\mu^n)}(D^n r) < \delta$ for all $n$'s large enough. Therefore, by Lemma 1.1, for every $n$ large enough and every $s > 0$,

$$1 - \mu^n\big((A^n)_{D^n(r+s)}\big) \le \alpha_{(X^n,d^n,\mu^n)}(D^n s).$$

Using again that $\alpha_{(X^n,d^n,\mu^n)}(D^n s) \to 0$ as $n \to \infty$, and since $r,s > 0$ are arbitrary, the conclusion follows.

The next statement describes general properties of Lévy families that immediately follow from the definition and the results of Chapter 1.

Proposition 3.8. Let $(X^n,d^n,\mu^n)_{n\ge1}$ be a Lévy family. For every $n \ge 1$, let $\varphi^n$ be a Lipschitz map between $(X^n,d^n)$ and $(Y^n,\delta^n)$ with Lipschitz coefficient $c_{\varphi^n}$. Denote by $\mu^n_{\varphi^n}$ the image measure of $\mu^n$ by $\varphi^n$, $n \ge 1$. If $\sup_{n\ge1}c_{\varphi^n} < \infty$, then $(Y^n,\delta^n,\mu^n_{\varphi^n})_{n\ge1}$ is again a Lévy family. If $(Y^n,\delta^n,\nu^n)_{n\ge1}$ is another Lévy family, then $(\mu^n\otimes\nu^n)_{n\ge1}$ is a Lévy family on the product spaces $(X^n\times Y^n, d^n+\delta^n)_{n\ge1}$. Let $A^n \subset X^n$, $n \ge 1$, be Borel sets such that $\liminf_{n\to\infty}\mu^n(A^n) > 0$. Then $(X^n,d^n,\mu^n(\,\cdot\,|A^n))_{n\ge1}$ is a Lévy family.
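The first example above can be rendered numerically (an illustrative Monte Carlo sketch, not from the text): on $S^n \subset \mathbb{R}^{n+1}$, the first coordinate function is 1-Lipschitz and its standard deviation under the uniform measure is $(n+1)^{-1/2}$, so its fluctuations shrink at the normal Lévy family rate $n^{-1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def coord_std_on_sphere(n, samples=20000):
    """Sample uniform points on S^n in R^(n+1) (normalized Gaussians) and
    return the standard deviation of the first coordinate."""
    g = rng.standard_normal((samples, n + 1))
    x = g / np.linalg.norm(g, axis=1, keepdims=True)
    return x[:, 0].std()

for n in (10, 100, 1000):
    print(n, coord_std_on_sphere(n))  # close to (n+1)**-0.5, decreasing in n
```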

3.4 Topological applications


We present applications of the concentration of measure phenomenon to some fixed point theorems.

Let $(X,d)$ be a metric space, and let $G$ be a family of maps from $X$ into $X$. We say that a subset $A$ of $X$ is essential (with respect to the action of $G$) if for every $\varepsilon > 0$ and every finite subset $\{g_1,\ldots,g_\ell\} \subset G$,

$$\bigcap_{k=1}^{\ell} g_k(A_\varepsilon) \ne \emptyset.$$

We then say that the pair $(X,G)$ has the property of concentration if for every finite covering $X \subset \bigcup_{i=1}^{N} A^i$, $A^i \subset X$, there exists $A^i$, $1 \le i \le N$, which is essential (for the action of $G$). The following proposition connects this definition with Lévy families.

Proposition 3.9. Let $(X,d)$ and $G$ be as before. Suppose that $G = \bigcup_{n\ge1}G_n$, $G_n \subset G_{n+1}$, and that there exist probability measures $\mu^n$, $n \ge 1$, on the Borel sets of $(X,d)$ such that $\mu^n$ is $G_n$-invariant. If $(X,d,\mu^n)_{n\ge1}$ is a Lévy family, then $(X,G)$ has the property of concentration.

Proof. Assume $X = \bigcup_{i=1}^{N} A^i$. There exists $i$, $1 \le i \le N$, such that

$$\limsup_{n\to\infty}\mu^n(A^i) \ge \frac{1}{2N}\,.$$

By Proposition 3.7, for every $\varepsilon > 0$,

$$\lim_{n\to\infty}\mu^n\big((A^i)_\varepsilon\big) = 1.$$

Since $\mu^n$ is $G_n$-invariant, for every $g \in G$,

$$\lim_{n\to\infty}\mu^n\big(g((A^i)_\varepsilon)\big) = 1.$$

It immediately follows that $A^i$ is essential.

The following examples illustrate the proposition. Let $H$ be an infinite dimensional Hilbert space and let $(e_i)_{i\ge1}$ be an orthonormal basis of $H$. The orthogonal group $SO_n$ may be realized as the unitary operators on $H$ which are the identity on the span of $(e_i)_{i>n}$. Then $SO_n \subset SO_{n+1}$. Consider then $X = G = SO_\infty = \bigcup_{n\ge1}SO_n$ equipped with the Hilbert-Schmidt operator metric. Then $(X,G)$ has the concentration property. To prove it, recall that the groups $SO_n$ equipped with the normalized Haar measures $\mu^n$, $n \ge 1$, form a normal Lévy family, and apply Proposition 3.9 (with $\mu^n$ defined on all of $X$). Exactly the same proof, together with concentration on spheres, shows that the unit sphere $S^\infty$ of $H$ with action $G = SO_\infty$ has the concentration property. One may also show that if $u$ is a unitary operator on $H$ and if $G = (u^n)_{n\ge1}$, then $(S^\infty,G)$ has the property of concentration in the above sense.

Let $(X,d)$ be a metric space. Let now $G$ be a family of uniformly equicontinuous maps from $X$ into $X$. In other words, there exists a continuous function $\omega : \mathbb{R}_+\to\mathbb{R}_+$ with $\omega(0) = 0$ such that

$$d\big(g(x),g(y)\big) \le \omega\big(d(x,y)\big)$$

for all $x,y \in X$ and $g \in G$. Suppose also that the identity map belongs to $G$. We call the pair $(X,G)$ a $G$-space. We do not assume that $G$ is a group (or even a semigroup), although this will mostly be the case.

We turn to the fixed point theorem we want to describe. Let thus $(X,d)$ be a $G$-space and let $K$ be a compact space.

Proposition 3.10. Assume that $(X,G)$ has the property of concentration. Let $\pi : X \to K$ be any map with domain $X$. Then there exists $x \in K$ such that for any neighborhood $O$ of $x$, the set $\pi^{-1}(O)$ is essential in $X$.

We call such a point $x$ an essential point of the map $\pi$.

Proof. By contradiction. If such an $x$ does not exist, we can find for every $y$ in $K$ an open neighborhood $O_y$ such that $\pi^{-1}(O_y)$ is not essential. Since $K$ is compact, there is a finite family $(O_i)_{1\le i\le N}$ of the covering $(O_y)_{y\in K}$ of $K$ such that $K \subset \bigcup_{i=1}^{N}O_i$. Then $\bigcup_{i=1}^{N}\pi^{-1}(O_i) = X$ and, by the concentration property, one of the sets $\pi^{-1}(O_i)$ has to be essential, which is a contradiction. The proposition is proved.

A map $\pi : X \to K$ is called uniformly continuous if for any closed subset $A \subset K$ and any open neighborhood $O_A$ of $A$, there is $\varepsilon > 0$ such that

$$\big(\pi^{-1}(A)\big)_\varepsilon \subset \pi^{-1}(O_A).$$

Theorem 3.11. Assume that $(X,G)$ has the property of concentration. Let $\pi : X \to K$ and $g : X \to X$, $g \in G$, be uniformly continuous maps. For $g \in G$, let $J : K \to K$ be continuous and such that $\pi\circ g = J\circ\pi$. Then any essential point of $\pi$ is a fixed point for $J$.

Before turning to the proof of Theorem 3.11, let us mention the following immediate consequence, which motivated the present investigation and which follows by choosing $K = X$, $\pi$ the identity map and $J = g$.

Corollary 3.12. If $X$ is compact and if $(X,G)$ has the property of concentration, then there is a point $x \in X$ which is fixed under the action of $G$.

Proof of Theorem 3.11. Let $x$ be an essential point of $\pi$ and assume that $J(x) = y \ne x$. By continuity of $J$, there exist open neighborhoods $O_x$ of $x$ and $O_y$ of $y$ such that $J(O_x) \subset O_y$ and $O_x\cap O_y = \emptyset$. We may moreover choose $O_x$ such that for some neighborhoods $U$ and $V$ of the closures of $O_x$ and $O_y$, we have $U\cap V = \emptyset$. Since $x$ is an essential point of $\pi$, $\Omega = \pi^{-1}(O_x)$ is an essential set. Therefore, for every $\varepsilon > 0$, $(g(\Omega))_\varepsilon\cap\Omega_\varepsilon \ne \emptyset$. Also, $g(\Omega) \subset \pi^{-1}(J(O_x))$ and, by uniform continuity of $g$, for any $\delta > 0$ one can find $\varepsilon > 0$ such that

$$\big(g(\Omega)\big)_\varepsilon \subset \big(\pi^{-1}(J(O_x))\big)_\delta\,.$$

By uniform continuity of $\pi$, one may then choose $\delta > 0$, and then $\varepsilon > 0$, such that $\pi\big((g(\Omega))_\varepsilon\big) \subset V$ (recall that $J(O_x) \subset O_y$) and such that $\pi(\Omega_\varepsilon) \subset U$. Since $(g(\Omega))_\varepsilon\cap\Omega_\varepsilon \ne \emptyset$, it follows that $U\cap V \ne \emptyset$, which yields a contradiction. The proof of Theorem 3.11 is complete.

The following is a further consequence of Theorem 3.11.

Corollary 3.13. Let $(X,G)$ be a $G$-space and let $(G_n,\mu^n)_{n\ge1}$ be a Lévy family. Assume that $(g,x)\mapsto gx$ is uniformly continuous. Then, if $X$ is compact, there is a common fixed point under the action of $G$.

Proof. Let $x_0 \in X$ be fixed and let $\varphi : G \to X$ be defined by $\varphi(g) = gx_0$. Consider then the image measures $\mu^n_\varphi$, for which $X$ is a Lévy family, and apply Theorem 3.11.

In the final part of this section, we turn to a finite dimensional analogue for which we may not claim fixed points, but for which we estimate the diameter of minimal invariant subspaces. Let $(X,d)$ be a compact metric space with a Borel probability measure $\mu$. We denote by $\alpha$ the concentration function of $(X,d,\mu)$. Let also $G$ be a family of measure-preserving maps from $X$ into $X$. We say that the pair $(X,G)$ acts on a compact metric space $(K,d)$ via a map $\pi : X \to K$ if for every $g \in G$ there exists $J_g : K \to K$ such that $\pi\circ g = J_g\circ\pi$. In other words, $\pi$ is an equivariant map. One example is provided by a (compact) group $G$ acting on $K$, fixing $x \in K$, and setting $\pi : G \to K$, $\pi(g) = gx$. We assume below that $\|J_g\|_{\mathrm{Lip}} \le L$ for every $J_g$. Denote by $N_K(\varepsilon)$ the minimal number of (open) balls of radius $\varepsilon > 0$ which cover $K$.

Theorem 3.14. Under the above conditions and notations, there exists $x \in K$ such that for every $J_g$, $g \in G$,

$$d(J_g x, x) \le (1+L)\,\inf\big(\varepsilon + \|\pi\|_{\mathrm{Lip}}(s+r)\big)$$

where the infimum runs over all $\varepsilon, s, r > 0$ such that $N_K(\varepsilon)\,\alpha(s) < 1$ and $\alpha(r) < \frac12$.

Proof. Let $\varepsilon > 0$ and $s > 0$ satisfy the required condition and choose a finite subset $S$ of $K$ with cardinality $N_K(\varepsilon)$ such that the balls centered in $S$ with radius $\varepsilon$ cover $K$. Choose $x \in S$ such that

$$\mu\big(\pi^{-1}(B(x,\varepsilon))\big) \ge \frac{1}{N_K(\varepsilon)} > \alpha(s)$$

where $B(x,\varepsilon)$ is the ball with center $x$ and radius $\varepsilon$ in $K$. Define $\Omega = \pi^{-1}(B(x,\varepsilon))$. By Lemma 1.1,

$$\mu(\Omega_{s+r}) \ge 1 - \alpha(r) > \frac12$$
when $\alpha(r) < \frac12$. Now,

$$\pi(\Omega_{s+r}) \subset B\big(x,\,\varepsilon + (s+r)\|\pi\|_{\mathrm{Lip}}\big) = B_1,$$

while, by the definition of $L$, for any $g \in G$,

$$J_g(B_1) \subset B\big(J_g x,\,L\big(\varepsilon + (s+r)\|\pi\|_{\mathrm{Lip}}\big)\big) = B_2.$$

Therefore,

$$\mu\big(\pi^{-1}(B_1)\big) \ge \mu(\Omega_{s+r}) > \frac12\,,$$

while, since $g$ is measure-preserving and $\pi\circ g = J_g\circ\pi$,

$$\mu\big(\pi^{-1}(B_2)\big) \ge \mu\big(\pi^{-1}(J_g(B_1))\big) \ge \mu\big(g(\pi^{-1}(B_1))\big) > \frac12\,.$$

It follows that $B_1\cap B_2 \ne \emptyset$, so that

$$d(J_g x, x) \le (1+L)\big(\varepsilon + \|\pi\|_{\mathrm{Lip}}(s+r)\big).$$

Since $g$ is arbitrary in $G$, the theorem is established.

A typical application occurs when $K$ is a compact set in $\mathbb{R}^k$ equipped with a norm $\|\cdot\|$, in which case $N_K(\varepsilon) \le (1+\frac{2R}{\varepsilon})^k$ where $R$ is the radius of $K$ for the norm $\|\cdot\|$ (see Lemma 3.18 below). Then, provided that $\mu$ has normal concentration $\alpha(r) \le C\,\mathrm{e}^{-cnr^2}$, $r > 0$, it may be shown from Theorem 3.14 that $d(J_g x, x) \le \mathrm{Const}\,R\sqrt{\frac{k}{n}}$, where the constant depends on the various parameters in Theorem 3.14 (see [Mi4] for details).

3.5 Euclidean sections of convex bodies


This last section is devoted to the historical example of application of concentration to the geometric problem of spherical sections of convex bodies. Let $K$ be a convex symmetric body in $\mathbb{R}^n$, that is, a convex compact subset of $\mathbb{R}^n$ with non-empty interior that is symmetric with respect to the origin. $K$ is said to contain almost Euclidean sections of dimension $k$ if, for every $\varepsilon > 0$, there exist a $k$-dimensional subspace $H$ and an ellipsoid $\mathcal{E}$ in $H$ such that

$$\mathcal{E} \subset K\cap H \subset (1+\varepsilon)\,\mathcal{E}. \eqno(3.16)$$

This geometric description admits the following equivalent functional formulation. A Banach space $E$ (with norm $\|\cdot\|$) is said to contain a subspace $(1+\varepsilon)$-isomorphic to the Euclidean space $\mathbb{R}^k$ if there are vectors $v_1,\ldots,v_k$ in $E$ such that for all $x = (x_1,\ldots,x_k)$ in $\mathbb{R}^k$,

$$|x| \le \Big\|\sum_{i=1}^{k}x_i v_i\Big\| \le (1+\varepsilon)\,|x| \eqno(3.17)$$
where $|\cdot|$ is the Euclidean norm on $\mathbb{R}^k$. The next statement is the famous Dvoretzky theorem in the local theory of normed spaces.

Theorem 3.15. For each $\varepsilon > 0$ there exists $\eta(\varepsilon) > 0$ such that every Banach space $E$ of dimension $n$ contains a subspace $(1+\varepsilon)$-isomorphic to $\mathbb{R}^k$ where $k = [\eta(\varepsilon)\log n]$.

Since every $2n$-dimensional ellipsoid has an $n$-dimensional section which is a multiple of the canonical Euclidean ball, we see from Theorem 3.15 that for each $\varepsilon > 0$ there exists $\eta(\varepsilon) > 0$ such that every centrally symmetric convex body $K$ in $\mathbb{R}^n$ admits a central section $K_0$ of dimension $k \ge \eta(\varepsilon)\log n$ and $\rho > 0$ such that

$$\rho\,B \subset K_0 \subset (1+\varepsilon)\,\rho\,B$$

where $B$ is the canonical Euclidean ball in the subspace spanned by $K_0$.

The rest of the section is devoted to the proof of this result. While the original concentration proof by V. Milman [Mi3] uses spherical concentration, we rather make use of Gaussian concentration, following [Pis2] (see also [Pis3]). The main argument will be to span the Euclidean subspace by means of the Gaussian distribution and concentration. The first lemma, known as the Dvoretzky-Rogers lemma, will be crucial in the choice of the underlying Gaussian distribution.

Lemma 3.16. Let $E$ be a Banach space of dimension $n$ and let $m = [n/2]$ (integer part). There exist vectors $v_1,\ldots,v_m$ in $E$ such that $\|v_i\| \ge \frac12$ for all $i = 1,\ldots,m$, and satisfying

$$\Big\|\sum_{i=1}^{m}x_i v_i\Big\| \le |x|$$

for all $x \in \mathbb{R}^m$.

Proof. We first construct an operator $T : \mathbb{R}^n \to E$ such that $\|T\| \le 1$ and $\|T_{|F}\| \ge \dim(F)/n$ for all subspaces $F$ of $\mathbb{R}^n$. In particular $\|T\| = 1$. To this task, consider any determinant function (associated to any fixed matrix representation) $S \mapsto \det(S)$. Let $T$ be such that

$$\det(T) = \max\big\{\det(S)\,;\ S:\mathbb{R}^n\to E,\ \|S\|\le1\big\}.$$

Then, for any $\varepsilon > 0$ and any $S : \mathbb{R}^n \to E$,

$$\det(T+\varepsilon S) \le \det(T)\,\|T+\varepsilon S\|^n. \eqno(3.18)$$

Now,

$$\det(T+\varepsilon S) = \det(T)\det(I+\varepsilon T^{-1}S) = \det(T)\big(1+\varepsilon\,\mathrm{Tr}(T^{-1}S)+o(\varepsilon)\big).$$

Hence, substituting in (3.18), we get

$$1+\varepsilon\,\mathrm{Tr}(T^{-1}S) \le \|T+\varepsilon S\|^n+o(\varepsilon) \le \big(1+\varepsilon\|S\|\big)^n+o(\varepsilon) \le 1+\varepsilon n\|S\|+o(\varepsilon)$$

so that $\mathrm{Tr}(T^{-1}S) \le n\|S\|$. Consider now a subspace $F$ in $\mathbb{R}^n$ and let $P : \mathbb{R}^n \to F$ be the orthogonal projection. Setting $S = TP$, we get

$$\dim(F) = \mathrm{Tr}(P) = \mathrm{Tr}(T^{-1}S) \le n\|S\|.$$

Therefore $\|TP\| \ge \dim(F)/n$, which proves the claim. It is now easy to complete the proof of the lemma. Let $T$ be as before. Choose $y_1$ in $\mathbb{R}^n$ such that $|y_1| = 1$ and $\|Ty_1\| = \|T\| = 1$. Then choose $y_2$ orthogonal to $y_1$ such that $|y_2| = 1$ and $\|Ty_2\| = \|TQ\| \ge 1-\frac1n$, where $Q$ is the orthogonal projection onto $\{y_1\}^\perp$. Continuing in this way on the basis of the preceding claim, one constructs a sequence such that $y_i \in \{y_1,\ldots,y_{i-1}\}^\perp$, $|y_i| = 1$ and $\|Ty_i\| \ge (n-i+1)/n$. Set then $v_i = Ty_i$, $i = 1,\ldots,[n/2]$, which have the required properties. The proof of Lemma 3.16 is complete.

From Lemma 3.16, we may define the Lipschitz map $F : \mathbb{R}^m \to \mathbb{R}$ by

$$F(x) = \Big\|\sum_{i=1}^{m}x_i v_i\Big\|, \qquad x = (x_1,\ldots,x_m) \in \mathbb{R}^m. \eqno(3.19)$$

By Lemma 3.16, $\|F\|_{\mathrm{Lip}} \le 1$. Indeed, for all $x,y \in \mathbb{R}^m$,

$$\big|F(x)-F(y)\big| \le \Big\|\sum_{i=1}^{m}(x_i-y_i)v_i\Big\| \le |x-y|.$$

Denoting by $\gamma$ the canonical Gaussian measure on $\mathbb{R}^m$, by for example (2.32), for every $r \ge 0$,

$$\gamma\Big(\Big|F-\int F\,d\gamma\Big| \ge r\Big) \le 2\,\mathrm{e}^{-r^2/2}. \eqno(3.20)$$

It is an important feature of the construction of Lemma 3.16 that $M_F = \int F\,d\gamma$ is much bigger (for $n$ large) than $\|F\|_{\mathrm{Lip}} \le 1$. Namely, denoting by $\nu$ the uniform measure on the discrete cube $\{-1,+1\}^m$, by symmetry of $\gamma$,

$$M_F = \int\!\!\int\Big\|\sum_{i=1}^{m}\epsilon_i x_i v_i\Big\|\,d\nu(\epsilon)\,d\gamma(x).$$

By Jensen's inequality, conditionally on the $\epsilon_j$, $j\ne i$, it follows that

$$M_F \ge \int\max_{1\le i\le m}\|x_i v_i\|\,d\gamma(x) \ge \frac12\int\max_{1\le i\le m}|x_i|\,d\gamma(x)$$

where we used Lemma 3.16 in the second step. By a well-known and elementary computation, there is a numerical constant $c > 0$ such that

$$\frac12\int\max_{1\le i\le m}|x_i|\,d\gamma(x) \ge c\,\sqrt{\log m}\,.$$

Hence, since $m = [n/2]$, for every $n \ge 3$,

$$M_F \ge c\,\sqrt{\log n} \eqno(3.21)$$

(for a possibly smaller numerical constant $c > 0$).
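The "well-known and elementary computation" behind (3.21) can be checked by simulation: the mean of $\max_{1\le i\le m}|x_i|$ under $\gamma$ grows like $\sqrt{\log m}$ (an illustrative Monte Carlo sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_max_abs(m, samples=4000):
    """Monte Carlo estimate of E max_{i<=m} |g_i| for i.i.d. standard normals."""
    return np.abs(rng.standard_normal((samples, m))).max(axis=1).mean()

for m in (10, 100, 1000):
    est = mean_max_abs(m)
    print(m, est, est / np.sqrt(np.log(m)))
# the ratio est / sqrt(log m) stays between fixed positive constants,
# consistent with the lower bound M_F >= c * sqrt(log n) in (3.21)
```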

As a next step, we need two technical lemmas to discretize the problem of finding Euclidean subspaces. A $\delta$-net ($\delta > 0$) of a set $A$ in $\mathbb{R}^k$ is a set $S$ such that for every $x$ in $A$ one can find $y$ in $S$ with $|x-y| \le \delta$.

Lemma 3.17. For each $\varepsilon > 0$ there is $\delta = \delta(\varepsilon) > 0$ with the following property. Let $S$ be a $\delta$-net of the unit sphere of $\mathbb{R}^k$. If for some $v_1,\ldots,v_k$ in a Banach space $E$ we have

$$1-\delta \le \Big\|\sum_{i=1}^{k}x_i v_i\Big\| \le 1+\delta$$

for all $x$ in $S$, then

$$(1-\varepsilon)^{1/2}\,|x| \le \Big\|\sum_{i=1}^{k}x_i v_i\Big\| \le (1+\varepsilon)^{1/2}\,|x|$$

for all $x$ in $\mathbb{R}^k$.

Proof. By homogeneity, we can and do assume that $|x| = 1$. By definition of $S$, there exists $x^0$ in $S$ such that $|x-x^0| \le \delta$, hence $x = x^0+\lambda_1 y$ with $y$ in the unit sphere of $\mathbb{R}^k$ and $|\lambda_1| \le \delta$. Iterating the procedure, we get that $x = \sum_{j=0}^{\infty}\lambda_j x^j$ with $x^j \in S$ and $|\lambda_j| \le \delta^j$ for all $j \ge 0$. Hence

$$\Big\|\sum_{i=1}^{k}x_i v_i\Big\| \le \sum_{j=0}^{\infty}|\lambda_j|\,\Big\|\sum_{i=1}^{k}x^j_i v_i\Big\| \le \frac{1+\delta}{1-\delta}$$

($\delta < 1$). In the same way,

$$\Big\|\sum_{i=1}^{k}x_i v_i\Big\| \ge (1-\delta)-\frac{(1+\delta)\,\delta}{1-\delta}\,.$$

It therefore suffices to choose $\delta$ appropriately, as a function of $\varepsilon > 0$ only, in order to get the conclusion.

The size of a $\delta$-net of spheres in finite dimension is easily estimated in terms of $\delta > 0$ and the dimension, as is shown in the next lemma, which follows from a simple volume argument.

Lemma 3.18. There is a $\delta$-net $S$ of the unit sphere of $\mathbb{R}^k$ of cardinality less than $(1+\frac{2}{\delta})^k \le \mathrm{e}^{2k/\delta}$.

Proof. Let $x^1,\ldots,x^\ell$ be maximal in the unit sphere of $\mathbb{R}^k$ under the relations $|x^i-x^j| > \delta$ for $i\ne j$. Then the balls in $\mathbb{R}^k$ with centers $x^i$ and radius $\frac{\delta}{2}$ are disjoint and are all contained in the ball with center the origin and radius $1+\frac{\delta}{2}$. By comparing the volumes, we get $\ell\,(\frac{\delta}{2})^k \le (1+\frac{\delta}{2})^k$, from which the lemma follows.
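The packing bound in the proof of Lemma 3.18 can be illustrated numerically: any $\delta$-separated family on the unit sphere of $\mathbb{R}^k$ has cardinality at most $(1+2/\delta)^k$. A sketch (the candidate points are sampled at random, so the greedy family is only an approximate net of the full sphere, but the cardinality bound applies exactly to any separated family):

```python
import numpy as np

rng = np.random.default_rng(2)

def greedy_separated(k, delta, candidates=20000):
    """Greedily pick points on S^(k-1) that are pairwise more than delta
    apart; a maximal such family is a delta-net (proof of Lemma 3.18)."""
    g = rng.standard_normal((candidates, k))
    pts = g / np.linalg.norm(g, axis=1, keepdims=True)
    net = []
    for p in pts:
        if all(np.linalg.norm(p - q) > delta for q in net):
            net.append(p)
    return net

k, delta = 3, 0.5
net = greedy_separated(k, delta)
print(len(net), (1 + 2 / delta) ** k)  # the family stays below the bound 125
```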

We are now in a position to perform the proof of Theorem 3.15. The main argument is the concentration property (3.20) for Gaussian measures. The idea is to span the Euclidean subspace using the Gaussian rotational invariance. Consider the Lipschitz function $F$ defined in (3.19) and recall the Gaussian measure $\gamma$ on $\mathbb{R}^m$, $m = [n/2]$. Let $k \ge 1$ be specified later on and let $x = (x_1,\ldots,x_k)$ be in the unit sphere of $\mathbb{R}^k$ (i.e. $|x| = 1$). By the Gaussian rotational invariance, the distribution of the map

$$(X_1,\ldots,X_k) \in (\mathbb{R}^m)^k \mapsto \sum_{i=1}^{k}x_i X_i \in \mathbb{R}^m$$

under the product measure $\gamma^k$ on $(\mathbb{R}^m)^k$ is the same as $\gamma$. Therefore, by (3.20), for every $r > 0$,

$$\gamma^k\Big(\Big\{(X_1,\ldots,X_k)\in(\mathbb{R}^m)^k\,;\ \Big|F\Big(\sum_{i=1}^{k}x_i X_i\Big)-M_F\Big|\ge r\Big\}\Big) \le 2\,\mathrm{e}^{-r^2/2}.$$

Now let $\varepsilon > 0$ be fixed and choose $\delta = \delta(\varepsilon) > 0$ according to Lemma 3.17. Let furthermore $S$ be a $\delta$-net in the unit sphere of $\mathbb{R}^k$, which can be chosen with cardinality less than or equal to $\mathrm{e}^{2k/\delta}$ (Lemma 3.18). Let $r = \delta M_F$. The preceding inequality together with the homogeneity of $F$ thus implies that

$$\gamma^k\Big(\exists\,x\in S\,;\ \Big|F\Big(\sum_{i=1}^{k}x_i\,\frac{X_i}{M_F}\Big)-1\Big|\ge\delta\Big) \le 2\,\mathrm{Card}(S)\,\mathrm{e}^{-\delta^2 M_F^2/2} \le 2\,\mathrm{e}^{(2k/\delta)-(\delta^2 M_F^2/2)}.$$

Assume then $n$ to be large enough (otherwise there is nothing to prove). Since $M_F \ge c\sqrt{\log n}$ by (3.21), we can then choose $k = [\eta\log n]$, for $\eta = \eta(\delta) = \eta(\varepsilon)$ small enough, such that the preceding probability is strictly less than 1. It follows that there is at least one realization of $(X_1,\ldots,X_k)$ for which the vectors $w_i = M_F^{-1}\sum_{j=1}^{m}(X_i)_j v_j$ in $E$, $i = 1,\ldots,k$, satisfy, for all $x$ in $S$,

$$1-\delta \le \Big\|\sum_{i=1}^{k}x_i w_i\Big\| \le 1+\delta.$$

Together with Lemma 3.17, the proof of the theorem follows.

The preceding proof of Theorem 3.15 actually shows that most random sections (in the sense of $\gamma^k$) will produce almost Euclidean subspaces. Further use of concentration arguments (by means of empirical processes) shows that the dependence on $\varepsilon$ of the function $\eta(\varepsilon)$ is of the order of $\varepsilon^2$ [Sche2]. As announced, the original measure concentration argument by V. Milman [Mi3] used spherical rather than Gaussian concentration. One nice feature of the spherical proof is the following geometric consequence [Mi3], [Mi5].

Theorem 3.19. For each $\rho > 0$ there exists $\eta = \eta(\rho) > 0$ such that for any continuous function $F$ on $S^n$, there exists a $k$-dimensional sphere $S^k \subset S^n$ (i.e. the unit sphere of a $(k+1)$-dimensional subspace) with $k = [\eta n]$ such that for any $x$ in $S^k$,

$$\big|F(x)-m_F\big| < \omega_F(2\rho)$$

where $m_F$ is a median of $F$ (for $\sigma^n$) and $\omega_F$ is the modulus of continuity of $F$.

Proof. It is a consequence of Lévy's inequality (2.6). Let $A = \{|F-m_F| < \omega_F(\rho)\}$. By (2.6), we know that

$$\sigma^n(A) \ge 1-2\,\mathrm{e}^{-(n-1)\rho^2/2}.$$

Fix $x$ in $S^n$ and let $\pi$ be the Haar probability measure on $SO_n$. Clearly,

$$\pi\big(\{T\in SO_n\,;\ Tx\in A\}\big) = \sigma^n(A) \ge 1-2\,\mathrm{e}^{-(n-1)\rho^2/2}.$$

This implies that for any finite set $S$ in $S^n$,

$$\pi\big(\{T\in SO_n\,;\ T(S)\subset A\}\big) \ge 1-2\,\mathrm{Card}(S)\,\mathrm{e}^{-(n-1)\rho^2/2}.$$

Therefore, if $2\,\mathrm{Card}(S) < \mathrm{e}^{(n-1)\rho^2/2}$, we may find a rotation $T$ such that $T(S)$ is a subset of $A$. Choose then $S$ to be a $\rho$-net of $S^k$ according to Lemma 3.18. This is thus possible for $k = [\eta n]$ for some appropriate $\eta = \eta(\rho) > 0$. Since every point of $T(S^k)$ is then within distance $\rho$ of $T(S)\subset A$, the result follows.

Notes and Remarks


Dvoretzky's theorem on Euclidean subspaces of Banach spaces (Theorem 3.15) was first established in [Dv]. The new proof by V. Milman [Mi3] is at the starting point of the developments of the concentration ideas. Since then, V. Milman indeed strongly promoted the usefulness of the concentration of measure phenomenon in various contexts [Mi5], and convinced in particular M. Talagrand of the importance of this simple, but useful, property. The proof of Dvoretzky's theorem presented in Section 3.5 is the Gaussian version, due to G. Pisier [Pis2], [Pis3], of Milman's argument (which used concentration on spheres). Lemma 3.16 is an important step known as the Dvoretzky-Rogers lemma [D-R]. Applications of concentration to Dvoretzky-like theorems have been a main issue in the local theory of Banach spaces in the decades 1970-90, starting with the fundamental contribution [F-L-M] by T. Figiel, J. Lindenstrauss and V. Milman. The main results are extensively reviewed in the Lecture Notes [M-S] by V. Milman and G. Schechtman and, more recently, in the Handbook in the geometry of Banach spaces [J-L] (cf. the contributions [Sche5], [Gi-M1], [J-S3]...). See also [Pis3]. In particular, fine embeddings of subspaces of $L^p$ into finite dimensional $\ell^p$ spaces, and finite dimensional $\ell^p$-subspaces, $1 < p \le 2$, in connection with stable type, have been investigated by W. Johnson and G. Schechtman [J-S1] and G. Pisier [Pis1], together with further probabilistic concentration inequalities for series of independent random vectors (cf. [Pis2], [Sche5]). Approximation of zonoids by zonotopes in [B-LM] by random embeddings makes use of concentration through

empirical process methods as initiated in [Sche3]. Theorem 3.19 is taken from Milman's original proof [Mi3]. V. Milman also used concentration for some infinite dimensional integration questions and other problems in [Mi2]. Concentration under spectral properties was first put forward by M. Gromov and V. Milman in [Gr-M1]. The corresponding results on graphs and applications to expander graphs and superconcentrators have been investigated by N. Alon and V. Milman [Al-M], [Alo]. See also [B-H-T]. Further geometric comments may be found in [Grom2]. Connections between concentration and Poincaré inequalities were strongly developed on the probabilistic side during the past decade (cf. [Le5] and the references therein). The results of Section 3.2 are inspired by early contributions of R. Brooks [Bro]. Theorem 3.5 is implicit in [Le5], where the analogous relations between bounds on the diameter and logarithmic Sobolev constants are developed. Cheng's inequality (3.8) was established in [Chen]. Lévy families are introduced and analyzed from a geometric and topological point of view by M. Gromov and V. Milman in [Gr-M1], where a number of geometric examples are provided. The results of Sections 3.3 and 3.4 are taken from this reference. See also [Grom2], [Mi5]. Theorem 3.14 is due to V. Milman [Mi4]. Recently, V. Pestov [Pe1], [Pe2] found a non-trivial way from concentration to ergodic theory by showing that an action on some group has concentration if and only if it is amenable. Perspectives and developments related to the concentration of measure phenomenon in various areas of mathematics and its applications are discussed in the references [Mi7] and [Grom2], [Grom3].


4. CONCENTRATION IN PRODUCT SPACES

While products of Lévy families are still Lévy families, at a more quantitative level the concentration functions of product spaces are very sensitive to the dimension. An important issue addressed in this chapter is to describe dimension free concentration properties in product spaces. Several methods are developed to this task, and further ones will be presented in Chapters 5 and 6. We already described in Chapter 1 concentration in product spaces for the $\ell^1$-metric and the Hamming metric. These however depend rather badly on the dimension, and one main issue addressed in this chapter (and the subsequent ones) will be to deal with the $\ell^2$-metric to produce dimension free concentration properties such as the one for Gaussian measures on Euclidean space. We start with a useful martingale method that discretizes some of the martingale tools of Chapter 2. We next present some aspects of the deep investigation by M. Talagrand [Tal7] of concentration in product spaces. Following his work, we analyze successively the convex hull and $q$-point approximations, of powerful use in discrete and combinatorial probability theory as illustrated in Section 8.4. In particular, while the basic definition of concentration deals with metric spaces, we investigate here examples in product spaces that go outside the usual metric setting. Proofs rely on a basic induction scheme together with geometric arguments. Section 4.4 is devoted to the application of infimum-convolutions to concentration for product measures, as already outlined at the end of the first chapter. The last part describes the striking concentration phenomenon for the exponential measure that goes beyond the

concentration for Gaussian measures. From a probabilistic point of view, most of the concentration and deviation inequalities we will obtain extend classical inequalities on sums of independent random variables to arbitrary Lipschitz (or Lipschitz and convex) functionals of the sample, leading to powerful tools in applications (see Chapters 7 and 8).

4.1 Martingale methods


We examine here some useful martingale inequalities that yield a variety of concentration results. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Denote by $E(f)$ the expectation of an integrable real-valued function $f$ on $(\Omega, \mathcal{F})$ with respect to $P$. Consider a finite filtration of sub-$\sigma$-fields
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F}$$
of the $\sigma$-field $\mathcal{F}$. For every $i = 1, \ldots, n$, set $d_i = E^{\mathcal{F}_i}(f) - E^{\mathcal{F}_{i-1}}(f)$, where $E^{\mathcal{F}_i}$ denotes conditional expectation with respect to the sub-$\sigma$-field $\mathcal{F}_i$. The first lemma bounds the deviation of a function $f$ from its mean in terms of the sizes of the increments $d_i$. It is the discrete analogue of (2.34).

Lemma 4.1. Let $f$ be integrable with respect to $P$. For every $r \ge 0$,
$$P\big(\{f \ge E(f) + r\}\big) \le e^{-r^2/2D^2}$$
where $D^2 = \sum_{i=1}^n \|d_i\|_\infty^2$. Applied simultaneously to $-f$, Lemma 4.1 yields the concentration of $f$ around its mean
$$P\big(\{|f - E(f)| \ge r\}\big) \le 2\,e^{-r^2/2D^2}. \qquad (4.1)$$

Proof. By convexity, for every $\lambda \in \mathbb{R}$ and $-1 \le u \le +1$,
$$e^{\lambda u} \le \frac{1+u}{2}\,e^{\lambda} + \frac{1-u}{2}\,e^{-\lambda}.$$
Hence, by homogeneity and since $E^{\mathcal{F}_{i-1}}(d_i) = 0$, for every $\lambda \in \mathbb{R}$,
$$E^{\mathcal{F}_{i-1}}\big(e^{\lambda d_i}\big) \le \cosh\big(\lambda \|d_i\|_\infty\big) \le e^{\lambda^2 \|d_i\|_\infty^2/2}.$$
The properties of conditional expectations then show that
$$E\big(e^{\lambda \sum_{i=1}^n d_i}\big) = E\Big(e^{\lambda \sum_{i=1}^{n-1} d_i}\, E^{\mathcal{F}_{n-1}}\big(e^{\lambda d_n}\big)\Big) \le e^{\lambda^2 \|d_n\|_\infty^2/2}\, E\big(e^{\lambda \sum_{i=1}^{n-1} d_i}\big).$$
Iterating,
$$E\big(e^{\lambda \sum_{i=1}^n d_i}\big) \le e^{\lambda^2 D^2/2}.$$
Since $f - E(f) = \sum_{i=1}^n d_i$, by Chebyshev's exponential inequality, for every $r \ge 0$,
$$P\big(\{f \ge E(f) + r\}\big) \le e^{-\lambda r}\, E\big(e^{\lambda \sum_{i=1}^n d_i}\big) \le e^{-\lambda r + \lambda^2 D^2/2}.$$
Minimizing in $\lambda \ge 0$ thus yields the inequality of the lemma.

Lemma 4.1 is the exact martingale extension of the Hoeffding type inequality (1.23). These types of inequalities have proved extremely useful in the study of limit theorems in classical probability theory and in discrete algorithmic mathematics [MD2].

Lemma 4.1 is of special interest once the decomposition
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F}$$
in sub-$\sigma$-fields is such that the martingale differences $d_i$ of a given function $f$ can be controlled. The following metric decomposition is a useful tool in this regard. A finite metric space $(X, d)$ is of length at most $\ell$ if there exists an increasing sequence $\{X\} = \mathcal{X}^0 \subset \mathcal{X}^1 \subset \cdots \subset \mathcal{X}^n = \{\{x\}\}_{x \in X}$ of partitions of $X$ ($\mathcal{X}^i$ is a refinement of $\mathcal{X}^{i-1}$) and positive numbers $a_1, \ldots, a_n$ with $\ell = \big(\sum_{i=1}^n a_i^2\big)^{1/2}$ such that, if $\mathcal{X}^i = \{A^i_j\}_{1 \le j \le m_i}$, for all $i = 1, \ldots, n$, $p = 1, \ldots, m_{i-1}$, and $j, k$ such that $A^i_j, A^i_k \subset A^{i-1}_p$, there exists a one-to-one and onto map $\varphi : A^i_j \to A^i_k$ such that $d(x, \varphi(x)) \le a_i$ for every $x \in A^i_j$. Note that $\ell$ is always less than or equal to the diameter of $X$.
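The martingale bound of Lemma 4.1 can be checked exactly on the simplest example, a sum of $n$ independent $\pm 1$ signs, where each increment satisfies $\|d_i\|_\infty = 1$ and $D^2 = n$. The following sketch (illustrative values only) compares the exact two-sided tail with the bound $2e^{-r^2/2n}$ of (4.1):

```python
from math import comb, exp

def walk_tail(n, r):
    """Exact P(|S_n| >= r) for S_n a sum of n independent +/-1 signs."""
    # S_n = 2H - n with H ~ Binomial(n, 1/2), so |S_n| >= r iff |2h - n| >= r
    favorable = sum(comb(n, h) for h in range(n + 1) if abs(2 * h - n) >= r)
    return favorable / 2 ** n

n = 100  # here D^2 = n since every martingale difference is bounded by 1
for r in [10, 20, 30]:
    bound = 2 * exp(-r ** 2 / (2 * n))  # two-sided bound (4.1)
    assert walk_tail(n, r) <= bound
```

The exact tails are of course much smaller for large $r$; the point is only that the exponential bound holds uniformly in $n$ and $r$.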

Theorem 4.2. Let $(X, d)$ be a finite metric space of length at most $\ell$, and let $\mu$ be the normalized counting measure on $X$. Then, for every 1-Lipschitz function $F$ on $(X, d)$ and every $r \ge 0$,
$$\mu\big(\{F \ge \textstyle\int F\, d\mu + r\}\big) \le e^{-r^2/2\ell^2}.$$
In particular,
$$\alpha_{(X, d, \mu)}(r) \le 2\,e^{-r^2/16\ell^2}, \qquad r > 0.$$

Before turning to the proof of the theorem, let us present examples that motivate the definition of spaces with controlled length in the above sense. Let first $X$ be the discrete cube $\{0,1\}^n$ equipped with the Hamming metric and normalized counting measure $\mu$. The obvious choice of partitions (adding one coordinate at each step) shows that $\{0,1\}^n$ is of length at most $\sqrt{n}$. Therefore, we deduce from Theorem 4.2 that
$$\alpha_{(\{0,1\}^n, d, \mu)}(r) \le 2\,e^{-r^2/16n}, \qquad r > 0. \qquad (4.2)$$
Compare with Theorem 2.9. Actually, the same would apply to any product measure with respect to the Hamming metric. Let indeed $(\Omega_i, \Sigma_i, \mu_i)$, $i = 1, \ldots, n$, be arbitrary probability spaces and let $P$ be the product measure $\mu_1 \otimes \cdots \otimes \mu_n$ on the product space $X = \Omega_1 \times \cdots \times \Omega_n$. A point $x$ in $X$ has coordinates $x = (x_1, \ldots, x_n)$. Equip $X = \Omega_1 \times \cdots \times \Omega_n$ with the Hamming metric
$$d(x, y) = \mathrm{Card}\big(\{1 \le i \le n : x_i \ne y_i\}\big).$$
Choose as partitions $\mathcal{X}^i$ those whose atoms are obtained by fixing the first $i$ coordinates,
$$A_{\omega_1, \ldots, \omega_i} = \big\{x \in X : x_1 = \omega_1, \ldots, x_i = \omega_i\big\}, \qquad \omega_1 \in \Omega_1, \ldots, \omega_i \in \Omega_i.$$
It is clear that the sequence of partitions $(\mathcal{X}^i)_{1 \le i \le n}$ fulfills the definition of the length of a metric space with all the $a_i$'s equal to 1, by definition of the Hamming metric. We thus recover in this way (1.24).
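The partition structure behind these examples is easy to verify by brute force on a small cube. The sketch below (illustrative, for $n = 3$) builds the partitions obtained by fixing coordinates one at a time, checks that flipping the newly fixed coordinate gives the required bijections with $a_i = 1$, and confirms that the length $\sqrt{n}$ is at most the diameter:

```python
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

n = 3
cube = list(product((0, 1), repeat=n))

# X^i fixes the first i coordinates; two atoms inside the same atom of
# X^(i-1) differ in coordinate i only, and flipping that coordinate is a
# bijection phi with d(x, phi(x)) = 1, so all the a_i may be taken equal to 1.
for i in range(1, n + 1):
    for prefix in product((0, 1), repeat=i - 1):
        atom0 = [x for x in cube if x[:i] == prefix + (0,)]
        atom1 = [x for x in cube if x[:i] == prefix + (1,)]
        phi = {x: x[:i - 1] + (1,) + x[i:] for x in atom0}
        assert all(hamming(x, phi[x]) == 1 for x in atom0)
        assert sorted(phi.values()) == sorted(atom1)

length = n ** 0.5  # (sum of a_i^2)^(1/2) with all a_i = 1
diameter = max(hamming(x, y) for x in cube for y in cube)
assert length <= diameter
```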

A further, non-product, example is given by the symmetric group $S_n$ of permutations of $\{1, \ldots, n\}$ equipped with the metric $d(\sigma, \tau) = \mathrm{Card}(\{i : \sigma(i) \ne \tau(i)\})$ and uniform measure $\mu$ (assigning mass $(n!)^{-1}$ to each permutation). For every $i = 1, \ldots, n$, consider the sets
$$A_{j_1, \ldots, j_i} = \big\{\sigma \in S_n : \sigma(1) = j_1, \ldots, \sigma(i) = j_i\big\}$$
where the $j_1, \ldots, j_i$ are distinct in $\{1, \ldots, n\}$. These sets form a partition $\mathcal{X}^i$ of $S_n$. Now, given $A = A_{j_1, \ldots, j_i}$ as above, let $B = A_{j_1, \ldots, j_i, p}$ and $C = A_{j_1, \ldots, j_i, q}$ be contained in $A$. Let $\tau$ be the transposition that exchanges $p$ and $q$. Define $\varphi : B \to C$ by letting $\varphi(\sigma) = \tau\sigma$. It is easily seen that $d(\sigma, \varphi(\sigma)) \le 2$, so that we may take the $a_i$'s equal to 2.

Corollary 4.3. Let $\mu$ be uniform measure on the symmetric group $S_n$. For any 1-Lipschitz function $F$ on $(S_n, d)$ and any $r \ge 0$,
$$\mu\big(\{F \ge \textstyle\int F\, d\mu + r\}\big) \le e^{-r^2/8n}.$$
In particular,
$$\alpha_{(S_n, d, \mu)}(r) \le 2\,e^{-r^2/32n}, \qquad r > 0.$$
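The choice $a_i = 2$ for the symmetric group rests on the observation that composing with a transposition moves a permutation by exactly 2 in this metric; a quick exhaustive check (illustrative, for $n = 4$):

```python
from itertools import permutations

def dist(s, t):
    """Metric d(sigma, tau) = Card({i : sigma(i) != tau(i)})."""
    return sum(a != b for a, b in zip(s, t))

n, p, q = 4, 1, 3
tau = list(range(n))
tau[p], tau[q] = q, p  # the transposition exchanging p and q

for sigma in permutations(range(n)):
    tau_sigma = tuple(tau[s] for s in sigma)  # phi(sigma) = tau o sigma
    # tau o sigma changes sigma exactly where sigma takes the values p or q
    assert dist(sigma, tau_sigma) == 2
```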

We now present the proof of Theorem 4.2.

Proof of Theorem 4.2. Denote by $\mathcal{F}_i$ the $\sigma$-field generated by the partition $\mathcal{X}^i$, $i = 1, \ldots, n$. Given $F$ integrable with respect to $\mu$, set $F_i = E^{\mathcal{F}_i}(F)$, $i = 0, 1, \ldots, n$ ($F_0 = E(F)$). For any $B = A^i_j$ and $C = A^i_k$ contained in $A = A^{i-1}_p$, $F_i$ is constant on $B$ and $C$ and
$$|F_{i|B} - F_{i|C}| \le a_i \qquad (4.3)$$
where $F_{i|B}$, $F_{i|C}$ denote the restriction of $F_i$ to $B$, respectively $C$. Indeed, $F_{i|B} = (\mathrm{Card}(B))^{-1} \sum_{x \in B} F(x)$ while
$$F_{i|C} = \big(\mathrm{Card}(C)\big)^{-1} \sum_{x \in C} F(x) = \big(\mathrm{Card}(B)\big)^{-1} \sum_{x \in B} F\big(\varphi(x)\big),$$
so that (4.3) immediately follows since $F$ is 1-Lipschitz and $d(x, \varphi(x)) \le a_i$. Therefore, for any $A$, $B$ as before,
$$|F_{i|B} - F_{i-1|A}| \le a_i.$$
Indeed,
$$F_{i-1|A} = \frac{1}{N} \sum_{C \subset A} F_{i|C}$$
with $N = \mathrm{Card}(\{C : C \subset A\})$, so that
$$|F_{i|B} - F_{i-1|A}| \le \frac{1}{N} \sum_{C \subset A} |F_{i|B} - F_{i|C}| \le a_i.$$
Hence $\|d_i\|_\infty \le a_i$ for any $i = 1, \ldots, n$ and Lemma 4.1 allows us to conclude. The proof of Theorem 4.2 is complete.

The examples of the discrete cube and of the symmetric group suggest one further extension of the method. Given a compact metric group $G$ with translation invariant metric $d$, and a closed subgroup $H$ of $G$, one can define a natural metric on the quotient $G/H$ by
$$\delta(gH, g'H) = d(g, g'H) = d\big(g'^{-1}g, H\big).$$
The translation invariance of $d$ implies that this is actually a metric and that $d(g, g'H)$ does not depend on the representative $g$ of $gH$.

Theorem 4.4. Let $G$ be a compact metric group with a translation invariant metric $d$ and let $G = G_0 \supset G_1 \supset \cdots \supset G_n = \{e\}$ be a decreasing sequence of closed subgroups of $G$. Denote by $a_i$ the diameter of $G_{i-1}/G_i$, $i = 1, \ldots, n$, and let $\ell = \big(\sum_{i=1}^n a_i^2\big)^{1/2}$. Let $\mu$ be Haar measure on $G$. Then
$$\alpha_{(G, d, \mu)}(r) \le 2\,e^{-r^2/8\ell^2}, \qquad r > 0.$$

Proof. We show that for every mean-zero 1-Lipschitz function $F$ on $(G, d)$ and every $r \ge 0$,
$$\mu\big(\{F \ge r\}\big) \le e^{-r^2/2\ell^2}.$$
Let $\mathcal{F}_i$, $i = 0, \ldots, n$, be the $\sigma$-algebra generated by the sets $\{gG_i\}_{g \in G}$. Note that if $gG_{i-1} \supset hG_i$, then $g^{-1}h \in G_{i-1}$. If both $gG_{i-1} \supset h_1 G_i$ and $gG_{i-1} \supset h_2 G_i$, let $s \in G_i$ be such that
$$d\big(g^{-1}h_1, g^{-1}h_2 G_i\big) = d\big(g^{-1}h_1, g^{-1}h_2 s\big) \le \mathrm{Diam}(G_{i-1}/G_i).$$
Define then $\varphi : h_1 G_i \to h_2 G_i$ by $\varphi(h_1 r) = h_2 s r$. Then
$$d\big(h_1 r, \varphi(h_1 r)\big) = d(h_1, h_2 s) \le \mathrm{Diam}(G_{i-1}/G_i) = a_i.$$
Therefore, if $F_i = E^{\mathcal{F}_i}(F)$, $i = 0, \ldots, n$, the oscillation of $F_i$ on each atom of $\mathcal{F}_{i-1}$ is at most $a_i$, so that, in the notation of Lemma 4.1, $\|d_i\|_\infty \le a_i$. Lemma 4.1 then allows us to conclude. The theorem is established.

In the case of the symmetric group, we may take $G = S_n$, $G_i = S_{n-i}$ and $a_i = 2$. One further instance of this theorem is the $n$-dimensional torus $T^n$ equipped with the normalized product measure and the $\ell^1$-metric $\sum_{i=1}^n |s_i - t_i|$. Taking $T^i$, $i = 0, \ldots, n$, as subgroups, one gets $a_i \le 1$ for every $i$ so that, by Theorem 4.4,
$$\alpha_{(T^n, d, \mu)}(r) \le 2\,e^{-r^2/16n}, \qquad r > 0.$$

This result has to be compared with Proposition 2.7, where the Euclidean metric was used and for which dimension free concentration holds.

We complete this section with another application of the martingale method to norms of sums of independent random vectors (one can also use the language of suprema of empirical processes as in Chapter 7). Let $Y_1, \ldots, Y_n$ be independent integrable random variables on some probability space $(\Omega, \mathcal{F}, P)$ taking values in a Banach space $(E, \|\cdot\|)$ and let $S = \sum_{i=1}^n Y_i$. Now
$$\|S\| - E\|S\| = \sum_{i=1}^n d_i$$
can be written as a sum of martingale differences with respect to the filtration $\mathcal{F}_i$ generated by $Y_1, \ldots, Y_i$, $i = 1, \ldots, n$, with the property that, for every $i = 1, \ldots, n$,
$$|d_i| = \Big|\big(E^{\mathcal{F}_i} - E^{\mathcal{F}_{i-1}}\big)\big(\|S\| - \|S - Y_i\|\big)\Big| \le \|Y_i\| + E\|Y_i\|$$
by the triangle inequality and independence. In a sense, $\|S\| - E(\|S\|)$ is as good as the sum $\sum_{i=1}^n \|Y_i\|$, so that, provided $E(\|S\|)$ is under control, the classical one-dimensional results apply. As a consequence of Lemma 4.1, we thus obtain the following extension of (1.23).

Corollary 4.5. Let $Y_1, \ldots, Y_n$ be independent bounded random vectors in a Banach space $(E, \|\cdot\|)$ and let $S = \sum_{i=1}^n Y_i$. For every $r \ge 0$,
$$P\big(\big\{\big|\|S\| - E\|S\|\big| \ge r\big\}\big) \le 2\,e^{-r^2/2D^2}$$
where $D^2 = 4\sum_{i=1}^n \big\|\|Y_i\|\big\|_\infty^2$.
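In the scalar case $E = \mathbb{R}$ with $Y_i = \varepsilon_i$ independent signs, everything in Corollary 4.5 is computable exactly. The sketch below (illustrative values) uses the bound $2e^{-r^2/8n}$, which follows from (4.1) since each martingale difference here is bounded by $|\varepsilon_i| + E|\varepsilon_i| = 2$:

```python
from math import comb, exp

n = 60
# Law of S = eps_1 + ... + eps_n for independent signs: P(S = 2h - n) = C(n,h)/2^n
pmf = {2 * h - n: comb(n, h) / 2 ** n for h in range(n + 1)}
mean_abs = sum(abs(s) * p for s, p in pmf.items())  # E|S|, about sqrt(2n/pi)

for r in [8, 16, 24]:
    tail = sum(p for s, p in pmf.items() if abs(abs(s) - mean_abs) >= r)
    # each difference |d_i| <= |eps_i| + E|eps_i| = 2, so D^2 = 4n in Lemma 4.1
    assert tail <= 2 * exp(-r ** 2 / (8 * n))
```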

4.2 Convex hull approximation


In this section, we investigate concentration for product measures through geometric considerations involving convexity properties. Let $(\Omega_i, \Sigma_i, \mu_i)$, $i = 1, \ldots, n$, be arbitrary probability spaces and let $P$ be the product measure $\mu_1 \otimes \cdots \otimes \mu_n$ on $X = \Omega_1 \times \cdots \times \Omega_n$. A point $x$ in $X$ has coordinates $x = (x_1, \ldots, x_n)$. Recall the Hamming distance on $X$ defined by
$$d(x, y) = \mathrm{Card}\big(\{1 \le i \le n : x_i \ne y_i\}\big).$$
As we have seen in (1.24) and (4.2), the concentration function of any product probability $P$ on $(X, d)$ satisfies
$$\alpha_{(X, d, P)}(r) \le e^{-cr^2/n}, \qquad r > 0, \qquad (4.4)$$
for some numerical constant $c > 0$. In particular, if $P(A) \ge \frac{1}{2}$, for most of the elements $x$ in $X$, there exists $y \in A$ within distance $\sqrt{n}$ of $x$.

The same result actually applies to all the Hamming metrics
$$d_a(x, y) = \sum_{i=1}^n a_i\, 1_{\{x_i \ne y_i\}}, \qquad a = (a_1, \ldots, a_n) \in \mathbb{R}^n_+,$$
with $|a|^2 = \sum_{i=1}^n a_i^2$ instead of $n$ on the right-hand side of (4.4). Note that various measurability questions may arise in such statements. These are actually completely unessential and will be ignored below (start for example with a finite or Polish probability space $\Omega$). We now introduce a control that will be uniform in the Hamming metrics $d_a$ whenever $|a| = 1$. We namely define, for every subset $A$ of $X = \Omega_1 \times \cdots \times \Omega_n$ and every $x \in X$, a distance $\varphi^c_A(x)$ from $x$ to $A$ as
$$\varphi^c_A(x) = \sup_{|a| = 1} d_a(x, A).$$
This definition somewhat hides the combinatorial and convexity properties of the functional $\varphi^c_A$ that will be needed in its investigation. For a subset $A \subset X$ and $x \in X$, let
$$U_A(x) = \big\{s = (s_i)_{1 \le i \le n} \in \{0,1\}^n : \exists\, y \in A \text{ such that } y_i = x_i \text{ if } s_i = 0\big\}.$$
One can use equivalently the collection of the characteristic functions $1_{\{x_i \ne y_i\}}$, $y \in A$. Denote by $V_A(x)$ the convex hull of $U_A(x)$ as a subset of $\mathbb{R}^n$. Note that $0 \in V_A(x)$ if and only if $x \in A$. One may then measure the distance from $x$ to $A$ by the Euclidean distance $d(0, V_A(x))$ from $0$ to $V_A(x)$. Now, it is easily seen that
$$\varphi^c_A(x) = d\big(0, V_A(x)\big) = \inf_{y \in V_A(x)} |y|. \qquad (4.5)$$
Namely, if $d(0, V_A(x)) \le r$, there exists $z$ in $V_A(x)$ with $|z| \le r$. Let $a \in \mathbb{R}^n_+$ with $|a| = 1$. Then
$$\inf_{y \in V_A(x)} \langle a, y\rangle \le \langle a, z\rangle \le |z| \le r.$$
Since
$$\inf_{y \in V_A(x)} \langle a, y\rangle = \inf_{s \in U_A(x)} \langle a, s\rangle = d_a(x, A), \qquad (4.6)$$
it follows that

$\varphi^c_A(x) \le r$. Conversely, let $z \in V_A(x)$ be such that $|z| = d(0, V_A(x))$ and set $a = z/|z|$. Let $y \in V_A(x)$. By convexity, for every $\lambda \in [0,1]$, $\lambda y + (1-\lambda) z \in V_A(x)$, so that
$$\big|z + \lambda(y - z)\big|^2 = \big|\lambda y + (1-\lambda) z\big|^2 \ge |z|^2.$$
Hence, as $\lambda \to 0$, $\langle y - z, z\rangle \ge 0$, that is,
$$\langle a, y\rangle \ge |z| = d\big(0, V_A(x)\big)$$
for every $y \in V_A(x)$. Now, by (4.6),
$$\varphi^c_A(x) \ge d_a(x, A) = \inf_{y \in V_A(x)} \langle a, y\rangle \ge d\big(0, V_A(x)\big),$$
which is the claim.

The next theorem extends the concentration (4.4) to this uniformity with dimension free bounds.

Theorem 4.6. For every subset $A$ of $X = \Omega_1 \times \cdots \times \Omega_n$, and every product probability $P$ on $X$,
$$\int e^{(\varphi^c_A)^2/4}\, dP \le \frac{1}{P(A)}.$$
In particular, for every $r \ge 0$,
$$P\big(\{\varphi^c_A \ge r\}\big) \le \frac{1}{P(A)}\, e^{-r^2/4}.$$
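Before turning to the proof, the identity (4.5) can be checked numerically on a tiny instance. The example below (with hypothetical sets, and grid searches in place of exact optimization) finds both sides equal to $1/\sqrt{2}$:

```python
from itertools import product
from math import cos, sin, pi, sqrt

# Illustrative instance: Omega = {0,1}, n = 2, x = (0,0), A = {(1,0), (0,1)}.
x = (0, 0)
A = [(1, 0), (0, 1)]

# U_A(x): 0/1 patterns s such that some y in A agrees with x wherever s_i = 0.
U = [s for s in product((0, 1), repeat=2)
     if any(all(y[i] == x[i] for i in range(2) if s[i] == 0) for y in A)]

def d_zero_VA(steps=200):
    """d(0, V_A(x)) via a grid over convex combinations of the points of U."""
    best = float("inf")
    for w in product(range(steps + 1), repeat=len(U) - 1):
        if sum(w) > steps:
            continue
        lam = [wi / steps for wi in w] + [1 - sum(w) / steps]
        p0 = sum(l * s[0] for l, s in zip(lam, U))
        p1 = sum(l * s[1] for l, s in zip(lam, U))
        best = min(best, sqrt(p0 * p0 + p1 * p1))
    return best

def sup_da(steps=200):
    """sup over unit vectors a in R^2_+ of the weighted distance d_a(x, A)."""
    best = 0.0
    for k in range(steps + 1):
        a = (cos(k * pi / (2 * steps)), sin(k * pi / (2 * steps)))
        da = min(sum(ai for ai, xi, yi in zip(a, x, y) if xi != yi) for y in A)
        best = max(best, da)
    return best

assert abs(d_zero_VA() - 1 / sqrt(2)) < 1e-2
assert abs(sup_da() - 1 / sqrt(2)) < 1e-2
```

Here $U_A(x) = \{(1,0), (0,1), (1,1)\}$, and the distance from $0$ to its convex hull is attained on the segment joining $(1,0)$ and $(0,1)$.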

The general scheme of proof is by induction on the number of coordinates, together with a geometric argument involving projections and sections onto the lower dimensional subspaces. The main difficulty in this type of statement is to find the adapted recurrence hypothesis, expressed here by the exponential integral inequalities. For simplicity in the notation, we assume that we are given a probability space $(\Omega, \mu)$ and that $P$ is the $n$-fold product $\mu^n$ of $\mu$ on $X = \Omega^n$. This is no loss of generality. Since the crucial inequalities will not depend on $n$, we need simply to work on products of $(\widetilde\Omega, \widetilde\mu) = \big(\prod_{i=1}^n \Omega_i, \bigotimes_{i=1}^n \mu_i\big)$ with itself and consider the coordinate maps
$$\widetilde x = (\widetilde x_1, \ldots, \widetilde x_n) \in \widetilde\Omega^{\,n} \longmapsto \widetilde x_i \in \widetilde\Omega, \qquad \widetilde x_i = \big((\widetilde x_i)_1, \ldots, (\widetilde x_i)_n\big),$$
that only depend on the $i$-th factor.

Proof of Theorem 4.6. The case $n = 1$ is easy. To go from $n$ to $n+1$, let $A$ be a subset of $\Omega^{n+1}$ and let $B$ be its projection on $\Omega^n$. Let furthermore, for $\omega \in \Omega$, $A(\omega)$ be the section of $A$ along $\omega$. If $x \in \Omega^n$ and $\omega \in \Omega$, set $z = (x, \omega)$. The key observation is the following: if $s \in U_{A(\omega)}(x)$, then $(s, 0) \in U_A(z)$, and if $t \in U_B(x)$, then $(t, 1) \in U_A(z)$. It follows that if $\xi \in V_{A(\omega)}(x)$, $\eta \in V_B(x)$ and $0 \le \lambda \le 1$, then $\big(\lambda \xi + (1-\lambda)\eta,\ 1-\lambda\big) \in V_A(z)$. By the description (4.5) of $\varphi^c_A$ and convexity of the square function,
2 (1 ; )2 + 'c + (1 ; ) 2 A (z ) (1 ; )2 + j j2 + (1 ; )j j2 :

Hence,
2 (1 ; )2 + 'c (x)2 + (1 ; )'c (x)2 : 'c A (z ) B A(!)

Now, by Holder's inequality and the induction hypothesis, for every ! in ,


Z

e('A(x !)) =4dP (x) n


2

e(1; )2=4

2 1 e(1; ) =4 P (A (!))

c 2 e('A(!) ) =4 dP

1 P (B)

1;

B )2 =4 dP e('c

1;

that is,
Z

e('A(x !)) =4 dP (x)


2

1 e(1; )2=4 P (A(!)) ; : P (B) P (B) 106

Optimize now in . To this aim, use that, for every u 2 0 1],


; e(1; ) =4 inf u 2 0 1]
2

2 ; u:

(4:7)

Taking = 1 + 2 log u if u e;1=2 and = 0 otherwise, and taking logarithms, it su ces to show that log(2 ; u) + log u + (log u)2 0 that is established through elementary calculus. Therefore, by (4.7),
Z

e('A(x !)) =4 dP (x) n


2

1 2 ; P (A(!)) : P (B) P (B) 1 2 ; P (A) P (B) P (B) 1 P (A)

To conclude, integrate in ! so that, by Fubini's theorem,


Z

n+1

e('A(x !)) =4 dP (x)d (!)


2

since u(2 ; u) 1 for every real number u. Theorem 4.6 is established. The strength of the functional 'c A (x) with respect to the Hamming metric is that all choices of ai's (depending upon x) are possible. This makes Theorem 4.6 a principle of considerable power in applications, as will be further demonstrated in Chapters 7 and 8. To illustrate it, let = f0 1g and the probability measure that gives mass p to 1. Denote by P = n the product measure of on n. Consider a subset A of n that is hereditary in the sense that if x = (x1 : : : xn ) 2 A and if y = (y1 : : : yn ) 2 n is such that yi xi for all i = 1 : : : n, then y 2 A. For x 2 n, set J = f1 i n xi = 1g and N (x) = Card(J ). Choosing ai = 1 if i 2 J and 0 otherwise shows that

d(x A) 'c A (x) N (x)


107

where d(x A) is the usual Hamming metric from x to A. Therefore, for every r s > 0,
;1=2 g + P y : N (y ) > s P f'c A rs 1 e;r2 =4s + P ; y : N (y) > s : P (A) Since the last term is very small for s > pn, we see that Theorem 4.6 produces the correct order (pn);1 in the coe cient of r2 . Since the convex hull functional 'c A (x) is not de ned by a distance between points but rather by a distance between points and sets, its application to Lipschitz functions requires some care. Consider namely a function F : X ! R on the product space X= 1 n such that for some C > 0 and every x 2 X there exists a = a(x) 2 Rn + with jaj = 1 such that for every y 2 X , F (x) F (y) + Cda(x y): (4:8) The important feature in the extension provided by Theorem 4.6 is that a may depend on x. Let then A = fF mg. By (4.8) and the de nition of 'c A (x), it is plain that F (x) yinf F (y) + Cda(x A) m + C'c A(x) 2A

P d( A) r

for every x. Hence, by Theorem 4.6, for every r 0, ; ; 1 e;r2 =4 : P fF m + Crg P f'c A rg P (A) In other words,

P fF mg P fF m + rg e;r2 =4C2 r 0: (4:9) Choose then successively m = mF and m = mF ; r where mF is a median of F for P to get the following result. Corollary 4.7. Let P be a product measure on X and let F : X ! R satisfy (4.8) for some constant C > 0. Then, for every r 0, ; P jF ; mF j r 4 e;r2 =4C2
108

where mF is a median of F for P . Corollary 4.7 applies similarly if (4.8) is replaced by F (y) F (x) + Cda(x y): The following is a further important consequence that actually motivated the abstract statement of Theorem 4.6. Namely, it is easy to check that if = 0 1] and if dB is the Euclidean distance to the convex hull B = Conv(A) of A, P then dB 'c A. n k k k Indeed, given x 2 0 1] , for any y 2 A, 0, k = 1,

dB (x)

x;

n X i=1 i=1

k yk k (xi ; y k ) i k1
2 1=2 2 1=2

n X X k

fxi 6=yik g

since xi yik 2 0 1]. The conclusion follows from the combinatorial description (4.5) of 'c A (x). Corollary 4.8. For any product probability measure P on 0 1]n, and any measurable set A 0 1]: Z 2 edB =4 dP P (1A) where B is the convex hull of A. Let then F be a 1-Lipschitz convex function on 0 1]n. Set A = fF mg where m 2 R. Since F is convex, A is convex. For every x 2 0 1]n, such that d(x A) < r, F (x) < m + r. Hence, by Corollary 4.8, ; ; P fF mg P fF m + rg e;r2 =4 : Choosing successively as before m = mF and m = mF ; r where mF is a median of F for P , we get the following important consequence of Theorem 4.6. 109

Corollary 4.9. For every product probability P on 0 1]n, every n


convex 1-Lipschitz function F on R , and every r 0,

P jF ; mF j r

4 e;r2 =4 :

Corollary 4.9 extends to probability measures i supported on ai bi ], i = 1 : : : n. In particular, if P is a product measure on a b]n and if F is convex and 1-Lipschitz on Rn, for every r 0,

P jF ; mF j r

2 2 4 e;r =4(b;a) :

(4:10)

By Proposition 1.8, we may replace the median by the mean up to numerical constants. The numerical constant 4 in the exponent may be improved to get close to the best possible value 2. This concentration inequality is very similar to Gaussian concentration with however F convex. This convexity assumption cannot be omitted in general as shown by the following example. Let P be uniform product measure on f0 1gn and assume that n = 2k is even. Let P A = x 2 f0 1gn n i=1 xi k and F (x) = d(x A), x 2 Rn where d is Euclidean p Pn metric. Clearly kF kLip = 1 and 0 is a median of F . Now, if i=1(2xi ; 1) k n such that for some > 0 to be speci ed below, for all y 2 f 0 1 g Pn i=1 yi k , 2 k Hence d(x A) that
n p

n X i=1

(xi ; yi )

n X i=1

jxi ; yi j2:

2k

1=4 . It follows from the central limit theorem

x d(x A)

n 2 2

1=4 o

n 1 X xp n i=1 (2xi ; 1) p2

1 4

110

for > 0 su ciently small independent of n. Therefore concentration is not available. A typical application of Corollary 4.9 concerns supremum of linear functionals. In a probabilistic language, consider independent real-valued random variables Y1 : : : Yn on some probability space ( A P) such that for real numbers u1 : : : un, ui Yi ui + 1, i = 1 : : : n. Set

Z = sup

n X

t2T i=1

ti Yi

(4:11)

where T is a (countable) family of vectors t = (t1 : : : tn) 2 Rn such that supt2T jtj = < 1. Applying Corollary 4.9 to

F (x) = sup

n X

t2T i=1

ti xi

x = (x1 : : : xn) 2 Rn

under the product measure of the laws of Yi ; ui, i = 1 : : : n, yields the following consequence that has to be compared with (1.23) for T reduced to one point. Corollary 4.10. Let Z be as in (4.11) and denote by mZ a median of Z . Then, for every r 0,
; P

jZ ; mZ j r

4 e;r2 =4 2 :

Theorem 4.6 may be generalized along the same lines of proof to non-quadratic distances. One possibility is to measure the distance to VA (x) in another way. To this task, given 0, set

'A (x) = y2inf V (x)


A

n X i=1

(yi )

where

u (1 ; u) = u log u ; (1 + u) log 11+ +


111

u 2 0 1]: (4:12)

The reader should observe that 'A corresponds in the preceding 2 c notation to ('c A ) rather than to 'A . n , and Theorem 4.11. For every subset A of X = 1 every product probability P on n, Z 1 : e'A dP P (A) In particular, for every r 0, ; 1 e;r : P f'A rg P (A ) We refer to Tal7] for a proof of Theorem 4.11 along the lines of the proof of Theorem 4.6. One striking aspect of this result is that the form of the function is exactly determined by the optimization (4.7) when dealing with the extra parameter . We present a proof of Theorem 4.11 with the tool of information inequalities in Chapter 6.

4.3 Control by several points


The preceding convex hull approximation suggest the possibility of various notions of enlargements in product spaces that go outside the scope of distances. The following is a control by a nite number of points that is not metric. As in the previous section, let ( i i i ), i = 1 : : : n, be arbitrary probability spaces and let P = 1 n be the product measure on X = 1 n . If q is an integer 2 and if A1 : : : Aq are subsets of n , let, for every x = (x1 : : : xn) in n , 'q (x) = 'q 1 A ::: Aq (x) be de ned by

'q (x) = inf k 0 9 y1 2 A1 : : : 9 yq 2 Aq such that Card 1 i n xi 2 = fyi1 : : : yiq g k : (We agree that 'q = 1 if one of the Ai 's is empty.) If, for every i = 1 : : : n, Ai = A for some A n, 'q (x) k means that the
112

coordinates of x may be copied, at the exception of k of them, by the coordinates of q elements in A. Using again a proof by induction on the number of coordinates, we establish the following result. Theorem 4.12. Under the previous notations,
Z

q'q (x) dP (x)

q Y i=1

1 : P (Ai ) 1 : P (Ai )

In particular, for every integer k,

f'q

kg

q ;k

q Y i=1

Since log u u ; 1, u 0, it su ces to show that Z 1 d + qZ gd = Z 1 + qg d q + 1: g g

Proof. We may assume that X = n is the n-fold product of a probability space ( ) with the product measure P = n. One 1 rst observes that if g is a function on such that q g 1, then Z 1 d Z gd q 1: (4:13) g

1 + qu q + 1 for 1 u 1. But this is obvious since u q Let gi, i = 1 : : : q, be functions on such that 0 gi 1. 1 = min(q min1 i q 1 ) yields Applying (4.13) to g given by g gi
Z

1 d min q 1min i q gi

q Z Y i=1

gi d

(4:14)

since gi g for every i = 1 : : : q. We prove the theorem by induction over n. If n = 1, the result follows from (4.13) by taking gi = 1Ai . Assume Theorem 4.12 has been proved for n and let us prove it for n + 1. Consider 113

sets A1 : : : Aq of n+1. For ! 2 , consider Ai (!), i = 1 : : : q, as well as the projections Bi of Ai on n, i = 1 : : : q. Note that if we set gi = P (Ai (!))=P (Bi ) in (4.14) we get by Fubini's theorem that
Z

q Y 1 1 min q P (Bi ) 1min ij ) d j q P ( C i=1 i=1 q Y i=1 P

q Y

(4:15) (Ai )

where C ij = Bi if i 6= j and C ii = Ai (!). The basic observation is now the following: for (x !) 2 n ,
q 'q A1 ::: Aq (x ! ) 1 + 'B 1 ::: B q (x)

and, for every 1 j q,


q 'q A1 ::: Aq (x ! ) 'C 1j ::: C qj (x):

It follows that
Z Z Z

'A1 ::: Aq (x !)dP (x)d (! ) q n+1


;

min q q'B1 ::: Bq (x) 1min q'C1j ::: Cqj (x) dP (x)d (!) j q n+1 min q
Z

'B1 ::: Bq (x)dP (x) q n

q Y 1 1 min q min i ij ) d (! ) 1 j q P ( B ) P ( C i=1 i=1 by the recurrence hypothesis. The conclusion follows from (4.15) and the proof of Theorem 4.12 is thus complete. In the applications, q is usually xed, for example equal to 2. Theorem 4.12 then shows how to control, with a xed subset
Z

q Y

min 1 j q

'C 1j ::: C qj (x) dP (x) d (!) q n

114

A, arbitrary samples with an exponential decay of the probability in the number of coordinates which are neglected. Let us consider a class of functions to which this property may be applied. For Q I f1 : : : ng, set I = i2I i and denote by the collection of all I 's, I f1 : : : ng. If x = (x1 : : : xn) 2 1 n and I f1 : : : ng, we denote similarly xI = (xi )i2I 2 I . A function F : ! R+ is said to be monotone if F (xI ) F (xJ ) whenever I J . F is said to be subadditive if for I J disjoint in f1 : : : ng and xI J 2 I J , ; F xI J F (xI ) + F (xJ ): Typical examples of monotone and subadditive functions are the following. If i = 0 1) for every i = 1 : : : n, set F (xI ) = P i2I xi . If i are subsets of a normed space (X k k), put F (xI ) = E
X

where "i are independent symmetric Bernoulli random variables. Corollary 4.13. Let P be any product measure on X = 1 n and let F be monotone and subadditive on . We denote in the same way the restriction of F to X . Let m be such that 1 . Then, for every integers k q 1, and every P (fF mg) 2 r 0, ; ; P fF qm + rg 2q q;(k+1) + P x I Card( max F (xI ) r : I) k
q k g, Proof. Let A = fF mg and let 'q = 'q A ::: A . If x 2 f' there exist y1 : : : yq in A such that Card (I ) k where I = 1 i n xi 62 fyi1 : : : yiq g : Take then a partition (Jj )1 j q of I c = f1 : : : ng n I such that xi = yij if i 2 Jj . Then, by monotonicity and subadditivity,

i2I

"ixi

F (xI c )

q X j =1

j yJ j

q X j =1

j F yf 1 ::: ng

qm:

115

The conclusion immediately follows from Theorem 4.12. Applied to a sum of non-negative independent random variables yields the following. Corollary 4.14. Let Y1 : : : Yn be non-negative independent random variables on some probability space ( A P) and set S = 1 and denote by Y1 + + Yn. Let m be such that P(fS mg) 2 fY1 : : : Yn g is the non-increasing rearrangement of the sample fY1 : : : Yng. Then, for every integers k q 1, and every r 0,
; P

fS qm + rg

2q q;(k+1) + P

k X i=1

Yi

If the random variables are bounded by some C > 0, k is usually chosen of the order of Cr so that the second term on the right-hand side of the inequality of Corollary 4.14 cancels. Let T be a family of n-tuples t = (t1 : : : tn), ti 0. It is plain that the preceding argument leading to Corollary 4.14 applies in the same way to n X S = sup ti Yi to yield
; P

t2T i=1

fS qm + rg

2q q;(k+1) + P

k X i=1

Yi

(4:16)

where = supfti 1 i n t 2 T g. The case of random variables with arbitrary signs will be studied in this way in Chapter 7 with the tool of (probabilistic) symmetrization.

4.4 Convex in mum-convolution


In this section, we develop the tool of in mum-convolution inequalities introduced in Section 1.6 to product measures to recover some of the results of Section 4.2. Recall from Section 1.6 116

that given a measurable non-negative function c on a topological vector space X and a real-valued measurable function f on X , we denote by Qcf the in mum-convolution of f with respect to the cost c by

Qcf (x) = yinf f (y) + c(x ; y) 2X

x 2 X:

If is a probability measure on the Borel sets of X , and c a non-negative measurable cost function on X , recall we say that satis es an in mum-convolution inequality with respect to the cost c if for all bounded measurable functions f on X ,
Z

eQc f d

e;f d

1:

(4:17)

We have seen in Proposition 1.18 and (1.30) how (4.17) is simply related to concentration by the fact that for every Borel set A and every r 0, ; 1 e;r : 1 ; A + fc < rg (4:18) (A) As a consequence of Proposition 1.19, each time a measure satis es the in mum-convolution inequality (4.17), the product measure n on X n satis Pn es the concentration inequality (4.17) with the cost function i=1 c(xi ), x1 : : : xn 2 X . We apply this observation in the context of the in mumconvolution inequality for convex functions. Let us say that a probability measure on the Borel sets of X satis es a convex in mum-convolution inequality with respect to a convex cost c if for all bounded measurable convex functions f on X ,
Z

eQc f d

e;f d

1:

(4:19)

It is easy to see that the convex in mum-convolution inequality also satis es the product property of Proposition 1.19. In the following statement, X1 : : : Xn is a family of normed vector spaces and a point x in the product space X = X1 Xn will be 117

denoted $x = (x_1, \dots, x_n)$. The norms on the various $X_i$ will all be denoted in the same way, $\|\cdot\|$.

Theorem 4.15. Let $X_1, \dots, X_n$ be a family of normed vector spaces and, for each $i = 1, \dots, n$, let $\mu_i$ be a probability measure supported by a set of diameter less than or equal to $1$ on $X_i$. Then the product probability measure $P = \mu_1 \otimes \cdots \otimes \mu_n$ satisfies the convex infimum-convolution inequality on $X$ with respect to the cost function $c(x) = \frac14 \sum_{i=1}^n \|x_i\|^2$, $x \in X$.

Proof. By the product property of the convex infimum-convolution inequality, it is enough to deal with a single probability measure $\mu$ supported by a set $A$ of diameter less than or equal to $1$ on a normed space $(X, \|\cdot\|)$. Let $f$ be a convex function on $A$ for which we may assume without loss of generality that $\inf f(A) = 0$. Let $x \in A$, $\varepsilon > 0$ and $a \in A$ be such that $f(a) \le \varepsilon$. For $\lambda \in [0,1]$ and $y = \lambda x + (1-\lambda)a$, by convexity,
\[ Q_c f(x) \le f(y) + \frac14 \|x - y\|^2 \le \lambda f(x) + (1-\lambda)\varepsilon + \frac14 (1-\lambda)^2 \]
since $\mathrm{Diam}(A) \le 1$. Choosing the optimal $\lambda$ (and letting $\varepsilon \to 0$), we deduce that $Q_c f(x) \le \theta(f(x))$ where
\[ \theta(u) = \begin{cases} u - u^2 & \text{if } 0 \le u \le \tfrac12,\\[2pt] \tfrac14 & \text{if } u \ge \tfrac12. \end{cases} \]
One can check that $e^{\theta(u)} \le 2 - e^{-u}$ for every $u \ge 0$. Indeed, for $0 \le u \le \frac12$,
\[ \frac12 \big( e^{u - u^2} + e^{-u} \big) = e^{-u^2/2} \cosh\Big( u - \frac{u^2}{2} \Big) \le 1 \]
(using $\cosh v \le e^{v^2/2}$), while for $u \ge \frac12$, $e^{-u} \le e^{-1/2} \le 2 - e^{1/4}$. It follows that
\[ \int e^{Q_c f}\, d\mu \le \int \big( 2 - e^{-f} \big)\, d\mu = 2 - \int e^{-f}\, d\mu \le \Big( \int e^{-f}\, d\mu \Big)^{-1}, \]
since $2 - v \le v^{-1}$ for every $v > 0$, which proves the theorem.

The simple-minded Theorem 4.15 has strong consequences. We may indeed recover in this way Corollary 4.8, and thus the main Corollary 4.9. Let indeed $X_i = \mathbb{R}$ for every $i = 1, \dots, n$ and probability measures $\mu_i$ supported by $[0,1]$. Given a convex set $A$, apply Theorem 4.15 to the (convex) function $f$ equal to $0$ on $A$ and to $+\infty$ outside. Since $Q_c f = \frac14\, d(\cdot, A)^2$, the claim follows.
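The two elementary inequalities invoked at the end of the proof, $e^{\theta(u)} \le 2 - e^{-u}$ and $2 - v \le v^{-1}$, are easy to verify numerically. The following sketch (plain Python; the function names are ours, not from the text) checks both on a grid:

```python
import math

def theta(u):
    # the function theta appearing in the proof of Theorem 4.15
    return u - u**2 if u <= 0.5 else 0.25

def check_theta_bound(n_points=2000, u_max=20.0):
    # verify e^{theta(u)} <= 2 - e^{-u} on a grid of u >= 0
    for k in range(n_points + 1):
        u = u_max * k / n_points
        if math.exp(theta(u)) > 2.0 - math.exp(-u) + 1e-12:
            return False
    return True

def check_two_minus_v(n_points=2000):
    # verify 2 - v <= 1/v for v in (0, 1], used in the last step of the proof
    for k in range(1, n_points + 1):
        v = k / n_points
        if 2.0 - v > 1.0 / v + 1e-12:
            return False
    return True
```

Both checks are tight at $u = 0$ and $v = 1$ respectively, which is why a small tolerance is allowed.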
4.5 The exponential measure
In this section, we consider a special concentration property of the exponential measure. Let $\nu^n$ be the product measure on $\mathbb{R}^n$ when each factor is endowed with the measure $\nu$ of density $\frac12 e^{-|x|}$ with respect to Lebesgue measure. Denote by $B_2$ the Euclidean unit ball and by $B_1$ the $\ell^1$-unit ball in $\mathbb{R}^n$, i.e.
\[ B_1 = \Big\{ x = (x_1, \dots, x_n) \in \mathbb{R}^n ;\ \sum_{i=1}^n |x_i| \le 1 \Big\}. \]

Theorem 4.16. For every Borel set $A$ in $\mathbb{R}^n$ and every $r > 0$,
\[ 1 - \nu^n\big( A + 6\sqrt{r}\, B_2 + 9r B_1 \big) \le \frac{1}{\nu^n(A)}\, e^{-r}. \]

A striking feature of Theorem 4.16 is that it may be used to improve some aspects of concentration for Gaussian measures, especially for cubes. Consider indeed the increasing map $\Phi : \mathbb{R} \to \mathbb{R}$ that transforms $\nu$ into the one-dimensional canonical Gaussian measure $\gamma$. It is a simple matter to check that
\[ \big| \Phi(x) - \Phi(y) \big| \le K \min\big( |x - y|, |x - y|^{1/2} \big), \quad x, y \in \mathbb{R}, \tag{4.20} \]
for some numerical constant $K > 0$. The map $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ defined by $\Phi(x) = (\Phi(x_i))_{1 \le i \le n}$ transforms $\nu^n$ into $\gamma^n$. Consider now a Borel set $A$ of $\mathbb{R}^n$ such that $\gamma^n(A) \ge \frac12$. Then
\[ \gamma^n\Big( \Phi\big( \Phi^{-1}(A) + 6\sqrt{r}\, B_2 + 9r B_1 \big) \Big) = \nu^n\big( \Phi^{-1}(A) + 6\sqrt{r}\, B_2 + 9r B_1 \big) \ge 1 - 2\, e^{-r}. \]
However, it follows from (4.20) that
\[ \Phi\big( \Phi^{-1}(A) + 6\sqrt{r}\, B_2 + 9r B_1 \big) \subset A + K' \sqrt{r}\, B_2. \]
Thus Theorem 4.16 implies concentration for $\gamma^n$. To actually illustrate the improvement for some classes of sets, let
\[ A = \Big\{ x \in \mathbb{R}^n ;\ \max_{1 \le i \le n} |x_i| \le m \Big\} \]
where $m = m(n)$ is chosen so that $\gamma^n(A) \ge \frac12$ (and hence $m(n)$ is of order $\sqrt{\log n}$). Then, when $r \ge 1$ is very small compared to $\log n$, it is easily seen that actually
\[ \Phi\big( \Phi^{-1}(A) + 6\sqrt{r}\, B_2 + 9r B_1 \big) \subset A + C_1 \Big( \frac{6\sqrt{r}}{\sqrt{\log n}}\, B_2 + \frac{9r}{\sqrt{\log n}}\, B_1 \Big) \subset A + C_2 \sqrt{\frac{r}{\log n}}\, B_2. \]
To establish Theorem 4.16, we show that the exponential measure satisfies the infimum-convolution inequality (4.17) for a cost that gives rise to the enlargement $6\sqrt{r}\, B_2 + 9r B_1$ through (4.18). By Proposition 1.19, it will be enough to deal with dimension one. By a further symmetrization argument, we need only consider the one-sided exponential measure with density $e^{-x}$ with respect to Lebesgue measure on $\mathbb{R}_+$.

Proposition 4.17. The probability measure with density $e^{-x}$ on $\mathbb{R}_+$ satisfies the infimum-convolution inequality with cost function $c(x - y)$ on $\mathbb{R} \times \mathbb{R}$, where
\[ c(x) = \begin{cases} \dfrac{x^2}{18} & \text{if } |x| \le 2,\\[4pt] \dfrac{2}{9} \big( |x| - 1 \big) & \text{if } |x| > 2. \end{cases} \]
Proof. Let $f$ be a bounded measurable function on $\mathbb{R}_+$. Let
\[ J_0 = \int e^{-f(x) - x}\, dx \quad \text{and} \quad J_1 = \int e^{Q_c f(y) - y}\, dy. \]
For $0 < t < 1$, define $x(t)$ and $y(t)$ by the relations
\[ \int_0^{x(t)} e^{-f(x) - x}\, dx = t J_0 \quad \text{and} \quad \int_0^{y(t)} e^{Q_c f(y) - y}\, dy = t J_1. \]
By differentiation,
\[ x'(t) = J_0\, e^{f(x(t)) + x(t)} \quad \text{and} \quad y'(t) = J_1\, e^{-Q_c f(y(t)) + y(t)}. \]
Here $y'$ is the usual derivative while $x'$ is understood in the weak sense of the Sobolev space $H^1$. Since
\[ Q_c f\big( y(t) \big) \le f\big( x(t) \big) + c\big( x(t) - y(t) \big), \]
we get
\[ y'(t) \ge J_1\, e^{-f(x(t)) - c(x(t) - y(t)) + y(t)}. \]
Let now $z(t) = \frac12 \big( x(t) + y(t) \big) - c\big( x(t) - y(t) \big)$. We have
\[ z'(t) = \Big( \frac12 - c'\big( x(t) - y(t) \big) \Big)\, x'(t) + \Big( \frac12 + c'\big( x(t) - y(t) \big) \Big)\, y'(t). \]
Now $|c'| \le \frac12$. Writing $x$ for $x(t)$ and $y$ for $y(t)$, and using the inequality $au + a^{-1} v \ge 2\sqrt{uv}$ with $a = e^{f(x)}$, we get
\[ \begin{aligned} z'(t) &\ge \Big( \frac12 - c'(x - y) \Big) J_0\, e^{x + f(x)} + \Big( \frac12 + c'(x - y) \Big) J_1\, e^{-c(x - y) + y - f(x)}\\ &\ge \sqrt{1 - 4 c'(x - y)^2}\; \sqrt{J_0 J_1}\; e^{\frac12 (x + y) - \frac12 c(x - y)}\\ &= \sqrt{J_0 J_1}\; e^{z(t)}\; \sqrt{1 - 4 c'(x - y)^2}\; e^{\frac12 c(x - y)}. \end{aligned} \]
Now $\big( 1 - 4 c'(u)^2 \big) e^{c(u)} \ge 1$ for every $u \in \mathbb{R}$. Since $c$ is even, it is enough to check this for $u \ge 0$. For $u \ge 2$, $c'$ is constant and $c$ is increasing. For $0 \le u \le 2$, it reduces to the elementary inequality $e^{-v/18} \le 1 - 4v/81$ for $0 \le v \le 4$. As a consequence, it follows that
\[ e^{-z(t)}\, z'(t) \ge \sqrt{J_0 J_1}, \]
which, after integration between $0$ and $1$, yields the result. Proposition 4.17 is established.
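The key pointwise inequality $(1 - 4 c'(u)^2)\, e^{c(u)} \ge 1$ can be confirmed numerically. Here is a minimal sketch (our own helper names; tolerances are ours) checking it on a grid:

```python
import math

def c(u):
    # cost function of Proposition 4.17
    u = abs(u)
    return u * u / 18.0 if u <= 2.0 else (2.0 / 9.0) * (u - 1.0)

def c_prime(u):
    # derivative of c (away from u = 0); note |c'| <= 2/9 < 1/2
    s = 1.0 if u >= 0 else -1.0
    return u / 9.0 if abs(u) <= 2.0 else s * 2.0 / 9.0

def check_key_inequality(n_points=4000, u_max=50.0):
    # verify (1 - 4 c'(u)^2) e^{c(u)} >= 1 on a grid of u >= 0 (c is even)
    for k in range(n_points + 1):
        u = u_max * k / n_points
        if (1.0 - 4.0 * c_prime(u) ** 2) * math.exp(c(u)) < 1.0 - 1e-12:
            return False
    return True
```

The inequality is an equality at $u = 0$ and is nearly tight at $u = 2$, matching the reduction $e^{-v/18} \le 1 - 4v/81$ on $0 \le v \le 4$.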

To complete the proof of Theorem 4.16, observe first that Proposition 4.17 holds, up to numerical constants, for the two-sided exponential measure. By Proposition 1.19, the product $\nu^n$ of the two-sided exponential measure on $\mathbb{R}^n$ satisfies the infimum-convolution inequality with cost $\sum_{i=1}^n c(x_i)$, $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. It is then a mere exercise to check that
\[ \Big\{ x \in \mathbb{R}^n ;\ \sum_{i=1}^n c(x_i) < r \Big\} \subset 6\sqrt{r}\, B_2 + 9r B_1, \]
from which the conclusion follows by (4.18).

As in Chapter 1, the concentration inequality on sets of Theorem 4.16 may be translated equivalently on functions. This is the content of the next proposition.

Proposition 4.18. Let $F$ be a real-valued function on $\mathbb{R}^n$ such that $\|F\|_{\mathrm{Lip}} \le a$ and such that its Lipschitz coefficient with respect to the $\ell^1$-metric is less than or equal to $b$, that is,
\[ \big| F(x) - F(y) \big| \le b \sum_{i=1}^n |x_i - y_i|, \quad x, y \in \mathbb{R}^n. \]
Then, for every $r \ge 0$,
\[ \nu^n\big( \{ F \ge M + r \} \big) \le K \exp\Big( -\frac{1}{K} \min\Big( \frac{r}{b}, \frac{r^2}{a^2} \Big) \Big) \tag{4.21} \]
for some numerical constant $K > 0$, where $M$ is either a median of $F$ for $\nu^n$ or its mean. Conversely, if the latter holds, the concentration result of Theorem 4.16 holds.

The inequality of Proposition 4.18 extends in the appropriate sense the case of linear functions $F$. By Rademacher's theorem, the hypotheses on $F$ are equivalent to saying that $F$ is almost everywhere differentiable with
\[ \sum_{i=1}^n |\partial_i F|^2 \le a^2 \quad \text{and} \quad \max_{1 \le i \le n} |\partial_i F| \le b \]
almost everywhere.

Proof. Apply first Theorem 4.16 to $A = \{ F \le m_F \}$ where $m_F$ is a median of $F$ for $\nu^n$, and note that, by the Lipschitz bounds on $F$,
\[ A + 6\sqrt{r}\, B_2 + 9r B_1 \subset \big\{ F \le m_F + 6a\sqrt{r} + 9br \big\}. \]
The positive solution $s$ of the equation $6a\sqrt{s} + 9bs = r$ behaves, up to numerical constants, like $\min\big( \frac{r}{b}, \frac{r^2}{a^2} \big)$. This yields the inequality of the proposition for the median. Using a routine argument (cf. Proposition 1.8), the deviation inequalities from either the median or the mean are equivalent up to numerical constants (with possibly a further constant in front of the exponential function).

Conversely, for $A \subset \mathbb{R}^n$ and $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, set
\[ F_A(x) = \inf_{z \in A} \sum_{i=1}^n \min\big( |x_i - z_i|, |x_i - z_i|^2 \big). \]
For $r > 0$, set further $F = \min(F_A, r)$. We have $\sum_{i=1}^n |\partial_i F|^2 \le 4r$ and $\max_{1 \le i \le n} |\partial_i F| \le 2$ almost everywhere. Indeed, it is enough to prove this result for $G = \min(G_z, r)$ for every fixed $z$, where
\[ G_z(x) = \sum_{i=1}^n \min\big( |x_i - z_i|, |x_i - z_i|^2 \big). \]
Now, almost everywhere, and for every $i = 1, \dots, n$, $|\partial_i G_z(x)| \le 2 |x_i - z_i|$ if $|x_i - z_i| \le 1$, whereas $|\partial_i G_z(x)| \le 1$ if $|x_i - z_i| > 1$. Therefore $\max_{1 \le i \le n} |\partial_i G_z(x)| \le 2$ and
\[ \sum_{i=1}^n \big( \partial_i G_z(x) \big)^2 \le 4 \sum_{i=1}^n \min\big( |x_i - z_i|, |x_i - z_i|^2 \big) = 4 G_z(x), \]
which yields the announced claim. Now, if $\nu^n(A) \ge \frac12$, $0$ is a median of $F$, so that, by the hypothesis with $M$ the median,
\[ \nu^n\big( \{ F_A \ge r \} \big) = \nu^n\big( \{ F \ge r \} \big) \le K\, e^{-r/4K}. \]
Since $\{ F_A < r \} \subset A + \sqrt{r}\, B_2 + r B_1$, this amounts to the inequality of Theorem 4.16.
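The "mere exercise" inclusion $\{\sum_i c(x_i) < r\} \subset 6\sqrt{r}\, B_2 + 9r B_1$ used in the proof of Theorem 4.16 rests on splitting $x$ into its small and large coordinates. A hedged numerical sketch of that splitting (our own construction, not from the text):

```python
import math
import random

def c(u):
    # cost function of Proposition 4.17
    u = abs(u)
    return u * u / 18.0 if u <= 2.0 else (2.0 / 9.0) * (u - 1.0)

def check_inclusion(trials=200, n=8, seed=0):
    # For x with sum_i c(x_i) < r, split x into coordinates with |x_i| <= 2
    # (Euclidean part) and |x_i| > 2 (l1 part); check the first part lies in
    # 6*sqrt(r)*B2 and the second in 9r*B1, witnessing the Minkowski sum.
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-6.0, 6.0) for _ in range(n)]
        r = sum(c(u) for u in x) + 1e-9  # so that sum_i c(x_i) < r
        small = math.sqrt(sum(u * u for u in x if abs(u) <= 2.0))
        large = sum(abs(u) for u in x if abs(u) > 2.0)
        if small > 6.0 * math.sqrt(r) + 1e-9 or large > 9.0 * r + 1e-9:
            return False
    return True
```

The constants come out because $\sum_{|x_i| \le 2} x_i^2 \le 18r$ with $\sqrt{18} < 6$, and, for $|u| > 2$, $|u| \le 2(|u| - 1) = 9\, c(u)$.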

In the final corollary, we apply Theorem 4.16 to the concentration of products of the measure $\nu_p$ with density $c_p\, e^{-|x|^p}$, $1 \le p < \infty$, with respect to Lebesgue measure on $\mathbb{R}$. We recall the $\ell^p$-unit balls $B_p$, $1 \le p < \infty$, and $B_\infty$,
\[ B_p = \Big\{ x = (x_1, \dots, x_n) \in \mathbb{R}^n ;\ \sum_{i=1}^n |x_i|^p \le 1 \Big\} \]
and
\[ B_\infty = \Big\{ x = (x_1, \dots, x_n) \in \mathbb{R}^n ;\ \max_{1 \le i \le n} |x_i| \le 1 \Big\}. \]
Denote by $\nu_p^n$ the product measure of $\nu_p$ on $\mathbb{R}^n$.

Theorem 4.19. There is a constant $K > 0$ only depending on $1 \le p < \infty$ such that, for any Borel set $A$ in $\mathbb{R}^n$ and any $r > 0$,
\[ 1 - \nu_p^n\big( A + K\big( \sqrt{r}\, B_2 + r^{1/p} B_p \big) \big) \le \frac{1}{\nu_p^n(A)}\, e^{-r/K}. \]
Proof. For $p = 1$, this is the content of Theorem 4.16. We show how this case implies the other ones. As for the comparison with Gaussian measures, consider the increasing transformation $\Phi$ from $\mathbb{R}$ to $\mathbb{R}$ that maps $\nu_1$ into $\nu_p$. Denote then by $\Phi$ the map from $\mathbb{R}^n$ into itself given by $\Phi(x) = (\Phi(x_i))_{1 \le i \le n}$, $x = (x_1, \dots, x_n) \in \mathbb{R}^n$; $\Phi$ transforms $\nu_1^n$ into $\nu_p^n$. If $\nu_p^n(A) \ge \frac12$, then $\nu_1^n(\Phi^{-1}(A)) \ge \frac12$. By Theorem 4.16,
\[ \nu_1^n\big( \Phi^{-1}(A) + 6\sqrt{r}\, B_2 + 9r B_1 \big) \ge 1 - 2\, e^{-r} \]
for every $r > 0$. Thus it suffices to show that
\[ \Phi\big( \Phi^{-1}(A) + 6\sqrt{r}\, B_2 + 9r B_1 \big) \subset A + K\big( \sqrt{r}\, B_2 + r^{1/p} B_p \big). \tag{4.22} \]
To this task, we need the following simple but a bit technical estimates that we leave to the reader (see [Tal6]). First, for all $x, y \in \mathbb{R}$,
\[ \big| \Phi(x) - \Phi(y) \big| \le K \min\big( |x - y|, |x - y|^{1/p} \big). \tag{4.23} \]
(Compare with (4.20).) Here and below, $K > 0$ denotes a constant only depending on $p$ and possibly changing from line to line. Second, define $\theta_p(u) = u^2$ for $|u| \le 1$ and $\theta_p(u) = |u|^p$ for $|u| \ge 1$, and set, for $r > 0$,
\[ V_p(r) = \Big\{ x \in \mathbb{R}^n ;\ \sum_{i=1}^n \theta_p(x_i) < r \Big\}. \]
Then
\[ \sqrt{r}\, B_2 + r B_1 \subset 12\, V_1(r), \tag{4.24} \]
while, for every $p$,
\[ V_p(r) \subset \sqrt{r}\, B_2 + r^{1/p} B_p. \tag{4.25} \]
Therefore, by (4.24) and (4.25), in order to prove (4.22), it suffices to show that, for every $r > 0$,
\[ \Phi\big( \Phi^{-1}(A) + 2 V_1(r) \big) \subset A + V_p(Kr). \]
Consider $y \in \Phi\big( \Phi^{-1}(A) + 2 V_1(r) \big)$. Thus $y = \Phi\big( \Phi^{-1}(x) + z \big)$ where $x \in A$ and $\sum_{i=1}^n \theta_1\big( \frac{z_i}{2} \big) < r$. By (4.23), $|y_i - x_i| \le K \min\big( |z_i|, |z_i|^{1/p} \big)$ for every $i$. If $|z_i| \le 1$,
\[ \theta_p\Big( \frac{y_i - x_i}{K} \Big) \le z_i^2, \]
while if $|z_i| \ge 1$, since $\theta_p$ increases,
\[ \theta_p\Big( \frac{y_i - x_i}{K} \Big) \le |z_i|. \]
Now $\theta_1(u) = \min(u^2, |u|) \le K\, \theta_1\big( \frac{u}{2} \big)$, so that
\[ \sum_{i=1}^n \theta_p\Big( \frac{y_i - x_i}{K} \Big) \le \sum_{i=1}^n \theta_1(z_i) \le K \sum_{i=1}^n \theta_1\Big( \frac{z_i}{2} \Big) \le Kr. \]
The proof is easily completed.

As in Proposition 4.18, the concentration result of Theorem 4.19 may be translated equivalently on functions. It should be noted that the argument developed in Proposition 2.8, transferring the Gaussian concentration to the spherical concentration, may be applied similarly, starting from Theorem 4.16 (or rather Proposition 4.18), to yield a concentration result on the $\ell^1$ unit ball, or sphere, in $\mathbb{R}^n$. More generally, on the basis of Theorem 4.19, if $F$ is 1-Lipschitz with respect to the $\ell^p$-norm, $1 \le p \le 2$, then for any $r \ge 0$,
\[ \nu_p^n\big( |F - M| \ge r \big) \le C \exp\big( -c \min\big( r^p,\ r^2 n^{1 - 2/p} \big) \big), \tag{4.26} \]
where $M$ is either the mean or a median. It thus yields concentration for the uniform measure on the unit ball, or sphere, of $\ell^p$ in $\mathbb{R}^n$, $1 \le p \le 2$.

Proposition 4.20. If $\mu$ is uniformly distributed on the unit sphere, or ball, of $\ell^p$ in $\mathbb{R}^n$, $1 \le p \le 2$, and if $F$ is 1-Lipschitz with respect to the $\ell^p$-norm, then
\[ \mu\big( |F - M| \ge r \big) \le C\, e^{-c n r^2}, \quad r \ge 0, \]
where $M$ is either the mean or a median of $F$ for $\mu$, and where $C, c > 0$ are constants only depending on $p$.

It should be noted that the relevant measure on the $\ell^p$-sphere when $p > 1$ is the one given by (2.24), which is different from the usual surface measure. We refer to [A-V], [Sche4] for details and proofs. In particular, we may conclude from these observations that (2.25) indeed holds for any $1 \le p < \infty$. In another direction, the paper [S-Z2] deals with Lipschitz functions with respect to the Euclidean metric.

Proposition 4.21. If $\mu$ is uniformly distributed on the unit ball, or sphere, of $\ell^p$ in $\mathbb{R}^n$, $1 \le p \le 2$, and if $F$ is 1-Lipschitz with respect to the Euclidean norm, then
\[ \mu\big( |F - M| \ge r \big) \le C\, e^{-c n r^p}, \quad r \ge 0, \]
where $M$ is either the mean or a median of $F$ for $\mu$, and where $C, c > 0$ are numerical constants.
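The two "technical estimates" (4.24)-(4.25) behind the proof of Theorem 4.19 also admit quick numerical sanity checks. The sketch below (our own helpers; the doubling constant $4$ for $\theta_1$ is our choice of $K$) verifies $\theta_1(u) \le 4\,\theta_1(u/2)$ and the small/large-coordinate splitting behind $V_p(r) \subset \sqrt{r}\, B_2 + r^{1/p} B_p$:

```python
import math
import random

def theta(p, u):
    # theta_p(u) = u^2 for |u| <= 1 and |u|^p for |u| >= 1
    u = abs(u)
    return u * u if u <= 1.0 else u ** p

def check_theta1_doubling(n_points=2000, u_max=10.0):
    # theta_1(u) <= 4 * theta_1(u/2) on a grid of u >= 0
    for k in range(n_points + 1):
        u = u_max * k / n_points
        if theta(1, u) > 4.0 * theta(1, u / 2.0) + 1e-12:
            return False
    return True

def check_vp_split(p=1.5, trials=200, n=8, seed=1):
    # if sum_i theta_p(x_i) < r, the coordinates with |x_i| <= 1 lie in
    # sqrt(r)*B2 and those with |x_i| > 1 in r^{1/p}*Bp, giving (4.25)
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        r = sum(theta(p, u) for u in x) + 1e-9
        small = math.sqrt(sum(u * u for u in x if abs(u) <= 1.0))
        large = sum(abs(u) ** p for u in x if abs(u) > 1.0) ** (1.0 / p)
        if small > math.sqrt(r) + 1e-9 or large > r ** (1.0 / p) + 1e-9:
            return False
    return True
```

Here $p = 1.5$ is an arbitrary test value; any $1 \le p < \infty$ works the same way.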

Notes and Remarks

This chapter presents various views on concentration for product measures. Lemma 4.1 goes back to [Az]. The generality of the martingale method as presented in Section 4.1 was understood by G. Schechtman [Sche1] (from which most of the results of this section are taken, see also [M-S]) after the important step by B. Maurey [Mau1] on the symmetric group (Corollary 4.3). Its first occurrence however goes back to the Yurinskii method [Yu1], [Yu2] to bound norms of sums of independent random vectors from their mean in probability in Banach spaces (cf. [Le-T]). Optimal concentration functions on the discrete cube via the martingale method are discussed in [MD1] and [Tal7]. The notes [MD2] form a complete survey of the applications of the bounded difference martingale method to algorithmic discrete mathematics. Recent developments of the method towards fluctuation results are due to O. Catoni [Ca].

It soon became apparent that this method would not yield optimal results. In [Tal1], M. Talagrand established Corollary 4.8 for uniform measure on the discrete cube. W. Johnson and G. Schechtman [J-S2] extended the argument to all measures on the discrete cube, which prompted M. Talagrand to the abstract formulation presented here as Section 4.2. There is an analogue of Theorem 4.6 on the symmetric group [Tal7], extended recently by C. McDiarmid [MD3], and presented in Section 8.2. Motivated by questions in probability in Banach spaces (cf. Chapter 7), M. Talagrand [Tal2] established simultaneously Theorem 4.12. The first proof uses delicate rearrangement and symmetrization techniques of isoperimetric flavor. The short proof presented here appears in [Tal7]. The memoir [Tal7] by M. Talagrand is a landmark paper on concentration in product spaces that presents definitive results with applications to a number of questions on norms of sums of independent random vectors and discrete algorithmic probabilities. See also [Tal8] for an introduction. The results of Sections 4.2 and 4.3 are taken from there, and some applications are developed in Chapters 7 and 8.

The application of infimum-convolutions to concentration properties of product measures was emphasized by B. Maurey in [Mau2]. The results of Section 4.4 are taken from this reference. It was again the merit of M. Talagrand [Tal3] to emphasize the concentration properties of the exponential measure. The exposition of Section 4.5 is entirely taken from the nice approach developed by B. Maurey [Mau2]. The method is however already implicit in Talagrand's work, which reaches moreover some isoperimetric statements. Theorem 4.19 is taken from [Tal6]. The papers [M-P] and [S-Z1] develop the crucial argument to transfer the measure $\nu_p^n$ to uniform measures on $\ell^p$ balls. Applications of this principle to concentration on the $\ell^p$ balls, $1 \le p \le 2$ (Propositions 4.20 and 4.21), appear in [S-S1], [S-Z1], [S-Z2], [A-V], [Sche4]. These results may be used in various embedding questions (cf. [Sche5]).

5. ENTROPY AND CONCENTRATION

In this chapter, we illustrate how normal concentration properties may follow from a functional logarithmic Sobolev inequality. This is in parallel with the exponential concentration under spectral bounds investigated in Section 3.1. Although rather elementary, this observation is a powerful scheme which allows us both to establish some new concentration inequalities and to recover, thanks to the basic product property of entropy, several of the concentration results in product spaces presented in the previous chapter. The first section presents the logarithmic Sobolev inequalities introduced by L. Gross [Gros1] and develops the basic scheme from logarithmic Sobolev inequalities to concentration that goes back to an unpublished observation by I. Herbst. We next investigate entropy of product measures and its applications to concentration in product spaces. Section 5.3 analyzes with the tool of logarithmic Sobolev inequalities the concentration properties of the exponential measure, while Section 5.4 is devoted to some discrete analogues. The last section describes related covariance identities that also entail measure concentration.

5.1 Logarithmic Sobolev inequalities and concentration

We present in this section the Herbst argument leading to measure concentration from a logarithmic Sobolev inequality. Given a probability measure $\mu$ on some measurable space, for every non-negative measurable function $f$,

define its entropy as
\[ \mathrm{Ent}_\mu(f) = \int f \log f\, d\mu - \int f\, d\mu\, \log \int f\, d\mu \]
if $\int f \log(1 + f)\, d\mu < \infty$, and $+\infty$ if not. Note that $\mathrm{Ent}_\mu(f) \ge 0$ by Jensen's inequality and that entropy is homogeneous of degree 1.

We introduce the concept of logarithmic Sobolev inequality. To avoid some technical questions, let us consider first the case of the Euclidean space $\mathbb{R}^n$. A probability measure $\mu$ on the Borel sets of $\mathbb{R}^n$ is said to satisfy a logarithmic Sobolev inequality if, for some constant $C > 0$ and all smooth enough functions $f$ on $\mathbb{R}^n$,
\[ \mathrm{Ent}_\mu(f^2) \le 2C \int |\nabla f|^2\, d\mu. \tag{5.1} \]
Here $\nabla f$ denotes the usual gradient of $f$ and $|\nabla f|$ its Euclidean length. By smooth we understand, above and below, enough regularity so that the various terms in (5.1) make sense. The logarithmic Sobolev inequality has to be compared with the Poincaré inequality (or spectral gap inequality)
\[ \mathrm{Var}_\mu(f) \le C \int |\nabla f|^2\, d\mu, \tag{5.2} \]
where we recall that
\[ \mathrm{Var}_\mu(f) = \int f^2\, d\mu - \Big( \int f\, d\mu \Big)^2 \]
is the variance of a (square integrable) function $f$. Applying (5.1) to $1 + \varepsilon f$ and letting $\varepsilon \to 0$ shows by a standard Taylor expansion that the logarithmic Sobolev inequality (5.1) is a stronger statement than the Poincaré inequality (5.2). This observation moreover motivates the choice of the normalization $2C$ in (5.1). The basic example of the canonical Gaussian measure $\gamma$ on $\mathbb{R}^n$ satisfies (5.1) (and thus also (5.2)) with $C = 1$. The constant $1$ in the Poincaré inequality for $\gamma$ may also be described as the eigenvalue of the first Hermite polynomial in the orthogonal decomposition

of $L^2(\gamma)$. Although we have already provided a number of arguments leading to concentration of Gaussian measures, we present, for the matter of completeness, a proof of the logarithmic Sobolev inequality for Gaussian measures (that will lead below to concentration). The argument relies on the semigroup tools of Section 2.3.

Theorem 5.1. For every smooth enough function $f$ on $\mathbb{R}^n$,
\[ \mathrm{Ent}_\gamma(f^2) \le 2 \int |\nabla f|^2\, d\gamma. \]

Proof. Recall from Section 2.3 the second-order differential operator $L = \Delta - x \cdot \nabla$ on $\mathbb{R}^n$ with associated semigroup $(P_t)_{t \ge 0}$, called the Ornstein-Uhlenbeck semigroup (cf. [Bak1]). In this particular example, $(P_t)_{t \ge 0}$ admits the explicit representation
\[ P_t f(x) = \int f\big( e^{-t} x + (1 - e^{-2t})^{1/2} y \big)\, d\gamma(y), \quad t \ge 0,\ x \in \mathbb{R}^n. \tag{5.3} \]
Let $f$ be smooth and non-negative on $\mathbb{R}^n$. To be more precise, we take $f$ smooth and such that $\varepsilon \le f \le 1/\varepsilon$ for some $\varepsilon > 0$ that we let go to $0$ at the end of the argument. Since $P_0 f = f$ and $\lim_{t \to \infty} P_t f = \int f\, d\gamma$, write
\[ \mathrm{Ent}_\gamma(f) = -\int_0^\infty \frac{d}{dt} \int P_t f \log P_t f\, d\gamma\, dt. \]
By the chain rule formula and integration by parts for $L$,
\[ \frac{d}{dt} \int P_t f \log P_t f\, d\gamma = \int L P_t f\, \log P_t f\, d\gamma + \int L P_t f\, d\gamma = -\int \frac{|\nabla P_t f|^2}{P_t f}\, d\gamma, \]
since $\gamma$ is invariant under the action of $P_t$ and thus $\int L P_t f\, d\gamma = 0$. Now, by the integral representation (5.3), for every $t \ge 0$,
\[ \nabla P_t f = e^{-t} P_t(\nabla f), \]
and thus
\[ |\nabla P_t f| \le e^{-t} P_t\big( |\nabla f| \big). \tag{5.4} \]
By the Cauchy-Schwarz inequality for $P_t$,
\[ \big( P_t |\nabla f| \big)^2 \le P_t f\ P_t\Big( \frac{|\nabla f|^2}{f} \Big). \]
Summarizing,
\[ \mathrm{Ent}_\gamma(f) \le \int_0^\infty e^{-2t} \int P_t\Big( \frac{|\nabla f|^2}{f} \Big)\, d\gamma\, dt = \frac12 \int \frac{|\nabla f|^2}{f}\, d\gamma. \]
By the change of $f$ into $f^2$, the theorem is established.

The preceding proof works similarly for all log-concave measures $d\mu = e^{-U}\, dx$ such that $\mathrm{Hess}\, U(x) \ge c\, \mathrm{Id} > 0$ for some $c > 0$ uniformly in $x \in \mathbb{R}^n$.

Theorem 5.2. Let $d\mu = e^{-U}\, dx$ where, for some $c > 0$, $\mathrm{Hess}\, U(x) \ge c\, \mathrm{Id} > 0$ uniformly in $x \in \mathbb{R}^n$. Then, for all smooth functions $f$ on $\mathbb{R}^n$,
\[ \mathrm{Ent}_\mu(f^2) \le \frac{2}{c} \int |\nabla f|^2\, d\mu. \]

In order to establish Theorem 5.2, one should however note that the argument above (cf. (5.4)) requires improving (2.31) into
\[ |\nabla P_t f| \le e^{-ct} P_t\big( |\nabla f| \big) \]
(cf. [Bak1], [Bak2], [Le5], [Le6], where the proof of Theorem 5.2 can be found). As we have seen in Theorem 2.6, such measures satisfy a Gaussian type isoperimetric inequality. It may be shown that this isoperimetric inequality actually implies the logarithmic Sobolev inequality (cf. [Le5]). The proof of Theorems 5.1 and 5.2 further extends to Riemannian manifolds $(X, g)$ with a strictly positive lower bound $c = c(X) > 0$ on the Ricci curvature. As for the first non-trivial eigenvalue $\lambda_1 = \lambda_1(X)$ of the Laplace operator on a compact

Riemannian manifold $(X, g)$, we may define formally the logarithmic Sobolev constant of the normalized Riemannian measure $\mu$ on $(X, g)$ as the constant $\rho_0 = \rho_0(X)$ such that
\[ \rho_0\, \mathrm{Ent}_\mu(f^2) \le 2 \int f (-\Delta f)\, d\mu = 2 \int |\nabla f|^2\, d\mu \tag{5.5} \]
holds for all smooth enough functions $f$ on $(X, g)$. By the Sobolev embedding theorem, it is known [Rot1] that $\rho_0 > 0$ on any compact manifold, and furthermore, as above, that $\rho_0 \le \lambda_1$. The proof of Theorem 5.1 thus extends, as for Theorem 5.2, to show that
\[ \lambda_1 \ge \rho_0 \ge c(X). \tag{5.6} \]
In analogy with the Lichnerowicz lower bound on $\lambda_1$ [Cha1], [G-H-L], this may actually be improved [B-E], together with the dimension $n$ of $X$, into
\[ \lambda_1 \ge \rho_0 \ge \frac{n\, c(X)}{n - 1}. \]
In particular,
\[ \mathrm{Ent}_{\sigma^n}(f^2) \le \frac{2}{n} \int |\nabla f|^2\, d\sigma^n \tag{5.7} \]
on the standard $n$-sphere $S^n$, including $n = 1$. We refer to [Bak1], [Le6] for further details on these aspects and for proofs of (5.6) and (5.7).

We now present the Herbst argument from a logarithmic Sobolev inequality to concentration. The principle is similar to the application of spectral properties to concentration presented in Section 3.1, but logarithmic Sobolev inequalities allow us to reach normal concentration. To start with, let us consider a probability measure $\mu$ on the Borel sets of $\mathbb{R}^n$. Assume that $\mu$ satisfies the logarithmic Sobolev inequality (5.1). We will then show that
\[ E_\mu(\lambda) \le e^{C \lambda^2 / 2}, \]
where we recall that $E_\mu$ is the Laplace functional of $\mu$ (Section 1.6). Let thus $F$ be a smooth bounded Lipschitz function on $\mathbb{R}^n$ such that $\int F\, d\mu = 0$. In particular, since $F$ is assumed to be regular enough, we may assume that $|\nabla F| \le \|F\|_{\mathrm{Lip}}$ at every point. Apply now (5.1) to $f^2 = e^{\lambda F - C\lambda^2 \|F\|_{\mathrm{Lip}}^2/2}$ for every $\lambda \in \mathbb{R}$. We have
\[ \int |\nabla f|^2\, d\mu = \frac{\lambda^2}{4} \int |\nabla F|^2\, e^{\lambda F - C\lambda^2 \|F\|_{\mathrm{Lip}}^2/2}\, d\mu \le \frac{\lambda^2 \|F\|_{\mathrm{Lip}}^2}{4} \int e^{\lambda F - C\lambda^2 \|F\|_{\mathrm{Lip}}^2/2}\, d\mu. \]
Setting $\Lambda(\lambda) = \int e^{\lambda F - C\lambda^2 \|F\|_{\mathrm{Lip}}^2/2}\, d\mu$, $\lambda \in \mathbb{R}$, by the definition of entropy,
\[ \mathrm{Ent}_\mu(f^2) = \int \Big[ \lambda F - \frac{C\lambda^2 \|F\|_{\mathrm{Lip}}^2}{2} \Big]\, e^{\lambda F - C\lambda^2 \|F\|_{\mathrm{Lip}}^2/2}\, d\mu - \Lambda(\lambda) \log \Lambda(\lambda) = \lambda \Lambda'(\lambda) + \frac{C\lambda^2 \|F\|_{\mathrm{Lip}}^2}{2}\, \Lambda(\lambda) - \Lambda(\lambda) \log \Lambda(\lambda), \]
while, by (5.1), $\mathrm{Ent}_\mu(f^2) \le \frac{C\lambda^2 \|F\|_{\mathrm{Lip}}^2}{2}\, \Lambda(\lambda)$. In other words,
\[ \lambda \Lambda'(\lambda) \le \Lambda(\lambda) \log \Lambda(\lambda), \quad \lambda \in \mathbb{R}. \]
If $H(\lambda) = \frac{1}{\lambda} \log \Lambda(\lambda)$ (with $H(0) = \Lambda'(0)/\Lambda(0) = \int F\, d\mu = 0$), $\lambda \in \mathbb{R}$, then $H'(\lambda) \le 0$ for every $\lambda$. Therefore $H$ is non-increasing, and thus $\Lambda(\lambda) \le 1$ for every $\lambda$, that is,
\[ \int e^{\lambda F}\, d\mu \le e^{C\lambda^2 \|F\|_{\mathrm{Lip}}^2/2}. \tag{5.8} \]
Replacing $F$ by a smooth convolution, (5.8) extends to all mean-zero Lipschitz functions. In particular, by Proposition 1.14, every 1-Lipschitz function $F$ on $\mathbb{R}^n$ is integrable with respect to $\mu$ and, for every $r \ge 0$,
\[ \mu\Big( \Big\{ F \ge \int F\, d\mu + r \Big\} \Big) \le e^{-r^2/2C}. \]
In particular,
\[ \alpha_{(\mathbb{R}^n, \mu)}(r) \le e^{-r^2/8C}, \quad r > 0. \]
The preceding argument actually extends to measures on arbitrary metric spaces provided a natural extension of the length
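The Laplace bound (5.8) is sharp for the Gaussian measure: with $C = 1$ and the linear function $F(x) = x$ (mean zero, $\|F\|_{\mathrm{Lip}} = 1$), the one-dimensional Gaussian Laplace transform equals $e^{\lambda^2/2}$ exactly. A minimal numerical sketch (our own quadrature; not from the text):

```python
import math

def gauss_mgf(lam, n_points=20001, x_max=12.0):
    # trapezoid-rule integration of e^{lam*x} against the standard
    # Gaussian density on [-x_max, x_max]; the tails are negligible
    h = 2.0 * x_max / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        x = -x_max + k * h
        w = h if 0 < k < n_points - 1 else h / 2.0
        total += w * math.exp(lam * x - x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return total

def check_laplace_bound():
    # (5.8) with C = 1 and F(x) = x: the bound e^{lam^2/2} is saturated
    for lam in [0.5, 1.0, 2.0]:
        if gauss_mgf(lam) > math.exp(lam * lam / 2.0) * (1.0 + 1e-6):
            return False
    return True
```

The saturation of (5.8) by linear functions is the precise sense in which Theorem 5.3 below is optimal for the Gaussian measure.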

of the gradient is chosen. Given a locally Lipschitz function $f$ on a metric space $(X, d)$, define the length of the gradient of $f$ at the point $x \in X$ as
\[ |\nabla f|(x) = \limsup_{y \to x} \frac{|f(x) - f(y)|}{d(x, y)}. \tag{5.9} \]
This gradient satisfies the chain rule
\[ \big| \nabla \varphi(f) \big| \le \big| \varphi'(f) \big|\, |\nabla f| \tag{5.10} \]
for $\varphi : \mathbb{R} \to \mathbb{R}$ smooth enough. Note that $|\nabla f|(x) \le \|f\|_{\mathrm{Lip}}$ at any $x$. By Rademacher's theorem, a Lipschitz function $F$ on $\mathbb{R}^n$ is almost everywhere differentiable and $\|\, |\nabla F| \,\|_\infty = \|F\|_{\mathrm{Lip}}$. Furthermore, on $\mathbb{R}^n$, (5.1) extends to all locally Lipschitz functions. With a proof that just extends the case of $\mathbb{R}^n$, the next result indicates that, under a logarithmic Sobolev inequality, the underlying measure has normal concentration (Herbst's argument).

Theorem 5.3. Let $\mu$ be a probability measure on $(X, d)$ such that, for some $C > 0$ and all locally Lipschitz functions $f$ on $X$,
\[ \mathrm{Ent}_\mu(f^2) \le 2C \int |\nabla f|^2\, d\mu. \]
Then every 1-Lipschitz function $F : X \to \mathbb{R}$ is integrable and such that, for every $r \ge 0$,
\[ \mu\Big( \Big\{ F \ge \int F\, d\mu + r \Big\} \Big) \le e^{-r^2/2C}. \]
In particular,
\[ \alpha_{(X, d, \mu)}(r) \le e^{-r^2/8C}, \quad r > 0. \]

The normal concentration produced by Theorem 5.3 is optimal, as shown by the example of the Gaussian measure for which $C = 1$. We saw in Corollary 3.2 that if a probability measure $\mu$ on $(X, d)$ satisfies a Poincaré inequality
\[ \mathrm{Var}_\mu(f) \le C \int |\nabla f|^2\, d\mu \]
for all locally Lipschitz functions $f$ on $(X, d)$, then $\mu$ has exponential concentration. Theorem 5.3 thus improves this result to normal concentration under a logarithmic Sobolev inequality. Therefore, as spectrum was used in Section 3.1 to recover the exponential isoperimetric concentration of Proposition 2.10, entropy and logarithmic Sobolev inequalities may be used to reach the Gaussian isoperimetric concentration of Theorem 2.6. On a Riemannian manifold, Theorem 5.3 yields the following consequence, which has to be compared with Theorem 3.1 and which may be used to produce examples of normal Lévy families in the sense of Section 3.2. Recall the logarithmic Sobolev constant $\rho_0$ of (5.5).

Corollary 5.4. Let $(X, g)$ be a compact Riemannian manifold with normalized Riemannian measure $\mu$. Then,
\[ \alpha_{(X, g)}(r) \le e^{-\rho_0 r^2/8}, \quad r > 0, \]
where $\rho_0 > 0$ is the logarithmic Sobolev constant of the Laplace operator on $(X, g)$.

By (5.6), Corollary 5.4 covers Theorem 2.4. It might be worthwhile noting that Corollary 5.4 together with (5.7) improves the numerical constant of Theorem 2.3, including the one-dimensional torus. To conclude this section, we briefly mention that both Poincaré and logarithmic Sobolev inequalities are stable under perturbation by a bounded potential. This is the content of the next simple proposition.

Proposition 5.5. Let $\mu$ be a probability measure on the Borel sets of $\mathbb{R}^n$ and let $U$ be bounded on $\mathbb{R}^n$. Define $d\tilde\mu = Z^{-1} e^U\, d\mu$ where $Z$ is the normalization factor. Then, if $\mu$ satisfies a Poincaré or logarithmic Sobolev inequality with some constant $C$, $\tilde\mu$ satisfies the same inequality with the constant $C\, e^{4\|U\|_\infty}$.

Proof. Note first that $e^{-\|U\|_\infty} \le Z \le e^{\|U\|_\infty}$. Let $\Phi$ be a smooth convex function on some open interval $I$ of the real line; typically, $\Phi(x) = x^2$ on $\mathbb{R}$ or $\Phi(x) = x \log x$ on $\mathbb{R}_+$. By convexity of $\Phi$, for

any probability measure $\nu$ and any smooth (bounded) $f$,
\[ \int \Phi(f)\, d\nu - \Phi\Big( \int f\, d\nu \Big) = \inf_{t \in I} \int \Big[ \Phi(f) - \Phi(t) + (t - f)\, \Phi'(t) \Big]\, d\nu, \tag{5.11} \]
where the integrand on the right-hand side is non-negative. It immediately follows that
\[ \int \Phi(f)\, d\tilde\mu - \Phi\Big( \int f\, d\tilde\mu \Big) \le e^{2\|U\|_\infty} \Big[ \int \Phi(f)\, d\mu - \Phi\Big( \int f\, d\mu \Big) \Big]. \]
Hence, if
\[ \mathrm{Var}_\mu(f) \le C \int |\nabla f|^2\, d\mu, \]
taking $\Phi(x) = x^2$ in (5.11),
\[ \mathrm{Var}_{\tilde\mu}(f) \le C\, e^{2\|U\|_\infty} \int |\nabla f|^2\, d\mu \le C\, e^{4\|U\|_\infty} \int |\nabla f|^2\, d\tilde\mu. \]
If
\[ \mathrm{Ent}_\mu(f^2) \le 2C \int |\nabla f|^2\, d\mu, \]
then, taking $\Phi(x) = x \log x$ in (5.11),
\[ \mathrm{Ent}_{\tilde\mu}(f^2) \le 2C\, e^{2\|U\|_\infty} \int |\nabla f|^2\, d\mu \le 2C\, e^{4\|U\|_\infty} \int |\nabla f|^2\, d\tilde\mu. \]
Proposition 5.5 is proved.
5.2 Product measures


One important feature of entropy is its product property. Together with the Herbst argument, this will give rise to a powerful tool to analyze dimension free concentration properties of product measures P = 1 n . To illustrate the usefulness of this tool in 137

product spaces, recall from Chapter 1 that the `1-metric cannot re ect dimension free concentration properties since the Lipschitz norm is not additive. We thus have to work at a higher level that motivates the interest into logarithmic Sobolev inequalities that deal (on Rn for example) with the energy
Z

jrf j2 dP

Z X n

which is clearly better adapted to product spaces than the Lipschitz coe cient kf kLip = krf k1. Assume thus we are given probability spaces ( i i i ), i = 1 : : : n. Denote by P the product probability measure P = 1 n on the product space X = 1 n equipped with the product - eld. A point x in X is denoted x = (x1 : : : xn ), xi 2 i, i = 1 : : : n. Given f on the product space, we write furthermore fi , i = 1 : : : n, for the function on i de ned by fi (xi ) = f (x1 : : : xi;1 xi xi+1 : : : xn) with x1 : : : xi;1 xi+1 : : : xn xed. The rst result is the additivity of entropy. Proposition 5.6. For every non-negative function f on the product space X , n Z X EntP (f ) Ent i (fi )dP:
i=1

i=1

(@i f )2 d

The statement is the same for variances, namely for any f on the product space, VarP (f )
n Z X i=1

Var i (fi )dP:

(5:12)

Proof. We only prove it for entropy, the proof for variance being entirely similar, and even simpler. For a non-negative function f on some probability space ( ),

Ent (f ) = sup

fgd

eg d

1 :

(5:13)

138

Indeed, assume by homogeneity that fd = 1. By Young's inequality uv u log u ; u + ev u 0 v 2 R R we get, for eg d 1,


Z

fgd

f log fd ; 1 +

eg d

f log fd :

The converse is obvious. R To prove Proposition 5.6, given g such that eg dP 1, set, for every i = 1 : : : n, R g(x ::: x ) 1 n 1 (x1 ) d i;1 (xi;1 ) gi(xi : : : xn ) = log eR eg(x1 ::: xd : n ) d 1 (x1 ) d i (xi ) Then g
Pn R (gi ) i i=1 g and e i d i = 1. Therefore, Z n Z X fgdP fgi dP i=1 n Z Z X = fi (gi)i d i dP i=1 n Z X Ent i (fi )dP i=1

which is the result. Proposition 5.6 is established.

Corollary 5.7. Let i on (Xi di) satis es the logarithmic Sobolev


inequality 2Ci jrif j2 d i for every locally Lipschitz function f on Xi , i = 1 : : : n, where jrif j is the generalized modulus of gradient in the i-th space Xi . Then the product measure P = 1 n on the product space X = X1 Xn satis es the logarithmic Sobolev inequality Ent
i

(f 2 )

EntP (f 2 )

2 dP 2 1max C jr f j i i n

139

for every locally Lipschitz function f on X where


X jrf j2 = jrif j2 : i=1

The same assertion holds for the Poincare inequality. In particular, if (Xi gi), i = 1 : : : n, are compact Riemannian manifolds, and if X is the Riemannian product of the Xi 's, (X ) 1 (X ) = 1min i n 1 i and (X ): 0 (X ) = 1min i n 0 i

The main consequence of Corollary 5.7 is that it yields, together with the Herbst argument, concentration results in product ; Pn 1=2 spaces with respect to the Euclidean metric i=1 d2 which i are independent of the dimension. Corollary 5.7 may be combined in applications with the perturbation result of Proposition 5.5. One odd feature is that this usually yields rather poor constants as functions of the dimension. The following is a simple consequence of Proposition 5.6 that bounds in a useful way the entropies along each coordinates. Corollary 5.8. For every f on the product space X = 1 n and every product probability measure P = 1 n, EntP (ef ) where, for i = 1 : : : n,
ZZ

n Z 1X fi 2 i=1 Ei (e )(x)dP (x)

Ei (efi )(x) =

ffi (xi ) fi (yi )g

fi (xi );fi (yi ) 2efi (xi)d i (xi )d i (yi ):

Proof. The proof is elementary. We may assume f bounded. By the product property of entropy (Proposition 5.6), it is enough to deal with the case n = 1. By Jensen's inequality,

EntP (ef )

f ef dP
140

ef dP

fdP:

The right-hand side of the latter may then be rewritten as 1 Z Z ;f (x) ; f (y) ;ef (x) ; ef (y) dP (x)dP (y) 2 ZZ ; ; = f (x) ; f (y) ef (x) ; ef (y) dP (x)dP (y):
ff (x) f (y)g

Since for u v (u ; v)(eu ; ev )


1 (u ; v )2 (eu + ev ) 2

(u ; v)2 eu

the conclusion easily follows. Corollary 5.8 is established. On the basis of these results, we rst illustrate the use of logarithmic Sobolev inequalities in the basic example of the Hamming metric. As a consequence of Corollary 5.8, if

N (f ) = sup
then

n Z X

x2X i=1

f (x) ; fi (yi ) 2d i (yi ) N (f )2


Z

1=2

EntP (ef )

ef dP:

(5:14)

If F is a 1-Lipschitz function with respect to the Hamming metric on the product space X then N (F )2 n. Now, the proof of Theorem 5.3 above immediately shows from (5.14) that if F is 1-Lipschitz and has mean zero, then for every 2 R,
Z

e F dP

en :
2

Therefore, for any product probability P on the product space X= 1 n equipped with the Hamming metric,
(X d P )(r)

e;r =16n
2

r > 0:

that should be compared once more to (1.24) (and (4.2)). In the last part of this section, we provide along the same lines an elementary approach to Corollary 4.9. The point is that 141

while the deviation inequalities have no reason to be tensorizable, they are actually consequences of a logarithmic Sobolev inequality, which only needs to be proved in dimension one. The main result in this direction is the following statement. Let 1 : : : n be arbitrary probability measures on 0 1] and let P be the product probability P = 1 n . We say that a function on n R is separately convex if it is convex in each coordinate. Recall that a convex function on R is continuous and almost everywhere di erentiable. Theorem 5.9. Let F be separately convex and 1-Lipschitz on n R . Then, for every product probability measure P on 0 1]n , and every r 0, ; R P fF FdP + rg e;r2 =4 :
Proof. The theorem will follow from the following stronger statement of possible independent interest. Proposition 5.10. For every separately convex function f and n every product probability P on R ,

EntP (ef )

ZZ X n

i=1

(xi ; yi )2 (@i f )2 (x) ef (x) dP (x)dP (y):

Proof. By Corollary 5.8, it is enough to show that for every i = 1 : : : n and every x,

Ei (efi )(x)

ZZ

(xi ; yi )2 fi 0 (xi )2 efi (xi)d i(xi )d i (yi ):

Now, since f is separately convex, for all x y 2 Rn, fi (xi ) ; fi (yi ) (xi ; yi )fi0 (xi): The proof is easily completed. Proposition 5.10 is established. We now complete the proof of Theorem 5.9 following the Herbst argument. Let F be a separately convex Lipschitz function such that kF kLip 1 by homogeneity. By Rademacher's theorem, jrF j 1 almost everywhere. Replacing F by a convolution 142

with a Gaussian kernel, we may actually suppose that jrF j 1 everywhere. Then, the argument is entirely similar to the proof of Theorem 5.3. Applying namely Proposition 5.10 to F ; 2, 0, we deduce that
2 EntP (e F ; )

jrF j2e F ; 2 dP

where we used that P is concentrated on 0 1]n. Therefore, for every 0, 0( ) ( ) log ( ) R where ( ) = e F ; 2 dP . The proof is completed as in Theorem 5.3. It should be pointed out that Theorem 5.9 does not produce deviation inequalities under the mean since we have to work with 0 to preserve convexity. The point is that Theorem 5.9 is stated for separately convex functions, whereas deviations under the mean seem to require the full convexity assumption. Proposition 5.10 is of particular use for norms of sums of independent random vectors and will be crucial in the investigation of sharp bounds in Section 7.3. The proposition puts forward the generalized gradient (in dimension one)

jrf j(x) =
of statistical interest.

(x ; y)2 f 0 (y)2 d

(y )

1=2

5.3 Modified logarithmic Sobolev inequalities

As we have seen in Section 4.5, products of the usual exponential distribution somewhat surprisingly satisfy a concentration property which, in some respects, is stronger than concentration for Gaussian measures. Our first aim here will be to show that this result can be seen as a consequence of an appropriate logarithmic Sobolev inequality which we call modified. One of its main interests is that it tensorizes with two parameters on the gradient, one on its supremum norm and one on the usual quadratic norm. This feature is the appropriate explanation for the concentration property of the exponential measure. We first investigate a modified logarithmic Sobolev inequality for the exponential measure. We then describe the product properties of modified logarithmic Sobolev inequalities and their applications to concentration. We finally observe that all measures satisfying a Poincaré inequality do satisfy the same modified inequality as the exponential distribution.

Let us start back with the concentration property described by Theorem 4.16. Let $\nu^n$ be the product measure on $\mathbb{R}^n$ when each factor is endowed with the measure $\nu$ of density $\frac{1}{2} e^{-|x|}$ with respect to Lebesgue measure. Recall that $B_2$ is the Euclidean unit ball and $B_1$ the $\ell^1$ unit ball in $\mathbb{R}^n$. Then, for every Borel set $A$ in $\mathbb{R}^n$ and every $r > 0$,

$$ 1 - \nu^n\big( A + 6\sqrt{r}\, B_2 + 9 r B_1 \big) \le \frac{1}{\nu^n(A)}\, e^{-r}. \qquad (5.15) $$

As we have seen in Proposition 4.18, this concentration property on sets may be translated equivalently on functions. Let $F$ be a real-valued function on $\mathbb{R}^n$ such that $\|F\|_{\mathrm{Lip}} \le a$ and with Lipschitz coefficient $b$ with respect to the $\ell^1$-metric, that is,

$$ |F(x) - F(y)| \le b \sum_{i=1}^n |x_i - y_i|, \qquad x, y \in \mathbb{R}^n. $$

Then, for every $r \ge 0$,

$$ \nu^n\big( \{F \ge M + r\} \big) \le \exp\Big( -\frac{1}{K} \min\Big( \frac{r}{b}, \frac{r^2}{a^2} \Big) \Big) \qquad (5.16) $$

for some numerical constant $K > 0$, where $M$ is a median or the mean of $F$ for $\nu^n$.

Our first aim here will be to present an elementary proof of (5.16) based on logarithmic Sobolev inequalities. Following the Herbst argument in the case of the exponential distribution would require determining the appropriate logarithmic Sobolev inequality satisfied by $\nu^n$. We cannot hope for a classical logarithmic Sobolev inequality to hold, simply because it would imply that linear functions have a Gaussian tail for $\nu^n$. To investigate logarithmic Sobolev inequalities for $\nu^n$, it is enough, by the fundamental product property of entropy, to deal with dimension one. One first inequality may be deduced from the Gaussian logarithmic Sobolev inequality. Given a smooth function $f$ on $\mathbb{R}$, apply Theorem 5.1 in dimension 2 to $g(x,y) = f\big( \frac{x^2 + y^2}{2} \big)$. Letting $\tilde\nu$ denote the one-sided exponential distribution with density $e^{-x}$ with respect to Lebesgue measure on $\mathbb{R}_+$, we get

$$ \mathrm{Ent}_{\tilde\nu}(f^2) \le 4 \int x f'(x)^2\, d\tilde\nu(x). $$
If $\tilde\nu^n$ denotes the product measure on $\mathbb{R}_+^n$, we have similarly, for every smooth $f$ on $\mathbb{R}_+^n$,

$$ \mathrm{Ent}_{\tilde\nu^n}(f^2) \le 4 \int \sum_{i=1}^n x_i\, \big( \partial_i f(x) \big)^2\, d\tilde\nu^n(x). \qquad (5.17) $$

It does not seem however that this logarithmic Sobolev inequality (5.17) can yield (5.16), and thus the geometric concentration (5.15), via the Herbst argument. In a sense, this negative observation is compatible with the fact that (5.15) improves upon some aspects of the concentration for Gaussian measures, as we have seen in Section 4.5. We thus have to look for some other version of the logarithmic Sobolev inequality for the exponential distribution. To this task, let us observe that, at the level of Poincaré inequalities, there are two distinct inequalities. For simplicity, let us deal again only with $n = 1$. The first one, in the spirit of (5.17), is

$$ \mathrm{Var}_{\tilde\nu}(f) \le \int x f'(x)^2\, d\tilde\nu(x). \qquad (5.18) $$

This may be shown either from the Gaussian Poincaré inequality as before, with however a worse constant, or by noting that the first eigenvalue of the Laguerre generator with invariant measure $\tilde\nu$ is 1. Actually, (5.17) is exactly the (optimal) logarithmic Sobolev inequality associated to the Laguerre generator. By the way, that 4 is the best constant in (5.17) is an easy consequence of our arguments. Namely, if (5.17) held with a constant $C < 4$, a function $F$ on $\mathbb{R}_+$ such that $x F'(x)^2 \le 1$ almost everywhere would be such that $\int e^{F^2/C}\, d\tilde\nu < \infty$ by Theorem 5.3 (together with Proposition 1.9). But the example of $F(x) = 2\sqrt{x}$ contradicts this consequence (cf. [Ko-S]).

A second Poincaré inequality states that

$$ \mathrm{Var}_{\tilde\nu}(f) \le 4 \int f'^2\, d\tilde\nu. \qquad (5.19) $$

Inequalities (5.18) and (5.19) are not directly comparable and, in a sense, we are looking for an analogue of (5.19) for entropy. To introduce this result, let us first recall the proof of (5.19). We will work with the double exponential distribution $\nu$. It is plain that all the results hold, with the obvious modifications, for the one-sided exponential distribution $\tilde\nu$. We work below with smooth functions. (One may consider for example the space of all continuous, almost everywhere differentiable functions $f : \mathbb{R}^n \to \mathbb{R}$ such that $\int |f|\, d\nu^n < \infty$, $\int |\nabla f|\, d\nu^n < \infty$ and $\lim_{x_i \to \pm\infty} e^{-|x_i|} f(x_1, \dots, x_i, \dots, x_n) = 0$ for every $i = 1, \dots, n$ and $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n \in \mathbb{R}$.) The main argument of the proof is the following simple observation. If $\varphi$ is smooth on $\mathbb{R}$, by the integration by parts formula,

$$ \int \varphi\, d\nu = \varphi(0) + \int \mathrm{sgn}(x)\, \varphi'(x)\, d\nu(x). \qquad (5.20) $$

Lemma 5.11. For every smooth $f$ on $\mathbb{R}$,

$$ \mathrm{Var}_\nu(f) \le 4 \int f'^2\, d\nu. $$

Proof. Set $g(x) = f(x) - f(0)$. Then, by (5.20) and the Cauchy-Schwarz inequality,

$$ \int g^2\, d\nu = 2 \int \mathrm{sgn}(x)\, g'(x) g(x)\, d\nu(x) \le 2 \Big( \int g'^2\, d\nu \Big)^{1/2} \Big( \int g^2\, d\nu \Big)^{1/2}. $$

Since $\mathrm{Var}_\nu(f) = \mathrm{Var}_\nu(g) \le \int g^2\, d\nu$ and $g' = f'$, the lemma follows.

We turn to the corresponding inequality for entropy and the main result of this section.

Theorem 5.12. For every $0 < \lambda < 1$ and every Lipschitz function $f$ on $\mathbb{R}$ such that $|f'| \le \lambda < 1$ almost everywhere,

$$ \mathrm{Ent}_\nu(e^f) \le \frac{2}{1-\lambda} \int f'^2\, e^f\, d\nu. $$

Note that Theorem 5.12, when applied to functions $\varepsilon f$ as $\varepsilon \to 0$, implies Lemma 5.11. Theorem 5.12 is the first example of what we will call a modified logarithmic Sobolev inequality. Theorem 5.12 is basically only used below for some fixed value of $\lambda$, for example $\lambda = \frac{1}{2}$.

Proof. Changing $f$ into $f + \mathrm{const}$, we may assume that $f(0) = 0$. Since $u \log u \ge u - 1$, $u \ge 0$, we have

$$ \mathrm{Ent}_\nu(e^f) \le \int \big( f e^f - e^f + 1 \big)\, d\nu. $$

Since $|f'| \le \lambda < 1$ almost everywhere, the functions $e^f$, $f e^f$ and $f^2 e^f$ are smooth in our preceding sense. Therefore, by repeated use of (5.20),
$$ \int \big( f e^f - e^f + 1 \big)\, d\nu = \int \mathrm{sgn}(x)\, f'(x) f(x)\, e^{f(x)}\, d\nu(x) $$

and

$$ \int f^2 e^f\, d\nu = 2 \int \mathrm{sgn}(x)\, f'(x) f(x)\, e^{f(x)}\, d\nu(x) + \int \mathrm{sgn}(x)\, f'(x) f(x)^2\, e^{f(x)}\, d\nu(x). $$

By the Cauchy-Schwarz inequality and the assumption on $f'$,

$$ \int f^2 e^f\, d\nu \le 2 \Big( \int f'^2 e^f\, d\nu \Big)^{1/2} \Big( \int f^2 e^f\, d\nu \Big)^{1/2} + \lambda \int f^2 e^f\, d\nu, $$

so that

$$ \int f^2 e^f\, d\nu \le \frac{4}{(1-\lambda)^2} \int f'^2 e^f\, d\nu. $$

Now, by the Cauchy-Schwarz inequality again,

$$ \mathrm{Ent}_\nu(e^f) \le \int \mathrm{sgn}(x)\, f'(x) f(x)\, e^{f(x)}\, d\nu(x) \le \Big( \int f'^2 e^f\, d\nu \Big)^{1/2} \Big( \int f^2 e^f\, d\nu \Big)^{1/2} \le \frac{2}{1-\lambda} \int f'^2 e^f\, d\nu, $$

which is the result. Theorem 5.12 is proved.

We are now ready to describe the application to concentration. As a consequence of Theorem 5.12 and of the product property of entropy, for every smooth enough function $F$ on $\mathbb{R}^n$ such that $\max_{1 \le i \le n} |\partial_i F| \le 1$ almost everywhere and every $\lambda$, $|\lambda| < 1$,

$$ \mathrm{Ent}_{\nu^n}(e^{\lambda F}) \le \frac{2\lambda^2}{1-|\lambda|} \int \sum_{i=1}^n (\partial_i F)^2\, e^{\lambda F}\, d\nu^n. \qquad (5.21) $$
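As a numerical sanity check of Theorem 5.12 (a sketch with an arbitrarily chosen test function $f(x) = \lambda x$, for which both sides can be computed by quadrature; the constant $2/(1-\lambda)$ is the one reconstructed above):

```python
import math

def dnu(x):
    # density of the double exponential measure
    return 0.5 * math.exp(-abs(x))

def integrate(h, lo=-50.0, hi=50.0, n=100001):
    # composite trapezoidal rule
    step = (hi - lo) / (n - 1)
    total = sum(h(lo + i * step) for i in range(n)) - 0.5 * (h(lo) + h(hi))
    return total * step

lam = 0.5                              # f(x) = lam * x, so |f'| = lam < 1
Z = integrate(lambda x: math.exp(lam * x) * dnu(x))              # int e^f dnu
num = integrate(lambda x: lam * x * math.exp(lam * x) * dnu(x))  # int f e^f dnu
entropy = num - Z * math.log(Z)
rhs = (2.0 / (1.0 - lam)) * lam ** 2 * Z                         # f'^2 = lam^2
assert abs(Z - 1.0 / (1.0 - lam ** 2)) < 1e-4    # closed form for this f
assert entropy <= rhs
```

For this linear test function the inequality holds with a comfortable margin, as expected.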

Let us take for simplicity $\lambda = \frac{1}{2}$ (although $\lambda < 1$ might improve some of the numerical constants below). Assume now, moreover, that $\sum_{i=1}^n (\partial_i F)^2 \le a^2$ almost everywhere. Then, by (5.21),

$$ \mathrm{Ent}_{\nu^n}(e^{\lambda F}) \le 4 a^2 \lambda^2 \int e^{\lambda F}\, d\nu^n $$

for every $|\lambda| \le \frac{1}{2}$. We now argue as in the proof of Theorem 5.3. Let $\Lambda(\lambda) = \int e^{\lambda F - 4 a^2 \lambda^2}\, d\nu^n$, $|\lambda| \le \frac{1}{2}$. The preceding inequality then expresses that

$$ \lambda \Lambda'(\lambda) \le \Lambda(\lambda) \log \Lambda(\lambda), \qquad |\lambda| \le \tfrac{1}{2}. $$

Integrating this differential inequality shows that

$$ \int e^{\lambda F}\, d\nu^n \le e^{\lambda \int F\, d\nu^n + 4 a^2 \lambda^2}, $$

which however only holds for $|\lambda| \le \frac{1}{2}$. By Chebyshev's exponential inequality,

$$ \nu^n\big( \{F \ge \textstyle\int F\, d\nu^n + r\} \big) \le e^{-\lambda r + 4 a^2 \lambda^2}, $$

and minimizing over $\lambda$ (take $\lambda = r/8a^2$ if $r \le 4a^2$ and $\lambda = \frac{1}{2}$ if $r \ge 4a^2$) shows that, for every $r \ge 0$,

$$ \nu^n\big( \{F \ge \textstyle\int F\, d\nu^n + r\} \big) \le \exp\Big( -\frac{1}{4} \min\Big( r, \frac{r^2}{4 a^2} \Big) \Big). $$

By homogeneity, this inequality amounts to (5.16) (with $K = 16$), and our claim is proved. As already mentioned, we have a similar result for the one-sided exponential measure.

The inequality put forward in Theorem 5.12 for the exponential measure is an example of what can be called a modified logarithmic Sobolev inequality. In order to describe this notion in some generality, we set the following definition. It allows us to describe various types of concentration behavior by the appropriate functional logarithmic Sobolev inequality. We take again the framework of a metric space $(X,d)$ with the generalized length $|\nabla f|$ of a gradient. We say that a probability measure $\mu$ on the Borel sets of a metric space $(X,d)$ satisfies a modified logarithmic Sobolev inequality if there is a function $B(\lambda) \ge 0$ on $\mathbb{R}_+$ such that, whenever $\|\nabla f\|_\infty \le \lambda$,

$$ \mathrm{Ent}_\mu(e^f) \le B(\lambda) \int |\nabla f|^2\, e^f\, d\mu \qquad (5.22) $$

for all $f$'s such that $\int e^f\, d\mu < \infty$. According to Theorem 5.12, the exponential measure $\nu$ on the line satisfies a modified logarithmic Sobolev inequality with respect to the usual gradient with $B(\lambda)$ bounded for the small values of $\lambda$. On the other hand, the canonical Gaussian measure $\gamma_1$ satisfies a modified logarithmic Sobolev inequality with $B(\lambda) = \frac{1}{2}$, $\lambda \ge 0$. The definition (5.22) implies that

$$ \mathrm{Ent}_\mu(e^f) \le \lambda^2 B(\lambda) \int e^f\, d\mu $$
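The minimization step above can be checked mechanically; a small script (assuming the exponent $-\lambda r + 4 a^2 \lambda^2$ over $0 \le \lambda \le \frac{1}{2}$, as in the argument above):

```python
def exponent(a, r, lam):
    # exponent in the Chebyshev bound above
    return -lam * r + 4.0 * a * a * lam * lam

def optimal_lam(a, r):
    # lam = r / 8a^2 when r <= 4a^2, lam = 1/2 otherwise
    return min(r / (8.0 * a * a), 0.5)

def claimed(a, r):
    # exponent of the final bound: -(1/4) min(r, r^2 / 4a^2)
    return -0.25 * min(r, r * r / (4.0 * a * a))

for a in (0.5, 1.0, 2.0):
    for k in range(1, 41):
        r = 0.25 * k
        assert exponent(a, r, optimal_lam(a, r)) <= claimed(a, r) + 1e-12
```

In the Gaussian regime $r \le 4a^2$ the two exponents agree exactly; in the exponential regime the chosen $\lambda = \frac{1}{2}$ does strictly better than $-r/4$.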

for every $f$ with $\|\nabla f\|_\infty \le \lambda$. In particular, if $B(\lambda)$ is bounded for the small values of $\lambda$, Lipschitz functions will have an exponential tail.

The main new feature here is that modified logarithmic Sobolev inequalities tensorize in terms of two parameters rather than only the Lipschitz bound. This property is summarized in the next proposition, which is an elementary consequence of the product property of entropy. It allows new concentration behaviors. Let $\mu_1, \dots, \mu_n$ be probability measures on respective metric spaces $(X_i, d_i)$, $i = 1, \dots, n$, and let $P = \mu_1 \otimes \cdots \otimes \mu_n$ be the product probability on the product space $X = X_1 \times \cdots \times X_n$. If $f$ is a function on the product space, for each $i$, $f_i$ is the function $f$ depending on the $i$-th variable with the other coordinates fixed. As in Corollary 5.7, we denote by $|\nabla_i f|$ the generalized modulus of gradient in the $i$-th space $X_i$.

Proposition 5.13. Assume that for every $f$ on $X_i$ such that $\|\nabla_i f\|_\infty \le \lambda$,

$$ \mathrm{Ent}_{\mu_i}(e^f) \le B(\lambda) \int |\nabla_i f|^2\, e^f\, d\mu_i, $$

$i = 1, \dots, n$. Then, for every $f$ on the product space such that $\max_{1 \le i \le n} \|\nabla_i f_i\|_\infty \le \lambda$,

$$ \mathrm{Ent}_P(e^f) \le B(\lambda) \int |\nabla f|^2\, e^f\, dP $$

where $|\nabla f|^2 = \sum_{i=1}^n |\nabla_i f|^2$.

According to the behavior of $B(\lambda)$, this proposition yields concentration properties in terms of the two parameters

$$ \sum_{i=1}^n |\nabla_i f|^2 \qquad \text{and} \qquad \max_{1 \le i \le n} \|\nabla_i f\|_\infty. $$

For example, if $B(\lambda) \le c$ for $0 \le \lambda \le \lambda_0$, the product measure $P$ will satisfy the same concentration inequality (5.16) as the one for the exponential measure. In the next section, we investigate cases such as $B(\lambda) \le c\, e^{d\lambda}$, $\lambda \ge 0$, related to the Poisson measure.

We now observe that the concentration properties of the exponential measure are actually shared by all measures satisfying a Poincaré inequality. More precisely, every such measure satisfies the modified logarithmic Sobolev inequality of Theorem 5.12. Let thus $|\nabla f|$ be a generalized modulus of gradient (5.10) on a metric space $(X,d)$, satisfying thus the chain rule formula (5.11). Let $\mu$ be a probability measure on the Borel $\sigma$-field of $X$ such that, for some $C > 0$ and all locally Lipschitz functions $f$,

$$ \mathrm{Var}_\mu(f) \le C \int |\nabla f|^2\, d\mu. \qquad (5.23) $$

We already know from Corollary 3.2 that such a Poincaré inequality implies exponential integrability of Lipschitz functions. We now show, without proof, that it also implies a modified logarithmic Sobolev inequality which yields, as just discussed, concentration properties for the product measures $\mu^n$ similar to those of the products of the exponential measure. We refer to [Bo-L1] for a proof of Theorem 5.14 below.

Theorem 5.14. Let $\mu$ on $(X,d)$ satisfy the Poincaré inequality (5.23). Then, for any function $f$ on $X$ such that $\|\nabla f\|_\infty \le \lambda < 2/\sqrt{C}$,

$$ \mathrm{Ent}_\mu(e^f) \le B(\lambda) \int |\nabla f|^2\, e^f\, d\mu $$

where

$$ B(\lambda) = \frac{C}{2}\, \frac{2 + \lambda\sqrt{C}}{2 - \lambda\sqrt{C}}\; e^{\lambda\sqrt{5C}}. $$

Note that $B(\lambda)$ is uniformly bounded for the small values of $\lambda$, for example $B(\lambda) \le 3 e^{\sqrt{5}}\, C/2$ when $\lambda \le 1/\sqrt{C}$. As a corollary, we obtain, following the preceding discussion, a concentration inequality of exponential type for the product measure $\mu^n$ of $\mu$ on $X^n$.

Corollary 5.15. Let $\mu$ satisfy (5.23) and denote by $\mu^n$ the product of $\mu$ on $X^n$. Then, every function $F$ on $X^n$ such that

$$ \sum_{i=1}^n |\nabla_i F|^2 \le a^2 \qquad \text{and} \qquad \max_{1 \le i \le n} |\nabla_i F| \le b $$

$\mu^n$-almost everywhere is integrable with respect to $\mu^n$, and for every $r \ge 0$,

$$ \mu^n\big( \{F \ge \textstyle\int F\, d\mu^n + r\} \big) \le \exp\Big( -\frac{1}{K} \min\Big( \frac{r}{b}, \frac{r^2}{a^2} \Big) \Big) $$

where $K > 0$ only depends on the constant $C$ in the Poincaré inequality (5.23).

One may obtain a similar statement for products of possibly different measures with a uniform bound on the constants in the Poincaré inequalities (5.23). Corollary 5.15 may be turned into an inequality on sets such as (5.15). More precisely, if $\mu^n(A) \ge \frac{1}{2}$, for every $r \ge 0$ and some numerical constant $K > 0$,

$$ \mu^n\big( \{ h_A \ge r \} \big) \le e^{-r/K} $$

where $h(x,y) = \min\big( d(x,y), d(x,y)^2 \big)$, $x, y \in X$, and, for $x = (x_1, \dots, x_n) \in X^n$ and $A \subset X^n$,

$$ h_A(x) = \inf_{a \in A} \sum_{i=1}^n h(x_i, a_i). $$

An important feature of the constant $B(\lambda)$ of Theorem 5.14 is that $B(\lambda) \to C/2$ as $\lambda \to 0$. In particular, the modified logarithmic Sobolev inequality of Theorem 5.14 also implies the Poincaré inequality (5.23), by applying it to functions $\varepsilon f$ with $\varepsilon \to 0$. The Poincaré inequality and the modified logarithmic Sobolev inequality of Theorem 5.14 thus appear as formally equivalent.
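The linearization behind this last remark can be spelled out. Applying the modified inequality to $\varepsilon f$ and using the standard second order expansions (a routine computation, sketched here without the error analysis):

```latex
\mathrm{Ent}_\mu\big(e^{\varepsilon f}\big)
  = \frac{\varepsilon^2}{2}\,\mathrm{Var}_\mu(f) + O(\varepsilon^3),
\qquad
B\big(\varepsilon \|\nabla f\|_\infty\big)
  \int \big|\nabla(\varepsilon f)\big|^2 e^{\varepsilon f}\, d\mu
  = \frac{C}{2}\,\varepsilon^2 \int |\nabla f|^2\, d\mu + O(\varepsilon^3),
```

so that dividing by $\varepsilon^2/2$ and letting $\varepsilon \to 0$ returns exactly the Poincaré inequality (5.23).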

5.4 Discrete settings

In this section, we briefly describe how the entropic method may be developed similarly in discrete structures for which the chain rule on gradients is not available. We then discuss more simply the examples of the uniform measure on the discrete cube $\{0,1\}^n$ and of the standard Poisson measure.

We take again the discrete setting of the end of Section 3.1, where exponential concentration for Markov chains and graphs under spectral properties was investigated. Let thus $X$ be a finite or countable set. Let $K(x,y) \ge 0$ satisfy

$$ \sum_{y \in X} K(x,y) = 1 $$

for every $x \in X$. Assume furthermore that there is a symmetric invariant probability measure $\pi$ on $X$, that is, $K(x,y)\, \pi(\{x\})$ is symmetric in $x$ and $y$ and $\sum_x K(x,y)\, \pi(\{x\}) = \pi(\{y\})$ for every $y \in X$. In other words, $(K, \pi)$ is a symmetric Markov chain. Define

$$ \mathcal{E}(f,f) = \frac{1}{2} \sum_{x,y \in X} \big( f(x) - f(y) \big)^2\, K(x,y)\, \pi(\{x\}). $$

Set

$$ |||f|||_\infty^2 = \frac{1}{2} \sup_{x \in X} \sum_{y \in X} \big( f(x) - f(y) \big)^2\, K(x,y). $$

We already mention that the $|||\cdot|||_\infty$-norm tries to be as close as possible to the sup-norm of a gradient in a continuous setting. The next statement is thus the analogue of Theorem 5.3 in this setting, and the logarithmic Sobolev version of the Poincaré Theorem 3.3.

Theorem 5.16. Let $(K, \pi)$ be a symmetric Markov chain on $X$ as before, and assume that for some constant $C > 0$ and all $f$'s on $X$,

$$ \mathrm{Ent}_\pi(f^2) \le 2C\, \mathcal{E}(f,f). $$

Then, whenever $|||F|||_\infty \le 1$, $F$ is integrable with respect to $\pi$ and, for every $r \ge 0$,

$$ \pi\big( \{F \ge \textstyle\int F\, d\pi + r\} \big) \le e^{-r^2/4C}. $$
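Before turning to the proof, the Dirichlet form bound it relies on (the inequality recalled from (3.4), in the normalization used here) is easy to test numerically on a randomly generated symmetric chain; everything in the snippet below is an illustrative choice:

```python
import math
import random

random.seed(7)
n = 6
# build a symmetric doubly stochastic kernel K: symmetrize a random matrix,
# rescale, and push the leftover mass onto the diagonal; the uniform pi is
# then invariant and K(x,y) pi(x) is symmetric
A = [[random.random() for _ in range(n)] for _ in range(n)]
K = [[(A[x][y] + A[y][x]) / 2.0 for y in range(n)] for x in range(n)]
rowmax = max(sum(row) for row in K)
K = [[K[x][y] / rowmax for y in range(n)] for x in range(n)]
for x in range(n):
    K[x][x] += 1.0 - sum(K[x])
pi = [1.0 / n] * n

F = [random.uniform(-1.0, 1.0) for _ in range(n)]
norm2 = 0.5 * max(sum((F[x] - F[y]) ** 2 * K[x][y] for y in range(n))
                  for x in range(n))          # |||F|||_infty^2

def dirichlet(g):
    return 0.5 * sum((g[x] - g[y]) ** 2 * K[x][y] * pi[x]
                     for x in range(n) for y in range(n))

for lam in (0.3, 1.0, 2.5):
    g = [math.exp(lam * F[x] / 2.0) for x in range(n)]
    bound = (lam ** 2 / 2.0) * norm2 * sum(math.exp(lam * F[x]) * pi[x]
                                           for x in range(n))
    assert dirichlet(g) <= bound + 1e-12
```

The bound holds for any symmetric chain by the mean value theorem applied to $e^{t/2}$, which is what the check confirms on this example.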

Proof. We apply the logarithmic Sobolev inequality to $f^2 = e^{\lambda F}$, $\lambda \in \mathbb{R}$. Recall from (3.4) that

$$ \mathcal{E}\big( e^{\lambda F/2}, e^{\lambda F/2} \big) \le \frac{\lambda^2}{2}\, |||F|||_\infty^2 \int e^{\lambda F}\, d\pi. $$

Therefore, for every $\lambda \in \mathbb{R}$,

$$ \mathrm{Ent}_\pi(e^{\lambda F}) \le C \lambda^2 \int e^{\lambda F}\, d\pi, $$

from which the proof is completed as in Theorem 5.3. The result follows.

While the norm $|||\cdot|||_\infty$ might appear as the proper generalization of the Lipschitz norm on $X$, it does not always reflect accurately discrete situations. Discrete gradients may also be examined in another way. For example, if $f$ is a real-valued function on $\mathbb{Z}$, set

$$ Df(x) = f(x+1) - f(x), \qquad x \in \mathbb{Z}. \qquad (5.24) $$

One may then consider as Lipschitz norm $\sup_{x \in \mathbb{Z}} |Df(x)|$, which will prove below more adapted to a number of cases, such as for example Poisson measures. The lack of a chain rule (for example, $|D(e^f)| \le |Df|\, e^{|Df|}\, e^f$ only in general) will then have to be handled by other means, but will also determine the best that can be expected under some appropriate logarithmic Sobolev inequality. This is the subject we turn to now. Actually, the norm $|||\cdot|||_\infty$ is in fact just adapted to produce Gaussian bounds and, as such, is not suited to a number of discrete examples.

The preceding example may be further generalized to $\mathbb{Z}^d$. Similarly, in the context of statistical mechanics, we may consider $X = \{-1,+1\}^{\mathbb{Z}^d}$ and let

$$ Df(\omega) = \Big( \sum_{k \in \mathbb{Z}^d} \partial_k f(\omega)^2 \Big)^{1/2} $$

where $\partial_k f(\omega) = f(\omega^k) - f(\omega)$, and $\omega^k$ is the element of $X$ obtained from $\omega$ by replacing the $k$-th coordinate with $-\omega_k$.

As a starting point of our investigation, consider the logarithmic Sobolev inequality for the uniform measure $\mu = \mu_{1/2}^n$ on $\{0,1\}^n$, expressed by

$$ \mathrm{Ent}_\mu(f^2) \le \frac{1}{2} \int \sum_{i=1}^n |D_i f|^2\, d\mu \qquad (5.25) $$

for any function $f$ on $\{0,1\}^n$, where

$$ D_i f(x) = f(x) - f\big( s_i(x) \big), \qquad x = (x_1, \dots, x_n) \in \{0,1\}^n, $$

with $s_i(x) = (x_1, \dots, x_{i-1}, 1 - x_i, x_{i+1}, \dots, x_n)$. The gradients $D_i$ thus act on the $i$-th coordinate only. By the product property of entropy, this inequality need only be proven in dimension one, for which it amounts to the inequality

$$ u^2 \log u^2 + v^2 \log v^2 - (u^2 + v^2) \log \frac{u^2 + v^2}{2} \le (u - v)^2 $$

for any reals $u, v$ (cf. [Gros1]). By means of the central limit theorem, it may be used to prove the logarithmic Sobolev inequality for Gaussian measures.

This logarithmic Sobolev inequality may be used to recover concentration with respect to the Hamming metric on $\{0,1\}^n$. Namely, let $F$ be Lipschitz with respect to the Hamming metric on the discrete cube with $\|F\|_{\mathrm{Lip}} \le 1$. Following the Herbst argument, apply (5.25) to $f^2 = e^{\lambda F}$, $\lambda \in \mathbb{R}$. Although the $D_i$'s do not satisfy the chain rule formula, since $|F(x) - F(s_i(x))| \le 1$, it is easily seen that

$$ \big| D_i(e^{\lambda F/2})(x) \big| = \big| e^{\lambda F(x)/2} - e^{\lambda F(s_i(x))/2} \big| \le |\lambda|\, e^{\lambda F(x)/2} $$

for every $|\lambda| \le 1$. Hence, setting as usual $\Lambda(\lambda) = \int e^{\lambda F - (n\lambda^2/2)}\, d\mu$, $\lambda \in \mathbb{R}$,

$$ \lambda \Lambda'(\lambda) \le \Lambda(\lambda) \log \Lambda(\lambda), \qquad |\lambda| \le 1. $$

We integrate this differential inequality as in Section 5.1 to get that

$$ \int e^{\lambda F}\, d\mu \le e^{\lambda \int F\, d\mu + (n \lambda^2/2)} $$

for every $|\lambda| \le 1$. By Chebyshev's inequality,

$$ \mu\big( \{F \ge \textstyle\int F\, d\mu + r\} \big) \le e^{-r^2/2n}, \qquad (5.26) $$

however only for $0 \le r \le n$. But since $\|F\|_{\mathrm{Lip}} \le 1$ for the Hamming metric, $|F - \int F\, d\mu| \le n$, so that (5.26) actually holds true for every $r \ge 0$.

It might further be observed that if $F$ is convex on $[0,1]^n$, then

$$ \big| D_i F(x) \big|^2 \le \big| \partial_i F(x) \big|^2 + \big| \partial_i F\big( s_i(x) \big) \big|^2 $$

where $\partial_i$ is the partial derivative of $F$ along the $i$-th coordinate. Then

$$ \mathrm{Ent}_\mu(e^{\lambda F}) \le \int \big| \nabla(e^{\lambda F/2}) \big|^2\, d\mu = \frac{\lambda^2}{4} \int |\nabla F|^2\, e^{\lambda F}\, d\mu. $$
Together with the Herbst argument, we thus recover in this way Corollary 4.9 with improved constants but only for uniform measure on f0 1gn. Let now, for 0 p 1, p = n p the product measure on n f0 1g when each factor is endowed with the Bernoulli measure p 1 + q 0 , q = 1 ; p, with probability of success p. Then the constant in the logarithmic Sobolev inequality (5.25) for p does not behave nicely as a function of p. Namely (see SC], An]), Ent where (f 2 )

Cp

n Z X i=1

jDi f j2 d

(5:27)

In particular, Cp ! +1 as p ! 0 or 1. This prevents to use (5.27) in any limit theorem as p ! 0, such as Poissonian limits. There is of course a good reason for that, namely that Poisson measures do not satisfy standard logarithmic Sobolev inequalities with respect to the gradient D of (5.24). Denote indeed by the 156

; log q : Cp = pq log p p;q

Poisson measure on N with parameter > 0 and let us assume that, for some constant C > 0, and all f , say bounded, on N, Ent (f 2 )

C jDf j2 d

(5:28)

where we recall that Df (x) = f (x +1) ; f (x), x 2 N. Apply (5.28) to the indicator function of the interval k + 1 1) for each k 2 N. We get

k + 1 1) log

k + 1 1)

fkg

which is clearly impossible as k goes to in nity. There is howewer an alternate version of the logarithmic Sobolev inequality on the two-point space for which Poissonian limits may be performed. Recall p = n p. Lemma 5.17. For any non-negative function f on f;1 +1gn, Ent p (f ) pq
Z X n

i=1

Di fDi (log f )d p :

When the gradients Di satisfy the chain rule formula, the form of the logarithmic Sobolev inequality of Lemma 5.17 is equivalent to the standard form by integration by parts. For example, the logarithmic Sobolev inequality for the canonical Gaussian measure on Rn is equivalent to say that for any smooth nonnegative function f on Rn, Z 1 Ent (f ) 2 hrf r(log f )id : The proof of Lemma 5.17 reduces as usual to dimension one, in which case it amounts to establish the inequality

pu log u + qv log v ; (pu + qv) log(pu + qv) pq(u ; v)(log u ; log v)


for any u v 0 (see Wu], An]) 157
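This one-dimensional inequality is likewise easy to probe numerically (a sketch; the parameter grid is an arbitrary choice):

```python
import math

def ent_side(p, u, v):
    q = 1.0 - p
    m = p * u + q * v
    return p * u * math.log(u) + q * v * math.log(v) - m * math.log(m)

def grad_side(p, u, v):
    q = 1.0 - p
    return p * q * (u - v) * (math.log(u) - math.log(v))

for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    for u in (0.05, 0.5, 1.0, 3.0, 40.0):
        for v in (0.05, 0.5, 1.0, 3.0, 40.0):
            assert ent_side(p, u, v) <= grad_side(p, u, v) + 1e-12
```

Both sides vanish to first order at $u = v$, and near the diagonal the right-hand side exceeds the left by a factor 2, so the constant $pq$ is of the right order uniformly in $p$.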

Due to the constant $pq$ in the logarithmic Sobolev inequality of Lemma 5.17, we may perform a Poisson limit as $p \to 0$. Take $\varphi$ on $\mathbb{N}$ such that $0 < \varepsilon \le \varphi \le 1/\varepsilon < \infty$ and apply Lemma 5.17 to

$$ f(x) = f(x_1, \dots, x_n) = \varphi(x_1 + \cdots + x_n), \qquad x = (x_1, \dots, x_n) \in \{0,1\}^n, $$

with $p = \theta/n$, $\theta > 0$ (for every $n$ large enough). Then, setting $S_n = x_1 + \cdots + x_n$,

$$ \sum_{i=1}^n D_i f\, D_i(\log f) = (n - S_n)\big( \varphi(S_n + 1) - \varphi(S_n) \big)\big( \log \varphi(S_n + 1) - \log \varphi(S_n) \big) + S_n\big( \varphi(S_n) - \varphi(S_n - 1) \big)\big( \log \varphi(S_n) - \log \varphi(S_n - 1) \big). $$

The distribution of $S_n$ under $\mu_{\theta/n}^n$ converges to $\pi_\theta$. Using that $0 < \varepsilon \le \varphi \le 1/\varepsilon < \infty$, it follows from the preceding that

$$ \frac{\theta}{n}\Big( 1 - \frac{\theta}{n} \Big) \int \sum_{i=1}^n D_i f\, D_i(\log f)\, d\mu_{\theta/n}^n \longrightarrow \theta \int D\varphi\, D(\log \varphi)\, d\pi_\theta. $$

We may therefore state the following corollary.

Corollary 5.18. For any non-negative function $f$ on $\mathbb{N}$,

$$ \mathrm{Ent}_{\pi_\theta}(f) \le \theta \int Df\, D(\log f)\, d\pi_\theta $$

where we recall that here $Df(x) = f(x+1) - f(x)$, $x \in \mathbb{N}$.

The example of $f(x) = e^{-cx}$, $x \in \mathbb{N}$, as $c \to \infty$ shows that one cannot expect a better factor of $\theta$ in the preceding corollary.

To conclude this section, we deduce concentration inequalities for measures $\mu$ on $\mathbb{N}$ satisfying, for simplicity, a logarithmic Sobolev inequality such as the one for the Poisson measure $\pi_\theta$.

Corollary 5.19. Let $\mu$ be a probability measure on $\mathbb{N}$ such that, for some constant $C > 0$ and all non-negative functions $f$ on $\mathbb{N}$,

$$ \mathrm{Ent}_\mu(f) \le C \int Df\, D(\log f)\, d\mu. $$

Then, if $F$ is such that $|F(x+1) - F(x)| \le 1$ for all $x \in \mathbb{N}$, we have $\int |F|\, d\mu < \infty$ and, for every $r \ge 0$,

$$ \mu\big( \{F \ge \textstyle\int F\, d\mu + r\} \big) \le \exp\Big( -\frac{r}{4} \log\Big( 1 + \frac{r}{2C} \Big) \Big). $$

The numerical constant in Corollary 5.19 has no reason to be sharp. However, the order of growth is optimal, as shown by the example of the Poisson measure itself. More precisely, the tail of the Lipschitz function $F$ is Gaussian for the small values of $r$ and Poissonian for the large values (with respect to $C$).

Proof. It follows the common pattern of proofs in this chapter. We apply the logarithmic Sobolev inequality of Corollary 5.19 to $f = e^{\lambda F}$, with $F$ say bounded to start with. The gradient $D$ is not local; however, since $|F(x+1) - F(x)| \le 1$ for all $x \in \mathbb{N}$, for every $\lambda \ge 0$,

$$ D(e^{\lambda F})(x)\, D(\lambda F)(x) = \lambda \big( F(x+1) - F(x) \big)\big( e^{\lambda F(x+1)} - e^{\lambda F(x)} \big) \le \lambda^2 e^\lambda\, e^{\lambda F(x)}. $$

Hence, with the usual notation $\Lambda(\lambda) = \int e^{\lambda F}\, d\mu$,

$$ \mathrm{Ent}_\mu(e^{\lambda F}) \le C \lambda^2 e^\lambda\, \Lambda(\lambda). $$

Integration of this differential inequality together with Chebyshev's inequality easily yields the result. Corollary 5.19 is established.

According to this proof, the preceding logarithmic Sobolev inequalities are part of the family of modified logarithmic Sobolev inequalities investigated in Section 5.3, with a function $B(\lambda)$ of the order of $e^{2\lambda}$, $\lambda \ge 0$. Following Proposition 5.13, they may be tensorized in terms of two distinct norms on the gradients. The following statement is then an easy consequence of this observation. It extends to Lipschitz functions known inequalities for sums of independent Bernoulli or Poisson random variables. We let $(e_1, \dots, e_n)$ be the canonical basis of $\mathbb{R}^n$.

Corollary 5.20. Let $\mu$ be some measure on $\mathbb{N}$. Assume that for every $f$ on $\mathbb{N}$ with $\sup_{x \in \mathbb{N}} |Df(x)| \le \lambda$,

$$ \mathrm{Ent}_\mu(e^f) \le B(\lambda) \int |Df|^2\, e^f\, d\mu $$

where, as a function of $\lambda \ge 0$, $B(\lambda) \le c\, e^{d\lambda}$ for some $c, d > 0$. Denote by $\mu^n$ the product measure of $\mu$ on $\mathbb{N}^n$. Let $F$ be a function on $\mathbb{N}^n$ such that, for every $x \in \mathbb{N}^n$,

$$ \sum_{i=1}^n \big| F(x + e_i) - F(x) \big|^2 \le a^2 \qquad \text{and} \qquad \max_{1 \le i \le n} \big| F(x + e_i) - F(x) \big| \le b. $$

Then $\int |F|\, d\mu^n < \infty$ and, for every $r \ge 0$,

$$ \mu^n\big( \{F \ge \textstyle\int F\, d\mu^n + r\} \big) \le \exp\Big( -\frac{r}{2db} \log\Big( 1 + \frac{bdr}{4ca^2} \Big) \Big). $$
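Corollary 5.18 itself can be checked numerically by truncating the Poisson measure (an illustrative script; the parameter $\theta$ and the test functions are arbitrary choices):

```python
import math

theta = 2.0
N = 200                      # truncation level; the neglected tail is tiny
w = [math.exp(-theta)]
for k in range(1, N):
    w.append(w[-1] * theta / k)          # Poisson(theta) weights

def ent(f):
    m = sum(wk * fk for wk, fk in zip(w, f))
    return sum(wk * fk * math.log(fk) for wk, fk in zip(w, f)) - m * math.log(m)

def energy(f):
    # int Df D(log f) d pi_theta with Df(x) = f(x+1) - f(x)
    return sum(w[k] * (f[k + 1] - f[k]) * (math.log(f[k + 1]) - math.log(f[k]))
               for k in range(N - 1))

for f in ([1.0 / (1 + k) for k in range(N)],
          [math.exp(-0.3 * k) + 0.2 for k in range(N)],
          [2.0 + math.sin(k) for k in range(N)]):
    assert ent(f) <= theta * energy(f) + 1e-9
```

For $f(x) = e^{-cx}$ the two sides can even be computed in closed form, which is how one sees that the factor $\theta$ cannot be improved.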

5.5 Covariance identities

In this last section, we consider yet another approach to some of the concentration properties described in this chapter and the preceding ones, based on covariance identities. We only describe the basic principle in the Gaussian case, referring to [B-G-H], [Hou], [Ho-P] for extensions to Poisson and infinitely divisible measures.

Recall the Ornstein-Uhlenbeck semigroup (5.3),

$$ P_t f(x) = \int f\big( e^{-t} x + (1 - e^{-2t})^{1/2} y \big)\, d\gamma(y), \qquad t \ge 0,\ x \in \mathbb{R}^n, $$

with symmetric and invariant measure the canonical Gaussian measure $\gamma$ on $\mathbb{R}^n$. Let $f$ and $g$ be smooth functions on $\mathbb{R}^n$. Using that $\frac{d}{dt} P_t g = \mathrm{L} P_t g$, where $\mathrm{L}$ is the second order differential operator $\Delta - \langle x, \nabla \rangle$, we may write, as in Sections 2.3 and 5.1, that

$$ \mathrm{Cov}_\gamma(f, g) = \int f \Big( g - \int g\, d\gamma \Big)\, d\gamma = - \int_0^\infty \!\! \int f\, \mathrm{L} P_t g\, d\gamma\, dt = \int_0^\infty \!\! \int \langle \nabla f, \nabla P_t g \rangle\, d\gamma\, dt. $$

Coming back to the expression of $P_t g$, and since $\nabla P_t g = e^{-t} P_t(\nabla g)$,

$$ \mathrm{Cov}_\gamma(f, g) = \int_0^\infty e^{-t} \iint_{\mathbb{R}^n \times \mathbb{R}^n} \big\langle \nabla f(x), \nabla g\big( e^{-t} x + (1 - e^{-2t})^{1/2} y \big) \big\rangle\, d\gamma(x)\, d\gamma(y)\, dt. $$

Therefore, if $\nu_t$ is the Gaussian measure on $\mathbb{R}^n \times \mathbb{R}^n$ which is the image measure of $\gamma \otimes \gamma$ under the map

$$ (x, y) \mapsto \big( x,\, e^{-t} x + (1 - e^{-2t})^{1/2} y \big), $$

and if $\nu$ denotes the probability measure $\int_0^\infty e^{-t} \nu_t\, dt$, we get that

$$ \mathrm{Cov}_\gamma(f, g) = \iint \langle \nabla f(x), \nabla g(y) \rangle\, d\nu(x, y). \qquad (5.29) $$

It is of course assumed in the preceding argument that $f$ and $g$ have all the regularity and integrability needed to justify differentiation and the use of Fubini's theorem. The representation formula (5.29) (which may be shown to hold in further situations) is a simple tool to reach measure concentration.

Proposition 5.21. For all smooth enough functions $f$ and $g$ on $\mathbb{R}^n$ such that $f$ is 1-Lipschitz,

$$ \mathrm{Cov}_\gamma(f, g) \le \int |\nabla g|\, d\gamma. $$

Now, Proposition 5.21 may be used, as logarithmic Sobolev inequalities were, to produce measure concentration, with a modification of the Herbst argument. Namely, given $F$ 1-Lipschitz and with mean zero with respect to $\gamma$, apply Proposition 5.21 to $f = F$ and $g = e^{\lambda F}$, $\lambda \ge 0$. Then

$$ \int F\, e^{\lambda F}\, d\gamma = \mathrm{Cov}_\gamma(F, e^{\lambda F}) \le \lambda \int |\nabla F|\, e^{\lambda F}\, d\gamma \le \lambda \int e^{\lambda F}\, d\gamma. $$

For the function $J(\lambda) = \log \int e^{\lambda F}\, d\gamma$, $\lambda \ge 0$, we thus have the differential inequality $J'(\lambda) \le \lambda$. Since $J(0) = 0$, we conclude that $J(\lambda) \le \lambda^2/2$. Hence

$$ \int e^{\lambda F}\, d\gamma \le e^{\lambda^2/2}. $$

We thus recover from the covariance identity the concentration properties of Gaussian measures. This type of argument may be used, on the one hand, to sharpen some of the deviation inequalities for Gaussian measures, and, on the other hand, extended to various kinds of distributions including binomial, Poisson and infinitely divisible laws. We refer to [B-G-H], [H-PA-S], [Hou] and [Ho-P].
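The final Laplace bound can be confirmed numerically in dimension one for, say, the 1-Lipschitz mean-zero function $F(x) = |x| - \sqrt{2/\pi}$ (an arbitrary test case, evaluated by quadrature):

```python
import math

def gauss(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def integrate(h, lo=-30.0, hi=30.0, n=60001):
    step = (hi - lo) / (n - 1)
    total = sum(h(lo + i * step) for i in range(n)) - 0.5 * (h(lo) + h(hi))
    return total * step

mean_abs = integrate(lambda x: abs(x) * gauss(x))       # = sqrt(2/pi)
assert abs(mean_abs - math.sqrt(2.0 / math.pi)) < 1e-5

for lam in (0.25, 0.5, 1.0, 2.0, 4.0):
    lhs = integrate(lambda x: math.exp(lam * (abs(x) - mean_abs)) * gauss(x))
    assert lhs <= math.exp(lam ** 2 / 2.0) + 1e-9
```

Equality holds at $\lambda = 0$, and the ratio of the two sides decreases rapidly with $\lambda$ for this particular $F$.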

Notes and Remarks

Logarithmic Sobolev inequalities were introduced by L. Gross [Gros1] as the infinitesimal version of hypercontractivity in quantum field theory (see [Gros2] and the references therein). They soon became a tool of fundamental importance in infinite dimensional analysis, with increasing activity [Gros2], both on the side of lattice spin systems and Gibbs measures in statistical mechanics ([Str1], [G-Zeg], [Roy] etc.) and of path and loop spaces in infinite dimensional stochastic analysis ([Hs]). Applications of logarithmic Sobolev inequalities to convergence to equilibrium of finite state Markov chains are presented in [D-SC], [SC]. A pedestrian introduction to logarithmic Sobolev inequalities may be found in [An].

One of the early questions on logarithmic Sobolev inequalities was to determine conditions on measures $\mu$ on $\mathbb{R}^n$ to satisfy a logarithmic Sobolev inequality

$$ \mathrm{Ent}_\mu(f^2) \le 2C \int |\nabla f|^2\, d\mu $$

for all smooth enough functions $f$. To this question raised by L. Gross, I. Herbst (in an unpublished letter to L. Gross) found as a necessary condition the fact that

$$ \int e^{\alpha |x|^2}\, d\mu(x) < \infty $$

for some $\alpha > 0$ small enough. Herbst's argument is presented in [Da-S], with however a small error in the argument. This was settled in the paper [A-M-S], which revived the interest in Gross's question and gave rise to several subsequent contributions [A-S], [Le2], [G-R], [Rot2] etc. In particular, Herbst's argument as a Laplace bound of concentration spirit is exposed in [Le2] and is presented this way as Theorem 5.3. The Herbst argument motivated the investigation [Le4] of concentration inequalities for product measures by the entropic method. Most of the results of Section 5.2 are taken from this reference. Some sharper numerical constants may be found in [Le4], [Bob1], [Bob2], [Mas1] etc.

Modified logarithmic Sobolev inequalities and their application to the concentration properties of products of the exponential measure have been investigated in [Bo-L1], from which the results of Section 5.3 are taken. The corresponding results for products of Markov chains are discussed in [H-T]. Concentration between Poincaré and logarithmic Sobolev inequalities, with application to families of log-concave measures, is investigated in [L-O] (see also [Bar]) through a family of functional inequalities that interpolate between Poincaré and Sobolev (and that do tensorize similarly). The product measures $\nu_p^n$, $1 \le p \le 2$, of Section 4.5 are the prototype of measures satisfying such an interpolated inequality. The discrete setting is studied in a number of references including [A-S], [G-R] etc. The results of Section 5.4 are taken from [Bo-L2], where Poissonian bounds are analyzed. See also [Wu] and the references of Section 5.5. Most of the results of this chapter are taken from the survey [Le5], where the reader will find further details and references.

It must also be emphasized that a great deal of the intense activity on logarithmic Sobolev inequalities in recent years has been dealing with logarithmic Sobolev inequalities for lattice spin systems and Gibbs states in statistical mechanics, in particular with the works of D. Stroock and B. Zegarlinski (cf. [Str1], [G-Zeg], [Roy]...). These are examples of dimension free logarithmic Sobolev inequalities for highly non-product measures. Together with the Herbst argument, these results thus yield new concentration properties far away from product spaces. Another direction towards non-product measures will be alluded to in Section 6.3 of the next chapter.

The results and method of Section 5.5 are taken from the works [B-G-H], [H-PA-S], [Hou] and [Ho-P], to which we refer for further developments. In particular, it is shown in [B-G-H] that the method can be improved to yield sharper bounds.

6. TRANSPORTATION COST INEQUALITIES

In this chapter, we investigate a third description of concentration. After the geometric description of the concentration function itself, and the Laplace bounds related to logarithmic Sobolev inequalities, we turn to a dual formulation of the latter in terms of distances between measures. This approach was put forward by K. Marton. While equally suited to product measures, it seems a convenient tool to investigate concentration in some dependent structures such as contractive Markov chains. After describing information inequalities and their relation to concentration, we study quadratic transportation cost inequalities. While these are actually consequences of logarithmic Sobolev inequalities, they have an interest of their own. In the last section, we present proofs of some of the geometric inequalities of Sections 4.2 and 4.3 with the transportation cost approach, and discuss some extensions to non-product measures.

6.1 Information inequalities and concentration

To introduce the topic of this chapter, let us start with the classical Pinsker-Csiszár-Kullback inequality (cf. [Pin]), which indicates that whenever $\mu$ and $\nu$ are two probability measures on the Borel sets of a metric space $(X,d)$, then

$$ \| \mu - \nu \|_{\mathrm{TV}} \le \sqrt{ \tfrac{1}{2}\, \mathrm{H}(\nu \,|\, \mu) }. \qquad (6.1) $$

Here $\| \cdot \|_{\mathrm{TV}}$ denotes the total variation distance, whereas $\mathrm{H}(\nu \,|\, \mu)$
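For discrete distributions, (6.1) is easy to confirm numerically (a quick randomized check; the distributions are generated arbitrarily):

```python
import math
import random

random.seed(1)

def tv(mu, nu):
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))

def rel_ent(nu, mu):
    # H(nu | mu) for strictly positive mu
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)

def random_dist(k):
    x = [random.random() + 1e-12 for _ in range(k)]
    s = sum(x)
    return [t / s for t in x]

for _ in range(200):
    k = random.randint(2, 8)
    mu, nu = random_dist(k), random_dist(k)
    assert tv(mu, nu) <= math.sqrt(0.5 * rel_ent(nu, mu)) + 1e-12
```

Note the asymmetry: the relative entropy is taken of $\nu$ with respect to $\mu$, while the total variation distance is symmetric.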

is the relative entropy of $\nu$ with respect to $\mu$, defined by

$$ \mathrm{H}(\nu \,|\, \mu) = \mathrm{Ent}_\mu\Big( \frac{d\nu}{d\mu} \Big) = \int \log \frac{d\nu}{d\mu}\, d\nu $$

whenever $\nu$ is absolutely continuous with respect to $\mu$, with Radon-Nikodym derivative $\frac{d\nu}{d\mu}$, and $+\infty$ if not. Inequalities such as (6.1) have often been considered in information theory. That such an inequality is related to concentration properties may be shown in the following way. Given a metric space $(X,d)$ and two Borel probability measures $\mu$ and $\nu$ on $X$, consider the Wasserstein distance between $\mu$ and $\nu$,

$$ \mathrm{W}_1(\mu, \nu) = \inf \iint d(x,y)\, d\pi(x,y), $$

where the infimum runs over all probability measures $\pi$ on the product space $X \times X$ with marginals $\mu$ and $\nu$ having a finite first moment. The total variation distance corresponds to the trivial metric. Given $\mu$, consider then the inequality

$$ \mathrm{W}_1(\mu, \nu) \le \sqrt{ 2C\, \mathrm{H}(\nu \,|\, \mu) } \qquad (6.2) $$

for some $C > 0$ and every $\nu$. Let then $A$ and $B$ be Borel sets with $\mu(A), \mu(B) > 0$, and consider the conditional probabilities $\mu_A = \mu(\cdot \,|\, A)$ and $\mu_B = \mu(\cdot \,|\, B)$. By the triangle inequality for $\mathrm{W}_1$ and (6.2),

$$ \mathrm{W}_1(\mu_A, \mu_B) \le \mathrm{W}_1(\mu_A, \mu) + \mathrm{W}_1(\mu, \mu_B) \le \sqrt{ 2C\, \mathrm{H}(\mu_A \,|\, \mu) } + \sqrt{ 2C\, \mathrm{H}(\mu_B \,|\, \mu) } = \sqrt{ 2C \log \frac{1}{\mu(A)} } + \sqrt{ 2C \log \frac{1}{\mu(B)} }. \qquad (6.3) $$

Now, all measures $\pi$ with marginals $\mu_A$ and $\mu_B$ must be supported on $A \times B$, so that, by the definition of $\mathrm{W}_1$,

$$ \mathrm{W}_1(\mu_A, \mu_B) \ge d(A,B) = \inf \big\{ d(x,y)\,;\ x \in A,\ y \in B \big\}. $$

Then (6.3) implies a concentration inequality. Given $A$ and $B$ in $X$ such that $d(A,B) \ge r > 0$, we get

$$ r \le \sqrt{ 2C \log \frac{1}{\mu(A)} } + \sqrt{ 2C \log \frac{1}{1 - \mu(A_r)} } \qquad (6.4) $$

where we recall that $A_r = \{ x \in X;\ d(x,A) < r \}$. Inequality (6.4) appears as a symmetric form of concentration. If, say, $\mu(A) \ge \frac{1}{2}$ and $B$ is the complement of $A_r$ for $r > 0$,

$$ r \le \sqrt{ 2C \log 2 } + \sqrt{ 2C \log \frac{1}{1 - \mu(A_r)} }, $$

so that, whenever $r \ge 2 \sqrt{2C \log 2}$ for example,

$$ 1 - \mu(A_r) \le e^{-r^2/8C}. $$

It actually turns out that the transportation cost inequality (6.2) is equivalent to the normal Laplace bound deduced in the preceding chapter from logarithmic Sobolev inequalities, and may be thought of as its dual version. We namely have the following result. Recall the Laplace functional (Section 1.6) of $(X, d, \mu)$,

$$ E_{(X,d,\mu)}(\lambda) = \sup \int e^{\lambda F}\, d\mu, $$

where the supremum runs over all 1-Lipschitz mean-zero functions $F$.

Proposition 6.1. Let $\mu$ be a Borel probability measure on a metric space $(X,d)$. Then

$$ \mathrm{W}_1(\mu, \nu) \le \sqrt{ 2C\, \mathrm{H}(\nu \,|\, \mu) } \qquad (6.5) $$

for some $C > 0$ and all $\nu$ if and only if

$$ E_{(X,d,\mu)}(\lambda) \le e^{C \lambda^2/2}, \qquad \lambda \ge 0. \qquad (6.6) $$

Proof. By the Monge-Kantorovitch-Rubinstein dual characterization (cf. [Du], [Ra2]) of the Wasserstein distance,

    W₁(μ, ν) = sup ( ∫ g dν − ∫ f dμ )

where the supremum runs over all bounded functions f and g such that g(x) ≤ f(y) + d(x, y) for every x, y. Under (6.5),

    ∫ g dν − ∫ f dμ ≤ ( 2C Ent_μ(dν/dμ) )^{1/2}

or, equivalently, for every λ > 0,

    ∫ g dν − ∫ f dμ ≤ Cλ/2 + (1/λ) Ent_μ(dν/dμ).

Set φ = dν/dμ. The preceding indicates that

    ∫ ψ φ dμ ≤ Ent_μ(φ)

where ψ = λ ( g − ∫ f dμ ) − Cλ²/2. Since this inequality holds for every choice of φ (i.e. ν), applying it to φ = e^ψ / ∫ e^ψ dμ yields log ∫ e^ψ dμ ≤ 0. In other words,

    ∫ e^{λg} dμ ≤ e^{λ ∫ f dμ + Cλ²/2}.

When F is Lipschitz with ‖F‖_Lip ≤ 1, one may choose f = g = F so that the latter amounts to (6.6). Since Ent_μ(φ) = sup ∫ ψ φ dμ, where the supremum runs over all ψ's such that ∫ e^ψ dμ ≤ 1, the preceding argument indicates that (6.6) is actually equivalent to (6.5). The proof of Proposition 6.1 is complete.
The same may be proved with the alternate, more classical and easily equivalent, characterization of W₁ as

    W₁(μ, ν) = sup ( ∫ F dμ − ∫ F dν )

where the supremum runs over all 1-Lipschitz functions F.
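As a quick numerical illustration of the information inequalities of this section (a sketch, not part of the original development): the Pinsker-Csizsar-Kullback inequality (6.1) can be checked directly on finite distributions, and the Gaussian mean-shift pair saturates the transportation cost inequality (6.5) with C = 1. The three-point distributions below are hypothetical test data.

```python
import math

def tv(p, q):
    # total variation distance between two finite distributions
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def rel_entropy(q, p):
    # relative entropy H(q | p) = sum_i q_i log(q_i / p_i)
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# (6.1) on a pair of (hypothetical) three-point distributions
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
assert tv(p, q) <= math.sqrt(0.5 * rel_entropy(q, p))

# (6.5) with C = 1 for mu = N(0,1) and nu = N(m,1): W1 = |m| (translation is
# the optimal coupling) while H(nu | mu) = m^2/2, so W1 = sqrt(2 H) exactly.
for m in (0.5, 1.0, 2.0):
    assert abs(abs(m) - math.sqrt(2 * (m * m / 2))) < 1e-12
```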

6.2 Quadratic transportation cost inequalities


Transportation cost inequalities share tensorization properties with logarithmic Sobolev inequalities and may be used, for some appropriate cost functions, to describe dimension free concentration properties. Consider namely metric spaces (X_i, d_i), i = 1,...,n, equipped with probability measures μ_i, i = 1,...,n. Let P denote the product measure P = μ₁ ⊗ ⋯ ⊗ μ_n on the product space X = X₁ × ⋯ × X_n. By the product property of Laplace functionals (Proposition 1.15) and Proposition 6.1, if each measure μ_i satisfies a transportation cost inequality

    W₁(μ_i, ν_i) ≤ ( 2C H(ν_i | μ_i) )^{1/2}

for all ν_i on X_i, i = 1,...,n, then the product measure P satisfies the transportation cost inequality

    W₁(P, R) ≤ ( 2Cn H(R | P) )^{1/2}

for every R on the Cartesian product space equipped with the ℓ¹-metric. As discussed earlier, the drawback of the ℓ¹-metric is that it heavily takes into account the number of coordinates in the product space. It however allows us to recover once more concentration with respect to the Hamming metric. Indeed, starting from the Pinsker-Csizsar-Kullback inequality (6.1), for any product probability measure P on X equipped with the Hamming metric,

    W₁(P, R) ≤ ( (n/2) H(R | P) )^{1/2},

which produces concentration. Following the logarithmic Sobolev approach of Chapter 5, it is however more fruitful, in order to reach dimension free concentration properties, to think in terms of a quadratic cost. To this task, let us restrict ourselves to the case of Rⁿ with the Euclidean norm |·|. Given a probability measure μ on the Borel sets of Rⁿ, say that it satisfies a quadratic transportation cost inequality whenever there exists a constant C > 0 such that for all probability measures ν,

    W₂(μ, ν) ≤ ( C H(ν | μ) )^{1/2}.    (6.7)

Here W₂ is the Wasserstein distance with quadratic cost,

    W₂(μ, ν) = inf ( ∬ (1/2) |x − y|² dπ(x, y) )^{1/2},

where the infimum runs over all probability measures π on Rⁿ × Rⁿ with respective marginals μ and ν. (The infimum in W₂ is finite as soon as μ and ν have finite second moments, which we shall always assume.) It is clear by Jensen's inequality that the quadratic transportation cost inequality is stronger than the transportation cost inequality considered in Section 6.1 with the distance W₁. As such, it implies concentration properties. Since the cost in (6.7) is given by

    c(x) = (1/2) |x|² = (1/2) Σ_{i=1}^n x_i²,    x = (x₁,...,x_n) ∈ Rⁿ,

it is plain that a quadratic transportation cost inequality will share nice tensorization properties. In order to establish these, it is convenient to first prove a dual version of the quadratic transportation cost inequalities along the lines of Proposition 6.1 above. This analogue is moreover a very useful tool to establish quadratic transportation cost inequalities. The general form of the dual Monge-Kantorovitch-Rubinstein representation (cf. [Ra1], [Ra2]) indicates that

    inf ∬ c(x − y) dπ(x, y) = sup ( ∫ g dν − ∫ f dμ )    (6.8)

where the infimum runs over all probability measures π with marginals μ and ν such that c is integrable with respect to π, and where the supremum is over all pairs (g, f) of bounded functions (or respectively ν- and μ-integrable) such that for all x, y,

    g(x) ≤ f(y) + c(x − y).

Here the cost c is upper semicontinuous and bounded above by a(x) + b(y) for some measurable functions a and b. On Rⁿ, the supremum on the right-hand side of (6.8) may be taken over smaller classes of smooth functions, such as bounded Lipschitz functions. In particular therefore,

    W₂(μ, ν)² = sup ( ∫ g dν − ∫ f dμ )

where the supremum is over all pairs (g, f) of bounded functions (or respectively ν- and μ-integrable) such that for all x, y ∈ Rⁿ,

    g(x) ≤ f(y) + (1/2) |x − y|².

On the basis of this description, it is a mere exercise to repeat the proof of Proposition 6.1 to conclude to the following.

Proposition 6.2. Let μ be a Borel probability measure on Rⁿ. Then

    W₂(μ, ν) ≤ ( C H(ν | μ) )^{1/2}    (6.9)

for some C > 0 and all ν if and only if, for all measurable bounded functions f on Rⁿ,

    ∫ e^{Q_c f / C} dμ ≤ e^{(∫ f dμ)/C},    (6.10)

where Q_c f is the infimum-convolution of f with the quadratic cost c(x) = (1/2)|x|², that is

    Q_c f(x) = inf_{y ∈ Rⁿ} [ f(y) + (1/2) |x − y|² ],    x ∈ Rⁿ.
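On a discrete grid, the infimum-convolution Q_c f of Proposition 6.2 is straightforward to evaluate; the following sketch (with a hypothetical test function f(y) = y², for which Q_c f(x) = x²/3 in closed form, the minimum being attained at y = x/3) simply illustrates the definition.

```python
import numpy as np

def inf_convolution(f_vals, grid):
    # Q_c f(x) = inf_y [ f(y) + |x - y|^2 / 2 ], evaluated on a grid
    return np.array([np.min(f_vals + 0.5 * (x - grid) ** 2) for x in grid])

grid = np.linspace(-5.0, 5.0, 2001)
q = inf_convolution(grid ** 2, grid)

# for f(y) = y^2 the minimizer is y = x/3 and Q_c f(x) = x^2 / 3
center = slice(800, 1201)  # x in [-1, 1], away from grid boundary effects
assert np.max(np.abs(q[center] - grid[center] ** 2 / 3.0)) < 1e-3
```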

For the matter of comparison with Proposition 6.1, observe that whenever F is Lipschitz,

    Q_c F ≥ F − (1/2) ‖F‖²_Lip.

The infimum-convolution inequality (6.10) of Proposition 6.2 has of course to be compared with the one introduced in Section 1.6 that reads

    ∫ e^{Q_c f} dμ ∫ e^{−f} dμ ≤ 1.    (6.11)

Since ∫ e^{−f} dμ ≥ e^{−∫ f dμ} by Jensen's inequality, (6.11) is stronger than (6.10) (with C = 1). As we have seen in Sections 1.6 and 6.1, both produce measure concentration for μ with respect to the cost c. Proposition 6.2 is actually a bridge between the methods developed in Section 1.6 (and Sections 4.4 and 4.5) and the transportation cost inequalities. As announced, it may be used for example to yield a very simple proof of the product property of quadratic transportation cost inequalities along the lines of Proposition 1.19.

Proposition 6.3. Let P = μ₁ ⊗ ⋯ ⊗ μ_n be a product probability measure on Rⁿ such that each μ_i, i = 1,...,n, satisfies a quadratic transportation cost inequality (6.9) with constant C > 0 on R. Then

    W₂(P, R) ≤ ( C H(R | P) )^{1/2}

for every probability measure R on Rⁿ.

Proof. We follow the argument put forward in Proposition 1.19. It is enough to consider n = 2. Replacing the cost c(x) = (1/2)|x|² by c(x) = (1/2C)|x|², we may assume throughout the argument below that C = 1. Let f be bounded on R², and for x₂ ∈ R set

    g(x₂) = log ∫ e^{Q_c f^{x₂}} dμ₁,

where f^{x₂}(·) = f(·, x₂). As we already saw in the proof of Proposition 1.19,

    ∬ e^{Q_c f} dμ₁ dμ₂ ≤ ∫ e^{Q_c g} dμ₂.

By (6.10) applied to g,

    ∫ e^{Q_c g} dμ₂ ≤ e^{∫ g dμ₂},

and by (6.10) applied to f^{x₂} for every x₂,

    g(x₂) = log ∫ e^{Q_c f^{x₂}} dμ₁ ≤ ∫ f^{x₂} dμ₁.

The claim follows.

It is clear that if each μ_i satisfies a quadratic transportation cost inequality with constant C_i > 0, i = 1,...,n, then the product measure P will satisfy a quadratic transportation cost inequality with constant C = max_{1≤i≤n} C_i (compare Corollary 5.7). The quadratic transportation cost inequalities thus turn out to be a useful tool in the investigation of dimension free concentration properties, similar to what has been developed for logarithmic Sobolev inequalities in the preceding chapter. In a sense, the quadratic transportation cost inequalities may be thought of as dual to logarithmic Sobolev inequalities. Proposition 6.2 furthermore emphasizes the use of Brunn-Minkowski inequalities in this context, which may be compared to what was developed in Section 2.2. Let μ be a Borel probability measure on Rⁿ with density e^{−U} with respect to Lebesgue measure. Assume that U is strictly convex in the sense that, for some c > 0, every λ ∈ [0,1] and every x, y ∈ Rⁿ,

    λU(x) + (1 − λ)U(y) − U(λx + (1 − λ)y) ≥ (c/2) λ(1 − λ) |x − y|².    (6.12)

When the potential U is twice continuously differentiable, this amounts to Hess(U)(x) ≥ c Id as symmetric matrices, uniformly in x ∈ Rⁿ.

Theorem 6.4. Let dμ = e^{−U} dx where U satisfies (6.12) with constant c > 0. Then, for every probability measure ν on Rⁿ with a finite second moment,

    W₂(μ, ν) ≤ ( (1/c) H(ν | μ) )^{1/2}.

Before turning to the proof of Theorem 6.4, note that it contains the case of the canonical Gaussian measure on Rⁿ with c = 1. This case may be proved alternatively starting from dimension one and using induction on the coordinates. As shown in [Bl], more generally, an alternate proof of Theorem 6.4 may be given using the Brenier-McCann theorem [Bre], [MC] on monotone measure preserving maps. Note furthermore that, by Proposition 6.2, under the hypotheses of Theorem 6.4,

    ∫ e^{Q_c f} dμ ≤ e^{∫ f dμ}

for every bounded measurable function f, where here Q_c f(x) = inf_y [ f(y) + (c/2)|x − y|² ], while Theorem 2.13 of Chapter 2 shows that

    ∫ e^{Q_{c/2} f} dμ ∫ e^{−f} dμ ≤ 1.

(However, Theorem 2.13 only uses (6.12) with λ = 1/2.) We thus reach with the quadratic transportation cost inequalities somewhat sharper bounds under somewhat sharper hypotheses.

Proof. Given λ ∈ (0,1) and x, y ∈ Rⁿ, let

    L_λ(x, y) = (1/λ(1 − λ)) [ (1 − λ)U(x) + λU(y) − U((1 − λ)x + λy) ].

We apply the functional form of the Brunn-Minkowski theorem (Section 2.2). Setting

    u(x) = e^{−λf(x) − U(x)},    v(y) = e^{(1−λ)g(y) − U(y)},    w(z) = e^{−U(z)},

we get

    ( ∫ e^{−λf} dμ )^{1−λ} ( ∫ e^{(1−λ)g} dμ )^{λ} ≤ 1    (6.13)

provided the functions f and g satisfy

    g(y) ≤ f(x) + L_λ(x, y),    x, y ∈ Rⁿ.    (6.14)

Given f, the optimal function g = L_λ f in (6.14) is defined by

    L_λ f(y) = inf_{x ∈ Rⁿ} [ f(x) + L_λ(x, y) ],

so that (6.13) becomes, after raising to the power 1/(λ(1 − λ)),

    ( ∫ e^{−λf} dμ )^{1/λ} ( ∫ e^{(1−λ) L_λ f} dμ )^{1/(1−λ)} ≤ 1.    (6.15)

Now, as a consequence of the convexity assumption (6.12) on U,

    L_λ(x, y) ≥ (c/2) |x − y|²

for every λ ∈ (0,1), so that, in particular, liminf_{λ→0} L_λ(x, y) ≥ (c/2)|x − y|². As a result, letting λ → 0 in (6.15), we arrive at

    ∫ e^{Q_c f} dμ ≤ e^{∫ f dμ},

which thus holds true for at least all bounded measurable functions f. After changing f into cf, this inequality amounts to (6.10) with C = 1/c, so that, as a consequence of Proposition 6.2, the theorem is established.

Although we need not really be concerned with that here, it is important to emphasize that quadratic transportation cost inequalities are actually consequences of logarithmic Sobolev inequalities. In particular, a statement such as Theorem 6.4 above is a consequence of Theorem 5.2 on logarithmic Sobolev inequalities. The derivation of a quadratic transportation cost inequality from a logarithmic Sobolev inequality may be performed on the pattern of the Herbst argument. We briefly describe the argument. Assume thus that we are given an absolutely continuous probability measure μ on the Borel sets of Rⁿ such that for all smooth functions f on Rⁿ,

    Ent_μ(f²) ≤ 2C ∫ |∇f|² dμ.    (6.16)

Given a (bounded Lipschitz) function f on Rⁿ, apply now the logarithmic Sobolev inequality (6.16) to f² = e^{Q_c(λf)/C}, where we recall that Q_c f is the infimum-convolution of f with the quadratic cost

c(x) = (1/2)|x|². Now, infimum-convolutions provide the Hopf-Lax representation of solutions of the Hamilton-Jacobi equation (cf. [Ev]). Therefore, almost everywhere in space,

    Q_c(λg) = λ (∂/∂λ) Q_c(λg) + (1/2) |∇ Q_c(λg)|².

We thus immediately deduce from the logarithmic Sobolev inequality (6.16) the differential inequality

    λ M′(λ) ≤ M(λ) log M(λ),    λ ∈ R,

on M(λ) = ∫ e^{Q_c(λf)/C} dμ. Since M′(0) = C⁻¹ ∫ f dμ, it follows as in the proof of Theorem 5.1 that

    ∫ e^{Q_c f / C} dμ ≤ e^{(∫ f dμ)/C},

that is, the infimum-convolution inequality (6.10). As a consequence of the preceding and Proposition 6.2, we may state the following result.

Theorem 6.5. Assume that μ is absolutely continuous and that, for some C > 0 and all smooth enough functions f on Rⁿ,

    Ent_μ(f²) ≤ 2C ∫ |∇f|² dμ.

Then, for every probability measure ν absolutely continuous with respect to μ,

    W₂(μ, ν) ≤ ( C H(ν | μ) )^{1/2}.

Replacing |x − y| by the Riemannian distance d(x, y) would yield the same conclusion on a smooth complete Riemannian manifold (X, g). In particular, most of the logarithmic Sobolev inequalities we discussed in Chapter 5 may be turned into quadratic transportation cost inequalities.
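As a sanity check (an illustration only, using the classical closed forms for the relative entropy and for W₂ between one-dimensional Gaussians), the inequality of Theorem 6.4 with c = 1, i.e. the quadratic transportation cost inequality for the canonical Gaussian measure, can be verified numerically:

```python
import math

# mu = N(0,1) satisfies W2(mu,nu) <= sqrt(H(nu|mu)) (Theorem 6.4 with c = 1).
# For nu = N(m, s^2), the classical closed forms (with cost |x-y|^2 / 2) are:
#   W2(mu,nu)^2 = (m^2 + (s - 1)^2) / 2
#   H(nu|mu)    = (m^2 + s^2 - 1 - 2*log(s)) / 2
for m, s in [(0.0, 2.0), (1.0, 0.5), (-2.0, 3.0), (0.3, 1.0)]:
    w2 = math.sqrt((m * m + (s - 1) ** 2) / 2)
    H = (m * m + s * s - 1 - 2 * math.log(s)) / 2
    assert w2 <= math.sqrt(H) + 1e-12
```

Note that for a pure translation (s = 1) the inequality is an equality, which reflects its sharpness.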

This line of reasoning may be pushed further to show that the modified logarithmic Sobolev inequality satisfied by the exponential measure (Theorem 5.12) also implies a transportation cost inequality for the cost associated in Proposition 4.17. For the particular example of the exponential measure μ itself, with density (1/2) e^{−|x|} with respect to Lebesgue measure on R, we have the following statement. Let

    W_c(μ, ν) = inf ∬ c(x − y) dπ(x, y)

where the infimum runs over all probability measures π on R × R with respective (integrable) marginals μ and ν, and where c is the cost function of Proposition 4.17.

Theorem 6.6. For every probability measure ν absolutely continuous with respect to the exponential measure μ on R,

    W_c(μ, ν) ≤ C H(ν | μ)

where C > 0 is a universal constant.

The argument is quite similar to the one for the quadratic cost, and we refer to [B-G-L] for details. The transportation cost inequality of Theorem 6.6 is equivalent to the one put forward in [Tal10]. It may be tensorized into dimension free transportation inequalities for products of the exponential measure, so as to recover its sharp concentration properties [Tal10], exposed in Sections 4.5 and 5.3 by other means.

6.3 Transportation for product and non-product measures


In this section, we show how the transportation approach may be developed to obtain some of the geometric results of Sections 4.2 and 4.3. The arguments rely on somewhat delicate coupling constructions. One main interest in this approach is that it allows us to investigate some non-product situations, as emphasized by K.

Marton [Mar2], [Mar3]. We only review here a few recent results in this direction.

To avoid unessential measurability questions, let for simplicity (X_i, d_i), i = 1,...,n, be arbitrary Polish (complete separable) metric spaces and let X = X₁ × ⋯ × X_n. A point x in X has coordinates x = (x₁,...,x_n). Denote by P(Q, R) the set of all probability measures π on X × X with marginals Q and R. Define the coupling distance

    d(Q, R) = inf ∫ Σ_{i=1}^n ( π^y({x; x_i ≠ y_i}) )² dR(y)

where the infimum runs over all π ∈ P(Q, R). Here, if τ is a probability measure on a product space Ωᵏ = Ω₁ × ⋯ × Ω_k, we denote by τ^z, where z = (z_{i₁},...,z_{i_ℓ}) ∈ Ωℓ = Ω_{i₁} × ⋯ × Ω_{i_ℓ}, the (regular) conditional distribution of τ given z, that is, such that for every bounded measurable function φ on Ωᵏ,

    ∫ φ dτ = ∫ ( ∫ φ(w, z) dτ^z(w) ) dτ_ℓ(z)

where τ_ℓ is the marginal of τ on Ωℓ (cf. e.g. [Str2]). In probabilistic notation,

    d(Q, R) = inf ∫ Σ_{i=1}^n P( {ξ_i ≠ η_i} | η = y )² dR(y)    (6.17)

where the couple of random variables (ξ, η) has distribution π ∈ P(Q, R) and ξ = (ξ₁,...,ξ_n), η = (η₁,...,η_n). Note that d(Q, R) is not symmetric in Q and R.

To better understand the significance of the coupling distance d, let us relate it to the convex hull functional φ^c_A of Section 4.2. To this task, recall from (4.5) that, for a given subset A of X and x ∈ X,

    φ^c_A(x) = inf_{y ∈ V_A(x)} |y|

where V_A(x) is the convex hull of the points (1_{{y₁≠x₁}},...,1_{{y_n≠x_n}}), y ∈ A, in [0,1]ⁿ. It is easily seen, by the Cauchy-Schwarz inequality, that

    φ^c_A(x)² = inf Σ_{i=1}^n ( ν({y; y_i ≠ x_i}) )²

where the infimum runs over all probability measures ν on X such that ν(A) = 1. As a consequence, for any Q supported on A, and any R,

    ∫ (φ^c_A)² dR ≤ d(Q, R).    (6.18)
The following result contains the main conclusions of Section 4.2.

Theorem 6.7. For any product measure P on X = X₁ × ⋯ × X_n, and any probability measures Q and R on X,

    (1/4) d(Q, R) ≤ H(R | P) + H(Q | P).

To apply Theorem 6.7, take A ⊂ X and assume that P(A) > 0. Define first Q(·) = P(· | A), for which H(Q | P) = log (1/P(A)). Let R be such that

    dR = (1/Z) e^{(φ^c_A)²/4} dP

where Z is the normalization constant, for which

    H(R | P) = (1/4) ∫ (φ^c_A)² dR − log Z.

By (6.18),

    H(R | P) ≤ (1/4) d(Q, R) − log Z.

It then follows from Theorem 6.7 that

    log Z ≤ H(Q | P) = log (1/P(A)),

that is,

    Z = ∫ e^{(φ^c_A)²/4} dP ≤ 1/P(A),
which is the content of Theorem 4.6.

The proof of Theorem 6.7 uses somewhat delicate coupling arguments, of which we briefly indicate the main steps. We actually deal with the more general version of Theorem 4.11. To this task, given β ≥ 0, recall the function ψ_β defined by

    ψ_β(1 − u) = βu log u − (1 + βu) log ( (1 + βu)/(1 + β) ),    u ≥ 0.

Define, for every β ≥ 0, the coupling distance

    d_β(Q, R) = inf ∫ Σ_{i=1}^n ψ_β( π^y({x; x_i ≠ y_i}) ) dR(y)

where the infimum is taken as in (6.17). Note that ψ_β(u) ≥ u²/4 for β = 1. The following theorem thus covers Theorem 6.7, and will imply Theorem 4.11 in exactly the same way.

Theorem 6.8. For any β ≥ 0, any product measure P on X = X₁ × ⋯ × X_n, and any probability measures Q and R on X,

    d_β(Q, R) ≤ H(R | P) + β H(Q | P).
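The quantitative relation ψ₁(u) ≥ u²/4, which makes Theorem 6.8 cover the quadratic statement of Theorem 6.7, is elementary but not completely obvious; here is a quick numerical check (a sketch, sampling the interval [0,1]):

```python
import math

def psi1(v):
    # psi_beta at beta = 1, through psi_1(1 - u) = u log u - (1 + u) log((1 + u)/2)
    u = 1.0 - v
    u_log_u = u * math.log(u) if u > 0 else 0.0
    return u_log_u - (1.0 + u) * math.log((1.0 + u) / 2.0)

# psi_1(v) >= v^2 / 4 on (0, 1); the two sides agree to second order at v = 0
for k in range(1, 100):
    v = k / 100.0
    assert psi1(v) >= v * v / 4.0
```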
Proof. The following (one-dimensional) coupling lemma is the key to the proof of Theorem 6.8.

Lemma 6.9. Let (X, d) be a Polish space. For any β ≥ 0 and any probability measures Q and R on X, define

    δ_β(Q, R) = ∫ ψ_β( (1 − dQ/dR)₊ ) dR.

Then there exists π ∈ P(Q, R) such that

    ∫ ψ_β( π^y({x; x ≠ y}) ) dR(y) = δ_β(Q, R).

Furthermore, for any probability measure P on X,

    δ_β(Q, R) ≤ H(R | P) + β H(Q | P).

Proof. Let (R − Q)⁺ denote the positive part of the finite signed measure R − Q, and let Q ∧ R denote the positive measure

    R − (R − Q)⁺ = Q − (Q − R)⁺.

Set p = (R − Q)⁺(X). Suppose 0 < p < 1, and consider independent random variables U, V and W with respective distributions

    (1 − p)⁻¹ (Q ∧ R),    p⁻¹ (R − Q)⁺    and    p⁻¹ (Q − R)⁺.

Let I ∈ {1, 2} be chosen independently of these variables and such that I = 2 with probability p. Set ξ = η = U when I = 1, and ξ = W ≠ η = V when I = 2. If p = 1, we do not need the variable U for the construction, whereas for p = 0 we never use V and W. The coupling (ξ, η) has distribution π ∈ P(Q, R) and is such that

    π( {(x, y); x ≠ y, y ∈ B} ) = (R − Q)⁺(B)

for every Borel set B in X. By the definition of conditional probability distributions,

    ∫_B π^y({x; x ≠ y}) dR(y) = π( {(x, y); x ≠ y, y ∈ B} ) = (R − Q)⁺(B),

implying that π^y({x; x ≠ y}) = (1 − dQ/dR)₊(y) for R-almost every y. The first part of the lemma follows. Turning to the second part, it suffices to consider P such that both f = dR/dP and g = dQ/dP exist. Let

    P̃ = (1/(1 + β)) (R + βQ)

so that

    dP̃/dP = h = (f + βg)/(1 + β).

Hence

    H(R | P) + β H(Q | P) = ∫ ( f log f + β g log g ) dP
                          ≥ ∫ [ f log (f/h) + β g log (g/h) ] dP

since ∫ h log h dP = H(P̃ | P) ≥ 0. Set α = dQ/dR, so that g = αf and h/f = (1 + βα)/(1 + β). Hence, by the preceding,

    H(R | P) + β H(Q | P) ≥ ∫ [ βα log α − (1 + βα) log ((1 + βα)/(1 + β)) ] dR = ∫ ψ_β(1 − α) dR,

from which the desired claim follows since ψ_β(1 − u) ≥ ψ_β((1 − u)₊) for every u ≥ 0. Lemma 6.9 is established.

The proof of Theorem 6.8 is now the proper tensorization of Lemma 6.9.

Proof of Theorem 6.8. Fix β ≥ 0 and P = μ₁ ⊗ ⋯ ⊗ μ_n. For i = 1,...,n, denote by Q^{x₁,...,x_{i−1}} the (regular) conditional distribution of Q given x₁,...,x_{i−1}, and by Q_i^{x₁,...,x_{i−1}} its marginal along the x_i coordinate. Similarly for R^{y₁,...,y_{i−1}} and R_i^{y₁,...,y_{i−1}}. By the first part of Lemma 6.9, there exists, for every i = 1,...,n (and every x₁,...,x_{i−1}, y₁,...,y_{i−1}), a probability measure π_i in P( Q_i^{x₁,...,x_{i−1}}, R_i^{y₁,...,y_{i−1}} ) such that

    δ_β( Q_i^{x₁,...,x_{i−1}}, R_i^{y₁,...,y_{i−1}} ) = ∫ ψ_β( π_i^y({x; x ≠ y}) ) dR_i^{y₁,...,y_{i−1}}(y).
Let now π on X × X be defined by

    π(dx, dy) = π₁(dx₁, dy₁) π₂^{x₁; y₁}(dx₂, dy₂) ⋯ π_n^{x₁,...,x_{n−1}; y₁,...,y_{n−1}}(dx_n, dy_n).
Note that π ∈ P(Q, R). By the properties of conditional expectations and the convexity of ψ_β, for every i = 1,...,n,

    ∫ ψ_β( π^y({x; x_i ≠ y_i}) ) dR(y) ≤ ∫ ψ_β( π_i^{y_i}({x_i; x_i ≠ y_i}) ) dπ(x, y)
                                      = ∫ δ_β( Q_i^{x₁,...,x_{i−1}}, R_i^{y₁,...,y_{i−1}} ) dπ(x, y).

As a consequence of Lemma 6.9,

    d_β(Q, R) ≤ Σ_{i=1}^n ∫ δ_β( Q_i^{x₁,...,x_{i−1}}, R_i^{y₁,...,y_{i−1}} ) dπ(x, y)
              ≤ Σ_{i=1}^n ∫ [ H( R_i^{y₁,...,y_{i−1}} | μ_i ) + β H( Q_i^{x₁,...,x_{i−1}} | μ_i ) ] dπ(x, y).

The chain rule for relative entropy (cf. [De-S], [Dem-Z1]) indicates that

    H(Q | P) = Σ_{i=1}^n ∫ H( Q_i^{x₁,...,x_{i−1}} | μ_i ) dQ(x),

and similarly for H(R | P). Theorem 6.8 is thus established.

As shown by A. Dembo [De], the preceding tools may be extended to reach, in the same way, the q-point approximation theorem of Section 4.3. The transportation approach may also be well suited to some dependent situations. Variants of Theorem 6.7 have namely been used by K. Marton [Mar2] in the investigation of some non-product Markov chains, for which it appears to be a powerful tool. More precisely, let P be a Markov chain on [0,1]ⁿ with transition kernels K_i, i = 1,...,n, that is,

    dP(x₁,...,x_n) = K₁(dx₁) K₂(x₁, dx₂) ⋯ K_n(x_{n−1}, dx_n).

Assume that, for some 0 ≤ a < 1, for every i = 1,...,n and every x, y ∈ [0,1],

    ‖ K_i(x, ·) − K_i(y, ·) ‖_TV ≤ a.    (6.19)

The case a = 0 of course corresponds to independent kernels K_i. By means of the analogue of Theorem 6.7, one main result of [Mar2] (expressed on functions) is the following. It appears as the proper extension of Corollary 4.9.

Theorem 6.10. Let P be a Markov chain on [0,1]ⁿ satisfying (6.19) for some 0 ≤ a < 1. For every convex Lipschitz function F on Rⁿ with ‖F‖_Lip ≤ 1, and any r ≥ 0,

    P( { |F − ∫ F dP| ≥ r } ) ≤ 2 e^{−(1 − √a)² r²/4}.
This result has been extended in Mar4] and, independently in Sa], to larger classes of dependent processes, including Doeblin recurrent Markov chains and -mixing processes. We refer to Mar4], Sa] for details.

Notes and Remarks


The interest of information inequalities in the concentration of measure phenomenon was emphasized in a series of papers by K. Marton [Mar1], [Mar2]. Proposition 6.1 is due to S. Bobkov and F. Götze [B-G]. The first quadratic transportation cost inequality, for Gaussian measures, is due to M. Talagrand [Tal10], with a proof that is further extended to strictly log-concave measures in [Bl] by means of the Brenier-McCann mass transference theorem [Bre], [MC]. A simple transparent proof is provided in [CE]. Its infimum-convolution description is emphasized in [B-G], and further analyzed in [Bo-L3] and [B-G-L]. That quadratic transportation cost inequalities follow from logarithmic Sobolev inequalities (Theorem 6.5) is due to F. Otto and C. Villani [O-V], with a PDE proof. The connection with Hamilton-Jacobi equations and hypercontractivity is made clear in [B-G-L]. Transportation cost inequalities for the exponential distribution (Theorem 6.6) were first obtained

in [Tal10] to produce an alternate approach to the concentration results of [Tal3]. The connection with modified logarithmic Sobolev inequalities is exposed similarly in [B-G-L]. The transportation cost approach was initiated by K. Marton [Mar2] to extend Talagrand's convex hull theorem to some contractive Markov chains. Theorem 6.10 is due to K. Marton [Mar2]. The method was then systematically applied by A. Dembo [De] to recover most of the geometric inequalities for product measures of Sections 4.2 and 4.3. See also [Dem-Z2]. Theorem 6.8 and its proof are taken from [De] and [Dem-Z1]. More recent developments for dependent variables and processes are due to K. Marton [Mar3], [Mar4], P.-M. Samson [Sa] and E. Rio [Ri1], with applications to bounds on empirical processes in the spirit of the inequalities described in Chapter 7.


7. SHARP BOUNDS ON GAUSSIAN AND EMPIRICAL PROCESSES

In this chapter, we illustrate some of the basic principles of concentration with sharp bounds on Gaussian and empirical processes (or norms of sums of independent random vectors). In the first section, we present the sharp deviation inequality for suprema of Gaussian processes and its consequence for strong integrability. We then turn to bounds on sums of independent random vectors and empirical processes, first with the geometric tools of Chapter 4, while in the last section we establish sharper bounds with the entropic method. One of the strengths of concentration inequalities is that they extend results for classical sums of samples of independent random variables to Lipschitz functions of such samples, therefore allowing remarkable applications. This is one typical aspect of the content of this chapter.

7.1 Gaussian processes


We illustrate in this section the use of concentration properties of Gaussian measures to obtain sharp deviation and integrability results for Gaussian processes. On some probability space (Ω, A, P), let G = (G_t)_{t∈T} be a centered Gaussian process indexed by some parameter set T. We mean thus that for any finite collection (t₁,...,t_n) in T, the random vector (G_{t₁},...,G_{t_n}) is a centered Gaussian random vector in Rⁿ.

We are interested here in sharp probability inequalities on the distribution of the supremum of G. To this task, assume, to avoid measurability questions, that T is countable. In general, this assumption is dispensed with by separability. In any case, the basic inequalities are finite dimensional. We assume that sup_{t∈T} G_t < ∞ almost surely. We first claim that then

    σ = sup_{t∈T} ( E(G_t²) )^{1/2} < ∞.    (7.1)

Let indeed m be such that P({sup_{t∈T} G_t ≤ m}) ≥ 3/4. Then, for every t ∈ T, P({G_t ≤ m}) ≥ 3/4 and, setting σ_t = (E(G_t²))^{1/2}, m ≥ σ_t Φ⁻¹(3/4) > 0, from which (7.1) follows (recall that Φ is the distribution function of the standard one-dimensional Gaussian law). Now fix t₁,...,t_n in T and consider the centered Gaussian random vector (G_{t₁},...,G_{t_n}) in Rⁿ. Denote by Γ = M ᵗM its (semi-)positive definite covariance matrix. The random vector (G_{t₁},...,G_{t_n}) has thus the same distribution as MN, where N is distributed according to the canonical Gaussian measure γ on Rⁿ. Let F: Rⁿ → R be defined by

    F(x) = max_{1≤i≤n} (Mx)_i,    x = (x₁,...,x_n) ∈ Rⁿ.

Hence the distribution of F under γ is the distribution of the random variable max_{1≤i≤n} G_{t_i}. It is easily seen that the Lipschitz norm ‖F‖_Lip of F is less than or equal to the norm ‖M‖ of M as an operator from Rⁿ equipped with the Euclidean norm into Rⁿ equipped with the supremum norm. Furthermore, this operator norm ‖M‖ is equal by construction to

    max_{1≤i≤n} ( Σ_{j=1}^n M_{ij}² )^{1/2} = max_{1≤i≤n} ( E(G_{t_i}²) )^{1/2} ≤ σ,

where we denote by M_{ij} the entries of the matrix M. Applying thus (2.32) for example to F yields

    P( { max_{1≤i≤n} G_{t_i} ≥ E( max_{1≤i≤n} G_{t_i} ) + r } ) ≤ e^{−r²/2σ²}    (7.2)

for every r ≥ 0. The same inequality applied to −F yields

    P( { max_{1≤i≤n} G_{t_i} ≤ E( max_{1≤i≤n} G_{t_i} ) − r } ) ≤ e^{−r²/2σ²}.    (7.3)

Let then r₀ be large enough so that e^{−r₀²/2σ²} < 1/2. Let also m be large enough in order that P({sup_{t∈T} G_t ≤ m}) ≥ 1/2. Intersecting this event with the one in (7.3) for r = r₀ shows that

    E( max_{1≤i≤n} G_{t_i} ) ≤ r₀ + m.

Since m and r₀ have been chosen independently of t₁,...,t_n, we already get that

    E( sup_{t∈T} G_t ) < ∞.

Now, one can use monotone convergence in (7.2) to conclude with the following basic inequality.

Theorem 7.1. Let G = (G_t)_{t∈T} be a centered Gaussian process indexed by a countable set T such that sup_{t∈T} G_t < ∞ almost surely. Then, for every r ≥ 0,

    P( { sup_{t∈T} G_t ≥ E( sup_{t∈T} G_t ) + r } ) ≤ e^{−r²/2σ²}

where σ² = sup_{t∈T} E(G_t²) < ∞.

Therefore, up to the deviation factor E(sup_{t∈T} G_t), the distribution of the supremum sup_{t∈T} G_t is as good as that of a one-dimensional Gaussian variable with variance the supremum of the variances. Furthermore, for every r ≥ 0,

    P( { | sup_{t∈T} G_t − E( sup_{t∈T} G_t ) | ≥ r } ) ≤ 2 e^{−r²/2σ²}.    (7.4)

Both Theorem 7.1 and (7.4) hold similarly with the median of sup_{t∈T} G_t instead of the mean (using (2.11)). One may also work with sup_{t∈T} |G_t|, assuming sup_{t∈T} |G_t| < ∞ almost surely.
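The finite dimensional inequality behind Theorem 7.1 is easy to probe by simulation. The sketch below (with arbitrarily chosen sample sizes) estimates the upper tail of the maximum of independent standard Gaussian variables, for which σ = 1, and compares it with the bound e^{−r²/2}; the Monte Carlo estimate typically falls far below the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, r = 50, 20000, 2.0

# maxima of n independent standard Gaussians over many trials (sigma = 1)
samples = rng.standard_normal((trials, n)).max(axis=1)

# Theorem 7.1 here reads P(max >= E(max) + r) <= e^{-r^2/2}
tail = np.mean(samples >= samples.mean() + r)
assert tail <= np.exp(-r * r / 2.0)
```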

As we already discussed in Chapter 3, the main point in the inequality of Theorem 7.1 is the relative size of σ and E(sup_{t∈T} G_t). We always have

    E( sup_{t∈T} G_t ) = (1/2) E( sup_{s,t∈T} (G_s − G_t) )
                       = (1/2) E( sup_{s,t∈T} |G_s − G_t| )
                       ≥ (1/2) (2/π)^{1/2} sup_{s,t∈T} ( E(|G_s − G_t|²) )^{1/2},

so that, for every s ∈ T,

    σ ≤ ( E(G_s²) )^{1/2} + (2π)^{1/2} E( sup_{t∈T} G_t ).
; 1=2 E (G2 + s)

In general E(sup_{t∈T} G_t) is much bigger than σ. Think for example of G being distributed as the canonical Gaussian measure on Rⁿ, that is, of n independent standard Gaussian variables, for which E(sup_{t∈T} G_t) is of the order of (log n)^{1/2} for n large, whereas σ = 1. This was actually one crucial point in the concentration proof of Dvoretzky's theorem in Section 3.4. On the other hand, E(sup_{t∈T} G_t) only appears as a deviation factor, and not as a multiplicative factor in the exponential bound. The concentration inequalities however do not provide any hint on the size of E(sup_{t∈T} G_t) itself, for which independent tools are required (entropy or majorizing measures, cf. [Le-T], [Tal11]). The next statement is a consequence concerning the strong integrability of the supremum of Gaussian processes.

Corollary 7.2. Let G = (G_t)_{t∈T} be a centered Gaussian process indexed by a countable set T such that sup_{t∈T} G_t < ∞ almost surely. Then

    lim_{r→∞} (1/r²) log P( { sup_{t∈T} G_t ≥ r } ) = − 1/2σ²

where σ² = sup_{t∈T} E(G_t²) < ∞. Equivalently,

    E( exp( α ( sup_{t∈T} G_t )² ) ) < ∞

if and only if α < 1/2σ².

The first assertion in Corollary 7.2 is a large deviation statement for complements of balls. The upper bound immediately follows from Theorem 7.1. The lower bound is simply that

    P( { sup_{t∈T} G_t ≥ r } ) ≥ P( {G_t ≥ r} ) = 1 − Φ(r/σ_t) ≥ (2π)^{−1/2} ( (r/σ_t)/(1 + (r/σ_t)²) ) e^{−r²/2σ_t²}

for every t ∈ T and r > 0, by the classical lower estimate on the Gaussian tail. The second part of the corollary follows by integration of the inequality of Theorem 7.1 in r ≥ 0. The deviation inequality of Theorem 7.1 actually contains much more information than just this integrability result. For example, if Gⁿ, n ∈ N, is a sequence of Gaussian processes as before, and if we let ‖Gⁿ‖ = sup_{t∈T} Gⁿ_t, then ‖Gⁿ‖ → 0 almost surely as soon as E(‖Gⁿ‖) → 0 and σ_n (log n)^{1/2} → 0, where σ_n = sup_{t∈T} (E((Gⁿ_t)²))^{1/2}. As we have seen, E(sup_{t∈T} |G_t|) ≥ cσ for some numerical c > 0. Therefore, integration of (7.4) applied to sup_{t∈T} |G_t| shows, as in Proposition 1.10 and (1.19), that for some numerical constant C > 0 and all q ≥ 1,
    ( E( sup_{t∈T} |G_t|^q ) )^{1/q} ≤ C √q E( sup_{t∈T} |G_t| ).    (7.5)

This equivalence of moments is a useful tool in the study of Gaussian processes and measures. The preceding results may be presented equivalently for Gaussian measures on infinite dimensional normed vector spaces (E, ‖·‖). Such Gaussian measures may be represented as (almost surely convergent) series Σ_{i=1}^∞ g_i v_i where v_i, i ≥ 1, is a sequence in E and the g_i's are independent standard normal random variables on some probability space (Ω, A, P). Theorem 7.1 and Corollary 7.2 thus describe the distribution and integrability properties of

the norm ‖ Σ_{i=1}^∞ g_i v_i ‖. Note that since

    ‖ Σ_{i=1}^∞ g_i v_i ‖ = sup_{‖ξ‖≤1} Σ_{i=1}^∞ g_i ⟨ξ, v_i⟩,

where the supremum runs over the unit ball of the dual space, here

    σ² = sup_{‖ξ‖≤1} Σ_{i=1}^∞ ⟨ξ, v_i⟩².

Since the norm is convex, the results of Section 4.2 extend these conclusions to suprema of not necessarily Gaussian functionals. For simplicity, we deal below with finite sums. Let namely ζ_i, i = 1,...,n, be independent random variables on (Ω, A, P) with |ζ_i| ≤ 1 almost surely, and let v_i, i = 1,...,n, be vectors in an arbitrary Banach space E with norm ‖·‖. Since the norm is a supremum of linear functionals, we may directly apply Corollary 4.10 to get deviation and concentration inequalities around a median. Let us however repeat the argument here. Consider the convex function F: Rⁿ → R defined by

    F(x) = ‖ Σ_{i=1}^n x_i v_i ‖,    x = (x₁,...,x_n) ∈ Rⁿ.

Then, by duality, for x, y ∈ Rⁿ,

    F(x) − F(y) ≤ ‖ Σ_{i=1}^n (x_i − y_i) v_i ‖ = sup_{‖ξ‖≤1} Σ_{i=1}^n (x_i − y_i) ⟨ξ, v_i⟩ ≤ σ |x − y|,

where now σ² = sup_{‖ξ‖≤1} Σ_{i=1}^n ⟨ξ, v_i⟩² and where the last step follows from the Cauchy-Schwarz inequality. Hence ‖F‖_Lip ≤ σ. As an application of Corollary 4.9 or Theorem 5.9, we get the following result.

Theorem 7.3. Let ζ₁,...,ζ_n be independent random variables such that |ζ_i| ≤ 1 almost surely, i = 1,...,n, and let v₁,...,v_n be vectors in a Banach space (E, ‖·‖). For every r ≥ 0,

    P( { ‖ Σ_{i=1}^n ζ_i v_i ‖ ≥ M + r } ) ≤ 2 e^{−r²/8σ²},

where M is either the mean or a median of ‖ Σ_{i=1}^n ζ_i v_i ‖ and where

    σ² = sup_{‖ξ‖≤1} Σ_{i=1}^n ⟨ξ, v_i⟩².

There is a similar inequality for deviations below M, and thus a concentration inequality. This inequality is the analogue of the Gaussian deviation inequality of Theorem 7.1 and describes one proper infinite dimensional extension of the Hoeffding type inequality (1.23). Theorem 7.3 may be used, as for Corollary 7.2, to show that whenever the series

    S = Σ_{i=1}^∞ ζ_i v_i

is almost surely convergent in E, then its norm ‖S‖ is strongly integrable in the sense that

    E( e^{α‖S‖²} ) < ∞    (7.6)

for every α > 0. (That it holds for every α, in contrast with the Gaussian case, is due to the fact that the ζ_i's are bounded.) Theorem 7.3 may also be used to prove equivalences of moments as for Gaussian random vectors and series. In particular, if the ζ_i's are symmetric Bernoulli variables taking values ±1, the classical Khintchine inequalities show that

    E(‖S‖) ≥ sup_{‖ξ‖≤1} E( |⟨ξ, S⟩| ) ≥ c sup_{‖ξ‖≤1} ( E(|⟨ξ, S⟩|²) )^{1/2} = c σ
for some numerical c > 0. Hence, as for (7.5),


E
;

kS kq

1=q

C pq E kS k
;

(7:7)

for some numerical constant C > 0 and all q 1. Inequalities (7.7) are the famous Khintchine-Kahane inequalities. As for (7.5), these moment equivalences are part of the geometric reversed Holder inequalities (2.21).
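As a quick numerical illustration of the moment equivalence (7.7), the following sketch estimates the first and fourth moments of a Rademacher series in a finite-dimensional Banach space, here ℝ^d with the sup-norm. The vectors v_i, the dimensions and the sample sizes are all illustrative choices, not part of the text.

```python
import math
import random

random.seed(0)

# Illustrative setup: n random vectors in R^d, sup-norm as the Banach norm.
n, d, trials = 50, 5, 2000
v = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]

def sup_norm_sum(signs):
    # || sum_i eps_i v_i ||_infty for a given sign pattern eps
    return max(abs(sum(signs[i] * v[i][k] for i in range(n))) for k in range(d))

samples = []
for _ in range(trials):
    eps = [random.choice([-1, 1]) for _ in range(n)]
    samples.append(sup_norm_sum(eps))

q = 4
m1 = sum(samples) / trials                                # estimate of E||S||
mq = (sum(s ** q for s in samples) / trials) ** (1.0 / q) # estimate of (E||S||^q)^(1/q)
ratio = mq / (math.sqrt(q) * m1)
print(ratio)  # stays bounded, in line with (7.7)
```

The ratio is in fact close to 1/√q here, reflecting the concentration of ‖S‖ around its mean.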

7.2 Bounds on empirical processes


Sums of independent random variables are a natural application of the deviation inequalities for product measures. In this section and the next one, we present estimates on suprema of empirical processes. This paragraph relies on the geometric concentration properties developed in Sections 4.2 and 4.3, while the next one is based on the entropic method of Chapter 5. Tail probabilities for sums of independent random variables have been extensively studied in classical probability theory and limit theorems. We already mentioned in this work the Hoeffding type inequality (1.23) (and its martingale extension Lemma 4.1)

P( S ≥ E(S) + r ) ≤ e^{−r²/2nC²}

where S = X_1 + ⋯ + X_n is a sum of independent real random variables bounded by C > 0. One refined result is the so-called Bennett inequality [Benn] (after contributions by S. Bernstein, A. N. Kolmogorov, Yu. Prokhorov, W. Hoeffding etc.). As for (1.23), it will be convenient to compare the infinite-dimensional extensions we will present to this result. Let thus X_1,…,X_n be independent real-valued random variables on some probability space (Ω, A, P) such that |X_i| ≤ C, i = 1,…,n. Set S = X_1 + ⋯ + X_n. Then, for every r ≥ 0,

P( S ≥ E(S) + r ) ≤ exp( − (σ²/C²) h( Cr/σ² ) )  (7.8)

where h(u) = (1+u) log(1+u) − u, u ≥ 0, and σ² = Σ_{i=1}^n E(X_i²). Such an inequality describes the Gaussian tail for the values of r which are small with respect to σ², and the Poissonian behavior for the large values (think, for example, of a sample of independent Bernoulli variables with probability of success either 1/2 or of the order of 1/n). Our task in this chapter will be to understand what is saved of such a sharp inequality for norms of sums S = Σ_{i=1}^n X_i of independent random vectors X_i taking values in some Banach space (E, ‖·‖). In particular these bounds aim to be as close as possible to the one-dimensional inequality (7.8). They should also compare to the bounds on Gaussian processes of the preceding section involving two main parameters, one on the supremum itself (mean or median), and one on the supremum of variances. A first result in this direction is Corollary 4.5 obtained from martingale inequalities. However, the inequality of Corollary 4.5, while a deviation inequality from the mean E(‖S‖), does not emphasize, according to Theorems 7.1 and 7.3, the supremum of weak variances

sup_{‖ξ‖≤1} Σ_{i=1}^n E⟨ξ, X_i⟩².
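For a single real-valued sum, the right-hand side of the Bennett inequality (7.8) is straightforward to evaluate. The short sketch below (parameter values are illustrative) implements it and exhibits the Gaussian regime h(u) ≈ u²/2 for small u.

```python
import math

def h(u):
    # the Bennett function h(u) = (1 + u) log(1 + u) - u
    return (1.0 + u) * math.log1p(u) - u

def bennett_bound(r, sigma2, C=1.0):
    # right-hand side of (7.8): exp(-(sigma^2 / C^2) h(C r / sigma^2))
    return math.exp(-(sigma2 / C ** 2) * h(C * r / sigma2))

# Gaussian regime: for r small with respect to sigma^2 the bound behaves
# like exp(-r^2 / 2 sigma^2); for r large it decays like
# exp(-(r / C) log(1 + C r / sigma^2)), the Poissonian regime.
small = bennett_bound(0.1, sigma2=100.0)
gauss = math.exp(-(0.1 ** 2) / (2.0 * 100.0))
print(abs(small - gauss))  # tiny difference in the Gaussian regime
```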

This is why we have to turn to more refined tools such as the isoperimetric inequalities in product spaces of Sections 4.2 and 4.3 or the entropic method of Chapter 5. Motivated by recent applications, we present the results in the context of suprema of empirical processes rather than sums of random vectors. This is however only different at a notational level. Indeed, in statistical applications, one is interested in such bounds uniformly over classes of functions, and the importance of such inequalities has been emphasized in the statistical treatment of selection of models in [B-M1], [B-M2], [B-B-M] (see [B-M]). More precisely, let X_1, X_2,…, X_n,… be independent random variables with values in some measurable space (S, S) with identical distribution P, and let, for n ≥ 1,

P_n = (1/n) Σ_{i=1}^n δ_{X_i}

be the empirical measures (on P). A class F of real measurable functions on S is said to be a Glivenko-Cantelli class if sup_{f∈F} |P_n(f) − P(f)| converges almost surely to 0. It is a Donsker class if, in a sense to be made precise, √n (P_n(f) − P(f)), f ∈ F, converges in distribution toward a centered Gaussian process G_P = {G_P(f); f ∈ F} with covariance function P(fg) − P(f)P(g), f, g ∈ F. These definitions naturally extend the classical example of the class of all indicator functions of intervals (−∞, t], t ∈ ℝ (studied precisely by Glivenko-Cantelli and Donsker). These asymptotic properties however often need to be turned into tail inequalities at fixed n on classes F which are as rich as possible (to determine accurate approximation by empirical models). In particular, the aim is to reach exact extensions of the Gaussian bound of Theorem 7.1 for G_P and of the Bennett inequality (7.8) corresponding to a class F reduced to one function. In this section, we present a result on the basis of the control by q points presented in Section 4.3. As we have seen there, this method is of particular interest for bounds on sums of independent random variables, and we already presented there a useful inequality for non-negative summands (Corollary 4.14). In the study of empirical processes, one does not usually deal with non-negative summands. One general situation is thus the following. Let X_1,…,X_n be independent random variables taking values in some space S and consider, say, a countable family F of (measurable) real-valued functions on S. We are interested in bounds on the tail of

Z = sup_{f∈F} Σ_{i=1}^n f(X_i).

(One can deal similarly below with absolute values sup_{f∈F} | Σ_{i=1}^n f(X_i) |.) For simplicity, we deal with centered random variables, i.e. E(f(X_i)) = 0 for every 1 ≤ i ≤ n and every f ∈ F. If this is not the case, replace f(X_i) by f(X_i) − E(f(X_i)), although several

results do actually still hold for f(X_i) itself. Standard symmetrization techniques reduce then to the investigation of

Z^s = sup_{f∈F} Σ_{i=1}^n ε_i f(X_i)

where (ε_i)_{1≤i≤n} are independent symmetric Bernoulli random variables independent of the X_i's. Indeed, if {X_1',…,X_n'} denotes an independent copy of the sample {X_1,…,X_n}, it is not difficult to see that, for every r, a ≥ 0,

P( Z ≥ r + a ) ≤ P( sup_{f∈F} | Σ_{i=1}^n [ f(X_i) − f(X_i') ] | ≥ r ) + sup_{f∈F} P( | Σ_{i=1}^n f(X_i) | ≥ a ).

By symmetry,

sup_{f∈F} Σ_{i=1}^n [ f(X_i) − f(X_i') ]  and  sup_{f∈F} Σ_{i=1}^n ε_i [ f(X_i) − f(X_i') ]

have the same distribution. Together with some minor modifications, we may thus reduce to the symmetrized supremum Z^s. When dealing with Z or Z^s, we may not use directly Corollary 4.14. To turn this difficulty, we use the symmetry properties of Z^s. Write

Z^s = ( Z^s − E^ε(Z^s) ) + E^ε(Z^s)

where E^ε is partial integration with respect to the Bernoulli variables ε_1,…,ε_n. The point is that, as we have seen prior to Corollary 4.13, E^ε(Z^s) is monotone and subadditive with respect to the independent random variables X_1,…,X_n. Corollary 4.13 therefore applies to E^ε(Z^s). The remainder term Z^s − E^ε(Z^s) is bounded, conditionally on the X_i's, with the deviation inequality of Theorem 7.3, by a Gaussian tail involving a random supremum of weak variances

σ² = sup_{f∈F} Σ_{i=1}^n f²(X_i).

Now σ² is again monotone and subadditive, so that Corollary 4.13 may also be applied to it. Assume F consists of functions f with |f| ≤ 1. Combining the arguments yields (after some work, cf. [Le-T], [Tal5]) the inequality, for integers k, q ≥ 1,

P( Z^s ≥ 8q E(Z^s) + 2k ) ≤ 2 q^{−k} + 2 e^{−k²/128qE(σ²)}.  (7.9)
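The distributional identity behind the symmetrization step above can be verified exactly, by exhaustive enumeration, in a small scalar example (a toy class reduced to one function, with a hypothetical uniform {0,1} distribution):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 3
values = [0, 1]  # illustrative distribution of each X_i: uniform on {0, 1}

# Distribution of sum_i (X_i - X'_i) over independent copies X, X'
plain = Counter()
for x in product(values, repeat=n):
    for xp in product(values, repeat=n):
        plain[sum(a - b for a, b in zip(x, xp))] += 1

# Distribution of sum_i eps_i (X_i - X'_i) with independent signs eps_i
signed = Counter()
for x in product(values, repeat=n):
    for xp in product(values, repeat=n):
        for eps in product([-1, 1], repeat=n):
            signed[sum(e * (a - b) for e, a, b in zip(eps, x, xp))] += 1

total_plain = sum(plain.values())
total_signed = sum(signed.values())
dist_plain = {k: Fraction(v, total_plain) for k, v in plain.items()}
dist_signed = {k: Fraction(v, total_signed) for k, v in signed.items()}
print(dist_plain == dist_signed)
```

The equality of the two exact distributions is precisely the symmetry used above; for suprema over a class, the same identity holds jointly over all f.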

How is (7.9) used in applications? One possible choice is simply q = 2. We may also optimize the choice of q and take it of the order of k/log k. This choice then leads to the following general statement (cf. [Tal5]).

Theorem 7.4. If |f| ≤ C for every f in F, and if E(f(X_i)) = 0 for every f ∈ F and i = 1,…,n, then, for all r ≥ 0,

P( Z^s ≥ K [ E(Z^s) + r ] ) ≤ K exp( − (r/KC) log( 1 + Cr/(E(σ²) + C²) ) )

where σ² = sup_{f∈F} Σ_{i=1}^n f²(X_i) and K > 0 is a numerical constant.

To compare more carefully this result to (7.8), it is important to give a more tractable form to E(σ²). To this task, we may write

E(σ²) ≤ σ̄² + E( sup_{f∈F} Σ_{i=1}^n [ f²(X_i) − E(f²(X_i)) ] ).

By Jensen's inequality,

E( sup_{f∈F} Σ_{i=1}^n [ f²(X_i) − E(f²(X_i)) ] ) ≤ E( sup_{f∈F} | Σ_{i=1}^n [ f²(X_i) − f²(X_i') ] | )

where the X_i''s are independent copies of the X_i's, i = 1,…,n. By symmetry,

E( sup_{f∈F} | Σ_{i=1}^n [ f²(X_i) − f²(X_i') ] | ) ≤ 2 E( sup_{f∈F} | Σ_{i=1}^n ε_i f²(X_i) | )

where we recall that ε_1,…,ε_n are independent ±1 symmetric Bernoulli random variables independent of the X_i's. We then have to make use of a contraction inequality ([Le-T], p. 112),

E( sup_{f∈F} | Σ_{i=1}^n ε_i f²(X_i) | ) ≤ 4C E( sup_{f∈F} | Σ_{i=1}^n ε_i f(X_i) | ).

Altogether, using also that E( sup_{f∈F} | Σ_{i=1}^n ε_i f(X_i) | ) ≤ 2 E(Z̄) since the f(X_i)'s are centered, we conclude that

E(σ²) ≤ σ̄² + 16C E(Z̄)  (7.10)

where

σ̄² = sup_{f∈F} Σ_{i=1}^n E( f²(X_i) )

and

Z̄ = sup_{f∈F} | Σ_{i=1}^n f(X_i) |

(if Z is defined without absolute values). With this observation, Theorem 7.4 provides an inequality close both to the classical exponential inequalities (7.8) for sums of independent real-valued random variables and to the bounds of the preceding section on suprema of Gaussian processes, with the same basic parameters σ̄ and E(Z̄). Theorem 7.4 is good enough to establish most basic limit theorems in Banach spaces (cf. [Le-T], [Le-Z]).
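The tractable bound (7.10) is loose enough to check by direct simulation. The sketch below uses an illustrative two-function class of centered functions on uniform [−1,1] samples; the class, the sample size and the constants are hypothetical choices made for the demonstration only.

```python
import random

random.seed(1)
n, trials = 30, 2000
funcs = [lambda x: x, lambda x: x ** 3]  # toy class F of centered functions with |f| <= 1

sum_sig2 = sum_zbar = 0.0
for _ in range(trials):
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    sum_sig2 += max(sum(f(x) ** 2 for x in xs) for f in funcs)  # sigma^2 = sup_f sum f^2(X_i)
    sum_zbar += max(abs(sum(f(x) for x in xs)) for f in funcs)  # Zbar = sup_f |sum f(X_i)|

e_sig2 = sum_sig2 / trials   # Monte Carlo estimate of E(sigma^2)
e_zbar = sum_zbar / trials   # Monte Carlo estimate of E(Zbar)

# sup_f sum_i E f^2(X_i), computed exactly: E X^2 = 1/3, E X^6 = 1/7
sigma_bar2 = n * max(1.0 / 3.0, 1.0 / 7.0)
print(e_sig2 <= sigma_bar2 + 16.0 * e_zbar)  # (7.10), with C = 1
```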

7.3 Bounds via the entropic method


One weak aspect of Theorem 7.4 is that it does not present a deviation inequality from the mean (or median) itself but rather from a multiple of the mean. Neither does it provide a concentration inequality. This can be obtained, in a rather delicate way however, by further abstract developments of the methods of Sections 4.2 and 4.3 as developed in [Tal9]. Here, we observe that the functional approach based on logarithmic Sobolev inequalities developed in Chapter 5 may be used to yield these sharper bounds. The overall approach is simpler and more transparent than the preceding developments based on the results of Sections 4.2 and 4.3. We start with a first result concerning suprema of empirical processes over a class of non-negative bounded functions. As before, let X_1,…,X_n be a sample of independent random variables with values in some measurable space (S, S). Let F be a (countable) class of measurable functions f on S such that 0 ≤ f ≤ 1. Set

Z = sup_{f∈F} Σ_{i=1}^n f(X_i).

Theorem 7.5. Under the preceding notation, for any λ ≥ 0,

E( e^{λZ} ) ≤ e^{(e^λ − 1) E(Z)}.

In particular, for any r ≥ 0,

P( Z ≥ E(Z) + r ) ≤ exp( − E(Z) h( r/E(Z) ) )

where h(u) = (1+u) log(1+u) − u, u ≥ 0.

The exponential inequality of Theorem 7.5 is the optimal extension of (7.8) for non-negative summands. It is in particular attained when Z has a Poisson distribution. Note that since h(u) ≥ (u/2) log(1+u) for any u ≥ 0, Theorem 7.5 implies that

P( Z ≥ E(Z) + r ) ≤ exp( − (r/2) log( 1 + r/E(Z) ) )  (7.11)
for every r ≥ 0.

Proof. We may clearly assume that F is a finite class (with N elements). We may then represent Z as a function

Z(x) = max_{1≤k≤N} Σ_{i=1}^n x_i^k

where x = (x_1,…,x_n) ∈ E^n, E = ℝ^N. If we further denote by μ_i the law of (f(X_i))_{f∈F} on E, the distribution of Z under P is the same as the distribution of Z(x) under the product measure P of the μ_i's. Note that since 0 ≤ f ≤ 1, the μ_i's are supported by [0,1]^N ⊂ E. Since P is a product measure, we may apply the product property of entropy (Proposition 5.6) to get that

Ent_P( e^{λZ} ) ≤ Σ_{i=1}^n ∫ Ent_{μ_i}( e^{λZ_i} ) dP.  (7.12)

Let ψ(u) = e^{−u} + u − 1, u ∈ ℝ. The variational characterization of entropy indicates that, for any probability measure μ and any (say bounded) function f,

Ent_μ( e^f ) = inf_{u>0} ∫ [ f e^f − (log u + 1) e^f + u ] dμ.

For every x = (x_1,…,x_n) in E^n, and i = 1,…,n, set

y = y_i(x) = (x_1,…,x_{i−1}, 0, x_{i+1},…,x_n) ∈ E^n.

In the variational characterization of Ent_{μ_i}(e^{λZ_i}), choose then u = e^{λZ(y)} so that

∫ Ent_{μ_i}( e^{λZ_i} ) dP ≤ ∫ ψ( λ [ Z(x) − Z(y_i(x)) ] ) e^{λZ(x)} dP(x).  (7.13)

Note that 0 ≤ Z(x) − Z(y_i(x)) ≤ 1 when x_i ∈ [0,1]^N (μ_i is concentrated on [0,1]^N), by definition of Z. On the other hand, if (A_k)_{1≤k≤N} is a partition of E^n such that

A_k ⊂ { x ∈ E^n ; Z(x) = Σ_{i=1}^n x_i^k }

and if τ_k = τ_k(x) = 1_{A_k}(x), then

0 ≤ Z(x) − Z(y_i(x)) ≤ Σ_{k=1}^N τ_k x_i^k = ⟨τ, x_i⟩.

Now, since ψ is convex and ψ(0) = 0, ψ(λu) ≤ u ψ(λ) for any λ ≥ 0 and u ∈ [0,1]. Therefore, for any λ ≥ 0,

ψ( λ [ Z(x) − Z(y_i(x)) ] ) ≤ ⟨τ, x_i⟩ ψ(λ).

Together with (7.12) and (7.13), it follows that

Ent_P( e^{λZ} ) ≤ ψ(λ) ∫ Σ_{i=1}^n ⟨τ, x_i⟩ e^{λZ} dP = ψ(λ) ∫ Z e^{λZ} dP  (7.14)

since by construction Σ_{i=1}^n ⟨τ, x_i⟩ = Z(x). Let now Λ(λ) = E(e^{λZ}) = ∫ e^{λZ} dP, λ ≥ 0, be the Laplace transform of Z. What we have obtained in (7.14) is that, for every λ ≥ 0,

λ Λ'(λ) − Λ(λ) log Λ(λ) ≤ ψ(λ) Λ'(λ),

that is,

( 1 − e^{−λ} ) Λ'(λ) − Λ(λ) log Λ(λ) ≤ 0.

This differential inequality may be integrated in the optimal way. Namely, the function J(λ) = log Λ(λ) satisfies

J'(λ) ≤ ( 1 − e^{−λ} )^{−1} J(λ),

that is,

( log J(λ) )' ≤ ( log(e^λ − 1) )'.

Hence (e^λ − 1)^{−1} J(λ) ↗ E(Z) as λ ↘ 0, from which the desired claim follows. By Chebyshev's inequality, for every r ≥ 0,

P( Z ≥ E(Z) + r ) ≤ Λ(λ) e^{−λ(E(Z)+r)},

and minimizing over λ ≥ 0 yields the deviation inequality of the statement. Theorem 7.5 is established.

Now we turn to classes of (bounded) functions with arbitrary signs. Let as before X_i, i = 1,…,n, be independent random variables with values in some space S, and let F be a countable class of measurable functions on S. Set

Z = sup_{f∈F} Σ_{i=1}^n f(X_i).

The arguments developed below are similar for

Z̄ = sup_{f∈F} | Σ_{i=1}^n f(X_i) |.

Theorem 7.6. If |f| ≤ C for every f in F, then, for all r ≥ 0,

P( |Z − E(Z)| ≥ r ) ≤ 16 exp( − (r/KC) log( 1 + Cr/E(σ²) ) )

where σ² = sup_{f∈F} Σ_{i=1}^n f²(X_i) is the supremum of random variances and where K > 0 is a numerical constant.

This statement is as close as possible to (7.8). With respect to this inequality, the main feature is the concentration property with respect to the mean E(Z) (or, equivalently, a median of Z). As for Gaussian processes, the theorem does not yield any information on the size of E(Z). The proof of Theorem 7.6 relies, as the one of Theorem 7.5, on the entropic method, but several technicalities make it a little bit heavier. In particular, we will not look this time for an optimal integration of the basic differential inequality on Laplace transforms we will get. Reasonable values of K are however deduced in [Mas1].

Proof. By homogeneity, take C = 1. As in the proof of Theorem 7.5, we may and do assume that F is finite. We then represent Z as a function

Z(x) = max_{1≤k≤N} Σ_{i=1}^n x_i^k
where x = (x_1,…,x_n) ∈ E^n, E = ℝ^N. As before, if we denote by μ_i the law of (f(X_i))_{f∈F} on E, the distribution of Z under P is the same as the distribution of Z(x) under the product measure P of the μ_i's. With respect to Theorem 7.5, since |f| ≤ 1, the μ_i's are now supported by [−1,+1]^N ⊂ E. Recall also that we denote by (A_k)_{1≤k≤N} a partition of E^n such that

A_k ⊂ { x ∈ E^n ; Z(x) = Σ_{i=1}^n x_i^k }.

Now, the convexity properties of Z ensure that, for every i = 1,…,n and every x, y ∈ E^n such that x_j = y_j, j ≠ i,

Z(x) − Z(y) ≤ Σ_{k=1}^N τ_k | x_i^k − y_i^k |  (7.15)

where we recall that τ_k = τ_k(x) = 1_{A_k}(x). On the other hand, whenever x_i, y_i ∈ [−1,+1]^N (the measures μ_i are supported by [−1,+1]^N),

Z(x) − Z(y) ≤ 2.  (7.16)

We make use of Corollary 5.8 together with (7.15) and (7.16). That is, for every λ ∈ ℝ,

Ent_P( e^{λZ} ) ≤ λ² Σ_{i=1}^n ∫ ( ∫∫_{ {Z_i(x_i) ≥ Z_i(y_i)} } [ Z_i(x_i) − Z_i(y_i) ]² e^{λZ_i(x_i)} dμ_i(x_i) dμ_i(y_i) ) dP(x).

Fix i, 1 ≤ i ≤ n. When λ ≥ 0,

∫∫_{ {Z_i(x_i) ≥ Z_i(y_i)} } [ Z_i(x_i) − Z_i(y_i) ]² e^{λZ_i(x_i)} dμ_i(x_i) dμ_i(y_i) ≤ ∫∫ Σ_{k=1}^N τ_k(x) ( x_i^k − y_i^k )² e^{λZ_i(x_i)} dμ_i(x_i) dμ_i(y_i),

whereas, when λ ≤ 0,

∫∫_{ {Z_i(x_i) ≥ Z_i(y_i)} } [ Z_i(x_i) − Z_i(y_i) ]² e^{λZ_i(x_i)} dμ_i(x_i) dμ_i(y_i) ≤ ∫∫ Σ_{k=1}^N τ_k(y) ( x_i^k − y_i^k )² e^{λZ_i(y_i) + 2|λ|} dμ_i(x_i) dμ_i(y_i),

where we used (7.15) and (7.16). It follows that, for every λ ∈ ℝ,

Ent_P( e^{λZ} ) ≤ λ² e^{2|λ|} ∫∫ max_{1≤k≤N} Σ_{i=1}^n ( x_i^k − y_i^k )² e^{λZ(x)} dP(x) dP(y).  (7.17)

The extra factor e^{2|λ|} with respect to analogous formulas might look annoying. However, we only use (7.17) below with λ small. Let us now recast inequality (7.17) with the original notation. Using that (u+v)² ≤ 2u² + 2v² and independence, inequality (7.17) amounts to the inequality, λ ∈ ℝ,

Ent_P( e^{λZ} ) ≤ 2λ² e^{2|λ|} [ E(σ²) E( e^{λZ} ) + E( σ² e^{λZ} ) ].

Let λ_0 > 0 to be specified below. In particular, for every |λ| ≤ λ_0,

Ent_P( e^{λZ} ) ≤ c_0 λ² [ E(σ²) E( e^{λZ} ) + E( σ² e^{λZ} ) ]  (7.18)

with c_0 = 2 e^{2λ_0}. As in Chapter 5 and Theorem 7.5 above, one has to integrate this differential inequality. We do not try here a sharp integration as in Theorem 7.5, but what follows is sufficient for the purpose of the theorem (for a more careful analysis, see [Mas1]). We work with Z̃ = Z − E(Z). In terms of the Laplace transform Λ̃ of Z̃, (7.18) indicates that, for every |λ| ≤ λ_0,

λ Λ̃'(λ) − Λ̃(λ) log Λ̃(λ) ≤ c_0 λ² [ E(σ²) Λ̃(λ) + E( σ² e^{λZ̃} ) ].

We first bound the term E(σ² e^{λZ̃}). Letting A = { σ² ≥ 4E(σ²) }, we have

E( σ² e^{λZ̃} ) ≤ 4 E(σ²) Λ̃(λ) + ∫_A σ² e^{λZ̃} dP.

By Young's inequality uv ≤ u log u − u + e^v, u ≥ 0, v ∈ ℝ, with u = e^{λZ̃} and v = (σ²/2) 1_A,

∫_A σ² e^{λZ̃} dP ≤ 2λ Λ̃'(λ) + 2 ∫_A e^{σ²/2} dP + 2.

It is easily seen that

∫_A e^{σ²/2} dP ≤ e^{−2E(σ²)} E( e^{σ²} ).

Now, since 0 ≤ f² ≤ 1, σ² enters the setting of Theorem 7.5. Using that e^λ − 1 ≤ λ + λ² for 0 ≤ λ ≤ 1, we learn from Theorem 7.5 that, for every 0 ≤ λ ≤ 1,

E( e^{λσ²} ) ≤ e^{(λ+λ²)E(σ²)}.

Therefore

∫_A e^{σ²/2} dP ≤ 1.

Summarizing the preceding estimates, for |λ| ≤ λ_0,

λ Λ̃'(λ) − Λ̃(λ) log Λ̃(λ) ≤ c_0 λ² [ 5E(σ²) Λ̃(λ) + 2λ Λ̃'(λ) + 4 ].

We now integrate this differential inequality in the standard way. Set H(λ) = λ^{−1} log Λ̃(λ), H(0) = 0, λ ∈ ℝ. Since Λ̃ ≥ 1, the preceding inequality implies that

H'(λ) ≤ c_0 [ 5E(σ²) + 4 + 2λ Λ̃'(λ)/Λ̃(λ) ].

For any λ such that |λ| ≤ λ_0, we may integrate the preceding to get (recall H(0) = 0)

log Λ̃(λ) ≤ c_0 [ 5E(σ²) + 4 ] λ² + 2c_0 λ² log Λ̃(λ).

If 4c_0 λ_0² ≤ 1, it follows that, for every |λ| ≤ λ_0,

Λ̃(λ) ≤ e^{2c_0 [5E(σ²)+4] λ²} ≤ e^{10c_0 E(σ²) λ² + 2}

since 8c_0 λ² ≤ 2.

This inequality already yields the Gaussian bounds. Namely, take λ_0 = 1/10 (to be safe), so that c_0 ≤ 2.5. For every 0 ≤ λ ≤ λ_0 = 1/10 and r ≥ 0,

P( Z − E(Z) ≥ r ) ≤ e^{−λr + 25E(σ²)λ² + 2}.

Choose then λ = r/50E(σ²) whenever r ≤ 5E(σ²), and λ = 1/10 when r ≥ 5E(σ²), to get

P( Z − E(Z) ≥ r ) ≤ e² exp( − (1/20) min( r, r²/5E(σ²) ) )

for every r ≥ 0. Together with the same argument for E(Z) − Z, we may state.

Proposition 7.7. If |f| ≤ C for every f in F, then, for all r ≥ 0,

P( |Z − E(Z)| ≥ r ) ≤ 15 exp( − (1/20) min( r/C, r²/5E(σ²) ) ).

We now complete the proof of the theorem with the Poissonian bound. We use a truncation argument. For every r ≥ 0,

P( Z − E(Z) ≥ 4r ) ≤ P( Z_δ − E(Z_δ) ≥ r ) + P( W + E(W) ≥ 3r )

where

Z_δ = sup_{f∈F_δ} Σ_{i=1}^n f(X_i)

with F_δ = { f 1_{ {|f| ≤ δ} } ; f ∈ F }, δ > 0 to be determined, and

W = sup_{f∈F} Σ_{i=1}^n | f(X_i) | 1_{ {|f(X_i)| > δ} }

(indeed Z ≤ Z_δ + W and E(Z) ≥ E(Z_δ) − E(W)). We use Proposition 7.7 for Z_δ to get, for every r ≥ 0,

P( Z_δ − E(Z_δ) ≥ r ) ≤ 15 exp( − (1/20) min( r/δ, r²/5E(σ²) ) ).  (7.19)

On the other hand, we apply Theorem 7.5, more precisely (7.11), to W, to get

P( W ≥ E(W) + r ) ≤ exp( − (r/2) log( 1 + r/E(W) ) )  (7.20)

provided that r ≥ E(W). Choose now δ = δ(r) = ( E(σ²)/r )^{1/2}. Then, since W ≤ σ²/δ (because |f| 1_{ {|f|>δ} } ≤ f²/δ),

E(W) ≤ E(σ²)/δ = ( r E(σ²) )^{1/2}.

Since for every u ≥ 0,

min( √u, u/5 ) ≥ (1/20) log(1 + 4u),

we have, by the choice of δ (with u = r/E(σ²)),

(1/20) min( r/δ, r²/5E(σ²) ) ≥ (r/400) log( 1 + 4r/E(σ²) ),

while

log( 1 + r/E(W) ) ≥ (1/4) log( 1 + 4r/E(σ²) ).

Therefore, as a consequence of (7.19) and (7.20),

P( Z − E(Z) ≥ 4r ) ≤ 16 exp( − (r/400) log( 1 + 4r/E(σ²) ) ).

Changing r into r/4 completes the proof of Theorem 7.6.

Together with (7.10), Theorem 7.6 gives rise to the following more useful version. Recall that, when we deal with Z = sup_{f∈F} Σ_{i=1}^n f(X_i) without absolute values, we set

Z̄ = sup_{f∈F} | Σ_{i=1}^n f(X_i) |.

Corollary 7.8. If |f| ≤ C for every f in F, and if E(f(X_i)) = 0 for every f ∈ F and i = 1,…,n, then, for all r ≥ 0,

P( |Z − E(Z)| ≥ r ) ≤ 16 exp( − (r/KC) log( 1 + Cr/( σ̄² + C E(Z̄) ) ) )

where σ̄² = sup_{f∈F} Σ_{i=1}^n E(f²(X_i)) and K > 0 is a numerical constant.

If one is not concerned with the Poissonian behavior, Proposition 7.7 may also be stated in the following more tractable way (see [Mas1], [Mas2]).

Corollary 7.9. If |f| ≤ C for every f in F, and if E(f(X_i)) = 0 for every f ∈ F and i = 1,…,n, then, for all r ≥ 0 and ε > 0,

P( Z ≥ (1+ε) E(Z) + σ̄ √(Kr) + K(1 + ε^{−1}) C r ) ≤ e^{−r}

and

P( Z ≤ (1−ε) E(Z) − σ̄ √(Kr) − K(1 + ε^{−1}) C r ) ≤ e^{−r}

where K > 0 is numerical.

Notes and Remarks


Bounds on norms of sums of independent random vectors or empirical processes motivated the early investigation by M. Talagrand of concentration inequalities in product spaces. Applications of Gaussian isoperimetry to sharp bounds on Gaussian random vectors and processes, initiated in [Bor2], were fully developed in the seventies after the early integrability theorems by H. J. Landau and L. A. Shepp [L-S] and X. Fernique [Fe] (cf. [Le-T], [Le3], [Li], [Bog], [Fe]… for a description of the historical developments of the Gaussian theory leading to Theorem 7.1 and Corollary 7.2). As alluded to in Section 7.1, the deviation and concentration inequalities from or around the mean or median of suprema of Gaussian processes do not lead in general to estimates on the mean or median themselves. These have to be handled by other means such as entropy or majorizing measures, that may also be used to yield, for more explicit Gaussian processes, sharp deviation inequalities (cf. [Le-T], [Li], [Tal11]…). The integrability result (7.6) is due to S. Kwapień. The Khintchine-Kahane inequalities (7.7) go back to [Ka]. The early Gaussian study, together with some crucial open problems on limit theorems for sums of independent random vectors, prompted M. Talagrand to investigate, from an abstract measure-theoretic point of view, concentration properties in product spaces. This led him, after a first contribution on the discrete cube [Tal1] (that covers Theorem 7.4 in this case), to the breakthrough [Tal2] on which most of the further developments are based. These results allowed the monograph [Le-T] that solved with these tools most of the open questions on limit theorems for Banach space valued random variables, with the approach presented here in Section 7.2 (see also [Le-Z]). Theorem 7.4 is taken from [Le-T] and [Tal5]. For statistical applications, cf. [V-W].

The isoperimetric inequality of [Tal2], and its application to bounds on sums of independent random vectors, then received a number of improvements and simplified proofs ([Tal4], [Tal7]) that finally led to the memoir [Tal7] that crowns with definitive results and arguments this deep investigation of isoperimetric and concentration inequalities for product measures.

To answer questions by L. Birgé and P. Massart on concentration inequalities around the mean or median of suprema of empirical processes, motivated by applications to selection of models in statistics [B-M1], [B-M2], [B-B-M] (cf. [Mas2], [B-M3]), M. Talagrand undertook in [Tal9] a further technical refinement of his methods to reach the rather definitive Theorem 7.6. In [Le4], a simplified approach to this result is presented with the tool of logarithmic Sobolev inequalities and the Herbst argument (entropic method) exposed in Section 5.2. The argument of [Le4] has been carefully examined in [Mas1] in order to reach numerical constants of statistical use. The optimal Theorem 7.5 is in particular taken from [Mas1] (see also [Mas2]). Recently, E. Rio [Ri2] further improved the entropic method, by a clever integration of the differential inequality, to obtain binomial (in the spirit of Hoeffding's inequalities [Hoe]) rather than Poissonian bounds for empirical processes based on classes of indicator functions (see also [A-V] for large deviation results). A parallel investigation of the exponential integrability of sums of independent random vectors relying on hypercontractive methods is undertaken in [Kw-S].

8. FURTHER ILLUSTRATIONS

In this chapter, we further illustrate with a few applications the concentration of measure phenomenon. The first section is devoted to concentration for harmonic measures on the sphere, proved by G. Schechtman and M. Schmuckenschläger [S-S2]. In Section 8.2, we present recent work of C. McDiarmid [MD3] on concentration for independent permutations, extending a prior contribution in [Tal7]. We next describe applications due to M. Talagrand [Tal7], mainly without proofs, of the convex hull approximation of Section 4.2 to various problems in discrete algorithmic mathematics, such as tail estimates on the longest increasing subsequence of a sample of independent uniformly distributed random variables on the unit interval, the traveling salesman problem and the minimum spanning tree, first passage times in percolation and the assignment problem. In Section 8.4, we present recent developments by M. Talagrand [Tal12] on tails of the free energy function in the Sherrington-Kirkpatrick spin glass model. Finally, we briefly mention in the last part some concentration results for the spectral measure of large random matrices due to A. Guionnet and O. Zeitouni [G-Zei].

8.1 Concentration of harmonic measures


Denote by σ^n the normalized Lebesgue measure on the unit sphere S^n in ℝ^{n+1}. For a point x in the open unit ball B of ℝ^{n+1}, denote by μ_x^n the probability measure on S^n given by

dμ_x^n(y) = ( (1 − |x|²) / |y − x|^{n+1} ) dσ^n(y)  (8.1)

where we recall that |·| is the Euclidean norm. If f is integrable on S^n, ∫ f dμ_x^n is harmonic in B (as a function of x), with radial limits equal σ^n-almost everywhere to f. The measures μ_x^n are the so-called harmonic measures on the sphere S^n. They have a neat probabilistic interpretation in terms of exit times of Brownian motion. Namely, if P_x is the probability distribution of a standard Brownian motion (B_t)_{t≥0} in ℝ^{n+1} starting from x, and if T is the first time t for which B_t hits S^n, then the distribution of B_T under P_x is μ_x^n.
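For n = 1 the density in (8.1) is the classical Poisson kernel of the disc. A quick quadrature (with an illustrative interior point x) confirms that it integrates to 1 against normalized arc length, as a probability density must.

```python
import math

def harmonic_density(x, theta):
    # density of mu_x^1 at y = (cos theta, sin theta), from (8.1) with n = 1
    y = (math.cos(theta), math.sin(theta))
    dist2 = (y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2
    return (1.0 - (x[0] ** 2 + x[1] ** 2)) / dist2

x = (0.3, -0.4)  # illustrative point with |x| = 0.5 < 1
N = 200000
# midpoint rule for the integral against d(sigma^1) = d(theta) / (2 pi)
total = sum(harmonic_density(x, 2.0 * math.pi * (k + 0.5) / N) for k in range(N)) / N
print(total)
```

For a smooth periodic integrand, the midpoint rule converges extremely fast, so the printed value agrees with 1 to high accuracy.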

The following theorem has been proved in [S-S2] and extends concentration on the sphere to all harmonic measures, uniformly over x.

Theorem 8.1. For every |x| < 1,

α_{(S^n, d, μ_x^n)}(r) ≤ 4 e^{−cnr²},  r > 0,

where c > 0 is numerical.

Proof. It is divided in several steps.

Lemma 8.2. For all |x| < 1 and 0 ≤ λ ≤ (n+1)²/8,

E_x( e^{λT} ) ≤ e^{2λ(1−|x|²)/(n+1)}.

Proof. The function u(y) = e^{−α|y|²}, y ∈ ℝ^{n+1}, satisfies the equation (1/2)Δu + cu = 0 where c = c(y) = α(n+1) − 2α²|y|². It thus follows from Itô's formula that

M_t = exp( −α|B_t|² + ∫_0^t c(B_s) ds ),  t ≥ 0,

is a martingale. In particular, for every integer k ≥ 0,

e^{−α|x|²} = E^x(M_0) = E^x(M_{T∧k}).

Now, for 0 ≤ s ≤ T and 0 < α ≤ (n+1)/4, we have c(B_s) ≥ α(n+1) − 2α² ≥ α(n+1)/2, so that

E^x( e^{α(n+1)T/2} 1_{ {T ≤ k} } ) ≤ e^{α(1−|x|²)}.

Letting k → ∞ and replacing α by 2λ/(n+1) yields the conclusion.
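The PDE identity at the start of the proof of Lemma 8.2 can be sanity-checked by finite differences (n, α and the test point below are illustrative choices):

```python
import math

n = 4          # sphere S^n sits in R^(n+1)
alpha = 0.8
dim = n + 1

def u(y):
    # the function u(y) = exp(-alpha |y|^2) of the proof
    return math.exp(-alpha * sum(t * t for t in y))

def laplacian(f, y, h=1e-4):
    # central second differences in each coordinate
    total = 0.0
    for i in range(len(y)):
        yp = list(y); ym = list(y)
        yp[i] += h; ym[i] -= h
        total += (f(yp) - 2.0 * f(y) + f(ym)) / (h * h)
    return total

y = [0.1, -0.2, 0.3, 0.05, -0.15]
c = alpha * dim - 2.0 * alpha ** 2 * sum(t * t for t in y)
residual = 0.5 * laplacian(u, y) + c * u(y)   # should vanish
print(abs(residual))
```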

For 0 < |x| ≤ 1, let

A(x) = { x + (1−|x|²)^{1/2} z ; |z| = 1, z ⊥ x }.

The next geometric lemma is left to the reader (see [S-S2] for the details).

Lemma 8.3. Let 0 < |x| ≤ 1. For any y ∈ S^n, the Euclidean distance d(y, A(x)) satisfies

d( y, A(x) ) ≤ 2 |⟨ y − x, x/|x| ⟩| / (1−|x|²)^{1/2}  if ⟨ y − x, x ⟩ > 0,

whereas, when ⟨ y − x, x ⟩ < 0,

d( y, A(x) ) ≤ 2 |⟨ y − x, x/|x| ⟩| / ( 1 − ⟨ y, x/|x| ⟩² )^{1/2}.

Note that since

⟨ y − x, x/|x| ⟩ ≤ 1 − |x| ≤ (1−|x|²)^{1/2},  (8.2)

we also have

d( y, A(x) ) ≤ 2 |⟨ y − x, x/|x| ⟩|^{1/2}.

For 0 < |x| ≤ 1 and −1 < θ < +1, set

S_θ^{n−1}(x) = A( θ x/|x| ).

Proposition 8.4. Let 0 < |x| ≤ 1 and let F : S^n → ℝ be a 1-Lipschitz function which is constant on each S_θ^{n−1}(x), −1 < θ < +1, n ≥ 2. Then, for some constant a_F ∈ ℝ,

μ_x^n( |F − a_F| ≥ r ) ≤ 2 e^{−(n+1)r²/32}

for every r ≥ 0.

Proof. Let a_F be the constant value of F on S^{n−1}_{|x|}(x) = A(x). We may and do assume that a_F = 0. Note that, by (8.2),

{ |F| ≥ r } ⊂ { y ; d(y, A(x)) ≥ r } ⊂ { y ; |⟨ y − x, x/|x| ⟩| ≥ r²/4 }.

To evaluate the measure of the last set, we again apply Itô's formula. For λ ∈ ℝ, consider now

u(y) = e^{λ⟨ y − x, x/|x| ⟩},  y ∈ ℝ^{n+1}.

Then Δu = λ²u, and thus

M_t = exp( λ ⟨ B_t − x, x/|x| ⟩ − (λ²/2) t ),  t ≥ 0,

is a bounded martingale under P_x. In particular E^x(M_T) = M_0 = 1, and thus, by the Cauchy-Schwarz inequality,

E_x( e^{λ⟨ B_T − x, x/|x| ⟩} ) = E_x( e^{λ⟨ B_T − x, x/|x| ⟩ − λ²T} e^{λ²T} ) ≤ ( E_x( e^{2λ⟨ B_T − x, x/|x| ⟩ − (2λ)²T/2} ) )^{1/2} ( E_x( e^{2λ²T} ) )^{1/2} ≤ e^{2λ²(1−|x|²)/(n+1)}

for 0 < λ ≤ (n+1)/4, where we used Lemma 8.2. Since the law of B_T under P_x is μ_x^n, when r > 2(1−|x|²)^{1/2} we may choose λ = (n+1)/4 in Chebyshev's exponential inequality to get from the preceding that

μ_x^n( |F| ≥ r ) ≤ 2 e^{−(n+1)r²/8}.

When r ≤ 2(1−|x|²)^{1/2}, we use the bounds of Lemma 8.3 to get that

μ_x^n( |F| ≥ r ) ≤ μ_x^n{ y ; |⟨ y − x, x/|x| ⟩| ≥ r(1−|x|²)^{1/2}/2 } ≤ 2 exp( 2λ²(1−|x|²)/(n+1) − λ r (1−|x|²)^{1/2}/2 )

for all 0 < λ ≤ (n+1)/4. Choosing 8λ(1−|x|²)^{1/2} = (n+1)r, we get

μ_x^n( |F| ≥ r ) ≤ 2 e^{−(n+1)r²/32}.

The proof of the proposition is complete.

We may now complete the proof of Theorem 8.1. Assume 0 < |x| < 1. Let F be 1-Lipschitz on S^n, n ≥ 3. For −1 < θ < +1, denote by a(θ) the expectation of F on S_θ^{n−1}(x) with respect to μ_x^n conditioned on S_θ^{n−1}(x) (that can be viewed as normalized Haar measure σ^{n−1} on S^n ∩ x^⊥). Now,

F_θ(z) = F( θ x/|x| + (1−θ²)^{1/2} z )

is Lipschitz with Lipschitz constant (1−θ²)^{1/2} ≤ 1 on S^n ∩ x^⊥. Concentration on spheres (Theorem 2.3) thus yields

μ_x^n{ y ; | F(y) − a( ⟨ y, x/|x| ⟩ ) | ≥ r } ≤ 2 e^{−(n−2)r²/8}.

Now, a( ⟨ y, x/|x| ⟩ ) is 1-Lipschitz on S^n and, by Proposition 8.4, for some a ∈ ℝ and all r ≥ 0,

μ_x^n{ y ; | a( ⟨ y, x/|x| ⟩ ) − a | ≥ r } ≤ 2 e^{−c(n+1)r²/32}.

It easily follows from the triangle inequality that

μ_x^n( |F − a| ≥ r ) ≤ 4 e^{−c(n−2)r²},  r ≥ 0,

for some numerical c > 0. By Proposition 1.8, we may then conclude to the result of Theorem 8.1. It is not difficult to check that the theorem also holds for n = 1, 2. The proof is complete.

Harmonic measures are related to image measures of the Haar measure on S^n under the so-called Moebius transformations. A modification of the proof of Theorem 8.1 then gives rise to an improvement of the deviation inequalities for Lipschitz functions, by replacing the Lipschitz coefficient of a function F by the infimum over the Lipschitz constants of F composed with all Moebius transformations. If x is a point in the Euclidean (open) unit ball B of ℝ^{n+1}, denote by P_x the orthogonal projection onto the subspace generated by x, and by Q_x the orthogonal projection onto the subspace orthogonal to x. Define then the Moebius transformation φ_x by

φ_x(y) = ( x − P_x(y) − (1−|x|²)^{1/2} Q_x(y) ) / ( 1 − ⟨y, x⟩ ).

It is known (cf. [Ru]) that φ_x is an involutive diffeomorphism of B onto itself, that φ_x restricted to S^n is an involutive diffeomorphism of S^n onto itself, and that, for every y in the closure of B,

1 − |φ_x(y)|² = (1−|x|²)(1−|y|²) / (1 − ⟨y, x⟩)².

Proposition 8.4 may be adapted, or a more direct approach used, to yield the same conclusion for the image measures of σ^n under the maps φ_x. One may then conclude, as in the proof of Theorem 8.1, the following sharpening of the concentration phenomenon on S^n.

Theorem 8.5. Let F be a real continuous function on S^n, n ≥ 2, such that

inf ‖ F ∘ φ_x ‖_Lip ≤ 1

where the infimum runs over all x's in the unit ball of ℝ^{n+1}. Then, for every r ≥ 0,

σ^n( | F − ∫ F dσ^n | ≥ r ) ≤ 4 e^{−cnr²}

where c > 0 is numerical.

8.2 Concentration for independent permutations


This section is taken from a recent work by C. McDiarmid [MD3], which improves upon an early result in [Tal7]. It aims to present a uniform version of the concentration property on the symmetric group of Corollary 4.3, in the same way Theorem 4.6 extends concentration for the Hamming metric. We start with some notation. Recall the symmetric group S_n over {1,…,n}. Let B_1, B_2,…, B_ℓ be a partition of {1,…,n} and set G = Π_{k=1}^ℓ S(B_k), where S(B_k) is the symmetric group over B_k. We denote by P the product measure on G of the uniform probability measures on each S(B_k). Thus P is uniform over G. We may consider the convex functional φ_A^c(σ) of Section 4.2, with σ ∈ S_n and A a non-empty subset of S_n. In particular, as we have seen there,

φ_A^c(σ) = inf_{y ∈ V_A(σ)} |y|

where, for a permutation σ = (σ(1),…,σ(n)) in S_n and A a non-empty subset of S_n, V_A(σ) denotes the convex hull of the vectors (1_{ {σ(i) ≠ π(i)} })_{1≤i≤n}, π ∈ A, in ℝ^n. We recall also that |y| denotes here the Euclidean norm ( Σ_{i=1}^n y_i² )^{1/2} of y = (y_1,…,y_n) ∈ ℝ^n.

Theorem 8.6. Let G be a product group of permutations on {1,…,n} and let P be uniform measure on G. Then, for every non-empty subset A ⊂ G,

∫ e^{φ_A^c(σ)²/16} dP(σ) ≤ 1/P(A).

Before turning to the proof of the theorem, we present some useful lemmas. The first lemma shows that V_A(σ) is stable under composition and inverse. If τ ∈ S_n, V_A(σ)τ denotes the collection of all (v_{τ(i)})_{1≤i≤n}, v ∈ V_A(σ).

Lemma 8.7. For A ⊂ S_n and σ, τ ∈ S_n,

V_{Aτ}(στ) = V_A(σ)τ  and  V_{A^{−1}}(σ^{−1}) = V_A(σ)σ^{−1}.

Proof. Simply note that if s = (si )1 i n is de ned by si = 1f (i)6= (i)g i = 1 : : : n

for some 2 A, then s = (s (i) )1 i n is de ned by s (i) = 1f (i)6= (i)g i = 1 : : : n where 2 A . The rst identity follows. The second is proved similarly. Besides 'c A ( ), de ne for i 2 f1 : : : ng,

Besides φ^c_A(σ), define, for i ∈ {1, ..., n},

    φ^c_A(σ, i) = inf_{y ∈ V_A(σ)} ( |y|² + y_i² )^{1/2}.

Observe that φ^c_A(σ, i) = φ^c_A(σ) if τ(i) = σ(i) for each τ ∈ A, and in particular if τ(i) = i for each τ ∈ A ∪ {σ}. As a consequence, the preceding lemma immediately yields the following.

Lemma 8.8. For A ⊂ Σ_n, σ, π ∈ Σ_n, and i ∈ {1, ..., n},

    φ^c_{Aπ}(σπ) = φ^c_A(σ),   φ^c_{Aπ}(σπ, i) = φ^c_A(σ, π(i))

and

    φ^c_{A^{-1}}(σ^{-1}, i) = φ^c_A(σ, σ^{-1}(i)).

The next lemma is crucial in the induction of the proof of Theorem 8.6. It is similar to the argument used for Theorem 4.6.

Lemma 8.9. Let A ⊂ Σ_n, σ ∈ Σ_n, i ≠ j in {1, ..., n} and 0 ≤ θ ≤ 1. Set A_i = {τ ∈ A; τ(i) = σ(i)} and suppose that A_i and A_j are non-empty. Then

    φ^c_A(σ, i)² ≤ 4(1 − θ)² + θ φ^c_{A_i}(σ, j)² + (1 − θ) φ^c_{A_j σ_{ij}}(σ)²

where σ_{ij} is the permutation that exchanges i and j.

Proof. Given i ≠ j in {1, ..., n}, set further

    φ^c_A(σ, i, j) = inf_{y ∈ V_A(σ)} ( |y|² − y_i² − y_j² )^{1/2}.


Let ζ ∈ V_A(σ) satisfy ζ_i = 0 and φ^c_{A_i}(σ, j)² = |ζ|² + ζ_j², and let η ∈ V_A(σ) satisfy φ^c_A(σ, i, j)² = |η|² − η_i² − η_j². Since V_A(σ) is convex, the point v = θζ + (1 − θ)η is in V_A(σ). Thus

    φ^c_A(σ, i)² ≤ |v|² + v_i².

Furthermore, by convexity of the square function, for every 1 ≤ k ≤ n,

    v_k² ≤ θ ζ_k² + (1 − θ) η_k².

Also,

    2v_i² = 2(1 − θ)² η_i² ≤ 2(1 − θ)²

and

    v_j² ≤ 2θ² ζ_j² + 2(1 − θ)² η_j² ≤ 2θ ζ_j² + 2(1 − θ)².

Hence,

    φ^c_A(σ, i)² ≤ 4(1 − θ)² + 2θ ζ_j² + ∑_{k≠i,j} ( θ ζ_k² + (1 − θ) η_k² )
                = 4(1 − θ)² + θ ( |ζ|² + ζ_j² ) + (1 − θ) ( |η|² − η_i² − η_j² )
                = 4(1 − θ)² + θ φ^c_{A_i}(σ, j)² + (1 − θ) φ^c_A(σ, i, j)².

Finally note that

    φ^c_A(σ, i, j) ≤ φ^c_{A_j}(σ, i, j) ≤ φ^c_{A_j σ_{ij}}(σ)

from which the conclusion follows. The lemma is established.

We turn to the proof of Theorem 8.6. We use induction on the order of G to prove a slightly stronger result, namely that, for each i in {1, ..., n},

    ∫ e^{φ^c_A(σ, i)²/16} dP(σ) ≤ 1/P(A).    (8.3)

The inequality is obvious if G is reduced to the identity permutation, since then A = G. It suffices also to assume that τ(i) ≠ i for some τ ∈ G, since for each j ∈ {1, ..., n} for which this is not the case,

    φ^c_A(σ, j) = φ^c_A(σ) ≤ φ^c_A(σ, i)


for every σ ∈ G. We may thus assume, to establish (8.3), that τ(i) ≠ i for some τ ∈ G and that i = 1.

Assume now that G is non-trivial and suppose (8.3) holds for any product group which is a proper subgroup of G. Let H = {σ ∈ G; σ(1) = 1} be the stabilizer of the element 1. We may assume without loss of generality that the orbit {σ(1); σ ∈ G} is {1, ..., m} for some m ≥ 2. Then H is a product group of permutations of {1, ..., n} where the block {1, ..., m} has been split into the two blocks {1} and {2, ..., m}. Let us agree that σ_{11} is the identity element. Let H_1 = H and, for i = 1, ..., m, let H_i denote the coset Hσ_{1i} = {σσ_{1i}; σ ∈ H}. Thus G is partitioned into the m cosets H_i, where H_i consists of the permutations σ with σ(i) = 1, and each H_i has size m^{-1} Card(G). To ease the notation, set below, for a non-empty subset A of Σ_n,

    Θ_A(σ) = (1/16) φ^c_A(σ)²   and   Θ_A(σ, i) = (1/16) φ^c_A(σ, i)².

Let now A ⊂ G and, for i = 1, ..., m, let A_i = A ∩ H_i. Choose j such that P(A_j)/P(H) is maximal, and keep j fixed. For every i ∈ {1, ..., m}, denote by P_i the uniform probability measure on H_i. The main part of the proof is to establish that, for every i,

    ∫ e^{Θ_A(σ, i)} dP_i(σ) ≤ ( P(H)/P(A_j) ) ( 2 − P(A_i)/P(A_j) ).    (8.4)

By Lemma 8.8,

    ∫ e^{Θ_{A_i}(σ, j)} dP_i(σ) = ∫ e^{Θ_{A_i σ_{1i}}(σσ_{1i}, σ_{1i}(j))} dP_i(σ)    (8.5)

so that

    ∫ e^{Θ_{A_i}(σ, j)} dP_i(σ) ≤ P(H)/P(A_i)

by the induction hypothesis, since A_i σ_{1i} ⊂ H and P(A_i σ_{1i}) = P(A_i). By Lemma 8.8 again,

    ∫ e^{Θ_{A_j σ_{ij}}(σ)} dP_i(σ) = ∫ e^{Θ_{A_j σ_{ij} σ_{1i}}(σσ_{1i})} dP_i(σ)

and hence

    ∫ e^{Θ_{A_j σ_{ij}}(σ)} dP_i(σ) ≤ P(H)/P(A_j)    (8.6)

by the induction hypothesis, since A_j σ_{ij} σ_{1i} ⊂ H and P(A_j σ_{ij} σ_{1i}) = P(A_j).

Suppose first that i ≠ j. By Lemmas 8.8 and 8.9, for every 0 ≤ θ ≤ 1, as in the proof of Theorem 4.6,

    ∫ e^{Θ_A(σ, i)} dP_i(σ) ≤ e^{(1−θ)²/4} ∫ e^{θ Θ_{A_i}(σ, j) + (1−θ) Θ_{A_j σ_{ij}}(σ)} dP_i(σ)
        ≤ e^{(1−θ)²/4} ( ∫ e^{Θ_{A_i}(σ, j)} dP_i(σ) )^θ ( ∫ e^{Θ_{A_j σ_{ij}}(σ)} dP_i(σ) )^{1−θ}
        ≤ e^{(1−θ)²/4} ( P(H)/P(A_i) )^θ ( P(H)/P(A_j) )^{1−θ}

where we used Hölder's inequality and (8.5), (8.6). Minimizing over θ using (4.7) yields the claim (8.4) in this case. The case i = j immediately follows from (8.5).

We may now conclude the proof of the theorem on the basis of (8.4). Since σ^{-1}(1) = i when σ ∈ H_i, write

    ∫ e^{Θ_A(σ, σ^{-1}(1))} dP(σ) = (1/m) ∑_{i=1}^m ∫ e^{Θ_A(σ, i)} dP_i(σ).

Hence, by (8.4),

    ∫ e^{Θ_A(σ, σ^{-1}(1))} dP(σ) ≤ (1/m) ∑_{i=1}^m ( P(H)/P(A_j) ) ( 2 − P(A_i)/P(A_j) )
        = ( P(H)/P(A_j) ) ( 2 − P(A) P(H)/P(A_j) )
        ≤ 1/P(A)

since u(2 − u) ≤ 1 for any real number u. Replace now A by A^{-1}. By Lemma 8.8, φ^c_{A^{-1}}(σ, 1) and φ^c_A(σ, σ^{-1}(1)) have the same distribution under P, from which the desired claim follows since P(A^{-1}) = P(A). The proof of Theorem 8.6 is complete.
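The inequality of Theorem 8.6 can be sanity-checked by brute force on a tiny symmetric group. The following sketch is purely illustrative (it is not part of the text): it takes Σ_3 as the full product group G, a two-element subset A, so that each convex hull V_A(σ) is a segment and the infimum defining φ^c_A(σ) reduces to a one-dimensional quadratic minimization.

```python
import itertools
import math

def phi_c(sigma, A):
    """phi^c_A(sigma): Euclidean distance from the origin to the convex hull of
    the indicator vectors (1_{sigma(i) != tau(i)})_i, tau in A. Here |A| = 2,
    so the hull is the segment [a, b] and the minimum is found in closed form."""
    a, b = [tuple(float(s != t) for s, t in zip(sigma, tau)) for tau in A]
    d = [ai - bi for ai, bi in zip(a, b)]
    dd = sum(x * x for x in d)
    # minimize |t*a + (1-t)*b|^2 over t in [0, 1]
    t = 0.0 if dd == 0 else max(0.0, min(1.0, -sum(bi * di for bi, di in zip(b, d)) / dd))
    p = [t * ai + (1 - t) * bi for ai, bi in zip(a, b)]
    return math.sqrt(sum(x * x for x in p))

G = list(itertools.permutations(range(3)))   # the full symmetric group Sigma_3
A = [(0, 1, 2), (1, 0, 2)]                   # a two-element subset of G
lhs = sum(math.exp(phi_c(s, A) ** 2 / 16) for s in G) / len(G)
print(lhs <= len(G) / len(A))                # Theorem 8.6: integral <= 1/P(A)
```

For larger subsets A the same check requires a genuine quadratic program over the convex hull; the two-point case already exercises the functional.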

8.3 Subsequences, percolation, assignment


This section is devoted to applications of Theorem 4.6 to various questions in discrete and combinatorial probability theory. These applications are due to M. Talagrand and are taken, mostly without proofs, from the memoir [Tal7].

We examine first subsequences. Consider points x_1, ..., x_n in [0, 1]. Denote by L_n(x_1, ..., x_n) the length of the longest increasing subsequence of x_1, ..., x_n, that is, the largest integer p such that we can find i_1 < ⋯ < i_p for which x_{i_1} ≤ ⋯ ≤ x_{i_p}. It is simple to see that when X_1, ..., X_n are independent uniformly distributed over [0, 1], the random variable L_n(X_1, ..., X_n) is distributed like the length of the longest increasing subsequence of a random permutation of {1, ..., n}. Concentration properties for L_n(X_1, ..., X_n) may be obtained as a simple consequence of the convex hull approximation studied in Section 4.2. Consider Ω = [0, 1]. For x = (x_1, ..., x_n) ∈ Ω^n, denote more simply L_n(x) = L_n(x_1, ..., x_n). Recall the convex hull approximation functional φ^c_A from Section 4.2.

Lemma 8.10. For all a ≥ 0 and x ∈ Ω^n,

    a ≥ L_n(x) − φ^c_A(x) √(L_n(x))

where A = {L_n ≤ a}. In particular,

    φ^c_A(x) ≥ u/√(a + u)

whenever L_n(x) ≥ a + u.

Proof. Write L_n = L_n(x). By definition, we can find a subset I of {1, ..., n} of cardinality L_n such that if i, j ∈ I, i < j, then x_i < x_j. It follows from the definition of φ^c_A that there exists y ∈ A such that

    Card(J) ≤ φ^c_A(x) √(L_n)

where J = {i ∈ I; y_i ≠ x_i}. Thus (x_i)_{i ∈ I∖J} is an increasing subsequence of y and, since y ∈ A, we have Card(I∖J) ≤ a, which is the first claim of the lemma. The second claim immediately follows from the fact that the function u ↦ (u − a)/√u increases for u ≥ a. Lemma 8.10 is established.

The following gives the announced concentration properties.

Theorem 8.11. Denote by m a median of L_n = L_n(X_1, ..., X_n). For every r ≥ 0,
    P( {L_n ≥ m + r} ) ≤ 2 e^{−r²/4(m+r)}

and

    P( {L_n ≤ m − r} ) ≤ 2 e^{−r²/4m}.

Proof. The first inequality is an immediate consequence of Theorem 4.6 combined with the preceding lemma. To establish the second inequality, we use Lemma 8.10 with a = m − r, u = r, to see that

    φ^c_A(x) ≥ r/√m

whenever L_n(x) ≥ m (recall A = {L_n ≤ a} = {L_n ≤ m − r}). Hence

    P( { φ^c_A ≥ r/√m } ) ≥ 1/2.

On the other hand, by Theorem 4.6 again,

    P( { φ^c_A ≥ r/√m } ) ≤ (1/P(A)) e^{−r²/4m}.

The required bound on P(A) = P({L_n ≤ m − r}) follows. The theorem is proved.
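The functional L_n itself is computable in O(n log n) time by patience sorting, which makes the concentration around 2√n easy to observe empirically. A minimal illustrative sketch (not from the text):

```python
import bisect
import random

def lis_length(xs):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest possible last value of an increasing run of length k+1
    for x in xs:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

random.seed(0)
n = 2000
samples = [lis_length([random.random() for _ in range(n)]) for _ in range(20)]
# L_n concentrates near 2*sqrt(n) (here 2*sqrt(2000) ~ 89), with small spread
print(min(samples), max(samples))
```

For distinct uniform samples, strict and non-strict increasing subsequences coincide almost surely, so `bisect_left` is adequate here.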

We now present, without proofs, some further applications of the convex hull approximation, referring to [Tal7] for details. We deal first with the travelling salesman problem and the minimum spanning tree. Given a finite subset F = {x_1, ..., x_n} of the unit square [0, 1]² in R², denote first by L(F) the length of the shortest tour through F. We study the random variable L_n = L({X_1, ..., X_n}) where X_1, ..., X_n are independently uniformly distributed over [0, 1]².

Theorem 8.12. There is a numerical constant K > 0 such that for every r ≥ 0,

    P( |L_n − m| ≥ r ) ≤ K e^{−r²/K}

where m is a median of L_n.

The proof of Theorem 8.12 is only based on the following regularity property of L(F). Denote by C_k, k ≥ 1, the family of the 2^{2k} dyadic squares with vertices (ℓ_1 2^{−k}, ℓ_2 2^{−k}), 0 ≤ ℓ_1, ℓ_2 ≤ 2^k. Consider F ⊂ [0, 1]², C ∈ C_k, G ⊂ C, and assume that there is a point of F within distance 2^{−k+2} of C. Then

    L(F) ≤ L(F ∪ G) ≤ L(F) + K 2^{−k} √(Card(G))    (8.7)

for some numerical constant K > 0.

The random minimum spanning tree satisfies a similar concentration result. A spanning tree of a finite subset F ⊂ R² is a connected set that is a union of segments, each of which joins two points of F. Its length is the sum of the lengths of these segments. We denote by L(F) the length of the shortest (minimum) spanning tree of F. In contrast with the travelling salesman functional, it may happen here that L(F) is not monotone. We study again the random variable L_n = L({X_1, ..., X_n}) where X_1, ..., X_n are independently uniformly distributed over [0, 1]².

Theorem 8.13. There is a numerical constant K > 0 such that for every r ≥ 0,

    P( |L_n − m| ≥ r ) ≤ K e^{−r²/K}

where m is a median of L_n.

The regularity property of L used in this theorem is now the following. Consider F ⊂ [0, 1]² and C ∈ C_k, k ≥ 1. Assume that each C' ∈ C_{k−1} that is within distance 2^{−k+5} of C meets F. Then, for any G ⊂ C,

    | L(F ∪ G) − L(F) | ≤ K 2^{−k} √(Card(G))    (8.8)

for some numerical constant K > 0.

Let us deal next with first passage time in percolation theory. Let X = (V, E) be a graph with vertices V and edges E. Let, on some probability space (Ω, A, P), (X_e)_{e∈E} be a family of non-negative independent and identically distributed random variables with the same distribution as X. X_e represents the passage time through the edge e. Let T be a family of (finite) subsets of E and, for T ∈ T, set X_T = ∑_{e∈T} X_e. If T is made of contiguous edges, X_T represents the passage time through the path T. Set Z_T = inf_{T∈T} X_T and D = sup_{T∈T} Card(T), and let m be a median of Z_T. As a corollary of his penalty theorems, M. Talagrand [Tal7] proved the following result.

Theorem 8.14. There exists a numerical constant K > 0 such that, if E(e^X) ≤ 2, for every r ≥ 0,

    P( |Z_T − m| ≥ r ) ≤ K exp( −(1/K) min( r, r²/D ) ).

When V is Z² and E the set of edges connecting two adjacent points, and when T = T_n is the set of all self-avoiding paths connecting the origin to the point (0, n), H. Kesten [Ke] showed that, when 0 ≤ X ≤ 1 almost surely and P(X = 0) < 1/2 (percolation), one may reduce, in Z_T, to paths with length less than some multiple of n. Together with this result, Theorem 8.14 indicates that

    P( |Z_{T_n} − m| ≥ r ) ≤ 5 exp( −r²/Cn )

for every r ≤ n/C, where C > 0 is a constant independent of n. This result strengthens the previous estimate by H. Kesten [Ke], which was of the order of r/C√n in the exponent, and the proof of which was based on martingale inequalities.
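First passage times are easy to simulate with Dijkstra's algorithm on a finite box, and the tightness of the samples around their median is already visible at small scale. The sketch below is illustrative only; the box size, the Uniform[0, 1] edge-weight law, and the target vertex are arbitrary choices, not taken from the text.

```python
import heapq
import random

def first_passage_time(n, rng):
    """Passage time from (0, 0) to (n, 0) on the grid graph [0, n]^2 with
    i.i.d. Uniform[0, 1] edge weights, computed by Dijkstra's algorithm."""
    times = {}
    def w(a, b):
        e = (min(a, b), max(a, b))
        if e not in times:
            times[e] = rng.random()   # lazily sample each edge once
        return times[e]
    dist = {(0, 0): 0.0}
    heap = [(0.0, (0, 0))]
    while heap:
        d, u = heapq.heappop(heap)
        if u == (n, 0):
            return d
        if d > dist.get(u, float("inf")):
            continue
        x, y = u
        for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= v[0] <= n and 0 <= v[1] <= n:
                nd = d + w(u, v)
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    return dist[(n, 0)]

rng = random.Random(1)
samples = [first_passage_time(30, rng) for _ in range(10)]
print(min(samples), max(samples))   # the spread is small relative to the mean
```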

We close this section with the assignment problem. One basic problem in this area is to study the infimum

    A_n = inf_σ ∑_{i=1}^n c_{iσ(i)}

where the infimum is over all permutations σ of {1, ..., n} and the c_{ij}'s are n² independent random variables with the uniform distribution on [0, 1]. It is a remarkable fact that E(A_n) is bounded as n → ∞, and actually E(A_n) ≤ 2 (see [Ste] and the references therein). M. Talagrand proved in [Tal7], as an application of the convex hull approach of Section 4.2, that the standard deviation of A_n is of the order of O(n^{−1/2}) up to logarithmic factors.

Theorem 8.15. Denote by m a median of A_n. There is a numerical constant K > 0 such that, for n ≥ 3 and r ≤ √(log n),

    P{ |A_n − m| ≥ (Kr/√n) ( log n/log log n )² } ≤ 2 e^{−r²}

while, for r ≥ √(log n),

    P{ |A_n − m| ≥ ( K r³ log n ) / ( (log r)² √n ) } ≤ 2 e^{−r²}.

We refer to Tal7] for the proofs, and to Ste], MD2] for further illustrations in geometric and combinatorial probabilities.
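For very small n, the infimum A_n can be computed by brute force over all n! permutations, which already exhibits the boundedness of E(A_n). A purely illustrative sketch (not from the text; the Hungarian algorithm would be needed for realistic n):

```python
import itertools
import random

def assignment_min(c):
    """A_n = min over permutations sigma of sum_i c[i][sigma(i)] (brute force)."""
    n = len(c)
    return min(sum(c[i][s[i]] for i in range(n)) for s in itertools.permutations(range(n)))

rng = random.Random(2)
n, trials = 6, 40
vals = [assignment_min([[rng.random() for _ in range(n)] for _ in range(n)])
        for _ in range(trials)]
mean = sum(vals) / trials
print(mean < 2.0)   # E(A_n) stays bounded (<= 2) even as n grows
```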

8.4 The spin glass free energy


This application concerns the limiting behavior of the spin glass free energy at high temperature. Consider a sequence (ε_i)_{i∈N} of independent symmetric random variables taking values ±1. Each ε_i represents the spin of particle i. Consider then interactions g_ij, i < j, between spins. For some parameter β > 0 (that plays the role of the inverse of the temperature), the so-called partition function is defined by

    Z_N = Z_N(β) = E_ε exp( (β/√N) ∑_{1≤i<j≤N} ε_i ε_j g_ij ),   N ≥ 2,

where E_ε is integration with respect to the ε_i's. In the model we study, the interactions g_ij are random, and the g_ij's will be assumed independent with common standard Gaussian distribution. In particular, we may then also think of Z_N as the function

    Z_N(x) = E_ε exp( (β/√N) ∑_{1≤i<j≤N} ε_i ε_j x_ij ),   x = (x_ij)_{1≤i<j≤N},

on R^n, n = N(N − 1)/2, and equip R^n with the canonical Gaussian measure γ = γ^n. Set F_N = log Z_N, which defines the spin glass free energy.

Theorem 8.16. For every 0 < β < 1,

    lim_{N→∞} (1/N) F_N = β²/4

almost surely.

The proof is based on concentration of the Gaussian distribution together with the so-called second moment method, or Paley-Zygmund inequality.

Proof. It is easily seen that ‖F_N‖_Lip ≤ β √((N − 1)/2), so that, by (2.32) for example, for every r ≥ 0,

    γ( |F_N − E(F_N)| ≥ r ) ≤ 2 e^{−r²/β²(N−1)}.    (8.9)

By Jensen's inequality,

    E(F_N) = E(log Z_N) ≤ log E(Z_N) = β²(N − 1)/4.    (8.10)

Hence, by (8.10), we have to prove that E(F_N) is actually of this order, and use to this task the following lemma that puts forward the so-called second moment method.
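The Lipschitz bound ‖F_N‖_Lip ≤ β√((N−1)/2) can be verified numerically for small N: by exact enumeration over the 2^N spin configurations, the gradient of F_N satisfies |∇F_N(x)|² = (β²/N) ∑_{i<j} ⟨ε_i ε_j⟩² ≤ β²(N−1)/2 at every point x. The sketch below is illustrative only (the helper `grad_norm_sq` is ad hoc, not from the text):

```python
import itertools
import math
import random

def grad_norm_sq(beta, N, x):
    """|grad F_N|^2 at x, where F_N = log E_eps exp((beta/sqrt(N)) sum eps_i eps_j x_ij),
    computed by exact enumeration over the 2^N spin configurations."""
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
    total, grads = 0.0, {p: 0.0 for p in pairs}
    for eps in itertools.product((-1, 1), repeat=N):
        w = math.exp(beta / math.sqrt(N) * sum(eps[i] * eps[j] * x[(i, j)] for i, j in pairs))
        total += w
        for i, j in pairs:
            grads[(i, j)] += w * eps[i] * eps[j]
    # d/dx_ij log Z_N = (beta/sqrt(N)) * <eps_i eps_j> under the Gibbs weights
    return sum((beta / math.sqrt(N) * g / total) ** 2 for g in grads.values())

rng = random.Random(4)
beta, N = 0.7, 6
x = {(i, j): rng.gauss(0, 1) for i in range(N) for j in range(i + 1, N)}
print(grad_norm_sq(beta, N, x) <= beta ** 2 * (N - 1) / 2)
```

The bound holds pointwise because each Gibbs average ⟨ε_i ε_j⟩ lies in [−1, 1].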

Lemma 8.17. For every β < 1,

    E(Z_N²) ≤ K(β) E(Z_N)²   with   K(β) = 1/√(1 − β²).

Proof. By Fubini's theorem, denoting by (ε'_i)_{i∈N} a sequence of independent symmetric Bernoulli random variables, independent from both the sequences (ε_i)_{i∈N} and (g_ij)_{i,j∈N},

    E(Z_N²) = E( E_ε E_{ε'} exp( (β/√N) ∑_{i<j} (ε_i ε_j + ε'_i ε'_j) g_ij ) )
            = E_ε E_{ε'} exp( (β²/2N) ∑_{i<j} (ε_i ε_j + ε'_i ε'_j)² ).

Now,

    ∑_{i<j} (ε_i ε_j + ε'_i ε'_j)² = N(N − 1) + 2 ∑_{i<j} ε_i ε'_i ε_j ε'_j = N(N − 2) + (ε · ε')²

where we denote by ε · ε' the scalar product ∑_{i=1}^N ε_i ε'_i. Now ε · ε' has the same distribution as ∑_{i=1}^N ε_i. It follows therefore from the preceding that

    E(Z_N²) ≤ e^{β²(N−1)/2} E_ε exp( (β²/2N) ( ∑_{i=1}^N ε_i )² ).

If g is a standard normal variable,

    exp( (β²/2N) ( ∑_{i=1}^N ε_i )² ) = E exp( (βg/√N) ∑_{i=1}^N ε_i )

so that, by Fubini's theorem again,

    E_ε exp( (β²/2N) ( ∑_{i=1}^N ε_i )² ) = E( cosh^N(βg/√N) ) ≤ E exp( β²g²/2 ) = 1/√(1 − β²)

where we used that cosh u ≤ e^{u²/2}. Together with (8.10), the proof of the lemma is complete.

We turn back to the proof of the theorem. We make use of the Paley-Zygmund inequality

    P( Z_N ≥ (1/2) E(Z_N) ) ≥ E(Z_N)²/4E(Z_N²).    (8.11)

For a proof, simply note that, for every 0 < δ < 1,

    E(Z_N) ≤ E( Z_N 1_{{Z_N ≥ δE(Z_N)}} ) + δ E(Z_N)

so that, by the Cauchy-Schwarz inequality,

    (1 − δ) E(Z_N) ≤ ( E(Z_N²) P( Z_N ≥ δ E(Z_N) ) )^{1/2}.

Choose then δ = 1/2. It thus follows from (8.11) and Lemma 8.17 that

    P( Z_N ≥ (1/2) E(Z_N) ) ≥ 1/4K(β).

Now, assume first that r = log( (1/2) E(Z_N) ) − E(log Z_N) > 0. Then, by (8.9) applied to this r,

    1/4K(β) ≤ P( log Z_N ≥ log( (1/2) E(Z_N) ) ) = P( F_N ≥ E(F_N) + r ) ≤ 2 e^{−r²/β²(N−1)}


so that

    r ≤ β √( (N − 1) log 8K(β) ).

Hence, in any case,

    E(F_N) ≥ log( (1/2) E(Z_N) ) − β √( (N − 1) log 8K(β) )
           = β²(N − 1)/4 − log 2 − β √( (N − 1) log 8K(β) ).

Together with (8.9), Theorem 8.16 immediately follows from these estimates. Note indeed that, by (8.9) and the Borel-Cantelli lemma, for any ε > 0 the probabilities γ( |F_N − E(F_N)| ≥ εN ) are summable in N, so that

    lim_{N→∞} (1/N) ( F_N − E(F_N) ) = 0

almost surely.

We have seen with Theorem 8.16 that the upper bound β²/4 is actually accurate for 0 < β < 1. It may be shown that this is still the case for β = 1, but the low temperature regime β ≥ 1 is of a much higher difficulty. Only few rigorous results are known, and challenging replica conjectures of the physicists have still to be proved [Tal12], [Tal13]. Even in the high temperature regime 0 < β < 1 for which Theorem 8.16 is available, a number of more precise bounds may be analyzed. For example, the proof of Theorem 8.16 displays the inequality

    P{ F_N ≤ β²(N − 1)/4 − βr√(N − 1) − β√( (N − 1) log 8K(β) ) − log 2 } ≤ 2 e^{−r²}    (8.12)


for r ≥ 0. The following refines upon (8.12).

Theorem 8.18. For each β < 1, there is a constant K'(β) > 0 such that for each N and each r ≥ 0,

    P{ F_N ≤ β²(N − 1)/4 − r } ≤ K'(β) e^{−r²/K'(β)}.

Note that, by Chebyshev's inequality and Lemma 8.17, for s > 0,

    P( Z_N ≥ s E(Z_N) ) ≤ K(β)/s².

Hence, since log E(Z_N) = β²(N − 1)/4, for every r ≥ 0,

    P{ F_N ≥ β²(N − 1)/4 + r } ≤ K(β) e^{−2r}.    (8.13)

Proof of Theorem 8.18. It is an important observation to note that F_N = log Z_N is convex (as a function on R^n, n = N(N − 1)/2). Furthermore,

    E( |∇Z_N|² ) ≤ K₁(β) E(Z_N)².    (8.14)

Indeed, using the same notation as in the proof of Lemma 8.17,

    |∇Z_N|² = (β²/N) E_ε E_{ε'}( ( ∑_{i<j} ε_i ε_j ε'_i ε'_j ) exp( (β/√N) ∑_{i<j} (ε_i ε_j + ε'_i ε'_j) g_ij ) )
            ≤ (β²/2N) E_ε E_{ε'}( (ε · ε')² exp( (β/√N) ∑_{i<j} (ε_i ε_j + ε'_i ε'_j) g_ij ) ).

Now u ≤ e^u for every u ≥ 0, so that, for δ > 0, (β²/2N)(ε · ε')² ≤ (β²/δ) e^{(δ/2N)(ε·ε')²}. Hence, taking expectation with respect to the g_ij's and arguing exactly as in the proof of Lemma 8.17,

    E( |∇Z_N|² ) ≤ (β²/δ) e^{β²(N−1)/2} E_ε exp( ((β² + δ)/2N) ( ∑_{i=1}^N ε_i )² )
               ≤ (β²/δ) e^{β²(N−1)/2} / √(1 − β² − δ)

provided that β² + δ < 1. It then suffices to appropriately choose δ > 0 depending on β so that (8.14) holds.

For 0 < δ < 1 and c > 0,

    P( Z_N ≥ δ E(Z_N), |∇Z_N| ≤ (c/δ) Z_N )
        ≥ P( Z_N ≥ δ E(Z_N), |∇Z_N| ≤ c E(Z_N) )
        ≥ P( Z_N ≥ δ E(Z_N) ) − P( |∇Z_N| > c E(Z_N) )
        ≥ (1 − δ)²/K(β) − K₁(β)/c²

where we used the Paley-Zygmund inequality (8.11) and Lemma 8.17, as well as Chebyshev's inequality and (8.14) in the last step. Choose now δ = 1/2 and c² = 8K(β)K₁(β), so that

    P( Z_N ≥ (1/2) E(Z_N), |∇Z_N|² ≤ 32 K(β) K₁(β) Z_N² ) ≥ 1/8K(β).

In terms of F_N = log Z_N,

    P( F_N ≥ log E(Z_N) − log 2, |∇F_N|² ≤ 32 K(β) K₁(β) ) ≥ 1/8K(β).

We then make use of the deviation inequality of Proposition 1.6 for the convex function F_N with respect to the Gaussian measure γ on R^n to get that, for every r ≥ 0,

    P{ F_N ≥ β²(N − 1)/4 − log 2 − √( 32 K(β) K₁(β) ) ( r + √( 2 log 8K(β) ) ) } ≥ 1 − e^{−r²/2}.

The conclusion of Theorem 8.18 immediately follows.
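The algebraic identity ∑_{i<j}(ε_i ε_j + ε'_i ε'_j)² = N(N−2) + (ε · ε')², used in the proof of Lemma 8.17 and again for Theorem 8.18, can be verified by exhaustive enumeration for small N. A quick illustrative sketch:

```python
import itertools

def check_identity(N):
    """Check sum_{i<j} (e_i e_j + f_i f_j)^2 == N(N-2) + (e . f)^2
    for all sign vectors e, f in {-1, +1}^N."""
    for e in itertools.product((-1, 1), repeat=N):
        for f in itertools.product((-1, 1), repeat=N):
            lhs = sum((e[i] * e[j] + f[i] * f[j]) ** 2
                      for i in range(N) for j in range(i + 1, N))
            dot = sum(ei * fi for ei, fi in zip(e, f))
            if lhs != N * (N - 2) + dot * dot:
                return False
    return True

print(check_identity(4), check_identity(5))  # → True True
```

Each summand equals 2 + 2 ε_i ε_j ε'_i ε'_j, which is where the scalar product (ε · ε')² emerges after completing the diagonal terms.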


8.5 Concentration of random matrices


Let M^n = (M^n_ij)_{1≤i,j≤n} be an n×n real symmetric matrix with eigenvalues λ_1, ..., λ_n. Let f : R → R and let

    F_n = F_n(M^n) = (1/n) ∑_{i=1}^n f(λ_i)    (8.15)

considered as a function on R^{n(n+1)/2} of the real entries M^n_ij, 1 ≤ i ≤ j ≤ n. We are interested here in the case where the entries M^n_ij of M^n are random variables (on some probability space (Ω, A, P)), and we study the concentration properties of F_n. The motivation for such an investigation comes from Wigner's theorem (cf. e.g. [Hi-P]) asserting that whenever the random variables M^n_ij, 1 ≤ i ≤ j ≤ n, are independent with finite moments and E(M^n_ij) = 0, E((M^n_ij)²) = 1/n, 1 ≤ i < j ≤ n, then the empirical distribution (1/n) ∑_{i=1}^n δ_{λ_i} converges almost surely towards the semicircular law with density (2π)^{−1} √(4 − x²) on [−2, +2]. The concentration properties presented here are milder results that, however, hold for any fixed n.

The crucial simple observation is that F_n is Lipschitz and/or convex on R^{n(n+1)/2} as soon as f is Lipschitz and/or convex on R. This is a classical result, the proof of which is outlined below.

Proposition 8.19. If f : R → R is Lipschitz with Lipschitz coefficient ‖f‖_Lip, then F_n : R^{n²} → R is Lipschitz with respect to the Euclidean metric on R^{n²} with Lipschitz coefficient ‖F_n‖_Lip ≤ ‖f‖_Lip/√n. If f is convex on R, then F_n is convex on R^{n²}.

Proof. As is easily checked on polynomials, the gradient of F_n is identified with the symmetric matrix (1/n) f'(M^n), diagonalized by ( (1/n) f'(λ_1), ..., (1/n) f'(λ_n) ), so that |∇F_n| = (1/n) ( ∑_{i=1}^n f'(λ_i)² )^{1/2} ≤ ‖f‖_Lip/√n. The first claim immediately follows. Alternatively, it is a classical result going back to H. Weyl (cf. e.g. [Ho-J]) that whenever A and B are symmetric n×n matrices with respective eigenvalues λ_i(A), λ_i(B), 1 ≤ i ≤ n, then the eigenvalues λ_i(A + B), 1 ≤ i ≤ n, of A + B satisfy

    λ_i(A + B) ≤ λ_i(A) + max_{1≤j≤n} λ_j(B),    i = 1, ..., n.

Assume by homogeneity that ‖f‖_Lip ≤ 1. It follows that

    F_n(A + B) = (1/n) ∑_{i=1}^n f( λ_i(A + B) ) ≤ F_n(A) + max_{1≤j≤n} λ_j(B) ≤ F_n(A) + ( ∑_{1≤i,j≤n} b_ij² )^{1/2}.

The first claim is established.

Convexity of F_n was first established in [Davi]. We may also give a simple argument: a convex function f is the (pointwise) limit of sums ∑_ℓ α_ℓ |x − t_ℓ| with α_ℓ ≥ 0, up to affine terms, so we can assume f(x) = |x − t|. Subtracting t·Id does not change diagonalization, so we can assume f(x) = |x|. Hence nF_n is the trace class norm and the result follows.

The following is then a direct consequence of Proposition 8.19 and the deviation inequalities for Lipschitz functions. Denote by P the distribution of the M^n_ij's on R^{n(n+1)/2} and by α_P its concentration function.

Corollary 8.20. For any Lipschitz function f : R → R and any r > 0,

    P( |F_n − m| ≥ r ) ≤ 2 α_P( √n r/‖f‖_Lip )

where m is a median of F_n for P.

Under appropriate integrability properties, the same inequality holds with the median replaced by the mean, by the results of Section 1.2.

Assume for example, as in Wigner's theorem, that the M^n_ij's are independent centered real Gaussian variables with variance 1/n. We then get from Corollary 8.20 that for any r ≥ 0,

    P( |F_n − m| ≥ r ) ≤ 2 e^{−n²r²/2‖f‖²_Lip}

where m is either the mean or a median of F_n.

We may also consider the case where the M^n_ij's are independent random variables, bounded, in accordance with Wigner's theorem, by 1/√n. In this case, Proposition 8.19 together with Corollary 4.9 shows that for every r ≥ 0,

    P( |F_n − m| ≥ r ) ≤ 4 e^{−n²r²/16‖f‖²_Lip}.

Complex entries are treated similarly. The structure of the arguments is sufficiently general to cover in the same way symmetric inhomogeneous matrices with entries a_ij M^n_ij where the a_ij are fixed non-random coefficients. This provides concentration for a number of families of random matrices including diluted, band or Wishart matrices. We refer to [G-Zei] for details.
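As an illustration of the scale of these fluctuations, one may simulate F_n for f(x) = |x| (1-Lipschitz and convex) over independent draws of a Gaussian symmetric matrix; the spread of the sampled values is already tiny for moderate n, while the values sit near E|x| = 8/3π ≈ 0.85 under the semicircular law. A sketch (assuming NumPy; the symmetrization below is one GOE-type normalization choice, not prescribed by the text):

```python
import numpy as np

def trace_fn(n, f, rng):
    """F_n = (1/n) sum f(lambda_i) for a random symmetric matrix whose entries
    have variance of order 1/n (Wigner normalization)."""
    g = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
    m = (g + g.T) / np.sqrt(2.0)          # symmetrize, keeping entry variance ~ 1/n
    return float(np.mean(f(np.linalg.eigvalsh(m))))

rng = np.random.default_rng(3)
f = np.abs                                 # |x| is 1-Lipschitz and convex
vals = [trace_fn(200, f, rng) for _ in range(10)]
print(max(vals) - min(vals) < 0.1)         # concentration: spread shrinks with n
```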

Notes and Remarks


The topics selected in this chapter only reflect a few of the applications of the concentration of measure phenomenon. Geometric and topological applications were already described in Chapter 3, together with the historical application to Euclidean sections of convex bodies. Applications to sums of independent random vectors and suprema of empirical processes are the subject of Chapter 7. Concentration for harmonic measures as described in the first section is taken from the work [S-S2] by G. Schechtman and M. Schmuckenschlager, to which the reader is referred for further details and alternate, more analytic, proofs, in particular of Theorem 8.5. Concentration for independent permutations started with the note [Mau1] by B. Maurey based on the martingale method. The question of the convex envelope is addressed in [Tal7]. The extension and result presented in Section 8.2 is taken from the recent contribution [MD3] by C. McDiarmid, motivated by randomized methods for graph colouring. The memoir [Tal7] by M. Talagrand contains a number of applications to discrete and geometric probabilities from which the few topics presented in Section 8.3 are taken. Complete expositions in the framework of probability theory for algorithmic discrete mathematics and combinatorial optimization are the notes [MD2] by C. McDiarmid and [Ste] by M. Steele. That concentration methods do not provide tight results for the longest increasing subsequence problem is observed in [Deu-Z]. Far-reaching fluctuation results on the distribution of the longest increasing subsequence have been obtained by combinatorial and analytic methods in [B-D-J]. Application of concentration for Gaussian measures to the spin glass free energy is due to M. Talagrand, from which the results of Section 8.4 are taken [Tal7], [Tal12], [Tal13]. Theorem 8.16 goes back to [A-L-R] with a proof based on moment expansion (see also [C-N] for a stochastic calculus proof). Concentration of the spectral measure of random matrices was investigated recently by A. Guionnet and O. Zeitouni in [G-Zei], from which the results of Section 8.5 are taken.


REFERENCES

[A-M-S] S. Aida, T. Masuda, I. Shigekawa. Logarithmic Sobolev inequalities and exponential integrability. J. Funct. Anal. 126, 83–101 (1994).
[A-S] S. Aida, D. Stroock. Moment estimates derived from Poincaré and logarithmic Sobolev inequalities. Math. Res. Lett. 1, 75–86 (1994).
[A-L-R] M. Aizenman, J. L. Lebowitz, D. Ruelle. Some rigorous results on the Sherrington-Kirkpatrick spin glass model. Comm. Math. Phys. 112, 3–20 (1987).
[Ale] S. Alesker. Localization technique on the sphere and the Gromov-Milman theorem on the concentration phenomenon on uniformly convex sphere. Convex Geometric Analysis (Berkeley 1996). Math. Sci. Res. Inst. Publ. 34. Cambridge Univ. Press (1999).
[Alo] N. Alon. Eigenvalues and expanders. Combinatorica 6, 83–96 (1986).
[Al-M] N. Alon, V. Milman. λ₁, isoperimetric inequalities for graphs and superconcentrators. J. Combin. Theory, Ser. B, 38, 78–88 (1985).
[Am-M] D. Amir, V. Milman. Unconditional and symmetric sets in n-dimensional normed spaces. Israel J. Math. 37, 3–20 (1980).
[An] C. Ané et al. Sur les inégalités de Sobolev logarithmiques (2000). Panoramas et Synthèses. Soc. Math. de France, to appear.
[A-B-V] J. Arias-de-Reyna, K. Ball, R. Villa. Concentration of the distance in finite dimensional normed spaces. Mathematika 45, 245–252 (1998).
[A-V] J. Arias-de-Reyna, R. Villa. The uniform concentration of measure phenomenon in ℓ_p^n (1 ≤ p ≤ 2). Geometric Aspects of Functional Analysis, Israel Seminar 1996–2000. Lecture Notes in Math. 1745, 13–18 (2000). Springer.
[A-V] R. Azencott, N. Vayatis. Large deviations bounds for empirical processes (1999).
[Az] K. Azuma. Weighted sums of certain dependent random variables. Tôhoku Math. J. 19, 357–367 (1967).
[B-D-J] J. Baik, P. A. Deift, K. Johansson. On the distribution of the longest increasing subsequence in a random permutation. J. Amer. Math. Soc. 12, 1119–1178 (1999).

[Bak1] D. Bakry. L'hypercontractivité et son utilisation en théorie des semigroupes. Ecole d'Eté de Probabilités de St-Flour. Lecture Notes in Math. 1581, 1–114 (1994). Springer.
[Bak2] D. Bakry. On Sobolev and logarithmic Sobolev inequalities for Markov semigroups. New Trends in Stochastic Analysis, 43–75 (1997). World Scientific.
[B-E] D. Bakry, M. Emery. Diffusions hypercontractives. Séminaire de Probabilités XIX. Lecture Notes in Math. 1123, 177–206 (1985). Springer.
[Ba-L] D. Bakry, M. Ledoux. Lévy-Gromov's isoperimetric inequality for an infinite dimensional diffusion generator. Invent. math. 123, 259–281 (1996).
[B-B-M] A. Barron, L. Birgé, P. Massart. Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113, 301–413 (1999).
[Bar] F. Barthe. Levels of concentration between exponential and Gaussian (2000).
[Benn] G. Bennett. Probability inequalities for sums of independent random variables. J. Amer. Statist. Assoc. 57, 33–45 (1962).
[Beny] Y. Benyamini. Two point symmetrization, the isoperimetric inequality on the sphere and some applications. Longhorn Notes, Functional Analysis Seminar 1983–84, 53–76. University of Texas.
[B-M1] L. Birgé, P. Massart. From model selection to adaptive estimation. Festschrift for Lucien LeCam: Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. Yang, eds.), 55–87 (1997). Springer.
[B-M2] L. Birgé, P. Massart. Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4, 329–375 (1998).
[B-M3] L. Birgé, P. Massart. Book to appear.
[Bl] G. Blower. The Gaussian isoperimetric inequality and transportation (1999).
[Bob1] S. Bobkov. On Gross' and Talagrand's inequalities on the discrete cube. Vestnik of Syktyvkar University, Ser. 1, 1, 12–19 (1995) (in Russian).
[Bob2] S. Bobkov. Isoperimetric inequalities for distributions of exponential type. Ann. Probab. 22, 978–994 (1994).
[Bob3] S. Bobkov. A functional form of the isoperimetric inequality for the Gaussian measure. J. Funct. Anal. 135, 39–49 (1996).
[Bob4] S. Bobkov. An isoperimetric inequality on the discrete cube and an elementary proof of the isoperimetric inequality in Gauss space. Ann. Probab. 25, 206–214 (1997).
[Bob5] S. Bobkov. Isoperimetric and analytic inequalities for log-concave probability measures. Ann. Probab. 27, 1903–1921 (1999).

[Bob6] S. Bobkov. On deviations from medians (1999).
[B-G-L] S. Bobkov, I. Gentil, M. Ledoux. Hypercontractivity of Hamilton-Jacobi equations (2000). J. Math. Pures et Appl., to appear.
[B-G] S. Bobkov, F. Götze. Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Funct. Anal. 163, 1–28 (1999).
[B-G-H] S. Bobkov, F. Götze, C. Houdré. On Gaussian and Bernoulli covariance representations (1999). Bernoulli, to appear.
[B-H-T] S. Bobkov, C. Houdré, P. Tetali. λ∞, vertex isoperimetry and concentration. Combinatorica 20, 153–172 (2000).
[Bo-L1] S. Bobkov, M. Ledoux. Poincaré's inequalities and Talagrand's concentration phenomenon for the exponential measure. Probab. Theory Relat. Fields 107, 383–400 (1997).
[Bo-L2] S. Bobkov, M. Ledoux. On modified logarithmic Sobolev inequalities for Bernoulli and Poisson measures. J. Funct. Anal. 156, 347–365 (1998).
[Bo-L3] S. Bobkov, M. Ledoux. From Brunn-Minkowski to Brascamp-Lieb and to logarithmic Sobolev inequalities. Geom. funct. anal. 10, 1028–1052 (2000).
[Bog] V. I. Bogachev. Gaussian measures. Math. Surveys and Monographs 62. Amer. Math. Soc. (1998).
[Bor1] C. Borell. Convex measures on locally convex spaces. Ark. Math. 12, 239–252 (1974).
[Bor2] C. Borell. The Brunn-Minkowski inequality in Gauss space. Invent. math. 30, 207–216 (1975).
[Bou] J. Bourgain. On the distribution of polynomials on high dimensional convex sets. Geometric Aspects of Functional Analysis (Israel Seminar, 1989–90). Lecture Notes in Math. 1469, 127–137 (1991). Springer.
[B-L-M] J. Bourgain, J. Lindenstrauss, V. D. Milman. Approximation of zonoids by zonotopes. Acta Math. 162, 73–141 (1989).
[Bre] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure Appl. Math. 44, 375–417 (1991).
[Bro] R. Brooks. On the spectrum of non-compact manifolds with finite volume. Math. Z. 187, 425–437 (1984).
[B-Z] Y. D. Burago, V. A. Zalgaller. Geometric inequalities. Springer (1988). First Edition (Russian): Nauka (1980).
[Ca] O. Catoni. Free energy estimates and deviation inequalities (2000).
[Cha1] I. Chavel. Eigenvalues in Riemannian geometry. Academic Press (1984).
[Cha2] I. Chavel. Riemannian geometry - A modern introduction. Cambridge Univ. Press (1993).
[Chee] J. Cheeger. A lower bound for the smallest eigenvalue of the Laplacian. Problems in Analysis, 195–199. Princeton (1970).
239

C-E] Chen] C-N] CE] CE-MC-S] DG] Davie1] Davie2] Da-S] Davi] De] Dem-Z1] Dem-Z2] De-S] Deu-Z] D-SC] Du] Du-S] Dv] D-R]

J. Cheeger, D. Ebin. Comparison theorems in Riemannian geom-

etry. North-Holland (1975). S.-Y. Cheng. Eigenvalue comparison theorems and its geometric applications. Math. Z. 143, 289{297 (1975). F. Comets, J. Neveu. The Sherrington-Kirkpatrick model of spin glas- ses and stochastic calculus: the high temperature case. Comm. Math. Phys. 166, 549{564 (1995). D. Cordero-Erausquin. Some applications of mass transport to Gaussian type inequalities (2000).
D. Cordero-Erausquin, R. McCann, M. Schmuckenschlager.

A Riemannian interpolation inequality a la Borell, Brascamp and Lieb (2000). S. Das Gupta. Brunn-Minkowski inequality and its aftermath. J. Multivariate Anal. 10, 296{318 (1980). E. B. Davies. One-parameter semigroups. Academic Press (1980). E. B. Davies. Heat kernel and spectral theory. Cambridge Univ. Press (1989). E. B. Davies, B. Simon. Ultracontractivity and the heat kernel for Schrodinger operators and Dirichlet Laplacians. J. Funct. Anal. 59, 335{395 (1984). C. Davis. All convex invariant functions of hermitian matrices. Archiv der Math. 8, 276{278 (1957). A. Dembo. Information inequalities and concentration of measure. Ann. Probability 25, 927{939 (1997). A. Dembo, O. Zeitouni. Transportation approach to some concentration inequalities in product spaces. Elect. Comm. in Probab. 1, 83{90 (1996). A. Dembo, O. Zeitouni. Large deviations techniques and applications. Second Edition: Springer (1998). J.-D. Deuschel, D. Stroock. Large deviations. Academic Press (1989). J.-D. Deuschel, O. Zeitouni. On increasing subsequences of i.i.d. samples. Combinatorics,Probabiity and Computing 8, 247{263 (1999). P. Diaconis, L. Saloff-Coste. Logarithmic Sobolev inequalities for nite Markov chains. Ann. Appl. Prob. 6, 695{750 (1996). R. M. Dudley. Real analysis and probability. Chapman & Hall (1989). N. Dunford, J. Schwartz. Linear Operators. Part II: Spectral Theory. Interscience (1963). A. Dvoretzky. Some results on convex bodies and Banach spaces. Proc. Symp. on Linear Spaces, Jerusalem, 123{160 (1961). A. Dvoretzky, C. A. Rogers. Absolute and unconditional convergence in normed linear spaces. Proc. Nat. Acad. Sci. U.S.A. 36, 192{197 (1950).

240

[Eh] A. Ehrhard. Symétrisation dans l'espace de Gauss. Math. Scand. 53, 281–301 (1983).
[Ev] L. C. Evans. Partial differential equations. Graduate Studies in Math. 19. Amer. Math. Soc. (1997).
[Fe] X. Fernique. Fonctions aléatoires gaussiennes, vecteurs aléatoires gaussiens. Les publications CRM, Montréal (1997).
[F-L-M] T. Figiel, J. Lindenstrauss, V. D. Milman. The dimensions of almost spherical sections of convex bodies. Acta Math. 139, 52–94 (1977).
[F-F] P. Frankl, Z. Füredi. A short proof for a theorem of Harper about Hamming spheres. Discrete Math. 34, 311–313 (1981).
[Fr] A. Friedman. Partial differential equations. Holt, Rinehart, Winston (1969).
[F-O-T] M. Fukushima, Y. Oshima, M. Takeda. Dirichlet forms and symmetric Markov processes. De Gruyter (1994).
[G-H-L] S. Gallot, D. Hulin, J. Lafontaine. Riemannian Geometry. Second Edition. Springer (1990).
[Gi-M1] A. Giannopoulos, V. Milman. Euclidean structure in finite dimensional normed spaces. The Handbook in the Geometry of Banach Spaces. Elsevier (2001).
[Gi-M2] A. Giannopoulos, V. Milman. Concentration property on probability spaces. Advances in Math. 156 (2000).
[Grom1] M. Gromov. Paul Lévy's isoperimetric inequality. Preprint I.H.E.S. (1980).
[Grom2] M. Gromov. Metric structures for Riemannian and non-Riemannian spaces. Birkhäuser (1998).
[Grom3] M. Gromov. Geom. funct. anal. Special Volume, ?–? (2000).
[Gr-M1] M. Gromov, V. D. Milman. A topological application of the isoperimetric inequality. Amer. J. Math. 105, 843–854 (1983).
[Gr-M2] M. Gromov, V. D. Milman. Generalization of the spherical isoperimetric inequality to uniformly convex Banach spaces. Compositio Math. 62, 263–282 (1987).
[Gros1] L. Gross. Logarithmic Sobolev inequalities. Amer. J. Math. 97, 1061–1083 (1975).
[Gros2] L. Gross. Logarithmic Sobolev inequalities and contractive properties of semigroups. Dirichlet Forms, Varenna 1992. Lecture Notes in Math. 1563, 54–88 (1993). Springer.
[G-R] L. Gross, O. Rothaus. Herbst inequalities for supercontractive semigroups. J. Math. Kyoto Univ. 38, 295–318 (1998).
[Gu] O. Guédon. Kahane-Khinchine type inequalities for negative exponent. Mathematika 46, 165–173 (1999).
[G-Zeg] A. Guionnet, B. Zegarlinski. Lectures on logarithmic Sobolev inequalities (2000). Séminaire de Probabilités. Lecture Notes in Math., to appear. Springer.


[G-Zei] A. Guionnet, O. Zeitouni. Concentration of the spectral measure for large matrices. Elect. Comm. Probab. 5, 119–136 (2000).
[Hi-P] F. Hiai, D. Petz. The semicircle law, free random variables and entropy. Amer. Math. Soc. (2000).
[Ha] L. H. Harper. Optimal numbering and isoperimetric problems on graphs. J. Comb. Th. 1, 385–393 (1966).
[Hoe] W. J. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 713–721 (1963).
[Ho-J] R. A. Horn, C. Johnson. Matrix analysis. Cambridge Univ. Press (1985).
[Hou] C. Houdré. Remarks on deviation inequalities for functions of infinitely divisible random vectors (2000).
[Ho-P] C. Houdré, V. Paulauskas.
[H-T] C. Houdré, P. Tetali. Concentration of measure for products of Markov kernels via functional inequalities. Preprint (1997).
[H-PA-S] C. Houdré, V. Pérez-Abreu, V. Surgailis. Interpolation, correlation identities and inequalities for infinitely divisible processes. J. Fourier Anal. Appl. 4, 651–668 (1998).
[Hs] E. P. Hsu. Analysis on path and loop spaces. Probability theory and applications (Princeton, NJ, 1996), 277–347. IAS/Park City Math. Ser. 6. Amer. Math. Soc. (1999).
[I-S-T] I. A. Ibragimov, V. N. Sudakov, B. S. Tsirel'son. Norms of Gaussian sample functions. Proceedings of the third Japan-USSR Symposium on Probability Theory. Lecture Notes in Math. 550, 20–41 (1976). Springer.
[I-W] N. Ikeda, S. Watanabe. Stochastic differential equations and diffusion processes. North-Holland (1989).
[J-L] W. B. Johnson, J. Lindenstrauss, Editors. The Handbook in the Geometry of Banach Spaces. Elsevier (2001).
[J-S1] W. B. Johnson, G. Schechtman. Remarks on Talagrand's deviation inequality for Rademacher functions. Functional Analysis Seminar 1987-89, University of Texas. Lecture Notes in Math. 1470, 72–77 (1991). Springer.
[J-S2] W. B. Johnson, G. Schechtman. Embedding ℓ_p^m into ℓ_1^n. Acta Math. 149, 71–85 (1982).
[J-S3] W. B. Johnson, G. Schechtman. Finite dimensional subspaces of L_p. The Handbook in the Geometry of Banach Spaces. Elsevier (2001).
[Ka] J.-P. Kahane. Some random series of functions. Heath Math. Monographs (1968). Second Edition: Cambridge Univ. Press (1985).
[K-L-S] R. Kannan, L. Lovász, M. Simonovits. Isoperimetric problems for convex bodies and a localization lemma. Discrete Comput. Geom. 13, 541–559 (1995).
[Ke] H. Kesten. On the speed of convergence in first-passage percolation. Ann. Appl. Probab. 3, 296–338 (1993).


[Ko-S] A. Korzeniowski, D. Stroock. An example in the theory of hypercontractive semigroups. Proc. Amer. Math. Soc. 94, 87–90 (1985).
[Kw] S. Kwapień. A theorem on the Rademacher series with vector valued coefficients. Probability in Banach Spaces, Oberwolfach 1975. Lecture Notes in Math. 526, 157–158 (1976). Springer.
[Kw-S] S. Kwapień, J. Szulga. Hypercontraction methods in moment inequalities for series of independent random variables in normed spaces. Ann. Probab. 19, 369–379 (1991).
[K-L-O] S. Kwapień, R. Latała, K. Oleszkiewicz. Comparison of moments of sums of independent random variables and differential inequalities. J. Funct. Anal. 136, 258–268 (1996).
[L-O] R. Latała, K. Oleszkiewicz. Between Sobolev and Poincaré. Geometric Aspects of Functional Analysis, Israel Seminar 1996-2000. Lecture Notes in Math. 1745, 147–168 (2000). Springer.
[Le1] M. Ledoux. A heat semigroup approach to concentration on the sphere and on a compact Riemannian manifold. Geom. funct. anal. 2, 221–224 (1992).
[Le2] M. Ledoux. Remarks on logarithmic Sobolev constants, exponential integrability and bounds on the diameter. J. Math. Kyoto Univ. 35, 211–220 (1995).
[Le3] M. Ledoux. Isoperimetry and Gaussian Analysis. École d'Été de Probabilités de St-Flour 1994. Lecture Notes in Math. 1648, 165–294 (1996). Springer.
[Le4] M. Ledoux. On Talagrand's deviation inequalities for product measures. ESAIM Prob. & Stat. 1, 63–87 (1996).
[Le5] M. Ledoux. Concentration of measure and logarithmic Sobolev inequalities. Séminaire de Probabilités XXXIII. Lecture Notes in Math. 1709, 120–216 (1999). Springer.
[Le6] M. Ledoux. The geometry of Markov diffusion generators. Ann. Fac. Sci. Toulouse IX, 305–366 (2000).
[Le-T] M. Ledoux, M. Talagrand. Probability in Banach spaces (Isoperimetry and processes). Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer (1991).
[Le-Z] M. Ledoux, J. Zinn. Probabilistic limit theorems in the setting of Banach spaces. The Handbook in the Geometry of Banach Spaces. Elsevier (2001).
[Le] P. Lévy. Problèmes concrets d'analyse fonctionnelle. Gauthier-Villars (1951).
[Li-T] J. Lindenstrauss, L. Tzafriri. Classical Banach spaces II. Springer (1979).
[L-S] L. Lovász, M. Simonovits. Random walks in a convex body and an improved volume algorithm. Random Structures and Algorithms 4, 369–412 (1993).
[L-Z] T. Lyons, W. Zheng. A crossing estimate for the canonical process on a Dirichlet space and tightness result. Colloque Paul Lévy, Astérisque 157-158, 249–272 (1988).


[Ly] T. Lyons. Random thoughts on reversible potential theory. Summer School in Potential Theory, Joensuu 1990. Publications in Sciences 26, 71–114. University of Joensuu.
[MC] R. J. McCann. Existence and uniqueness of monotone measure-preserving maps. Duke Math. J. 80, 309–323 (1995).
[MD1] C. McDiarmid. On the method of bounded differences. Surveys in Combinatorics. London Math. Soc. Lecture Notes 141, 148–188 (1989). Cambridge Univ. Press.
[MD2] C. McDiarmid. Concentration. Probabilistic methods for algorithmic discrete mathematics, 195–248. Springer (1998).
[MD3] C. McDiarmid. Concentration for independent permutations (2000).
[MK] H. P. McKean. Geometry of differential space. Ann. Probability 1, 197–206 (1973).
[Mar1] K. Marton. Bounding d-distance by information divergence: a method to prove measure concentration. Ann. Probab. 24, 857–866 (1996).
[Mar2] K. Marton. A measure concentration inequality for contracting Markov chains. Geom. funct. anal. 6, 556–571 (1997).
[Mar3] K. Marton. Measure concentration for a class of random processes. Probab. Theory Relat. Fields 110, 427–439 (1998).
[Mar4] K. Marton. On a measure concentration of Talagrand for dependent random variables. Preprint (1998).
[Mas1] P. Massart. About the constants in Talagrand's deviation inequalities for empirical processes (1998). Ann. Probab. 28, 863–884 (2000).
[Mas2] P. Massart. Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse IX, 245–303 (2000).
[Mau1] B. Maurey. Constructions de suites symétriques. C. R. Acad. Sci. Paris 288, 679–681 (1979).
[Mau2] B. Maurey. Some deviation inequalities. Geom. funct. anal. 1, 188–197 (1991).
[M-P] M. Meyer, A. Pajor. Sections of the unit ball of ℓ_p^n. J. Funct. Anal. 80, 109–123 (1988).
[Mi1] V. D. Milman. Asymptotic properties of functions of several variables that are defined on homogeneous spaces. Dokl. Akad. Nauk SSSR 199, 1247–1250 (1971).
[Mi2] V. D. Milman. A certain property of functions defined on infinite dimensional manifolds. Dokl. Akad. Nauk SSSR 200, 781–784 (1971).
[Mi3] V. D. Milman. New proof of the theorem of Dvoretzky on sections of convex bodies. Funct. Anal. Appl. 5, 28–37 (1971).
[Mi4] V. D. Milman. Diameter of a minimal invariant subset of equivariant Lipschitz actions on compact subsets of R^k. Geometric Aspects of Functional Analysis, Israel Seminar. Lecture Notes in Math. 1267, 13–20 (1987). Springer.


[Mi5] V. D. Milman. The heritage of P. Lévy in geometrical functional analysis. Colloque Paul Lévy sur les processus stochastiques. Astérisque 157-158, 273–302 (1988).
[Mi6] V. D. Milman. Dvoretzky's theorem - Thirty years later (Survey). Geom. funct. anal. 2, 455–479 (1992).
[Mi7] V. D. Milman. Topics in asymptotic geometric analysis. Geom. funct. anal. Special Volume, 792–815 (2000).
[M-S] V. D. Milman, G. Schechtman. Asymptotic theory of finite dimensional normed spaces. Lecture Notes in Math. 1200 (1986). Springer.
[Nu] D. Nualart. The Malliavin calculus and related topics. Springer (1995).
[O-V] F. Otto, C. Villani. Generalization of an inequality by Talagrand, and links with the logarithmic Sobolev inequality. J. Funct. Anal. 173, 361–400 (2000).
[Os] R. Osserman. The isoperimetric inequality. Bull. Amer. Math. Soc. 84, 1182–1238 (1978).
[Pe1] V. Pestov. Amenable representations and dynamics of the unit sphere in an infinite dimensional Hilbert space. Geom. funct. anal. 10, 1171–1201 (2000).
[Pe2] V. Pestov. Ramsey-Milman phenomenon, Urysohn metric spaces and extremely amenable groups (2000). Israel J. Math., to appear.
[Pin] M. S. Pinsker. Information and information stability of random variables and processes. Holden-Day, San Francisco (1964).
[Pis1] G. Pisier. On the dimension of the ℓ_p^n-subspaces of Banach spaces, for 1 ≤ p < 2. Trans. Amer. Math. Soc. 276, 201–211 (1983).
[Pis2] G. Pisier. Probabilistic methods in the geometry of Banach spaces. Probability and Analysis, Varenna (Italy) 1985. Lecture Notes in Math. 1206, 167–241 (1986). Springer.
[Pis3] G. Pisier. The volume of convex bodies and Banach space geometry. Cambridge Univ. Press (1989).
[Ra1] S. T. Rachev. The Monge-Kantorovich mass transference problem and its stochastic applications. Theory Probab. Appl. 24, 647–671 (1984).
[Ra2] S. T. Rachev. Probability metrics and the stability of stochastic models. Wiley (1991).
[R-Y] D. Revuz, M. Yor. Continuous martingales and Brownian motion. Springer (1991).
[Ri1] E. Rio. Inégalités de Hoeffding pour les fonctions lipschitziennes de suites dépendantes (2000).
[Ri2] E. Rio. Inégalités de concentration pour les processus empiriques (2000).
[Rot1] O. Rothaus. Diffusion on compact Riemannian manifolds and logarithmic Sobolev inequalities. J. Funct. Anal. 42, 358–367 (1981).
[Rot2] O. Rothaus. Logarithmic Sobolev inequalities and the growth of L^p norms. Proc. Amer. Math. Soc. 126, 2309–2314 (1998).


[Roy] G. Royer. Une initiation aux inégalités de Sobolev logarithmiques. Cours Spécialisés. Soc. Math. de France (1999).
[Ru] W. Rudin. Real and complex analysis. McGraw-Hill (1987).
[SC] L. Saloff-Coste. Lectures on finite Markov chains. École d'Été de Probabilités de St-Flour 1996. Lecture Notes in Math. 1665, 301–413 (1997). Springer.
[Sa] P.-M. Samson. Concentration of measure inequalities for Markov chains and Φ-mixing processes. Ann. Probab. 28, 416–461 (2000).
[Sche1] G. Schechtman. Lévy type inequality for a class of metric spaces. Martingale Theory and Harmonic Analysis and Banach spaces, Cleveland 1981. Lecture Notes in Math. 939, 211–215 (1981). Springer.
[Sche2] G. Schechtman. A remark concerning the dependence on ε in Dvoretzky's theorem. Geometric Aspects of Functional Analysis (Israel Seminar, 1987-88). Lecture Notes in Math. 1376, 274–277 (1989). Springer.
[Sche3] G. Schechtman. More on embedding of Euclidean subspaces of L_p in ℓ_r^n. Compositio Math. 61, 159–170 (1987).
[Sche4] G. Schechtman. An editorial comment on the preceding paper. Geometric Aspects of Functional Analysis, Israel Seminar 1996-2000. Lecture Notes in Math. 1745, 19–20 (2000). Springer.
[Sche5] G. Schechtman. Concentration, results and applications. The Handbook in the Geometry of Banach Spaces. Elsevier (2001).
[S-S1] G. Schechtman, M. Schmuckenschläger. Another remark on the volume of the intersection of two L_p^n balls. Geometric Aspects of Functional Analysis (Israel Seminar, 1989-90). Lecture Notes in Math. 1469, 174–178 (1991). Springer.
[S-S2] G. Schechtman, M. Schmuckenschläger. A concentration inequality for harmonic measures on the sphere. Geometric Aspects of Functional Analysis (Israel Seminar, 1992-1994). Oper. Theory Adv. Appl. 77, 255–273 (1995). Birkhäuser.
[S-Z1] G. Schechtman, J. Zinn. On the volume of the intersection of two L_p^n balls. Proc. Amer. Math. Soc. 110, 217–224 (1990).
[S-Z2] G. Schechtman, J. Zinn. Concentration on the ℓ_p^n ball. Geometric Aspects of Functional Analysis, Israel Seminar 1996-2000. Lecture Notes in Math. 1745, 245–256 (2000). Springer.
[Schmi] E. Schmidt. Die Brunn-Minkowskische Ungleichung und ihr Spiegelbild sowie die isoperimetrische Eigenschaft der Kugel in der euklidischen und nichteuklidischen Geometrie. Math. Nachr. 1, 81–157 (1948).
[Schmu1] M. Schmuckenschläger. A concentration of measure phenomenon on uniformly convex bodies. Geometric Aspects of Functional Analysis (Israel Seminar, 1992-1994). Oper. Theory Adv. Appl. 77, 275–287. Birkhäuser (1995).
[Schmu2] M. Schmuckenschläger. Martingales, Poincaré type inequalities and deviation inequalities. J. Funct. Anal. 155, 303–323 (1998).


[Schmu3] M. Schmuckenschläger. Curvature of nonlocal Markov generators. Convex geometric analysis (Berkeley, CA, 1996). Math. Sci. Res. Inst. Publ. 34, 189–197. Cambridge Univ. Press (1999).
[Ste] J. M. Steele. Probability theory and combinatorial optimization. CBMS-NSF Regional Conference Series in Applied Mathematics 69 (1996). SIAM.
[Sto] W. F. Stout. Almost sure convergence. Academic Press (1974).
[Str1] D. Stroock. Logarithmic Sobolev inequalities for Gibbs states. Dirichlet forms, Varenna 1992. Lecture Notes in Math. 1563, 194–228 (1993).
[Str2] D. Stroock. Probability theory. An analytic view. Cambridge Univ. Press (1993).
[S-V] D. Stroock, S. Varadhan. Multidimensional diffusion processes. Springer (1979).
[S-T] V. N. Sudakov, B. S. Tsirel'son. Extremal properties of half-spaces for spherically invariant measures. J. Soviet Math. 9, 9–18 (1978); translated from Zap. Nauch. Sem. L.O.M.I. 41, 14–24 (1974).
[Tak] M. Takeda. On a martingale method for symmetric diffusion processes and its applications. Osaka J. Math. 26, 605–623 (1989).
[Tal1] M. Talagrand. An isoperimetric theorem on the cube and the Khintchine-Kahane inequalities. Proc. Amer. Math. Soc. 104, 905–909 (1988).
[Tal2] M. Talagrand. Isoperimetry and integrability of the sum of independent Banach space valued random variables. Ann. Probability 17, 1546–1570 (1989).
[Tal3] M. Talagrand. A new isoperimetric inequality for product measure, and the concentration of measure phenomenon. Geometric Aspects of Functional Analysis (Israel Seminar, 1989-90). Lecture Notes in Math. 1469, 91–124 (1991). Springer.
[Tal4] M. Talagrand. A new isoperimetric inequality for product measures and the tails of sums of independent random variables. Geom. funct. anal. 1, 211–223 (1991).
[Tal5] M. Talagrand. Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22, 28–76 (1994).
[Tal6] M. Talagrand. The supremum of some canonical processes. Amer. J. Math. 116, 283–325 (1994).
[Tal7] M. Talagrand. Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l'I.H.E.S. 81, 73–205 (1995).
[Tal8] M. Talagrand. A new look at independence. Ann. Probab. 24, 1–34 (1996).
[Tal9] M. Talagrand. New concentration inequalities in product spaces. Invent. math. 126, 505–563 (1996).
[Tal10] M. Talagrand. Transportation cost for Gaussian and other product measures. Geom. funct. anal. 6, 587–600 (1996).


[Tal11] M. Talagrand. Majorizing measures: The generic chaining. Ann. Probab. (1997).
[Tal12] M. Talagrand. The Sherrington-Kirkpatrick model: A challenge for mathematicians. Probab. Theory Relat. Fields 110, 109–176 (1998).
[Tal13] M. Talagrand. A first course on spin glasses. École d'été de Probabilités de St-Flour 2000. Lecture Notes in Math., to appear. Springer.
[V-W] A. N. Van der Vaart, J. A. Wellner. Weak convergence of empirical processes with applications to statistics. Springer (1996).
[W-W] D. L. Wang, P. Wang. Extremal configurations on a discrete torus and a generalization of the generalized Macaulay theorem. SIAM J. Appl. Math. 33, 55–59 (1977).
[Wu] L. Wu. A new modified logarithmic Sobolev inequality for Poisson point processes and several applications. Probab. Theory Relat. Fields 118, 427–438 (2000).
[Yu1] V. V. Yurinskii. Exponential bounds for large deviations. Theor. Probability Appl. 19, 154–155 (1974).
[Yu2] V. V. Yurinskii. Exponential inequalities for sums of random vectors. J. Multivariate Anal. 6, 473–499 (1976).


INDEX

boundary measure
Cheeger constant
concentration function
concentration inequality
convex infimum-convolution
cost function
coupling distance
deviation inequality
entropy
equivariant
essential
expander graph
expansion coefficient
extremal set
filtration
forward and backward martingale
free energy
G-space
Hamming metric
harmonic measure
Herbst's argument
hereditary
infimum-convolution
information theory
isoperimetric function
length (metric space)
length (gradient)
Lévy family
Lévy's inequality
Lipschitz
logarithmic Sobolev inequality
log-concave measure

249

Markov chain
median
modified logarithmic Sobolev inequality
modulus of continuity
modulus of convexity
monotone
normal concentration
normal Lévy family
observable diameter
partial diameter
partition function
Poincaré inequality
Polish space
property of concentration
relative entropy
subadditive
uniformly convex
variance

