1 Inequalities: 1.1 Markov

Disclaimer: Though I try to be precise and correct, errors are inevitable. If you spot an error, please mail me at ariel.yadin@weizmann.ac.il. Otherwise, use with caution.
1 Inequalities
1.1 Markov
Let X be a non-negative random variable. For α > 0,
$$P[X \ge \alpha] \le \frac{E(X)}{\alpha}.$$
[6]
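A quick way to see Markov's inequality at work is to compare both sides exactly on a small distribution. The Python sketch below is only an illustration: the helper `markov_check` and the encoding of a distribution as a dict from values to probabilities are assumptions made here, not notation from these notes.

```python
from fractions import Fraction


def markov_check(dist, alpha):
    """Return (P[X >= alpha], E(X) / alpha) for a finite distribution.

    `dist` maps each value of X to its probability (hypothetical encoding);
    Markov's inequality requires every value to be non-negative.
    """
    if any(x < 0 for x in dist):
        raise ValueError("Markov's inequality needs X >= 0")
    tail = sum(p for x, p in dist.items() if x >= alpha)
    mean = sum(x * p for x, p in dist.items())
    return tail, mean / alpha


# X is the outcome of a fair die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
tail, bound = markov_check(die, 5)
print(tail, bound)  # 1/3 7/10
```

Exact `Fraction` arithmetic is used so that the comparison of the two sides is not affected by rounding.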
1.2 Paley-Zygmund
Let X be a random variable such that E[X] ≥ 0. Then for any 0 ≤ λ < 1,
$$P\big[X > \lambda E[X]\big] \ge (1-\lambda)^2 \frac{(E[X])^2}{E[X^2]}.$$
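Proof. Using the Cauchy-Schwarz inequality (1.10 below),
$$E[X] = E\big[X \mathbf{1}_{\{X \le \lambda E[X]\}}\big] + E\big[X \mathbf{1}_{\{X > \lambda E[X]\}}\big] \le \lambda E[X] + \sqrt{E[X^2]\, P[X > \lambda E[X]]}.$$
Rearranging gives $(1-\lambda)E[X] \le \sqrt{E[X^2]\, P[X > \lambda E[X]]}$. Since the left-hand side is non-negative, squaring both sides yields the claim. ∎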
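The Paley-Zygmund bound can be checked exactly on a small distribution. This is a minimal sketch, not part of the original notes; the helper name `paley_zygmund_check` and the dict encoding (value → probability) are assumptions made for illustration.

```python
from fractions import Fraction


def paley_zygmund_check(dist, lam):
    """Return (P[X > lam E[X]], (1 - lam)^2 E[X]^2 / E[X^2]) for a finite
    distribution given as a dict value -> probability (hypothetical encoding)."""
    mean = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())
    if mean < 0 or not 0 <= lam < 1:
        raise ValueError("need E[X] >= 0 and 0 <= lam < 1")
    tail = sum(p for x, p in dist.items() if x > lam * mean)
    return tail, (1 - lam) ** 2 * mean**2 / second_moment


# X = 10 with probability 1/10 and X = 0 otherwise, so E[X] = 1 and E[X^2] = 10.
sparse = {0: Fraction(9, 10), 10: Fraction(1, 10)}
print(paley_zygmund_check(sparse, Fraction(1, 2)))  # (Fraction(1, 10), Fraction(1, 40))
```

The sparse example shows why the lower bound involves $E[X^2]$: most of the mean comes from a rare large value, so only a small tail probability can be guaranteed.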
1.3 Chebychev
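Let X be a random variable with finite variance. For ε > 0,
$$P\big[|X - E(X)| \ge \varepsilon\big] \le \frac{\mathrm{Var}(X)}{\varepsilon^2}.$$
[6]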
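For a binomial variable both sides of Chebyshev's inequality can be computed exactly. The sketch below is illustrative; the helpers `binomial_pmf` and `chebyshev_check` are hypothetical names introduced here.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Exact pmf of Binomial(n, p) as a dict k -> P[X = k]."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def chebyshev_check(dist, eps):
    """Return (P[|X - E(X)| >= eps], Var(X) / eps^2) for a finite distribution."""
    mean = sum(x * q for x, q in dist.items())
    var = sum((x - mean) ** 2 * q for x, q in dist.items())
    tail = sum(q for x, q in dist.items() if abs(x - mean) >= eps)
    return tail, var / eps**2


dist = binomial_pmf(20, Fraction(1, 2))  # mean 10, variance 5
for eps in (2, 3, 5):
    tail, bound = chebyshev_check(dist, eps)
    print(eps, float(tail), float(bound))
```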
1.4 Kolmogorov
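Let $X_1, \dots, X_n$ be independent random variables with $E(X_i) = 0$ and $\mathrm{Var}(X_i) = \sigma_i^2$. Let $S_k = \sum_{i=1}^k X_i$ for all 1 ≤ k ≤ n. Then for any ε > 0,
$$P\Big[\max_{k \le n} |S_k| \ge \varepsilon\Big] \le \frac{1}{\varepsilon^2} \sum_{i=1}^n \sigma_i^2.$$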
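For sums of independent random signs, Kolmogorov's maximal inequality can be verified by enumerating every sign pattern. This is a small illustrative sketch under the assumption $X_i = \pm c_i$ with fair, independent signs (so $\sigma_i^2 = c_i^2$); the helper name `kolmogorov_check` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def kolmogorov_check(scales, eps):
    """Exact check of Kolmogorov's maximal inequality when X_i = +/- scales[i]
    with independent fair signs, so E(X_i) = 0 and Var(X_i) = scales[i]**2.

    Returns (P[max_k |S_k| >= eps], sum_i sigma_i^2 / eps^2) by enumerating
    all 2^n sign patterns, so keep n small.
    """
    n = len(scales)
    hits = 0
    for signs in product((-1, 1), repeat=n):
        s, running_max = 0, 0
        for sign, c in zip(signs, scales):
            s += sign * c  # the partial sum S_k
            running_max = max(running_max, abs(s))
        hits += running_max >= eps
    prob = Fraction(hits, 2**n)
    bound = Fraction(sum(c * c for c in scales)) / eps**2
    return prob, bound


print(kolmogorov_check([1, 1], 2))  # (Fraction(1, 2), Fraction(1, 2))
```

The two-step example attains equality, which shows that the constant in the bound cannot be improved in general.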
1.5 Azuma (Bernstein)
Let $0 = X_0, X_1, \dots, X_n$ be a martingale sequence. Assume $|X_i - X_{i-1}| \le 1$ for all i. Then for λ > 0,
$$P[X_n > \lambda] < \exp\Big(-\frac{\lambda^2}{2n}\Big),$$
$$P[|X_n| > \lambda] < 2\exp\Big(-\frac{\lambda^2}{2n}\Big).$$
[1]
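The simple random walk is the basic example of a martingale with increments bounded by 1, and its tail is an explicit binomial sum. The sketch below compares it with the bound $\exp(-\lambda^2/(2n))$, where n is the number of steps. It is an illustration only, and the helper names are hypothetical.

```python
from math import comb, exp


def srw_tail(n, lam):
    """Exact P[X_n > lam] for the simple random walk X_k (fair +/-1 steps, X_0 = 0),
    a martingale whose increments satisfy |X_i - X_{i-1}| <= 1."""
    # X_n = 2k - n, where k ~ Binomial(n, 1/2) is the number of +1 steps.
    return sum(comb(n, k) for k in range(n + 1) if 2 * k - n > lam) / 2**n


def azuma_bound(n, lam):
    """The right-hand side exp(-lam^2 / (2n)) of the one-sided bound."""
    return exp(-lam**2 / (2 * n))


for n, lam in [(10, 3), (100, 20), (400, 60)]:
    print(n, lam, srw_tail(n, lam), azuma_bound(n, lam))
```

By symmetry of the walk, the two-sided probability is exactly twice `srw_tail`, which matches the factor 2 in the second bound.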
1.6 Chernoff (Hoeffding)

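Let $X_1, \dots, X_n$ be independent Bernoulli random variables with $E(X_i) = P[X_i = 1] = p_i$. Let $S_n = \sum_{i=1}^n X_i$ and $p_n = \sum_{i=1}^n p_i$. Then for λ > 0,
$$P[S_n - p_n > \lambda] < \exp\Big(-\frac{2\lambda^2}{n}\Big),$$
$$P[|S_n - p_n| > \lambda] < 2\exp\Big(-\frac{2\lambda^2}{n}\Big).$$
[1]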
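With heterogeneous $p_i$, the law of $S_n$ (a Poisson-binomial distribution) can be computed exactly by a short dynamic program, which makes the Hoeffding bound easy to compare with the true tail. A minimal sketch, assuming a made-up list of success probabilities; the function names are hypothetical.

```python
from math import exp


def poisson_binomial_pmf(ps):
    """Exact pmf of S_n = X_1 + ... + X_n for independent X_i ~ Bernoulli(ps[i])."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1 - p)  # X_i = 0
            nxt[k + 1] += q * p    # X_i = 1
        pmf = nxt
    return pmf


def hoeffding_check(ps, lam):
    """Return (P[S_n - p_n > lam], exp(-2 lam^2 / n))."""
    pmf = poisson_binomial_pmf(ps)
    p_n = sum(ps)
    tail = sum(q for k, q in enumerate(pmf) if k - p_n > lam)
    return tail, exp(-2 * lam**2 / len(ps))


ps = [0.1, 0.5, 0.9, 0.3, 0.7] * 6  # 30 hypothetical success probabilities
for lam in (1.5, 3.5, 5.5):
    print(lam, hoeffding_check(ps, lam))
```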
1.7 Extension of Hoeffding
Let $X_1, \dots, X_n$ be real-valued random variables such that
1. For all i, $X_i$ depends on at most d of the other variables; i.e.,
$$\max\big\{|A| : A \subset [n],\ X_i \text{ is not independent of } \{X_j : j \in A\}\big\} \le d.$$
2. For all i, $|X_i| \le b$.
Let $S_n = \sum_{i=1}^n X_i$. Then for λ > 0,
$$P\big[|S_n - E[S_n]| \ge \lambda\big] \le 2\exp\Big(-\frac{\lambda^2}{2nb^2(d+1)}\Big).$$
[12]
1.8 Cramér’s Theorem
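Let $X_1, \dots, X_n$ be i.i.d. random variables taking values in $\mathbb{R}$. Let $S_n = \sum_{i=1}^n X_i$. Assume that
$$\varphi(t) \overset{\text{def}}{=} E\big[e^{tX_i}\big] < \infty \quad \forall\, t \in \mathbb{R}.$$
Let µ = E[X_i]. (Note that φ(t) and µ do not depend on i.) Then, for any a > µ,
$$\lim_{n\to\infty} \frac{1}{n} \log P[S_n \ge an] = -I(a),$$
where
$$I(z) \overset{\text{def}}{=} \sup_{t\in\mathbb{R}} \{zt - \ln\varphi(t)\}.$$
[10]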
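For Bernoulli(p) variables the rate function has the closed form $I(a) = a\ln\frac{a}{p} + (1-a)\ln\frac{1-a}{1-p}$. The sketch below approximates the supremum numerically and compares $\frac{1}{n}\log P[S_n \ge an]$ at a large but finite n with $-I(a)$. It is illustrative only: the ternary-search bracket $[-50, 50]$ and the helper names are assumptions, and the finite-n value differs from the limit by a term of order $(\log n)/n$.

```python
from math import exp, lgamma, log


def log_mgf_bernoulli(t, p):
    """ln phi(t) = ln E[e^{t X}] for X ~ Bernoulli(p)."""
    return log(1 - p + p * exp(t))


def rate_function(z, log_mgf, lo=-50.0, hi=50.0, iters=200):
    """Approximate I(z) = sup_t {z t - ln phi(t)} by ternary search.

    t -> z t - ln phi(t) is concave because ln phi is convex; the bracket
    [lo, hi] is an assumption and must contain the maximiser.
    """
    objective = lambda t: z * t - log_mgf(t)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1  # the maximiser lies to the right of m1
        else:
            hi = m2  # the maximiser lies to the left of m2
    return objective((lo + hi) / 2)


def log_binomial_tail(n, p, k0):
    """ln P[Binomial(n, p) >= k0], computed with a log-sum-exp for stability."""
    terms = [lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
             + k * log(p) + (n - k) * log(1 - p) for k in range(k0, n + 1)]
    m = max(terms)
    return m + log(sum(exp(x - m) for x in terms))


p, a = 0.3, 0.5
I_numeric = rate_function(a, lambda t: log_mgf_bernoulli(t, p))
I_exact = a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))  # Bernoulli closed form
n = 2000
print(I_numeric, I_exact, -log_binomial_tail(n, p, int(a * n)) / n)
```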
1.9 Jensen
Let X be a real-valued random variable with E|X| < ∞. If f : R → R is a convex function, then
$$f(E[X]) \le E[f(X)].$$
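1.10 Cauchy-Schwarz and Hölder
Let p, q ∈ [1, ∞] be such that $\frac{1}{p} + \frac{1}{q} = 1$. Then,
$$\int |fg|\, d\mu \le \Big(\int |f|^p\, d\mu\Big)^{1/p} \cdot \Big(\int |g|^q\, d\mu\Big)^{1/q},$$
with the usual convention that the corresponding factor is $\operatorname{ess\,sup}|f|$ (resp. $\operatorname{ess\,sup}|g|$) when p = ∞ (resp. q = ∞).
The Cauchy-Schwarz inequality is Hölder with p = q = 2.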
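For counting measure on a finite set, Hölder's inequality is a statement about finite sequences and can be checked directly. A minimal sketch, restricted to 1 < p < ∞; the function name and sample vectors are hypothetical.

```python
def holder_check(f, g, p):
    """Return (sum |f_k g_k|, ||f||_p * ||g||_q) for finite sequences, i.e. Hoelder's
    inequality for counting measure, with 1 < p < infinity and 1/p + 1/q = 1."""
    q = p / (p - 1)
    lhs = sum(abs(a * b) for a, b in zip(f, g))
    norm_f = sum(abs(a) ** p for a in f) ** (1 / p)
    norm_g = sum(abs(b) ** q for b in g) ** (1 / q)
    return lhs, norm_f * norm_g


f = [1.0, -2.0, 3.0, 0.5]
g = [4.0, 0.5, -1.0, 2.0]
for p in (1.5, 2.0, 3.0):  # p = 2 is the Cauchy-Schwarz case
    print(p, holder_check(f, g, p))
```

Equality holds when $|g|^q$ is proportional to $|f|^p$, for instance $g = |f|^{p-1}$.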
1.11 Doob’s Maximal $L^p$-inequality
Let $X_0, X_1, \dots, X_n$ be a martingale (or a positive sub-martingale). Let $X^* = \max_{0\le k\le n} |X_k|$. Then, for p ≥ 1 and any λ > 0,
$$P[X^* \ge \lambda] \le \frac{\|X_n\|_p^p}{\lambda^p},$$
and for any p > 1,
$$\|X_n\|_p \le \|X^*\|_p \le \frac{p}{p-1} \cdot \|X_n\|_p.$$
[13]
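For a short simple random walk, every path can be enumerated, so all quantities in Doob's inequalities are exact (up to floating point). The sketch below is an illustration under that assumption; `doob_check` is a hypothetical name, and p = 2 is used by default.

```python
from itertools import product


def doob_check(n, lam, p=2):
    """Exact quantities for Doob's inequalities, for the simple random walk
    X_k = (sum of the first k fair +/-1 steps), X_0 = 0, over all 2^n paths.

    Returns (P[X* >= lam], E|X_n|^p, E|X*|^p) with X* = max_{0<=k<=n} |X_k|.
    """
    paths = list(product((-1, 1), repeat=n))
    w = 1 / len(paths)  # each path has probability 2^{-n}
    tail = moment_n = moment_star = 0.0
    for steps in paths:
        x, x_star = 0, 0  # X_0 = 0 contributes |X_0| = 0 to the maximum
        for s in steps:
            x += s
            x_star = max(x_star, abs(x))
        tail += w * (x_star >= lam)
        moment_n += w * abs(x) ** p
        moment_star += w * x_star**p
    return tail, moment_n, moment_star


n, p = 12, 2
for lam in (2, 4, 6):
    tail, m_n, m_star = doob_check(n, lam, p)
    print(lam, tail, m_n / lam**p, m_n ** (1 / p), m_star ** (1 / p), p / (p - 1) * m_n ** (1 / p))
```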
1.12 Harmonic-Geometric-Arithmetic Means
Let $a_1, a_2, \dots, a_n$ be positive real numbers. Then,
$$\frac{n}{\sum_{i=1}^n \frac{1}{a_i}} \le \Big(\prod_{i=1}^n a_i\Big)^{1/n} \le \frac{1}{n}\sum_{i=1}^n a_i.$$
[9]
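The three means are straightforward to compute. The sketch below (hypothetical helper `hm_gm_am`) evaluates the geometric mean through logarithms, which avoids overflow in the product for long inputs.

```python
from math import exp, log


def hm_gm_am(a):
    """Harmonic, geometric and arithmetic means of positive numbers a_1, ..., a_n."""
    n = len(a)
    hm = n / sum(1 / x for x in a)
    gm = exp(sum(log(x) for x in a) / n)  # computed via logs to avoid overflow
    am = sum(a) / n
    return hm, gm, am


print(hm_gm_am([1, 2, 4]))  # exact values: HM = 12/7, GM = 2, AM = 7/3
```

All three means coincide exactly when the $a_i$ are all equal.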
1.13 Prékopa-Leindler
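Let f, g, h : $\mathbb{R}^n \to \mathbb{R}$ be non-negative integrable functions and let 0 < λ < 1. If $h((1-\lambda)x + \lambda y) \ge f^{1-\lambda}(x)\, g^{\lambda}(y)$ for all x, y ∈ $\mathbb{R}^n$, then
$$\int_{\mathbb{R}^n} h(x)\,dx \ge \Big(\int_{\mathbb{R}^n} f(x)\,dx\Big)^{1-\lambda} \Big(\int_{\mathbb{R}^n} g(x)\,dx\Big)^{\lambda}.$$
[7]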
1.14 Young and Inverse Young

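Let p, q, r > 0 be such that $\frac{1}{p} + \frac{1}{q} = 1 + \frac{1}{r}$. Let $f \in L^p(\mathbb{R}^n)$ and $g \in L^q(\mathbb{R}^n)$ be non-negative functions. Then,
$$\text{if } p, q, r \ge 1 \text{ then } \|f * g\|_r \le C^n \|f\|_p \|g\|_q,$$
$$\text{if } p, q, r \le 1 \text{ then } \|f * g\|_r \ge C^n \|f\|_p \|g\|_q,$$
where $C = \frac{C_p C_q}{C_r}$, and
$$C_s = \sqrt{\frac{|s|^{1/s}}{|s'|^{1/s'}}}, \qquad \text{for } \frac{1}{s} + \frac{1}{s'} = 1.$$
[7]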
1.15 Brunn-Minkowski
Let X, Y be non-empty bounded measurable sets in $\mathbb{R}^n$. Let s, t > 0 and assume that
$$sX + tY \overset{\text{def}}{=} \{sx + ty : x \in X,\ y \in Y\}$$
is also measurable. Then,
$$\mathrm{Vol}(sX + tY)^{1/n} \ge s\,\mathrm{Vol}(X)^{1/n} + t\,\mathrm{Vol}(Y)^{1/n}.$$
Equivalently, for any 0 < λ < 1,
$$\mathrm{Vol}(\lambda X + (1-\lambda)Y) \ge \min\{\mathrm{Vol}(X), \mathrm{Vol}(Y)\}.$$
1.16 Newton’s Inequality
Let $a_1, \dots, a_n$ be real (positive or negative) non-zero numbers. For 0 ≤ j ≤ n, let
$$p_j = \frac{1}{\binom{n}{j}} \sum_{\substack{S \subset \{1,\dots,n\} \\ |S| = j}} \prod_{i\in S} a_i.$$
Then, for all 1 ≤ j ≤ n − 1,
$$p_{j-1}\, p_{j+1} \le p_j^2.$$
Corollary. Write
$$\prod_{i=1}^n (x + a_i) = \sum_{j=0}^n \binom{n}{j} p_j\, x^{n-j}.$$
Then,
$$p_{j-1}\, p_{j+1} \le p_j^2.$$
If the $a_i$'s are all positive,
$$p_1 \ge p_2^{1/2} \ge \cdots \ge p_n^{1/n}.$$
[9]
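The normalized elementary symmetric means $p_j$ can be computed exactly for small n by summing over subsets. This allows Newton's inequality to be checked for mixed signs, and the chain $p_1 \ge p_2^{1/2} \ge \cdots$ to be checked for positive inputs. This is a brute-force sketch (exponential in n) with hypothetical helper names.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod


def symmetric_means(a):
    """p_j = e_j(a) / C(n, j) for j = 0, ..., n, where e_j is the j-th elementary
    symmetric polynomial; exact when the a_i are integers or Fractions."""
    n = len(a)
    return [Fraction(sum(prod(S) for S in combinations(a, j)), comb(n, j))
            for j in range(n + 1)]


def newton_holds(p):
    """Check p_{j-1} p_{j+1} <= p_j^2 for 1 <= j <= n - 1."""
    return all(p[j - 1] * p[j + 1] <= p[j] ** 2 for j in range(1, len(p) - 1))


def maclaurin_roots(a):
    """p_j^{1/j} for j = 1, ..., n as floats; only meaningful when all a_i > 0."""
    p = symmetric_means(a)
    return [float(p[j]) ** (1 / j) for j in range(1, len(a) + 1)]


print(newton_holds(symmetric_means([3, -1, 2, 5, -4])))  # True
print(maclaurin_roots([1, 2, 3, 4, 5]))  # a non-increasing list
```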
1.17 Four Functions Theorem and FKG
Let L be a finite distributive lattice; i.e. L is a partially ordered set such that for all x, y, z ∈ L,
• There exists a unique element of L that is the maximal lower bound of x, y, denoted by x ∧ y (and called the meet of x, y).
• There exists a unique element of L that is the minimal upper bound of x, y, denoted by x ∨ y (and called the join of x, y).
• (Distributivity) x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z).
For X, Y ⊂ L define:
$$X \vee Y = \{x \vee y : x \in X,\ y \in Y\}, \qquad X \wedge Y = \{x \wedge y : x \in X,\ y \in Y\}.$$
For a real-valued function ϕ : L → R and X ⊂ L define
$$\varphi(X) = \sum_{x\in X} \varphi(x).$$
Four Functions Theorem (FFT): If α, β, γ, δ : L → $\mathbb{R}_+$ are four non-negative real-valued functions of L such that for any x, y ∈ L
$$\alpha(x)\beta(y) \le \gamma(x\vee y)\,\delta(x\wedge y),$$
then for any X, Y ⊂ L,
$$\alpha(X)\beta(Y) \le \gamma(X\vee Y)\,\delta(X\wedge Y).$$
FKG: Let µ be a probability measure on L such that for all x, y ∈ L,
$$\mu(x)\mu(y) \le \mu(x\vee y)\,\mu(x\wedge y).$$
Let f, g, h : L → $\mathbb{R}_+$ be non-negative real-valued functions of L such that f, g are increasing and h is decreasing. Then,
$$E[fg] \ge E[f]\,E[g], \qquad\text{and}\qquad E[fh] \le E[f]\,E[h].$$
In other words, f, g are positively correlated, and f, h are negatively correlated.
[1]
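The Boolean lattice $\{0,1\}^3$ (coordinatewise order, join = OR, meet = AND) is a finite distributive lattice, so FKG can be checked by brute force. The measure below is a hypothetical "ferromagnetic" example, $\mu(x) \propto \exp(J\cdot\#\{i<j : x_i = x_j = 1\} + H|x|)$ with J ≥ 0, chosen because it satisfies the lattice condition. All names and parameter values are assumptions made for this sketch.

```python
from itertools import product
from math import exp

N_BITS = 3
LATTICE = list(product((0, 1), repeat=N_BITS))  # {0,1}^3, ordered coordinatewise


def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))  # coordinatewise OR


def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))  # coordinatewise AND


def leq(x, y):
    return all(a <= b for a, b in zip(x, y))


def pair_count(x):
    """Number of pairs i < j with x_i = x_j = 1."""
    return sum(x[i] * x[j] for i in range(N_BITS) for j in range(i + 1, N_BITS))


# Hypothetical weights; J >= 0 makes mu satisfy the FKG lattice condition.
J, H = 0.8, -0.3
weights = {x: exp(J * pair_count(x) + H * sum(x)) for x in LATTICE}
total = sum(weights.values())
mu = {x: w / total for x, w in weights.items()}


def satisfies_fkg_condition(mu, tol=1e-12):
    """mu(x) mu(y) <= mu(x v y) mu(x ^ y) for all x, y, up to floating-point slack."""
    return all(mu[x] * mu[y] <= mu[join(x, y)] * mu[meet(x, y)] * (1 + tol)
               for x in LATTICE for y in LATTICE)


def expect(func):
    return sum(mu[x] * func(x) for x in LATTICE)


def is_increasing(func):
    return all(func(x) <= func(y) for x in LATTICE for y in LATTICE if leq(x, y))


def f(x):
    return sum(x)  # increasing


def g(x):
    return x[0] * x[2]  # increasing: indicator that x_0 = x_2 = 1


def h(x):
    return 2 - x[1] - x[2]  # decreasing and non-negative


print(expect(lambda x: f(x) * g(x)) - expect(f) * expect(g))  # non-negative by FKG
print(expect(lambda x: f(x) * h(x)) - expect(f) * expect(h))  # non-positive by FKG
```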
1.18 Shearer
For a random vector $X = (X_1, \dots, X_m)$ and $A \subset \{1, 2, \dots, m\}$, set $X_A = (X_i : i \in A)$. $H(\cdot)$ is the entropy function.
Let $X = (X_1, \dots, X_m)$ be a random vector, and let $\mathcal{A}$ be a collection of subsets of {1, 2, ..., m}, possibly with repeats, such that every element of {1, 2, ..., m} is contained in at least t sets in $\mathcal{A}$. Then for any partial order ≺ on {1, 2, ..., m},
$$H(X) \le \frac{1}{t} \sum_{A\in\mathcal{A}} H\big(X_A \,\big|\, (X_i : i \prec A)\big).$$
[4]
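1.19 Berry-Esseen
Let $X_1, \dots, X_n$ be i.i.d. with $E[X_i] = 0$. Assume that $\sigma^2 \overset{\text{def}}{=} E[X_i^2] < \infty$ and $\rho \overset{\text{def}}{=} E[|X_i|^3] < \infty$. Let N be a normal random variable with mean 0 and variance 1. Set $X = \sum_{i=1}^n X_i$. Then, for all λ ∈ R,
$$\big|P[X > \lambda\sigma\sqrt{n}] - P[N > \lambda]\big| = \big|P[X \le \lambda\sigma\sqrt{n}] - P[N \le \lambda]\big| \le \frac{\rho}{\sigma^3\sqrt{n}}.$$
[6]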
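For Rademacher summands ($X_i = \pm 1$ with probability 1/2 each) we have σ = ρ = 1, and the distribution of X is an explicit binomial. The sketch below computes the worst-case gap over λ and compares it with $1/\sqrt{n}$. It is illustrative only and uses only the standard `math` module; the helper names are hypothetical.

```python
from math import comb, erf, sqrt


def normal_cdf(x):
    """P[N <= x] for a standard normal N."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def berry_esseen_gap(n):
    """sup over lam of |P[X <= lam sqrt(n)] - P[N <= lam]| where X is a sum of n
    i.i.d. Rademacher (+/-1) variables, so sigma = rho = 1.

    Between atoms of X the distribution function is flat while the normal one
    increases, so it suffices to look at each atom and just below it.
    """
    gap, cdf = 0.0, 0.0
    for k in range(n + 1):  # atom X = 2k - n has mass C(n, k) / 2^n
        phi = normal_cdf((2 * k - n) / sqrt(n))
        gap = max(gap, abs(cdf - phi))  # just below the atom: P[X < 2k - n]
        cdf += comb(n, k) / 2**n
        gap = max(gap, abs(cdf - phi))  # at the atom: P[X <= 2k - n]
    return gap


for n in (1, 10, 100, 1000):
    print(n, berry_esseen_gap(n), 1 / sqrt(n))  # observed gap vs rho / (sigma^3 sqrt(n))
```

For large n the observed gap is roughly $0.4/\sqrt{n}$, which is driven by the atom of X at 0, so the $1/\sqrt{n}$ rate cannot be improved for lattice variables.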
1.20 Chen-Stein Method (simplified)

Let $X_1, \dots, X_N$ be N Bernoulli random variables with $p_i = E[X_i] > 0$, and let $S_N = \sum_{i=1}^N X_i$. Let G = (V, E) be a graph on the vertex set V = {1, ..., N}, such that {i, i} ∈ E is an edge (self loop) for all i ∈ V. Assume that for all i ∈ V,
$$E\big[X_i \,\big|\, \{X_j : j \not\sim i\}\big] = p_i.$$
(e.g. this holds if $X_i$ is independent of the set $\{X_j : j \not\sim i\}$.)
Let Z be a Poisson random variable with mean
$$E[Z] = \lambda = E[S_N] = \sum_{i=1}^N p_i.$$
Define
$$B_1 = \sum_{i=1}^N \sum_{j\sim i} E[X_i]\,E[X_j], \qquad B_2 = \sum_{i=1}^N \sum_{\substack{j\sim i \\ j\ne i}} E[X_i X_j].$$
Then,
$$\sup_{A\subseteq\mathbb{N}} \big|P[S_N \in A] - P[Z \in A]\big| \le \frac{1 - e^{-\lambda}}{\lambda} \cdot (B_1 + B_2).$$
[2]
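The simplest instance of the bound is the one where the $X_i$ are independent. The graph with only the self-loops then satisfies the assumption, so $B_1 = \sum_i p_i^2$ and $B_2 = 0$. The sketch below computes the exact left-hand side, which is half the $\ell^1$ distance between the two distributions, and compares it with the bound. The helper names and sample inputs are hypothetical.

```python
from math import exp


def poisson_binomial_pmf(ps):
    """Exact pmf of S_N for independent X_i ~ Bernoulli(ps[i])."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1 - p)
            nxt[k + 1] += q * p
        pmf = nxt
    return pmf


def poisson_pmf(lam, kmax):
    """P[Z = k] for k = 0, ..., kmax, with Z ~ Poisson(lam)."""
    out, term = [], exp(-lam)
    for k in range(kmax + 1):
        out.append(term)
        term *= lam / (k + 1)
    return out


def chen_stein_independent(ps):
    """Independent case: G has only the self-loops, so B1 = sum p_i^2 and B2 = 0.
    Returns (sup_A |P[S_N in A] - P[Z in A]|, (1 - e^{-lam}) / lam * (B1 + B2))."""
    lam = sum(ps)
    pmf_s = poisson_binomial_pmf(ps)
    pmf_z = poisson_pmf(lam, len(ps))
    # The sup over A equals half the l1 distance; Z also puts mass on k > N.
    mass_beyond = max(0.0, 1.0 - sum(pmf_z))
    tv = 0.5 * (sum(abs(a - b) for a, b in zip(pmf_s, pmf_z)) + mass_beyond)
    b1 = sum(p * p for p in ps)
    return tv, (1 - exp(-lam)) / lam * b1


print(chen_stein_independent([0.05] * 40))
```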
1.21 Suen’s Inequality
For a graph G = (V, E) write i ∼ j if {i, j} is an edge. For subsets A, B ⊆ V, write A ∼ B if there exists an edge between A and B. Thus, A ≁ B means that there is no edge between A and B, and i ∼ A means there is an edge between i and some element of A.
Let $X_1, \dots, X_N$ be N Bernoulli random variables, and let $S_N = \sum_{i=1}^N X_i$.
Let G be a dependency graph of $\{X_i\}_{i=1}^N$; i.e. G = (V, E) is a graph on the vertex set V = {1, ..., N} such that for any two disjoint subsets A, B ⊂ V, if A ≁ B then the two families $\{X_i\}_{i\in A}$ and $\{X_i\}_{i\in B}$ are independent of each other. (In some texts G is called a superdependency digraph.)
Define
$$\Delta = \frac{1}{2}\sum_{i=1}^N \sum_{j\sim i} E[X_i X_j] \prod_{k\sim\{i,j\}} (1 - E[X_k])^{-1},$$
$$\Delta^* = \frac{1}{2}\sum_{i=1}^N \sum_{j\sim i} E[X_i]\,E[X_j] \prod_{k\sim\{i,j\}} (1 - E[X_k])^{-1}.$$
Then,
$$P[S_N = 0] \le e^{\Delta} \prod_{i=1}^N (1 - E[X_i]),$$
$$P[S_N = 0] \ge \big(1 - \Delta^* e^{\Delta}\big) \prod_{i=1}^N (1 - E[X_i]).$$
Note that if $\{X_i\}_{i=1}^N$ are all independent, then equality holds in both inequalities (choosing the empty graph).
[11, 15]
2 Convergence Theorems
2.1 The Law of the Iterated Logarithm

Let $X_1, X_2, \dots$ be i.i.d. with $E[X_i] = 0$ and $0 < \sigma^2 \overset{\text{def}}{=} E[X_i^2] < \infty$. Set $S_n = \sum_{i=1}^n X_i$. Then,
$$P\Big[\limsup_{n\to\infty} \frac{S_n}{\sqrt{2\sigma^2 n \log\log n}} = 1\Big] = 1.$$
[14]
2.2 Monotone Convergence
Let (Ω, F, µ) be a measure space. Let $0 \le f_1 \le f_2 \le \cdots$ be a monotone sequence of non-negative measurable functions. Then,
$$f = \lim_{n\to\infty} f_n$$
exists a.e., and is a measurable function. Further,
$$\int_\Omega f\,d\mu = \lim_{n\to\infty} \int_\Omega f_n\,d\mu.$$
[5]
2.3 Fatou’s Lemma
Let (Ω, F, µ) be a measure space. Let $\{f_n\}$ be a sequence of non-negative measurable functions. Then,
$$\int_\Omega \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty} \int_\Omega f_n\,d\mu.$$
[5]
2.4 Dominated Convergence
Let (Ω, F, µ) be a measure space. Let $\{f_n\}$, f, g be measurable functions such that:
(1) $f = \lim_{n\to\infty} f_n$ a.e.,
(2) for all n, $|f_n| \le g$ a.e.,
(3) $\int_\Omega g\,d\mu < \infty$.
Then,
$$\lim_{n\to\infty} \int_\Omega f_n\,d\mu = \int_\Omega f\,d\mu.$$
Remark: The condition of a.e. convergence in (1) can be replaced by convergence in measure; i.e. it suffices to require that
(1') for all ε > 0, $\lim_{n\to\infty} \mu\{|f_n - f| > \varepsilon\} = 0$.
[5]
3 Formulas
3.1 Stirling’s Formula

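$$n! \sim \sqrt{2\pi}\, n^{n+1/2} e^{-n},$$
$$\sqrt{2\pi n}\Big(\frac{n}{e}\Big)^n e^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n}\Big(\frac{n}{e}\Big)^n e^{\frac{1}{12n}}.$$
For 0 < α < 1,
$$\binom{n}{\alpha n} = \big(1 \pm O(n^{-1})\big) \cdot C \cdot \frac{1}{\sqrt{n}} \cdot 2^{nH(\alpha)},$$
where $H(\alpha) = -\alpha\log_2\alpha - (1-\alpha)\log_2(1-\alpha)$, and the constant is $C = \frac{1}{\sqrt{2\pi\alpha(1-\alpha)}}$.
Less accurate, but easier to use: for 1 ≤ k ≤ n,
$$\Big(\frac{n}{k}\Big)^k \le \binom{n}{k} \le \Big(\frac{ne}{k}\Big)^k.$$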
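The factorial bounds and the entropy estimate for $\binom{n}{\alpha n}$ are easy to check numerically. In the sketch below the helper names are hypothetical. Floating point limits the Robbins bounds to n of about 150 or less, since $(n/e)^n$ overflows beyond that.

```python
from math import comb, e, exp, factorial, log2, pi, sqrt


def robbins_bounds(n):
    """The lower and upper bounds sqrt(2 pi n) (n/e)^n e^{1/(12n+1)} and
    sqrt(2 pi n) (n/e)^n e^{1/(12n)} on n! (floats, so keep n <= 150)."""
    base = sqrt(2 * pi * n) * (n / e) ** n
    return base * exp(1 / (12 * n + 1)), base * exp(1 / (12 * n))


def binary_entropy(alpha):
    """H(alpha) = -alpha log2(alpha) - (1 - alpha) log2(1 - alpha)."""
    return -alpha * log2(alpha) - (1 - alpha) * log2(1 - alpha)


def binom_estimate(n, k):
    """C * n^{-1/2} * 2^{n H(alpha)} with alpha = k / n and C = 1 / sqrt(2 pi alpha (1 - alpha))."""
    alpha = k / n
    return 2 ** (n * binary_entropy(alpha)) / sqrt(2 * pi * alpha * (1 - alpha) * n)


lo, hi = robbins_bounds(10)
print(lo, factorial(10), hi)
print(comb(1000, 300) / binom_estimate(1000, 300))  # close to 1, error O(1/n)
```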
3.2 $(1 - \frac{1}{x})^x$ and e:
For all x ≥ 1,
$$\Big(1 - \frac{1}{x}\Big)^x \le e^{-1} \le \Big(1 - \frac{1}{x+1}\Big)^x.$$
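A direct numerical check of the sandwich for a few values of x follows. This is an illustrative sketch; the helper name is hypothetical.

```python
from math import exp


def e_sandwich(x):
    """Return ((1 - 1/x)^x, e^{-1}, (1 - 1/(x+1))^x), in increasing order for x >= 1."""
    return (1 - 1 / x) ** x, exp(-1), (1 - 1 / (x + 1)) ** x


for x in (1, 2, 10, 1000):
    print(x, e_sandwich(x))
```

Both outer terms approach $e^{-1}$ as x grows, with a gap of order $1/x$.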
3.3 Finite sums of powers

$$\sum_{k=1}^n k = \frac{n(n+1)}{2}$$
$$\sum_{k=1}^n k^2 = \frac{n(n+1)(2n+1)}{6}$$
$$\sum_{k=1}^n k^3 = \Big(\frac{n(n+1)}{2}\Big)^2$$
$$\sum_{k=1}^n k^4 = \frac{n(n+1)(2n+1)(3n^2+3n-1)}{30}$$
[8]
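The four closed forms in 3.3 can be compared against direct summation with exact integer arithmetic. A small sketch; `power_sum` and `CLOSED_FORMS` are hypothetical names.

```python
def power_sum(n, p):
    """Direct evaluation of 1^p + 2^p + ... + n^p."""
    return sum(k**p for k in range(1, n + 1))


# The closed forms of 3.3; each numerator is divisible by its denominator, so // is exact.
CLOSED_FORMS = {
    1: lambda n: n * (n + 1) // 2,
    2: lambda n: n * (n + 1) * (2 * n + 1) // 6,
    3: lambda n: (n * (n + 1) // 2) ** 2,
    4: lambda n: n * (n + 1) * (2 * n + 1) * (3 * n**2 + 3 * n - 1) // 30,
}

print([power_sum(10, p) for p in range(1, 5)])  # [55, 385, 3025, 25333]
```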
3.4 Permutation fixed points
Let D(n, m) denote the number of permutations of {1, ..., n} that have exactly m fixed points. Then,
$$D(n, m) = \frac{n!}{m!} \sum_{k=0}^{n-m} \frac{(-1)^k}{k!}.$$
The proof uses the inclusion-exclusion principle; see e.g. Wikipedia: Random permutation statistics.
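For small n the formula can be compared with a brute-force count over all permutations. The sketch below evaluates the alternating sum with exact `Fraction` arithmetic. The result is always an integer, since it equals $\binom{n}{m}$ times the number of derangements of n − m elements. Helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def fixed_point_count_formula(n, m):
    """D(n, m) = n!/m! * sum_{k=0}^{n-m} (-1)^k / k!, evaluated exactly."""
    s = sum(Fraction((-1) ** k, factorial(k)) for k in range(n - m + 1))
    value = Fraction(factorial(n), factorial(m)) * s
    return int(value)  # always an integer: C(n, m) times a derangement number


def fixed_point_count_brute(n, m):
    """Count permutations of {0, ..., n-1} with exactly m fixed points by enumeration."""
    return sum(1 for perm in permutations(range(n))
               if sum(perm[i] == i for i in range(n)) == m)


print(fixed_point_count_formula(4, 0), fixed_point_count_brute(4, 0))  # 9 9
```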
4 Complex Analysis
Denote the unit disc in the complex plane by U;
$$U = \{z \in \mathbb{C} : |z| < 1\}.$$
4.1 Schwarz Lemma
Let f : U → C be a bounded analytic function such that f(0) = 0. Then,
$$|f(z)| \le |z| \cdot \|f\|_\infty \qquad\text{and}\qquad |f'(0)| \le \|f\|_\infty.$$
If equality holds in the first inequality for some z ≠ 0, or in the second inequality, then there exists θ ∈ [0, 2π) such that for all z,
$$f(z) = e^{i\theta} \|f\|_\infty \cdot z.$$
[3]
4.2 Bieberbach Conjecture (De Branges’ Theorem)
Let f : U → C be a conformal map of the unit disc (i.e. f is injective and analytic in the unit disc). Then for all n ≥ 1,
$$|f^{(n)}(0)| \le n \cdot n! \cdot |f'(0)|.$$
[3]
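4.3 Koebe 1/4 Theorem
Let f : U → C be a conformal map of the unit disc (i.e. f is injective and analytic in the unit disc). Then,
$$\Big\{z \in \mathbb{C} : |z - f(0)| < \frac{|f'(0)|}{4}\Big\} = \frac{|f'(0)|}{4}\,U + f(0) \subseteq f(U).$$
Furthermore, let $d_f(z)$ be the distance of f(z) to ∂f(U); i.e.
$$d_f(z) = \inf_{\zeta\in\partial f(U)} |f(z) - \zeta|.$$
Then,
$$\frac{|f'(z)|}{4}\cdot(1 - |z|^2) \le d_f(z) \le (1 - |z|^2)\,|f'(z)|.$$
[3]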
4.4 Koebe Distortion Theorem
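Let f : U → C be a conformal map of the unit disc (i.e. f is injective and analytic in the unit disc). Then, for all z ∈ U,
$$\frac{|z|}{(1+|z|)^2}\cdot|f'(0)| \le |f(z) - f(0)| \le \frac{|z|}{(1-|z|)^2}\cdot|f'(0)|,$$
and
$$\frac{1-|z|}{(1+|z|)^3}\cdot|f'(0)| \le |f'(z)| \le \frac{1+|z|}{(1-|z|)^3}\cdot|f'(0)|.$$
[3]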
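The Koebe function $k(z) = z/(1-z)^2$ is the standard extremal example for the distortion theorem. It attains the upper bounds on the positive real axis and the lower bounds on the negative real axis. The Python sketch below evaluates both two-sided bounds on a hypothetical grid of sample points. The helper names and the small relative tolerance, which absorbs rounding at the equality points, are assumptions made here.

```python
import cmath


def koebe(z):
    """Koebe function k(z) = z / (1 - z)^2: conformal on U, k(0) = 0, k'(0) = 1."""
    return z / (1 - z) ** 2


def koebe_prime(z):
    """k'(z) = (1 + z) / (1 - z)^3."""
    return (1 + z) / (1 - z) ** 3


def distortion_bounds_hold(f, df, z, f0=0, df0=1, rel_tol=1e-9):
    """Check both two-sided distortion bounds at the point z in U."""
    r = abs(z)
    a, b, c = abs(f(z) - f0), abs(df(z)), abs(df0)
    lo1, hi1 = r / (1 + r) ** 2 * c, r / (1 - r) ** 2 * c
    lo2, hi2 = (1 - r) / (1 + r) ** 3 * c, (1 + r) / (1 - r) ** 3 * c
    return (lo1 * (1 - rel_tol) <= a <= hi1 * (1 + rel_tol)
            and lo2 * (1 - rel_tol) <= b <= hi2 * (1 + rel_tol))


points = [cmath.rect(r, 2 * cmath.pi * j / 12) for r in (0.1, 0.5, 0.9) for j in range(12)]
print(all(distortion_bounds_hold(koebe, koebe_prime, z) for z in points))  # True
```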
References
[1] N. Alon, J. H. Spencer, The Probabilistic Method (2000), John Wiley & Sons, Inc.
[2] R. Arratia, L. Goldstein, L. Gordon, Two moments suffice for Poisson approximations: The Chen-Stein method. Ann. Probab. 17 (1989), 9–25.
[3] J. B. Conway, Functions of One Complex Variable II (1995), Springer-Verlag.
[4] T. M. Cover, J. A. Thomas, Elements of Information Theory (1991), John Wiley & Sons, Inc.
[5] J. L. Doob, Measure Theory (1994), Springer.
[6] W. Feller, An Introduction to Probability Theory and Its Applications (1966), John Wiley & Sons, Inc.
[7] R. J. Gardner, The Brunn-Minkowski Inequality, Bulletin of the American Mathematical Society 39 (2002), no. 3, 355–405.
[8] I. S. Gradshteyn, I. M. Ryzhik, Table of Integrals, Series, and Products (1965), Academic Press, New York.
[9] G. H. Hardy, J. E. Littlewood, G. Pólya, Inequalities (1952), Cambridge University Press.
[10] F. den Hollander, Large Deviations (2000), AMS.
[11] S. Janson, New versions of Suen’s correlation inequality. Random Structures and Algorithms 13 (1998), 467–483.
[12] S. Janson, Large deviations for sums of partly dependent random variables. Random Structures and Algorithms 24 (2004), no. 3, 234–248.
[13] D. Revuz, M. Yor, Continuous Martingales and Brownian Motion (1991), Springer-Verlag.
[14] A. N. Shiryaev, Probability, translated by R. P. Boas (1996), Springer.
[15] W. C. S. Suen, A correlation inequality and a Poisson limit theorem for nonoverlapping balanced subgraphs of a random graph. Random Structures and Algorithms 1 (1990), 231–242.
