The Density Ratio of Generalized Binomial versus Poisson Distributions
Lutz Dümbgen (University of Bern)∗
Jon A. Wellner (University of Washington, Seattle)†

October 9, 2019
arXiv:1910.03444v1 [math.ST] 8 Oct 2019

Let $b(x)$ be the probability that a sum of independent Bernoulli random variables with parameters $p_1, p_2, p_3, \ldots \in [0, 1)$ equals $x$, where $\lambda := p_1 + p_2 + p_3 + \cdots$ is finite. We prove two inequalities for the maximal ratio $b(x)/\pi_\lambda(x)$, where $\pi_\lambda$ is the weight function of the Poisson distribution with parameter $\lambda$.

Key words: Poisson approximation, relative errors, total variation distance.

1 Introduction

We consider independent Bernoulli random variables $Z_1, Z_2, Z_3, \ldots \in \{0, 1\}$ with parameters $\mathbb{P}(Z_i = 1) = \mathbb{E}(Z_i) = p_i \in [0, 1)$ and their random sum $X = \sum_{i \ge 1} Z_i$. By the first and second Borel–Cantelli lemmas, $X$ is almost surely finite if and only if the sequence $p = (p_i)_{i \ge 1}$ satisfies

\[
\lambda := \sum_{k \ge 1} p_k < \infty, \tag{1}
\]

and we exclude the trivial case $\lambda = 0$. Under this assumption, the distribution $Q$ of $X$ is given by

\[
\mathbb{P}(X = x) =: b(x) = \sum_{J \,:\, \#J = x} \, \prod_{i \in J} p_i \prod_{k \in J^c} (1 - p_k)
\]

for $x \in \mathbb{N}_0$, where $J$ denotes a generic subset of $\mathbb{N}$ and $J^c := \mathbb{N} \setminus J$. It is well-known that the distribution $Q$ may be approximated by the Poisson distribution $\mathrm{Poiss}(\lambda)$ with weights

\[
\pi(x) = e^{-\lambda} \lambda^x / x!,
\]

provided that the quantity

\[
\Delta := \lambda^{-1} \sum_{i \ge 1} p_i^2
\]

is small. Indeed, a suitable version of Stein's method, developed by Chen (1975), leads to the remarkable bound

\[
d_{\mathrm{TV}}\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) \le (1 - e^{-\lambda}) \Delta \le \sum_{i \ge 1} p_i^2 \Big/ \max(1, \lambda),
\]

where $d_{\mathrm{TV}}(\cdot, \cdot)$ stands for total variation distance; see Theorem 2.M in Barbour et al. (1992). Note also that

\[
\mathrm{Var}(X) = \sum_{i \ge 1} p_i (1 - p_i) = \lambda (1 - \Delta).
\]

∗ Research supported by Swiss National Science Foundation
† Research supported in part by: (a) NSF Grant DMS-1566514; and (b) NI-AID Grant 2R01 AI291968-

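Conjecture and main results. Motivated by Dümbgen et al. (2019), we are aiming at upper bounds for the maximal density ratio

\[
\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) := \sup_{x \ge 0} r(x) \quad\text{with}\quad r(x) := \frac{b(x)}{\pi(x)}.
\]

Note that for arbitrary sets $A \subset \mathbb{N}_0$, the probability $Q(A) = \mathbb{P}(X \in A)$ is never larger than the corresponding Poisson probability times $\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)$, no matter how small the Poisson probability is. Moreover, $d_{\mathrm{TV}}\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) \le 1 - \rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)^{-1}$. Hence, $\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)$ is a strong measure of error when $Q$ is approximated by $\mathrm{Poiss}(\lambda)$. We conjecture that

\[
\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) \le (1 - \Delta)^{-1}. \tag{2}
\]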

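In this note we prove that

\[
\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) \le (1 - p_*)^{-1} \tag{3}
\]

for arbitrary values of $\lambda$, where

\[
p_* := \max_{i \ge 1} p_i \ge \Delta.
\]

In addition, we prove that in case of $\lambda \le 1$, a stronger version of (2) is true:

\[
\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) \le e^{\Delta} \quad\text{if } \lambda \le 1. \tag{4}
\]

Note that $e^{-\Delta} > 1 - \Delta$, whence $e^{\Delta} < (1 - \Delta)^{-1}$.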
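The quantities in (2), (3) and (4) are easy to evaluate numerically when only finitely many parameters are nonzero. The following is a minimal sketch under that assumption; the function names `bernoulli_sum_pmf`, `poisson_weight` and `density_ratio` and the example parameters are illustrative choices, not part of the paper.

```python
import math

def bernoulli_sum_pmf(p):
    """Return [b(0), ..., b(n)] for a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        nxt = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            nxt[x] += bx * (1.0 - pi)   # Z_i = 0
            nxt[x + 1] += bx * pi       # Z_i = 1
        b = nxt
    return b

def poisson_weight(lam, x):
    """Poisson weight pi(x) = exp(-lam) * lam^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def density_ratio(p):
    """Return (lambda, Delta, p_star, rho) for a finite parameter list p."""
    lam = sum(p)
    delta = sum(pi * pi for pi in p) / lam
    p_star = max(p)
    b = bernoulli_sum_pmf(p)
    # b(x) = 0 for x > n, so the supremum of r is attained on {0, ..., n}.
    rho = max(b[x] / poisson_weight(lam, x) for x in range(len(b)))
    return lam, delta, p_star, rho

p_example = [0.3, 0.2, 0.1]   # hypothetical parameters with lambda = 0.6 <= 1
lam, delta, p_star, rho = density_ratio(p_example)
print("rho                  :", rho)
print("bound (4), exp(Delta):", math.exp(delta))
print("conjectured bound (2):", 1.0 / (1.0 - delta))
print("bound (3)            :", 1.0 / (1.0 - p_star))
```

For this example the four printed numbers should appear in increasing order, in line with (4), the note following it, and $\Delta \le p_*$.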

In Section 2 we provide some basic formulae for the weights $b(x)$ and the ratios $r(x)$. These lead to a preliminary bound for the maximizer(s) of $r = b/\pi$ and a first bound for $\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)$. Then in Section 3 we derive the upper bound (3). In Section 4 we discuss the case $0 < \lambda \le 1$ and provide lower and upper bounds for $\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)$.

2 Preparations

Discrete scores. With $n := \#\{i \ge 1 : p_i > 0\} \in \mathbb{N} \cup \{\infty\}$, note that $b(x) > 0$ if and only if $x \le n$. For any $x \ge 0$,

\[
\frac{\pi(x + 1)}{\pi(x)} = \frac{\lambda}{x + 1},
\]

so the "scores" $r(x + 1)/r(x)$ are given by

\[
\frac{r(x + 1)}{r(x)} = \frac{(x + 1) b(x + 1)}{\lambda b(x)}
\]

for $x \ge 0$ with $b(x) > 0$. If $x$ is a maximizer of $r(\cdot)$, then

\[
\frac{(x + 1) b(x + 1)}{b(x)} \le \lambda \le \frac{x b(x)}{b(x - 1)} \tag{5}
\]

with $b(-1) := 0$.
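As a numerical illustration of the score formula and of condition (5), the sketch below computes the scores in two ways and evaluates both sides of (5) at a maximizer of $r$. It assumes a finite parameter list; the helper name and the example values are hypothetical.

```python
import math

def bernoulli_sum_pmf(p):
    """Return [b(0), ..., b(n)] for a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        nxt = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            nxt[x] += bx * (1.0 - pi)
            nxt[x + 1] += bx * pi
        b = nxt
    return b

p_example = [0.9, 0.8, 0.7, 0.5, 0.2]      # hypothetical parameters, lambda = 3.1
lam = sum(p_example)
n = len(p_example)
b = bernoulli_sum_pmf(p_example) + [0.0]   # pad so that b[n + 1] = 0

# r(x) = b(x) / pi(x) for x = 0, ..., n
r = [b[x] * math.exp(lam) * math.factorial(x) / lam ** x for x in range(n + 1)]
x_max = max(range(n + 1), key=lambda x: r[x])

# Scores r(x + 1) / r(x): directly, and via (x + 1) b(x + 1) / (lambda b(x)).
scores_direct = [r[x + 1] / r[x] for x in range(n)]
scores_formula = [(x + 1) * b[x + 1] / (lam * b[x]) for x in range(n)]

# Both sides of (5) at the maximizer. Here x_max >= 1, because r(0) < 1
# while some r(x) exceeds 1, so b[x_max - 1] is a genuine weight.
left = (x_max + 1) * b[x_max + 1] / b[x_max]
right = x_max * b[x_max] / b[x_max - 1]
print("maximizer:", x_max)
print("(5):", left, "<=", lam, "<=", right)
```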

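Representing the weight function of Q. The weight function $b$ may be written as

\[
b(x) = \sum_{J \,:\, \#J = x} w(J) \quad\text{with}\quad w(J) := \prod_{i \in J} p_i \prod_{k \in J^c} (1 - p_k).
\]

In particular,

\[
b(0) = \prod_{k \ge 1} (1 - p_k) = \exp\Bigl( \sum_{k \ge 1} \log(1 - p_k) \Bigr) < \exp(-\lambda) = \pi(0),
\]

because $\log(1 + y) < y$ for $-1 < y \ne 0$. Since

\[
w(J) = \prod_{i \in J} \frac{p_i}{1 - p_i} \prod_{k \ge 1} (1 - p_k) = b(0) \prod_{i \in J} \frac{p_i}{1 - p_i},
\]

we can also write $w(J) = b(0) W(J)$ and

\[
b(x) = b(0) \sum_{J \,:\, \#J = x} W(J) \quad\text{with}\quad W(J) := \prod_{i \in J} q_i,
\]

where

\[
q_i := \frac{p_i}{1 - p_i} \in [0, \infty), \qquad p_i = \frac{q_i}{1 + q_i}.
\]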
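The representation $b(x) = b(0) \sum_{\#J = x} W(J)$ says that $b(x)/b(0)$ is the elementary symmetric polynomial of degree $x$ in the odds $q_i$. The sketch below, assuming finitely many parameters, compares this representation with a brute-force sum over subsets; the function names and example values are illustrative only.

```python
import math
from itertools import combinations

def weights_via_odds(p):
    """b(x) = b(0) * e_x(q), with e_x the elementary symmetric polynomial in q."""
    q = [pi / (1.0 - pi) for pi in p]
    b0 = math.prod(1.0 - pi for pi in p)
    e = [1.0] + [0.0] * len(q)          # e[x] = sum of W(J) over #J = x
    for qi in q:
        # Update from high to low degree so each q_i is used at most once.
        for x in range(len(q), 0, -1):
            e[x] += qi * e[x - 1]
    return [b0 * ex for ex in e]

def weights_brute_force(p):
    """b(x) as the sum of w(J) over all subsets J with #J = x."""
    n = len(p)
    b = []
    for x in range(n + 1):
        total = 0.0
        for J in combinations(range(n), x):
            in_J = set(J)
            total += math.prod(p[i] if i in in_J else 1.0 - p[i] for i in range(n))
        b.append(total)
    return b

p_example = [0.5, 0.3, 0.2, 0.1]        # hypothetical parameters
b_fast = weights_via_odds(p_example)
b_slow = weights_brute_force(p_example)
print(b_fast)
print(b_slow)
print("b(0) < pi(0):", b_fast[0] < math.exp(-sum(p_example)))
```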

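Ratios of consecutive binomial weights. There are various ways to represent the ratios $b(x + 1)/b(x)$. In the subsequent versions, the following notation will be useful: For any set $J \subset \mathbb{N}$, we define

\[
s(J) := \sum_{i \in J} p_i \quad\text{and}\quad S(J) := \sum_{i \in J} q_i.
\]

In case of $\#J < \infty$ we set

\[
\bar{s}(J) := s(J)/\#J, \qquad \bar{S}(J) := S(J)/\#J, \qquad
\bar{W}(J) := W(J) \Big/ \sum_{L \,:\, \#L = \#J} W(L) = w(J) \Big/ \sum_{L \,:\, \#L = \#J} w(L)
\]

with the convention $0/0 := 0$. Then for any integer $x \ge 0$ with $b(x) > 0$,

\begin{align*}
\frac{b(x + 1)}{b(0)}
&= \sum_{L \,:\, \#L = x+1} W(L)
 = \sum_{L \,:\, \#L = x+1} \frac{1}{x + 1} \sum_{k \in L} W(L \setminus \{k\}) q_k \\
&= \frac{1}{x + 1} \sum_{J \,:\, \#J = x} W(J) \sum_{k \in J^c} q_k \\
&= \frac{1}{x + 1} \sum_{J \,:\, \#J = x} W(J) S(J^c).
\end{align*}

Consequently,

\[
\frac{(x + 1) b(x + 1)}{b(x)} = \sum_{J \,:\, \#J = x} \bar{W}(J) S(J^c). \tag{6}
\]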

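Alternatively, if $b(x + 1) > 0$, then

\begin{align*}
\frac{b(x)}{b(0)}
&= \sum_{J \,:\, \#J = x} W(J)
 = \sum_{J \,:\, \#J = x} W(J) \sum_{k \in J^c} \frac{q_k}{S(J^c)} \\
&= \sum_{J \,:\, \#J = x} \sum_{k \in J^c} \frac{W(J \cup \{k\})}{q_k + S((J \cup \{k\})^c)} \\
&= \sum_{L \,:\, \#L = x+1} W(L) \sum_{k \in L} \frac{1}{q_k + S(L^c)}.
\end{align*}

Consequently,

\[
\frac{b(x)}{(x + 1) b(x + 1)} = \sum_{L \,:\, \#L = x+1} \bar{W}(L) \, \frac{1}{x + 1} \sum_{k \in L} \frac{1}{q_k + S(L^c)}. \tag{7}
\]

One can repeat the previous arguments with the sums $\sum_{k \in J^c} p_k / s(J^c) = 1$ in place of $\sum_{k \in J^c} q_k / S(J^c) = 1$. This leads to

\begin{align*}
\frac{b(x)}{b(0)}
&= \sum_{J \,:\, \#J = x} \sum_{k \in J^c} \frac{W(J) p_k}{p_k + s((J \cup \{k\})^c)} \\
&= \sum_{L \,:\, \#L = x+1} W(L) \sum_{k \in L} \frac{1 - p_k}{p_k + s(L^c)},
\end{align*}

because $W(J) p_k = W(J \cup \{k\})(1 - p_k)$ for $k \in J^c$. Consequently,

\[
\frac{b(x)}{(x + 1) b(x + 1)} = \sum_{L \,:\, \#L = x+1} \bar{W}(L) \, \frac{1}{x + 1} \sum_{k \in L} \frac{1 - p_k}{p_k + s(L^c)}. \tag{8}
\]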
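The identities (6), (7) and (8) can be checked by brute force for a short parameter list. The sketch below enumerates all subsets of a finite index set; the names `W`, `e`, `rhs6`, `rhs7`, `rhs8` and the example parameters are hypothetical, and `e(x)` stands for $b(x)/b(0)$.

```python
import math
from itertools import combinations

p_example = [0.5, 0.3, 0.2, 0.1]            # hypothetical parameters
n = len(p_example)
q = [pi / (1.0 - pi) for pi in p_example]   # odds q_i = p_i / (1 - p_i)
idx = range(n)

def W(J):
    return math.prod(q[i] for i in J)

def e(x):
    """Sum of W(J) over all J with #J = x, so that b(x) = b(0) * e(x)."""
    return sum(W(J) for J in combinations(idx, x))

def complement_sum(values, J):
    """Sum of values[k] over k outside J, i.e. s(J^c) or S(J^c)."""
    return sum(values[k] for k in idx if k not in J)

def rhs6(x):
    return sum(W(J) / e(x) * complement_sum(q, J) for J in combinations(idx, x))

def rhs7(x):
    total = 0.0
    for L in combinations(idx, x + 1):
        S_Lc = complement_sum(q, L)
        inner = sum(1.0 / (q[k] + S_Lc) for k in L)
        total += W(L) / e(x + 1) * inner / (x + 1)
    return total

def rhs8(x):
    total = 0.0
    for L in combinations(idx, x + 1):
        s_Lc = complement_sum(p_example, L)
        inner = sum((1.0 - p_example[k]) / (p_example[k] + s_Lc) for k in L)
        total += W(L) / e(x + 1) * inner / (x + 1)
    return total

for x in range(n):
    up = (x + 1) * e(x + 1) / e(x)          # (x + 1) b(x + 1) / b(x)
    print(x, up, rhs6(x), 1.0 / up, rhs7(x), rhs8(x))
```

In each printed row the second and third entries should agree (identity (6)), and the last three entries should agree (identities (7) and (8)).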

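Analyzing equation (8) will lead to a first result about the location of maximizers of $r(\cdot)$ plus a preliminary bound for $\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)$.

Proposition 1. Any maximizer $x \in \mathbb{N}_0$ of $r(x)$ satisfies the inequalities

\[
1 \le x \le \lceil \lambda \rceil.
\]

Moreover,

\[
\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) \le \Bigl( 1 + \Delta \, \frac{e^{p_*} - 1}{p_*} \Bigr)^{\lceil \lambda \rceil} \le e^{\lceil \lambda \rceil p_*}.
\]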

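Proof of Proposition 1. Since $r(0) < 1$, any maximizer $x_o$ of $r(\cdot)$ has to satisfy $x_o \ge 1$. To verify the inequality $x_o \le \lceil \lambda \rceil$, it suffices to show that for any $x \ge \lambda$ with $b(x) > 0$,

\[
\frac{r(x + 1)}{r(x)} \le 1.
\]

This is equivalent to

\[
\frac{b(x)}{(x + 1) b(x + 1)} \ge \lambda^{-1}. \tag{9}
\]

If $b(x + 1) = 0$, this inequality is trivial. Otherwise, according to (8), the left hand side of (9) equals

\[
\sum_{L \,:\, \#L = x+1} \bar{W}(L) \, \frac{1}{x + 1} \sum_{k \in L} \frac{1 - p_k}{p_k + s(L^c)}.
\]

Since $(1 - y)/(y + s(L^c))$ is a convex function of $y \ge 0$, Jensen's inequality implies that

\[
\frac{1}{x + 1} \sum_{k \in L} \frac{1 - p_k}{p_k + s(L^c)}
\ge \frac{1 - \bar{s}(L)}{\bar{s}(L) + s(L^c)}
= \frac{1 - \bar{s}(L)}{\bar{s}(L) + \lambda - s(L)}
= \frac{1 - \bar{s}(L)}{\lambda - x \bar{s}(L)}.
\]

But in case of $x \ge \lambda$,

\[
\frac{1 - \bar{s}(L)}{\lambda - x \bar{s}(L)} \ge \frac{1 - \bar{s}(L)}{\lambda - \lambda \bar{s}(L)} = \lambda^{-1},
\]

whence (9) holds true.

Now we only need an upper bound for $r(x)$ and apply it with $x \le \lceil \lambda \rceil$. First of all,

\begin{align*}
r(x)
&= \lambda^{-x} x! \, e^{\lambda} \sum_{J \,:\, \#J = x} \, \prod_{i \in J} p_i \prod_{k \in J^c} (1 - p_k) \\
&= \lambda^{-x} x! \sum_{J \,:\, \#J = x} \, \prod_{i \in J} p_i e^{p_i} \prod_{k \in J^c} \exp\bigl(p_k + \log(1 - p_k)\bigr) \\
&\le \lambda^{-x} x! \sum_{J \,:\, \#J = x} \, \prod_{i \in J} p_i e^{p_i} \\
&\le \lambda^{-x} \sum_{k(1), \ldots, k(x) \ge 1} \, \prod_{s=1}^{x} p_{k(s)} e^{p_{k(s)}}
 = \Bigl( \sum_{k \ge 1} \frac{p_k}{\lambda} e^{p_k} \Bigr)^{x}.
\end{align*}

Moreover,

\[
e^{p_k} \le 1 + (p_k / p_*)(e^{p_*} - 1) \le e^{p_*}
\]

by convexity and monotonicity of the exponential function, whence

\[
\sum_{k \ge 1} \frac{p_k}{\lambda} e^{p_k} \le 1 + \Delta \, \frac{e^{p_*} - 1}{p_*} \le e^{p_*}.
\]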
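Proposition 1 can be illustrated by a small simulation over random finite parameter lists. The sketch below is one possible check, with a fixed seed, hypothetical function names and a small floating-point tolerance; it is not a substitute for the proof.

```python
import math
import random

def bernoulli_sum_pmf(p):
    """Return [b(0), ..., b(n)] for a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        nxt = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            nxt[x] += bx * (1.0 - pi)
            nxt[x + 1] += bx * pi
        b = nxt
    return b

def check_proposition_1(p):
    """True if the maximizer location and both bounds hold numerically."""
    lam = sum(p)
    delta = sum(pi * pi for pi in p) / lam
    p_star = max(p)
    b = bernoulli_sum_pmf(p)
    r = [b[x] * math.exp(lam) * math.factorial(x) / lam ** x for x in range(len(b))]
    x_max = max(range(len(r)), key=lambda x: r[x])
    k = math.ceil(lam)
    bound_1 = (1.0 + delta * (math.exp(p_star) - 1.0) / p_star) ** k
    bound_2 = math.exp(k * p_star)
    tol = 1.0 + 1e-9                     # slack for rounding errors
    return (1 <= x_max <= k) and r[x_max] <= bound_1 * tol and bound_1 <= bound_2 * tol

random.seed(0)
trials = [
    [random.uniform(0.01, 0.95) for _ in range(random.randint(1, 8))]
    for _ in range(200)
]
all_ok = all(check_proposition_1(p) for p in trials)
print("Proposition 1 holds in all trials:", all_ok)
```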

3 Bounds in terms of p∗
3.1 A general strategy to verify upper bounds

In what follows, the dependency of objects such as $Q, b, r, w(J), \ldots$ on the sequence $p$ is indicated by a subscript $p$ if necessary, leading to $Q_p, b_p, r_p, w_p(J), \ldots$, and we write $\pi = \pi_\lambda$. Let $A = A(p) \in [0, 1)$ stand for a positively homogeneous functional of $p$, i.e.

\[
A(tp) = t A(p) \quad\text{for } t \in (0, 1].
\]

Two examples for such a functional are $A = \Delta$ and $A = p_*$.

Suppose we want to prove that

\[
\log \rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) \le g(A)
\]

for a given differentiable function $g : [0, 1) \to [0, \infty)$ with $g(0) = 0$ and $g'(0) \ge 1$. An explicit example is given by $g(s) := -\log(1 - s)$. To verify this conjecture, we analyze the function $f : (0, 1] \to \mathbb{R}$ given by

\[
f(t) := \log \rho\bigl(Q_{tp}, \mathrm{Poiss}(t\lambda)\bigr) - g(tA),
\]

so the assertion is equivalent to $f(1) \le 0$. Hence, it suffices to show that $f(0+) = 0$ and that $f$ is nonincreasing.
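To get a feeling for the function $f$, one can evaluate it on a grid for the two examples mentioned above, $A = p_*$ and $g(s) = -\log(1 - s)$. The sketch below assumes a finite parameter list; the function names and the example values are illustrative, and a grid evaluation only suggests monotonicity, it does not prove it.

```python
import math

def bernoulli_sum_pmf(p):
    """Return [b(0), ..., b(n)] for a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        nxt = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            nxt[x] += bx * (1.0 - pi)
            nxt[x + 1] += bx * pi
        b = nxt
    return b

def log_rho(p):
    """log of the maximal density ratio b(x) / pi(x) over x = 0, ..., n."""
    lam = sum(p)
    b = bernoulli_sum_pmf(p)
    return max(
        math.log(b[x]) + lam - x * math.log(lam) + math.lgamma(x + 1)
        for x in range(len(b)) if b[x] > 0.0
    )

def f(t, p):
    """f(t) = log rho(Q_tp, Poiss(t*lambda)) - g(t*A), A = p_star, g(s) = -log(1 - s)."""
    p_star = max(p)
    return log_rho([t * pi for pi in p]) + math.log(1.0 - t * p_star)

p_example = [0.8, 0.6, 0.5, 0.3, 0.1]   # hypothetical parameters
grid = [k / 100 for k in range(1, 101)]
values = [f(t, p_example) for t in grid]
print("f(0.01) =", values[0])
print("f(1)    =", values[-1])
```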
Note that replacing $p$ with $tp$ amounts to replacing $\lambda$ and $\Delta$ with $t\lambda$ and $t\Delta$, respectively. By Proposition 1, we know that

\[
\rho\bigl(Q_{tp}, \mathrm{Poiss}(t\lambda)\bigr) = \max_{1 \le x \le \lceil t\lambda \rceil} r_{tp}(x)
\]

and

\[
f(t) \le \lceil t\lambda \rceil \, t p_* - g(tA).
\]

This implies already that $f(0+) = 0$. If we can show that for any fixed $x \in \{1, \ldots, \lceil \lambda \rceil\}$, the log-density ratio $L_x(t) := \log r_{tp}(x)$ is a continuously differentiable function of $t \in (0, 1]$, then $f$ is continuous on $(0, 1]$ with limit $f(0+) = 0$, and for $t < 1$,

\[
f'(t+) = \max_{x \in N(t)} L_x'(t) - \tilde{g}(tA)/t, \tag{10}
\]

where

\[
N(t) := \mathop{\arg\max}_{1 \le x \le \lceil t\lambda \rceil} r_{tp}(x) \quad\text{and}\quad \tilde{g}(s) := s g'(s).
\]

Then a sufficient condition for $f(1) \le 0$ is that $f'(t+) \le 0$ for all $t \in (0, 1)$, and this can be rewritten as follows: For $t \in (0, 1)$ and $1 \le x \le \lceil t\lambda \rceil$,

\[
L_x'(t) \le \tilde{g}(tA)/t \quad\text{if } x \in N(t).
\]

In view of (5), a sufficient condition for that is

\[
L_x'(t) \le \tilde{g}(tA)/t \quad\text{if}\quad \frac{x b_{tp}(x)}{b_{tp}(x - 1)} \ge t\lambda. \tag{11}
\]

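Now it is high time to analyze the functions $L_x(\cdot)$ for $1 \le x \le \lceil \lambda \rceil$. The inequality $x \le \lceil \lambda \rceil$ implies that $b(x) > 0$, because otherwise, $\lambda$ would be a sum of at most $x - 1$ weights $p_i \in [0, 1)$, and this would lead to the contradiction $\lceil \lambda \rceil \le x - 1$. For a set $J \subset \mathbb{N}$ with $\#J = x$,

\begin{align*}
\frac{\partial}{\partial t} w_{tp}(J)
&= \frac{\partial}{\partial t} \Bigl( t^x \prod_{i \in J} p_i \prod_{k \in J^c} (1 - t p_k) \Bigr) \\
&= x t^{x-1} \prod_{i \in J} p_i \prod_{k \in J^c} (1 - t p_k)
 - \sum_{\ell \in J^c} p_\ell \, t^x \prod_{i \in J} p_i \prod_{k \in J^c \setminus \{\ell\}} (1 - t p_k) \\
&= x t^{x-1} \prod_{i \in J} p_i \prod_{k \in J^c} (1 - t p_k)
 - t^x \sum_{\ell \in J^c} \, \prod_{i \in J \cup \{\ell\}} p_i \prod_{k \in (J \cup \{\ell\})^c} (1 - t p_k) \\
&= \frac{x}{t} \, w_{tp}(J) - \frac{1}{t} \sum_{\ell \in J^c} w_{tp}(J \cup \{\ell\}).
\end{align*}

Consequently,

\begin{align*}
\frac{\partial}{\partial t} b_{tp}(x)
&= \sum_{J \,:\, \#J = x} \frac{\partial}{\partial t} w_{tp}(J) \\
&= \frac{x}{t} \sum_{J \,:\, \#J = x} w_{tp}(J) - \frac{1}{t} \sum_{J \,:\, \#J = x} \sum_{\ell \in J^c} w_{tp}(J \cup \{\ell\}) \\
&= \frac{x}{t} \sum_{J \,:\, \#J = x} w_{tp}(J) - \frac{1}{t} \sum_{L \,:\, \#L = x+1} \sum_{\ell \in L} w_{tp}(L) \\
&= \frac{x}{t} \sum_{J \,:\, \#J = x} w_{tp}(J) - \frac{x + 1}{t} \sum_{L \,:\, \#L = x+1} w_{tp}(L) \\
&= \frac{x}{t} \, b_{tp}(x) - \frac{x + 1}{t} \, b_{tp}(x + 1).
\end{align*}

This gives us the identity

\[
\frac{\partial}{\partial t} \log b_{tp}(x) = \frac{x}{t} - \frac{x + 1}{t} \, \frac{b_{tp}(x + 1)}{b_{tp}(x)}.
\]

An elementary calculation yields

\[
\frac{\partial}{\partial t} \log \pi_{t\lambda}(x) = \frac{x}{t} - \lambda,
\]

whence

\[
L_x'(t) = \frac{\partial}{\partial t} \log r_{tp}(x) = \lambda - \frac{x + 1}{t} \, \frac{b_{tp}(x + 1)}{b_{tp}(x)}.
\]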
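The closed form for $L_x'(t)$ can be compared with a central finite difference of $L_x(t) = \log r_{tp}(x)$. The sketch below does this for one hypothetical parameter list; the function names, the evaluation point and the step size are illustrative assumptions.

```python
import math

def bernoulli_sum_pmf(p):
    """Return [b(0), ..., b(n)] for a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        nxt = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            nxt[x] += bx * (1.0 - pi)
            nxt[x + 1] += bx * pi
        b = nxt
    return b

def log_ratio(p, x, t):
    """L_x(t) = log b_tp(x) - log pi_{t*lambda}(x)."""
    tp = [t * pi for pi in p]
    lam_t = sum(tp)
    b = bernoulli_sum_pmf(tp)
    log_pi = -lam_t + x * math.log(lam_t) - math.lgamma(x + 1)
    return math.log(b[x]) - log_pi

def log_ratio_derivative(p, x, t):
    """Closed form: lambda - ((x + 1) / t) * b_tp(x + 1) / b_tp(x)."""
    lam = sum(p)
    b = bernoulli_sum_pmf([t * pi for pi in p])
    return lam - (x + 1) / t * b[x + 1] / b[x]

p_example = [0.6, 0.5, 0.3, 0.2]   # hypothetical parameters
x, t, h = 2, 0.7, 1e-6
finite_difference = (log_ratio(p_example, x, t + h) - log_ratio(p_example, x, t - h)) / (2 * h)
closed_form = log_ratio_derivative(p_example, x, t)
print(finite_difference, closed_form)
```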
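Consequently, (11) may be rewritten as follows: For each $t \in (0, 1)$ and $1 \le x \le \lceil t\lambda \rceil$,

\[
\frac{(x + 1) b_{tp}(x + 1)}{b_{tp}(x)} \ge t\lambda - \tilde{g}(tA) \quad\text{if}\quad \frac{x b_{tp}(x)}{b_{tp}(x - 1)} \ge t\lambda.
\]

Since we could replace $p$ with $tp$, it even suffices to show that for $1 \le x \le \lceil \lambda \rceil$,

\[
\frac{(x + 1) b(x + 1)}{b(x)} \ge \lambda - \tilde{g}(A) \quad\text{if}\quad \frac{x b(x)}{b(x - 1)} \ge \lambda. \tag{12}
\]

Note that $b(1)/b(0) = \sum_{i \ge 1} q_i > \sum_{i \ge 1} p_i = \lambda$, so (12) implies that

\[
\frac{2 b(2)}{b(1)} \ge \lambda - \tilde{g}(A).
\]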

3.2 The main result

In case of $A = p_*$ and $g(s) = -\log(1 - s)$, the strategy just outlined works nicely, leading to our first main result. Note that $\tilde{g}(s) = s/(1 - s)$.

Theorem 1. For any sequence $p$ of probabilities $p_i \in [0, 1)$ with $\lambda = \sum_{i \ge 1} p_i < \infty$,

\[
\rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) \le (1 - p_*)^{-1}.
\]

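Proof of Theorem 1. For $1 \le x \le \lceil \lambda \rceil$, the representation (7) with $x - 1$ in place of $x$ reads

\[
\frac{b(x - 1)}{x b(x)} = \sum_{J \,:\, \#J = x} \bar{W}(J) \, \frac{1}{x} \sum_{i \in J} \frac{1}{q_i + S(J^c)}.
\]

By Jensen's inequality,

\[
\frac{1}{x} \sum_{i \in J} \frac{1}{q_i + S(J^c)}
\ge \Bigl( \frac{1}{x} \sum_{i \in J} \bigl(q_i + S(J^c)\bigr) \Bigr)^{-1}
= \bigl( \bar{S}(J) + S(J^c) \bigr)^{-1},
\]

so

\[
\frac{b(x - 1)}{x b(x)} \ge \sum_{J \,:\, \#J = x} \bar{W}(J) \bigl( \bar{S}(J) + S(J^c) \bigr)^{-1}.
\]

A second application of Jensen's inequality yields that

\[
\frac{b(x - 1)}{x b(x)} \ge \Bigl( \sum_{J \,:\, \#J = x} \bar{W}(J) \bigl( \bar{S}(J) + S(J^c) \bigr) \Bigr)^{-1}.
\]

Consequently, if $x b(x)/b(x - 1) \ge \lambda$, then

\[
\sum_{J \,:\, \#J = x} \bar{W}(J) \bigl( \bar{S}(J) + S(J^c) \bigr) \ge \lambda.
\]

On the other hand, (6) yields

\begin{align*}
\frac{(x + 1) b(x + 1)}{b(x)}
&= \sum_{J \,:\, \#J = x} \bar{W}(J) S(J^c) \\
&= \sum_{J \,:\, \#J = x} \bar{W}(J) \bigl( \bar{S}(J) + S(J^c) \bigr) - \sum_{J \,:\, \#J = x} \bar{W}(J) \bar{S}(J) \\
&\ge \lambda - \sum_{J \,:\, \#J = x} \bar{W}(J) \bar{S}(J) \\
&\ge \lambda - \frac{p_*}{1 - p_*} = \lambda - \tilde{g}(p_*),
\end{align*}

because for any set $J$ with $x$ elements,

\[
\bar{S}(J) = \frac{1}{x} \sum_{i \in J} q_i \le \frac{p_*}{1 - p_*}.
\]

Consequently, (12) is satisfied with $A = p_*$, and this yields the assertion.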
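Both inequality (12) with $A = p_*$ and the statement of Theorem 1 can be spot-checked numerically. The following sketch runs such a check on random finite parameter lists; the function names, the seed and the tolerance are hypothetical choices.

```python
import math
import random

def bernoulli_sum_pmf(p):
    """Return [b(0), ..., b(n)] for a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        nxt = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            nxt[x] += bx * (1.0 - pi)
            nxt[x + 1] += bx * pi
        b = nxt
    return b

def check_inequality_12(p):
    """Check (12) with A = p_star, where g~(s) = s / (1 - s)."""
    lam = sum(p)
    p_star = max(p)
    lower = lam - p_star / (1.0 - p_star)
    b = bernoulli_sum_pmf(p) + [0.0]       # pad so that b[n + 1] = 0
    for x in range(1, math.ceil(lam) + 1):
        if x * b[x] / b[x - 1] >= lam:
            if (x + 1) * b[x + 1] / b[x] < lower - 1e-9:
                return False
    return True

def check_theorem_1(p):
    """Check rho <= 1 / (1 - p_star) up to rounding errors."""
    lam = sum(p)
    b = bernoulli_sum_pmf(p)
    rho = max(b[x] * math.exp(lam) * math.factorial(x) / lam ** x for x in range(len(b)))
    return rho <= (1.0 + 1e-9) / (1.0 - max(p))

random.seed(1)
trials = [
    [random.uniform(0.01, 0.95) for _ in range(random.randint(1, 8))]
    for _ in range(200)
]
ok_12 = all(check_inequality_12(p) for p in trials)
ok_thm = all(check_theorem_1(p) for p in trials)
print("(12) holds in all trials:", ok_12)
print("Theorem 1 holds in all trials:", ok_thm)
```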

4 Bounds in terms of ∆

At the moment we do not know whether our general strategy works for $A = \Delta$. Instead we derive some bounds via direct arguments. We start with an elementary result about the log-density ratio $L_1(t) = \log r_{tp}(1)$.

Proposition 2. The function $L_1 : [0, 1] \to \mathbb{R}$ is twice differentiable with $L_1(0) = 0$, $L_1'(0) = \Delta$ and $L_1'' \le 0$ with equality if and only if $\#\{i \ge 1 : p_i > 0\} = 1$.

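Proof of Proposition 2. Note first that for $t \in (0, 1]$,

\begin{align*}
L_1(t)
&= t\lambda + \log\Bigl( (t\lambda)^{-1} \sum_{i \ge 1} (t p_i) \prod_{k \ne i} (1 - t p_k) \Bigr) \\
&= t\lambda + \log\Bigl( \lambda^{-1} \sum_{i \ge 1} p_i \prod_{k \ne i} (1 - t p_k) \Bigr) \\
&= \sum_{i \ge 1} \bigl( t p_i + \log(1 - t p_i) \bigr) + \log\Bigl( \lambda^{-1} \sum_{i \ge 1} \frac{p_i}{1 - t p_i} \Bigr).
\end{align*}

The right hand side is a smooth function of $t \in [0, 1]$ with $L_1(0) = 0$. Moreover,

\begin{align*}
L_1'(t)
&= \sum_{i \ge 1} \Bigl( p_i - \frac{p_i}{1 - t p_i} \Bigr) + \sum_{i \ge 1} \frac{p_i^2}{(1 - t p_i)^2} \Big/ \sum_{i \ge 1} \frac{p_i}{1 - t p_i} \\
&= -t \sum_{i \ge 1} \frac{p_i^2}{1 - t p_i} + \sum_{i \ge 1} \frac{p_i^2}{(1 - t p_i)^2} \Big/ \sum_{i \ge 1} \frac{p_i}{1 - t p_i},
\end{align*}

so

\[
L_1'(0) = \sum_{i \ge 1} p_i^2 \Big/ \sum_{i \ge 1} p_i = \Delta.
\]

Finally, with $a_i(t) := p_i/(1 - t p_i)$ and $S(t) := \sum_{i \ge 1} a_i(t)$,

\begin{align*}
L_1''(t)
&= - \sum_{i \ge 1} \frac{p_i^2}{(1 - t p_i)^2} + 2 \sum_{i \ge 1} \frac{p_i^3}{(1 - t p_i)^3} \Big/ \sum_{i \ge 1} \frac{p_i}{1 - t p_i}
 - \Bigl( \sum_{i \ge 1} \frac{p_i^2}{(1 - t p_i)^2} \Big/ \sum_{i \ge 1} \frac{p_i}{1 - t p_i} \Bigr)^2 \\
&= - \sum_{i \ge 1} a_i(t)^2 + 2 \sum_{i \ge 1} a_i(t)^3 / S(t) - \sum_{i, j \ge 1} a_i(t)^2 a_j(t)^2 / S(t)^2 \\
&\le - \sum_{i \ge 1} \bigl( a_i(t)^2 - 2 a_i(t)^3 / S(t) + a_i(t)^4 / S(t)^2 \bigr) \\
&= - \sum_{i \ge 1} a_i(t)^2 \bigl( 1 - a_i(t)/S(t) \bigr)^2 \\
&\le 0.
\end{align*}

The second last inequality is strict, unless $\#\{i \ge 1 : p_i > 0\} = 1$, and in that case both preceding inequalities are equalities.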
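The three claims of Proposition 2 lend themselves to a quick numerical check. The sketch below evaluates $L_1$ through its last representation in the proof, approximates $L_1'(0)$ by a forward difference, and evaluates the closed form of $L_1''$ on a grid. It assumes a finite parameter list; function names, example values and step sizes are illustrative.

```python
import math

def L1(t, p):
    """L_1(t) = sum(t p_i + log(1 - t p_i)) + log(lambda^{-1} sum p_i / (1 - t p_i))."""
    lam = sum(p)
    return (
        sum(t * pi + math.log1p(-t * pi) for pi in p)
        + math.log(sum(pi / (1.0 - t * pi) for pi in p) / lam)
    )

def L1_second_derivative(t, p):
    """Closed form in terms of a_i(t) = p_i / (1 - t p_i) and S(t) = sum a_i(t)."""
    a = [pi / (1.0 - t * pi) for pi in p]
    S = sum(a)
    a2 = sum(ai ** 2 for ai in a)
    a3 = sum(ai ** 3 for ai in a)
    return -a2 + 2.0 * a3 / S - (a2 / S) ** 2

p_example = [0.4, 0.3, 0.2]          # hypothetical parameters
lam = sum(p_example)
delta = sum(pi * pi for pi in p_example) / lam

h = 1e-6
slope_at_0 = (L1(h, p_example) - L1(0.0, p_example)) / h
curvatures = [L1_second_derivative(k / 10, p_example) for k in range(11)]

print("L1(0)            =", L1(0.0, p_example))
print("L1'(0) vs Delta  =", slope_at_0, delta)
print("max L1'' on grid =", max(curvatures))
```

With more than one nonzero parameter, as in this example, all grid values of $L_1''$ should be strictly negative.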

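Propositions 1 and 2 are the main ingredients for the following upper bound for $\log \rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)$.

Theorem 2. For any sequence $p$ of probabilities $p_i \in [0, 1)$ with $\lambda = \sum_{i \ge 1} p_i \le 1$,

\[
\Delta \ge \log \rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) \ge \Delta \Bigl( 1 - \frac{\Delta}{2} - \frac{\lambda}{2(1 - p_*)} \Bigr).
\]

Since $\Delta \le p_* \le \lambda$, this theorem shows that

\[
\frac{\log \rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)}{\Delta} \to 1 \quad\text{as } \lambda \to 0.
\]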

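Proof of Theorem 2. We know from Proposition 1 that in case of $\lambda \le 1$,

\[
\log \rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr) = \log r(1) = L_1(1).
\]

But Proposition 2 implies that for some $\xi \in (0, 1)$,

\[
L_1(1) = L_1(0) + L_1'(0) + 2^{-1} L_1''(\xi) = 0 + \Delta + 2^{-1} L_1''(\xi) \le \Delta.
\]

As to the lower bound, recall that

\[
L_1(1) = \sum_{i \ge 1} \bigl( p_i + \log(1 - p_i) \bigr) + \log\Bigl( \lambda^{-1} \sum_{i \ge 1} \frac{p_i}{1 - p_i} \Bigr).
\]

On the one hand,

\[
p_i + \log(1 - p_i) = - \sum_{k \ge 2} \frac{p_i^k}{k} \ge - \frac{p_i^2}{2} \sum_{\ell \ge 0} p_*^{\ell} = - \frac{p_i^2}{2(1 - p_*)},
\]

whence

\[
\sum_{i \ge 1} \bigl( p_i + \log(1 - p_i) \bigr) \ge - \frac{1}{2(1 - p_*)} \sum_{i \ge 1} p_i^2 = - \frac{\lambda}{2(1 - p_*)} \, \Delta.
\]

On the other hand,

\[
\log\Bigl( \lambda^{-1} \sum_{i \ge 1} \frac{p_i}{1 - p_i} \Bigr) \ge \log\Bigl( \lambda^{-1} \sum_{i \ge 1} (p_i + p_i^2) \Bigr) = \log(1 + \Delta) \ge \Delta - \Delta^2/2,
\]

and this implies the asserted lower bound for $L_1(1)$.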
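The two-sided bound of Theorem 2 can be illustrated on random parameter lists with $\lambda \le 1$. The sketch below rescales random weights to a target value of $\lambda$; the function names, the seed, the sampling scheme and the tolerance are hypothetical choices made for illustration.

```python
import math
import random

def bernoulli_sum_pmf(p):
    """Return [b(0), ..., b(n)] for a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        nxt = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            nxt[x] += bx * (1.0 - pi)
            nxt[x + 1] += bx * pi
        b = nxt
    return b

def theorem_2_quantities(p):
    """Return (lower bound, log rho, Delta) for a parameter list with lambda <= 1."""
    lam = sum(p)
    delta = sum(pi * pi for pi in p) / lam
    p_star = max(p)
    b = bernoulli_sum_pmf(p)
    rho = max(b[x] * math.exp(lam) * math.factorial(x) / lam ** x for x in range(len(b)))
    lower = delta * (1.0 - delta / 2.0 - lam / (2.0 * (1.0 - p_star)))
    return lower, math.log(rho), delta

def random_parameters(rng):
    """Random list p with 1 to 6 entries and lambda in [0.05, 0.99]."""
    raw = [rng.uniform(0.05, 1.0) for _ in range(rng.randint(1, 6))]
    target = rng.uniform(0.05, 0.99)
    return [target * u / sum(raw) for u in raw]

rng = random.Random(2)
results = [theorem_2_quantities(random_parameters(rng)) for _ in range(200)]
tol = 1e-9
all_ok = all(lower - tol <= log_rho <= delta + tol for lower, log_rho, delta in results)
print("Theorem 2 bounds hold in all trials:", all_ok)
print("example (lower, log rho, Delta):", theorem_2_quantities([0.3, 0.2, 0.1]))
```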

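Remark 3 (Total variation distance). Since $b(0) \le \pi(0)$, Theorem 2 implies that in case of $\lambda \le 1$,

\begin{align*}
d_{\mathrm{TV}}\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)
&= \sup_{A \subset \mathbb{N}} \bigl( Q(A) - \mathrm{Poiss}(\lambda)(A) \bigr) \\
&\le \sup_{A \subset \mathbb{N}} Q(A) \bigl( 1 - \rho\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)^{-1} \bigr) \\
&\le (1 - b(0))(1 - e^{-\Delta}) \\
&\le \lambda (1 - e^{-\Delta}) \le \lambda \Delta = \sum_{i \ge 1} p_i^2.
\end{align*}

Here we used the elementary inequalities $1 - b(0) = 1 - \prod_{i \ge 1} (1 - p_i) \le \sum_{i \ge 1} p_i = \lambda$ and $1 - e^{-\Delta} \le \Delta$. Consequently, Theorem 2 implies a reasonable upper bound for $d_{\mathrm{TV}}\bigl(Q, \mathrm{Poiss}(\lambda)\bigr)$.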
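The chain of inequalities in Remark 3 can be evaluated directly for a finite parameter list with $\lambda \le 1$. In the sketch below the total variation distance is computed as the sum of the positive parts of $b(x) - \pi(x)$; the function names and example parameters are illustrative assumptions.

```python
import math

def bernoulli_sum_pmf(p):
    """Return [b(0), ..., b(n)] for a sum of independent Bernoulli(p_i)."""
    b = [1.0]
    for pi in p:
        nxt = [0.0] * (len(b) + 1)
        for x, bx in enumerate(b):
            nxt[x] += bx * (1.0 - pi)
            nxt[x + 1] += bx * pi
        b = nxt
    return b

def tv_distance(p):
    """sup_A (Q(A) - Poiss(A)), attained at A = {x : b(x) > pi(x)}."""
    lam = sum(p)
    b = bernoulli_sum_pmf(p)
    # b(x) = 0 for x > n, so only x = 0, ..., n can contribute positive parts.
    return sum(
        max(b[x] - math.exp(-lam) * lam ** x / math.factorial(x), 0.0)
        for x in range(len(b))
    )

p_example = [0.3, 0.2, 0.1]          # hypothetical parameters, lambda = 0.6 <= 1
lam = sum(p_example)
sum_sq = sum(pi * pi for pi in p_example)
delta = sum_sq / lam
b0 = bernoulli_sum_pmf(p_example)[0]

chain = [
    tv_distance(p_example),                   # d_TV(Q, Poiss(lambda))
    (1.0 - b0) * (1.0 - math.exp(-delta)),    # (1 - b(0)) (1 - e^{-Delta})
    lam * (1.0 - math.exp(-delta)),           # lambda (1 - e^{-Delta})
    sum_sq,                                   # lambda * Delta = sum of p_i^2
]
print(chain)
```

The printed list should be nondecreasing, mirroring the displayed chain of inequalities.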

Acknowledgement. Part of this research was conducted at Mathematisches Forschungsinstitut Oberwolfach (MFO), Germany, in June and July 2019. We are grateful to the MFO for its generous hospitality and support.


