STA732

Statistical Inference
Lecture 03: Sufficient Statistics

Yuansi Chen
Spring 2023
Duke University

https://www2.stat.duke.edu/courses/Spring23/sta732.01/

1
Recap from Lecture 02

Introduced exponential families: many good properties

• Natural parameter space is convex
• Easy joint density of i.i.d. random variables
• Easy sufficient statistics
• Easy moments

2
Goal of Lecture 03

1. Define sufficiency
2. Factorization theorem
3. Minimal sufficiency

Chap. 3 in Keener or Chap. 1.6 in Lehmann and Casella

3
Sufficiency
Motivation for sufficiency

Coin flipping experiment: suppose 𝑋1 , … , 𝑋𝑛 are i.i.d. Bernoulli(𝜃).
Then the joint density is

𝑝_𝜃(𝑋_1 = 𝑥_1 , … , 𝑋_𝑛 = 𝑥_𝑛) = ∏_{𝑖=1}^{𝑛} 𝜃^{𝑥_𝑖} (1 − 𝜃)^{1−𝑥_𝑖}
                              = 𝜃^{∑_{𝑖=1}^{𝑛} 𝑥_𝑖} (1 − 𝜃)^{𝑛 − ∑_{𝑖=1}^{𝑛} 𝑥_𝑖}

Let 𝑇(𝑋) = ∑_{𝑖=1}^{𝑛} 𝑋_𝑖. Then 𝑇(𝑋) follows the Binomial distribution
Binom(𝑛, 𝜃), with 𝑝_𝜃(𝑇(𝑋) = 𝑡) = (𝑛 choose 𝑡) 𝜃^𝑡 (1 − 𝜃)^{𝑛−𝑡}.
It seems that to estimate 𝜃 it is sufficient to know 𝑇(𝑋), yet 𝑇(𝑋)
throws away data. How do we justify this?

4
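To see the "only through 𝑇(𝑋)" point numerically, here is a small sketch (not part of the slides; it assumes Python with numpy, and the two datasets are my own choices): two different binary samples with the same sum give exactly the same likelihood curve in 𝜃.

```python
import numpy as np

def bernoulli_likelihood(x, thetas):
    """Joint Bernoulli likelihood prod_i theta^{x_i} (1-theta)^{1-x_i} on a grid."""
    x = np.asarray(x)
    s, n = x.sum(), x.size
    return thetas ** s * (1 - thetas) ** (n - s)

thetas = np.linspace(0.01, 0.99, 99)
x1 = [1, 0, 0, 1, 0, 1]   # sum = 3
x2 = [0, 1, 1, 0, 1, 0]   # a different sequence, same sum = 3

# Identical likelihood curves: the data enter only through T(x) = sum(x)
print(np.allclose(bernoulli_likelihood(x1, thetas),
                  bernoulli_likelihood(x2, thetas)))   # True
```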
Sufficient statistics

Suppose 𝑋 has a distribution from the family P = {𝑃_𝜃 , 𝜃 ∈ Ω}. 𝑇(𝑋)
is a sufficient statistic if, for every 𝑡 and 𝜃, the conditional
distribution of 𝑋 under 𝑃_𝜃 given 𝑇 = 𝑡 does not depend on 𝜃.

5
Back to the coin flipping example

𝑝_𝜃(𝑋 = 𝑥 ∣ 𝑇 = 𝑡) = 𝑝_𝜃(𝑋 = 𝑥, 𝑇 = 𝑡) / 𝑝_𝜃(𝑇 = 𝑡)
                   = 𝜃^{∑ 𝑥_𝑖} (1 − 𝜃)^{𝑛 − ∑ 𝑥_𝑖} 1{∑ 𝑥_𝑖 = 𝑡} / [ (𝑛 choose 𝑡) 𝜃^𝑡 (1 − 𝜃)^{𝑛−𝑡} ]
                   = 1{∑ 𝑥_𝑖 = 𝑡} / (𝑛 choose 𝑡)

Conditioned on 𝑇(𝑋) = 𝑡, 𝑋 only takes values on sequences such that
∑ 𝑋_𝑖 = 𝑡, and it is uniformly distributed over them. The conditional
distribution does not depend on 𝜃!

6
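To make the calculation concrete, here is a small simulation sketch (not from the slides; it assumes Python with numpy, and the sample size and values of 𝜃 are my own choices). It estimates the conditional distribution of 𝑋 given 𝑇 = 𝑡 for two different 𝜃 and shows both match the uniform distribution over sequences with ∑ 𝑥_𝑖 = 𝑡.

```python
import numpy as np
from collections import Counter

def conditional_dist(theta, n=4, t=2, reps=200_000, seed=0):
    """Empirical distribution of X given sum(X) = t, under i.i.d. Bernoulli(theta)."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, theta, size=(reps, n))
    keep = x[x.sum(axis=1) == t]                      # condition on T(X) = t
    counts = Counter(map(tuple, keep.tolist()))
    total = sum(counts.values())
    return {seq: c / total for seq, c in counts.items()}

for theta in (0.3, 0.7):
    dist = conditional_dist(theta)
    print(theta, {seq: round(p, 3) for seq, p in sorted(dist.items())})
# Both runs put probability ~ 1/6 on each of the 6 sequences with two ones,
# matching the uniform conditional distribution derived above.
```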
Interpretation of sufficiency
Interpretation 1: sufficiency from generative model perspective

• We care about 𝑋 because
  • 𝑋 is generated according to 𝑃_𝜃
  • 𝑋 can be used to infer properties of 𝜃
• Sufficiency says that 𝑇(𝑋) is informative enough for estimating 𝜃
• We can think of the data as being generated in two stages:
  1. Generate 𝑇: its distribution depends on 𝜃
  2. Generate 𝑋 ∣ 𝑇: its distribution does not depend on 𝜃

7
A graphical model for the data generation

𝜃 ⟶ 𝑇(𝑋) ⟶ 𝑋

We lose nothing (in terms of estimating 𝜃) by considering 𝑇(𝑋) alone.

8
Fake data generated from 𝑇 is good enough

𝜃 ⟶ 𝑇(𝑋) ⟶ 𝑋 (observed data)
      𝑇(𝑋) ⟶ 𝑋̃ (fake data, drawn from the same conditional distribution)

For the purpose of inferring properties of 𝜃, the fake data 𝑋̃ is as
good as 𝑋.

9
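Here is a small simulation sketch of this picture (not from the slides; Python with numpy, and the parameter values are my own). It generates 𝑋 from the Bernoulli model, keeps only 𝑇(𝑋) = ∑ 𝑋_𝑖, regenerates fake data 𝑋̃ by placing 𝑇 ones uniformly at random, and checks that 𝑋 and 𝑋̃ look the same.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta, reps = 5, 0.4, 100_000

# "Real" data: X ~ Bernoulli(theta)^n, summarized by T = sum(X)
X = rng.binomial(1, theta, size=(reps, n))
T = X.sum(axis=1)

# "Fake" data: given T only, place T ones in uniformly random positions
X_tilde = np.zeros_like(X)
for r, t in enumerate(T):
    X_tilde[r, rng.choice(n, size=t, replace=False)] = 1

# X and X_tilde have the same joint distribution, so summaries agree:
print(X.mean(axis=0), X_tilde.mean(axis=0))                    # both ~ theta
print(np.corrcoef(X.T)[0, 1], np.corrcoef(X_tilde.T)[0, 1])    # both ~ 0
```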
Interpretation 2: estimator using 𝑇 (𝑋) alone cannot be worse

Theorem 3.3 in Keener

Suppose 𝑇(𝑋) is sufficient. Then for any estimator 𝛿(𝑋) of 𝑔(𝜃)
there exists a randomized estimator based on 𝑇 that has the same
risk function as 𝛿(𝑋).

Proof sketch: generate 𝑋̃ from 𝑇 (a random step); then 𝛿(𝑋̃) has the
same distribution as 𝛿(𝑋) under every 𝜃, so it should be as good as
𝛿(𝑋).

10
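A hedged simulation sketch of the theorem (not from the slides; Python with numpy, squared-error loss, and the deliberately crude estimator 𝛿(𝑋) = 𝑋_1 are my own choices): the randomized estimator that regenerates the first coordinate from 𝑇 alone has the same risk at every 𝜃.

```python
import numpy as np

def risk_curves(n=5, reps=200_000, seed=2):
    """Monte Carlo risk of delta(X) = X_1 vs. the randomized estimator based on T."""
    rng = np.random.default_rng(seed)
    for theta in np.linspace(0.1, 0.9, 5):
        X = rng.binomial(1, theta, size=(reps, n))
        T = X.sum(axis=1)
        # Randomized estimator: first coordinate of a fake X~ drawn given T,
        # which is Bernoulli with success probability T / n
        delta_T = rng.random(reps) < T / n
        risk_X = np.mean((X[:, 0] - theta) ** 2)
        risk_T = np.mean((delta_T - theta) ** 2)
        print(f"theta={theta:.1f}  risk of delta(X)={risk_X:.4f}  risk based on T={risk_T:.4f}")

risk_curves()
# The two Monte Carlo risks agree (up to simulation noise) at every theta,
# as Theorem 3.3 predicts.
```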
Factorization theorem
Motivation for factorization theorem

• The sufficiency definition is hard to work with in practice
• There is a convenient way to verify sufficiency by factorizing
  the density

11
Factorization theorem

Theorem 3.6 in Keener

Let P = {𝑃_𝜃 , 𝜃 ∈ Ω} be a family of distributions dominated by 𝜇
(𝑃_𝜃 ≪ 𝜇 for all 𝜃), with densities 𝑝_𝜃 = d𝑃_𝜃/d𝜇. Then 𝑇 is sufficient
for P iff there exist functions 𝑔_𝜃, ℎ such that

𝑝_𝜃(𝑥) = 𝑔_𝜃(𝑇(𝑥)) ℎ(𝑥),   𝜇-a.e.

12
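As a worked instance of the theorem (my own illustration, reusing the coin-flipping example from earlier in the lecture), the factorization can be read off directly:

```latex
p_\theta(x) = \theta^{\sum_{i=1}^n x_i} (1-\theta)^{\,n-\sum_{i=1}^n x_i}
            = \underbrace{\theta^{T(x)} (1-\theta)^{\,n-T(x)}}_{g_\theta(T(x))}
              \cdot \underbrace{1}_{h(x)},
            \qquad T(x) = \sum_{i=1}^n x_i,
```

so 𝑇(𝑋) = ∑ 𝑋_𝑖 is sufficient, recovering the earlier conditional-distribution argument without any computation.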
proof (see the rigorous proof in Keener 6.4):

13
Ex1: exponential family

𝑇(𝑋) is a sufficient statistic by the factorization theorem:

𝑝_𝜃(𝑥) = 𝑒^{𝜂(𝜃)^⊤ 𝑇(𝑥) − 𝐵(𝜃)} ℎ(𝑥)

(take 𝑔_𝜃(𝑡) = 𝑒^{𝜂(𝜃)^⊤ 𝑡 − 𝐵(𝜃)} and the same ℎ)

14
Ex2: joint distribution of i.i.d. uniform

Suppose 𝑋1 , … , 𝑋𝑛 are i.i.d. 𝑈[𝜃, 𝜃 + 1]. The joint density is

𝑝_𝜃(𝑥) = ∏_{𝑖=1}^{𝑛} 1{𝜃 ≤ 𝑥_𝑖 ≤ 𝜃 + 1} = 1{𝜃 ≤ 𝑥_(1) , … , 𝑥_(𝑛) ≤ 𝜃 + 1}.

The vector of order statistics (𝑋_(1) , … , 𝑋_(𝑛)) (where 𝑋_(𝑘) is the
𝑘-th smallest) is sufficient.

15
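A worked factorization for this example (my own elaboration, not from the slides): since 𝑥_(1) ≤ ⋯ ≤ 𝑥_(𝑛), the indicator depends on 𝑥 only through the smallest and largest observations,

```latex
p_\theta(x) = \underbrace{\mathbf{1}\{\theta \le x_{(1)}\}\,
              \mathbf{1}\{x_{(n)} \le \theta + 1\}}_{g_\theta(T(x))}
              \cdot \underbrace{1}_{h(x)},
            \qquad T(x) = \bigl(x_{(1)}, x_{(n)}\bigr),
```

so the order statistics are sufficient by the factorization theorem, and in fact the pair (𝑋_(1), 𝑋_(𝑛)) is already sufficient on its own.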
Ex3: joint distribution invariant to permutations of 𝑋𝑖

Suppose 𝑋1 , … , 𝑋𝑛 are i.i.d. from 𝑃_𝜃^{(1)}. The joint distribution 𝑃_𝜃
is invariant to permutations of 𝑋 = (𝑋1 , … , 𝑋𝑛 ). What are some
sufficient statistics?

• Order statistics
• Empirical distribution

  𝑃̂_𝑛(⋅) = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝛿_{𝑋_𝑖}(⋅),   where 𝛿_{𝑋_𝑖}(𝐴) = 1{𝑋_𝑖 ∈ 𝐴}.

In the setting above, computing 𝑝_𝜃(𝑋 = 𝑥 ∣ 𝑇 = 𝑡) is a combinatorial
problem whose answer does not depend on 𝜃.

16
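A tiny sketch (Python with numpy; my own illustration, not from the slides) of why these two summaries discard only the ordering: permuting the sample changes neither the sorted values nor the empirical counts.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
x = rng.normal(size=6)
x_perm = rng.permutation(x)

print(np.array_equal(np.sort(x), np.sort(x_perm)))      # order statistics agree: True
print(Counter(x.tolist()) == Counter(x_perm.tolist()))  # empirical distributions agree: True
```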
Minimal sufficiency
Motivation for minimal sufficiency

Consider 𝑋1 , … , 𝑋𝑛 i.i.d. 𝒩(𝜃, 1). Among the four sufficient
statistics

• ∑_{𝑖=1}^{𝑛} 𝑋_𝑖
• 𝑋̄ = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑋_𝑖
• 𝑂(𝑋) = (𝑋_(1) , … , 𝑋_(𝑛) )
• 𝑋 = (𝑋1 , … , 𝑋𝑛 )

which can be recovered from which?

17
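One way to organize the answer (my own summary, not on the slide): each statistic higher in the list is a function of the ones below it, since

```latex
\sum_{i=1}^n X_i = n \bar{X} = \sum_{i=1}^n X_{(i)},
\qquad O(X) = \operatorname{sort}(X_1, \ldots, X_n),
```

so ∑ 𝑋_𝑖 and 𝑋̄ determine each other, both can be recovered from 𝑂(𝑋), and 𝑂(𝑋) can be recovered from 𝑋, while the reverse recoveries (e.g. getting 𝑂(𝑋) back from 𝑋̄) are not possible in general.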
Ordering of sufficient statistics

Proposition
If 𝑇(𝑋) is sufficient and there exists 𝑓 such that 𝑇(𝑋) = 𝑓(𝑆(𝑋)),
then 𝑆(𝑋) is sufficient.

proof: factorization theorem — write 𝑝_𝜃(𝑥) = 𝑔_𝜃(𝑇(𝑥))ℎ(𝑥) = 𝑔_𝜃(𝑓(𝑆(𝑥)))ℎ(𝑥)

18
Minimal sufficiency

We say 𝑇 is minimal sufficient if

• 𝑇 is sufficient
• For any other sufficient statistic 𝑆, there exists 𝑓 such that
  𝑇 = 𝑓(𝑆) (a.e. P)

19
Example: which is minimal sufficient?

Suppose 𝑋1 , … , 𝑋2𝑛 are i.i.d. from 𝒩(𝜃, 1). Which of the following
statistics are sufficient? Which are minimal?

• 𝑋1
• 𝑇̃ = ( ∑_{𝑖=1}^{𝑛} 𝑋_𝑖 , ∑_{𝑖=𝑛+1}^{2𝑛} 𝑋_𝑖 )
• ∑_{𝑖=1}^{2𝑛} 𝑋_𝑖

20
Another way to check minimal sufficiency

Theorem 3.11 in Keener

Suppose P = {𝑃_𝜃 ∶ 𝜃 ∈ Ω} is a dominated family with densities
𝑝_𝜃(𝑥) = 𝑔_𝜃(𝑇(𝑥))ℎ(𝑥). If 𝑝_𝜃(𝑥) ∝_𝜃 𝑝_𝜃(𝑦) implies 𝑇(𝑥) = 𝑇(𝑦), then
𝑇 is minimal sufficient.

Interpretation: 𝑇 is minimal sufficient when there is a one-to-one
relation between the statistic and the shape of the likelihood function.

21
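A worked check of the criterion (my own illustration, again using the coin-flipping example with 𝑇(𝑥) = ∑ 𝑥_𝑖): for two datasets 𝑥 and 𝑦,

```latex
\frac{p_\theta(x)}{p_\theta(y)}
  = \frac{\theta^{\sum_i x_i} (1-\theta)^{\,n-\sum_i x_i}}
         {\theta^{\sum_i y_i} (1-\theta)^{\,n-\sum_i y_i}}
  = \left( \frac{\theta}{1-\theta} \right)^{\sum_i x_i - \sum_i y_i},
```

which is constant in 𝜃 only when ∑_𝑖 𝑥_𝑖 = ∑_𝑖 𝑦_𝑖. Hence 𝑝_𝜃(𝑥) ∝_𝜃 𝑝_𝜃(𝑦) forces 𝑇(𝑥) = 𝑇(𝑦), and 𝑇(𝑋) = ∑ 𝑋_𝑖 is minimal sufficient by Theorem 3.11.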
proof:

22
Ex1: minimal sufficient statistics in exponential family


𝑝_𝜃(𝑥) = 𝑒^{𝜂(𝜃)^⊤ 𝑇(𝑥) − 𝐵(𝜃)} ℎ(𝑥)

Is 𝑇 (𝑋) minimal sufficient?

23
Ex2: two parameter Gaussian

Suppose 𝑋 ∼ 𝒩(𝜇(𝜃), 𝕀_2) with 𝜃 ∈ ℝ and 𝜇(𝜃) = 𝑎 + 𝜃𝑏 for fixed 𝑎, 𝑏 ∈ ℝ^2.

Which is sufficient? Which is minimal sufficient?
• 𝑋
• 𝑏^⊤ 𝑋

24
Ex3: Laplace location family
Suppose 𝑋1 , … , 𝑋𝑛 are i.i.d. with density 𝑝_𝜃^{(1)}(𝑥) = (1/2) 𝑒^{−|𝑥−𝜃|}.
Then the joint density is

𝑝_𝜃(𝑥) = (1/2^𝑛) exp{ −∑_{𝑖=1}^{𝑛} |𝑥_𝑖 − 𝜃| }

Are the order statistics sufficient?

25
Summary

• Sufficient statistic 𝑇: once 𝑇 is known, the remaining data carry
  no further information about 𝜃
• Factorization theorem: a convenient way to check sufficiency
• It is possible to order sufficient statistics and define the
  minimal sufficient statistic

26
What is next?

• Completeness
• Ancillarity
• Basu’s theorem (relationship between sufficient and ancillary
statistics)

27
Thank you

28