STA732

Statistical Inference
Lecture 03: Sufficient Statistics

Yuansi Chen
Spring 2023
Duke University

https://www2.stat.duke.edu/courses/Spring23/sta732.01/

1
Recap from Lecture 02

Introduced exponential families: many good properties

• Natural parameter space is convex
• Easy joint density of i.i.d. random variables
• Easy sufficient statistics
• Easy moments

2
Goal of Lecture 03

1. Define sufficiency
2. Factorization theorem
3. Minimal sufficiency

Chap. 3 in Keener or Chap. 1.6 in Lehmann and Casella

3
Sufficiency
Motivation for sufficiency

Coin flipping experiment: suppose 𝑋1 , … , 𝑋𝑛 are i.i.d. Bernoulli(𝜃).
Then the joint density is

𝑝_𝜃(𝑋_1 = 𝑥_1 , … , 𝑋_𝑛 = 𝑥_𝑛) = ∏_{𝑖=1}^{𝑛} 𝜃^{𝑥_𝑖} (1 − 𝜃)^{1−𝑥_𝑖}
                              = 𝜃^{∑_{𝑖=1}^{𝑛} 𝑥_𝑖} (1 − 𝜃)^{𝑛 − ∑_{𝑖=1}^{𝑛} 𝑥_𝑖}

Let 𝑇(𝑋) = ∑_{𝑖=1}^{𝑛} 𝑋_𝑖. Then 𝑇(𝑋) follows the Binomial distribution
Binom(𝑛, 𝜃), with 𝑝_𝜃(𝑇(𝑋) = 𝑡) = (𝑛 choose 𝑡) 𝜃^𝑡 (1 − 𝜃)^{𝑛−𝑡}.
It seems that to estimate 𝜃 it is sufficient to know 𝑇(𝑋), yet 𝑇(𝑋)
throws away data. How do we justify this?

4
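To see the "only through 𝑇(𝑋)" point numerically, here is a small sketch (not part of the slides; it assumes Python with numpy, and the two datasets are my own choices): two different binary samples with the same sum give exactly the same likelihood curve in 𝜃.

```python
import numpy as np

def bernoulli_likelihood(x, thetas):
    """Joint Bernoulli likelihood prod_i theta^{x_i} (1-theta)^{1-x_i} on a grid."""
    x = np.asarray(x)
    s, n = x.sum(), x.size
    return thetas ** s * (1 - thetas) ** (n - s)

thetas = np.linspace(0.01, 0.99, 99)
x1 = [1, 0, 0, 1, 0, 1]   # sum = 3
x2 = [0, 1, 1, 0, 1, 0]   # a different sequence, same sum = 3

# Identical likelihood curves: the data enter only through T(x) = sum(x)
print(np.allclose(bernoulli_likelihood(x1, thetas),
                  bernoulli_likelihood(x2, thetas)))   # True
```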
Sufficient statistics

Suppose 𝑋 has a distribution from the family P = {𝑃_𝜃 , 𝜃 ∈ Ω}. 𝑇(𝑋)
is a sufficient statistic if, for every 𝑡 and 𝜃, the conditional
distribution of 𝑋 under 𝑃_𝜃 given 𝑇 = 𝑡 does not depend on 𝜃.

5
Back to the coin flipping example

𝑝_𝜃(𝑋 = 𝑥 ∣ 𝑇 = 𝑡) = 𝑝_𝜃(𝑋 = 𝑥, 𝑇 = 𝑡) / 𝑝_𝜃(𝑇 = 𝑡)
                   = 𝜃^{∑ 𝑥_𝑖} (1 − 𝜃)^{𝑛 − ∑ 𝑥_𝑖} 1{∑ 𝑥_𝑖 = 𝑡} / [ (𝑛 choose 𝑡) 𝜃^𝑡 (1 − 𝜃)^{𝑛−𝑡} ]
                   = 1{∑ 𝑥_𝑖 = 𝑡} / (𝑛 choose 𝑡)

Conditioned on 𝑇(𝑋) = 𝑡, 𝑋 only takes values on sequences such that
∑ 𝑋_𝑖 = 𝑡, and it is uniformly distributed over them. The conditional
distribution does not depend on 𝜃!

6
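To make the calculation concrete, here is a small simulation sketch (not from the slides; it assumes Python with numpy, and the sample size and values of 𝜃 are my own choices). It estimates the conditional distribution of 𝑋 given 𝑇 = 𝑡 for two different 𝜃 and shows both match the uniform distribution over sequences with ∑ 𝑥_𝑖 = 𝑡.

```python
import numpy as np
from collections import Counter

def conditional_dist(theta, n=4, t=2, reps=200_000, seed=0):
    """Empirical distribution of X given sum(X) = t, under i.i.d. Bernoulli(theta)."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, theta, size=(reps, n))
    keep = x[x.sum(axis=1) == t]                      # condition on T(X) = t
    counts = Counter(map(tuple, keep.tolist()))
    total = sum(counts.values())
    return {seq: c / total for seq, c in counts.items()}

for theta in (0.3, 0.7):
    dist = conditional_dist(theta)
    print(theta, {seq: round(p, 3) for seq, p in sorted(dist.items())})
# Both runs put probability ~ 1/6 on each of the 6 sequences with two ones,
# matching the uniform conditional distribution derived above.
```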
Interpretation of sufficiency
Interpretation 1: sufficiency from generative model perspective

• We care about 𝑋 because
  • 𝑋 is generated according to 𝑃_𝜃
  • 𝑋 can be used to infer properties of 𝜃
• Sufficiency says that 𝑇(𝑋) is informative enough for estimating 𝜃
• We can think of the data as being generated in two stages:
  1. Generate 𝑇: its distribution depends on 𝜃
  2. Generate 𝑋 ∣ 𝑇: its distribution does not depend on 𝜃

7
A graphical model for the data generation

𝜃 ⟶ 𝑇(𝑋) ⟶ 𝑋

We lose nothing (in terms of estimating 𝜃) by considering 𝑇(𝑋) alone.

8
Fake data generated from 𝑇 is good enough

𝜃 ⟶ 𝑇(𝑋) ⟶ 𝑋 (observed data)
      𝑇(𝑋) ⟶ 𝑋̃ (fake data, drawn from the same conditional distribution)

For the purpose of inferring properties of 𝜃, the fake data 𝑋̃ is as
good as 𝑋.

9
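Here is a small simulation sketch of this picture (not from the slides; Python with numpy, and the parameter values are my own). It generates 𝑋 from the Bernoulli model, keeps only 𝑇(𝑋) = ∑ 𝑋_𝑖, regenerates fake data 𝑋̃ by placing 𝑇 ones uniformly at random, and checks that 𝑋 and 𝑋̃ look the same.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta, reps = 5, 0.4, 100_000

# "Real" data: X ~ Bernoulli(theta)^n, summarized by T = sum(X)
X = rng.binomial(1, theta, size=(reps, n))
T = X.sum(axis=1)

# "Fake" data: given T only, place T ones in uniformly random positions
X_tilde = np.zeros_like(X)
for r, t in enumerate(T):
    X_tilde[r, rng.choice(n, size=t, replace=False)] = 1

# X and X_tilde have the same joint distribution, so summaries agree:
print(X.mean(axis=0), X_tilde.mean(axis=0))                    # both ~ theta
print(np.corrcoef(X.T)[0, 1], np.corrcoef(X_tilde.T)[0, 1])    # both ~ 0
```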
Interpretation 2: estimator using 𝑇 (𝑋) alone cannot be worse

Theorem 3.3 in Keener

Suppose 𝑇(𝑋) is sufficient. Then for any estimator 𝛿(𝑋) of 𝑔(𝜃)
there exists a randomized estimator based on 𝑇 that has the same
risk function as 𝛿(𝑋).

Proof sketch: generate 𝑋̃ from 𝑇 (a random step); then 𝛿(𝑋̃) has the
same distribution as 𝛿(𝑋) under every 𝜃, so it should be as good as
𝛿(𝑋).

10
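A hedged simulation sketch of the theorem (not from the slides; Python with numpy, squared-error loss, and the deliberately crude estimator 𝛿(𝑋) = 𝑋_1 are my own choices): the randomized estimator that regenerates the first coordinate from 𝑇 alone has the same risk at every 𝜃.

```python
import numpy as np

def risk_curves(n=5, reps=200_000, seed=2):
    """Monte Carlo risk of delta(X) = X_1 vs. the randomized estimator based on T."""
    rng = np.random.default_rng(seed)
    for theta in np.linspace(0.1, 0.9, 5):
        X = rng.binomial(1, theta, size=(reps, n))
        T = X.sum(axis=1)
        # Randomized estimator: first coordinate of a fake X~ drawn given T,
        # which is Bernoulli with success probability T / n
        delta_T = rng.random(reps) < T / n
        risk_X = np.mean((X[:, 0] - theta) ** 2)
        risk_T = np.mean((delta_T - theta) ** 2)
        print(f"theta={theta:.1f}  risk of delta(X)={risk_X:.4f}  risk based on T={risk_T:.4f}")

risk_curves()
# The two Monte Carlo risks agree (up to simulation noise) at every theta,
# as Theorem 3.3 predicts.
```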
Factorization theorem
Motivation for factorization theorem

• The sufficiency definition is hard to work with in practice
• There is a convenient way to verify sufficiency by factorizing
  the density

11
Factorization theorem

Theorem 3.6 in Keener

Let P = {𝑃_𝜃 , 𝜃 ∈ Ω} be a family of distributions dominated by 𝜇
(𝑃_𝜃 ≪ 𝜇 for all 𝜃), with densities 𝑝_𝜃 = d𝑃_𝜃/d𝜇. Then 𝑇 is sufficient
for P iff there exist functions 𝑔_𝜃, ℎ such that

𝑝_𝜃(𝑥) = 𝑔_𝜃(𝑇(𝑥)) ℎ(𝑥),   𝜇-a.e.

12
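As a worked instance of the theorem (my own illustration, reusing the coin-flipping example from earlier in the lecture), the factorization can be read off directly:

```latex
p_\theta(x) = \theta^{\sum_{i=1}^n x_i} (1-\theta)^{\,n-\sum_{i=1}^n x_i}
            = \underbrace{\theta^{T(x)} (1-\theta)^{\,n-T(x)}}_{g_\theta(T(x))}
              \cdot \underbrace{1}_{h(x)},
            \qquad T(x) = \sum_{i=1}^n x_i,
```

so 𝑇(𝑋) = ∑ 𝑋_𝑖 is sufficient, recovering the earlier conditional-distribution argument without any computation.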
proof (see the rigorous proof in Keener 6.4):

13
Ex1: exponential family

𝑇(𝑋) is a sufficient statistic by the factorization theorem:

𝑝_𝜃(𝑥) = 𝑒^{𝜂(𝜃)^⊤ 𝑇(𝑥) − 𝐵(𝜃)} ℎ(𝑥)

(take 𝑔_𝜃(𝑡) = 𝑒^{𝜂(𝜃)^⊤ 𝑡 − 𝐵(𝜃)} and the same ℎ)

14
Ex2: joint distribution of i.i.d. uniform

Suppose 𝑋1 , … , 𝑋𝑛 are i.i.d. 𝑈[𝜃, 𝜃 + 1]. The joint density is

𝑝_𝜃(𝑥) = ∏_{𝑖=1}^{𝑛} 1{𝜃 ≤ 𝑥_𝑖 ≤ 𝜃 + 1} = 1{𝜃 ≤ 𝑥_(1) , … , 𝑥_(𝑛) ≤ 𝜃 + 1}.

The vector of order statistics (𝑋_(1) , … , 𝑋_(𝑛)) (where 𝑋_(𝑘) is the
𝑘-th smallest) is sufficient.

15
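A worked factorization for this example (my own elaboration, not from the slides): since 𝑥_(1) ≤ ⋯ ≤ 𝑥_(𝑛), the indicator depends on 𝑥 only through the smallest and largest observations,

```latex
p_\theta(x) = \underbrace{\mathbf{1}\{\theta \le x_{(1)}\}\,
              \mathbf{1}\{x_{(n)} \le \theta + 1\}}_{g_\theta(T(x))}
              \cdot \underbrace{1}_{h(x)},
            \qquad T(x) = \bigl(x_{(1)}, x_{(n)}\bigr),
```

so the order statistics are sufficient by the factorization theorem, and in fact the pair (𝑋_(1), 𝑋_(𝑛)) is already sufficient on its own.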
Ex3: joint distribution invariant to permutations of 𝑋𝑖

Suppose 𝑋1 , … , 𝑋𝑛 are i.i.d. from 𝑃_𝜃^{(1)}. The joint distribution 𝑃_𝜃
is invariant to permutations of 𝑋 = (𝑋1 , … , 𝑋𝑛 ). What are some
sufficient statistics?

• Order statistics
• Empirical distribution

  𝑃̂_𝑛(⋅) = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝛿_{𝑋_𝑖}(⋅),   where 𝛿_{𝑋_𝑖}(𝐴) = 1{𝑋_𝑖 ∈ 𝐴}.

In the setting above, computing 𝑝_𝜃(𝑋 = 𝑥 ∣ 𝑇 = 𝑡) is a combinatorial
problem whose answer does not depend on 𝜃.

16
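A tiny sketch (Python with numpy; my own illustration, not from the slides) of why these two summaries discard only the ordering: permuting the sample changes neither the sorted values nor the empirical counts.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
x = rng.normal(size=6)
x_perm = rng.permutation(x)

print(np.array_equal(np.sort(x), np.sort(x_perm)))      # order statistics agree: True
print(Counter(x.tolist()) == Counter(x_perm.tolist()))  # empirical distributions agree: True
```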
Minimal sufficiency
Motivation for minimal sufficiency

Consider 𝑋1 , … , 𝑋𝑛 i.i.d. 𝒩(𝜃, 1). Among the four sufficient
statistics

• ∑_{𝑖=1}^{𝑛} 𝑋_𝑖
• 𝑋̄ = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑋_𝑖
• 𝑂(𝑋) = (𝑋_(1) , … , 𝑋_(𝑛) )
• 𝑋 = (𝑋1 , … , 𝑋𝑛 )

which can be recovered from which?

17
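One way to organize the answer (my own summary, not on the slide): each statistic higher in the list is a function of the ones below it, since

```latex
\sum_{i=1}^n X_i = n \bar{X} = \sum_{i=1}^n X_{(i)},
\qquad O(X) = \operatorname{sort}(X_1, \ldots, X_n),
```

so ∑ 𝑋_𝑖 and 𝑋̄ determine each other, both can be recovered from 𝑂(𝑋), and 𝑂(𝑋) can be recovered from 𝑋, while the reverse recoveries (e.g. getting 𝑂(𝑋) back from 𝑋̄) are not possible in general.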
Ordering of sufficient statistics

Proposition
If 𝑇(𝑋) is sufficient and there exists 𝑓 such that 𝑇(𝑋) = 𝑓(𝑆(𝑋)),
then 𝑆(𝑋) is sufficient.

proof: factorization theorem — write 𝑝_𝜃(𝑥) = 𝑔_𝜃(𝑇(𝑥))ℎ(𝑥) = 𝑔_𝜃(𝑓(𝑆(𝑥)))ℎ(𝑥)

18
Minimal sufficiency

We say 𝑇 is minimal sufficient if

• 𝑇 is sufficient
• For any other sufficient statistic 𝑆, there exists 𝑓 such that
  𝑇 = 𝑓(𝑆) (a.e. P)

19
Example: which is minimal sufficient?

Suppose 𝑋1 , … , 𝑋2𝑛 are i.i.d. from 𝒩(𝜃, 1). Which of the following
statistics are sufficient? Which are minimal?

• 𝑋1
• 𝑇̃ = ( ∑_{𝑖=1}^{𝑛} 𝑋_𝑖 , ∑_{𝑖=𝑛+1}^{2𝑛} 𝑋_𝑖 )
• ∑_{𝑖=1}^{2𝑛} 𝑋_𝑖

20
Another way to check minimal sufficiency

Theorem 3.11 in Keener

Suppose P = {𝑃_𝜃 ∶ 𝜃 ∈ Ω} is a dominated family with densities
𝑝_𝜃(𝑥) = 𝑔_𝜃(𝑇(𝑥))ℎ(𝑥). If 𝑝_𝜃(𝑥) ∝_𝜃 𝑝_𝜃(𝑦) implies 𝑇(𝑥) = 𝑇(𝑦), then
𝑇 is minimal sufficient.

Interpretation: 𝑇 is minimal sufficient when there is a one-to-one
relation between the statistic and the shape of the likelihood function.

21
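A worked check of the criterion (my own illustration, again using the coin-flipping example with 𝑇(𝑥) = ∑ 𝑥_𝑖): for two datasets 𝑥 and 𝑦,

```latex
\frac{p_\theta(x)}{p_\theta(y)}
  = \frac{\theta^{\sum_i x_i} (1-\theta)^{\,n-\sum_i x_i}}
         {\theta^{\sum_i y_i} (1-\theta)^{\,n-\sum_i y_i}}
  = \left( \frac{\theta}{1-\theta} \right)^{\sum_i x_i - \sum_i y_i},
```

which is constant in 𝜃 only when ∑_𝑖 𝑥_𝑖 = ∑_𝑖 𝑦_𝑖. Hence 𝑝_𝜃(𝑥) ∝_𝜃 𝑝_𝜃(𝑦) forces 𝑇(𝑥) = 𝑇(𝑦), and 𝑇(𝑋) = ∑ 𝑋_𝑖 is minimal sufficient by Theorem 3.11.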
proof:

22
Ex1: minimal sufficient statistics in exponential family


𝑝_𝜃(𝑥) = 𝑒^{𝜂(𝜃)^⊤ 𝑇(𝑥) − 𝐵(𝜃)} ℎ(𝑥)

Is 𝑇 (𝑋) minimal sufficient?

23
Ex2: two parameter Gaussian

Suppose 𝑋 ∼ 𝒩(𝜇(𝜃), 𝕀_2) with 𝜃 ∈ ℝ and 𝜇(𝜃) = 𝑎 + 𝜃𝑏 for fixed 𝑎, 𝑏 ∈ ℝ^2.

Which is sufficient? Which is minimal sufficient?
• 𝑋
• 𝑏^⊤ 𝑋

24
Ex3: Laplace location family
Suppose 𝑋1 , … , 𝑋𝑛 are i.i.d. with density 𝑝_𝜃^{(1)}(𝑥) = (1/2) 𝑒^{−|𝑥−𝜃|}.
Then the joint density is

𝑝_𝜃(𝑥) = (1/2^𝑛) exp{ −∑_{𝑖=1}^{𝑛} |𝑥_𝑖 − 𝜃| }

Are the order statistics sufficient?

25
Summary

• Sufficient statistic 𝑇: once 𝑇 is known, the remaining data carry
  no further information about 𝜃
• Factorization theorem: a convenient way to check sufficiency
• It is possible to order sufficient statistics and define the
  minimal sufficient statistic

26
What is next?

• Completeness
• Ancillarity
• Basu’s theorem (relationship between sufficient and ancillary
statistics)

27
Thank you

28