Lecture02


STA732

Statistical Inference
Lecture 02: Exponential families

Yuansi Chen
Spring 2023
Duke University

https://www2.stat.duke.edu/courses/Spring23/sta732.01/

Recap from Lecture 01

• Defined statistical inference problem


Statistical experiment, data, statistical model, loss function, risk function

• Discussed how to argue for the optimal estimator


statistical optimality, in addition to empirical success, fast computation,
simplicity, etc.

Goal of Lecture 02

• Introduce exponential families


• Examples
• Differential identities (how to get moments and cumulants
from exponential families?)

Chap. 2 in Keener or Chap. 1.5 in Lehmann and Casella

Exponential families
Exponential families

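An $s$-parameter exponential family is a family $\mathcal{P} = \{P_\eta : \eta \in \Xi\}$ with densities $p_\eta$ w.r.t. a common measure $\mu$ on $\mathcal{X}$ of the form

$$p_\eta(x) = \exp\left(\eta^\top T(x) - A(\eta)\right) h(x)$$

• $T : \mathcal{X} \to \mathbb{R}^s$: sufficient statistic
• $h : \mathcal{X} \to \mathbb{R}$: carrier/base density
• $\eta \in \Xi \subseteq \mathbb{R}^s$: natural parameter
• $A : \mathbb{R}^s \to \mathbb{R}$: cumulant-generating function (cgf)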

Notes on 𝐴(𝜂)

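For any $\eta$, the cgf $A(\eta)$ is determined by $h$ and $T$. Since $\int p_\eta \, d\mu = 1$, we have

$$A(\eta) = \log\left[\int \exp\left(\eta^\top T(x)\right) h(x)\, d\mu(x)\right]$$

• We say $p_\eta$ is normalizable if $A(\eta) < \infty$
• So $A(\eta)$ is also called the log-normalizing constant: $e^{A(\eta)}$ is the constant that makes $p_\eta$ integrate to 1.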

Example 2.1

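Take $\mu$ to be Lebesgue measure on $\mathbb{R}$, $s = 1$, $h = 1_{(0,\infty)}$ and $T(x) = x$. Then we have

$$A(\eta) = \log \int_0^\infty e^{\eta x}\, dx = \begin{cases} \log(-1/\eta), & \eta < 0 \\ \infty, & \eta \ge 0. \end{cases}$$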

What is the corresponding 𝑝𝜂 (𝑥)? What distribution? Is it in the usual form?
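
One way to answer these questions, sketched for η < 0: substituting A(η) = log(−1/η) gives p_η(x) = (−η) e^{ηx} for x > 0, which is the Exponential distribution with rate −η, so it is not in the usual rate parameterization. The Python check below compares the two forms numerically. The function names are illustrative and not from the lecture.

```python
import math

def A(eta: float) -> float:
    """Log-normalizer from Example 2.1: log of the integral of exp(eta*x) over (0, inf)."""
    return math.log(-1.0 / eta) if eta < 0 else math.inf

def p_natural(x: float, eta: float) -> float:
    """Density exp(eta*x - A(eta)) * 1{x > 0}; only normalizable for eta < 0."""
    if eta >= 0:
        raise ValueError("eta must be negative for the density to be normalizable")
    return math.exp(eta * x - A(eta)) if x > 0 else 0.0

def exponential_pdf(x: float, rate: float) -> float:
    """Usual form of the Exponential(rate) density."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0

eta = -2.0  # arbitrary test value; corresponds to rate = -eta = 2
max_gap = max(
    abs(p_natural(x, eta) - exponential_pdf(x, -eta))
    for x in [0.1, 0.5, 1.0, 3.0]
)
print(max_gap)  # expected to be at the level of floating-point rounding
```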

Notes on the natural parameter

The natural parameter space is the set of all normalizable 𝜂:

$$\Xi_1 = \{\eta : A(\eta) < \infty\}$$

We say $\mathcal{P}$ is in canonical form if $\Xi = \Xi_1$. Sometimes we could take $\Xi \subset \Xi_1$.

Other parameterizations of an exponential family

Take 𝜂 ∶ Ω → Ξ, define

𝑝𝜃 (𝑥) = exp [𝜂(𝜃)⊤ 𝑇 (𝑥) − 𝐵(𝜃)] ℎ(𝑥)


𝐵(𝜃) = 𝐴(𝜂(𝜃))

The family {𝑝𝜃 ∶ 𝜃 ∈ Ω} is also called an exponential family

Many distributions belong to exponential families (see Wikipedia), but some massaging is often needed to recognize the exponential family form.

Example 2.2: normal with unknown mean and variance

The normal distribution 𝒩(𝜇, 𝜎2 ), 𝜇 ∈ ℝ, 𝜎2 > 0 has density

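$$p_\theta(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \exp\left[\frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right]$$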


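We identify

$$\theta = \begin{pmatrix}\mu \\ \sigma^2\end{pmatrix}, \quad \eta(\theta) = \begin{pmatrix}\mu/\sigma^2 \\ -1/(2\sigma^2)\end{pmatrix}, \quad T(x) = \begin{pmatrix}x \\ x^2\end{pmatrix}$$

$$h(x) = 1, \quad B(\theta) = \frac{\mu^2}{2\sigma^2} + \frac{1}{2}\log(2\pi\sigma^2)$$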
How to write in terms of natural parameters?

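$$p_\eta(x) = \exp\left[\eta^\top \begin{pmatrix}x \\ x^2\end{pmatrix} - A(\eta)\right]$$

where $\Xi = \{\eta \in \mathbb{R}^2 \mid \eta_2 < 0\}$ and

$$A(\eta) = -\frac{\eta_1^2}{4\eta_2} + \frac{1}{2}\log\left(-\frac{\pi}{\eta_2}\right)$$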
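
As a sanity check on the change of parameters, the sketch below maps (μ, σ²) to the natural parameter η = (μ/σ², −1/(2σ²)) and confirms numerically that A(η) agrees with B(θ) and that the natural-form density matches the usual normal density. The function names and test values are illustrative assumptions, not part of the lecture.

```python
import math

def natural_from_usual(mu: float, sigma2: float) -> tuple[float, float]:
    """Map theta = (mu, sigma^2) to eta = (mu/sigma^2, -1/(2*sigma^2))."""
    return mu / sigma2, -1.0 / (2.0 * sigma2)

def B(mu: float, sigma2: float) -> float:
    """Log-normalizer in the usual parameterization."""
    return mu**2 / (2.0 * sigma2) + 0.5 * math.log(2.0 * math.pi * sigma2)

def A(eta1: float, eta2: float) -> float:
    """Log-normalizer in the natural parameterization; requires eta2 < 0."""
    if eta2 >= 0:
        raise ValueError("eta2 must be negative")
    return -eta1**2 / (4.0 * eta2) + 0.5 * math.log(-math.pi / eta2)

def density_natural(x: float, eta1: float, eta2: float) -> float:
    """exp(eta^T T(x) - A(eta)) with T(x) = (x, x^2) and h(x) = 1."""
    return math.exp(eta1 * x + eta2 * x**2 - A(eta1, eta2))

def density_usual(x: float, mu: float, sigma2: float) -> float:
    """Standard N(mu, sigma^2) density."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

mu, sigma2 = 1.5, 0.7  # arbitrary test values
eta1, eta2 = natural_from_usual(mu, sigma2)
cgf_gap = abs(A(eta1, eta2) - B(mu, sigma2))
density_gap = max(
    abs(density_natural(x, eta1, eta2) - density_usual(x, mu, sigma2))
    for x in [-2.0, 0.0, 1.5, 4.0]
)
print(cgf_gap, density_gap)  # both expected to be at rounding level
```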

$\{p_\eta : \eta \in \Xi\}$ lives inside an $s$-dimensional subspace

It is useful to think of "$\log\{p_\eta : \eta \in \Xi\}$" as a subset of an $s$-dimensional subspace of the log-density space

• $e^{f_\eta(x)}$ is always proportional to a density if integrable
• For an exponential family, we can write $f_\eta(x) = \log h(x) + \eta^\top T(x)$ (draw a picture)

The form of an exponential family is not unique

Operations to express the same family


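1. Change the common measure so $h(x) = 1$:

$$\mu \rightsquigarrow \tilde\mu \quad \text{with} \quad \frac{d\tilde\mu}{d\mu} = h$$

2. Reparameterize so $0 \in \Xi$: take $\eta_0 \in \Xi$

$$\eta \rightsquigarrow \tilde\eta = \eta - \eta_0, \qquad h \rightsquigarrow \tilde h = p_{\eta_0}(x), \qquad A \rightsquigarrow \tilde A(\tilde\eta) = A(\eta_0 + \tilde\eta) - A(\eta_0)$$

3. Reparameterize with an invertible map $\mathbb{R}^s \to \mathbb{R}^s$.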

...
More examples
Example 2.3: joint density of 𝑛 i.i.d. normal

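Given $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mu, \sigma^2)$, the joint density is

$$p_\theta(x) = \prod_{i=1}^n \left[\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}\right] = \exp\left\{\sum_{i=1}^n \left[\frac{\mu}{\sigma^2} x_i - \frac{1}{2\sigma^2} x_i^2 - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right]\right\}$$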


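$$\eta(\theta) = \begin{pmatrix}\mu/\sigma^2 \\ -1/(2\sigma^2)\end{pmatrix}, \quad T(x) = \begin{pmatrix}\sum_i x_i \\ \sum_i x_i^2\end{pmatrix}, \quad B(\theta) = n B^{(1)}(\theta)$$

where $B^{(1)}$ denotes $B$ for a single observation (Example 2.2).

Ex: in general, the joint density of $n$ i.i.d. random variables from an $s$-parameter exponential family is still an $s$-parameter exponential family with the same natural parameter.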
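
A sketch of the general calculation behind this exercise, assuming each observation has density $p_\eta$ in the canonical form used throughout the lecture:

```latex
\prod_{i=1}^n p_\eta(x_i)
  = \prod_{i=1}^n \exp\left(\eta^\top T(x_i) - A(\eta)\right) h(x_i)
  = \exp\left(\eta^\top \sum_{i=1}^n T(x_i) - n A(\eta)\right) \prod_{i=1}^n h(x_i)
```

So the joint family has the same natural parameter $\eta$, sufficient statistic $\sum_i T(x_i)$, cgf $n A(\eta)$, and carrier $\prod_i h(x_i)$.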

Example: binomial

For 𝑋 ∼ Binomial(𝑛, 𝜃), 𝑋 has probability mass function

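$$p_\theta(x) = \binom{n}{x} \theta^x (1-\theta)^{n-x} = \left(\frac{\theta}{1-\theta}\right)^x (1-\theta)^n \binom{n}{x} = \exp\left[\log\left(\frac{\theta}{1-\theta}\right) x + n\log(1-\theta)\right] \binom{n}{x}$$

This is a 1-parameter exponential family

$$T(x) = x, \quad \eta(\theta) = \log\left(\frac{\theta}{1-\theta}\right)$$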
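
The binomial factorization can be checked numerically. The sketch below assumes the identifications h(x) = C(n, x) and A(η) = n log(1 + e^η); these are derived here from B(θ) = −n log(1 − θ) and are not stated on the slide. The function names are illustrative.

```python
import math

def binomial_pmf(x: int, n: int, theta: float) -> float:
    """Usual Binomial(n, theta) probability mass function."""
    return math.comb(n, x) * theta**x * (1.0 - theta) ** (n - x)

def binomial_pmf_natural(x: int, n: int, eta: float) -> float:
    """Exponential-family form exp(eta*x - A(eta)) * h(x).

    Assumed identifications (derived, not from the slide):
    A(eta) = n*log(1 + e^eta) and h(x) = C(n, x).
    """
    A = n * math.log1p(math.exp(eta))
    return math.exp(eta * x - A) * math.comb(n, x)

n, theta = 10, 0.3                     # arbitrary test values
eta = math.log(theta / (1.0 - theta))  # natural parameter (log-odds)

max_gap = max(
    abs(binomial_pmf(x, n, theta) - binomial_pmf_natural(x, n, eta))
    for x in range(n + 1)
)
total = sum(binomial_pmf_natural(x, n, eta) for x in range(n + 1))
print(max_gap, total)  # gap at rounding level, total close to 1
```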

Example: Poisson

For $X \sim \text{Poisson}(\lambda)$, $X$ has probability mass function

$$p_\lambda(x) = \frac{\lambda^x e^{-\lambda}}{x!} = \exp\left[\log(\lambda) x - \lambda\right] \frac{1}{x!}$$

This is a 1-parameter exponential family

𝜂(𝜆) = log(𝜆)

Ex: try some on Wikipedia: Beta, Gamma, Dirichlet...

Differential Identities
Intuition for getting moments from cgf

Because the density integrates to 1, we always have

$$e^{A(\eta)} = \int e^{\eta^\top T(x)} h(x)\, d\mu(x)$$

Whenever a quantity is in the form of "integral of exponential tilt", we can obtain moments by differentiating on both sides

Be careful: we need to be able to switch the order of derivative and
integral!

Theorem 2.4 in Keener

Theorem 2.4
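Let $\Xi_f$ be the set of values for $\eta \in \mathbb{R}^s$ where

$$\int |f(x)| \exp\left[\eta^\top T(x)\right] h(x)\, d\mu(x) < \infty$$

Then the function

$$g(\eta) = \int f(x) \exp\left[\eta^\top T(x)\right] h(x)\, d\mu(x)$$

is continuous and has continuous partial derivatives of all orders for $\eta \in \Xi_f^\circ$ (the interior of $\Xi_f$).

In particular, taking $f = 1$, $e^{A(\eta)}$ and hence $A(\eta)$ has partial derivatives of all orders on the interior of the natural parameter space.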

Proof sketch in 1-d (Chap. 2.3. in Keener)

We want to differentiate $e^{A(\eta)} = \int \exp[\eta T(x)] h(x)\, d\mu(x)$ under the integral sign

• Sufficient to consider $\eta \in (-3\epsilon, 3\epsilon)$ and show the derivative at $\eta = 0$
• Idea: use the dominated convergence theorem
• Construct a sequence of difference quotients that converges to the actual derivative
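
One possible way to fill in the domination step, assuming $(-3\epsilon, 3\epsilon)$ lies in the natural parameter space. The specific bound below is a choice made for this sketch and is not necessarily the one used in Keener:

```latex
% Difference quotient at eta = 0, for 0 < |eta| <= epsilon:
\frac{e^{A(\eta)} - e^{A(0)}}{\eta}
  = \int \frac{e^{\eta T(x)} - 1}{\eta}\, h(x)\, d\mu(x),
\qquad
\frac{e^{\eta T(x)} - 1}{\eta} \xrightarrow[\eta \to 0]{} T(x).

% Domination, using |e^a - 1| <= |a| e^{|a|} and epsilon |t| <= e^{epsilon |t|}:
\left| \frac{e^{\eta t} - 1}{\eta} \right|
  \le |t|\, e^{\epsilon |t|}
  \le \frac{1}{\epsilon}\, e^{2\epsilon |t|}
  \le \frac{1}{\epsilon} \left( e^{2\epsilon t} + e^{-2\epsilon t} \right).
```

The right-hand side, evaluated at $t = T(x)$, is integrable against $h\, d\mu$ because $\pm 2\epsilon$ are normalizable, so dominated convergence gives $\frac{d}{d\eta} e^{A(\eta)} \big|_{\eta=0} = \int T(x) h(x)\, d\mu(x)$.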

Proof:

What do we get by differentiating 𝐴(𝜂)?

By differentiating once, show that

∇𝐴(𝜂) = 𝔼𝜂 [𝑇 (𝑋)]

Because
$$\frac{\partial}{\partial \eta_j} e^{A(\eta)} = \frac{\partial}{\partial \eta_j} \int \exp\left[\eta^\top T(x)\right] h(x)\, d\mu(x)$$
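
A sketch of the remaining steps, assuming $\eta$ is in the interior of the natural parameter space so that Theorem 2.4 allows differentiating under the integral sign:

```latex
e^{A(\eta)}\, \frac{\partial A(\eta)}{\partial \eta_j}
  = \int T_j(x) \exp\left[\eta^\top T(x)\right] h(x)\, d\mu(x)
\quad\Longrightarrow\quad
\frac{\partial A(\eta)}{\partial \eta_j}
  = \int T_j(x) \exp\left[\eta^\top T(x) - A(\eta)\right] h(x)\, d\mu(x)
  = \mathbb{E}_\eta\left[T_j(X)\right]
```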

Differentiating twice

By differentiating twice, show that

$$\nabla^2 A(\eta) = \mathrm{Var}_\eta[T(X)]$$
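
A sketch of the calculation, again assuming differentiation under the integral sign is justified by Theorem 2.4. Differentiating $e^{A(\eta)}$ with respect to $\eta_j$ and then $\eta_k$:

```latex
\left( \frac{\partial^2 A}{\partial \eta_j \partial \eta_k}
     + \frac{\partial A}{\partial \eta_j} \frac{\partial A}{\partial \eta_k} \right) e^{A(\eta)}
  = \int T_j(x)\, T_k(x) \exp\left[\eta^\top T(x)\right] h(x)\, d\mu(x)

\Longrightarrow\quad
\frac{\partial^2 A}{\partial \eta_j \partial \eta_k}
  = \mathbb{E}_\eta[T_j T_k] - \mathbb{E}_\eta[T_j]\, \mathbb{E}_\eta[T_k]
  = \mathrm{Cov}_\eta(T_j, T_k)
```

Here $\mathrm{Var}_\eta[T(X)]$ is read as the $s \times s$ covariance matrix of $T(X)$.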

Example: Poisson

$$p_\lambda(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

$$T(x) = x, \quad \eta(\lambda) = \log(\lambda), \quad B(\lambda) = \lambda$$


For the natural parameter $\eta$, $A(\eta) = e^\eta$, then

$$\mathbb{E}_\eta[X] = \frac{d}{d\eta} e^\eta = e^\eta = \lambda$$

$$\mathrm{Var}_\eta[X] = \frac{d^2}{d\eta^2} e^\eta = e^\eta = \lambda$$
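
The two identities can be checked numerically for the Poisson family. The sketch below compares finite-difference derivatives of A(η) = e^η with the mean and variance computed directly from the pmf. The step size, the truncation point of the sum, and the function names are assumptions of this sketch.

```python
import math

def A(eta: float) -> float:
    """Cumulant-generating function of the Poisson family in natural form."""
    return math.exp(eta)

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson pmf, computed on the log scale for numerical stability."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

lam = 3.0            # arbitrary test value
eta = math.log(lam)  # natural parameter
support = range(200) # truncation of the infinite sum; the tail is negligible here

# Moments computed directly from the pmf.
mean = sum(x * poisson_pmf(x, lam) for x in support)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)

# Central finite differences of A at eta.
step = 1e-4
dA = (A(eta + step) - A(eta - step)) / (2.0 * step)
d2A = (A(eta + step) - 2.0 * A(eta) + A(eta - step)) / step**2

print(mean, dA)   # both close to lam
print(var, d2A)   # both close to lam
```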

Moment-generating function

For $T$ a random vector in $\mathbb{R}^s$, the moment generating function of $T$ is

$$M_T(u) = \mathbb{E}\left[e^{u^\top T}\right]$$

The cumulant generating function is

𝐾𝑇 (𝑢) = log(𝑀𝑇 (𝑢))

Useful properties of moment-generating function

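1. If two random variables have the same moment-generating function (finite in a neighborhood of 0), then they have the same distribution
2. Moments of $T$, that is,

$$\mathbb{E}\left[T_1^{r_1} \times \cdots \times T_s^{r_s}\right]$$

can be found by differentiating $M_T$ at $u = 0$

$$\frac{\partial^{r_1}}{\partial u_1^{r_1}} \cdots \frac{\partial^{r_s}}{\partial u_s^{r_s}} M_T(u) \Big|_{u=0}$$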

Moment-generating function of exponential family

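$$\begin{aligned}
M_\eta^{T(X)}(u) &= \mathbb{E}_\eta\left[e^{u^\top T(X)}\right] \\
&= \int e^{u^\top T} e^{\eta^\top T - A(\eta)} h\, d\mu \\
&= e^{A(\eta+u) - A(\eta)} \underbrace{\int e^{(\eta+u)^\top T - A(\eta+u)} h\, d\mu}_{=1} \\
&= e^{A(\eta+u) - A(\eta)}
\end{aligned}$$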


Hence, the cumulant generating function is

𝐾𝑇 (𝑢) = 𝐴(𝑢 + 𝜂) − 𝐴(𝜂)

Relationship between the moments and cumulants

For 𝑠 = 1, from 𝑀 = 𝑒𝐾 , we get

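$$\begin{aligned}
M' &= K' e^K &&\Rightarrow\ \mathbb{E}[T] = \kappa_1 \\
M'' &= (K'' + K'^2) e^K &&\Rightarrow\ \mathbb{E}[T^2] = \kappa_2 + \kappa_1^2 \\
M''' &= (K''' + 3K'K'' + K'^3) e^K &&\Rightarrow\ \mathbb{E}[T^3] = \kappa_3 + 3\kappa_1\kappa_2 + \kappa_1^3
\end{aligned}$$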
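
These relations can be checked on the Poisson family, where K(u) = A(η + u) − A(η) = λ(e^u − 1), so every cumulant equals λ. The sketch below computes the first three raw moments from a truncated sum over the pmf and compares them with the cumulant formulas. The truncation point and function names are assumptions of this sketch.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson pmf, computed on the log scale for numerical stability."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def raw_moment(r: int, lam: float, cutoff: int = 200) -> float:
    """E[X^r] for X ~ Poisson(lam), via a truncated sum (tail is negligible here)."""
    return sum(x**r * poisson_pmf(x, lam) for x in range(cutoff))

lam = 2.0
k1 = k2 = k3 = lam  # all Poisson cumulants equal lam

m1 = raw_moment(1, lam)
m2 = raw_moment(2, lam)
m3 = raw_moment(3, lam)

# Moment-cumulant relations for s = 1.
pred1 = k1
pred2 = k2 + k1**2
pred3 = k3 + 3.0 * k1 * k2 + k1**3

print(m1, pred1)
print(m2, pred2)
print(m3, pred3)
```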

Example 2.11: moments of normal

• Unknown 𝜇, but known 𝜎2


• Unknown 𝜇 and 𝜎2
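
A sketch for the first case (unknown $\mu$, known $\sigma^2$), using one possible choice of $T$, $\eta$ and $h$; this is an illustration under that choice, not necessarily the presentation given in class:

```latex
% One-parameter form with T(x) = x:
p_\eta(x) = \exp\left[\eta x - A(\eta)\right] h(x),
\qquad \eta = \frac{\mu}{\sigma^2},
\qquad h(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}},
\qquad A(\eta) = \frac{\sigma^2 \eta^2}{2}.

% Cumulant generating function of T(X) = X:
K(u) = A(\eta + u) - A(\eta) = \sigma^2 \eta u + \frac{\sigma^2 u^2}{2} = \mu u + \frac{\sigma^2 u^2}{2}

% Cumulants and the resulting moments:
\kappa_1 = \mu, \quad \kappa_2 = \sigma^2, \quad \kappa_r = 0 \ (r \ge 3)
\quad\Longrightarrow\quad
\mathbb{E}[X] = \mu, \quad
\mathbb{E}[X^2] = \sigma^2 + \mu^2, \quad
\mathbb{E}[X^3] = 3\mu\sigma^2 + \mu^3.
```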

Proof:

Summary of useful properties of exponential families

𝑝𝜂 (𝑥) = exp (𝜂⊤ 𝑇 (𝑥) − 𝐴(𝜂)) ℎ(𝑥)

1. The natural parameter space is convex
2. The joint density of $n$ i.i.d. observations from an exponential family is still in an exponential family
3. $T(x)$ is a sufficient statistic
4. $A(\eta)$ is infinitely differentiable on the interior of the natural parameter space (Theorem 2.4): easy to get moments

What is next?

• Sufficiency
• Factorization theorem
• Minimal sufficiency

Thank you
