The Exponential Family of Distributions
$$p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)}$$
θ vector of parameters
T(x) vector of "sufficient statistics"
A(θ) cumulant generating function
h(x) base measure, a function of x alone
Key point: x and θ only "mix" in $e^{\theta^\top T(x)}$
1
The Exponential Family of Distributions
$$p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)}$$
so, because $p(x)$ must integrate to one,
$$e^{A(\theta)} = \int h(x)\, e^{\theta^\top T(x)}\, dx.$$
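As a quick numerical illustration of this normalization, the sketch below uses a hypothetical toy family that is not taken from the slides: a finite support {0, 1, 2, 3} with h(x) = 1 and T(x) = x, so the integral becomes a sum. The names `support`, `h`, `T`, `log_partition` and `p` are illustrative only.

```python
import math

# Hypothetical toy family (an assumption for illustration only):
# finite support, h(x) = 1, T(x) = x, scalar theta.
support = [0, 1, 2, 3]

def h(x):
    return 1.0

def T(x):
    return float(x)

def log_partition(theta):
    """A(theta) = log sum_x h(x) exp(theta * T(x)); the sum replaces the integral."""
    return math.log(sum(h(x) * math.exp(theta * T(x)) for x in support))

def p(x, theta):
    """p(x) = h(x) exp(theta * T(x) - A(theta))."""
    return h(x) * math.exp(theta * T(x) - log_partition(theta))

theta = -0.4
total = sum(p(x, theta) for x in support)
print(total)  # should be 1 up to floating-point rounding
```

Subtracting A(θ) in the exponent is exactly what makes the probabilities sum to one, whatever value of θ is chosen.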
2
Examples
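Gaussian $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\|x-\mu\|^2/(2\sigma^2)}$, $x \in \mathbb{R}$
Bernoulli $p(x) = \alpha^x (1-\alpha)^{1-x}$, $x \in \{0, 1\}$
Binomial $p(x) = \binom{n}{x}\, \alpha^x (1-\alpha)^{n-x}$, $x \in \{0, 1, 2, \dots, n\}$
Multinomial $p(x) = \frac{n!}{x_1!\, x_2! \cdots x_n!} \prod_{i=1}^{n} \alpha_i^{x_i}$, $x_i \in \{0, 1, 2, \dots, n\}$, $\sum_i x_i = n$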
3
Natural Parameter form for Bernoulli
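$$p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)}$$
$$\begin{aligned}
p(x) &= \alpha^x (1-\alpha)^{1-x} \\
&= \exp\left[\log\left(\alpha^x (1-\alpha)^{1-x}\right)\right] \\
&= \exp\left[x \log\alpha + (1-x)\log(1-\alpha)\right] \\
&= \exp\left[x \log\frac{\alpha}{1-\alpha} + \log(1-\alpha)\right] \\
&= \exp\left[x\theta - \log\left(1 + e^{\theta}\right)\right]
\end{aligned}$$
so
$$T(x) = x, \qquad \theta = \log\frac{\alpha}{1-\alpha}, \qquad A(\theta) = \log\left(1 + e^{\theta}\right)$$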
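The Bernoulli derivation can be checked numerically. The following is a minimal sketch, assuming h(x) = 1 (no separate factor of x appears in the derivation); the function names are illustrative and not from the slides.

```python
import math

def bernoulli_natural(alpha):
    """Natural parameter theta = log(alpha / (1 - alpha)), valid for 0 < alpha < 1."""
    return math.log(alpha / (1.0 - alpha))

def log_partition(theta):
    """A(theta) = log(1 + e^theta)."""
    return math.log1p(math.exp(theta))

def pmf_natural(x, theta):
    """p(x) = exp(theta * x - A(theta)), taking h(x) = 1 and T(x) = x."""
    return math.exp(theta * x - log_partition(theta))

alpha = 0.3
theta = bernoulli_natural(alpha)
for x in (0, 1):
    standard = alpha ** x * (1.0 - alpha) ** (1 - x)
    # The two columns should agree up to floating-point rounding.
    print(x, standard, pmf_natural(x, theta))
```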
4
Natural Parameter Form for Gaussian
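$$\begin{aligned}
p(x) &= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)} \\
&= \frac{1}{\sqrt{2\pi}} \exp\left(-\log\sigma - \frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2} - \frac{\mu^2}{2\sigma^2}\right) \\
&= \underbrace{\frac{1}{\sqrt{2\pi}}}_{h(x)} \exp\Big(\theta^\top T(x) - \underbrace{\big(\log\sigma + \mu^2/(2\sigma^2)\big)}_{A(\theta)}\Big)
\end{aligned}$$
where
$$T(x) = \begin{bmatrix} x \\ x^2 \end{bmatrix}, \qquad \theta = \begin{bmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{bmatrix}$$
$$A(\theta) = \frac{\mu^2}{2\sigma^2} + \log\sigma = -\frac{[\theta]_1^2}{4[\theta]_2} - \frac{1}{2}\log\left(-2[\theta]_2\right)$$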
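To confirm that the two expressions for A(θ) agree and that the natural-parameter form reproduces the usual density, here is a small sketch. The function names and the test values of µ and σ are arbitrary choices for illustration.

```python
import math

def gaussian_natural(mu, sigma):
    """theta = (mu / sigma^2, -1 / (2 sigma^2))."""
    return (mu / sigma**2, -1.0 / (2.0 * sigma**2))

def log_partition(theta):
    """A(theta) = -theta_1^2 / (4 theta_2) - (1/2) log(-2 theta_2); requires theta_2 < 0."""
    t1, t2 = theta
    return -t1**2 / (4.0 * t2) - 0.5 * math.log(-2.0 * t2)

def pdf_natural(x, theta):
    """h(x) exp(theta . T(x) - A(theta)) with h(x) = 1/sqrt(2 pi) and T(x) = (x, x^2)."""
    t1, t2 = theta
    h = 1.0 / math.sqrt(2.0 * math.pi)
    return h * math.exp(t1 * x + t2 * x**2 - log_partition(theta))

def pdf_standard(x, mu, sigma):
    """The usual Gaussian density, for comparison."""
    return math.exp(-(x - mu)**2 / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)

mu, sigma = 1.5, 0.8          # arbitrary illustrative values
theta = gaussian_natural(mu, sigma)
print(log_partition(theta), mu**2 / (2.0 * sigma**2) + math.log(sigma))
print(pdf_natural(0.9, theta), pdf_standard(0.9, mu, sigma))
```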
5
Natural Parameter Form for Multivariate Gaussian
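$$p(x) = h(x)\, e^{\theta^\top T(x) - A(\theta)}$$
$$p(x) = \frac{1}{(2\pi)^{D/2}\, |\Sigma|^{1/2}}\, e^{-(x-\mu)^\top \Sigma^{-1} (x-\mu)/2}$$
$$h(x) = (2\pi)^{-D/2}, \qquad T(x) = \begin{bmatrix} x \\ x x^\top \end{bmatrix}, \qquad \theta = \begin{bmatrix} \Sigma^{-1}\mu \\ -\tfrac{1}{2}\Sigma^{-1} \end{bmatrix}$$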
6
The first derivative of A(θ)
$$A(\theta) = \log \Big[ \underbrace{\int h(x)\, e^{\theta^\top T(x)}\, dx}_{Q(\theta)} \Big]$$
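One way to carry out the differentiation, written with the same Q(θ) shorthand, is sketched below. It assumes that differentiation and integration can be exchanged, which is an assumption not stated on the slide.

```latex
\begin{aligned}
\frac{dA(\theta)}{d\theta}
  &= \frac{d}{d\theta} \log Q(\theta)
   = \frac{Q'(\theta)}{Q(\theta)} \\
  &= \frac{\int h(x)\, e^{\theta^\top T(x)}\, T(x)\, dx}
          {\int h(x)\, e^{\theta^\top T(x)}\, dx} \\
  &= \int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, T(x)\, dx
   = E_{p_\theta}\left[ T(x) \right]
\end{aligned}
```

The last step uses $Q(\theta) = e^{A(\theta)}$, so dividing by $Q(\theta)$ turns the integrand into $p(x)\, T(x)$.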
7
The second derivative of A(θ)
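$$A(\theta) = \log \Big[ \underbrace{\int h(x)\, e^{\theta^\top T(x)}\, dx}_{Q(\theta)} \Big]$$
$$\begin{aligned}
\frac{d^2 A(\theta)}{d\theta^2}
&= \frac{d}{d\theta}\left[\frac{Q'(\theta)}{Q(\theta)}\right]
 = \frac{d}{d\theta}\left[Q'(\theta)\, \frac{1}{Q(\theta)}\right]
 = \frac{Q''(\theta)}{Q(\theta)} - \frac{(Q'(\theta))^2}{(Q(\theta))^2} \\
&= \frac{\int h(x)\, e^{\theta^\top T(x)}\, T^2(x)\, dx}{\int h(x)\, e^{\theta^\top T(x)}\, dx} - \left(E_{p_\theta}[T(x)]\right)^2 \\
&= \frac{\int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, T^2(x)\, dx}{\int h(x)\, e^{\theta^\top T(x) - A(\theta)}\, dx} - \left(E_{p_\theta}[T(x)]\right)^2 \\
&= E_{p_\theta}\left[T^2(x)\right] - \left(E_{p_\theta}[T(x)]\right)^2 = \mathrm{Cov}_{p_\theta}[T(x)] \succeq 0.
\end{aligned}$$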
8
Maximum Likelihood
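$$\ell(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta) = \sum_{i=1}^{N} \left[ \log h(x_i) + \theta^\top T(x_i) - A(\theta) \right]$$
To find the maximum likelihood solution, set the derivative to zero:
$$\ell'(\theta) = \left[ \sum_{i=1}^{N} T(x_i) \right] - N A'(\theta) = 0$$
So the ML solution satisfies
$$A'(\hat{\theta}_{ML}) = \frac{1}{N} \sum_{i=1}^{N} T(x_i)$$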
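As an illustration of solving this moment-matching condition, the sketch below fits the Bernoulli natural parameter with Newton's method, using A'(θ) = 1/(1 + e^{-θ}) and A''(θ) = A'(θ)(1 − A'(θ)). The data list and function names are made up for the example, and Newton's method is just one possible solver. For the Bernoulli the answer is also available in closed form, which makes the result easy to check.

```python
import math

def sigmoid(t):
    """A'(theta) for the Bernoulli: the mean of T(x) = x."""
    return 1.0 / (1.0 + math.exp(-t))

def fit_natural_parameter(data, iters=50, tol=1e-12):
    """Solve A'(theta) = (1/N) sum_i T(x_i) by Newton's method.

    Assumes the sample mean lies strictly between 0 and 1; otherwise no
    finite solution exists.
    """
    t_bar = sum(data) / len(data)        # (1/N) sum_i T(x_i), with T(x) = x
    theta = 0.0
    for _ in range(iters):
        mean = sigmoid(theta)
        step = (t_bar - mean) / (mean * (1.0 - mean))   # residual / A''(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta

data = [1, 0, 1, 1, 0, 1, 0, 1]          # illustrative observations
theta_hat = fit_natural_parameter(data)
sample_mean = sum(data) / len(data)
print(theta_hat, math.log(sample_mean / (1.0 - sample_mean)))
```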
9
Products
10
Conjugate Priors in Bayesian Statistics
$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta}$$
Note: denominator not a function of θ ⇒ just normalizing term
11
Example: Dirichlet and Multinomial
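$$p(\theta) = \frac{\Gamma\left(\sum_i \alpha_i\right)}{\prod_i \Gamma(\alpha_i)} \prod_i \theta_i^{\alpha_i - 1} \qquad \text{Dirichlet in } \theta$$
where $\Gamma(x) = (x-1)!$ for positive integer $x$.
$$p(x \mid \theta) = \frac{\left(\sum_i x_i\right)!}{x_1!\, x_2! \cdots x_n!} \prod_{i=1}^{n} \theta_i^{x_i} \qquad \text{Multinomial in } x$$
$$p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta) = \text{junk} \times \prod_i \theta_i^{x_i + \alpha_i - 1}$$
Here "junk" collects the factors that do not depend on θ, so the posterior is again a Dirichlet in θ, with parameters $x_i + \alpha_i$.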
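In code, the conjugate update amounts to adding the observed counts to the prior parameters. The following is a minimal sketch with made-up numbers. The helper names are illustrative, and the mean formula α_i / Σ_j α_j is the standard Dirichlet mean, which the slides do not state.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters: alpha_i + x_i for each category i."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + x for a, x in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Mean of a Dirichlet: alpha_i / sum_j alpha_j (standard result)."""
    total = sum(alpha)
    return [a / total for a in alpha]

prior = [1.0, 1.0, 1.0]      # illustrative uniform prior over three categories
counts = [3, 0, 7]           # illustrative multinomial counts x_i
posterior = dirichlet_posterior(prior, counts)
print(posterior)             # [4.0, 1.0, 8.0]
print(dirichlet_mean(posterior))
```

No integral has to be evaluated: the normalizing constant is fixed by recognizing the Dirichlet form.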
12
Conjugate Pairs
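Prior                                                                                          Conditional
Gaussian $e^{-\|\mu-\mu_0\|^2/(2\sigma^2)}$                                                     Gaussian $e^{-\|x-\mu\|^2/(2\sigma^2)}$
Beta $\frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}\, \alpha^{r-1} (1-\alpha)^{s-1}$                   Bernoulli $\alpha^x (1-\alpha)^{1-x}$
Dirichlet $\frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i \theta_i^{\alpha_i-1}$   Multinomial $\frac{(\sum_i x_i)!}{\prod_i x_i!} \prod_i \theta_i^{x_i}$
Inv. Wishart                                                                                   Gaussian (cov)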
13