The Likelihood, The Prior and Bayes Theorem
Douglas Nychka,
www.image.ucar.edu/~nychka
Supported by the National Science Foundation NCAR/IMAGe summer school Jul 2007
The big picture
L(Y, θ) or [Y |θ]
the conditional density of the data given the parameters.
[Figure: plot labeled Y; horizontal axis "theta", ticks from −4 to 6.]
Data is ‘unlikely’ under the dashed density.
Some likelihood examples.
Y = θ + N (0, 1)
Likelihood:
$$L(Y, \theta) = \frac{1}{\sqrt{\pi}}\, e^{-\frac{(Y-\theta)^2}{2}}$$
Minus log likelihood:
$$-\log(L(Y, \theta)) = \frac{(Y-\theta)^2}{2} + \log(\pi)/2$$
The −log likelihood is usually simpler algebraically.
Note the foreshadowing of a squared term here!
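As a minimal sketch of this first example, the snippet below evaluates the minus log likelihood on a grid of θ values and picks the smallest one. The observation `y_obs` and the grid are made-up illustrative values, and the additive constant, which does not depend on θ, is dropped.

```python
def neg_log_likelihood(y, theta):
    """Minus log likelihood for Y = theta + N(0, 1), without the additive constant."""
    return 0.5 * (y - theta) ** 2


y_obs = 1.5  # hypothetical observation
thetas = [i / 100.0 for i in range(-400, 601)]  # grid from -4 to 6

# The grid value with the smallest minus log likelihood is the most "likely" theta.
best_theta = min(thetas, key=lambda t: neg_log_likelihood(y_obs, t))
print(best_theta)
```

Because the only θ-dependent term is the squared difference, the grid minimizer coincides with the observation itself.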
Temperature station data
Observational model:
$$Y_j = T(x_j) + N(0, \sigma^2) \quad \text{for } j = 1, \ldots, N$$
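$$L(Y, \theta) = \frac{1}{\sqrt{\pi}\,\sigma}\, e^{-\frac{(Y_1 - T(x_1))^2}{2\sigma^2}} \times \frac{1}{\sqrt{\pi}\,\sigma}\, e^{-\frac{(Y_2 - T(x_2))^2}{2\sigma^2}} \times \cdots \times \frac{1}{\sqrt{\pi}\,\sigma}\, e^{-\frac{(Y_N - T(x_N))^2}{2\sigma^2}}$$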
Combining the factors:
$$L(Y, \theta) = \frac{1}{(\sqrt{\pi}\,\sigma)^N}\, e^{-\frac{\sum_{j=1}^{N} (Y_j - T(x_j))^2}{2\sigma^2}}$$
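A small sketch of the station likelihood may help: the minus log of the combined expression is a sum of squared residuals scaled by 2σ², plus N log(σ) and a constant. The station locations, readings, and the two candidate temperature fields below are hypothetical, and the constant that depends on neither T nor σ is omitted.

```python
import math


def neg_log_likelihood(y, x, T, sigma):
    """Minus log likelihood of readings y at locations x for a candidate field T.

    The additive constant that does not depend on T or sigma is omitted.
    """
    n = len(y)
    sum_sq = sum((yj - T(xj)) ** 2 for yj, xj in zip(y, x))
    return sum_sq / (2.0 * sigma ** 2) + n * math.log(sigma)


# Hypothetical station locations and temperature readings.
x = [0.0, 1.0, 2.0, 3.0]
y = [10.1, 11.9, 14.2, 15.8]


def T_trend(xj):
    """Hypothetical candidate field with a linear trend."""
    return 10.0 + 2.0 * xj


def T_flat(xj):
    """Hypothetical candidate field that is constant in space."""
    return 10.0


nll_trend = neg_log_likelihood(y, x, T_trend, sigma=1.0)
nll_flat = neg_log_likelihood(y, x, T_flat, sigma=1.0)
print(nll_trend, nll_flat)
```

The field that tracks the readings has the smaller minus log likelihood, which is the same thing as the larger likelihood.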
$$z_j = h_j(x_i) + N(0, R) \quad \text{for } j = 1, \ldots, N$$
and $x_i$ is determined by the dynamical model:
$$x_{i+1} = \Phi(x_i) + G(u)$$
$$L(Z, u) = \frac{1}{(\sqrt{\pi}\,R)^N}\, e^{-\frac{\sum_{i,j} (z_j - h_j(x_i))^2}{2R^2}}$$
Minus log likelihood:
$$-\log L(Z, u) = \sum_{i=0}^{I-1} \sum_{j=1}^{N} \frac{(z_j - h_j(x_i))^2}{2R^2} + (N/2)\log(\pi) + N\log(R)$$
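To make the dependence on u concrete, here is a toy sketch in which the states are generated by the dynamical model and then compared with the observations. Everything specific is an assumption for illustration: a scalar state, the particular `phi`, `g`, and `h`, observations stored as one list per time step, and R used as a standard deviation as in the 2R² denominator above. Terms that do not depend on u are dropped.

```python
def simulate(x0, u, n_steps, phi, g):
    """Run x_{i+1} = phi(x_i) + g(u) and return [x_0, ..., x_{n_steps - 1}]."""
    states = [x0]
    for _ in range(n_steps - 1):
        states.append(phi(states[-1]) + g(u))
    return states


def neg_log_likelihood(z, x0, u, phi, g, h, R):
    """Sum of squared misfits over time steps i and observations j, over 2 R^2."""
    states = simulate(x0, u, len(z), phi, g)
    total = 0.0
    for x_i, z_i in zip(states, z):
        for z_ij in z_i:
            total += (z_ij - h(x_i)) ** 2
    return total / (2.0 * R ** 2)


# Hypothetical toy model: linear decay plus a constant source, direct observation.
phi = lambda x: 0.9 * x
g = lambda u: u
h = lambda x: x
R = 0.5

# Noise-free synthetic observations generated with u = 1.0, one per time step.
z_obs = [[s] for s in simulate(0.0, 1.0, 4, phi, g)]

nll_true = neg_log_likelihood(z_obs, 0.0, 1.0, phi, g, h, R)
nll_wrong = neg_log_likelihood(z_obs, 0.0, 0.5, phi, g, h, R)
print(nll_true, nll_wrong)
```

The point of the sketch is that u enters only through the simulated states, so every evaluation of the minus log likelihood requires a run of the dynamical model.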
Posterior
The conditional density of the parameters given the
data
Back to first example
Y = θ + N(0,1)
Likelihood
$$[Y|\theta] = \frac{1}{\sqrt{\pi}}\, e^{-\frac{(Y-\theta)^2}{2}}$$
Prior for θ
$$[\theta] = \frac{1}{\sqrt{\pi}\,\sigma}\, e^{-\frac{(\theta-\mu)^2}{2\sigma^2}}$$
Posterior
$$[\theta|Y] \propto \frac{1}{\sqrt{\pi}}\, e^{-\frac{(Y-\theta)^2}{2}} \cdot \frac{1}{\sqrt{\pi}\,\sigma}\, e^{-\frac{(\theta-\mu)^2}{2\sigma^2}}$$
Simplifying the posterior for Gaussian-Gaussian
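The results that follow come from completing the square in θ in the exponent of the posterior. The steps are written out here as a sketch of the standard derivation; "constant" collects terms that do not involve θ.

$$
\begin{aligned}
\frac{(Y-\theta)^2}{2} + \frac{(\theta-\mu)^2}{2\sigma^2}
&= \frac{1+\sigma^2}{2\sigma^2}\,\theta^2 - \frac{Y\sigma^2+\mu}{\sigma^2}\,\theta + \text{constant} \\
&= \frac{1+\sigma^2}{2\sigma^2}\left(\theta - \frac{Y\sigma^2+\mu}{1+\sigma^2}\right)^2 + \text{constant}
\end{aligned}
$$

So $[\theta|Y]$ is Gaussian in θ, centered at $(Y\sigma^2+\mu)/(1+\sigma^2)$ with variance $\sigma^2/(1+\sigma^2)$.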
posterior mean:
$$Y^* = (Y\sigma^2 + \mu)/(1 + \sigma^2)$$
$$Y^* = \mu + (Y - \mu)\,\sigma^2/(1 + \sigma^2)$$
Remember that θ has prior mean µ
posterior variance:
$$(\sigma^*)^2 = \sigma^2/(1 + \sigma^2)$$
$$(\sigma^*)^2 = \sigma^2 - \sigma^4/(1 + \sigma^2)$$
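The update can be sketched in a few lines of Python. The function name and the numerical inputs are illustrative assumptions; the formulas are the posterior mean and variance for one observation with unit noise variance and a N(µ, σ²) prior.

```python
def gaussian_posterior(y, mu, sigma2):
    """Posterior mean and variance of theta for Y = theta + N(0, 1), theta ~ N(mu, sigma2)."""
    weight = sigma2 / (1.0 + sigma2)   # how far the mean moves from mu toward y
    post_mean = mu + (y - mu) * weight
    post_var = sigma2 / (1.0 + sigma2)
    return post_mean, post_var


# Hypothetical observation and prior mean, with three choices of prior variance.
y_obs, mu = 2.0, 0.0
for sigma2 in (100.0, 1.0, 0.01):
    mean, var = gaussian_posterior(y_obs, mu, sigma2)
    print(sigma2, mean, var)
```

With a large prior variance the posterior mean sits close to the observation; with a small prior variance it stays close to µ. The posterior variance is always smaller than both the prior variance and the observation variance of 1.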
[Figure panels plotted against theta (axis from −4 to 6): "Prior not very informative"; "Prior is informative".]
-log posterior
$$\frac{(Y-\theta)^2}{2} + \frac{(\theta-\mu)^2}{2\sigma^2} + \text{constant}$$
−log likelihood + −log prior
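As a quick numerical check, the sketch below minimizes the −log posterior on a grid and compares the result with the closed-form posterior mean. The observation, prior mean, prior variance, and grid are illustrative assumptions, and the constant is dropped.

```python
def neg_log_posterior(theta, y, mu, sigma2):
    """-log likelihood plus -log prior, without the additive constant."""
    return 0.5 * (y - theta) ** 2 + 0.5 * (theta - mu) ** 2 / sigma2


y_obs, mu, sigma2 = 2.0, 0.0, 1.0  # hypothetical values
thetas = [i / 1000.0 for i in range(-4000, 6001)]  # grid from -4 to 6

# Grid search for the posterior mode.
theta_map = min(thetas, key=lambda t: neg_log_posterior(t, y_obs, mu, sigma2))

# Closed-form posterior mean for the Gaussian-Gaussian case.
closed_form = mu + (y_obs - mu) * sigma2 / (1.0 + sigma2)
print(theta_map, closed_form)
```

For this Gaussian case the posterior mode and the posterior mean agree, so minimizing the sum of the two squared terms recovers the same estimate as the formula.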
Use climatology!
Posterior = $N(\hat{T}, P^a)$
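-log posterior
$$\sum_{i=0}^{I-1} \sum_{j=1}^{N} (z_j - h_j(x_i))^T R_j^{-1} (z_j - h_j(x_i))/2 \;+\; \sum_{k=1}^{K} (u_k - \mu_{u,k})^T P_{u,k}^{-1} (u_k - \mu_{u,k})/2 \;+\; \text{other stuff}$$
The first double sum is the −log likelihood; the second sum is the −log prior on u.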
Dynamical constraint:
Given the time-varying sources and initial concentrations, all the subsequent concentrations are found by the dynamical model.
Due to linear properties of tracers
X = ΩU
Posterior:
Easy to write down but difficult to compute. Focus on finding where the posterior is maximized.
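When the model is linear and everything is Gaussian, the maximizer has a closed form. The sketch below is a deliberately tiny case under stated assumptions: a single scalar source `u`, concentrations `x_j = omega_j * u` standing in for X = ΩU, direct observations of each concentration with noise variance `r`, and a N(`mu_u`, `p_u`) prior on the source. All names and numbers are hypothetical.

```python
def neg_log_posterior(u, z, omega, r, mu_u, p_u):
    """Quadratic -log posterior (constants dropped) for a scalar source u."""
    misfit = sum((zj - wj * u) ** 2 for zj, wj in zip(z, omega)) / (2.0 * r)
    prior = (u - mu_u) ** 2 / (2.0 * p_u)
    return misfit + prior


def map_source(z, omega, r, mu_u, p_u):
    """Minimizer of neg_log_posterior, found by setting its derivative to zero."""
    precision = sum(wj * wj for wj in omega) / r + 1.0 / p_u
    rhs = sum(wj * zj for wj, zj in zip(omega, z)) / r + mu_u / p_u
    return rhs / precision


omega = [1.0, 2.0, 3.0]   # hypothetical response of each concentration to the source
z = [2.0, 4.0, 6.0]       # noise-free observations generated with u = 2
u_vague = map_source(z, omega, r=1.0, mu_u=0.0, p_u=1e6)   # prior barely matters
u_tight = map_source(z, omega, r=1.0, mu_u=0.0, p_u=1e-6)  # prior dominates
print(u_vague, u_tight)
```

In realistic problems u and the observations are high-dimensional and the same minimization becomes a large linear-algebra or optimization problem, which is why computing the posterior is the hard part.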
A future direction is to draw samples from the posterior.