Week 10
Andrew Thangaraj
IIT Madras
Bayesian estimation
X1 , . . . , Xn ∼ iid X , parameter θ
X1 , . . . , Xn ∼ iid Bernoulli(p)
Suppose that p ∼ {0.25, 0.75} with prior probabilities 0.5 and 0.5
Samples: 1, 0, 1, 1, 0
I Notation: S = (X1 = 1, X2 = 0, X3 = 1, X4 = 1, X5 = 0)
I Estimate using Bayes’ rule
F P(p = 0.25|S) = P(S|p = 0.25)P(p = 0.25)/P(S) = 0.25³ × 0.75² × 0.5/P(S) = 0.25
F P(p = 0.75|S) = 0.75³ × 0.25² × 0.5/P(S) = 0.75
F P(S) = 0.25³ × 0.75² × 0.5 + 0.75³ × 0.25² × 0.5 = 0.25² × 0.75² × 0.5
I Estimator 1: Since P(p = 0.75|S) > P(p = 0.25|S), we could estimate p̂ = 0.75
I Estimator 2: Posterior mean, p̂ = 0.25 P(p = 0.25|S) + 0.75 P(p = 0.75|S) = 0.625
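A minimal sketch of this calculation (the function and variable names are illustrative, not from the slides): it applies Bayes' rule to the two-point prior and reports both Estimator 1 (the most probable value of p) and Estimator 2 (the posterior mean).

```python
# Bayes' rule with a two-point prior on the Bernoulli parameter p.
def two_point_posterior(samples, p_values, prior):
    """Posterior probabilities over p_values, given Bernoulli samples."""
    w, n = sum(samples), len(samples)          # number of 1s and sample size
    joint = [p**w * (1 - p)**(n - w) * q       # P(S | p) * P(p)
             for p, q in zip(p_values, prior)]
    evidence = sum(joint)                      # P(S)
    return [j / evidence for j in joint]

samples = [1, 0, 1, 1, 0]
p_values = [0.25, 0.75]
posterior = two_point_posterior(samples, p_values, prior=[0.5, 0.5])
print(posterior)                               # [0.25, 0.75]

# Estimator 1: candidate p with the largest posterior probability
p_map = p_values[posterior.index(max(posterior))]
# Estimator 2: posterior mean
p_mean = sum(p * q for p, q in zip(p_values, posterior))
print(p_map, p_mean)                           # 0.75 0.625
```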
X1 , . . . , Xn ∼ iid Bernoulli(p)
Suppose that p ∼ {0.25, 0.75} with prior probabilities 0.9 and 0.1
Samples: 1, 0, 1, 1, 0
I Notation: S = (X1 = 1, X2 = 0, X3 = 1, X4 = 1, X5 = 0)
I Estimate using Bayes’ rule
F P(p = 0.25|S) = P(S|p = 0.25)P(p = 0.25)/P(S) = 0.25³ × 0.75² × 0.9/P(S) = 0.75
F P(p = 0.75|S) = 0.75³ × 0.25² × 0.1/P(S) = 0.25
F P(S) = 0.25³ × 0.75² × 0.9 + 0.75³ × 0.25² × 0.1 = 0.25² × 0.75² × 0.3
I Estimator 1: Since P(p = 0.25|S) > P(p = 0.75|S), we estimate
p̂ = 0.25
I Estimator 2: Posterior mean,
p̂ = 0.25 P(p = 0.25|S) + 0.75 P(p = 0.75|S) = 0.375
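The same Bayes'-rule arithmetic, only with the informative prior; a standalone sketch (illustrative names) reproduces the posterior 0.75/0.25 and the posterior-mean estimate 0.375.

```python
# Same Bayes'-rule arithmetic with the informative prior (0.9, 0.1).
samples = [1, 0, 1, 1, 0]
w, n = sum(samples), len(samples)

p_values, prior = [0.25, 0.75], [0.9, 0.1]
joint = [p**w * (1 - p)**(n - w) * q for p, q in zip(p_values, prior)]
posterior = [j / sum(joint) for j in joint]    # divide by P(S)
print(posterior)                               # approx. [0.75, 0.25]

p_mean = sum(p * q for p, q in zip(p_values, posterior))
print(p_mean)                                  # approx. 0.375
```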
X1 , . . . , Xn ∼ iid X , parameter Θ
Prior distribution
I Captures what we might know about the parameter
I This could be based on a scientific model or expert opinion
Flat, uninformative
I Nearly flat over the interval in which the parameter takes values
I This usually reduces to something close to maximum likelihood
Conjugate priors
I Pick a prior so that the posterior is in the same class as the prior
I Examples
F Prior: Normal and Posterior: Normal
F Prior: Beta and Posterior: Beta
Informative priors
I This needs some justification from the domain of the problem
I Parameterize the prior so that its flatness can be controlled
X1 , . . . , Xn ∼ iid Bernoulli(p)
Posterior mean estimate with a Uniform[0, 1] prior on p:
p̂ = (X1 + · · · + Xn + 1)/(n + 2)
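A quick sketch of this estimate on the running samples 1, 0, 1, 1, 0, compared with the maximum-likelihood estimate w/n (the comparison itself is only illustrative):

```python
# Posterior-mean estimate under a Uniform[0, 1] prior vs. maximum likelihood.
samples = [1, 0, 1, 1, 0]
w, n = sum(samples), len(samples)

p_bayes = (w + 1) / (n + 2)   # (X1 + ... + Xn + 1) / (n + 2)
p_ml = w / n                  # maximum-likelihood estimate for comparison
print(p_bayes, p_ml)          # 0.571... and 0.6
```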
X1 , . . . , Xn ∼ iid Bernoulli(p), prior p ∼ Beta(α, β)
Samples: x1 , . . . , xn
Posterior: p|(X1 = x1 , . . . , Xn = xn ) is continuous
I Posterior density ∝ P(X1 = x1 , . . . , Xn = xn |p = p)fp (p)
I Posterior density ∝ p^(w+α−1) (1 − p)^(n−w+β−1) , 0 ≤ p ≤ 1
F w = x1 + · · · + xn : number of 1s in samples
Posterior mean estimate:
p̂ = (X1 + · · · + Xn + α)/(n + α + β)
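A sketch of the conjugate update (the prior parameters chosen here are illustrative): the posterior is Beta(w + α, n − w + β), and its mean equals the estimate above.

```python
from scipy.stats import beta

# Conjugate update: Beta(α, β) prior + Bernoulli samples -> Beta posterior.
samples = [1, 0, 1, 1, 0]
w, n = sum(samples), len(samples)
a0, b0 = 2.0, 2.0                        # illustrative prior parameters α, β

posterior = beta(w + a0, n - w + b0)     # Beta(w + α, n − w + β)
p_hat = (w + a0) / (n + a0 + b0)         # posterior-mean estimate
print(posterior.mean(), p_hat)           # both 0.555...
```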
Observations for Beta prior
Prior: Beta(α, β)
I α, β ≥ 0
I PDF ∝ p^(α−1) (1 − p)^(β−1) , 0 < p < 1
I How to pick α, β?
α = β = 1: Uniform[0, 1]
I Flat prior
I Estimate close to, but not equal to, Maximum-Likelihood
α=β=0
I Estimate coincides with Maximum-Likelihood
α=β
I Symmetric prior
α, β may depend on n, the number of samples
I α = β = √n/2 is an interesting choice
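A small sketch comparing the three choices of α = β above on the running samples (values illustrative):

```python
from math import sqrt

# Posterior-mean estimate for the three choices of α = β discussed above.
samples = [1, 0, 1, 1, 0]
w, n = sum(samples), len(samples)

for a in (1.0, 0.0, sqrt(n) / 2):            # α = β = 1, 0, √n/2
    p_hat = (w + a) / (n + 2 * a)            # (w + α)/(n + α + β) with β = α
    print(f"alpha = beta = {a:.3f}  ->  p_hat = {p_hat:.4f}")
```

With α = β = 0 the estimate is exactly the maximum-likelihood value w/n, matching the observation above.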
X1 , . . . , Xn ∼ iid Normal(M, σ²), prior M ∼ Normal(µ0 , σ0²)
Posterior mean estimate:
µ̂ = [(X1 + · · · + Xn)/n] · [nσ0²/(nσ0² + σ²)] + µ0 · [σ²/(nσ0² + σ²)]
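A sketch of this estimate; the data, the sampling variance σ², and the prior parameters µ0, σ0² below are all illustrative choices, not from the slides.

```python
# Posterior-mean estimate of M: Normal(µ0, σ0²) prior, Normal(M, σ²) samples.
samples = [5.1, 4.8, 5.4, 5.0, 4.9]        # illustrative data
n = len(samples)
sigma2 = 0.25                               # known sampling variance σ²
mu0, sigma0_2 = 4.0, 1.0                    # illustrative prior mean and variance

sample_mean = sum(samples) / n
w_data = n * sigma0_2 / (n * sigma0_2 + sigma2)     # weight nσ0²/(nσ0² + σ²)
mu_hat = w_data * sample_mean + (1 - w_data) * mu0  # (1 − w) = σ²/(nσ0² + σ²)
print(mu_hat)                               # shrunk from 5.04 toward µ0 = 4.0
```

The estimate is a weighted average of the sample mean and the prior mean µ0; more samples or a flatter prior (larger σ0²) shift the weight toward the data.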
Observations for Normal prior
n          0   1   2   3   4   5   6   7   8   9  10  11  12  13+
Frequency 14  30  36  68  43  43  30  14  10   6   4   1   1    0

n          1   2   3   4   5   6   7   8   9  10  11  12
Frequency 48  31  20   9   6   5   4   2   1   1   2   1
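If each table lists values n with how often each value was observed (an assumption, since the slides do not label the data), a short sketch like the following recovers the sample size and empirical mean of the first table:

```python
# Sample size and empirical mean from the first frequency table.
# Assumes each entry gives a value n and how many times it was observed.
freq = {0: 14, 1: 30, 2: 36, 3: 68, 4: 43, 5: 43, 6: 30,
        7: 14, 8: 10, 9: 6, 10: 4, 11: 1, 12: 1, 13: 0}   # "13+" has count 0

total = sum(freq.values())                           # number of observations
mean = sum(k * c for k, c in freq.items()) / total   # empirical mean
print(total, mean)
```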