
Bayesian estimation and hypothesis testing

Andrew Thangaraj

IIT Madras



Section 1

Bayesian estimation



Parameter estimation

X1, . . . , Xn ∼ iid X, parameter θ

Two schools of thought for design of estimators

Frequentist: treat θ as an unknown constant
- Method of moments
- Maximum likelihood

Bayesian: treat θ as a random variable with a known distribution
- Bayesian estimation


Example 1: Bernoulli(p)

X1, . . . , Xn ∼ iid Bernoulli(p)

Suppose that p ∼ Uniform{0.25, 0.75}
- Assume p is chosen first at random according to the above distribution
- Once p is chosen, the samples are drawn according to Bernoulli(p)

Samples: 1, 0, 1, 1, 0
- Notation: S = (X1 = 1, X2 = 0, X3 = 1, X4 = 1, X5 = 0)
- Estimate using Bayes’ rule
  - P(p = 0.25|S) = P(S|p = 0.25)P(p = 0.25)/P(S) = 0.25^3 × 0.75^2 × 0.5/P(S) = 0.25
  - P(p = 0.75|S) = P(S|p = 0.75)P(p = 0.75)/P(S) = 0.75^3 × 0.25^2 × 0.5/P(S) = 0.75
  - P(S) = 0.25^3 × 0.75^2 × 0.5 + 0.75^3 × 0.25^2 × 0.5 = 0.25^2 × 0.75^2 × 0.5
- Estimator 1 (posterior mode): since P(p = 0.75|S) > P(p = 0.25|S), we could estimate p̂ = 0.75
- Estimator 2 (posterior mean): p̂ = 0.25 P(p = 0.25|S) + 0.75 P(p = 0.75|S) = 0.625
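
The calculation above is mechanical, so it is easy to script. The following is a minimal Python sketch (ours, not part of the slides; the helper name bernoulli_posterior is our own) that reproduces these numbers for any finite prior on p:

def bernoulli_posterior(samples, prior):
    # prior: dict mapping a candidate value of p to its prior probability
    n, w = len(samples), sum(samples)           # w = number of 1s
    joint = {p: p**w * (1 - p)**(n - w) * q     # P(S|p) * P(p)
             for p, q in prior.items()}
    total = sum(joint.values())                 # P(S), by total probability
    return {p: j / total for p, j in joint.items()}

post = bernoulli_posterior([1, 0, 1, 1, 0], {0.25: 0.5, 0.75: 0.5})
print(post)                                     # approx {0.25: 0.25, 0.75: 0.75}
print(max(post, key=post.get))                  # posterior mode: 0.75
print(sum(p * q for p, q in post.items()))      # posterior mean: 0.625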


Example 2: Bernoulli(p)

X1, . . . , Xn ∼ iid Bernoulli(p)

Suppose that p = 0.25 with probability 0.9 and p = 0.75 with probability 0.1

Samples: 1, 0, 1, 1, 0
- Notation: S = (X1 = 1, X2 = 0, X3 = 1, X4 = 1, X5 = 0)
- Estimate using Bayes’ rule
  - P(p = 0.25|S) = P(S|p = 0.25)P(p = 0.25)/P(S) = 0.25^3 × 0.75^2 × 0.9/P(S) = 0.75
  - P(p = 0.75|S) = 0.75^3 × 0.25^2 × 0.1/P(S) = 0.25
  - P(S) = 0.25^3 × 0.75^2 × 0.9 + 0.75^3 × 0.25^2 × 0.1 = 0.25^2 × 0.75^2 × 0.3
- Estimator 1 (posterior mode): since P(p = 0.25|S) > P(p = 0.75|S), we estimate p̂ = 0.25
- Estimator 2 (posterior mean): p̂ = 0.25 P(p = 0.25|S) + 0.75 P(p = 0.75|S) = 0.375
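
Re-running the bernoulli_posterior sketch from Example 1 with this prior shows the effect of the prior on the estimate:

post = bernoulli_posterior([1, 0, 1, 1, 0], {0.25: 0.9, 0.75: 0.1})
print(post)                                     # approx {0.25: 0.75, 0.75: 0.25}
print(max(post, key=post.get))                  # posterior mode: 0.25
print(sum(p * q for p, q in post.items()))      # posterior mean: 0.375

The same data now yields p̂ = 0.25 under the mode rule: a strong prior toward 0.25 outweighs five samples.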


Bayesian estimation

X1, . . . , Xn ∼ iid X, parameter Θ

Prior distribution of Θ: Θ ∼ fΘ(θ)

Samples: x1, . . . , xn. Notation: S = (X1 = x1, . . . , Xn = xn)

Bayes’ rule: posterior ∝ likelihood × prior

P(Θ = θ|S) = P(S|Θ = θ) fΘ(θ)/P(S)

Estimation using the posterior distribution
- Posterior mode: θ̂ = arg max_θ P(S|Θ = θ) fΘ(θ)
- Posterior mean: θ̂ = E[Θ|S], the mean of the posterior distribution
  - (Θ|S) may be a known distribution, and its mean may reduce to a simple formula in some cases


Meaning of prior distribution

Prior distribution
- Captures what we might know about the parameter
- This could come from a scientific model or expert opinion

Posterior ∝ Likelihood of samples × Prior
- Intuitively understood as incorporating the data into the prior
- Useful in modeling

What if we do not know anything?
- Choose a flat prior, uniform over the entire range

There are long-running debates between frequentists and Bayesians
- Search for “frequentist vs Bayesian”


Section 2

Choice of prior and examples



How to pick a prior?

Flat, uninformative
- Nearly flat over the interval in which the parameter takes values
- This usually reduces to something close to maximum likelihood

Conjugate priors
- Pick a prior so that the posterior is in the same class as the prior
- Examples
  - Prior: Normal and Posterior: Normal
  - Prior: Beta and Posterior: Beta

Informative priors
- This needs some justification from the domain of the problem
- Parameterize the prior so that its flatness can be controlled


Bernoulli(p) samples with uniform prior

X1, . . . , Xn ∼ iid Bernoulli(p)

Prior p ∼ Uniform[0, 1], continuous distribution

Samples: x1, . . . , xn

Posterior: p|(X1 = x1, . . . , Xn = xn) is continuous
- Posterior density ∝ P(X1 = x1, . . . , Xn = xn|p) fp(p)
- Posterior density ∝ p^w (1 − p)^(n−w), 0 ≤ p ≤ 1
  - w = x1 + · · · + xn: number of 1s in the samples

Posterior density: Beta(w + 1, n − w + 1)
- Posterior mean = (w + 1)/((w + 1) + (n − w + 1)) = (w + 1)/(n + 2) = (x1 + · · · + xn + 1)/(n + 2)

p̂ = (X1 + · · · + Xn + 1)/(n + 2)
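
As a quick numerical check, the posterior can be handled with scipy (a sketch, assuming scipy is available as in a Colab sheet; the data below is illustrative):

from scipy.stats import beta

samples = [1, 0, 1, 1, 0]                  # illustrative data
n, w = len(samples), sum(samples)
posterior = beta(w + 1, n - w + 1)         # Beta(w+1, n-w+1), as derived above
print(posterior.mean())                    # (w+1)/(n+2) = 4/7, approx 0.571
print(posterior.interval(0.95))            # a 95% credible interval for p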


Bernoulli(p) samples with beta prior

X1, . . . , Xn ∼ iid Bernoulli(p)

Prior p ∼ Beta(α, β), continuous distribution
- fp(p) ∝ p^(α−1) (1 − p)^(β−1), 0 ≤ p ≤ 1

Samples: x1, . . . , xn

Posterior: p|(X1 = x1, . . . , Xn = xn) is continuous
- Posterior density ∝ P(X1 = x1, . . . , Xn = xn|p) fp(p)
- Posterior density ∝ p^(w+α−1) (1 − p)^(n−w+β−1), 0 ≤ p ≤ 1
  - w = x1 + · · · + xn: number of 1s in the samples

Posterior density: Beta(w + α, n − w + β)
- Posterior mean = (w + α)/((w + α) + (n − w + β)) = (w + α)/(n + α + β) = (x1 + · · · + xn + α)/(n + α + β)

p̂ = (X1 + · · · + Xn + α)/(n + α + β)
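
The conjugate update itself is one line. Here is a sketch (our helper, following the derivation above):

def beta_bernoulli_update(samples, alpha, beta_):
    n, w = len(samples), sum(samples)
    a_post, b_post = alpha + w, beta_ + n - w           # Beta(w+alpha, n-w+beta)
    return a_post, b_post, a_post / (a_post + b_post)   # posterior mean

print(beta_bernoulli_update([1, 0, 1, 1, 0], alpha=1, beta_=1))
# (4, 3, 0.571...): Beta(1, 1) = Uniform[0, 1], so this matches the previous slide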


Observations for Beta prior

Prior: Beta(α, β)
- α, β ≥ 0
- PDF ∝ p^(α−1) (1 − p)^(β−1), 0 < p < 1
- How to pick α, β?

α = β = 1: Uniform[0, 1]
- Flat prior
- Estimate close to, but not equal to, the maximum likelihood estimate

α = β = 0
- Estimate coincides with the maximum likelihood estimate

α = β
- Symmetric prior

α, β may depend on n, the number of samples
- α = β = √n/2 is an interesting choice
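
A small numeric illustration (ours) of these choices on illustrative data; α = β = 0 recovers the MLE w/n, while larger α = β shrinks the estimate toward 1/2:

import math

samples = [1, 0, 1, 1, 0]                  # illustrative data
n, w = len(samples), sum(samples)
for a in [0, 1, math.sqrt(n) / 2]:         # MLE, uniform, sqrt(n)/2 priors
    print(a, (w + a) / (n + 2 * a))        # posterior mean with alpha = beta = a
# a = 0 gives 0.600, a = 1 gives 0.571, a = sqrt(5)/2 gives about 0.569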


Normal samples with unknown mean and known variance

X1, . . . , Xn ∼ iid Normal(M, σ^2)

Prior M ∼ Normal(µ0, σ0^2), continuous distribution
- fM(µ) = (1/(√(2π) σ0)) exp(−(µ − µ0)^2 / (2σ0^2))

Samples: x1, . . . , xn. Sample mean: x̄ = (x1 + · · · + xn)/n

Posterior: M|(X1 = x1, . . . , Xn = xn) is continuous
- Posterior density ∝ f(X1 = x1, . . . , Xn = xn|M = µ) fM(µ)
- Posterior density ∝ exp(−((x1 − µ)^2 + · · · + (xn − µ)^2) / (2σ^2)) · exp(−(µ − µ0)^2 / (2σ0^2))

Posterior density: Normal
- Posterior mean = x̄ · nσ0^2/(nσ0^2 + σ^2) + µ0 · σ^2/(nσ0^2 + σ^2)

µ̂ = ((X1 + · · · + Xn)/n) · nσ0^2/(nσ0^2 + σ^2) + µ0 · σ^2/(nσ0^2 + σ^2)
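
A sketch (ours) of this Normal-Normal update; the posterior-variance line is standard for this conjugate pair but is not derived on the slide:

def normal_known_var_update(samples, sigma2, mu0, sigma0_2):
    n = len(samples)
    xbar = sum(samples) / n
    w = n * sigma0_2 / (n * sigma0_2 + sigma2)       # weight on the data
    post_mean = xbar * w + mu0 * (1 - w)             # the formula above
    post_var = sigma2 * sigma0_2 / (n * sigma0_2 + sigma2)
    return post_mean, post_var

print(normal_known_var_update([9.8, 10.4, 10.1], sigma2=1.0, mu0=10.0, sigma0_2=4.0))
# illustrative numbers: the posterior mean sits between xbar and mu0
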
Observations for Normal prior

Prior: Normal(µ0, σ0^2)
- How to pick µ0 and σ0?

Estimate is a combination of data and prior
- Prior is “updated” using data to get the posterior

If n is very large, µ̂ → sample mean
- Data dominates the estimate
- Prior plays no significant role

If n is small, the prior contributes significantly to the estimate
- Prior needs some justification when n is small

If the variance of the prior is large compared to the variance of the samples, the prior tends to be flat or uninformative
- The choice of the prior variance is important
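
A quick simulation (ours, reusing normal_known_var_update from the previous sketch) showing the prior washing out as n grows:

import random

random.seed(0)
mu_true, sigma = 5.0, 1.0
mu0, sigma0_2 = 0.0, 1.0                   # a deliberately wrong prior mean
for n in [1, 10, 100, 10000]:
    xs = [random.gauss(mu_true, sigma) for _ in range(n)]
    print(n, normal_known_var_update(xs, sigma**2, mu0, sigma0_2)[0])
# the posterior mean moves from near mu0 toward the sample mean as n grows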


Section 3

Problems: Finding estimators



Problem 1
Suppose X is a discrete random variable taking values {0, 1, 2, 3} with
respective probabilities {2θ/3, θ/3, 2(1 − θ)/3, (1 − θ)/3}, where 0 ≤ θ ≤ 1
is a parameter. Consider the estimation of θ from samples
2, 2, 0, 3, 1, 3, 2, 1, 2, 3.
Find the method of moments and maximum likelihood estimates.
Using a Uniform[0, 1] prior, find the posterior distribution and mean.



Working
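
One way to check the pen-and-paper answers numerically (a sketch, ours; the grid approach is generic and not the intended derivation):

import math

samples = [2, 2, 0, 3, 1, 3, 2, 1, 2, 3]

def pmf(k, theta):  # P(X = k) from the problem statement
    return (2*theta/3, theta/3, 2*(1 - theta)/3, (1 - theta)/3)[k]

# method of moments: E[X] = (7 - 6*theta)/3 for this pmf, so theta = (7 - 3*xbar)/6
xbar = sum(samples) / len(samples)
print((7 - 3 * xbar) / 6)                                   # approx 0.217

# MLE and posterior mean (Uniform[0, 1] prior) on a grid over theta
grid = [i / 10000 for i in range(1, 10000)]
like = [math.prod(pmf(x, t) for x in samples) for t in grid]
print(grid[max(range(len(grid)), key=lambda i: like[i])])   # MLE, approx 0.3
print(sum(t * l for t, l in zip(grid, like)) / sum(like))   # posterior mean, approx 1/3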



Problem 2
Consider n iid samples from a Geometric(p) distribution.
Find the method of moments estimate.
Find the MLE.
Using a Uniform[0, 1] prior, find the posterior distribution and mean.



Working
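
A sketch (ours) for Geometric(p) supported on {1, 2, . . .}: the likelihood is p^n (1 − p)^(Σxi − n), so a Uniform[0, 1] prior gives a Beta posterior:

def geometric_estimates(samples):
    n, s = len(samples), sum(samples)
    p_hat = n / s                        # both MoM and MLE give 1/(sample mean)
    a, b = n + 1, s - n + 1              # posterior Beta(n+1, s-n+1)
    return p_hat, (a, b), a / (a + b)    # estimate, posterior params, posterior mean

print(geometric_estimates([3, 1, 2, 1, 5]))   # illustrative samples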



Problem 3
Consider n iid samples from a Poisson(λ) distribution.
Find the method of moments estimate.
Find the MLE.
Using a Gamma[α, β] prior, find the posterior distribution and mean.



Working
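
A sketch (ours), assuming the shape-rate parameterization of Gamma(α, β), under which the Poisson posterior is again Gamma:

def poisson_gamma_update(samples, alpha, beta_):
    n, s = len(samples), sum(samples)
    lam_hat = s / n                            # MoM and MLE: the sample mean
    a_post, b_post = alpha + s, beta_ + n      # conjugate update
    return lam_hat, (a_post, b_post), a_post / b_post   # posterior mean

print(poisson_gamma_update([2, 0, 3, 1, 2], alpha=2.0, beta_=1.0))  # illustrative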



Section 4

Problems: Fitting distributions



Problem 1
Fit a Poisson distribution to the following frequency data on number of
vehicles (n) making a right turn at an intersection in a 3-minute interval.
Find an approximate 95% confidence interval for the mean using a normal
approximation for the sampling distribution.

n Frequency n Frequency
0 14 7 14
1 30 8 10
2 36 9 6
3 68 10 4
4 43 11 1
5 43 12 1
6 30 13+ 0
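
A sketch (ours) of one way to do this: the Poisson MLE is the sample mean, and a normal approximation gives the interval x̄ ± 1.96 √(x̄/n), using Var(X) = λ for a Poisson:

import math

freq = {0: 14, 1: 30, 2: 36, 3: 68, 4: 43, 5: 43, 6: 30,
        7: 14, 8: 10, 9: 6, 10: 4, 11: 1, 12: 1}
n = sum(freq.values())                            # 300 observations
xbar = sum(k * c for k, c in freq.items()) / n    # lambda-hat, approx 3.89
half = 1.96 * math.sqrt(xbar / n)
print(xbar, (xbar - half, xbar + half))           # approx (3.67, 4.12)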



Problem 2
Fit a Geometric distribution to the following frequency data on number of
hops (n) between flights of birds. Find an approximate 95% confidence
interval.

n Frequency n Frequency
1 48 7 4
2 31 8 2
3 20 9 1
4 9 10 1
5 6 11 2
6 5 12 1



Working
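
A sketch (ours): fit p̂ = 1/x̄ and build a normal-approximation CI for the mean, then map its endpoints through p = 1/mean (one common construction, not the only one):

import math

freq = {1: 48, 2: 31, 3: 20, 4: 9, 5: 6, 6: 5,
        7: 4, 8: 2, 9: 1, 10: 1, 11: 2, 12: 1}
n = sum(freq.values())                             # 130 observations
xbar = sum(k * c for k, c in freq.items()) / n     # approx 2.79
var = sum(c * (k - xbar) ** 2 for k, c in freq.items()) / (n - 1)
half = 1.96 * math.sqrt(var / n)                   # CI half-width for the mean
print(1 / xbar, (1 / (xbar + half), 1 / (xbar - half)))   # p-hat approx 0.358, and CI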



Problem 3
Data from a genetic experiment and the expected distribution in terms of an
unknown parameter θ are given in the following table.

Type  Frequency  Theory
1     1997       0.25(2 + θ)
2     906        0.25(1 − θ)
3     904        0.25(1 − θ)
4     32         0.25θ

Find the ML estimate for θ.



Working
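
A numerical sketch (ours): maximize the multinomial log-likelihood over a grid of θ values:

import math

counts = [1997, 906, 904, 32]

def loglike(theta):
    probs = [0.25 * (2 + theta), 0.25 * (1 - theta),
             0.25 * (1 - theta), 0.25 * theta]
    return sum(c * math.log(p) for c, p in zip(counts, probs))

grid = [i / 100000 for i in range(1, 100000)]
print(max(grid, key=loglike))    # MLE, approx 0.0357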



Section 5

Problems: Model and estimation



Problem 1
To find the size of an animal population, 100 animals are captured and tagged.
Some time later, another 50 animals are captured, and 20 of them are found to
be tagged. How will you estimate the population size? What are your
assumptions?



Working
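
A sketch (ours), assuming a closed population and that both captures are uniformly random, so the recapture count is hypergeometric in the unknown population size N (scipy assumed available):

from scipy.stats import hypergeom

tagged, catch, recap = 100, 50, 20
candidates = range(130, 2001)    # N is at least 100 tagged + 30 untagged seen
mle_N = max(candidates, key=lambda N: hypergeom.pmf(recap, N, tagged, catch))
print(mle_N)   # close to 100 * 50 / 20 = 250, the Lincoln-Petersen estimate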



Problem 2
Suppose that, out of 10 items produced by a new machine, none was found to be
defective. How will you estimate the fraction of defective items produced by
the new machine? From data collected on other similar machines, the average
fraction of defective items was found to be 10%, and the actual fraction was
between 5% and 15% in 95% of the cases.



Working
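
A sketch (ours) of one modeling route: read the history of similar machines as a Beta prior by moment matching (mean 0.10; "between 5% and 15% in 95% of cases" read as roughly two standard deviations, i.e., sd about 0.025), then update with 0 defectives in 10:

mu, sd = 0.10, 0.025
nu = mu * (1 - mu) / sd**2 - 1            # alpha + beta, from the Beta variance
alpha, beta_ = mu * nu, (1 - mu) * nu     # approx 14.3 and 128.7
print((alpha + 0) / (alpha + beta_ + 10))   # posterior mean, approx 0.093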



Problem 3
Colab sheet

