Single-Parameter Models: Binomial, Normal, Poisson, Exponential
Parameter Estimation
• “A quantity which has a probability associated with each possible value is
traditionally called a ‘random variable’. Random variables have probability
distributions associated with them. In Bayesian statistics, an unknown parameter
looks mathematically like a ‘random variable’, but we will avoid the word
random because it has connotations of something that fluctuates or
varies. In Bayesian statistics, the prior distribution and posterior distribution
only describe our uncertainty. The actual parameter is a single fixed number.”
• Features in any Bayesian parameter estimation problem:
• Prior distribution
• Likelihood
• Posterior distribution
Parameter Estimation
• Estimating Proportion Problem:
• Mark is new to Auckland and he decided that he would take the bus to
work each day. However, he wasn’t very confident with the bus system
in the new city, so for the first week he just took the first bus that came
along and was heading in the right direction, towards the city. In the
first week, he caught 5 morning buses. Of these 5 buses, two of them
took him to the right place, while three of them took him far from work,
leaving him with an extra 20-minute walk. Given this information, he
would like to infer the proportion of the buses that are “good”, i.e. that
would take him right to campus (Mark is a professor).
Parameter Estimation
• Let θ be the proportion of buses that are “good”; 0 ≤ θ ≤ 1.
• To keep things simple assume that the set of possible values for θ is
{0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1}.
• This discrete approximation means that we can use Bayes’ Box.
• We assume that before we get the data, we were “very uncertain” about
the value of θ, and this can be modeled by using a uniform prior
distribution.
• Since there are 11 possible values considered in our discrete
approximation, the prior probability of each is 1/11 ≈ 0.0909.
Parameter Estimation
• To get the likelihoods, we need to think about the properties of our experiment; we
imagine that we knew the value of θ and were trying to predict what experimental
outcome would occur.
• We want to find the probability of our actual data set (2 out of 5 buses were “good”) for
all possible values of θ.
• p(x | θ) = C(N, x) θ^x (1 − θ)^(N−x), where C(N, x) = N!/(x!(N−x)!) is the binomial coefficient.
• Actual likelihood: P(x = 2 | θ) = C(5, 2) θ^2 (1 − θ)^3 = 10θ^2 (1 − θ)^3
Parameter Estimation
• The final steps are to multiply the prior by the likelihood and then normalize
that to get the posterior distribution.
• P(good bus tomorrow | x) = Σ_θ P(good bus tomorrow | θ) p(θ | x) = Σ_θ θ p(θ | x)
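A minimal sketch of this predictive calculation, recomputing the grid posterior first (the grid and names are assumptions of this illustration):

```python
# Posterior predictive: P(good bus tomorrow | x) = sum over theta of
# theta * p(theta | x), using the same 11-point grid as the Bayes' Box.
from math import comb

thetas = [i / 10 for i in range(11)]
n, x = 5, 2
# Uniform prior (1/11) times binomial likelihood, then normalise.
unnorm = [(1 / 11) * comb(n, x) * t**x * (1 - t)**(n - x) for t in thetas]
Z = sum(unnorm)
posterior = [u / Z for u in unnorm]

# Predictive probability is the posterior mean of theta on the grid.
p_good_tomorrow = sum(t * p for t, p in zip(thetas, posterior))
print(round(p_good_tomorrow, 4))   # close to 3/7 ≈ 0.4286, the Beta(3, 4) mean
```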
• Simple binomial model: p(y | θ) = Bin(y | n, θ) = C(n, y) θ^y (1 − θ)^(n−y)
• Likelihood: a ball O is randomly thrown n times. The value of y is the number of times O lands
to the right of W.
• From the above solution, θ is assumed to have a (prior) uniform distribution on [0,1].
• P(θ ∈ (θ1, θ2) | y) = P(θ ∈ (θ1, θ2), y) / p(y)
= [∫_{θ1}^{θ2} p(y | θ) p(θ) dθ] / p(y)
= [∫_{θ1}^{θ2} C(n, y) θ^y (1 − θ)^(n−y) dθ] / p(y)   (2.1)
• p(y) = ∫_0^1 C(n, y) θ^y (1 − θ)^(n−y) dθ = 1/(n + 1),  y = 0, 1, …, n   (2.2)
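Result (2.2) can be checked numerically; the sketch below uses a simple midpoint-rule integral (the integration scheme and step count are choices of this illustration, not from the source):

```python
# Check (2.2): under a uniform prior, the marginal
# p(y) = integral_0^1 C(n, y) theta^y (1 - theta)^(n - y) d theta
# equals 1/(n + 1) for every y = 0, ..., n.
from math import comb

def marginal(n, y, steps=100_000):
    """Midpoint-rule approximation to p(y) under the uniform prior."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h              # midpoint of each sub-interval
        total += comb(n, y) * t**y * (1 - t)**(n - y) * h
    return total

n = 5
for y in range(n + 1):
    print(y, round(marginal(n, y), 6))   # each value is close to 1/6
```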
• This shows that all possible values of y are equally likely a priori.
• Pierre Simon Laplace independently discovered Bayes’ theorem. He
expanded the function θ^y (1 − θ)^(n−y) around its maximum at θ = y/n and
evaluated the incomplete beta integral in (2.1) using what we now know as
the normal approximation.
• P(ỹ = 1 | y) = ∫_0^1 P(ỹ = 1 | θ, y) p(θ | y) dθ = ∫_0^1 θ p(θ | y) dθ = E(θ | y) = (y + 1)/(n + 2)   (2.3)
• The result above is known as ‘Laplace’s Law of succession’.
• At the extreme observations y = 0 and y = n, Laplace’s law predicts probabilities of 1/(n + 2) and (n + 1)/(n + 2),
respectively.
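These extreme-case predictions, and Laplace’s law generally, can be checked numerically; the integrator below is a sketch under the uniform-prior assumption above (scheme and step count are this illustration’s choices):

```python
# Check Laplace's law (2.3): with a uniform prior the posterior is
# proportional to theta^y (1 - theta)^(n - y), and its mean is (y + 1)/(n + 2).
def posterior_mean(n, y, steps=100_000):
    """Midpoint-rule approximation to E(theta | y) under a uniform prior."""
    h = 1.0 / steps
    num = den = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        w = t**y * (1 - t)**(n - y)    # unnormalised posterior density
        num += t * w * h
        den += w * h
    return num / den

n = 10
print(posterior_mean(n, 0))    # extreme y = 0: close to 1/(n + 2) = 1/12
print(posterior_mean(n, n))    # extreme y = n: close to (n + 1)/(n + 2) = 11/12
```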
• The prior mean of θ is the average of all possible posterior means over the distribution of possible
data
• The posterior variance is on average smaller than the prior variance, by an amount that depends on
the variation in posterior means over the distribution of possible data.
• In the binomial example with the uniform prior distribution, the prior mean is 1/2 and the prior
variance is 1/12. The posterior mean, (y + 1)/(n + 2), is a compromise between the prior mean and the
sample proportion y/n, where clearly the prior mean has a smaller and smaller role as the size of the
data sample increases.
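These two identities can be verified exactly for the binomial example; the sketch below uses exact rational arithmetic and closed-form Beta moments (the brute-force approach and function names are this illustration’s choices):

```python
# Verify, for the uniform-prior binomial model, that
#   E_y[E(theta | y)] = E(theta)        (prior mean = average posterior mean)
#   E_y[var(theta | y)] < var(theta)    (posterior variance smaller on average)
# averaging over the distribution of possible data y.
from fractions import Fraction as F

n = 5
p_y = F(1, n + 1)                      # p(y) = 1/(n + 1) for every y, by (2.2)
prior_mean, prior_var = F(1, 2), F(1, 12)   # uniform prior on [0, 1]

# The posterior given y is Beta(y + 1, n - y + 1); use its closed-form moments.
def post_mean(y):
    return F(y + 1, n + 2)

def post_var(y):
    a, b = y + 1, n - y + 1
    return F(a * b, (a + b)**2 * (a + b + 1))

avg_mean = sum(p_y * post_mean(y) for y in range(n + 1))
avg_var = sum(p_y * post_var(y) for y in range(n + 1))

print(avg_mean == prior_mean)          # True: averages back to 1/2
print(avg_var, "<", prior_var, avg_var < prior_var)
```

With n = 5 the average posterior variance works out to 1/42, well below the prior variance of 1/12.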