Single-Parameter Models: Binomial, Normal, Poisson, Exponential
Parameter Estimation
• “A quantity which has a probability associated with each possible value is
traditionally called a ‘random variable’. Random variables have probability
distributions associated with them. In Bayesian statistics, an unknown parameter
looks mathematically like a ‘random variable’, but we will avoid the word
random because it has connotations of something that fluctuates or
varies. In Bayesian statistics, the prior distribution and posterior distribution
only describe our uncertainty. The actual parameter is a single fixed number.”
• Features in any Bayesian parameter estimation problem:
• Prior distribution
• Likelihood
• Posterior distribution
Parameter Estimation
• Estimating Proportion Problem:
• Mark is new to Auckland and he decided that he would take the bus to
work each day. However, he wasn’t very confident with the bus system
in the new city, so for the first week he just took the first bus that came
along and was heading in the right direction, towards the city. In the
first week, he caught 5 morning buses. Of these 5 buses, two of them
took him to the right place, while three of them took him far from work,
leaving him with an extra 20-minute walk. Given this information, he
would like to infer the proportion of the buses that are “good”, i.e. that
would take him right to campus (Mark is a professor).
Parameter Estimation
• Let θ be the proportion of buses that are “good”; 0 ≤ θ ≤ 1.
• To keep things simple assume that the set of possible values for θ is
{0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1}.
• This discrete approximation means that we can use Bayes’ Box.
• We assume that before we get the data, we were “very uncertain” about
the value of θ, and this can be modeled by using a uniform prior
distribution.
• Since there are 11 possible values considered in our discrete
approximation, the prior probability of each is 1/11 ≈ 0.0909.
Parameter Estimation
• To get the likelihoods, we need to think about the properties of our experiment; we
imagine that we knew the value of θ and were trying to predict what experimental
outcome would occur.
• We want to find the probability of our actual data set (2 out of 5 buses were “good”) for
all possible values of θ.
• p(x | θ) = C(N, x) θ^x (1 − θ)^(N−x), where C(N, x) = N!/(x!(N−x)!) is the binomial coefficient.
• Actual likelihood: P(x = 2 | θ) = C(5, 2) θ^2 (1 − θ)^3 = 10θ^2 (1 − θ)^3
Parameter Estimation
• The final steps are to multiply the prior by the likelihood and then normalize
that to get the posterior distribution.
• P(good bus tomorrow | x) = Σ_θ P(good bus tomorrow | θ) p(θ | x) = Σ_θ θ p(θ | x)
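A minimal sketch of this predictive calculation, recomputing the grid posterior first (the grid and names are assumptions of this illustration):

```python
# Posterior predictive: P(good bus tomorrow | x) = sum over theta of
# theta * p(theta | x), using the same 11-point grid as the Bayes' Box.
from math import comb

thetas = [i / 10 for i in range(11)]
n, x = 5, 2
# Uniform prior (1/11) times binomial likelihood, then normalise.
unnorm = [(1 / 11) * comb(n, x) * t**x * (1 - t)**(n - x) for t in thetas]
Z = sum(unnorm)
posterior = [u / Z for u in unnorm]

# Predictive probability is the posterior mean of theta on the grid.
p_good_tomorrow = sum(t * p for t, p in zip(thetas, posterior))
print(round(p_good_tomorrow, 4))   # close to 3/7 ≈ 0.4286, the Beta(3, 4) mean
```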
• Simple binomial model: p(y | θ) = Bin(y | n, θ) = C(n, y) θ^y (1 − θ)^(n−y)
• Likelihood: a ball O is randomly thrown n times. The value of y is the number of times O lands
to the right of W.
• From the above solution, θ is assumed to have a (prior) uniform distribution on [0,1].
• P(θ ∈ (θ1, θ2) | y) = P(θ ∈ (θ1, θ2), y) / p(y)
= [∫_{θ1}^{θ2} p(y | θ) p(θ) dθ] / p(y)
= [∫_{θ1}^{θ2} C(n, y) θ^y (1 − θ)^(n−y) dθ] / p(y)   (2.1)
• p(y) = ∫_0^1 C(n, y) θ^y (1 − θ)^(n−y) dθ = 1/(n + 1),  y = 0, 1, …, n   (2.2)
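Result (2.2) can be checked numerically; the sketch below uses a simple midpoint-rule integral (the integration scheme and step count are choices of this illustration, not from the source):

```python
# Check (2.2): under a uniform prior, the marginal
# p(y) = integral_0^1 C(n, y) theta^y (1 - theta)^(n - y) d theta
# equals 1/(n + 1) for every y = 0, ..., n.
from math import comb

def marginal(n, y, steps=100_000):
    """Midpoint-rule approximation to p(y) under the uniform prior."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h              # midpoint of each sub-interval
        total += comb(n, y) * t**y * (1 - t)**(n - y) * h
    return total

n = 5
for y in range(n + 1):
    print(y, round(marginal(n, y), 6))   # each value is close to 1/6
```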
• This shows that all possible values of y are equally likely a priori.
• Pierre Simon Laplace independently discovered Bayes’ theorem. He
expanded the function θ^y (1 − θ)^(n−y) around its maximum at θ = y/n and
evaluated the incomplete beta integral in (2.1) using what we now know as
the normal approximation.
• P(ỹ = 1 | y) = ∫_0^1 P(ỹ = 1 | θ, y) p(θ | y) dθ = ∫_0^1 θ p(θ | y) dθ = E(θ | y) = (y + 1)/(n + 2)   (2.3)
• The result above is known as ‘Laplace’s Law of succession’.
• At the extreme observations y = 0 and y = n, Laplace’s law predicts probabilities of 1/(n + 2) and (n + 1)/(n + 2),
respectively.
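These extreme-case predictions, and Laplace’s law generally, can be checked numerically; the integrator below is a sketch under the uniform-prior assumption above (scheme and step count are this illustration’s choices):

```python
# Check Laplace's law (2.3): with a uniform prior the posterior is
# proportional to theta^y (1 - theta)^(n - y), and its mean is (y + 1)/(n + 2).
def posterior_mean(n, y, steps=100_000):
    """Midpoint-rule approximation to E(theta | y) under a uniform prior."""
    h = 1.0 / steps
    num = den = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        w = t**y * (1 - t)**(n - y)    # unnormalised posterior density
        num += t * w * h
        den += w * h
    return num / den

n = 10
print(posterior_mean(n, 0))    # extreme y = 0: close to 1/(n + 2) = 1/12
print(posterior_mean(n, n))    # extreme y = n: close to (n + 1)/(n + 2) = 11/12
```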
• The prior mean of θ is the average of all possible posterior means over the distribution of possible
data
• The posterior variance is on average smaller than the prior variance, by an amount that depends on
the variation in posterior means over the distribution of possible data.
• In the binomial example with the uniform prior distribution, the prior mean is 1/2 and the prior
variance is 1/12. The posterior mean, (y + 1)/(n + 2), is a compromise between the prior mean and the
sample proportion y/n, where clearly the prior mean has a smaller and smaller role as the size of the
data sample increases.
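These two identities can be verified exactly for the binomial example; the sketch below uses exact rational arithmetic and closed-form Beta moments (the brute-force approach and function names are this illustration’s choices):

```python
# Verify, for the uniform-prior binomial model, that
#   E_y[E(theta | y)] = E(theta)        (prior mean = average posterior mean)
#   E_y[var(theta | y)] < var(theta)    (posterior variance smaller on average)
# averaging over the distribution of possible data y.
from fractions import Fraction as F

n = 5
p_y = F(1, n + 1)                      # p(y) = 1/(n + 1) for every y, by (2.2)
prior_mean, prior_var = F(1, 2), F(1, 12)   # uniform prior on [0, 1]

# The posterior given y is Beta(y + 1, n - y + 1); use its closed-form moments.
def post_mean(y):
    return F(y + 1, n + 2)

def post_var(y):
    a, b = y + 1, n - y + 1
    return F(a * b, (a + b)**2 * (a + b + 1))

avg_mean = sum(p_y * post_mean(y) for y in range(n + 1))
avg_var = sum(p_y * post_var(y) for y in range(n + 1))

print(avg_mean == prior_mean)          # True: averages back to 1/2
print(avg_var, "<", prior_var, avg_var < prior_var)
```

With n = 5 the average posterior variance works out to 1/42, well below the prior variance of 1/12.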