Bayesian Statistical Analysis: Chapter 1: Fundamentals of Bayesian Inference
Tang Yin-cai
yctang@stat.ecnu.edu.cn
Statistical Inference
Statistical inference is a problem in which data have been generated in accordance with some unknown probability distribution, which must be analyzed so that some type of inference about the unknown distribution can be made.
In other words, in a statistics problem, there are two or more probability distributions which may have generated the data. By analyzing the data, we attempt
■ to learn about the unknown distribution,
■ to determine whether any specified possible distribution is actually the correct one.
SCHOOL OF FINANCE AND STATISTICS
However, based on our epistemological foundations, we cannot state that the model is true with a certain probability X.
If the probability of the observed data under the model is a small one, then the data are inconsistent with the model's predictions, and we reject the model.
Example:
■ Need to know how to update our old inferences
[Figure: Epistemic relationships between theory and data — a theory (of nature) predicts data; deduction, hypothesis, verification, and model classification run from theory to prediction; observation, induction, inference, and creativity run from data back to theory.]
■ p(θ) — called the Prior Distribution
This rests on the epistemological foundation that there exists a true data-generating process that can be revealed through a process of elimination.
Example:
■ Background and Data Information:
Theory A1 : θ = 0.03
Theory A2 : θ = 0.04
Theory A3 : θ = 0.05
Theory A4 : θ = 0.06
Example: cancer and proximity to high-voltage transmission lines.
■ Pro: Epidemiologists show positive correlations between cancer and proximity.
■ Con: Other epidemiologists don't show these correlations, and physicists and biologists maintain that the energy in magnetic fields associated with high-voltage power lines is too small to have an appreciable biological effect.
So,
Pr(A1) ≈ 0.5 ≈ Pr(A2) + Pr(A3) + Pr(A4),
Pr(A1 | Y = 8) = 0.23
Pr(A2 | Y = 8) = 0.21
Pr(A3 | Y = 8) = 0.28
Pr(A4 | Y = 8) = 0.28.
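The update behind numbers like these is a discrete Bayes rule: each theory's posterior probability is its prior times the binomial likelihood of the observed count, renormalized. A minimal Python sketch — the sample size n behind Y = 8 is not stated above, so n = 175 here is a hypothetical stand-in and the resulting posterior values will differ from those quoted:

```python
from math import comb

def discrete_posterior(priors, thetas, y, n):
    """Discrete Bayes update: posterior ∝ prior × Binomial(y; n, θ)."""
    lik = [comb(n, y) * t**y * (1 - t) ** (n - y) for t in thetas]
    unnorm = [p * l for p, l in zip(priors, lik)]
    z = sum(unnorm)  # marginal probability Pr(Y = y)
    return [u / z for u in unnorm]

# Priors as above: Pr(A1) ≈ 0.5, the other three theories share the rest.
priors = [0.5, 0.5 / 3, 0.5 / 3, 0.5 / 3]
thetas = [0.03, 0.04, 0.05, 0.06]
post = discrete_posterior(priors, thetas, y=8, n=175)  # n is hypothetical
```

Whatever n is, the posterior reallocates mass toward the θ values closest to the observed rate y/n, which is why Pr(A1 | Y = 8) falls below its prior of 0.5.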
■ the p-value summands are irrelevant because they involve data sets other than the one actually observed.
distribution? For example, with a skewed Be(1, 3) distribution of the sample mean, what is the CI for the population mean? The mode is not included in the CI. This seems implausible!
[Figure: density of the Be(1, 3) distribution on (0, 1).]
■ The sample mean does. Thus the frequentist must use circumlocutions like "95% of similar intervals would contain the true mean, if each interval were constructed from a different random sample like this one."
"Probability" = long-run fraction having this characteristic.
[Figure: Confidence Interval — intervals from repeated random samples, plotted against sample index.]
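The long-run reading of "95%" can be checked by simulation. A sketch, assuming a normal population with known σ so that each interval is x̄ ± 1.96·σ/√n:

```python
import random
import statistics

def ci95(sample, sigma):
    """95% CI for the mean with known population sd: x̄ ± 1.96·σ/√n."""
    n = len(sample)
    m = statistics.fmean(sample)
    half = 1.96 * sigma / n**0.5
    return m - half, m + half

random.seed(1)
mu, sigma, n, reps = 1.0, 2.0, 25, 2000
covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    lo, hi = ci95(sample, sigma)
    covered += lo <= mu <= hi
coverage = covered / reps  # long-run fraction of intervals containing µ
```

The fraction `coverage` hovers near 0.95: the probability statement is about the procedure over repeated samples, not about any one computed interval.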
Bayesian's Point of View
■ The Bayesian constructs an interval, centered near the sample mean, but tempered by "prior" beliefs concerning the mean.
■ Bayesian Reply: Bayesians use noninformative priors, sensitivity analysis, etc. to mitigate the influence of priors on their results.
■ The Bayesian analysis is philosophically unsound. Bayesians treat θ as a random variable, whereas classical analysis treats θ as a fixed but unknown constant.
■ Bayesian Reply: Treating θ as random does not
necessarily mean that θ is random; rather, it
expresses our uncertainty/knowledge about θ.
The process of Bayesian data analysis can be idealized into three steps:
■ Setting up a full probability model: a joint probability distribution for all observable and unobservable quantities in a problem
■ Conditioning on observed data: calculating and interpreting the posterior distribution of the unobserved quantities of interest, such as predictions for new observations
■ Evaluating the fit of the model and the implications of the resulting posterior distribution
◆ Does the model fit the data?
◆ Are the conclusions sensitive to the assumptions in step 1?
Two kinds of estimands:
1) potentially observable quantities, such as future observations of a process
2) quantities that are not directly observable, that is, parameters that govern the hypothetical process leading to the observed data (for example, regression coefficients)
The distinction between these two kinds of estimands is not always precise, but it is generally useful as a way of understanding how a statistical model for a particular problem fits into the real world.
Notation:
■ θ — the parameters of interest;
■ y = (y1 , . . . , yn ) — the observed data;
■ Bayesian statistical conclusions about a parameter θ, or unobserved data ỹ, are made in terms of probability statements
■ These probability statements are conditional on the observed value of y, and in our notation are written simply as p(θ|y) or p(ỹ|y)
■ We also implicitly condition on the known values of any covariates, x
■ p(y) is called
◆ the marginal distribution of y, or
◆ the prior predictive distribution
■ why predictive: because it is the distribution for a quantity that is observable
■ p(y|θ) is called the likelihood function (when regarded as a function of θ for fixed y)
■ In this way Bayesian inference obeys what is sometimes called the likelihood principle, which states that for a given sample of data, any two probability models p(y|θ) that have the same likelihood function yield the same inference for θ
■ For two parameter values θ1 and θ2, the posterior odds are equal to the prior odds multiplied by the likelihood ratio, p(y|θ1)/p(y|θ2):
p(θ1|y)/p(θ2|y) = [p(θ1)/p(θ2)] · [p(y|θ1)/p(y|θ2)]
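This identity is easy to verify numerically, because the normalizing constant p(y) cancels from the ratio. A small Python check with a binomial likelihood and hypothetical numbers (θ1, θ2, the priors, n, and y are all made up for illustration):

```python
from math import comb

def binom_lik(theta, n, y):
    """Binomial likelihood p(y | θ) for y successes in n trials."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)

# Two candidate parameter values with prior probabilities (hypothetical):
t1, t2, p1, p2, n, y = 0.3, 0.5, 0.4, 0.6, 20, 12

prior_odds = p1 / p2
likelihood_ratio = binom_lik(t1, n, y) / binom_lik(t2, n, y)

# Posterior odds: the unnormalized posteriors suffice, since p(y) cancels.
posterior_odds = (p1 * binom_lik(t1, n, y)) / (p2 * binom_lik(t2, n, y))
```

Here `posterior_odds` equals `prior_odds * likelihood_ratio` exactly, which is the slide's identity.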
■ We will rely primarily on the statistical package R for graphs, basic simulations, fitting of classical simple models (including regression, ...), optimization, and some simple programming
■ We use WinBUGS within R (see Appendix C) as a first try for fitting most models
■ Other related software:
◆ First Bayes: http://www.tonyohagan.co.uk/1b/
◆ (http://www.econ.umn.edu/ bacc)
◆ MCMCpack: R package (V0.7-3)
◆ coda: R package
R can be used for (see Appendix A):
■ Drawing simulations from probability distributions (built-in or customized functions)
■ Calculating the linear regression estimate and variance matrix
■ Graphics, including scatterplots with overlain lines
■ Fit models gradually, increasing the complexity (see Appendix C for a simple example)
■ Appendix C illustrates how to perform computations in R and Bugs in several different ways for a single example
Bayes' theorem:
p(θ|y) = p(θ) p(y|θ) / p(y)
■ where
◆ θ is the parameter of interest.
◆ p(θ) is the prior distribution for θ.
◆ p(θ|y) is the posterior distribution for θ, given the data y.
◆ p(y) is the marginal distribution of y, the total probability of the data.
2. Informative Prior
■ not uniform
3. Conjugate Prior
■ prior and posterior belong to the same distributional family
Suppose that
■ the likelihood (model) for y given θ is binomial, and
■ the prior for θ is Be(α, β).
■ When β increases, the distribution shifts lower.
[Figure: Be(α, β) priors and the resulting posteriors —
α = 1, β = 0.5: postmean = 0.35, postmax = 0.32;
α = 1, β = 1: postmean = 0.33, postmax = 0.30;
α = 1, β = 1.5: postmean = 0.32, postmax = 0.29.]
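These posterior summaries follow directly from conjugacy: a Be(α, β) prior with a Binomial(n, θ) likelihood gives a Be(α + y, β + n − y) posterior, whose mean is (α + y)/(α + β + n) and whose mode is (α + y − 1)/(α + β + n − 2). The quoted values are consistent with data n = 10, y = 3 (an inference from the captions; the data are not restated in every panel). A Python sketch:

```python
def beta_binomial_posterior(alpha, beta, n, y):
    """Conjugate update: Be(α, β) prior + Binomial(n, θ) likelihood
    gives a Be(α + y, β + n − y) posterior."""
    a, b = alpha + y, beta + n - y
    mean = a / (a + b)            # posterior mean
    mode = (a - 1) / (a + b - 2)  # posterior mode, valid for a, b > 1
    return a, b, mean, mode

# Reproduce the three panels above, assuming n = 10, y = 3:
for beta in (0.5, 1.0, 1.5):
    a, b, mean, mode = beta_binomial_posterior(1.0, beta, n=10, y=3)
    print(f"α=1, β={beta}: postmean={mean:.2f}, postmax={mode:.2f}")
```

With these inputs the printed postmean/postmax pairs match the figure captions (0.35/0.32, 0.33/0.30, 0.32/0.29).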
[Figures: Beta priors and posteriors —
posteriors for y = 0, 1, . . . , 5 with the prior overlaid;
prior with posteriors for n = 5, y = 1 and n = 10, y = 3;
priors and posteriors for n = 5, y = 1 and n = 50, y = 10.]
p(y) = N(µ, σ² + τ²),

p(θ|y) = N( τ²/(σ² + τ²) · y + σ²/(σ² + τ²) · µ , σ²τ²/(σ² + τ²) ),

p(ỹ|y) = N( τ²/(σ² + τ²) · y + σ²/(σ² + τ²) · µ , σ² + σ²τ²/(σ² + τ²) ).
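A quick sanity check on these formulas: the posterior mean is a weighted average of the observation y and the prior mean µ, and the posterior variance is smaller than both σ² and τ², while the predictive variance adds back the sampling variance σ². A Python sketch for a single observation y, matching the formulas above:

```python
def normal_posterior(mu, tau2, sigma2, y):
    """Prior θ ~ N(µ, τ²), one observation y ~ N(θ, σ²).
    Returns the posterior mean/variance of θ and the predictive
    variance of a new observation ỹ."""
    w = tau2 / (sigma2 + tau2)          # weight on the data y
    post_mean = w * y + (1 - w) * mu    # τ²/(σ²+τ²)·y + σ²/(σ²+τ²)·µ
    post_var = sigma2 * tau2 / (sigma2 + tau2)
    pred_var = sigma2 + post_var        # ỹ adds the sampling variance σ²
    return post_mean, post_var, pred_var

# Example: vague-ish prior N(0, 1), noise variance 1, observation y = 2.
m, v, pv = normal_posterior(mu=0.0, tau2=1.0, sigma2=1.0, y=2.0)
```

With σ² = τ² the weight is 1/2, so the posterior mean sits halfway between y and µ, and the posterior variance is half of either variance alone.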