
Dr. Shahin Tavakoli Applied Bayesian Statistics Project 1

Project C
Exercise 1.1. The goal of this project is to implement a Metropolis algorithm to estimate
the parameter $\beta$ of the Poisson regression model for the Sparrows dataset (available in
sparrows.RData), as explained in the lecture slides. Recall that:

$Y$ = number of offspring
$x$ = age of the female sparrow
$Y \mid x \sim \mathrm{Poisson}(\theta_x)$
$\log E(Y \mid x) = \log \theta_x = \beta_1 + \beta_2 x + \beta_3 x^2$.

Note that we can write $\log E(Y_i \mid x_i) = x_i^T \beta$, where $x_i = (1, x_i, x_i^2)^T$ and $\beta = (\beta_1, \beta_2, \beta_3)^T$.
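As a worked step, the Poisson log-likelihood implied by the model above can be written out explicitly. This sketch assumes the observations are conditionally independent given $\beta$ (an assumption not stated explicitly here); the last term does not depend on $\beta$ and can be dropped whenever the log-likelihood is only needed up to an additive constant.

\[
\log p(y \mid X, \beta) = \sum_{i=1}^{n} \left[ y_i \, x_i^T \beta - \exp\!\left(x_i^T \beta\right) - \log(y_i!) \right]
\]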
Use a suitable length for your MCMC simulations.
1. Implement a Metropolis algorithm to sample from the posterior $p(\beta \mid y, X)$, using the
prior $p(\beta) = N_3(0, 100 I)$ and the proposal distribution $J(\beta^* \mid \beta^{(s)}) = N_3(\beta^{(s)}, \hat{\sigma}^2 (X^T X)^{-1})$,
where $\hat{\sigma}^2$ is the variance of $\log(y_1 + 1/2), \dots, \log(y_n + 1/2)$.
2. Construct multiple chains using the starting values $\beta^0 = (i, i, i)^T$ for i = c(-5:5)*2, that is, i = -10, -8, ..., 10.
(a) Plot the multiple chains in a trace plot and assess the convergence.
(b) Plot
$s \mapsto \log p(\beta^{(s)} \mid X, y)$,
up to an additive constant. Why is this plot useful for assessing convergence?
(c) Produce a Gelman plot and the Gelman diagnostics.
For which starting value does the chain converge first?
3. Now set $\beta^0$ to the best starting value found in the previous point. Sample
from $p(\beta \mid y, X)$ using the Metropolis algorithm you wrote. Use a sample size $S = 10^5$.
What is the posterior mean $\hat{\beta}$?
(a) Plot the auto- and cross-covariance and the auto- and cross-correlation functions.
Compute the effective sample size for $\beta_1, \beta_2, \beta_3$ respectively. Comment on
these results.
(b) Produce a plot of the posterior density for each of the three entries of $\beta$ and
add a segment on the x-axis indicating the credible sets of coverage 0.95 for
$\beta_1, \beta_2, \beta_3$ respectively.
(c) We are interested in the posterior of $\theta_x \mid (X, y)$ for age $x \in [1, 6]$. Produce a plot
of its 0.025, 0.5, and 0.975 quantiles as a function of $x \in [1, 6]$.
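The steps in points 1 to 3 can be sketched in base R as follows. This is only an illustrative outline under stated assumptions, not a reference solution: the data are simulated as a stand-in for sparrows.RData, and the names `log_post`, `metropolis`, and `beta_true`, the chain length of 5000, and the burn-in of 1000 are hypothetical choices made for the example. The prior term assumes that $100 I$ denotes the covariance matrix.

```r
# Minimal Metropolis sketch for Poisson regression (base R only).
# ASSUMPTION: synthetic data stand in for sparrows.RData; all names are hypothetical.
set.seed(1)
n <- 50
age <- sample(1:6, n, replace = TRUE)
X <- cbind(1, age, age^2)                 # rows are (1, x_i, x_i^2)
beta_true <- c(0.3, 0.7, -0.13)           # made-up values, used only to simulate y
y <- rpois(n, exp(as.vector(X %*% beta_true)))

# Log posterior up to an additive constant:
# Poisson log-likelihood plus N3(0, 100 I) log-prior (100 I taken as covariance).
log_post <- function(beta, y, X) {
  eta <- as.vector(X %*% beta)
  sum(y * eta - exp(eta)) - sum(beta^2) / (2 * 100)
}

metropolis <- function(beta0, S, y, X) {
  p <- length(beta0)
  sigma2_hat <- var(log(y + 1/2))
  # Lower Cholesky factor of the proposal covariance sigma2_hat * (X'X)^{-1}
  L <- t(chol(sigma2_hat * solve(crossprod(X))))
  draws <- matrix(NA_real_, S, p)
  lp <- numeric(S)
  beta <- beta0
  lp_cur <- log_post(beta, y, X)
  accepted <- 0
  for (s in seq_len(S)) {
    prop <- beta + as.vector(L %*% rnorm(p))   # draw from N3(beta, proposal covariance)
    lp_prop <- log_post(prop, y, X)
    # Symmetric proposal, so the acceptance ratio is the posterior ratio
    if (log(runif(1)) < lp_prop - lp_cur) {
      beta <- prop
      lp_cur <- lp_prop
      accepted <- accepted + 1
    }
    draws[s, ] <- beta
    lp[s] <- lp_cur
  }
  list(draws = draws, log_post = lp, accept_rate = accepted / S)
}

# Short illustrative run; the chain length and burn-in are arbitrary choices.
fit <- metropolis(beta0 = c(0, 0, 0), S = 5000, y = y, X = X)
kept <- fit$draws[-(1:1000), ]
colMeans(kept)                             # posterior mean estimate
fit$accept_rate

# Posterior quantiles of theta_x = exp(beta1 + beta2 x + beta3 x^2) on a grid of ages
x_grid <- seq(1, 6, length.out = 26)
theta <- exp(cbind(1, x_grid, x_grid^2) %*% t(kept))
theta_q <- apply(theta, 1, quantile, probs = c(0.025, 0.5, 0.975))
```

In this sketch, `fit$log_post` holds the trace asked for in 2(b). If the coda package is used, `fit$draws` can be wrapped with `coda::mcmc()` and passed to `coda::effectiveSize()`, and several chains collected in a `coda::mcmc.list()` can be passed to `coda::gelman.diag()` and `coda::gelman.plot()`.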


Instructions
• This project is worth 20% of your final mark.
• The deadline for this project is Sunday 14th of May 2023 at midnight.
• You can work on this in groups of 3 students. You can decide the groups. Make sure
that the name of each person in the group appears on the report.
• You should submit both the .rmd file (with the material needed to knit it) and the
.html file on Moodle. Make sure your results are reproducible.
• Late submissions are allowed, but you will lose 0.5 pt for each 12-hour period of
delay.
• You can use the package coda for diagnostics and the package mvtnorm for the
multivariate normal distribution. No other external package is allowed, except for
plotting or unless specifically announced.
• The length of the report should be reasonable. Reports that are too long will be
penalized.
• You will receive group feedback on your work, but no general solution will be
released.
• The evaluation will be based on the marking grid, see Table 1.

Marking scheme

E/R   2 pts
CA    2 pts
R/C   1 pt
P     1 pt

Table 1: Marking grid.

• E/R = Clearness of Explanations and Reasoning. You should provide a neat
description of the thought process you followed to arrive at the solution you present.
• CA = Correct Answer. Arriving at a correct solution is important; however, even
if the solution is incorrect, you will still get points under E/R if the reasoning
you followed is sound.
• R/C = Reproducibility and clearness of Code. You should make sure that your
code is understandable and tidy. Add comments and explain what you are doing.
If you run simulations, set the seed so that your results are reproducible.
• P = Presentation. The report should be well structured, follow a logical
development, and present all your results. The quality of the graphs, tables, and
figures you present will be evaluated. They should help the reader understand your
results in a compact and smart way. Choose carefully what to include and what to
leave out of your report.
