Tutorial 106b - Poisson Regression and Log-Linear Models (Bayesian)

Tutorial 10.
6b - Poisson regression and log-linear models (Bayesian)
Home Tutorials and Workshops Downloads Links

« Tutorial 10.5b Tutorial 11.1 »
Tutorial 10.6b - Poisson regression and log-

linear models (Bayesian) 12 Sep 2016
Section 1: Overview
Whilst in many instances, count data can be approximated reasonably well by a normal distribution (particularly if the counts
are all above zero and the mean count is greater than about 20), more typically, when count data are modelled via normal
distribution certain undesirable characteristics arise that are a consequence of the nature of discrete non-negative data.
Expected (predicted) values and confidence bands less than zero are illogical, yet these are entirely possible from a normal
distribution
The distribution of count data are often skewed when their mean is low (in part because the distribution is truncated to the
left by zero) and variance usually increases with increasing mean (variance is typically proportional to mean in count data).
By contrast, the Gaussian (normal) distribution assumes that mean and variance are unrelated and thus estimates
(particularly of standard error) might well be reasonable inaccurate.
Poisson regression is a type of generalized linear model (GLM) in which a non-negative integer (natural number) response
is modelled against a linear predictor via a specific link function. The linear predictor is typically a linear combination of effects
parameters (e.g. $\beta_0 + \beta_1x_x$). The role of the link function is to transform the expected values of the response y
(which is on the scale of (0,$\infty$), as is the Poisson distribution from which expectations are drawn) into the scale of the
linear predictor (which is $-\infty,\infty$).
As implied in the name of this group of analyses, a Poisson rather than Gaussian (normal) distribution is used to represent the
errors (residuals). Like count data (number of individuals, species etc), the Poisson distribution encapsulates positive integers
and is bound by zero at one end. Consequently, the degree of variability is directly related the expected value (equivalent to
the mean of a Gaussian distribution). Put differently, the variance is a function of the mean. Repeated observations from a
Poisson distribution located close to zero will yield a much smaller spread of observations than will samples drawn from a
Poisson distribution located a greater distance from zero. In the Poisson distribution, the variance has a 1:1 relationship with
the mean.
The canonical link function for the Poisson distribution is a log-link function.
Whilst the expectation that the mean=variance ($\mu=\sigma$) is broadly compatible with actual count data (that variance
increases at the same rate as the mean), under certain circumstances, this might not be the case. For example, when there are
other unmeasured influences on the response variable, the distribution of counts might be somewhat clumped which can result
in higher than expected variability (that is $\sigma\gt\mu$). The variance increases more rapidly than does the mean. This is
referred to as overdispersion. The degree to which the variability is greater than the mean (and thus the expected degree of
variability) is called dispersion. Effectively, the Poisson distribution has a dispersion parameter (or scaling factor) of 1.
It turns out that overdispersion is very common for count data and it typically underestimates variability, standard errors and
thus deflated p-values. There are a number of ways of overcoming this limitation, the effectiveness of which depend on the
causes of overdispersion.
Quasi-Poisson models - these introduce the dispersion parameter ($\phi$) into the model. This approach does not utilize
an underlying error distribution to calculate the maximum likelihood (there is no quasi-Poisson distribution). Instead, if the
Newton-Ralphson iterative reweighting least squares algorithm is applied using a direct specification of the relationship
between mean and variance ($var(y)=\phi\mu$), the estimates of the regression coefficients are identical to those of the
maximum likelihood estimates from the Poisson model. This is analogous to fitting ordinary least squares on symmetrical,
yet not normally distributed data - the parameter estimates are the same, however they won't necessarily be as efficient.
The standard errors of the coefficients are then calculated by multiplying the Poisson model coefficient standard errors by
$\sqrt{\phi}$.
Unfortunately, because the quasi-poisson model is not estimated via maximum likelihood, properties such as AIC and log-
likelihood cannot be derived. Consequently, quasi-poisson and Poisson model fits cannot be compared via either AIC or
likelihood ratio tests (nor can they be compared via deviance as uasi-poisson and Poisson models have the same residual
deviance). That said, quasi-likelihood can be obtained by dividing the likelihood from the Poisson model by the dispersion
(scale) factor.
Negative binomial model - technically, the negative binomial distribution is a probability distribution for the number of
successes before a specified number of failures. However, the negative binomial can also be defined (parameterized) in
terms of a mean ($\mu$) and scale factor ($\omega$), $$p(y_i)=\frac{\Gamma(y_i+\omega)}
{\Gamma(\omega)y!}\times\frac{\mu_i^{y_i}\omega^\omega}{(\mu_i+\omega)^{\mu_i+\omega}}$$ where the
expectected value of the values $y_i$ (the means) are ($\mu_i$) and the variance is $y_i=\frac{\mu_i+\mu_i^2}
{\omega}$. In this way, the negative binomial is a two-stage hierarchical process in which the response is modeled against
a Poisson distribution whose expected count is in turn modeled by a Gamma distribution with a mean of $\mu$ and
constant scale parameter ($\omega$).
Strictly, the negative binomial is not an exponential family distribution (unless $\omega$ is fixed as a constant), and thus
negative binomial models cannot be fit via the usual GLM iterative reweighting algorithm. Instead estimates of the
regression parameters along with the scale factor ($\omega$) are obtained via maximum likelihood.
The negative binomial model is useful for accommodating overdispersal when it is likely caused by clumping (due to the
influence of other unmeasured factors) within the response.
Zero-inflated Poisson model - overdispersion can also be caused by the presence of a greater number of zero's than
would otherwise be expected for a Poisson distribution. There are potentially two sources of zero counts - genuine zeros
and false zeros. Firstly, there may genuinely be no individuals present. This would be the number expected by a Poisson
distribution. Secondly, individuals may have been present yet not detected or may not even been possible. These are false
zero's and lead to zero inflated data (data with more zeros than expected).
For example, the number of joeys accompanying an adult koala could be zero because the koala has no offspring (true
zero) or because the koala is male or infertile (both of which would be examples of false zeros). Similarly, zero counts of
the number of individual in a transect are due either to the absence of individuals or the inability of the observer to detect
them. Whilst in the former example, the latent variable representing false zeros (sex or infertility) can be identified and
those individuals removed prior to analysis, this is not the case for the latter example. That is, we cannot easily partition
which counts of zero are due to detection issues and which are a true indication of the natural state.
Consistent with these two sources of zeros, zero-inflated models combine a binary logistic regression model (that models
count membership according to a latent variable representing observations that can only be zeros - not detectable or male
koalas) with a Poisson regression (that models count membership according to a latent variable representing observations
whose values could be 0 or any positive integer - fertile female koalas).
https://www.flutterbys.com.au/stats/tut/tut10.6b.html[05/01/2020 18:28:17]
Tutorial 10.6b - Poisson regression and log-linear models (Bayesian)
1.1: Summary of important equations

Many ecologists are repulsed or frightened by statistical formulae. In part this is due to the use of Greek letters to represent
concepts, constants and functions with the assumption that the readers are familiar with their meaning. Hence, the issue is
largely that many ecologists are familiar with certain statistical concepts, yet are not overly familiar with the notation used to
represent those concepts.
As a typed language, R exposes users to a rich variety of statistical opportunities. Whilst many of the common procedures are
wrapped into simple to use functions (and so all of the calculations etc underlying a GLM need not be performed by hand by the
user), not all procedures have been packaged into functions. For such cases, it is useful to have the formulas handy.
Hence, in this section I am going to summarize and compare equations for common procedures associated with Poisson and
Negative Binomial models. Feel free to skip this section until it is useful to you.
Poisson Negative Binomial (p,r)
Density $f(y|\lambda)=\frac{\lambda^{y}e^{- $f(y|r,p)=\frac{\Gamma(y+r)}

function \lambda}}{y!}$ {\Gamma(r)\Gamma(y+1)}p^r(1-p)^y$
Expected $E(Y)=\mu=\lambda$ $E(Y)=\mu=\frac{r(1-p)}{p}$

value
Variance $var(Y)=\lambda$ $var(Y)=\frac{r(1-p)}{p^2}$
log- $\mathcal{LL}(\lambda\mid y) = $\begin{align}\mathcal{LL}(r,p\mid y) =

likelihood \sum\limits^n_{i=1}y_i \sum\limits^n_{i=1}&log\Gamma(y_1 + r) - log\Gamma(r) -
log(\lambda_i)-\lambda_i - log\Gamma(y_i + 1) + \\&r.log(p) + y_i log(1-p)\end{align}$
log\Gamma(y_i+1)$
Negative Binomial (a,b)
Density $f(y \mid a,b) = \frac{b^{a}y^{b-1}e^{-b y}}{\Gamma(a)}$

function
Expected $E(Y)=\mu=\frac{r}{p}$
value
Variance
log- $\mathcal{LL}(\lambda;y)=\sum\limits^n_{i=1} log\Gamma(y_i + a) - log\Gamma(y_i+1) -

likelihood log\Gamma(a) + \alpha(log(b_i) - log(b_i+1)) - y_i.log(b_i+1)$
Zero-inflated Poisson
Density $p(y \mid \theta,\lambda) = \left\{ \begin{array}{l l} \theta + (1-\theta)\times \text{Pois}(0|\lambda) &
function \quad \text{if $y_i=0$ and}\\ (1-\theta)\times \text{Pois}(y_i|\lambda) & \quad \text{if $y_i>0$}
\end{array} \right.$
Expected $E(Y)=\mu=\lambda\times(1-theta)$
value
Variance $var(Y)=\lambda\times(1-\theta)\times(1+\theta\times\lambda^2)$
log- $\mathcal{LL}(\theta,\lambda\mid y) = \left\{ \begin{array}{l l} \sum\limits^n_{i=1} (1-\theta)\times

likelihood log(\theta + \lambda_i)\\ \sum\limits^n_{i=1}y_i log(1-\theta)\times y_i\lambda_i - exp(\lambda_i)\\
\end{array} \right. $
Procedure Equation
Residuals $\varepsilon_i = y_i - \hat{y}$
Pearson's Residuals $\frac{y_i - \hat{y}}{\sqrt{var(y)}}$
Residual Sums of Squares $RSS = \sum \varepsilon_i^2$
Dispersion statistic $\phi=\frac{RSS}{df}$
$R^2$ $R^2 = 1-\frac{RSS_{model}}{RSS_{null}}$
McFadden's $R^2$ $R_{McF}^2 = 1 - \frac{\mathcal{LL}_{model}}{\mathcal{LL}_{null}}$
Deviance $dev = -2*\mathcal{LL}$
pD $pD = $
DIC $DIC = -pD$
AIC $AIC = min(dev+(2pD))$
Section 2: Poisson regression

The following equations are provided since in Bayesian modelling, it is occasionally necessary to directly define the log-
likelihood calculations (particularly for zero-inflated models and other mixture models). Feel free initially gloss over these
equations until such time when your models require them ;)
Density function: $$ f(y\mid\lambda)=\frac{\lambda^{y}e^{-\lambda}}{y!}\\ E(Y)=Var(Y)=\lambda\\ $$ where $\lambda$
is the mean.
Likelihood: $$ \begin{align} \mathcal{L}(\lambda\mid y) &= \prod\limits_{i=1}^n \dfrac{\lambda^{y_i}e^{-\lambda}}

{y_i!}\\ %&= \dfrac{\lambda^{\sum\limits^n_{i=1}y_i} e^{-n\lambda}}{y_1!y_2! \cdots y_n!}\\ \end{align} $$ Log-
likelihood: $$ \begin{array}{rcl} \mathcal{LL}(\lambda\mid y)&=&\sum\limits^n_{i=1}log(\lambda^{y_i}e^{-\lambda_i})-
log(y_i!)\\ \mathcal{LL}(\lambda\mid y)&=&\sum\limits^n_{i=1}log(\lambda^{y_i})+log(e^{-\lambda_i})-log(y_i!)\\
\mathcal{LL}(\lambda\mid y)&=&\sum\limits^n_{i=1}y_i log(\lambda_i)-\lambda_i - log(y_i!) \end{array} $$
2.1: Scenario and Data

Lets say we wanted to model the abundance of an item ($y$) against a continuous predictor ($x$). As this section is mainly
about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details are
optional and can be safely skipped. Consequently, I have folded (toggled) this section away.
Details of data generation
Random data incorporating the following trends (effect parameters)
the sample size = 20

the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
the rate of change in log $y$ per unit of $x$ (slope) = 0.1.
the value of $x$ when log$y$ equals 0 (when $y$=1)
to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor (created by calculating the
outer product of the model matrix and the regression parameters). These expected values are then transformed into a
scale mapped by (0,$\infty$) by using the log function $e^{linear~predictor}$

finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as probabilities when drawing random
numbers from a Poisson distribution. This step adds random noise to the expected $y$ values and returns only 0's and
positive integers.
set.seed(8 )
#The number of samples
n.x <- 20
#Create x values that at uniformly distributed throughout the rate of 1 to 20
x <- sort(runif(n = n.x, min = 1 , max =20 ))
mm <- model.matrix(~x)
intercept <- 0.6
slope=0.1
#The linear predictor
linpred <- mm %*% c(intercept,slope)
#Predicted y values
lambda <- exp(linpred)
#Add some noise and make binomial
y <- rpois(n=n.x, lambda=lambda)
dat <- data.frame(y,x)
With these sort of data, we are primarily interested in investigating whether there is a relationship between the positive integer
response variable and the linear predictor (linear combination of one or more continuous or categorical predictors).
2.2: Exploratory data analysis and initial assumption checking
The assumptions are:

1. All of the observations are independent - this must be addressed at the design and collection stages
2. The response variable (and thus the residuals) should be matched by an appropriate distribution (in the case of
positive integers response - a Poisson is appropriate).
3. All observations are equally influential in determining the trends - or at least no observations are overly influential.
This is most effectively diagnosed via residuals and other influence indices and is very difficult to diagnose prior to
analysis
4. the relationship between the linear predictor (right hand side of the regression formula) and the link function should
be linear. A scatterplot with smoother can be useful for identifying possible non-linearity.
5. The dispersion factor is close to 1
There are at least five main potential models we could consider fitting to these data:
1. Ordinary least squares regression (general linear model) - assumes normality of residuals
2. Poisson regression - assumes mean=variance (dispersion=1)
3. Quasi-poisson regression - a general solution to overdispersion. Assumes variance is a function of mean, dispersion
estimated, however likelihood based statistics unavailable
4. Negative binomial regression - a specific solution to overdispersion caused by clumping (due to an unmeasured latent
variable). Scaling factor ($\omega$) is estimated along with the regression parameters.
5. Zero-inflation model - a specific solution to overdispersion caused by excessive zeros (due to an unmeasured latent
variable). Mixture of binomial and Poisson models.
2.2.1: Confirm non-normality and explore clumping

When counts are all very large (not close to 0) and their ranges do not span orders of magnitude, they take on very Gaussian
properties (symmetrical distribution and variance independent of the mean). Given that models based on the Gaussian
distribution are more optimized and recognized than Generalized Linear Models, it can be prudent to adopt Gaussian models for
such data. Hence it is a good idea to first explore whether a Poisson model is likely to be more appropriate than a standard
Gaussian model.
The potential for overdispersion can be explored by adding a rug to boxplot. The rug is simply tick marks on the inside of an
axis at the position corresponding to an observation. As multiple identical values result in tick marks drawn over one another, it
is typically a good idea to apply a slight amount of jitter (random displacement) to the values used by the rug.
hist(dat$y)
boxplot(dat$y, horizontal=TRUE )
rug(jitter(dat$y), side=1 )
There is definitely signs of non-normality that would warrant Poisson models. The rug applied to the boxplots does not indicate
a series degree of clumping and there appears to be few zero. Thus overdispersion is unlikely to be an issue.
2.2.2: Confirm linearity

Lets now explore linearity by creating a histogram of the predictor variable ($x$) and a scatterplot of the relationship between
the response ($y$) and the predictor ($x$)
hist(dat$x)
#now for the scatterplot

plot(y~x, dat, log="y")
with(dat, lines(lowess(y~x)))
Conclusions: the predictor ($x$) does not display any skewness or other issues that might lead to non-linearity. The lowess
smoother on the scatterplot does not display major deviations from a straight line and thus linearity is satisfied. Violations of
linearity could be addressed by either:
define a non-linear linear predictor (such as a polynomial, spline or other non-linear function)
transform the scale of the predictor variables
2.2.3: Explore zero inflation

Although we have already established that there are few zeros in the data (and thus overdispersion is unlikely to be an issue),
we can also explore this by comparing the number of zeros in the data to the number of zeros that would be expected from a
Poisson distribution with a mean equal to the mean count of the data.
#proportion of 0's in the data

dat.tab<-table(dat$y==0 )
dat.tab/sum(dat.tab)
FALSE
1
#proportion of 0's expected from a Poisson distribution

mu <- mean(dat$y)
cnts <- rpois(1000 , mu)
dat.tab <- table(cnts == 0 )
dat.tab/sum(dat.tab)
FALSE TRUE
0.997 0.003
In the above, the value under FALSE is the proportion of non-zero values in the data and the value under TRUE is the
proportion of zeros in the data. In this example, there are no zeros in the observed data which corresponds closely to the very
low proportion expected (0.003).
2.3: Model fitting or statistical analysis

2.3.1: JAGS
$$ \begin{align} Y_i&\sim{}P(\lambda) & (\text{response distribution})\\ log(\lambda_i)&=\eta_i & (\text{link function})\\
\eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,10000) & (\text{diffuse
Bayesian prior})\\ \end{align} $$
dat.list <- with(dat,list(Y=y, X=x,N=nrow(dat)))
modelString="
model {
for (i in 1:N) {
Y[i] ~ dpois(lambda[i])
log(lambda[i]) <- beta0 + beta1*X[i]
}
beta0 ~ dnorm(0,1.0E-06)
}
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS1.4.bug')
library(R2jags)
system.time(
dat.P.jags <- jags(model='../downloads/BUGSscripts/tut11.5bS1.4.bug', data=dat.list, inits=NULL,
param=c('beta0','beta1'),
n.chain=3 ,
n.iter=20000 , n.thin=10 , n.burnin=10000 )
)
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph Size: 105
Initializing model
user system elapsed
1.852 0.004 1.861
Matrix parameterization
Xmat <- model.matrix(~x, dat)

nX <- ncol(Xmat)
dat.list1 <- with(dat,list(Y=y, X=Xmat,N=nrow(dat), nX=nX))
modelString="
model {
for (i in 1:N) {
eta[i] <- inprod(beta[], X[i,])
log(lambda[i]) <- eta[i]
}
for (i in 1:nX) {
beta[i] ~ dnorm(0,1.0E-06)
}
}
"
library(R2jags)
system.time(
dat.P.jags1 <- jags(model='../downloads/BUGSscripts/tut11.5bS1.41.bug', data=dat.list1, inits=NULL,
param=c('beta'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 127
Initializing model
user system elapsed
1.836 0.004 1.844
print(dat.P.jags1)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS1.41.bug", fit using jags,
3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
n.sims = 3000 iterations saved
mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
beta[1] 0.546 0.256 0.019 0.384 0.552 0.719 1.018 1.002 1700
beta[2] 0.112 0.019 0.077 0.099 0.112 0.124 0.149 1.002 1800
deviance 88.428 4.999 86.376 86.891 87.680 89.036 93.531 1.001 3000
For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
DIC info (using the rule, pD = var(deviance)/2)

pD = 12.5 and DIC = 100.9
DIC is an estimate of expected predictive error (lower deviance is better).
Or arguably better still, use a multivariate normal prior. If we have a $k$ regression parameters ($\beta_k$), then the
multivariate normal priors are defined as: $$ \boldsymbol{\beta}\sim{}\mathcal{N_k}(\boldsymbol{\mu},
\mathbf{\Sigma}) $$ where $$\boldsymbol{\mu}=[E[\beta_1],E[\beta_2],...,E[\beta_k]] = \left(\begin{array}
{c}0\\\vdots\\0\end{array}\right)$$ $$ \mathbf{\Sigma}=[Cov[\beta_i, \beta_j]] = \left(\begin{array}
{ccc}1000^2&0&0\\0&\ddots&0\\0&0&1000^2\end{array} \right) $$ hence, along with the response and predictor matrix,
we need to supply $\boldsymbol{\mu}$ (a vector of zeros) and $\boldsymbol{\Sigma}$ (a covariance matrix with
$1000^2$ in the diagonals).

nX <- ncol(Xmat)
dat.list2 <- with(dat,list(Y=y, X=Xmat,N=nrow(dat), mu=rep(0 ,nX),Sigma=diag(1.0E-06 ,nX)))
modelString="
model {
for (i in 1:N) {
}
beta ~ dmnorm(mu[],Sigma[,])
}
"
library(R2jags)
system.time(
param=c('beta'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 130
Initializing model
user system elapsed
0.460 0.000 0.463
print(dat.P.jags2)
beta[1] 0.561 0.241 0.084 0.405 0.571 0.724 1.034 1.002 1500
beta[2] 0.111 0.018 0.074 0.099 0.111 0.123 0.145 1.002 1300
deviance 88.184 1.850 86.372 86.854 87.596 88.898 92.951 1.007 900


pD = 1.7 and DIC = 89.9
2.3.2: BRMS
library(brms)
dat.brm <- brm(y~x, data=dat, family='poisson',
prior = c(set_prior("normal(0,100)", class="b")),
chains=3 , iter=2000 , warmup=1000 , thin=2 )
SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 1).
Chain 1, Iteration: 1 / 2000 [ 0%] (Warmup)

Chain 1, Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1, Iteration: 2000 / 2000 [100%] (Sampling)#
# Elapsed Time: 0.03963 seconds (Warm-up)
# 0.03519 seconds (Sampling)
# 0.07482 seconds (Total)
#

#

#
2.4: Chain mixing and Model validation

Prior to exploring the model parameters, it is prudent to confirm that the model did indeed fit the assumptions and was an
appropriate fit to the data as well as that the MCMC sampling chain was adequately mixed and the retained samples
independent.
Whilst I will only demonstrate this for the logit model, the procedure would be identical for exploring the probit and clog-log
models.
We will start by exploring the mixing of the MCMC chains via traceplots.
plot(as.mcmc(dat.P.jags))
View code associated with BRMS model
library(gridExtra)
grid.arrange(stan_trace(dat.brm$fit, ncol=1 ),
stan_dens(dat.brm$fit, separate_chains=TRUE ,ncol=1 ),
ncol=2 )
The chains appear well mixed and stable

Next we will explore correlation amongst MCMC samples.
autocorr.diag(as.mcmc(dat.P.jags))
beta0 beta1 deviance
Lag 0 1.0000000 1.00000 1.000000
Lag 10 0.1553164 0.11400 0.009912
Lag 50 0.0228548 0.01439 0.004464
Lag 100 -0.0008548 -0.01320 0.001207
Lag 500 -0.0339699 -0.02674 -0.007545
The level of auto-correlation at the nominated lag of 10 is higher than we would generally like. It is worth increasing the
thinning rate from 10 to 50. Obviously, to support this higher thinning rate, we would also increase the number of
iterations.
library(R2jags)
dat.P.jags <- jags(data=dat.list,model.file='../downloads/BUGSscripts/tut11.5bS1.4.bug',
n.chains=3 , n.iter=100000 , n.burnin=20000 , n.thin=50 )
Allocating nodes
Graph Size: 105
Initializing model
print(dat.P.jags)
3 chains, each with 1e+05 iterations (first 20000 discarded), n.thin = 50

beta0 0.545 0.257 0.040 0.375 0.545 0.718 1.039 1.001 4800
beta1 0.112 0.019 0.075 0.100 0.112 0.124 0.148 1.001 4800
deviance 88.379 3.712 86.368 86.871 87.664 89.058 93.778 1.002 4800


pD = 6.9 and DIC = 95.3
plot(as.mcmc(dat.P.jags))
autocorr.diag(as.mcmc(dat.P.jags))
beta0 beta1 deviance
Lag 0 1.0000000 1.000000 1.000000
Lag 50 0.0100259 0.009642 0.003845
Lag 250 0.0133849 0.010277 0.001847
Lag 500 0.0056560 0.003987 0.001649
Lag 2500 0.0006608 0.003896 0.007031
Conclusions: the samples are now less auto-correlated and the chains are also arguably mixed a little better.
stan_ac(dat.brm$fit)
Explore the step size characteristics (STAN only)

summary(do.call(rbind, args = get_sampler_params(dat.brm$fit, inc_warmup = FALSE )), digits = 2 )

accept_stat__ stepsize__ treedepth__ n_leapfrog__ n_divergent__
Min. :0.24 Min. :0.64 Min. :1.0 Min. :1.0 Min. :0

1st Qu.:0.88 1st Qu.:0.64 1st Qu.:2.0 1st Qu.:3.0 1st Qu.:0
Median :0.96 Median :0.72 Median :2.0 Median :3.0 Median :0
Mean :0.92 Mean :0.70 Mean :2.1 Mean :3.7 Mean :0
3rd Qu.:1.00 3rd Qu.:0.73 3rd Qu.:3.0 3rd Qu.:5.0 3rd Qu.:0
Max. :1.00 Max. :0.73 Max. :3.0 Max. :7.0 Max. :0
stan_diag(dat.brm$fit)
stan_diag(dat.brm$fit, information = "stepsize")
stan_diag(dat.brm$fit, information = "treedepth")
stan_diag(dat.brm$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(dat.brm$fit) + theme_classic(8 ),
stan_ess(dat.brm$fit) + theme_classic(8 ),
stan_mcse(dat.brm$fit) + theme_classic(8 ),
ncol = 2 )
Conclusions: acceptance ratio is very good, the lot posterior is relatively robust to step size and tree depth, rhat values are
all below 1.05, the effective sample size is always above 0.5 and the ratio of MCMC error to posterior sd is very small.
One very important model validation procedure is to examine a plot of residuals against predicted or fitted values (the
residual plot). Ideally, residual plots should show a random scatter of points without outliers. That is, there should be no
patterns in the residuals. Patterns suggest inappropriate linear predictor (or scale) and/or inappropriate residual
distribution/link function.
The residuals used in such plots should be standardized (particularly if the model incorporated any variance-covariance
structures - such as an autoregressive correlation structure) Pearsons's residuals standardize residuals by division with the
square-root of the variance.
We can generate Pearson's residuals within the JAGS model. Alternatively, we could use the parameters to generate the
residuals outside of JAGS. Pearson's residuals are calculated according to: $$ \varepsilon = \frac{y_i - \mu}
{\sqrt{var(y)}} $$ where $\mu$ is the expected value of $Y$ ($=\lambda$ for Poisson) and $var(y)$ is the variance of
$Y$ ($=\lambda$ for Poisson).
#extract the samples for the two model parameters
coefs <- dat.P.jags$BUGSoutput$sims.matrix[,1 :2 ]
Xmat <- model.matrix(~x, data=dat)
#expected values on a log scale
eta<-coefs %*% t(Xmat)
#expected value on response scale
lambda <- exp(eta)
#Expected value and variance are both equal to lambda
expY <- varY <- lambda
#sweep across rows and then divide by lambda
Resid <- -1 *sweep(expY,2 ,dat$y,'-')/sqrt(varY)
#plot residuals vs expected values
plot(apply(Resid,2 ,mean)~apply(eta,2 ,mean))
#Calculate residuals
Resid.brm <- residuals(dat.brm, type='pearson')
Fitted.brm <- fitted(dat.brm, scale='linear')
ggplot(data=NULL, aes(y=Resid.brm[,'Estimate'], x=Fitted.brm[,'Estimate'])) + geom_point()
There is one residual that is substantially larger in magnitude than all the others. However, there are no other patterns in
the residuals.
Now we will compare the sum of squared residuals to the sum of squares residuals that would be expected from a Poisson
distribution matching that estimated by the model. Essentially this is estimating how well the Poisson distribution, the log-
link function and the linear model approximates the observed data.
SSres<-apply(Resid^2 ,1 ,sum)
set.seed(2 )
#generate a matrix of draws from a poisson distribution
# the matrix is the same dimensions as lambda and uses the probabilities of lambda
YNew <- matrix(rpois(length(lambda),lambda=lambda),nrow=nrow(lambda))
Resid1<-(lambda - YNew)/sqrt(lambda)
SSres.sim<-apply(Resid1^2 ,1 ,sum)
mean(SSres.sim>SSres)
[1] 0.484375
Goodness of fit within JAGS
dat.list1 <- with(dat,list(Y=y, X=x,N=nrow(dat)))

modelString="
model {
for (i in 1:N) {
#likelihood function
eta[i] <- beta0+beta1*X[i] #linear predictor
log(lambda[i]) <- eta[i] #link function
#E(Y) and var(Y)

expY[i] <- lambda[i]
varY[i] <- lambda[i]
# Calculate RSS
Resid[i] <- (Y[i] - expY[i])/sqrt(varY[i])
RSS[i] <- pow(Resid[i],2)
#Simulate data from a Poisson distribution

Y1[i] ~ dpois(lambda[i])
#Calculate RSS for simulated data
Resid1[i] <- (Y1[i] - expY[i])/sqrt(varY[i])
RSS1[i] <-pow(Resid1[i],2)
}
#Priors
#Bayesian P-value
Pvalue <- mean(sum(RSS1)>sum(RSS))
}
"
library(R2jags)
system.time(
param=c('beta0','beta1','Pvalue'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 272
Initializing model
user system elapsed
16.189 0.004 16.209
print(dat.P.jags1)
Pvalue 0.481 0.500 0.000 0.000 0.000 1.000 1.00 1.001 4800
beta0 0.544 0.261 0.017 0.374 0.550 0.720 1.04 1.002 1200
beta1 0.112 0.019 0.075 0.099 0.112 0.125 0.15 1.002 1600
deviance 88.466 3.891 86.372 86.931 87.758 89.180 93.91 1.001 4800


pD = 7.6 and DIC = 96.0
Resid.brm <- residuals(dat.brm, type='pearson', summary=FALSE )

SSres.brm <- apply(Resid.brm^2 ,1 ,sum)
lambda.brm = fitted(dat.brm, scale='response', summary=FALSE )
YNew.brm <- matrix(rpois(length(lambda.brm), lambda=lambda.brm),nrow=nrow(lambda.brm))
Resid1.brm<-(lambda.brm - YNew.brm)/sqrt(lambda.brm)
SSres.sim.brm<-apply(Resid1.brm^2 ,1 ,sum)
mean(SSres.sim.brm>SSres.brm)
[1] 0.6026667
Conclusions: the Bayesian p-value is approximately 0.5, suggesting that there is a good fit of the model to the data.
Unfortunately, unlike with linear models (Gaussian family), the expected distribution of data (residuals) varies over the
range of fitted values for numerous (often competing) ways that make diagnosing (and attributing causes thereof) miss-
specified generalized linear models from standard residual plots very difficult. The use of standardized (Pearson) residuals
or deviance residuals can partly address this issue, yet they still do not offer completely consistent diagnoses across all
issues (miss-specified model, over-dispersion, zero-inflation).
An alternative approach is to use simulated data from the model posteriors to calculate an empirical cumulative density
function from which residuals are are generated as values corresponding to the observed data along the density function.

coefs <- dat.P.jags$BUGSoutput$sims.matrix[,1 :2 ]
lambda <- exp(eta)
simRes <- function(lambda, data,n=250 , plot=T, family='poisson') {

require(gap)
N = nrow(data)
sim = switch(family,
'poisson' = matrix(rpois(n*N,apply(lambda,2 ,mean)),ncol=N, byrow=TRUE )
)
a = apply(sim + runif(n,-0.5 ,0.5 ),2 ,ecdf)
resid<-NULL
for (i in 1 :nrow(data)) resid<-c(resid,a[[i]](data$y[i] + runif(1 ,-0.5 ,0.5 )))
if (plot==T) {
par(mfrow=c(1 ,2 ))
gap::qqunif(resid,pch = 2 , bty = "n",
logscale = F, col = "black", cex = 0.6 , main = "QQ plot residuals",
cex.main = 1 , las=1 )
plot(resid~apply(lambda,2 ,mean), xlab='Predicted value', ylab='Standardized residual', las=1 )
}
resid
}
simRes(lambda,dat, family='poisson')
[1] 0.264 0.280 0.556 0.416 0.584 0.988 0.068 0.576 0.524 0.756 0.080 0.544 0.888 0.044 0.288 0.340
0.796 0.832 0.164 0.704
The trend (black symbols) in the qq-plot does not appear to be overly non-linear (matching the ideal red line well),
suggesting that the model is not overdispersed. The spread of standardized (simulated) residuals in the residual plot do not
appear overly non-uniform. That is there is not trend in the residuals. Furthermore, there is not a concentration of points
close to 1 or 0 (which would imply overdispersion).
simRes <- function(lambda, data,n=250 , plot=T, family='poisson') {

require(gap)
N = nrow(data)
'poisson' = matrix(rpois(n*N,apply(lambda,2 ,mean)),ncol=N, byrow=TRUE )
)
resid<-NULL
if (plot==T) {
par(mfrow=c(1 ,2 ))
}
resid
}
simRes(lambda.brm, dat, family='poisson')
[1] 0.500 0.564 0.676 0.184 0.632 0.996 0.072 0.568 0.588 0.764 0.128 0.580 0.940 0.100 0.264
0.216 0.780 0.880 0.148 0.716
Recall that the Poisson regression model assumes that variance=mean ($var=\mu\phi$ where $\phi=1$) and thus
dispersion ($\phi=\frac{var}{\mu}=1$). However, we can also calculate approximately what the dispersion factor would
be by using sum square of the residuals as a measure of variance and the model residual degrees of freedom as a measure
of the mean (since the expected value of a Poisson distribution is the same as its degrees of freedom). $$\phi=\frac{RSS}
{df}$$ where $df=n-k$ and $k$ is the number of estimated model coefficients.
Resid <- -1 *sweep(lambda,2 ,dat$y,'-')/sqrt(lambda)
RSS<-apply(Resid^2 ,1 ,sum)
(df<-nrow(dat)-ncol(coefs))
[1] 18
Disp <- RSS/df

data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)),
HPDinterval(as.mcmc(Disp),p=0.5 ))
Median Mean lower upper lower.1 upper.1
var1 1.044197 1.109181 0.9299409 1.452832 0.9299853 1.044238
Dispersion statistic entirely via JAGS

modelString="
model {
for (i in 1:N) {
eta[i] <- beta0 + beta1*X[i]
expY[i] <- lambda[i]
varY[i] <- lambda[i]
}
RSS <- sum(pow(Resid,2))
df <- N-2
phi <- RSS/df
}
"
writeLines(modelString,con='tut11.5bS1.40.bug')
library(R2jags)
system.time(
dat.P.jags <- jags(model='tut11.5bS1.40.bug', data=dat.list, inits=NULL,
param=c('beta0','beta1','phi'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 171
Initializing model
user system elapsed
14.865 0.004 14.886
print(dat.P.jags)
Inference for Bugs model at "tut11.5bS1.40.bug", fit using jags,

beta0 0.542 0.258 0.013 0.37 0.544 0.716 1.042 1.001 4800
beta1 0.112 0.019 0.075 0.10 0.112 0.125 0.149 1.001 4800
phi 1.112 0.400 0.934 0.98 1.052 1.173 1.570 1.001 4800
deviance 88.433 3.859 86.373 86.89 87.727 89.159 93.839 1.001 4800


pD = 7.4 and DIC = 95.9
Resid.brm <- residuals(dat.brm, type='pearson', summary=FALSE )

(df <- nrow(dat) - nrow(coef(dat.brm)))
[1] 18
Disp <- SSres.brm/df

HPDinterval(as.mcmc(Disp)))
var1 0.9666143 0.9944079 0.8964392 1.168199 0.8964392 1.168199
The dispersion statistic $\phi$ is close to 1 and thus there is no evidence that the data were overdispersed. The Poisson
distribution was therefore appropriate.
2.5: Exploring the model parameters, test hypotheses

If there was any evidence that the assumptions had been violated or the model was not an appropriate fit, then we would need
to reconsider the model and start the process again. In this case, there is no evidence that the test will be unreliable so we can
proceed to explore the test statistics.
print(dat.P.jags)
beta0 0.542 0.258 0.013 0.37 0.544 0.716 1.042 1.001 4800
beta1 0.112 0.019 0.075 0.10 0.112 0.125 0.149 1.001 4800
phi 1.112 0.400 0.934 0.98 1.052 1.173 1.570 1.001 4800
deviance 88.433 3.859 86.373 86.89 87.727 89.159 93.839 1.001 4800


pD = 7.4 and DIC = 95.9
library(plyr)
adply(dat.P.jags$BUGSoutput$sims.matrix[,1 :2 ], 2 , function(x) {
data.frame(Median=median(x), Mean=mean(x), HPDinterval(as.mcmc(x)), HPDinterval(as.mcmc(x),p=0.5 ))
})
X1 Median Mean lower upper lower.1 upper.1
1 beta0 0.5445 0.5423 0.02752 1.0480 0.3617 0.7056
2 beta1 0.1123 0.1122 0.07599 0.1491 0.1001 0.1251
Actually, many find it more palatable to express the estimates in the original scale of the observations rather than on a log
scale.
library(plyr)
adply(exp(dat.P.jags$BUGSoutput$sims.matrix[,1 :2 ]), 2 , function(x) {
})
1 beta0 1.724 1.778 0.9645 2.709 1.342 1.915
2 beta1 1.119 1.119 1.0790 1.161 1.105 1.133
Conclusions: We would reject the null hypothesis of no effect of $x$ on $y$. An increase in x is associated with a significant
linear increase (positive slope) in the abundance of $y$. Every 1 unit increase in $x$ results in a log 0.1123415 unit increase in
$y$. We usually express this in terms of abundance rather than log abundance, so every 1 unit increase in $x$ results in a
($e^{ 0.1123415 } = 1.1188949 $) 1.1188949 unit increase in the abundance of $y$.
summary(dat.brm)
Family: poisson (log)
Formula: y ~ x
Data: dat (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 2;
total post-warmup samples = 1500
WAIC: Not computed
Fixed Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept 0.54 0.25 0.04 1.01 864 1
x 0.11 0.02 0.08 0.15 962 1
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample

is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(coef(dat.brm))
mean
Intercept 1.710068
x 1.119290
coefs.brm <- as.matrix(as.data.frame(rstan:::extract(dat.brm$fit)))

coefs.brm <- coefs.brm[,grep('b', colnames(coefs.brm))]
plyr:::adply(exp(coefs.brm), 2 , function(x) {
data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
})
X1 Mean median lower upper
1 b_Intercept 1.763680 1.710650 0.950453 2.630181
2 b_x 1.119476 1.118679 1.082259 1.161513
marginal_effects(dat.brm)
2.6: Further explorations of the trends

A measure of the strength of the relationship can be obtained according to: $$R^2 = 1 - \frac{RSS_{model}}{RSS_{null}}$$

lambda <- exp(eta)
#calculate the raw SS residuals
SSres <- apply((-1 *(sweep(lambda,2 ,dat$y,'-')))^2 ,1 ,sum)
SSres.null <- sum((dat$y - mean(dat$y))^2 )
#OR
SSres.null <- crossprod(dat$y - mean(dat$y))
#calculate the model r2
1 -mean(SSres)/SSres.null
[,1]
[1,] 0.6572
Conclusions: 65.72% of the variation in $y$ abundance can be explained by its relationship with $x$.
$R^2$ entirely via JAGS

modelString="
model {
for (i in 1:N) {
res[i] <- Y[i] - lambda[i]
resnull[i] <- Y[i] - meanY
}
meanY <- mean(Y)
RSS <- sum(res^2)
RSSnull <- sum(resnull^2)
r2 <- 1-RSS/RSSnull
}
"
writeLines(modelString,con='tut11.5bS1.40.bug')
library(R2jags)
system.time(
dat.P.jags <- jags(model='tut11.5bS1.40.bug', data=dat.list, inits=NULL,
param=c('beta0','beta1','r2'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 150
Initializing model
user system elapsed
14.909 0.008 14.933
print(dat.P.jags)
beta0 0.550 0.253 0.052 0.384 0.554 0.723 1.028 1.001 4800
beta1 0.112 0.019 0.076 0.099 0.112 0.124 0.149 1.001 4800
r2 0.655 0.066 0.509 0.640 0.673 0.691 0.701 1.001 4800
deviance 88.418 4.483 86.372 86.885 87.718 89.117 93.725 1.003 4800


pD = 10.1 and DIC = 98.5
## calculate the expected values on the response scale

## calculate the raw SSresid
SSres.brm <- apply((-1 *(sweep(lambda.brm,2 ,dat$y,'-')))^2 ,1 ,sum)
SSres.null <- sum((dat$y - mean(dat$y))^2 )
#OR
SSres.null <- crossprod(dat$y - mean(dat$y))
1 -mean(SSres.brm)/SSres.null
[,1]
[1,] 0.6570747
Finally, we will create a summary plot.
par(mar = c(4 , 5 , 0 , 0 ))
plot(y ~ x, data = dat, type = "n", ann = F, axes = F)
points(y ~ x, data = dat, pch = 16 )
xs <- seq(min(dat$x,na.rm=TRUE ),max(dat$x,na.rm=TRUE ), l = 1000 )
Xmat <- model.matrix(~xs)
ys <- exp(eta)
library(plyr)
library(coda)
data.tab <- adply(ys,2 ,function(x) {
data.frame(Median=median(x), HPDinterval(as.mcmc(x)))
})
data.tab <- cbind(x=xs,data.tab)
points(Median ~ x, data=data.tab,col = "black", type = "l")
lines(lower ~ x, data=data.tab,col = "black", type = "l", lty = 2 )
lines(upper ~ x, data=data.tab,col = "black", type = "l", lty = 2 )
axis(1 )
mtext("X", 1 , cex = 1.5 , line = 3 )
axis(2 , las = 2 )
mtext("Abundance of Y", 2 , cex = 1.5 , line = 3 )
box(bty = "l")
newdata = data.frame(x=seq(min(dat$x), max(dat$x), len=100 ))

Xmat = model.matrix(~x, data=newdata)
coefs <- as.matrix(as.data.frame(rstan:::extract(dat.brm$fit)))
coefs <- coefs[,grep('b', colnames(coefs))]
fit = exp(coefs %*% t(Xmat))
newdata = cbind(newdata,
plyr:::adply(fit, 2 , function(x) {
data.frame(Mean=mean(x), Median=median(x),
HPDinterval(as.mcmc(x)))
})
)
ggplot(newdata, aes(y=Mean, x=x)) +

geom_point(data=dat, aes(y=y)) +
geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue',alpha=0.2 ) +
geom_line() +
scale_x_continuous('X') +
scale_y_continuous('Abundance of Y') +
theme_classic() +
theme(axis.line.x=element_line(),axis.line.y=element_line())
2.7: Defining full log-likelihood function

Now lets try it by specifying log-likelihood and the zero trick. When applying this trick, we need to manually calculate the
deviance as the inbuilt deviance will be based on the log-likelihood of estimating the zeros (as part of the zero trick) rather
than the deviance of the intended model..
The one advantage of the zero trick is that the Deviance and thus DIC, AIC provided by R2jags will be incorrect. Hence, they
too need to be manually defined within jags. I suspect that the AIC calculation I have used is incorrect...

nX <- ncol(Xmat)
dat.list2 <- with(dat,list(Y=y, X=Xmat,N=nrow(dat), mu=rep(0 ,nX),
Sigma=diag(1.0E-06 ,nX), zeros=rep(0 ,nrow(dat)), C=10000 ))
modelString="
model {
for (i in 1:N) {
zeros[i] ~ dpois(zeros.lambda[i])
zeros.lambda[i] <- -ll[i] + C
ll[i] <- Y[i]*log(lambda[i]) - lambda[i] - loggam(Y[i]+1)
llm[i] <- Y[i]*log(meanlambda) - meanlambda - loggam(Y[i]+1)
}
meanlambda <- mean(lambda)
dev <- sum(-2*ll)
pD <- mean(dev)-sum(-2*llm)
AIC <- min(dev+(2*pD))
}
"
library(R2jags)
system.time(
param=c('beta','dev','AIC'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 353
Initializing model
user system elapsed
1.936 0.004 1.942
print(dat.P.jags3)
AIC 1.328e+01 4.076 9.634e+00 1.050e+01 1.200e+01 1.458e+01 2.438e+01 1.001 2400
beta[1] 5.510e-01 0.245 6.200e-02 3.860e-01 5.540e-01 7.160e-01 1.008e+00 1.000 2400
beta[2] 1.120e-01 0.018 7.800e-02 9.900e-02 1.120e-01 1.240e-01 1.470e-01 1.001 2400
dev 8.821e+01 1.868 8.637e+01 8.686e+01 8.764e+01 8.897e+01 9.294e+01 1.002 2400
deviance 4.001e+05 1.868 4.001e+05 4.001e+05 4.001e+05 4.001e+05 4.001e+05 1.000 1


pD = 1.7 and DIC = 400090.0
Section 3: Negative binomial

The following equations are provided since in Bayesian modelling, it is occasionally necessary to directly define the log-
likelihood calculations (particularly for zero-inflated models and other mixture models). Feel free initially gloss over these
equations until such time when your models require them ;)
prob, size alpha, beta (gamma-poisson)
$$ f(y|r,p)=\frac{\Gamma(y+r)} $$ f(y \mid \alpha, \beta) =

{\Gamma(r)\Gamma(y+1)}p^r(1-p)^y $$ where $p$ is \frac{\beta^{\alpha}y^{\alpha-1}e^{-\beta y}}
the probability of $y$ successes until $r$ failures. If, we {\Gamma(\alpha)} $$ where $$ \begin{align}
make $p=\frac{size}{size+\mu}$, then we can define the l(\lambda;y)=&\sum\limits^n_{i=1} log\Gamma(y_i +
function in terms of $\mu$ $$ \alpha) - log\Gamma(y_i+1) - log\Gamma(\alpha) + \\ &
f(y|r,\mu)=\frac{\Gamma(y+r)} \alpha(log(\beta_i) - log(\beta_i+1)) - y_i.log(\beta_i+1)

{\Gamma(r)\Gamma(y+1)}\left(\frac{r} \end{align} $$
{\mu+r}\right)^r\left(1-\frac{r}{\mu+r}\right)^y $$
where $p$ is the probability of $y$ successes until $r$
failures. $$\mu = \frac{r(1-p)}{p}\\ E(Y)=\mu,
Var(Y)=\mu+\frac{\mu^2}{r} $$


positive integers.
set.seed(37 ) #16 #35

n.x <- 20
intercept <- 0.6
slope=0.1
#Predicted y values
y <- rnbinom(n=n.x, mu=lambda, size=1 )
dat.nb <- data.frame(y,x)

analysis
5. Dispersion is either 1 or overdispersion is otherwise accounted for in the model
6. The number of zeros is either not excessive or else they are specifically addressed by the model
When counts are all very large (not close to 0) and their ranges do not span orders of magnitude, they take on very Gaussian
properties (symmetrical distribution and variance independent of the mean). Given that models based on the Gaussian
distribution are more optimized and recognized than Generalized Linear Models, it can be prudent to adopt Gaussian models for
such data. Hence it is a good idea to first explore whether a Poisson or Negative Binomial model is likely to be more appropriate
than a standard Gaussian model.
Recall from Poisson regression, there are five main potential models that we could consider fitting to these data.
Remind me again of those five potential models
There are five main potential models we could consider fitting to these data:
1. Ordinary least squares regression (general linear model) - assumes normality of residuals
2. Poisson regression - assumes mean=variance (dispersion=1)
3. Quasi-poisson regression - a general solution to overdispersion. Assumes variance is a function of mean, dispersion
estimated, however likelihood based statistics unavailable
4. Negative binomial regression - a specific solution to overdispersion caused by clumping (due to an unmeasured
latent variable). Scaling factor ($\omega$) is estimated along with the regression parameters.
5. Zero-inflation model - a specific solution to overdispersion caused by excessive zeros (due to an unmeasured latent
variable). Mixture of binomial and Poisson models.

Check the distribution of the $y$ abundances
hist(dat.nb$y)
boxplot(dat.nb$y, horizontal=TRUE )
rug(jitter(dat.nb$y))
There is definitely signs of non-normality that would warrant Poisson models. Further to that, the rug on the boxplot is
suggestive of clumping (observations are not evenly distributed and well spaced out). Together the, skewness and clumping
could point to an overdispersed Poisson. Negative binomial models are one of the most effective means of modeling such data.

Lets now explore linearity by creating a histogram of the predictor variable ($x$) and a scatterplot of the relationship between
the response ($y$) and the predictor ($x$)
hist(dat.nb$x)

plot(y~x, dat.nb, log="y")
with(dat.nb, lines(lowess(y~x)))
smoother on the scatterplot does not display major deviations from a straight line and thus linearity is satisfied. Violations of
linearity could be addressed by either:


dat.nb.tab<-table(dat.nb$y==0 )
dat.nb.tab/sum(dat.nb.tab)
FALSE TRUE
0.95 0.05

mu <- mean(dat.nb$y)
dat.nb.tabE <- table(cnts == 0 )
dat.nb.tabE/sum(dat.nb.tabE)
FALSE
1
proportion of zeros in the data. In this example, the proportion of zeros observed is similar to the proportion expected. Indeed,
there was only a single zero observed. Hence it is likely that if there is overdispersion it is unlikely to be due to excessive zeros.

The boxplot of $y$ with the axis rug suggested that there might be some clumping (possibly due to some other unmeasured
influence). It is therefore likely that the data are overdispersed.
Fit Poisson model and examine dispersion statistic
dat.nb.list <- with(dat.nb,list(Y=y, X=x,N=nrow(dat.nb)))

modelString="
model {
for (i in 1:N) {
}
}
"
library(R2jags)
system.time(
dat.nb.P.jags <- jags(model='../downloads/BUGSscripts/tut11.5bS3.4.bug', data=dat.nb.list, inits=NULL,
n.chain=3 ,
)
Allocating nodes
Graph Size: 105
Initializing model
user system elapsed
1.788 0.000 1.791
print(dat.nb.P.jags)
beta0 0.555 0.278 0.002 0.365 0.568 0.736 1.08 1.003 1000
beta1 0.111 0.020 0.072 0.098 0.111 0.125 0.15 1.002 1400
deviance 135.489 4.131 133.269 133.849 134.697 136.261 141.07 1.010 2700


pD = 8.5 and DIC = 144.0

coefs <- dat.nb.P.jags$BUGSoutput$sims.matrix[,1 :2 ]
lambda <- exp(eta)
Resid <- -1 *sweep(lambda,2 ,dat.nb$y, '-')/sqrt(lambda)

RSS <- apply(Resid^2 , 1 , sum)
Disp <- RSS/(nrow(dat.nb)-ncol(coefs))
var1 3.147 3.243 2.63 4.002 2.786 3.252
The dispersion parameter was 3.2425024, indicating over three times more variability than would be expected for a Poisson
distribution. The data are thus over-dispersed. Given that this is unlikely to be due to zero inflation and the rug plot did suggest
some level of clumping, negative binomial regression would seem reasonable.

3.4.1: JAGS
$$ \begin{align} Y_i&\sim{}NB(p_i,size) & (\text{response distribution})\\ p_i&=size/(size+\lambda_i)\\
log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0,
\beta_1&\sim{}\mathcal{N}(0,10000) & (\text{diffuse Bayesian prior})\\ size &\sim{}\mathcal{U}(0.001,1000)\\
\end{align} $$
dat.nb.list <- with(dat.nb,list(Y=y, X=x,N=nrow(dat.nb)))
modelString="
model {
for (i in 1:N) {
Y[i] ~ dnegbin(p[i],size)
p[i] <- size/(size+lambda[i])
log(lambda[i]) <- beta0 + beta1*X[i]
}
size ~ dunif(0.001,1000)
theta <- pow(1/mean(p),2)
scaleparam <- mean((1-p)/p)
}
"
library(R2jags)
system.time(
dat.NB.jags <- jags(model='../downloads/BUGSscripts/tut11.5bS4.1.bug', data=dat.nb.list, inits=NULL,
param=c('beta0','beta1', 'size','theta','scaleparam'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 157
Initializing model
user system elapsed
17.430 0.008 17.455
Xmat <- model.matrix(~x, dat.nb)

nX <- ncol(Xmat)
dat.nb.list1 <- with(dat.nb,list(Y=y, X=Xmat,N=nrow(dat.nb), nX=nX))
modelString="
model {
for (i in 1:N) {
log(lambda[i]) <- max(-20,min(20,eta[i]))
}
for (i in 1:nX) {
}
size ~ dunif(0.001,10000)
}
"
library(R2jags)
system.time(
dat.NB.jags1 <- jags(model='../downloads/BUGSscripts/tut11.5bS4.11.bug', data=dat.nb.list1, inits=NULL,
param=c('beta'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 212
Initializing model
user system elapsed
17.01 0.00 17.02
print(dat.NB.jags1)
beta[1] 0.733 0.415 -0.069 0.465 0.725 1.005 1.564 1.001 3000
beta[2] 0.097 0.033 0.032 0.075 0.097 0.118 0.163 1.001 3000
deviance 113.046 2.682 110.080 111.088 112.321 114.261 120.171 1.002 2000


pD = 3.6 and DIC = 116.6

nX <- ncol(Xmat)
dat.nb.list2 <- with(dat.nb,list(Y=y, X=Xmat,N=nrow(dat.nb), mu=rep(0 ,nX),Sigma=diag(1.0E-06 ,nX)))
modelString="
model {
for (i in 1:N) {
}
size ~ dunif(0.001,10000)
}
"
library(R2jags)
system.time(
param=c('beta', 'size'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 173
Initializing model
user system elapsed
6.353 0.000 6.361
print(dat.NB.jags2)
beta[1] 0.737 0.421 -0.097 0.460 0.736 1.013 1.555 1.002 3000
beta[2] 0.097 0.033 0.030 0.076 0.097 0.118 0.164 1.001 3000
size 3.102 1.931 1.048 1.885 2.615 3.749 7.885 1.001 3000
deviance 113.067 2.628 110.066 111.150 112.407 114.255 120.059 1.002 1600


pD = 3.5 and DIC = 116.5
dat.nb.brm <- brm(y~x, data=dat.nb, family='negbinomial',

prior=c(set_prior("normal(0,100)", class="b"),
set_prior("student_t(3,0,5)", class="shape")),
SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 1).

#

#

#

independent.
plot(as.mcmc(dat.NB.jags))
library(gridExtra)
grid.arrange(stan_trace(dat.nb.brm$fit, ncol=1 ),
stan_dens(dat.nb.brm$fit, separate_chains=TRUE ,ncol=1 ),
ncol=2 )

autocorr.diag(as.mcmc(dat.NB.jags))
beta0 beta1 deviance scaleparam size theta
Lag 0 1.00000 1.00000 1.0000000 1.0000000 1.00000 1.000000
Lag 10 0.30859 0.30371 0.0455964 0.0180221 0.08759 0.028477
Lag 50 -0.05370 -0.05565 0.0011639 -0.0206865 -0.01812 -0.022056
Lag 100 -0.01028 -0.01914 -0.0003804 -0.0075196 0.01651 -0.004871
Lag 500 0.02130 0.01817 -0.0441559 0.0001324 0.01337 0.022265
iterations. Typically for a negative binomial, it is worth having a large burnin (approximately half of the iterations).
library(R2jags)
dat.NB.jags <- jags(data=dat.nb.list,model.file='../downloads/BUGSscripts/tut11.5bS4.1.bug',
param=c('beta0','beta1','size'),
Allocating nodes
Graph Size: 157
Initializing model
print(dat.NB.jags)
beta0 0.730 0.421 -0.074 0.452 0.725 0.996 1.602 1.000 1500
beta1 0.097 0.033 0.030 0.077 0.098 0.119 0.157 1.001 1500
size 3.167 2.185 1.086 1.911 2.641 3.704 8.551 1.002 1100
deviance 113.089 2.760 110.053 111.122 112.340 114.179 120.468 1.004 780


pD = 3.8 and DIC = 116.9
plot(as.mcmc(dat.NB.jags))
autocorr.diag(as.mcmc(dat.NB.jags))
beta0 beta1 deviance size
Lag 0 1.000000 1.000000 1.000000 1.00000
Lag 50 0.002132 0.007780 0.002472 0.00480
Lag 250 -0.014601 -0.013641 -0.012261 -0.06051
Lag 500 0.019129 0.015262 -0.023785 0.01291
Lag 2500 0.039030 0.008195 -0.013073 -0.01248
Conclusions: the samples are now less auto-correlated. Ideally, we should probably collect even more samples. Whilst the
traceplots are reasonably noisy, there is more of a signal or pattern than there ideally should be.
stan_ac(dat.nb.brm$fit)

summary(do.call(rbind, args = get_sampler_params(dat.nb.brm$fit, inc_warmup = FALSE )), digits =

2 )
Min. :0.012 Min. :0.63 Min. :1.0 Min. : 1.0 Min. :0
1st Qu.:0.850 1st Qu.:0.63 1st Qu.:2.0 1st Qu.: 3.0 1st Qu.:0
Median :0.937 Median :0.75 Median :2.0 Median : 3.0 Median :0
Mean :0.884 Mean :0.76 Mean :2.2 Mean : 3.8 Mean :0
3rd Qu.:0.990 3rd Qu.:0.90 3rd Qu.:3.0 3rd Qu.: 5.0 3rd Qu.:0
stan_diag(dat.nb.brm$fit)
stan_diag(dat.nb.brm$fit, information = "stepsize")
stan_diag(dat.nb.brm$fit, information = "treedepth")
stan_diag(dat.nb.brm$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(dat.nb.brm$fit) + theme_classic(8 ),
stan_ess(dat.nb.brm$fit) + theme_classic(8 ),
stan_mcse(dat.nb.brm$fit) + theme_classic(8 ),
ncol = 2 )
all below 1.05, the effective sample size is always above 0.5 and the ratio of MCMC error to posterior sd is very small.
We now explore the goodness of fit of the models via the residuals and deviance. We could calculate the Pearsons's
residuals within the JAGS model. Alternatively, we could use the parameters to generate the residuals outside of JAGS.
coefs <- dat.NB.jags$BUGSoutput$sims.matrix[,1 :2 ]
size <- dat.NB.jags$BUGSoutput$sims.matrix[,'size']
Xmat <- model.matrix(~x, data=dat.nb)
lambda <- exp(eta)
varY <- lambda + (lambda^2 )/size
Resid <- -1 *sweep(lambda,2 ,dat.nb$y,'-')/sqrt(varY)
Resid.nb.brm <- residuals(dat.nb.brm, type='pearson')
Fitted.nb.brm <- fitted(dat.nb.brm, scale='linear')
ggplot(data=NULL, aes(y=Resid.nb.brm[,'Estimate'], x=Fitted.nb.brm[,'Estimate'])) + geom_point()
There are no real patterns in the residuals.

set.seed(2 )
#generate a matrix of draws from a negative binomial distribution
# the matrix is the same dimensions as pi and uses the probabilities of pi
YNew <- matrix(rnbinom(length(lambda),mu=lambda, size=size),nrow=nrow(lambda))
Resid1<-(lambda - YNew)/sqrt(varY)
[1] 0.4127

nX <- ncol(Xmat)
dat.nb.list2 <- with(dat.nb,list(Y=y, X=Xmat,N=nrow(dat.nb), mu=rep(0 ,nX),Sigma=diag(1.0E-
06 ,nX)))
modelString="
model {
for (i in 1:N) {
Y1[i] ~ dnegbin(p[i],size)
varY[i] <- lambda[i] + pow(lambda[i],2)/size
Resid[i] <- (Y[i] - lambda[i])/sqrt(varY[i])
Resid1[i] <- (Y1[i] - lambda[i])/sqrt(varY[i])
}
size ~ dunif(0.001,10000)
}
"
library(R2jags)
system.time(
dat.NB.jags3 <- jags(model='../downloads/BUGSscripts/tut11.5bS4.61.bug', data=dat.nb.list2,
inits=NULL,
param=c('beta','Pvalue'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 400
Initializing model
user system elapsed
6.717 0.000 6.726
print(dat.NB.jags3)
Pvalue 0.421 0.494 0.000 0.000 0.000 1.000 1.000 1.002 1000
beta[1] 0.763 0.422 -0.035 0.482 0.746 1.044 1.619 1.002 1400
beta[2] 0.095 0.033 0.030 0.072 0.095 0.117 0.160 1.002 1500
deviance 113.064 2.596 110.071 111.143 112.361 114.276 119.730 1.001 3000


pD = 3.4 and DIC = 116.4
Resid.nb.brm <- residuals(dat.nb.brm, type='pearson', summary=FALSE )

SSres.nb.brm <- apply(Resid.nb.brm^2 ,1 ,sum)
lambda.nb.brm = fitted(dat.nb.brm, scale='response', summary=FALSE )
YNew.nb.brm <- matrix(rpois(length(lambda.nb.brm), lambda=lambda.nb.brm),nrow=nrow(lambda.nb.brm))
Resid1.nb.brm<-(lambda.nb.brm - YNew.nb.brm)/sqrt(lambda.nb.brm)
SSres.sim.nb.brm<-apply(Resid1.nb.brm^2 ,1 ,sum)
mean(SSres.sim.nb.brm>SSres.nb.brm)
[1] 0.8286667
Conclusions: the Bayesian p-value is not far from 0.5, suggesting that there is a good fit of the model to the data.
Unfortunately, unlike with linear models (Gaussian family), the expected distribution of data (residuals) varies over the
range of fitted values for numerous (often competing) ways that make diagnosing (and attributing causes thereof) miss-
specified generalized linear models from standard residual plots very difficult. The use of standardized (Pearson) residuals
or deviance residuals can partly address this issue, yet they still do not offer completely consistent diagnoses across all
issues (miss-specified model, over-dispersion, zero-inflation).
An alternative approach is to use simulated data from the model posteriors to calculate an empirical cumulative density
function from which residuals are are generated as values corresponding to the observed data along the density function.

coefs <- dat.NB.jags$BUGSoutput$sims.matrix[,1 :2 ]
size <- dat.NB.jags$BUGSoutput$sims.matrix[,'size']
lambda <- exp(eta)
simRes <- function(lambda, data,n=250 , plot=T, family='negbin', size=NULL) {

require(gap)
N = nrow(data)
'poisson' = matrix(rpois(n*N,apply(lambda,2 ,mean)),ncol=N, byrow=TRUE ),
'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2 ,mean),size),ncol=N, byrow=TRUE )
)
resid<-NULL
if (plot==T) {
par(mfrow=c(1 ,2 ))
}
resid
}
simRes(lambda,dat.nb, family='negbin', size=mean(size))
[1] 0.208 0.956 0.428 0.688 0.080 0.888 0.124 0.728 0.144 0.824 0.460 0.352 0.100 0.172 0.684 0.208
0.004 0.688 0.892 0.876
close to 1 or 0 (which would imply overdispersion).

size <- as.matrix(as.data.frame(rstan:::extract(dat.nb.brm$fit, par='shape')))
simRes <- function(lambda, data,n=250 , plot=T, family='poisson', size=NULL) {

require(gap)
N = nrow(data)

'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2 ,mean),size),ncol=N, byrow=TRUE )
)
resid<-NULL
if (plot==T) {
par(mfrow=c(1 ,2 ))
}
resid
}
simRes(lambda.nb.brm, dat.nb, family='negbin', size=mean(size))
[1] 0.332 0.924 0.420 0.748 0.128 0.888 0.108 0.680 0.184 0.824 0.480 0.464 0.092 0.180 0.672
0.212 0.000 0.628 0.884 0.844

As with most Bayesian models, it is best to base conclusions on medians rather than means.
print(dat.NB.jags)
beta0 0.730 0.421 -0.074 0.452 0.725 0.996 1.602 1.000 1500
beta1 0.097 0.033 0.030 0.077 0.098 0.119 0.157 1.001 1500
size 3.167 2.185 1.086 1.911 2.641 3.704 8.551 1.002 1100
deviance 113.089 2.760 110.053 111.122 112.340 114.179 120.468 1.004 780


pD = 3.8 and DIC = 116.9
library(plyr)
adply(dat.NB.jags$BUGSoutput$sims.matrix, 2 , function(x) {
})
1 beta0 0.72460 0.72989 -0.08683 1.5612 0.40701 0.9444
2 beta1 0.09763 0.09734 0.03644 0.1629 0.07544 0.1171
3 deviance 112.34047 113.08851 109.87327 118.6993 110.36278 112.7765
4 size 2.64119 3.16733 0.64845 7.0820 1.21209 2.7610
scale.
library(plyr)
adply(exp(dat.NB.jags$BUGSoutput$sims.matrix[,1 :2 ]), 2 , function(x) {
})
1 beta0 2.064 2.274 0.7971 4.352 1.353 2.368
2 beta1 1.103 1.103 1.0307 1.170 1.078 1.124
($e^{0.0976275}=1.1025521$) 1.1025521 unit increase in the abundance of $y$.
summary(dat.nb.brm)
Family: negbinomial (log)
Formula: y ~ x
Data: dat.nb (Number of observations: 20)
WAIC: Not computed
Fixed Effects:
Intercept 0.75 0.42 -0.04 1.59 1124 1
x 0.10 0.03 0.03 0.16 1110 1
Family Specific Parameters:

shape 2.64 1.27 0.93 5.76 1214 1

exp(coef(dat.nb.brm))
mean
Intercept 2.114221
x 1.100268
coefs.nb.brm <- as.matrix(as.data.frame(rstan:::extract(dat.nb.brm$fit)))

coefs.nb.brm <- coefs.nb.brm[,grep('b', colnames(coefs.nb.brm))]
plyr:::adply(exp(coefs.nb.brm), 2 , function(x) {
})
1 b_Intercept 2.312289 2.123695 0.8215863 4.460667
2 b_x 1.100884 1.100589 1.0281742 1.173267
marginal_effects(dat.nb.brm)
3.6.1: Further explorations of the trends


lambda <- exp(eta)
SSres <- apply((-1 *(sweep(lambda,2 ,dat.nb$y,'-')))^2 ,1 ,sum)
SSres.null <- sum((dat.nb$y - mean(dat.nb$y))^2 )
#OR
SSres.null <- crossprod(dat.nb$y - mean(dat.nb$y))
[,1]
[1,] 0.2719

nX <- ncol(Xmat)
dat.nb.list2 <- with(dat.nb,list(Y=y, X=Xmat,N=nrow(dat.nb), mu=rep(0 ,nX),Sigma=diag(1.0E-06 ,nX)))
modelString="
model {
for (i in 1:N) {
res[i] <- Y[i] - lambda[i]

}
meanY <- mean(Y)
size ~ dunif(0.001,10000)
RSS <- sum(res^2)
r2 <- 1-RSS/RSSnull
}
"
library(R2jags)
system.time(
param=c('beta','r2'),
n.chain=3 ,

)
Allocating nodes
Graph Size: 217
Initializing model
user system elapsed
6.181 0.000 6.191
print(dat.NB.jags4)
beta[1] 0.727 0.405 -0.073 0.463 0.730 0.989 1.529 1.001 3000
beta[2] 0.097 0.032 0.034 0.076 0.097 0.117 0.163 1.002 1500
r2 0.276 0.148 -0.029 0.245 0.310 0.352 0.390 1.001 3000
deviance 113.051 2.625 110.048 111.134 112.366 114.231 120.033 1.001 3000


pD = 3.4 and DIC = 116.5

SSres.nb.brm <- apply((-1 *(sweep(lambda.nb.brm,2 ,dat.nb$y,'-')))^2 ,1 ,sum)
SSres.null <- sum((dat.nb$y - mean(dat.nb$y))^2 )
#OR
SSres.null <- crossprod(dat.nb$y - mean(dat.nb$y))
1 -mean(SSres.nb.brm)/SSres.null
[,1]
[1,] 0.2601319
par(mar = c(4 , 5 , 0 , 0 ))
plot(y ~ x, data = dat.nb, type = "n", ann = F, axes = F)
points(y ~ x, data = dat.nb, pch = 16 )
xs <- seq(min(dat.nb$x,na.rm=TRUE ),max(dat.nb$x,na.rm=TRUE ), l = 1000 )
ys <- exp(eta)
library(plyr)
library(coda)
})
axis(1 )
mtext("X", 1 , cex = 1.5 , line = 3 )
axis(2 , las = 2 )
box(bty = "l")
newdata = data.frame(x=seq(min(dat.nb$x), max(dat.nb$x), len=100 ))

coefs <- as.matrix(as.data.frame(rstan:::extract(dat.nb.brm$fit)))
})

geom_point(data=dat.nb, aes(y=y)) +
geom_line() +
theme_classic() +


nX <- ncol(Xmat)
dat.nb.list2 <- with(dat.nb,list(Y=y, X=Xmat,N=nrow(dat.nb), mu=rep(0 ,nX),
Sigma=diag(1.0E-06 ,nX), zeros=rep(0 ,nrow(dat.nb)), C=10000 ))
modelString="
model {
for (i in 1:N) {
ll[i] <- loggam(Y[i]+size) - loggam(Y[i]+1) - loggam(size) + size*(log(p[i]) - log(p[i]+1)) -
Y[i]*log(p[i]+1)
p[i] <- size/lambda[i]
}
size ~ dunif(0.001,1000)
dev <- sum(-2*ll)
}
"
library(R2jags)
system.time(
param=c('beta','dev'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 453
Initializing model
user system elapsed
16.996 0.000 17.014
print(dat.NB.jags3)
beta[1] 0.741 0.417 -0.074 0.461 0.738 1.020 1.559 1.002 990
beta[2] 0.096 0.033 0.033 0.074 0.097 0.118 0.160 1.003 640
dev 113.085 2.662 110.073 111.179 112.398 114.194 120.071 1.000 2400
deviance 400113.085 2.662 400110.073 400111.179 400112.398 400114.194 400120.071 1.000 1


pD = 3.5 and DIC = 400116.6
Section 4: Zero-inflated Poisson (ZIP) regression

Zero-Inflation Poisson (ZIP) mixture model is defined as: $$ p(y_i|\theta,\lambda) = \left\{ \begin{array}{l l} \theta + (1-
\theta)\times \text{Pois}(0|\lambda) & \quad \text{if $y_i=0$ and}\\ (1-\theta)\times \text{Pois}(y_i|\lambda) & \quad
\text{if $y_i>0$} \end{array} \right. $$ where $\theta$ is the probability of false values (zeros).
Hence there is essentially two models coupled together (a mixture model) to yield an overall probability:
when an observed response is zero ($y_i=0$), it is the probability of getting a false value (zero) plus the probability of a
true value multiplied probability of drawing a value of zero from a Poisson distribution of $lambda$
when an observed response is greater than 0, it is the probability of a true value multiplied probability of drawing that
value from a Poisson distribution of $lambda$
The above formulation indicates the same $\lambda$ for both the zeros and non-zeros components. In the model of zero
values, we are essentially investigating whether the likelihood of false zeros is related to the linear predictor and then the
greater than zero model investigates whether the counts are related to the linear predictor.
However, we are typically less interested in modelling determinants of false zeros. Indeed, it is better that the likelihood of
false zeros be unrelated to the linear predictor. For example, if excess (false zeros) are due to issues of detectability
(individuals are present, just not detected), it is better that the detectability is not related to experimental treatments. Ideally,
any detectability issues should be equal across all treatment levels.
The expected value of $Y$ and the variance in $Y$ for a ZIP model are: $$ E(y_i) = \lambda\times(1-\theta)\\ Var(y_i) =
\lambda\times(1-\theta)\times(1+\theta\times\lambda^2) $$


positive integers.
set.seed(37 ) #34.5 #4 #10 #16 #17 #26

n.x <- 20
intercept <- 0.6
slope=0.1
#Predicted y values
library(gamlss.dist)
#fixed latent binomial
y<- rZIP(n.x,lambda, 0.4 )
#latent binomial influenced by the linear predictor
#y<- rZIP(n.x,lambda, 1-exp(linpred)/(1+exp(linpred)))
dat.zip <- data.frame(y,x)
summary(glm(y~x, dat.zip, family="poisson"))

Call:
glm(formula = y ~ x, family = "poisson", data = dat.zip)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5415 -2.3769 -0.9753 1.1736 3.6380
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.67048 0.33018 2.031 0.0423 *
x 0.02961 0.02663 1.112 0.2662
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 85.469 on 19 degrees of freedom

Residual deviance: 84.209 on 18 degrees of freedom
AIC: 124.01
Number of Fisher Scoring iterations: 6
plot(glm(y~x, dat.zip, family="poisson"))
library(pscl)
summary(zeroinfl(y ~ x | 1 , dist = "poisson", data = dat.zip))
Call:
zeroinfl(formula = y ~ x | 1, data = dat.zip, dist = "poisson")
Pearson residuals:
-1.0015 -0.9556 -0.3932 0.9663 1.6195
Count model coefficients (poisson with log link):

(Intercept) 0.70474 0.31960 2.205 0.027449 *
x 0.08734 0.02532 3.449 0.000563 ***
Zero-inflation model coefficients (binomial with logit link):

(Intercept) -0.2292 0.4563 -0.502 0.615
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Number of iterations in BFGS optimization: 13

Log-likelihood: -36.17 on 3 Df
plot(resid(zeroinfl(y ~ x | 1 , dist = "poisson", data = dat.zip))~fitted(zeroinfl(y ~ x | 1 , dist =

"poisson", data = dat.zip)))
library(gamlss)
summary(gamlss(y~x,data=dat.zip, family=ZIP))
GAMLSS-RS iteration 1: Global Deviance = 72.4363
*******************************************************************
Family: c("ZIP", "Poisson Zero Inflated")
Call: gamlss(formula = y ~ x, family = ZIP, data = dat.zip)
Fitting method: RS()
-------------------------------------------------------------------
Mu link function: log
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.70582 0.31958 2.209 0.04123 *
x 0.08719 0.02533 3.442 0.00311 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-------------------------------------------------------------------
Sigma link function: logit
Sigma Coefficients:
(Intercept) -0.2292 0.4563 -0.502 0.622
-------------------------------------------------------------------
No. of observations in the fit: 20
Degrees of Freedom for the fit: 3
Residual Deg. of Freedom: 17
at cycle: 3
Global Deviance: 72.34282

AIC: 78.34282
SBC: 81.33002
*******************************************************************
predict(gamlss(y~x,data=dat.zip, family=ZIP), se.fit=TRUE , what="mu")

$fit
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
0.7966905 0.9236189 1.0263933 1.0369823 1.1305358 1.3763304 1.4054417 1.4299603 1.4951229 1.6161339
1.7035853 1.7629994 1.8678543 1.9838052 1.9951833 2.1187071 2.1249555 2.1431616 2.1837285 2.3064727
$se.fit
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
0.3517894 0.3089432 0.2758980 0.2726011 0.2445792 0.1856062 0.1807322 0.1770893 0.1696661 0.1656458
0.1710016 0.1783138 0.1973009 0.2251987 0.2282363 0.2637860 0.2656903 0.2712879 0.2840032 0.3241532

analysis

hist(dat.zip$y)
boxplot(dat.zip$y, horizontal=TRUE )
rug(jitter(dat.zip$y))
There is definitely signs of non-normality that would warrant Poisson models. Further to that, there appears to be a large
number of zeros that are likely to be the cause of overdispersion A zero-inflated Poisson model is likely to be one of the most
effective for modeling these data.

Lets now explore linearity by creating a histogram of the predictor variable ($x$). Note, it is difficult to directly assess issues of
linearity. Indeed, a scatterplot with lowess smoother will be largely influenced by the presence of zeros. One possible way of
doing so is to explore the trend in the non-zero data.
hist(dat.zip$x)

plot(y~x, dat.zip)
with(subset(dat.zip,y>0 ), lines(lowess(y~x)))
smoother on the non-zero data cloud does not display major deviations from a straight line and thus linearity is likely to be
satisfied. Violations of linearity (whilst difficult to be certain about due to the unknown influence of the zeros) could be
addressed by either:


dat.zip.tab<-table(dat.zip$y==0 )
dat.zip.tab/sum(dat.zip.tab)
FALSE TRUE
0.55 0.45

mu <- mean(dat.zip$y)
dat.zip.tabE <- table(cnts == 0 )
dat.zip.tabE/sum(dat.zip.tabE)
FALSE TRUE
0.921 0.079
proportion of zeros in the data. In this example, the proportion of zeros observed (45%) far exceeds that that would have been
expected (7.9%). Hence it is highly likely that any models will be zero-inflated.

The exploratory data analyses have suggested that a zero-inflated Poisson model might be the most appropriate for these data.
Fit Poisson model and examine dispersion statistic
dat.zip.list <- with(dat.zip,list(Y=y, X=x,N=nrow(dat.zip)))

modelString="
model {
for (i in 1:N) {
}
}
"
library(R2jags)
system.time(
dat.zip.P.jags <- jags(model='../downloads/BUGSscripts/tut11.5bS5.4.bug', data=dat.zip.list,
inits=NULL,
n.chain=3 ,
)
Allocating nodes
Graph Size: 105
Initializing model
user system elapsed
1.476 0.000 1.477
print(dat.zip.P.jags)
beta0 0.635 0.326 -0.061 0.431 0.644 0.857 1.250 1.001 2400
beta1 0.032 0.025 -0.018 0.014 0.031 0.048 0.084 1.001 2800
deviance 121.928 2.039 120.059 120.548 121.311 122.606 127.122 1.001 3000


pD = 2.1 and DIC = 124.0

coefs <- dat.zip.P.jags$BUGSoutput$sims.matrix[,1 :2 ]
lambda <- exp(eta)
Resid <- -1 *sweep(lambda,2 ,dat.zip$y, '-')/sqrt(lambda)

Disp <- RSS/(nrow(dat.zip)-ncol(coefs))
var1 4.195 4.325 3.181 5.642 3.605 4.426
The dispersion parameter was 4.3250848, indicating over three times more variability than would be expected for a Poisson
distribution. The data are thus over-dispersed. Given the large number of zeros in the response, it would seem likely that the
overdispersion is as a result of the excessive zeros and thus zero-inflated Poisson model would seem reasonable. Note, if this
model is still overdispersed (possibly due to clumpiness in the non-zero values), then a zero-inflated negative binomial model
might be worth exploring.
4.3.1: JAGS
$$ Y_i = \left\{ \begin{array}{lrcl} 0~\text{with}~P(y_i) = 1-\mu & z_i&\sim&\text{Binom}(1-\theta)\\ &
logit(\theta)&=&\gamma_0\\ &\gamma_0&\sim{}&\mathcal{N}(0,10000)\\ >0 & Y_i&\sim&\text{Pois}(\lambda_i)\\
&\lambda_i&=&z_i + \eta_i\\ &log(\eta_i)&=&\beta_0+\beta_1x_1\\ &\beta_0, \beta_1&\sim{}&\mathcal{N}(0,10000)\\
\end{array} \right. $$ Or in shorthand: $$ \begin{align} Y_i&\sim{}ZIP(\lambda,\theta) & (\text{response distribution})\\
logit(\theta)&=\gamma_0 & (\text{link function/linear predictor - zero component})\\ log(\lambda_i)&=\eta_i & (\text{link
function - count component})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor - count component})\\ \beta_0,
\beta_1, \gamma_0&\sim{}\mathcal{N}(0,10000) & (\text{diffuse Bayesian priors})\\ \end{align} $$
dat.zip.list <- with(dat.zip,list(Y=y, X=x,N=nrow(dat.nb), z=ifelse(y==0 ,0 ,1 )))
modelString="
model {
for (i in 1:N) {
z[i] ~ dbern(one.minus.theta)
lambda[i] <- z[i]*eta[i]
log(eta[i]) <- beta0 + beta1*X[i]
}
one.minus.theta <- 1-theta
logit(theta) <- gamma0
gamma0 ~ dnorm(0,1.0E-06)
}
"
library(R2jags)
system.time(
dat.zip.jags <- jags(model='../downloads/BUGSscripts/tut11.5bS5.5.bug', data=dat.zip.list, inits=NULL,
param=c('beta0','beta1', 'gamma0','theta'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 149
Initializing model
user system elapsed
2.492 0.004 2.503
Xmat <- model.matrix(~x, dat.zip)

nX <- ncol(Xmat)
dat.zip.list1 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), nX=nX, z=ifelse(y==0 ,0 ,1 )))
modelString="
model {
for (i in 1:N) {
log(eta[i]) <- inprod(beta[], X[i,])
}
for (i in 1:nX) {
}
}
"
library(R2jags)
system.time(
dat.zip.jags1 <- jags(model='../downloads/BUGSscripts/tut11.5bS5.6.bug', data=dat.zip.list1,
inits=NULL,
param=c('beta','gamma0'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 171
Initializing model
user system elapsed
2.465 0.004 2.470
print(dat.zip.jags1)
beta[1] 0.709 0.316 0.059 0.495 0.723 0.922 1.304 1.001 3000
beta[2] 0.086 0.025 0.039 0.070 0.085 0.103 0.135 1.001 3000
gamma0 -0.197 0.462 -1.097 -0.508 -0.198 0.113 0.724 1.001 3000
deviance 75.675 2.578 72.834 73.815 74.979 76.794 82.470 1.001 3000


pD = 3.3 and DIC = 79.0
multivariate normal priors are defined as: $$ \boldsymbol{\beta}\sim{}\mathcal{N_k}(\boldsymbol{\mu}, \mathbf{\Sigma})
$$ where $$\boldsymbol{\mu}=[E[\beta_1],E[\beta_2],...,E[\beta_k]] = \left(\begin{array}
{ccc}1000^2&0&0\\0&\ddots&0\\0&0&1000^2\end{array} \right) $$ hence, along with the response and predictor matrix, we
need to supply $\boldsymbol{\mu}$ (a vector of zeros) and $\boldsymbol{\Sigma}$ (a covariance matrix with $1000^2$ in
the diagonals).

nX <- ncol(Xmat)
dat.zip.list2 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), nX=nX,
mu=rep(0 ,nX),Sigma=diag(1.0E-06 ,nX), z=ifelse(y==0 ,0 ,1 )))
modelString="
model {
for (i in 1:N) {
}
}
"
library(R2jags)
system.time(
dat.zip.jags2 <- jags(model='../downloads/BUGSscripts/tut11.5bS5.7.bug', data=dat.zip.list2, inits=NULL,
param=c('beta', 'gamma0'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 176
Initializing model
user system elapsed
0.912 0.004 0.923
beta[1] 0.700 0.321 0.054 0.484 0.702 0.924 1.301 1.001 3000
beta[2] 0.087 0.025 0.038 0.070 0.087 0.104 0.136 1.001 3000
gamma0 -0.217 0.462 -1.155 -0.524 -0.201 0.094 0.674 1.001 3000
deviance 75.688 2.477 72.857 73.895 75.026 76.816 82.064 1.002 2400


pD = 3.1 and DIC = 78.8
Note, the n.eff indicates that we probably have an issue with chain mixing and/or autocorrelation. We probably should
increase the number of iterations and the thinning rate.
dat.zip.brm <- brm(y~0 +spec + main + main:x, data=dat.zip, family='zero_inflated_poisson',

SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 1).

#

#


#
## Or if we allow the zero-inflation model and the count model to vary with x
dat.zip.brm1 <- brm(y~trait:x, data=dat.zip, family='zero_inflated_poisson',

#

#

#

independent.
plot(as.mcmc(dat.zip.jags))
library(gridExtra)
grid.arrange(stan_trace(dat.zip.brm$fit, ncol=1 ),
stan_dens(dat.zip.brm$fit, separate_chains=TRUE ,ncol=1 ),
ncol=2 )

autocorr.diag(as.mcmc(dat.zip.jags))
beta0 beta1 deviance gamma0 theta
Lag 0 1.000000 1.0000000 1.000000 1.0000000 1.000000
Lag 10 0.275638 0.2908784 0.043162 0.0469457 0.048194
Lag 50 -0.006762 0.0226283 -0.007629 0.0035146 0.002608
Lag 100 0.016017 0.0046681 -0.008336 0.0003516 -0.000948
Lag 500 -0.001439 -0.0009426 -0.003084 -0.0085311 -0.010254
iterations.
library(R2jags)
dat.zip.jags <- jags(data=dat.zip.list,model.file='../downloads/BUGSscripts/tut11.5bS5.5.bug',
param=c('beta0','beta1','gamma0','theta'),
Allocating nodes
Graph Size: 149
Initializing model
print(dat.zip.jags)
beta0 0.706 0.319 0.043 0.494 0.716 0.928 1.301 1.001 2800
beta1 0.086 0.026 0.037 0.069 0.086 0.103 0.137 1.001 2300
gamma0 -0.220 0.466 -1.127 -0.541 -0.212 0.107 0.670 1.002 1800
theta 0.448 0.110 0.245 0.368 0.447 0.527 0.661 1.001 2000
deviance 75.685 2.453 72.827 73.883 75.054 76.878 81.846 1.001 2300


pD = 3.0 and DIC = 78.7
plot(as.mcmc(dat.zip.jags))
autocorr.diag(as.mcmc(dat.zip.jags))
beta0 beta1 deviance gamma0 theta
Lag 0 1.000000 1.000000 1.000000 1.000000 1.000000
Lag 50 -0.003957 -0.029631 0.016977 -0.019662 -0.019085
Lag 250 0.020470 0.011623 -0.016871 0.009786 0.011709
Lag 500 0.010446 0.003702 0.003925 -0.048037 -0.047746
Lag 2500 -0.026443 0.002784 -0.021067 -0.005668 -0.003888
Conclusions: the samples are now less auto-correlated and the chains are all well mixed and appear stable.
stan_ac(dat.zip.brm$fit)

summary(do.call(rbind, args = get_sampler_params(dat.zip.brm$fit, inc_warmup = FALSE )), digits =

2 )
stan_diag(dat.zip.brm$fit)
stan_diag(dat.zip.brm$fit, information = "stepsize")
stan_diag(dat.zip.brm$fit, information = "treedepth")
stan_diag(dat.zip.brm$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(dat.zip.brm$fit) + theme_classic(8 ),
stan_ess(dat.zip.brm$fit) + theme_classic(8 ),
stan_mcse(dat.zip.brm$fit) + theme_classic(8 ),
ncol = 2 )
all below 1.05, the effective sample size is not too bad (we might be able to achieve better acceptance ratios with tighter
priors) and the ratio of MCMC error to posterior sd is very small.
coefs <- dat.zip.jags$BUGSoutput$sims.matrix[,1 :2 ]
theta <- dat.zip.jags$BUGSoutput$sims.matrix[,'theta']
Xmat <- model.matrix(~x, data=dat.zip)
lambda<-coefs %*% t(Xmat)
eta <- exp(lambda)
expY <- sweep(eta,1 ,(1 -theta),"*")
varY <- eta+sweep(eta^2 ,1 ,theta,"*")
varY <- sweep(varY,1 ,(1 -theta),'*')
Resid <- -1 *sweep(expY,2 ,dat.zip$y,'-')/sqrt(varY)
Resid.zip.brm <- residuals(dat.zip.brm, type='pearson')[,'Estimate']
#Extract the fitted values. Note, since a zero-inflated model is really two models
#(the binary process and the poisson process), there are fitted values associated
#with both processes. The first half of the fitted values are associated with
#the count process (the ones we are interested in here), the second half are
#associated with the binomial process.
#Furthermore, dont be tempted to use fitted(dat.zip.brm, scale='response',...) with
#zero inflated models, they seem to be a bit odd...
Fitted.zip.brm <- exp(fitted(dat.zip.brm, scale='linear')[1 :nrow(dat.zip),'Estimate'])
ggplot(data=NULL, aes(y=Resid.zip.brm, x=Fitted.zip.brm)) + geom_point()
## Or the x varying zero-inflated and count model

Resid.zip.brm <- residuals(dat.zip.brm1, type='pearson')[,'Estimate']
Fitted.zip.brm <- exp(fitted(dat.zip.brm1, scale='linear')[1 :nrow(dat.zip),'Estimate'])
ggplot(data=NULL, aes(y=Resid.zip.brm, x=Fitted.zip.brm)) + geom_point()

When doing so, we need to consider the expected value and variance of the zero-inflated poisson. $$ E(y_i) =
\lambda\times(1-\theta)\\ Var(y_i) = \lambda\times(1-\theta)\times(1+\theta\times\lambda^2) $$
set.seed(2 )
#generate a matrix of draws from a zero-inflated poisson (ZIP) distribution
# the matrix is the same dimensions as lambda
#YNew <- matrix(rZIP(length(lambda),eta, theta),nrow=nrow(lambda))
lambda <- sweep(eta,1 ,ifelse(dat.zip$y==0 ,0 ,1 ),'*')
YNew <- matrix(rpois(length(lambda),lambda),nrow=nrow(lambda))
Resid1<-(expY - YNew)/sqrt(varY)
[1] 0.4346667
Whilst not ideal (as we would prefer a Bayesian P-value of around 0.5), this value is not wildly bad and does not constitute
overwhelming evidence of a lack of fit.

nX <- ncol(Xmat)
modelString="
model {
for (i in 1:N) {
log(eta[i]) <- max(-20,min(20,inprod(beta[], X[i,])))
expY[i] <- eta[i]*(1-theta)

varY[i] <- (1-theta)*(eta[i]*theta*pow(eta[i],2))
Y1[i] ~ dpois(lambda[i])
}
}
"
library(R2jags)
system.time(
dat.zip.jags3 <- jags(model='../downloads/BUGSscripts/tut11.5bS5.13.bug', data=dat.zip.list2,

inits=NULL,
param=c('beta', 'gamma0','Pvalue'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 465
Initializing model
user system elapsed
1.321 0.004 1.328
Pvalue 0.455 0.498 0.000 0.000 0.000 1.000 1.000 1.001 3000
beta[1] 0.718 0.320 0.079 0.512 0.729 0.921 1.328 1.001 3000
beta[2] 0.085 0.025 0.034 0.069 0.085 0.101 0.136 1.001 3000
gamma0 -0.197 0.465 -1.096 -0.512 -0.195 0.122 0.711 1.002 1600
deviance 75.686 2.594 72.853 73.845 74.987 76.779 82.738 1.006 1500


pD = 3.4 and DIC = 79.0
Resid.zip.brm <- residuals(dat.zip.brm, type='pearson', summary=FALSE )

SSres.zip.brm <- apply(Resid.zip.brm^2 ,1 ,sum)
lambda.zip.brm = fitted(dat.brm, scale='response', summary=FALSE )
YNew.zip.brm <- matrix(rpois(length(lambda.zip.brm), lambda=lambda.zip.brm),
nrow=nrow(lambda.zip.brm))
Resid1.zip.brm<-(lambda.zip.brm - YNew.zip.brm)/sqrt(lambda.zip.brm)
SSres.sim.zip.brm<-apply(Resid1.zip.brm^2 ,1 ,sum)
mean(SSres.sim.zip.brm>SSres.zip.brm)
[1] 0.482
Conclusions: the Bayesian p-value is very close to 0.5, suggesting that there is a good fit of the model to the data.
Since it is difficult to diagnose many issues from the typical residuals we will now explore simulated residuals.
coefs <- dat.zip.jags$BUGSoutput$sims.matrix[,1 :2 ]
Xmat <- model.matrix(~x, data=dat.zip)
lambda <- exp(eta)
simRes <- function(lambda, data,n=250 , plot=T, family='negbin', size=NULL,theta=NULL) {

require(gap)
N = nrow(data)
'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2 ,mean),size),ncol=N, byrow=TRUE ),
'zip' = matrix(gamlss.dist:::rZIP(n*N,apply(lambda,2 ,mean),theta),ncol=N, byrow=TRUE )
)
resid<-NULL
if (plot==T) {
par(mfrow=c(1 ,2 ))
}
resid
}
simRes(lambda,dat.zip, family='zip',theta=theta)
[1] 0.272 0.856 0.480 0.828 0.944 0.468 0.828 0.932 0.616 0.164 0.348 0.268 0.384 0.508 0.340 0.196
0.848 0.284 0.124 0.828
close to 1 or 0 (which would imply overdispersion). Hence, once zero-inflation is accounted for, the model does not display
overdispersion.
Although there is a slight hint of non-linearity in that the residuals are high for low and high fitted values and lower in the
middle, this might well be an artifact of the small data set size. By change, most of the observed values in the middle range
of the predictor were zero.
lambda.zip.brm = exp(fitted(dat.zip.brm, scale='linear', summary=FALSE ))[,1 :nrow(dat.zip)]

theta = binomial()$linkinv(mean(rstan:::extract(dat.zip.brm$fit, 'b_spec')[[1 ]]))
require(gap)
N = nrow(data)
'zip' = matrix(gamlss.dist:::rZIP(n*N,apply(lambda,2 ,mean),theta),ncol=N, byrow=TRUE )
)
resid<-NULL
if (plot==T) {
par(mfrow=c(1 ,2 ))
}
resid
}
simRes(lambda.zip.brm, dat.zip, family='zip', theta=theta)
[1] 0.332 0.820 0.592 0.892 0.940 0.548 0.732 0.900 0.648 0.396 0.272 0.256 0.204 0.464 0.024
0.236 0.764 0.180 0.256 0.824

print(dat.zip.jags)
beta0 0.706 0.319 0.043 0.494 0.716 0.928 1.301 1.001 2800
beta1 0.086 0.026 0.037 0.069 0.086 0.103 0.137 1.001 2300
gamma0 -0.220 0.466 -1.127 -0.541 -0.212 0.107 0.670 1.002 1800
theta 0.448 0.110 0.245 0.368 0.447 0.527 0.661 1.001 2000
deviance 75.685 2.453 72.827 73.883 75.054 76.878 81.846 1.001 2300


pD = 3.0 and DIC = 78.7
library(plyr)
adply(dat.zip.jags$BUGSoutput$sims.matrix, 2 , function(x) {
})
1 beta0 0.71587 0.70638 0.06835 1.3198 0.57561 0.9971
2 beta1 0.08556 0.08604 0.03607 0.1359 0.07058 0.1042
3 deviance 75.05425 75.68540 72.64080 80.6794 73.02722 75.3436
4 gamma0 -0.21225 -0.21991 -1.14123 0.6553 -0.49398 0.1432
5 theta 0.44714 0.44798 0.24209 0.6582 0.37853 0.5353
scale.
library(plyr)
adply(exp(dat.zip.jags$BUGSoutput$sims.matrix[,1 :2 ]), 2 , function(x) {
})
1 beta0 2.046 2.13 0.9547 3.495 1.536 2.392
2 beta1 1.089 1.09 1.0351 1.144 1.073 1.110
($e^{0.0855612}=1.0893282$) 1.0893282 unit increase in the abundance of $y$.
summary(dat.zip.brm)
Family: zero_inflated_poisson (log)
Formula: y ~ 0 + spec + main + main:x
Data: dat.zip (Number of observations: 20)
WAIC: Not computed
Fixed Effects:
spec -0.25 0.49 -1.27 0.65 1060 1
main 0.70 0.32 0.08 1.30 896 1
main:x 0.09 0.03 0.04 0.13 877 1

exp(coef(dat.zip.brm))
mean
spec 0.7810716
main 2.0043001
main:x 1.0907019
coefs.zip.brm <- as.matrix(as.data.frame(rstan:::extract(dat.zip.brm$fit)))

coefs.zip.brm <- coefs.zip.brm[,grep('b', colnames(coefs.zip.brm))]
plyr:::adply(exp(coefs.zip.brm), 2 , function(x) {
})
1 b_spec 0.8761436 0.7914704 0.2026179 1.690390
2 b_main 2.1094730 2.0088827 0.9788579 3.463095
3 b_main.x 1.0910609 1.0910342 1.0360825 1.141645
marginal_effects(dat.zip.brm)

Alternatively, we could use McFadden's psuedo $$R^2 = 1-\frac{\mathcal{LL}(Model_{full})}{\mathcal{LL}
(Model_{reduced})}$$ [[http://www.statisticalhorizons.com/category/uncategorized]] - about a quarter of the way down the
page.
Xmat <- model.matrix(~x, dat=dat.zip)

neta<-coefs %*% t(Xmat)
eta <- exp(neta)

expY <- sweep(lambda,2 ,1 -theta,'*')
SSres <- apply((-1 *(sweep(expY,2 ,dat.zip$y,'-')))^2 ,1 ,sum)
mean(SSres)
[1] 112.5743
SSres.null <- sum((dat.zip$y - mean(dat.zip$y))^2 )

[1] 0.4978847

nX <- ncol(Xmat)
modelString="
model {
for (i in 1:N) {
log(eta[i]) <- max(-20,min(20,inprod(beta[], X[i,])))
expY[i] <- lambda[i]*(1-theta)

res[i] <- Y[i] - expY[i]
}
meanY <- mean(Y)

RSS <- sum(res^2)
r2 <- 1-RSS/RSSnull
}
"
library(R2jags)
system.time(
dat.ZIP.jags4 <- jags(model='../downloads/BUGSscripts/tut11.5bS5.17.bug', data=dat.zip.list2,
inits=NULL,
param=c('beta','r2'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 277
Initializing model
user system elapsed
1.072 0.000 1.075
print(dat.ZIP.jags4)
beta[1] 0.724 0.313 0.064 0.522 0.737 0.941 1.304 1.003 880
beta[2] 0.085 0.025 0.037 0.067 0.084 0.101 0.134 1.002 1200
r2 0.492 0.176 0.113 0.377 0.504 0.622 0.787 1.002 1800
deviance 75.612 2.516 72.833 73.816 74.930 76.635 82.063 1.001 3000


pD = 3.2 and DIC = 78.8
Xmat <- model.matrix(~x, dat=dat.zip)

coefs.zip <- as.matrix(as.data.frame(rstan:::extract(dat.zip.brm$fit)))
coefs.zip <- coefs.zip[,grep('b.main', colnames(coefs.zip))]
neta<-coefs.zip %*% t(Xmat)
eta <- exp(neta)

SSres <- apply((-1 *(sweep(expY,2 ,dat.zip$y,'-')))^2 ,1 ,sum)
SSres.null <- sum((dat.zip$y - mean(dat.zip$y))^2 )

#OR
SSres.null <- crossprod(dat.zip$y - mean(dat.zip$y))

[,1]
[1,] 0.497461
par(mar = c(4 , 5 , 0 , 0 ))
plot(y ~ x, data = dat.zip, type = "n", ann = F, axes = F)
points(y ~ x, data = dat.zip, pch = 16 )
xs <- seq(min(dat.zip$x,na.rm=TRUE ),max(dat.zip$x,na.rm=TRUE ), l = 1000 )
ys <- exp(eta)
library(plyr)
library(coda)
})
axis(1 )
mtext("X", 1 , cex = 1.5 , line = 3 )
axis(2 , las = 2 )
box(bty = "l")
newdata = data.frame(x=seq(min(dat.zip$x), max(dat.zip$x), len=100 ))

coefs <- as.matrix(as.data.frame(rstan:::extract(dat.zip.brm$fit)))
coefs <- coefs[,grep('b_main', colnames(coefs))]
})
)

geom_point(data=dat.zip, aes(y=y)) +
geom_line() +
theme_classic() +


nX <- ncol(Xmat)
dat.zip.list2 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), mu=rep(0 ,nX),
Sigma=diag(1.0E-06 ,nX), zeros=rep(0 ,nrow(dat)), C=10000 ))
modelString="
model {
for (i in 1:N) {
ll[i] <- Y[i]*log(lambda[i]) - lambda[i] - loggam(Y[i]+1)
llm[i] <- Y[i]*log(meanlambda) - meanlambda - loggam(Y[i]+1)
}
meanlambda <- mean(lambda)
dev <- sum(-2*ll)
pD <- mean(dev)-sum(-2*llm)
AIC <- min(dev+(2*pD))
}
"
library(R2jags)
system.time(
dat.ZIP.jags3 <- jags(model=textConnection(modelString),
data=dat.zip.list2, inits=NULL,
param=c('beta','dev','AIC'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 328
Initializing model
user system elapsed
1.832 0.000 1.839
print(dat.ZIP.jags3)
Inference for Bugs model at "5", fit using jags,
AIC 1.215e+02 4.532 117.557 1.185e+02 1.200e+02 1.229e+02 1.336e+02 1.001 2400
beta[1] 6.440e-01 0.336 -0.045 4.270e-01 6.580e-01 8.750e-01 1.252e+00 1.005 480
beta[2] 3.000e-02 0.027 -0.022 1.100e-02 3.000e-02 4.700e-02 8.600e-02 1.005 480
dev 1.220e+02 2.038 120.044 1.206e+02 1.214e+02 1.228e+02 1.274e+02 1.001 2400
deviance 4.001e+05 2.038 400120.044 4.001e+05 4.001e+05 4.001e+05 4.001e+05 1.000 1


pD = 2.1 and DIC = 400124.1
Section 5: Zero-inflated negative binomial (ZINB)


positive integers.
set.seed(37 ) #34.5 #4 #10 #16 #17 #26

n.x <- 20
intercept <- 0.6
slope=0.1
#Predicted y values
#fixed latent binomial
y<- rZINBI(n.x,lambda, 0.4 )

#latent binomial influenced by the linear predictor
#y<- rZINB(n.x,lambda, 1-exp(linpred)/(1+exp(linpred)))
dat.zinb <- data.frame(y,x)
summary(dat.glm.nb<-glm.nb(y~x, dat.zinb))
Call:
glm.nb(formula = y ~ x, data = dat.zinb, init.theta = 0.4646673144,
link = log)
Deviance Residuals:
-1.3578 -1.3455 -0.5069 0.3790 1.1809
Coefficients:
(Intercept) 0.914191 0.796804 1.147 0.251
x 0.009149 0.067713 0.135 0.893
(Dispersion parameter for Negative Binomial(0.4647) family taken to be 1)
Null deviance: 20.303 on 19 degrees of freedom

Residual deviance: 20.282 on 18 degrees of freedom
AIC: 90.365
Number of Fisher Scoring iterations: 1
Theta: 0.465
Std. Err.: 0.218
2 x log-likelihood: -84.365
plot(glm.nb(y~x, dat.zinb))
library(pscl)
summary(dat.zeroinfl<-zeroinfl(y ~ x | 1 , dist = "negbin", data = dat.zinb))
Call:
zeroinfl(formula = y ~ x | 1, data = dat.zinb, dist = "negbin")
Pearson residuals:
-0.9609 -0.9268 -0.4446 1.0425 1.7556
Count model coefficients (negbin with log link):

(Intercept) 0.92733 0.32507 2.853 0.00433 **
x 0.06870 0.02755 2.494 0.01263 *
Log(theta) 3.36066 3.59739 0.934 0.35020
Zero-inflation model coefficients (binomial with logit link):

(Intercept) -0.2250 0.4559 -0.494 0.622
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta = 28.8082
Number of iterations in BFGS optimization: 17
Log-likelihood: -38.54 on 4 Df
plot(resid(zeroinfl(y ~ x | 1 , dist = "negbin", data = dat.zinb))~fitted(zeroinfl(y ~ x | 1 , dist =

"negbin", data = dat.zinb)))
vuong(dat.glm.nb, dat.zeroinfl)
Vuong Non-Nested Hypothesis Test-Statistic:
(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
-------------------------------------------------------------
Vuong z-statistic H_A p-value
Raw -1.2809521 model2 > model1 0.10011
AIC-corrected -0.9296587 model2 > model1 0.17627
BIC-corrected -0.7547616 model2 > model1 0.22520
library(gamlss)
summary(gamlss(y~x, data=dat.zinb, family='ZINBI'))
*******************************************************************
Family: c("ZINBI", "Zero inflated negative binomial type I")
Call: gamlss(formula = y ~ x, family = "ZINBI", data = dat.zinb)
-------------------------------------------------------------------
Mu Coefficients:
(Intercept) 0.92653 0.32502 2.851 0.0116 *
x 0.06880 0.02753 2.499 0.0237 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-------------------------------------------------------------------
Sigma link function: log

Sigma Coefficients:
(Intercept) -3.363 3.603 -0.933 0.365
-------------------------------------------------------------------
Nu link function: logit
Nu Coefficients:
(Intercept) -0.2250 0.4559 -0.494 0.628
-------------------------------------------------------------------
at cycle: 5

AIC: 85.0725
SBC: 89.05543
*******************************************************************
summary(gamlss(y~x, nu.fo=y~x,data=dat.zinb, family='ZINBI'))

*******************************************************************
Family: c("ZINBI", "Zero inflated negative binomial type I")
Call: gamlss(formula = y ~ x, nu.formula = y ~ x, family = "ZINBI", data = dat.zinb)
-------------------------------------------------------------------
Mu Coefficients:
(Intercept) 0.84246 0.35267 2.389 0.0305 *
x 0.07481 0.02933 2.550 0.0222 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
(Intercept) -2.982 2.844 -1.048 0.311
-------------------------------------------------------------------
Nu link function: logit
Nu Coefficients:
(Intercept) -2.4988 1.8283 -1.367 0.192
x 0.1996 0.1417 1.408 0.179
-------------------------------------------------------------------
at cycle: 5

AIC: 83.83046
SBC: 88.80912
*******************************************************************
library(INLA)
summary(inla(y~x,data=dat.zinb, family='zeroinflatednbinomial1'))
Call:
"inla(formula = y ~ x, family = \"zeroinflatednbinomial1\", data = dat.zinb)"
Time used:
Pre-processing Running inla Post-processing Total
0.1669 0.0427 0.0285 0.2380
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.9374 0.7147 -0.4199 0.9152 2.4282 0.8816 0
x 0.0259 0.0625 -0.0982 0.0257 0.1508 0.0254 0
The model has no random effects
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant
mode
size for nbinomial zero-inflated observations 0.6180 0.3059 0.1785 0.5699 1.3497
0.4477
zero-probability parameter for zero-inflated nbinomial_1 0.1743 0.1284 0.0142 0.1456 0.4896
0.0431
Expected number of effective parameters(std dev): 2.00(0.00)

Number of equivalent replicates : 9.999
Marginal log-Likelihood: -49.48

analysis

hist(dat.zinb$y)
boxplot(dat.zinb$y, horizontal=TRUE )
rug(jitter(dat.zinb$y))
There is definitely signs of non-normality that would warrant Poisson or negative binomial models. Further to that, there
appears to be a large number of zeros and a possible clumpiness that are likely to be the cause of overdispersion A zero-
inflated negative binomial model is likely to be one of the most effective for modeling these data.

Lets now explore linearity by creating a histogram of the predictor variable ($x$). Note, it is difficult to directly assess issues of
linearity. Indeed, a scatterplot with lowess smoother will be largely influenced by the presence of zeros. One possible way of
doing so is to explore the trend in the non-zero data.
hist(dat.zinb$x)

plot(y~x, dat.zinb, log="y")
with(subset(dat.zinb,y>0 ), lines(lowess(y~x)))
smoother on the non-zero data cloud does not display major deviations from a straight line and thus linearity is likely to be
satisfied. Violations of linearity (whilst difficult to be certain about due to the unknown influence of the zeros) could be
addressed by either:


dat.zinb.tab<-table(dat.zinb$y==0 )
dat.zinb.tab/sum(dat.zinb.tab)
FALSE TRUE
0.55 0.45

mu <- mean(dat.zinb$y)
v <- var(dat.zinb$y)
size <- mu + (mu^2 )/v
cnts <- rnbinom(1000 , mu=mu, size=size)
dat.zinb.tabE <- table(cnts == 0 )
dat.zinb.tabE/sum(dat.zinb.tabE)
FALSE TRUE
0.86 0.14
proportion of zeros in the data. In this example, the proportion of zeros observed (45%) far exceeds that that would have been
expected (14%). Hence it is highly likely that any models will be zero-inflated.

The exploratory data analyses have suggested that a zero-inflated Poisson model might be the most appropriate for these data.
Fit negative binomial model and examine dispersion statistic
dat.zinb.list <- with(dat.zinb,list(Y=y, X=x,N=nrow(dat.zinb)))

modelString="
model {
for (i in 1:N) {
p[i] <- size/(size+mu[i])
log(mu[i]) <- eta[i]
}
size ~ dunif(0.001, 5)
}
"
#writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS5.4.bug')
library(R2jags)
system.time(
dat.zinb.jags <- jags(model.file=textConnection(modelString), data=dat.zinb.list, inits=NULL,
param=c('beta0','beta1', 'size', 'theta'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 148
Initializing model
user system elapsed
11.001 0.004 11.021
print(dat.zinb.jags)
beta0 1.009 0.826 -0.441 0.450 0.953 1.492 2.859 1.001 2300
beta1 0.010 0.069 -0.125 -0.036 0.010 0.052 0.148 1.001 2000
size 0.545 0.297 0.183 0.348 0.478 0.674 1.237 1.002 1500
deviance 87.531 2.572 84.591 85.630 86.886 88.682 94.252 1.001 3000


pD = 3.3 and DIC = 90.8

coefs <- dat.zinb.jags$BUGSoutput$sims.matrix[,1 :2 ]
lambda <- exp(eta)
Resid <- -1 *sweep(lambda,2 ,dat.zinb$y, '-')/sqrt(lambda)

Disp <- RSS/(nrow(dat.zinb)-ncol(coefs))
var1 5.057138 5.85636 3.303402 10.79281 3.304139 5.058315
5.3.1: JAGS
$$ Y_i = \left\{ \begin{array}{lrcl} 0~\text{with}~P(y_i) = 1-\mu & z_i&\sim&\text{Binom}(1-\theta)\\ &
logit(\theta)&=&\gamma_0\\ &\gamma_0&\sim{}&\mathcal{N}(0,10000)\\ >0 & Y_i&\sim&\text{NB}(\lambda_i)\\
&\lambda_i&=&z_i + \eta_i\\ &log(\eta_i)&=&\beta_0+\beta_1x_1\\ &\beta_0, \beta_1&\sim{}&\mathcal{N}(0,10000)\\
\end{array} \right. $$ Or in shorthand: $$ \begin{align} Y_i&\sim{}ZINB(\lambda,\theta) & (\text{response distribution})\\
logit(\theta)&=\gamma_0 & (\text{link function/linear predictor - zero component})\\ log(\lambda_i)&=\eta_i & (\text{link
function - count component})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor - count component})\\ \beta_0,
\beta_1, \gamma_0&\sim{}\mathcal{N}(0,10000) & (\text{diffuse Bayesian priors})\\ \end{align} $$
dat.zinb.list <- with(dat.zinb,list(Y=y, X=x,N=nrow(dat.zinb),z=ifelse(y==0 ,0 ,1 )))
modelString="
model {
for (i in 1:N) {
z[i] ~ dbern(psi.min)
p[i] <- size/(size+mu.eff[i])
mu.eff[i] <- z[i]*mu[i]
}
gamma ~ dnorm(0,0.001)
psi.min <- min(0.9999, max(0.00001, (1-psi)))
logit(psi) <- max(-20, min(20, gamma))
}
"
system.time(
dat.zinb.jags <- jags(model.file=textConnection(modelString), data=dat.zinb.list, inits=NULL,
param=c('beta0','beta1', 'size', 'theta'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 205
Initializing model
user system elapsed
12.241 0.000 12.273
beta0 0.961 0.449 0.077 0.661 0.962 1.258 1.851 1.001 2300
beta1 0.067 0.041 -0.009 0.039 0.066 0.095 0.152 1.001 3000
size 3.503 1.010 1.402 2.781 3.647 4.348 4.930 1.001 3000
theta 2.196 0.367 1.718 1.938 2.109 2.363 3.145 1.001 3000
deviance 82.682 2.814 79.127 80.610 82.086 84.011 89.900 1.001 3000


pD = 4.0 and DIC = 86.6

nX <- ncol(Xmat)
dat.zip.list1 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), nX=nX, z=ifelse(y==0 ,0 ,1 )))
modelString="
model {
for (i in 1:N) {
log(mu[i]) <- inprod(beta[], X[i,])
}
for (i in 1:nX) {
}
}
"
library(R2jags)
system.time(
dat.zip.jags1 <- jags(model.file=textConnection(modelString), data=dat.zip.list1, inits=NULL,
param=c('beta','gamma0'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 227
Initializing model
user system elapsed
12.389 0.004 12.411
beta[1] 0.753 0.453 -0.143 0.448 0.743 1.069 1.633 1.001 3000
beta[2] 0.084 0.041 0.006 0.056 0.084 0.111 0.167 1.001 3000
deviance 80.373 2.924 76.666 78.234 79.724 81.807 87.799 1.001 3000


pD = 4.3 and DIC = 84.7

nX <- ncol(Xmat)
modelString="
model {
for (i in 1:N) {
mu.eff[i] <- z[i]*eta[i]
}
}
"
library(R2jags)
system.time(
dat.zip.jags2 <- jags(model.file=textConnection(modelString), data=dat.zip.list2, inits=NULL,
param=c('beta','gamma','size','theta'),
n.chain=3 ,
)
Allocating nodes
Graph Size: 231
Initializing model
user system elapsed
5.220 0.004 5.234
beta[1] 0.757 0.453 -0.155 0.453 0.772 1.063 1.620 1.001 3000
beta[2] 0.084 0.041 0.007 0.056 0.084 0.112 0.168 1.001 3000
gamma -0.201 0.455 -1.131 -0.505 -0.186 0.104 0.655 1.003 940
size 3.661 0.954 1.563 3.006 3.830 4.462 4.946 1.002 1300
theta 2.104 0.320 1.673 1.884 2.039 2.249 2.938 1.003 850
deviance 80.435 2.902 76.659 78.327 79.800 81.843 87.741 1.002 1300


pD = 4.2 and DIC = 84.6
dat.zinb.brm <- brm(y~0 +spec + main + main:x, data=dat.zinb, family='zero_inflated_negbinomial',

prior = c(set_prior("normal(0,10)", class="b", coef="main"),
set_prior("normal(0,100)", class="b", coef="spec"),
set_prior("student_t(3,0,5)", class='shape')),
SAMPLING FOR MODEL 'zero_inflated_negbinomial(log) brms-model' NOW (CHAIN 1).

#

#

#
## Or if we allow the zero-inflation model and the count model to vary with x
dat.zinb.brm1 <- brm(y~trait:x, data=dat.zinb, family='zero_inflated_negbinomial',
prior = c(set_prior("normal(0,100)", class="Intercept"),
set_prior("normal(0,10)", class="b", coef="traity:x"),
set_prior("normal(0,10)", class="b", coef="traitzi_y:x"),

#


#

independent.
plot(as.mcmc(dat.zinb.jags))
library(gridExtra)
grid.arrange(stan_trace(dat.zinb.brm$fit, ncol=1 ),
stan_dens(dat.zinb.brm$fit, separate_chains=TRUE ,ncol=1 ),

ncol=2 )

autocorr.diag(as.mcmc(dat.zinb.jags))
beta0 beta1 deviance size theta
Lag 0 1.000000000 1.000000000 1.000000000 1.000000000 1.00000000
Lag 10 0.157707113 0.164120975 0.046293478 0.005759824 -0.01817507
Lag 50 -0.000258746 0.023917373 -0.002086921 0.005145297 -0.01124115
Lag 100 0.001276269 0.002424773 -0.017995644 -0.035551481 -0.03951204
Lag 500 -0.007191282 0.005616019 -0.021469518 -0.044121841 -0.05450130
iterations.
library(R2jags)
modelString="
model {
for (i in 1:N) {
}
}
"
dat.zinb.jags <- jags(data=dat.zinb.list,model.file=textConnection(modelString),
param=c('beta0','beta1','gamma','theta','psi'),
Allocating nodes
Graph Size: 205
Initializing model
beta0 0.973 0.453 0.058 0.679 0.961 1.269 1.882 1.001 3000
beta1 0.066 0.042 -0.014 0.037 0.065 0.094 0.151 1.001 3000
gamma -0.210 0.462 -1.133 -0.521 -0.202 0.090 0.674 1.002 1200
psi 0.450 0.109 0.244 0.373 0.450 0.522 0.662 1.003 1000
theta 2.190 0.363 1.724 1.939 2.102 2.360 3.133 1.001 3000
deviance 82.759 2.824 79.157 80.658 82.132 84.199 89.605 1.001 3000


pD = 4.0 and DIC = 86.7
plot(as.mcmc(dat.zinb.jags))
autocorr.diag(as.mcmc(dat.zinb.jags))
beta0 beta1 deviance gamma psi theta
Lag 0 1.000000000 1.0000000000 1.000000000 1.000000000 1.000000000 1.0000000000
Lag 50 0.008803828 0.0071072063 0.009745184 0.041114460 0.041195791 -0.0009151277
Lag 250 0.003371794 0.0007720164 -0.006544352 0.012836461 0.014694728 0.0298401324
Lag 500 -0.027820267 -0.0299358791 0.004795827 -0.005996009 -0.007053143 0.0094320909
Lag 2500 0.015181957 0.0200111646 -0.024418362 -0.019640184 -0.020398216 -0.0171605926
stan_ac(dat.zinb.brm$fit)
#these samples are highly correlated, lets thin more...

dat.zinb.brm <- brm(y~0 +spec + main + main:x, data=dat.zinb, family='zero_inflated_negbinomial',
prior = c(set_prior("normal(0,10)", class="b", coef="main"),
set_prior("normal(0,100)", class="b", coef="spec"),


#

#

#
stan_ac(dat.zinb.brm$fit)
Conclusions: the samples are now less auto-correlated and the chains are all well mixed and appear stable.
summary(do.call(rbind, args = get_sampler_params(dat.zip.brm$fit, inc_warmup = FALSE )), digits =

2 )
stan_diag(dat.zinb.brm$fit)
stan_diag(dat.zinb.brm$fit, information = "stepsize")
stan_diag(dat.zinb.brm$fit, information = "treedepth")
stan_diag(dat.zinb.brm$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(dat.zinb.brm$fit) + theme_classic(8 ),
stan_ess(dat.zinb.brm$fit) + theme_classic(8 ),
stan_mcse(dat.zinb.brm$fit) + theme_classic(8 ),
ncol = 2 )
all below 1.05, the effective sample size is not too bad (we might be able to achieve better acceptance ratios with tighter
priors) and the ratio of MCMC error to posterior sd is very small.
theta <- dat.zinb.jags$BUGSoutput$sims.matrix[,'psi']
Xmat <- model.matrix(~x, data=dat.zinb)
lambda<-coefs %*% t(Xmat)
eta <- exp(lambda)
expY <- sweep(eta,1 ,(1 -theta),"*")
varY <- eta+sweep(eta^2 ,1 ,theta,"*")
head(varY)
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20
[1,] 5.343355 6.056455 6.710940 6.782673 7.454279 9.591883 9.886422 10.142129 10.857296
12.334728 13.53727 14.42609 16.15161 18.32106 18.55017 21.24417 21.39110 21.82541 22.82726
26.16783
[2,] 6.136290 8.213971 10.463062 10.730368 13.440247 24.739116 26.636896 28.354937 33.515421
45.900953 57.78469 67.65739 89.58710 122.59671 126.45037 177.25918 180.32847 189.58587 212.00514
297.82859
[3,] 11.086488 11.972272 12.745761 12.828478 13.585024 15.812014 16.100779 16.348390 17.026571
18.367469 19.40679 20.14872 21.53308 23.18303 23.35205 25.27383 25.37541 25.67387 26.35240
28.52397
[4,] 5.390940 5.903767 6.358283 6.407248 6.858209 8.216792 8.396190 8.550587 8.976105
9.828478 10.49885 10.98238 11.89530 13.00082 13.11511 14.42760 14.49762 14.70375 15.17440
16.69918
[5,] 5.011368 6.014766 6.990161 7.100137 8.158381 11.852450 12.398287 12.879089 14.257358
17.255958 19.84063 21.83053 25.87864 31.29833 31.89134 39.14718 39.55753 40.77912 43.64564
53.67698
[6,] 20.534217 22.189351 23.632622 23.786853 25.196463 29.335264 29.870784 30.329775 31.585938
34.065499 35.98370 37.35109 39.89831 42.92726 43.23714 46.75510 46.94078 47.48620 48.72534
52.68347
varY <- sweep(varY,1 ,(1 -theta),'*')

Resid <- -1 *sweep(expY,2 ,dat.zinb$y,'-')/sqrt(varY)
coefs.zinb <- as.matrix(as.data.frame(rstan:::extract(dat.zinb.brm$fit)))
coefs.zinb <- coefs.zinb[,grep('b_main', colnames(coefs.zinb))]

shape <- as.matrix(as.data.frame(rstan:::extract(dat.zinb.brm$fit,'shape')))
lambda.zinb<-coefs.zinb %*% t(Xmat)

eta.zinb <- exp(lambda.zinb)
expY.zinb <- sweep(eta.zinb,1 ,(1 -shape),"*")
varY.zinb <- eta.zinb+sweep(eta.zinb^2 ,1 ,shape,"*")
varY.zinb <- sweep(varY.zinb,1 ,(1 -theta),'*')

Resid.zinb <- -1 *sweep(expY.zinb,2 ,dat.zinb$y,'-')/sqrt(varY.zinb)
ggplot(data=NULL, aes(y=apply(Resid.zinb,2 ,mean), x=apply(eta.zinb,2 ,mean))) + geom_point()

When doing so, we need to consider the expected value and variance of the zero-inflated poisson. $$ E(y_i) =
\lambda\times(1-\theta)\\ Var(y_i) = \lambda\times(1-\theta)\times(1+\theta\times\lambda^2) $$
set.seed(2 )
#generate a matrix of draws from a zero-inflated poisson (ZINB) distribution
#YNew <- matrix(rZINB(length(lambda),eta, theta),nrow=nrow(lambda))
lambda <- sweep(eta,1 ,ifelse(dat.zinb$y==0 ,0 ,1 ),'*')
YNew <- matrix(rpois(length(lambda),lambda),nrow=nrow(lambda))
Resid1<-(expY - YNew)/sqrt(varY)
[1] 0.349
Whilst not ideal (as we would prefer a Bayesian P-value of around 0.5), this value is not wildly bad and does not constitute
overwhelming evidence of a lack of fit.
Xmat <- model.matrix(~x, dat.zinb)

nX <- ncol(Xmat)
dat.zinb.list2 <- with(dat.zinb,list(Y=y, X=Xmat,N=nrow(dat.zinb), nX=nX,
modelString="
model {
for (i in 1:N) {
mu.eff[i] <- z[i]*eta[i]
log(eta[i]) <- beta0 + beta1*X[i]
expY[i] <- eta[i]*(1-psi)

varY[i] <- (1-psi)*(eta[i]*psi*pow(eta[i],2))
Y1[i] ~ dnegbin(p[i], size)
}
}
"
library(R2jags)
system.time(
dat.zinb.jags3 <- jags(model='../downloads/BUGSscripts/tut11.5bS5.13.bug', data=dat.zinb.list2,
inits=NULL,
param=c('beta', 'gamma0','Pvalue'),
n.chain=3 ,
)
Allocating nodes
Deleting model
Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, : RUNTIME
ERROR:
Compilation error on line 5.
Unable to resolve node p[1]
This may be due to an undefined ancestor node or a directed cycle in the graph
Timing stopped at: 0.012 0 0.009
print(dat.zinb.jags3)
Error in print(dat.zinb.jags3): error in evaluating the argument 'x' in selecting a method for
function 'print': Error: object 'dat.zinb.jags3' not found
SSres.zinb<-apply(Resid.zinb^2 ,1 ,sum)
set.seed(2 )
#generate a matrix of draws from a zero-inflated poisson (ZINB) distribution
#YNew <- matrix(rZINB(length(lambda),eta, theta),nrow=nrow(lambda))
lambda.zinb <- sweep(eta.zinb,1 ,ifelse(dat.zinb$y==0 ,0 ,1 ),'*')
YNew.zinb <- matrix(MASS:::rnegbin(length(lambda.zinb),lambda.zinb, theta=mean(shape)),
nrow=nrow(lambda.zinb))
Resid1.zinb<-(expY.zinb - YNew.zinb)/sqrt(varY.zinb)
SSres.sim<-apply(Resid1.zinb^2 ,1 ,sum)
mean(SSres.sim>SSres.zinb)
[1] 0.3206667
Conclusions: the Bayesian p-value is very close to 0.5, suggesting that there is a good fit of the model to the data.
Since it is difficult to diagnose many issues from the typical residuals we will now explore simulated residuals.
colnames(dat.zinb.jags$BUGSoutput$sims.matrix)
[1] "beta0" "beta1" "deviance" "gamma" "psi" "theta"

theta <- dat.zinb.jags$BUGSoutput$sims.matrix[,'theta']
size <- dat.zinb.jags$BUGSoutput$sims.matrix[,'psi']
lambda <- exp(eta)

require(gap)
N = nrow(data)
'zip' = matrix(gamlss.dist:::rZIP(n*N,apply(lambda,2 ,mean),theta),ncol=N, byrow=TRUE ),
'zinb' = matrix(gamlss.dist:::rZINBI(n*N,apply(lambda,2 ,mean),sigma=theta,nu=size),ncol=N,
byrow=TRUE )
)
resid<-NULL
if (plot==T) {
par(mfrow=c(1 ,2 ))
}
resid
}
simRes(lambda,dat.zinb, family='zinb',theta=theta, size=size)
[1] 0.5533333 0.8466667 0.6400000 0.8333333 0.8733333 0.6266667 0.9333333 0.8933333 0.7200000
0.1600000 0.4933333 0.3866667 0.3866667 0.7800000 0.0200000 0.2000000 0.8533333 0.2133333 0.3800000
0.8600000
close to 1 or 0 (which would imply overdispersion). Hence, once zero-inflation is accounted for, the model does not display
overdispersion.
Although there is a slight hint of non-linearity in that the residuals are high for low and high fitted values and lower in the
middle, this might well be an artifact of the small data set size. By change, most of the observed values in the middle range
of the predictor were zero.
5.5: Exploring the model parameters, inference

beta0 0.973 0.453 0.058 0.679 0.961 1.269 1.882 1.001 3000
beta1 0.066 0.042 -0.014 0.037 0.065 0.094 0.151 1.001 3000
gamma -0.210 0.462 -1.133 -0.521 -0.202 0.090 0.674 1.002 1200
psi 0.450 0.109 0.244 0.373 0.450 0.522 0.662 1.003 1000
theta 2.190 0.363 1.724 1.939 2.102 2.360 3.133 1.001 3000
deviance 82.759 2.824 79.157 80.658 82.132 84.199 89.605 1.001 3000


pD = 4.0 and DIC = 86.7
library(plyr)
adply(dat.zinb.jags$BUGSoutput$sims.matrix, 2 , function(x) {
})
1 beta0 0.96103123 0.97276291 0.13961096 1.9515909 0.66801095 1.25524602
2 beta1 0.06467899 0.06570763 -0.01525967 0.1486955 0.03618379 0.09232641
3 deviance 82.13216622 82.75885652 78.74846503 88.3394029 79.66753681 82.68026059
4 gamma -0.20172349 -0.21028082 -1.12438237 0.6754259 -0.52792406 0.07915285
5 psi 0.44973945 0.45021792 0.24269534 0.6600966 0.37100120 0.51977789
6 theta 2.10185552 2.18996772 1.62902894 2.9316514 1.83725582 2.19604146
scale.
library(plyr)
adply(exp(dat.zinb.jags$BUGSoutput$sims.matrix[,1 :2 ]), 2 , function(x) {
})
1 beta0 2.614391 2.933031 0.8237093 5.78519 1.615310 3.052375
2 beta1 1.066817 1.068866 0.9848562 1.16032 1.034969 1.094738
(e^{0.064679}=1.0668165) 1.0668165 unit increase in the abundance of $y$.

Alternatively, we could use McFadden's psuedo $$R^2 = 1-\frac{\mathcal{LL}(Model_{full})}{\mathcal{LL}
(Model_{reduced})}$$ [[http://www.statisticalhorizons.com/category/uncategorized]] - about a quarter of the way down the
page.
Xmat <- model.matrix(~x, dat=dat.zinb)

neta<-coefs %*% t(Xmat)
eta <- exp(neta)
lambda <- sweep(eta,2 ,ifelse(dat.zinb$y==0 ,0 ,1 ),'*')
SSres <- apply((-1 *(sweep(expY,2 ,dat.zinb$y,'-')))^2 ,1 ,sum)
SSres.null <- sum((dat.zinb$y - mean(dat.zinb$y))^2 )
[1] 0.3869893
Section 6: Worked Examples
Basic χ2 references
Logan (2010) - Chpt 16-17
Quinn & Keough (2002) - Chpt 13-14
Example 1: Poisson t-test

A marine ecologist was interested in examining the effects of wave exposure on the abundance of the striped limpet Siphonaria
diemenensis on rocky intertidal shores. To do so, a single quadrat was placed on 10 exposed (to large waves) shores and 10
sheltered shores. From each quadrat, the number of Siphonaria diemenensis were counted.
Download Limpets data set
Format of limpets.csv data files
Count Shore
1 sheltered
3 sheltered
2 sheltered
1 sheltered
4 sheltered
... ...
Shore Categorical description of the shore type (sheltered or exposed) -

Predictor variable
Count Number of limpets per quadrat - Response variable
Open the limpet data set.
Show code
limpets <- read.table('../downloads/data/limpets.csv', header=T, sep=',', strip.white=T)
limpets
Count Shore
1 2 sheltered
2 2 sheltered
3 0 sheltered
4 6 sheltered
5 0 sheltered
6 3 sheltered
7 3 sheltered
8 0 sheltered
9 1 sheltered
10 2 sheltered
11 5 exposed
12 3 exposed
13 6 exposed
14 3 exposed
15 4 exposed
16 7 exposed
17 10 exposed
18 3 exposed
19 5 exposed
20 2 exposed
Q1.1. Is there any evidence that these assumptions have been violated?
Show code
boxplot(Count~Shore, data=limpets)
library(vcd)
fit <- goodfit(limpets$Count, type='poisson')
summary(fit)
Goodness-of-fit test for poisson distribution
X^2 df P(> X^2)

Likelihood Ratio 14.03661 7 0.05053407
rootogram(fit)
fit <- goodfit(limpets$Count, type='nbinom')

summary(fit)
Goodness-of-fit test for nbinomial distribution
X^2 df P(> X^2)

rootogram(fit)
Ord_plot(limpets$Count, tol=0.2 )
distplot(limpets$Count, type='poisson')
## Although we could argue that NB more appropriate than Poisson, this is possibly
## due to the small sample size. Small samples are more varied than the populations
## from which they are drawn
a. The assumption of normality has been violated?
b. The assumption of homogeneity of variance has been violated?
c. Notwithstanding the very small sample size, is a Poisson distribution appropriate?
At this point we could transform the data in an attempt to satisfy normality and homogeneity of variance. However, the
analyses are then not on the scale of the data and thus the conclusions also pertain to a different scale. Furthermore, linear
modelling on transformed count data is generally not as effective as modelling count data with a Poisson distribution.
Q1.2. Lets instead we fit the same model with a Poisson distribution. $$ \begin{align} Y_i&\sim{}P(\lambda) &
(\text{response distribution})\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i &

(\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,100) & (\text{week Bayesian prior})\\ \end{align} $$
Show BRMS code

limpets.brm <- brm(Count~Shore, family=poisson, data=limpets,
set_prior("normal(0,100)", class="b")),

#

#

#
Q1.3. Explore the chain mixing diagnostics.

a. Trace plots
Show code
library(gridExtra)
grid.arrange(stan_trace(limpets.brm$fit, ncol=1 ),
stan_dens(limpets.brm$fit, separate_chains=TRUE ,ncol=1 ),
ncol=2 )
b. Autocorrelation
Show code
stan_ac(limpets.brm$fit)
c. Step size characteristics (STAN only)
Show code
summary(do.call(rbind, args = get_sampler_params(limpets.brm$fit, inc_warmup = FALSE )), digits = 2 )
stan_diag(limpets.brm$fit)
stan_diag(limpets.brm$fit, information = "stepsize")
stan_diag(limpets.brm$fit, information = "treedepth")
stan_diag(limpets.brm$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(limpets.brm$fit) + theme_classic(8 ),
stan_ess(limpets.brm$fit) + theme_classic(8 ),
stan_mcse(limpets.brm$fit) + theme_classic(8 ),
ncol = 2 )
Q1.4. Explore the model fit diagnostics.

a. Explore the patterns in simulated residuals:
Show code
lambda.brm = fitted(limpets.brm, scale='response', summary=FALSE )
simRes <- function(lambda, Y,n=250 , plot=T, family='negbin', size=NULL,theta=NULL) {

require(gap)
N = length(Y)
byrow=TRUE )
)
resid<-NULL
for (i in 1 :length(Y)) resid<-c(resid,a[[i]](Y[i] + runif(1 ,-0.5 ,0.5 )))
if (plot==T) {
par(mfrow=c(1 ,2 ))

}
resid
}
simRes(lambda.brm, limpets$Count, family='poisson')
[1] 0.636 0.584 0.060 1.000 0.096 0.724 0.736 0.100 0.320 0.672 0.640 0.276 0.736 0.220 0.440 0.772
0.988 0.216 0.520 0.100
b. Goodness of fit test:
Show code
Resid.brm <- residuals(limpets.brm, type='pearson', summary=FALSE )
[1] 0.1686667
c. Fitted and observed values (plot):
Show code
newdata <- data.frame(Shore=levels(limpets$Shore))
tempdata = fitted(limpets.brm, scale='response',newdata=newdata)
tempdata <- cbind(newdata, Mean=tempdata[,'Estimate'], Lower=tempdata[,'2.5%ile'],
Upper=tempdata[,'97.5%ile'])
tempdata
Shore Mean Lower Upper
1 exposed 4.808977 3.582597 6.194391
2 sheltered 1.891051 1.172518 2.870922
library(ggplot2)
ggplot(limpets, aes(y=Count, x=Shore)) + geom_boxplot() +
geom_pointrange(data=tempdata,aes(y=Mean, ymin=Lower, ymax=Upper), color='red')
##OR the manual way

Xmat <- model.matrix(~Shore, data=newdata)
coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))[,1 :2 ]
tempdata <- cbind(newdata, plyr:::adply(fit, 2 , function(x) {
data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
}))
geom_pointrange(data=tempdata,aes(y=Mean, ymin=lower, ymax=upper), color='red')
d. Check for overdispersion:
Show code
coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))[,1 :2 ]
Resid <- residuals(limpets.brm, type='pearson',summary=FALSE )
RSS <- apply(Resid^2 ,1 ,sum)
Disp <- RSS/(nrow(limpets)-ncol(coefs))
data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
Median Mean lower upper
var1 1.384552 1.415549 1.319196 1.619498
Q1.5. Explore the parameter estimates
Show code
summary(limpets.brm)
Formula: Count ~ Shore
Data: limpets (Number of observations: 20)
WAIC: Not computed
Fixed Effects:
Intercept 1.56 0.14 1.28 1.82 1361 1
Shoresheltered -0.95 0.26 -1.45 -0.43 952 1

exp(coef(limpets.brm))
mean
Intercept 4.761504
Shoresheltered 0.387136
coefs.brm <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))
})
1 b_Intercept 4.8089767 4.7877066 3.4711793 6.0637796
2 b_Shoresheltered 0.4008196 0.3880695 0.2186746 0.6188941
marginal_effects(limpets.brm)
Q1.6. Estimate the strength of the relationship
Show code

SSres.brm <- apply((-1 *(sweep(lambda.brm,2 ,limpets$Count,'-')))^2 ,1 ,sum)
SSres.null <- sum((limpets$Count - mean(limpets$Count))^2 )
#OR
[1] 0.2857745
#OR manually
coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))
coefs <- coefs.brm[,grep('b', colnames(coefs))]
Xmat <- model.matrix(~Shore, data=limpets)
lambda.brm = exp(coefs %*% t(Xmat))

SSres.null <- crossprod(limpets$Count - mean(limpets$Count))
[,1]
[1,] 0.2857745
## Alternatively..
R.var = apply(residuals(limpets.brm, summary=FALSE ),1 ,var)
X.var = apply(fitted(limpets.brm, summary=FALSE ),1 ,var)
R2.marginal <- X.var/(X.var + R.var)
data.frame(Median=median(R2.marginal), coda:::HPDinterval(as.mcmc(R2.marginal)))
Median lower upper
var1 0.3386231 0.1113502 0.5261667
Q1.7. Construct a summary plot
Show code
newdata=data.frame(Shore=levels(limpets$Shore))
coefs <- coefs.brm[,grep('b', colnames(coefs))]
newdata = cbind(newdata, plyr:::adply(fit, 2 , function(x) {
}))
ggplot(limpets, aes(y=Count, x=Shore)) +

geom_point(color='grey') +
geom_pointrange(data=newdata, aes(y=Mean, ymin=lower, ymax=upper)) +
scale_y_continuous('Abundance of limpets') +
scale_x_discrete('Shore type', breaks=c('exposed','sheltered'), labels=c('Exposed','Sheltered')) +
theme_classic() +
theme(axis.line.x=element_line(), axis.line.y=element_line())
Example 2: Poisson ANOVA (regression)

We once again return to a modified example from Quinn and Keough (2002). In Exercise 1 of Workshop 9.4a, we introduced an
experiment by Day and Quinn (1989) that examined how rock surface type affected the recruitment of barnacles to a rocky
shore. The experiment had a single factor, surface type, with 4 treatments or levels: algal species 1 (ALG1), algal species 2
(ALG2), naturally bare surfaces (NB) and artificially scraped bare surfaces (S). There were 5 replicate plots for each surface
type and the response (dependent) variable was the number of newly recruited barnacles on each plot after 4 weeks.
Download Day data set
Format of day.csv data files
TREAT BARNACLE
ALG1 27
.. ..
ALG2 24
.. ..
NB 9
.. ..
S 12
.. ..
TREAT Categorical listing of surface types. ALG1 = algal species 1, ALG2
= algal species 2, NB = naturally bare surface, S = scraped bare
surface.
BARNACLE The number of newly recruited barnacles on each plot after 4
weeks.
Open the day data file.

Show code
day <- read.table('../downloads/data/day.csv', header=T, sep=',', strip.white=T)
head(day)
TREAT BARNACLE
1 ALG1 27
2 ALG1 19
3 ALG1 18
4 ALG1 23
5 ALG1 25
6 ALG2 24
We previously analysed these data with via a classic linear model (ANOVA) as the data appeared to conform to the linear model
assumptions. Alternatively, we could analyse these data with a generalized linear model (Poisson error distribution). Note that
as the boxplots were all fairly symmetrical and equally varied, and the sample means are well away from zero (in fact there are
no zero's in the data) we might suspect that whether we fit the model as a linear or generalized linear model is probably not
going to be of great consequence. Nevertheless, it does then provide a good comparison between the two frameworks.
Q2.1. Using boxplots re-examine the assumptions of normality and homogeneity of variance. Note that when sample sizes
are small (as is the case with this data set), these ANOVA assumptions cannot reliably be checked using boxplots since boxplots
require at least 5 replicates (and preferably more), from which to calculate the median and quartiles. As with regression
analysis, it is the assumption of homogeneity of variances (and in particular, whether there is a relationship between the mean
and variance) that is of most concern for ANOVA.
Show code
boxplot(BARNACLE~TREAT, data=day)
library(vcd)
fit <- goodfit(day$BARNACLE, type='poisson')
summary(fit)
X^2 df P(> X^2)

rootogram(fit)
fit <- goodfit(day$BARNACLE, type='nbinom')

summary(fit)
X^2 df P(> X^2)

rootogram(fit)
distplot(day$BARNACLE, type='poisson')
## Poisson would appear appropriate
Q2.2. Fit the generalized linear model (GLM) relating the number of newly recruited barnacles to substrate type. $$
\begin{align} Y_i&\sim{}P(\lambda) & (\text{response distribution})\\ log(\lambda_i)&=\eta_i & (\text{link function})\\
\eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week
Show BRMS code

day.brm <- brm(BARNACLE~TREAT, family=poisson, data=day,

#

#


#

a. Trace plots
Show code
library(gridExtra)
grid.arrange(stan_trace(day.brm$fit, ncol=1 ),
stan_dens(day.brm$fit, separate_chains=TRUE ,ncol=1 ),
ncol=2 )
b. Autocorrelation
Show code
stan_ac(day.brm$fit)
Show code
summary(do.call(rbind, args = get_sampler_params(day.brm$fit, inc_warmup = FALSE )), digits = 2 )
stan_diag(day.brm$fit)
stan_diag(day.brm$fit, information = "stepsize")
stan_diag(day.brm$fit, information = "treedepth")
stan_diag(day.brm$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(day.brm$fit) + theme_classic(8 ),
stan_ess(day.brm$fit) + theme_classic(8 ),
stan_mcse(day.brm$fit) + theme_classic(8 ),
ncol = 2 )

Show code
lambda.brm = fitted(day.brm, scale='response', summary=FALSE )

require(gap)
N = length(Y)
byrow=TRUE )
)
resid<-NULL
if (plot==T) {
par(mfrow=c(1 ,2 ))
}
resid
}
simRes(lambda.brm, day$BARNACLE, family='poisson')
[1] 0.804 0.280 0.192 0.572 0.676 0.224 0.816 0.416 0.332 0.780 0.040 0.368 0.724 0.484 0.944 0.376
0.116 0.672 0.952 0.320
Show code
Resid.brm <- residuals(day.brm, type='pearson', summary=FALSE )
[1] 0.5986667
Show code
newdata <- data.frame(TREAT=levels(day$TREAT))
tempdata = fitted(day.brm, scale='response',newdata=newdata)
tempdata
TREAT Mean Lower Upper
1 ALG1 22.42653 18.50291 26.69970
2 ALG2 28.34577 23.75776 33.21707
3 NB 14.93645 11.56363 18.37661
4 S 13.15741 10.15435 16.54420
library(ggplot2)
ggplot(day, aes(y=BARNACLE, x=TREAT)) + geom_boxplot() +
##OR the manual way

Xmat <- model.matrix(~TREAT, data=newdata)
coefs <- as.matrix(as.data.frame(rstan:::extract(day.brm$fit)))
coefs <- coefs[,grep('^b_', colnames(coefs))]

}))
ggplot(day, aes(y=BARNACLE, x=TREAT)) + geom_boxplot() +
Show code
Resid <- residuals(day.brm, type='pearson',summary=FALSE )
Disp <- RSS/(nrow(day)-ncol(coefs))
var1 1.072787 1.110258 0.9095692 1.39219
Q2.5. Explore the parameter estimates
Show code
summary(day.brm)
Formula: BARNACLE ~ TREAT
Data: day (Number of observations: 20)
WAIC: Not computed
Fixed Effects:
Intercept 3.11 0.09 2.92 3.28 1099 1.00
TREATALG2 0.23 0.13 -0.01 0.48 1026 1.00
TREATNB -0.41 0.15 -0.72 -0.12 818 1.01
TREATS -0.54 0.16 -0.84 -0.22 721 1.00

exp(coef(day.brm))
mean
Intercept 22.3302880
TREATALG2 1.2647170
TREATNB 0.6643044
TREATS 0.5845021
coefs.brm <- as.matrix(as.data.frame(rstan:::extract(day.brm$fit)))

})
1 b_Intercept 22.4265258 22.3563112 18.7801517 26.9246921
2 b_TREATALG2 1.2750621 1.2633289 0.9820844 1.5952722
3 b_TREATNB 0.6718217 0.6643929 0.4654365 0.8588770
4 b_TREATS 0.5919046 0.5845745 0.4160054 0.7742567
marginal_effects(day.brm)
Show code
SSres.brm <- apply((-1 *(sweep(lambda.brm,2 ,day$BARNACLE,'-')))^2 ,1 ,sum)
SSres.null <- sum((day$BARNACLE - mean(day$BARNACLE))^2 )
#OR
[1] 0.635007
#OR manually
Xmat <- model.matrix(~TREAT, data=day)

SSres.brm <- apply((-1 *(sweep(lambda.brm,2 ,day$BARNACLE,'-')))^2 ,1 ,sum)
SSres.null <- crossprod(day$BARNACLE - mean(day$BARNACLE))
[,1]
[1,] 0.635007
## Alternatively..
R.var = apply(residuals(day.brm, summary=FALSE ),1 ,var)
X.var = apply(fitted(day.brm, summary=FALSE ),1 ,var)
Median lower upper
var1 0.6985856 0.5149793 0.7759515
Q2.7. Although we have now established that there is a statistical difference between the group means, we do not yet know
which group(s) are different from which other(s). To address this we can explore all pairwise contrasts from the posterior.
Note, careful when back transforming from log. Recall that log laws indicate that $log(a) - log(b) = log(a/b)$.
Show code
coefs <-as.matrix(as.data.frame(rstan:::extract(day.brm$fit)))
coefs <- coefs[,grep('b_', colnames(coefs))]
library(multcomp)
tuk.mat <- contrMat(n=table(newdata$TREAT), type="Tukey")
Xmat <- tuk.mat %*% Xmat
fit <- exp(coefs %*% t(Xmat))
library(coda)
MCMCsum <- function(x) {
data.frame(Median=median(x, na.rm=TRUE ), t(quantile(x,na.rm=TRUE )),
}
plyr:::adply(as.matrix(fit),2 ,MCMCsum)
X1 Median X0. X25. X50. X75. X100. lower upper
1 ALG2 - ALG1 1.2633289 0.8452016 1.1549126 1.2633289 1.3888569 2.0054558 0.9820844 1.5952722
2 NB - ALG1 0.6643929 0.4142791 0.5990942 0.6643929 0.7348131 1.0752521 0.4654365 0.8588770
3 S - ALG1 0.5845745 0.3666083 0.5252027 0.5845745 0.6472396 0.9985889 0.4160054 0.7742567

4 NB - ALG2 0.5246730 0.3142344 0.4775387 0.5246730 0.5817067 0.8599918 0.3790918 0.6773199
5 S - ALG2 0.4627503 0.2666343 0.4155122 0.4627503 0.5136620 0.7633502 0.3366329 0.6175163
6 S - NB 0.8804989 0.5160321 0.7789036 0.8804989 0.9970252 1.6316219 0.6258814 1.2183909
## Due to log laws, these effects are multipliers rather than differences
## If you want to express the outcomes in terms of differences then it is necessary to

## estimate the cell posteriors and subtract those - this cannot easily be done Frequentist!
library(multcomp)
tuk.mat <- contrMat(n=table(newdata$TREAT), type="Tukey")
Xmat <- tuk.mat
fit <- fit %*% t(Xmat)
1 ALG2 - ALG1 5.895130 -3.772229 3.619496 5.895130 8.21805575 17.17097099 -0.03095788 12.031425
2 NB - ALG1 -7.483685 -16.063734 -9.424278 -7.483685 -5.67358521 1.43404144 -12.96977305 -2.324986
3 S - ALG1 -9.257110 -19.048803 -11.112041 -9.257110 -7.47521694 -0.02471089 -14.60523050 -4.132667
4 NB - ALG2 -13.383835 -25.127509 -15.420303 -13.383835 -11.30591077 -3.08899312 -19.21054550 -7.484513
5 S - ALG2 -15.182663 -26.816258 -17.194247 -15.182663 -13.16149810 -5.56128213 -20.74073778 -9.168501
6 S - NB -1.807393 -8.789820 -3.462725 -1.807393 -0.04160081 6.74548604 -6.36710762 2.861639
Q2.8. Alternatively, it might be interesting to explore specific contrasts

compare the two algal surfaces
compare the two bare surfaces
compare the algal and bare surfaces
Show code
## Effect as a multiplier
comp.mat <- rbind(c(1 ,-1 ,0 ,0 ), c(0 ,0 ,1 ,-1 ), c(1 /2 ,1 /2 ,-1 /2 ,-1 /2 ))

Xmat <- cbind(0 ,comp.mat %*% contr.treatment(4 ))
}
1 1 0.7915595 0.4986398 0.7200166 0.7915595 0.8658664 1.183150 0.6083706 0.9906496
2 2 1.1357198 0.6128871 1.0029837 1.1357198 1.2838560 1.937864 0.7969842 1.5466949
3 3 1.8067288 1.2633552 1.6705385 1.8067288 1.9401542 2.563707 1.4381279 2.1821891
## Effect as a difference
comp.mat <- rbind(c(1 ,-1 ,0 ,0 ), c(0 ,0 ,1 ,-1 ), c(1 /2 ,1 /2 ,-1 /2 ,-1 /2 ))
fit <- fit %*% t(comp.mat)
1 1 -5.895130 -17.170971 -8.21805575 -5.895130 -3.619496 3.772229 -12.031425 0.03095788
2 2 1.807393 -6.745486 0.04160081 1.807393 3.462725 8.789820 -2.861639 6.36710762
3 3 11.296350 4.225970 9.88922295 11.296350 12.684859 18.425620 7.272778 14.98168802
Q2.9. Produce a suitable summary plot
Show code

data.frame(Mean=mean(x, na.rm=TRUE ),Median=median(x, na.rm=TRUE ), HPDinterval(as.mcmc(x)))
}))
ggplot(day, aes(y=BARNACLE, x=TREAT)) +

scale_y_continuous('Number of newly recruited Barnacles') +
scale_x_discrete('Substrate type', breaks=c('ALG1','ALG2','NB','S'),
labels=c('Algae 1','Algae 2', 'Naturally\nBare','Scraped\nBare')) +
theme_classic() +
Example 3: Poisson ANOVA (regression)

To investigate habitat differences in moth abundances, researchers from the Australian National University (ANU) in Canberra
counted the number of individuals of two species of moth (recorded as A and P) along transects throughout a landscape
comprising eight different habitat types ('Bank', 'Disturbed', 'Lowerside', 'NEsoak', 'NWsoak', 'SEsoak', 'SWsoak', 'Upperside').
For the data presented here, one of the habitat types ('Bank') has been ommitted as no moths were encounted in that habitat.
Although each transect was approximately the same length, each transect passed through multiple habitat types.
Consequently, each transect was divided into sections according to habitat and the number of moths observed in each section
recorded. Clearly, the number of observed moths in a section would be related to the length of the transect in that section.
Therefore, the researchers also recorded the length of each habitat section.
Download moths data set
Format of moth.csv data files
METERS A P HABITAT
25 9 8 NWsoak
37 3 20 SWsoak
109 7 9 Lowerside
10 0 2 Lowerside
133 9 1 Upperside
26 3 18 Disturbed
METERS The length of the section of transect
A The number of moth species A observed in section of
transect
P The number of moth species P observed in section of
transect
HABITAT Categorical listing of the habitat type within section of
transect.
Open the day data file.

Show code
moths <- read.csv('../downloads/data/moths.csv', header=T, strip.white=T)
head(moths)
METERS A P HABITAT
1 25 9 8 NWsoak
2 37 3 20 SWsoak
3 109 7 9 Lowerside
4 10 0 2 Lowerside
5 133 9 1 Upperside
6 26 3 18 Disturbed
The primary focus of this question will be to investigate the effect of habitat on the abundance of moth species A.
Q3.1. To standardize the moth counts for transect section length, we could convert the counts into densities (by dividing the
A by METERS. Create such as variable (DENSITY and explore the usual ANOVA assumptions.
Show code
moths <- within(moths, DENSITY<-A/METERS)
boxplot(DENSITY~HABITAT, data=moths)
library(vcd)
fit <- goodfit(moths$A, type='poisson')
summary(fit)
X^2 df P(> X^2)

Likelihood Ratio 170.2708 11 1.037761e-30
rootogram(fit)
fit <- goodfit(moths$A, type='nbinom')

summary(fit)
X^2 df P(> X^2)

rootogram(fit)
Ord_plot(moths$A, tol=0.2 )
distplot(moths$A, type='poisson')
distplot(moths$A, type='nbinom')
Q3.2. Clearly, the there is an issue with normality and homogeneity of variance. Perhaps it would be worth transforming
density in an attempt to normalize these data. Given that there are densities of zero, a straight logarithmic transformation
would not be appropriate. Alternatives could inlude a square-root transform, a forth-root transform and a log plus 1 transform.
Show code
moths <- within(moths, DENSITY<-A/METERS)
boxplot(sqrt(DENSITY)~HABITAT, data=moths)
boxplot((DENSITY)^0.25 ~HABITAT, data=moths)
boxplot(log(DENSITY+1 )~HABITAT, data=moths)
Q3.3. Arguably, non of the above transformations have improved the data's adherence to the linear modelling assumptions.
Count data (such as the abundance of moth species A) is unlikely to follow a normal or even log-normal distribution. Count
data are usually more appropriately modelled via a Poisson or Negative binomial distributions. Note, dividing by transect
length is unlikely to alter the underlying nature of the data distribution as it still based on counts. Fit the
generalized linear model (GLM) relating the number of moths to habitat. $$ \begin{align} Y_i&\sim{}P(\lambda) &
(\text{response distribution})\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i &
(\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ \end{align} $$ OR
$$ \begin{align} Y_i&\sim{}NB(p, size) & (\text{response distribution})\\ p &= size/(size + \lambda)\\
log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0,
\beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ size &\sim{}T3,0,5)& (\text{week student t prior})\\
\end{align} $$
Show code
## Poisson model
moths.brm <- brm(A~HABITAT, family=poisson, data=moths,

#

#

#
## Negative Binomial model
moths.brm2 <- brm(A~HABITAT, family=negbinomial, data=moths,
set_prior("normal(0,1000)", class="b"),

#


#

#
Q3.4. Actually, the above models fail to account for the length of the section of transect. We could do this a couple of ways:
a. Add METERS as a covariate. Note, since we are modelling against a Poission distribution with a log link function, we
should also log the covariate - in order to maintain linearity between the expected value of species A abundance and
transect length. $$ \begin{align} Y_i&\sim{}P(\lambda) & (\text{response distribution})\\ log(\lambda_i)&=\eta_i &
(\text{link function})\\ \eta_i&=\beta_0+\beta_1 Habitat_i + \beta_2 log(Meters_i) & (\text{linear predictor})\\ \beta_0,
\beta_1, \beta_2 &\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ \end{align} $$ OR $$ \begin{align}
Y_i&\sim{}NB(p, size) & (\text{response distribution})\\ p &= size/(size + \lambda)\\ log(\lambda_i)&=\eta_i & (\text{link
function})\\ \eta_i&=\beta_0+\beta_1 Habitat_i + \beta_2 log(Meters_i) & (\text{linear predictor})\\ \beta_0, \beta_1,
\beta_2 &\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ size &\sim{}T3,0,5)& (\text{week student t
prior})\\ \end{align} $$ $$log(\mu) = \beta_0+\beta_1 Habitat_i + \beta_2 log(meters) +\epsilon_i, \hspace{1cm}
\epsilon \sim Poisson(\lambda)\\ log(\mu) = \beta_0+\beta_1 Habitat_i + \beta_2 log(meters) +\epsilon_i, \hspace{1cm}
\epsilon \sim NB(n,p) $$
Show code
## Poisson model
moths.brmP <- brm(A~log(METERS)+HABITAT, family=poisson, data=moths,

#

#


#

moths.brmNB <- brm(A~log(METERS)+HABITAT, family=negbinomial, data=moths,

#

#

#
b. Add METERS as an offset in which case $\beta_2$ is assumed to be 1 (a reasonable assumption in this case). The
advantage is that it does not sacrifice any residual degrees of freedom. Again to maintain linearity, the offset should include
the log of transect length. $$ \begin{align} Y_i&\sim{}P(\lambda) & (\text{response distribution})\\
log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 Habitat_i + log(Meters_i)& (\text{linear
predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ \end{align} $$ OR $$
\begin{align} Y_i&\sim{}NB(p, size) & (\text{response distribution})\\ p &= size/(size + \lambda)\\
log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 Habitat_i + log(Meters_i)& (\text{linear
predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ size &\sim{}T3,0,5)&
(\text{week student t prior})\\ \end{align} $$ $$log(\mu) = \beta_0+\beta_1 Habitat_i + log(meters) +\epsilon_i,
\hspace{1cm} \epsilon \sim Poisson(\lambda)\\ log(\mu) = \beta_0+\beta_1 Habitat_i + log(meters) +\epsilon_i,
\hspace{1cm} \epsilon \sim NB(n,p)$$
Show code
## Poisson model
moths.brmPO <- brm(A~HABITAT + offset(log(METERS)), family=poisson, data=moths,


#

#

#

moths.brmNBO <- brm(A~HABITAT + offset(log(METERS)), family=negbinomial, data=moths,

#

#


#

a. Trace plots
Show code
library(gridExtra)
grid.arrange(stan_trace(moths.brmP$fit, ncol=1 ),
stan_dens(moths.brmP$fit, separate_chains=TRUE ,ncol=1 ),
ncol=2 )
grid.arrange(stan_trace(moths.brmNB$fit, ncol=1 ),
stan_dens(moths.brmNB$fit, separate_chains=TRUE ,ncol=1 ),
ncol=2 )
grid.arrange(stan_trace(moths.brmPO$fit, ncol=1 ),
stan_dens(moths.brmPO$fit, separate_chains=TRUE ,ncol=1 ),
ncol=2 )
grid.arrange(stan_trace(moths.brmNBO$fit, ncol=1 ),
stan_dens(moths.brmNBO$fit, separate_chains=TRUE ,ncol=1 ),
ncol=2 )
b. Autocorrelation
Show code
stan_ac(moths.brmP$fit)
stan_ac(moths.brmNB$fit)
stan_ac(moths.brmPO$fit)
stan_ac(moths.brmNBO$fit)
Show code
summary(do.call(rbind, args = get_sampler_params(moths.brmP$fit, inc_warmup = FALSE )), digits = 2 )
Min. :0.39 Min. :0.26 Min. :2.0 Min. : 3 Min. :0
1st Qu.:0.86 1st Qu.:0.26 1st Qu.:3.0 1st Qu.: 7 1st Qu.:0
Median :0.94 Median :0.26 Median :3.0 Median : 7 Median :0
Mean :0.90 Mean :0.27 Mean :3.3 Mean :10 Mean :0
3rd Qu.:0.98 3rd Qu.:0.30 3rd Qu.:4.0 3rd Qu.:15 3rd Qu.:0
Max. :1.00 Max. :0.30 Max. :5.0 Max. :31 Max. :0
stan_diag(moths.brmP$fit)
stan_diag(moths.brmP$fit, information = "stepsize")
stan_diag(moths.brmP$fit, information = "treedepth")
stan_diag(moths.brmP$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(moths.brmP$fit) + theme_classic(8 ),
stan_ess(moths.brmP$fit) + theme_classic(8 ),
stan_mcse(moths.brmP$fit) + theme_classic(8 ),
ncol = 2 )
summary(do.call(rbind, args = get_sampler_params(moths.brmNB$fit, inc_warmup = FALSE )), digits = 2 )

stan_diag(moths.brmNB$fit)
stan_diag(moths.brmNB$fit, information = "stepsize")
stan_diag(moths.brmNB$fit, information = "treedepth")
stan_diag(moths.brmNB$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(moths.brmNB$fit) + theme_classic(8 ),
stan_ess(moths.brmNB$fit) + theme_classic(8 ),
stan_mcse(moths.brmNB$fit) + theme_classic(8 ),
ncol = 2 )
summary(do.call(rbind, args = get_sampler_params(moths.brmPO$fit, inc_warmup = FALSE )), digits = 2 )

Min. :0.36 Min. :0.28 Min. :1 Min. : 1.0 Min. :0
1st Qu.:0.85 1st Qu.:0.28 1st Qu.:2 1st Qu.: 3.0 1st Qu.:0
Median :0.94 Median :0.31 Median :3 Median : 7.0 Median :0
Mean :0.90 Mean :0.31 Mean :3 Mean : 7.6 Mean :0
3rd Qu.:0.98 3rd Qu.:0.32 3rd Qu.:3 3rd Qu.: 7.0 3rd Qu.:0
Max. :1.00 Max. :0.32 Max. :5 Max. :31.0 Max. :0
stan_diag(moths.brmPO$fit)
stan_diag(moths.brmPO$fit, information = "stepsize")
stan_diag(moths.brmPO$fit, information = "treedepth")
stan_diag(moths.brmPO$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(moths.brmPO$fit) + theme_classic(8 ),
stan_ess(moths.brmPO$fit) + theme_classic(8 ),
stan_mcse(moths.brmPO$fit) + theme_classic(8 ),
ncol = 2 )
summary(do.call(rbind, args = get_sampler_params(moths.brmNBO$fit, inc_warmup = FALSE )), digits = 2 )

stan_diag(moths.brmNBO$fit)
stan_diag(moths.brmNBO$fit, information = "stepsize")
stan_diag(moths.brmNBO$fit, information = "treedepth")
stan_diag(moths.brmNBO$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(moths.brmNBO$fit) + theme_classic(8 ),
stan_ess(moths.brmNBO$fit) + theme_classic(8 ),
stan_mcse(moths.brmNBO$fit) + theme_classic(8 ),
ncol = 2 )

Show code
require(gap)
N = length(Y)
'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2 ,mean),mean(size)),ncol=N, byrow=TRUE ),
'zinb' = matrix(gamlss.dist:::rZINBI(n*N,apply(lambda,2 ,mean),sigma=theta,nu=mean(size)),ncol=N,
byrow=TRUE )
)
resid<-NULL
if (plot==T) {
par(mfrow=c(1 ,2 ))
}
resid
}
lambda.brm = fitted(moths.brmP, scale='response', summary=FALSE )

simRes(lambda.brm, moths$A, family='poisson')
[1] 0.000 0.092 0.756 0.008 0.952 0.952 0.024 0.076 0.024 1.000 0.648 1.000 0.740 0.140 0.024 0.540
0.424 0.716 0.624 0.832 0.624 0.848 0.100 0.332 0.440 0.964 0.232 0.772 0.192 0.060 0.064 0.488 0.220
0.928 1.000 0.056 0.204 0.976 0.052 0.888
lambda.brm = fitted(moths.brmNB, scale='response', summary=FALSE )

size = rstan:::extract(moths.brmNB$fit, 'shape')[[1 ]]
simRes(lambda.brm, moths$A, family='negbin', size=size)
[1] 0.100 0.116 0.500 0.020 0.744 0.848 0.136 0.228 0.132 0.796 0.552 0.984 0.644 0.208 0.108 0.528
0.552 0.800 0.512 0.652 0.640 0.800 0.148 0.460 0.448 0.900 0.416 0.676 0.228 0.128 0.296 0.508 0.312
0.812 0.900 0.148 0.080 0.884 0.140 0.804
lambda.brm = fitted(moths.brmPO, scale='response', summary=FALSE )
simRes(lambda.brm, moths$A, family='poisson')
[1] 0.000 0.064 0.000 0.120 0.984 0.312 0.080 0.176 0.336 0.956 0.984 1.000 0.860 0.888 0.668 0.632
0.960 0.792 1.000 0.460 1.000 1.000 0.016 0.104 0.816 1.000 0.396 0.640 0.340 0.480 0.044 0.420 0.100
0.788 1.000 0.004 0.192 0.688 0.004 0.808
lambda.brm = fitted(moths.brmNBO, scale='response', summary=FALSE )

size = rstan:::extract(moths.brmNB$fit, 'shape')[[1 ]]
simRes(lambda.brm, moths$A, family='negbin', size=size)
[1] 0.008 0.048 0.000 0.012 0.856 0.208 0.048 0.392 0.300 0.264 0.784 0.972 0.248 0.588 0.112 0.460
0.864 0.676 0.356 0.188 0.884 0.916 0.004 0.172 0.780 0.992 0.300 0.492 0.180 0.444 0.064 0.372 0.196
0.424 0.988 0.024 0.008 0.472 0.012 0.340
Show code
## Poisson
Resid.brm <- residuals(moths.brmP, type='pearson', summary=FALSE )
lambda.brm = fitted(moths.brm, scale='response', summary=FALSE )
[1] 0.005333333
## Negative Binomial
Resid.brm <- residuals(moths.brmNB, type='pearson', summary=FALSE )
[1] 0.8346667
## Poisson with offset
Resid.brm <- residuals(moths.brmPO, type='pearson', summary=FALSE )
[1] 0
## Negative Binomial with offset
Resid.brm <- residuals(moths.brmNBO, type='pearson', summary=FALSE )
[1] 0.872
Show code
## Poisson
newdata <- data.frame(HABITAT=levels(moths$HABITAT), METERS=mean(moths$METERS))
tempdata = fitted(moths.brmP, scale='response',newdata=newdata)
tempdata
HABITAT METERS Mean Lower Upper
1 Disturbed 45.5 1.474522 0.5637094 3.031730
2 Lowerside 45.5 5.457930 3.5429458 7.981774
3 NEsoak 45.5 2.377828 1.2604467 3.745447
4 NWsoak 45.5 26.209140 19.6452112 34.361091
5 SEsoak 45.5 5.851585 4.0600126 8.137415
6 SWsoak 45.5 6.959453 4.2331902 10.367149
7 Upperside 45.5 4.761405 2.9405168 6.993508
library(ggplot2)
ggplot(moths, aes(y=A, x=HABITAT)) + geom_boxplot() +
##OR the manual way

Xmat <- model.matrix(~log(METERS) + HABITAT, data=newdata)
coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmP$fit)))
}))
tempdata = fitted(moths.brmNB, scale='response',newdata=newdata)
library(ggplot2)
##OR the manual way

coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmNB$fit)))

}))

tempdata = fitted(moths.brmPO, scale='response',newdata=newdata)
library(ggplot2)
##OR the manual way - dont forget to account for the offset
Xmat <- model.matrix(~HABITAT, data=newdata)
coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmPO$fit)))
fit = exp(coefs %*% t(Xmat) + log(mean(moths$METERS)))
}))

tempdata = fitted(moths.brmNBO, scale='response',newdata=newdata)
library(ggplot2)
##OR the manual way

Xmat <- model.matrix(~HABITAT, data=newdata)
coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmNBO$fit)))
fit = exp(coefs %*% t(Xmat) + log(mean(moths$METERS)))

}))
Show code
## Poisson
coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmP$fit)))
Resid <- residuals(moths.brmP, type='pearson',summary=FALSE )
Disp <- RSS/(nrow(moths)-ncol(coefs))
var1 2.42229 2.442048 2.273809 2.671013
Resid <- residuals(moths.brmNB, type='pearson',summary=FALSE )
var1 0.8497883 0.9132843 0.7213029 1.304696

coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmPO$fit)))
Resid <- residuals(moths.brmPO, type='pearson',summary=FALSE )
var1 8.512843 8.533245 8.35746 8.765346

coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmNBO$fit)))
Resid <- residuals(moths.brmNBO, type='pearson',summary=FALSE )
var1 0.7153655 0.9899389 0.6280123 1.46994
Q3.7. At this stage we will conclude that the offset models are poor fits. The Negative Binomial model appears to be a better
fit than the Poisson, the latter of which is overdispersed. Explore the parameter estimates for the Negative Binomial model.
Show code
summary(moths.brmNB)
Family: negbinomial (log)
Formula: A ~ log(METERS) + HABITAT
Data: moths (Number of observations: 40)
WAIC: Not computed
Fixed Effects:
Intercept -0.16 0.50 -1.17 0.76 651 1.01
logMETERS 0.14 0.15 -0.14 0.42 566 1.00
HABITATLowerside 1.37 0.52 0.38 2.32 556 1.00
HABITATNEsoak 0.51 0.66 -0.75 1.85 492 1.00
HABITATNWsoak 2.95 0.61 1.79 4.17 466 1.00
HABITATSEsoak 1.40 0.55 0.36 2.50 431 1.00
HABITATSWsoak 1.62 0.69 0.37 2.98 523 1.00
HABITATUpperside 1.18 0.81 -0.27 2.75 464 1.00
Family Specific Parameters:

shape 3.27 1.41 1.31 6.8 649 1

exp(coef(moths.brmNB))
mean
Intercept 0.8530651
logMETERS 1.1495789
HABITATLowerside 3.9311608
HABITATNEsoak 1.6688071
HABITATNWsoak 19.0893054
HABITATSEsoak 4.0741209
HABITATSWsoak 5.0406178
HABITATUpperside 3.2528957
coefs.brm <- as.matrix(as.data.frame(rstan:::extract(moths.brmNB$fit)))
})
1 b_Intercept 0.9640476 0.8667087 0.2455259 1.924904
2 b_logMETERS 1.1622245 1.1454692 0.8537851 1.493643
3 b_HABITATLowerside 4.4979287 4.0014549 1.0605590 8.794885
4 b_HABITATNEsoak 2.0883054 1.6512687 0.2406724 5.097136
5 b_HABITATNWsoak 23.0365909 19.1029295 2.6240569 52.580216
6 b_HABITATSEsoak 4.7417832 4.1066968 0.7076325 9.973949
7 b_HABITATSWsoak 6.4842791 4.9137160 0.8596906 15.789222
8 b_HABITATUpperside 4.5218465 3.1861182 0.3221937 12.337032
marginal_effects(moths.brmNB, effects='HABITAT')
Show code
lambda.brm = fitted(moths.brmNB, scale='response', summary=FALSE )
SSres.brm <- apply((-1 *(sweep(lambda.brm,2 , moths$A,'-')))^2 ,1 ,sum)
SSres.null <- sum((moths$A - mean(moths$A))^2 )
#OR
[1] 0.3798634
#OR manually
Xmat <- model.matrix(~log(METERS) + HABITAT, data=moths)

SSres.brm <- apply((-1 *(sweep(lambda.brm,2 ,moths$A,'-')))^2 ,1 ,sum)
SSres.null <- crossprod(moths$A - mean(moths$A))
[,1]
[1,] 0.3798634
## Alternatively..
R.var = apply(residuals(moths.brmNB, summary=FALSE ),1 ,var)
X.var = apply(fitted(moths.brmNB, summary=FALSE ),1 ,var)
Median lower upper
var1 0.6292283 0.3216714 0.7238538
Q3.9. Although we have now established that there is a statistical difference between the group means, we do not yet know
which group(s) are different from which other(s). To address this we can explore all pairwise contrasts from the posterior.
Note, careful when back transforming from log. Recall that log laws indicate that $log(a) - log(b) = log(a/b)$.
Show code
coefs <-as.matrix(as.data.frame(rstan:::extract(moths.brmNB$fit)))
library(multcomp)
tuk.mat <- contrMat(n=table(newdata$HABITAT), type="Tukey")
Xmat <- model.matrix(~log(METERS)+HABITAT, data=newdata)
Xmat <- tuk.mat %*% Xmat
library(coda)
}
X1 Median X0. X25. X50. X75. X100. lower
upper
1 Lowerside - Disturbed 4.0014549 0.75058230 2.6927014 4.0014549 5.7029870 34.0094273 1.06055902
8.7948850
2 NEsoak - Disturbed 1.6512687 0.17191247 1.0657961 1.6512687 2.5321403 15.0458543 0.24067244
5.0971364
3 NWsoak - Disturbed 19.1029295 2.62405689 12.8337712 19.1029295 27.8190728 183.3046958 2.62405689
52.5802164
4 SEsoak - Disturbed 4.1066968 0.59830817 2.8466216 4.1066968 5.7985525 32.5714541 0.70763248
9.9739487
5 SWsoak - Disturbed 4.9137160 0.33815372 3.1774973 4.9137160 8.0255117 55.9231149 0.85969057
15.7892225
6 Upperside - Disturbed 3.1861182 0.32219367 1.8454384 3.1861182 5.6032516 36.7013276 0.32219367
12.3370325
7 NEsoak - Lowerside 0.4302645 0.09341336 0.3011472 0.4302645 0.5996800 1.7768091 0.12722450
0.9474301
8 NWsoak - Lowerside 4.7596871 1.40687834 3.5717100 4.7596871 6.6442824 22.2970521 1.40687834
10.2087762
9 SEsoak - Lowerside 1.0230626 0.30087781 0.8127477 1.0230626 1.3137698 3.8331690 0.49229150
2.0169133
10 SWsoak - Lowerside 1.2891018 0.16286740 0.8997906 1.2891018 1.8098058 7.7462736 0.16286740
3.0991319
11 Upperside - Lowerside 0.8160707 0.15518045 0.5505156 0.8160707 1.2557586 5.5328028 0.15518045
2.2845561
12 NWsoak - NEsoak 11.3275794 2.38413610 7.9713011 11.3275794 16.4057151 74.0606945 3.52147567
27.5885849
13 SEsoak - NEsoak 2.4364756 0.49206697 1.7481048 2.4364756 3.4164552 8.6936643 0.70461238
5.4812849
14 SWsoak - NEsoak 2.9909563 0.49728521 2.0435734 2.9909563 4.3208633 22.0843886 0.64089541
7.6826544
15 Upperside - NEsoak 1.9914847 0.17379858 1.4008755 1.9914847 2.7697335 13.3831349 0.39992138
4.8920306
16 SEsoak - NWsoak 0.2155425 0.05343626 0.1554463 0.2155425 0.2892431 0.9090968 0.08251825
0.4751569
17 SWsoak - NWsoak 0.2691229 0.04348677 0.1833171 0.2691229 0.3791270 1.4805610 0.06255997
0.6595744
18 Upperside - NWsoak 0.1718971 0.02995233 0.1149770 0.1718971 0.2468979 0.9966972 0.02995233
0.4584054
19 SWsoak - SEsoak 1.2630026 0.14038551 0.8964932 1.2630026 1.7209516 8.8383401 0.27528176
2.8265154
20 Upperside - SEsoak 0.7956164 0.13375965 0.5553199 0.7956164 1.1470159 5.5829958 0.13375965
2.0120800
21 Upperside - SWsoak 0.6492371 0.10376765 0.4309988 0.6492371 0.9598559 5.3098336 0.11219507
1.7852750
## Due to log laws, these effects are multipliers rather than differences
## If you want to express the outcomes in terms of differences then it is necessary to

## estimate the cell posteriors and subtract those - this cannot easily be done Frequentist!
library(multcomp)
tuk.mat <- contrMat(n=table(newdata$HABITAT), type="Tukey")
fit <- fit %*% t(tuk.mat)

X1 Median X0. X25. X50. X75. X100.
lower upper
1 Lowerside - Disturbed 4.0689569 -2.202499 3.0265807 4.0689569 5.3089091 14.57838684
1.2409552 8.1448004
2 NEsoak - Disturbed 0.9017660 -7.312479 0.1340833 0.9017660 1.6467105 6.94889870
-1.8583399 3.6882766
3 NWsoak - Disturbed 25.4062806 9.963564 20.2980553 25.4062806 32.7550632 85.26925867
10.6408161 51.6049801
4 SEsoak - Disturbed 4.2722895 -2.581770 3.3043011 4.2722895 5.4839030 14.10709025
1.0868378 8.2307350
5 SWsoak - Disturbed 5.7266432 -3.546058 3.8621775 5.7266432 8.1649238 31.40750951
0.2218100 13.8002744
6 Upperside - Disturbed 3.0285388 -5.831655 1.6724786 3.0285388 4.9985728 16.28502930
-0.8236566 9.9717337
7 NEsoak - Lowerside -3.1348894 -14.535877 -4.6852567 -3.1348894 -1.9589919 3.03830738
-8.6001836 0.6197858
8 NWsoak - Lowerside 21.3923404 3.912020 15.6498096 21.3923404 28.6200122 78.99186890
4.2933904 46.4128749
9 SEsoak - Lowerside 0.1404985 -10.703890 -1.1670427 0.1404985 1.5726694 11.06519437
-5.7490841 4.1488093
10 SWsoak - Lowerside 1.7227120 -10.186649 -0.5544462 1.7227120 3.9412710 27.85100117
-5.1579731 12.1380466
11 Upperside - Lowerside -1.0124225 -12.908841 -3.0498762 -1.0124225 1.2095389 12.87205248
-7.8457054 6.6083564
12 NWsoak - NEsoak 24.7864257 7.193430 19.1047828 24.7864257 32.1681075 85.29390321
7.1934299 50.4455294
13 SEsoak - NEsoak 3.3941761 -3.446630 2.1103394 3.3941761 4.7262440 12.75358000
-0.2609212 8.8197256
14 SWsoak - NEsoak 4.6908240 -2.163572 2.9268259 4.6908240 7.1472431 30.53130201
-0.3405123 14.5425090
15 Upperside - NEsoak 2.3346951 -9.374825 1.0356404 2.3346951 3.7770028 15.27627974
-2.4397183 7.7411171
16 SEsoak - NWsoak -20.9607257 -77.798077 -28.8479819 -20.9607257 -15.6463026 -1.08124784
-47.7083030 -6.1904502
17 SWsoak - NWsoak -19.5044667 -81.107760 -26.6646773 -19.5044667 -13.4849094 9.43627683
-47.2339163 -0.3334677
18 Upperside - NWsoak -22.5288701 -82.524769 -29.9152465 -22.5288701 -16.3370635 -0.03711989
-50.4070421 -4.8827190
19 SWsoak - SEsoak 1.5127864 -11.093903 -0.7095885 1.5127864 3.9371915 28.36110107
-5.4554441 10.4675256
20 Upperside - SEsoak -1.1427166 -12.535355 -3.0425532 -1.1427166 0.7945388 14.15899246
-6.9809698 6.0547850
21 Upperside - SWsoak -2.3541702 -23.471257 -5.2799897 -2.3541702 -0.2025978 12.69251435

-12.3680295 5.9271043
Show code
coefs <-as.matrix(as.data.frame(rstan:::extract(moths.brmNB$fit)))

}))
ggplot(moths, aes(y=A, x=HABITAT)) +

scale_y_continuous('Number of Moths') +
scale_x_discrete('HABITAT') +
theme_classic() +
Example 4: Poisson t-test with zero-inflation

Yet again our rather fixated marine ecologist has returned to the rocky shore with an interest in examining the effects of wave
exposure on the abundance of yet another intertidal marine limpet Patelloida latistrigata on rocky intertidal shores. It would
appear that either this ecologist is lazy, is satisfied with the methodology or else suffers from some superstition disorder (that
prevents them from deviating too far from a series of tasks), and thus, yet again, a single quadrat was placed on 10 exposed
(to large waves) shores and 10 sheltered shores. From each quadrat, the number of scaley limpets (Patelloida latistrigata) were
counted. Initially, faculty were mildly interested in the research of this ecologist (although lets be honest, most academics
usually only display interest in the work of the other members of their school out of politeness - Oh I did not say that did I?).
Nevertheless, after years of attending seminars by this ecologist in which the only difference is the target species, faculty have
started to display more disturbing levels of animosity towards our ecologist. In fact, only last week, the members of the
school's internal ARC review panel (when presented with yet another wave exposure proposal) were rumored to take great
pleasure in concocting up numerous bogus applications involving hedgehogs and balloons just so they could rank our ecologist
proposal outside of the top 50 prospects and ... Actually, I may just have digressed!
Download LimpetsScaley data set
Format of limpetsScaley.csv data files
Count Shore
4 sheltered
1 sheltered
2 sheltered
0 sheltered
4 sheltered
... ...
Shore Categorical description of the shore type (sheltered or exposed) -

Predictor variable
Count Number of limpets per quadrat - Response variable
Open the limpetsSmooth data set.
Show code
limpets <- read.table('../downloads/data/limpetsScaley.csv', header=T, sep=',', strip.white=T)
limpets
Count Shore
1 2 sheltered
2 1 sheltered
3 3 sheltered
4 1 sheltered
5 0 sheltered
6 0 sheltered
7 1 sheltered
8 0 sheltered
9 2 sheltered
10 1 sheltered
11 4 exposed
12 9 exposed
13 3 exposed
14 1 exposed
15 3 exposed
16 0 exposed
17 0 exposed
18 7 exposed
19 8 exposed
20 5 exposed
Q4.1. As part of the routine exploratory data analysis:

a. Explore the distribution of counts from each population
Show code
boxplot(Count~Shore,data=limpets)
rug(jitter(limpets$Count), side=2 )
b. Compare the expected number of zeros to the actual number of zeros to determine whether the model might be zero
inflated.
c. Calculate the proportion of zeros in the data
Show code
limpets.tab<-table(limpets$Count==0 )
limpets.tab/sum(limpets.tab)
FALSE TRUE
0.75 0.25
d. Now work out the proportion of zeros we would expect from a Poisson distribution with a lambda equal to the mean
count of limpets in this study.
Show code
limpets.mu <- mean(limpets$Count)
cnts <-rpois(1000 ,limpets.mu)
limpets.tab<-table(cnts==0 )
limpets.tab/sum(limpets.tab)
FALSE TRUE
0.91 0.09
Clearly there are substantially more zeros than expected so the data are likely to be zero inflated.
Q4.2. Lets explore a zero-inflated Poisson (ZIP) model. $$ p(y_i|\theta,\lambda) = \left\{ \begin{array}{l l} \theta + (1-
\theta)\times \text{Pois}(0|\lambda) & \quad \text{if $y_i=0$ and}\\ (1-\theta)\times \text{Pois}(y_i|\lambda) & \quad
\text{if $y_i>0$} \end{array} \right.\\ \begin{align} log(\lambda_i)&=\eta_i & (\text{link function})\\
\eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week
Show code
limpets.brm <- brm(Count~0 +spec+main+main:Shore, family='zero_inflated_poisson', data=limpets,

#


#

#

a. Trace plots
Show code
library(gridExtra)
grid.arrange(stan_trace(limpets.brm$fit, ncol=1 ),
stan_dens(limpets.brm$fit, separate_chains=TRUE ,ncol=1 ),
ncol=2 )
b. Autocorrelation
Show code
stan_ac(limpets.brm$fit)
Show code
summary(do.call(rbind, args = get_sampler_params(limpets.brm$fit, inc_warmup = FALSE )), digits = 2 )
Min. :0.00 Min. :0.22 Min. :1.0 Min. : 1.0 Min. :0.00
1st Qu.:0.80 1st Qu.:0.22 1st Qu.:3.0 1st Qu.: 4.0 1st Qu.:0.00
Median :0.93 Median :0.22 Median :3.0 Median : 7.0 Median :1.00
Mean :0.84 Mean :0.28 Mean :3.1 Mean : 7.8 Mean :0.51
3rd Qu.:1.00 3rd Qu.:0.40 3rd Qu.:4.0 3rd Qu.:11.0 3rd Qu.:1.00
Max. :1.00 Max. :0.40 Max. :5.0 Max. :24.0 Max. :1.00
stan_diag(limpets.brm$fit)
stan_diag(limpets.brm$fit, information = "stepsize")
stan_diag(limpets.brm$fit, information = "treedepth")
stan_diag(limpets.brm$fit, information = "divergence")
library(gridExtra)
grid.arrange(stan_rhat(limpets.brm$fit) + theme_classic(8 ),
stan_ess(limpets.brm$fit) + theme_classic(8 ),
stan_mcse(limpets.brm$fit) + theme_classic(8 ),
ncol = 2 )

Show code
lambda.brm = exp(fitted(limpets.brm, scale='linear', summary=FALSE ))[,1 :nrow(limpets)]
theta = binomial()$linkinv(mean(rstan:::extract(limpets.brm$fit, 'b_spec')[[1 ]]))
simRes(lambda.brm, limpets$Count, family='zip', theta=theta)
[1] 0.660 0.484 0.944 0.464 0.288 0.196 0.736 0.312 0.768 0.636 0.592 0.996 0.376 0.056 0.420 0.008
0.000 0.884 0.944 0.708
Show code
## Poisson
Resid.brm <- residuals(limpets.brm, type='pearson', summary=FALSE )
[1] 0.07466667
Show code
tempdata = fitted(limpets.brm, scale='response',newdata=newdata)
library(ggplot2)
##OR the manual way

coefs <- coefs[,grep('^b_main', colnames(coefs))]
}))
Show code
Resid <- residuals(limpets.brm, type='pearson',summary=FALSE )
Disp <- RSS/(nrow(limpets)-ncol(coefs))
var1 1.750797 1.789582 1.686107 2.004651
Q4.5. Given that the model seems to be well specified and there are no diagnostics that would question the reliability of the
model, lets explore the model parameters.
Show code
summary(limpets.brm)
Family: zero_inflated_poisson (log)
Formula: Count ~ 0 + spec + main + main:Shore
Data: limpets (Number of observations: 20)
WAIC: Not computed
Fixed Effects:
spec -787.73 602.86 -2246.33 -35.06 459 1
main 1.37 0.16 1.03 1.68 520 1
main:Shoresheltered -1.33 0.34 -2.00 -0.70 590 1

exp(coef(limpets.brm))
mean
spec 0.0000000
main 3.9513940
main:Shoresheltered 0.2637503
plyr:::adply(exp(coefs), 2 , function(x) {
})
1 b_spec 5.746643e-08 3.422828e-287 0.0000000 1.358137e-28
2 b_main 4.000980e+00 3.973522e+00 2.8246056 5.369929e+00
3 b_main.Shoresheltered 2.787194e-01 2.671325e-01 0.1166521 4.613737e-01
marginal_effects(limpets.brm)
Show code
SSres.brm <- apply((-1 *(sweep(lambda.brm,2 , limpets$Count,'-')))^2 ,1 ,sum)
SSres.null <- sum((limpets$Count - mean(limpets$Count))^2 )
#OR
[1] 0.2554819
#OR manually
Xmat <- model.matrix(~Shore, data=limpets)

SSres.null <- crossprod(limpets$Count - mean(limpets$Count))
[,1]
[1,] 0.2554819
## Alternatively..
R.var = apply(residuals(limpets.brm, summary=FALSE ),1 ,var)
X.var = apply(fitted(limpets.brm, summary=FALSE ),1 ,var)

Median lower upper
var1 0.2907571 0.1057379 0.4574874
Show code
coefs <-as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))

}))
ggplot(limpets, aes(y=Count, x=Shore)) +

scale_y_continuous('Abundance of limpets') +
scale_x_discrete('Shore type') +
theme_classic() +
© flutterbys.com.au, 2010 - 2015 Design by Murray Logan

Tutorial 106b - Poisson Regression and Log-Linear Models (Bayesian)

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Tutorial 106b - Poisson Regression and Log-Linear Models (Bayesian)

Uploaded by

Copyright:

Available Formats

Tutorial 10.

6b - Poisson regression and log-linear models (Bayesian)

Home Tutorials and Workshops Downloads Links

Tutorial 10.6b - Poisson regression and log-

1.1: Summary of important equations

Poisson Negative Binomial (p,r)

Density $f(y|\lambda)=\frac{\lambda^{y}e^{- $f(y|r,p)=\frac{\Gamma(y+r)}

Expected $E(Y)=\mu=\lambda$ $E(Y)=\mu=\frac{r(1-p)}{p}$

Variance $var(Y)=\lambda$ $var(Y)=\frac{r(1-p)}{p^2}$

log- $\mathcal{LL}(\lambda\mid y) = $\begin{align}\mathcal{LL}(r,p\mid y) =

Negative Binomial (a,b)

Density $f(y \mid a,b) = \frac{b^{a}y^{b-1}e^{-b y}}{\Gamma(a)}$

log- $\mathcal{LL}(\lambda;y)=\sum\limits^n_{i=1} log\Gamma(y_i + a) - log\Gamma(y_i+1) -

log- $\mathcal{LL}(\theta,\lambda\mid y) = \left\{ \begin{array}{l l} \sum\limits^n_{i=1} (1-\theta)\times

Residuals $\varepsilon_i = y_i - \hat{y}$

Pearson's Residuals $\frac{y_i - \hat{y}}{\sqrt{var(y)}}$

Residual Sums of Squares $RSS = \sum \varepsilon_i^2$

Dispersion statistic $\phi=\frac{RSS}{df}$

$R^2$ $R^2 = 1-\frac{RSS_{model}}{RSS_{null}}$

McFadden's $R^2$ $R_{McF}^2 = 1 - \frac{\mathcal{LL}_{model}}{\mathcal{LL}_{null}}$

Deviance $dev = -2*\mathcal{LL}$

DIC $DIC = -pD$

AIC $AIC = min(dev+(2pD))$

Section 2: Poisson regression

Likelihood: $$ \begin{align} \mathcal{L}(\lambda\mid y) &= \prod\limits_{i=1}^n \dfrac{\lambda^{y_i}e^{-\lambda}}

2.1: Scenario and Data

Details of data generation

Random data incorporating the following trends (effect parameters)

the sample size = 20

scale mapped by (0,$\infty$) by using the log function $e^{linear~predictor}$

2.2: Exploratory data analysis and initial assumption checking

The assumptions are:

2.2.1: Confirm non-normality and explore clumping

2.2.2: Confirm linearity

the response ($y$) and the predictor ($x$)

#now for the scatterplot

2.2.3: Explore zero inflation

#proportion of 0's in the data

#proportion of 0's expected from a Poisson distribution

2.3: Model fitting or statistical analysis

Xmat <- model.matrix(~x, dat)

For each parameter, n.eff is a crude measure of effective sample size,

DIC info (using the rule, pD = var(deviance)/2)

Xmat <- model.matrix(~x, dat)

For each parameter, n.eff is a crude measure of effective sample size,

DIC info (using the rule, pD = var(deviance)/2)

Chain 1, Iteration: 1 / 2000 [ 0%] (Warmup)

SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 2).

Chain 2, Iteration: 1 / 2000 [ 0%] (Warmup)

SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 3).

Chain 3, Iteration: 1 / 2000 [ 0%] (Warmup)

2.4: Chain mixing and Model validation

View code associated with BRMS model

The chains appear well mixed and stable

3 chains, each with 1e+05 iterations (first 20000 discarded), n.thin = 50

For each parameter, n.eff is a crude measure of effective sample size,

DIC info (using the rule, pD = var(deviance)/2)

View code associated with BRMS model

Explore the step size characteristics (STAN only)

summary(do.call(rbind, args = get_sampler_params(dat.brm$fit, inc_warmup = FALSE )), digits = 2 )

Min. :0.24 Min. :0.64 Min. :1.0 Min. :1.0 Min. :0

stan_diag(dat.brm$fit, information = "stepsize")