
02418 Week 3, solution

Exercise 4.1

In this exercise we study the number of boys in families of size 2 and 6.


Using the data

n2 <- c(42860,89213,47819)
n6 <- c(1096,6233,15700,22221,17332,7908,1579)
(N2 <- sum(n2))

## [1] 179892

(N6 <- sum(n6))

## [1] 72069

i.e. there are, for example, 42860 families of size 2 with 0 boys.

a)

Under the assumption of a constant probability of observing boys and independence, the probability mass function is given by

$$p_k = \binom{n}{k}\theta^k(1-\theta)^{n-k} \qquad (1)$$

where $k$ is the number of observed boys in the family (e.g. 0, 1, or 2) and $n$ is the size of the family (e.g. 2). The likelihood contribution from the $n_k$ families with $k$ boys is proportional to

$$p_k^{n_k} \qquad (2)$$

and hence the likelihood in each of the two cases is given by

$$L(\theta) = \prod_{k=0}^{n} p_k(\theta)^{n_k} \qquad (3)$$

Hence the log-likelihood is

$$l(\theta) = \sum_{k=0}^{n} n_k \log(p_k(\theta)) \qquad (4)$$

$$= \sum_{k=0}^{n} n_k \left(k\log(\theta) + (n-k)\log(1-\theta)\right) + \tilde{c}(n_k, n, k) \qquad (5)$$

where $n$ is the number of children in the family and $n_k$ is the number of families with $k$ boys.
The derivative is

$$\frac{\partial l}{\partial\theta} = \sum_{k=0}^{n} n_k \left(\frac{k}{\theta} - \frac{n-k}{1-\theta}\right) \qquad (6)$$

Setting the derivative equal to zero we get

$$\frac{1}{\theta(1-\theta)} \sum_{k=0}^{n} n_k \left(k(1-\theta) - (n-k)\theta\right) = 0 \qquad (7)$$

or

$$0 = \sum_{k=0}^{n} n_k(k - n\theta) = \sum_{k=0}^{n} n_k k - n\theta \sum_{k=0}^{n} n_k \qquad (8)$$

$$= \sum_{k=0}^{n} n_k k - n\theta N \qquad (9)$$

and hence the MLE is

$$\hat{\theta} = \frac{\sum_{k=0}^{n} n_k k}{nN} \qquad (10)$$

where $N$ is the total number of families of the specific size (2 or 6).
Hence the MLE in each case can be calculated as

(theta1 <- sum(n2 * c(0:2)) / (N2 * 2))

## [1] 0.5137833

(theta2 <- sum(n6 * c(0:6)) / (N6 * 6))

## [1] 0.5148723

We can also formulate the likelihood function as an R-function

## Negative log-likelihood (up to an additive constant)
nll <- function(theta, nk){
  n <- length(nk) - 1  ## family size
  -sum(nk * ((0:n) * log(theta) + (n:0) * log(1 - theta)))
}

## Optimize
optimize(nll,c(0.01,0.99),nk=n2)$minimum ## Family size = 2

## [1] 0.5137949

optimize(nll,c(0.01,0.99),nk=n6)$minimum ## Family size = 6

## [1] 0.5148818

which gives (almost) the same numbers.
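The residual discrepancy is due to the convergence tolerance of optimize; a quick check (not part of the original solution) with a tighter tolerance recovers the closed-form values to more digits:

## Tighter tolerance brings the numerical optimum closer to theta1 and theta2
optimize(nll, c(0.01, 0.99), nk = n2, tol = 1e-10)$minimum
optimize(nll, c(0.01, 0.99), nk = n6, tol = 1e-10)$minimum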

b)

The joint log-likelihood is simply the sum of the log-likelihoods, i.e.

$$l(\theta) = l_2(\theta) + l_6(\theta) \qquad (11)$$

and we can find the derivative as

$$\frac{\partial l}{\partial\theta} = \sum_{k=0}^{n} n_k \left(\frac{k}{\theta} - \frac{n-k}{1-\theta}\right) + \sum_{k=0}^{m} m_k \left(\frac{k}{\theta} - \frac{m-k}{1-\theta}\right) \qquad (12)$$

where $n$ and $m$ are used for families of size 2 and 6 respectively. Setting the derivative equal to zero we get

$$0 = \sum_{k=0}^{n} n_k(k - n\theta) + \sum_{k=0}^{m} m_k(k - m\theta) \qquad (13)$$

$$= \left(\sum_{k=0}^{n} n_k k - nN_2\theta\right) + \left(\sum_{k=0}^{m} m_k k - mN_6\theta\right) \qquad (14)$$

or

$$\hat{\theta} = \frac{\sum_{k=0}^{n} n_k k + \sum_{k=0}^{m} m_k k}{nN_2 + mN_6} \qquad (15)$$

which gives the estimate

(sum(n2 * (0:2)) + sum(n6 * (0:6))) / (2 * N2 + 6 * N6)

## [1] 0.5143777

We could also implement this in R:

## Joint negative log-likelihood
jnll <- function(theta, n1, n2){
  nll(theta, n1) + nll(theta, n2)
}

## MLE
opt <- optimize(jnll, c(0.01,0.99), n1 = n2, n2 = n6)
(theta.hat <- opt$minimum)

## [1] 0.5143882

Hence there is a small difference due to the finite tolerance of the numerical optimizer.


We could calculate the formula for the information, but here we will just
find it by numerical differentiation

library(numDeriv)

H <- hessian(jnll, opt$minimum, n1 = n2, n2 = n6) ## observed information (Hessian of the negative log-likelihood)

(sd.theta <- as.numeric(sqrt(1/H))) ## standard error

## [1] 0.00056153
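From the standard error a Wald 95% confidence interval follows directly; a small addition, using only quantities computed above:

## Wald 95% CI for theta
theta.hat + qnorm(0.975) * sd.theta * c(-1, 1) ## approximately (0.5133, 0.5155)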

We can now plot the likelihood

theta <- seq(theta.hat- 4 * sd.theta, theta.hat + 4 * sd.theta, length=100)


jointnll <- sapply(theta, jnll, n1 = n2, n2 = n6)

plot(theta,exp(-jointnll+opt$objective),type="l")
## Compare with estimates based on the two datasets
lines(theta1 * c(1,1), c(0,1), col = 2)
lines(theta2 * c(1,1), c(0,1), col = 2)
## Cut-off defining the profile likelihood CI
lines(range(theta),exp(-qchisq(0.95,df=1)/2)*c(1,1),lty=2,col=2)
## Compare with the Wald CI
rug(theta.hat+1.96*sd.theta*c(-1,1),col=3,lwd=2)

[Figure: the normalized joint likelihood exp(-jointnll + opt$objective) plotted against theta; red vertical lines mark the estimates from the two separate datasets, the dashed red line marks the 95% profile likelihood cut-off, and the green rug marks the Wald CI.]
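As a sketch (not part of the original code), the quadratic (Wald) approximation of the normalized likelihood can be overlaid on the plot above for comparison:

## Overlay the normal approximation exp(-(theta - theta.hat)^2 / (2 se^2))
lines(theta, exp(-(theta - theta.hat)^2 / (2 * sd.theta^2)), lty = 3, col = 4)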

c)

In order to answer the question we need a goodness-of-fit test. We calculate the
expected number of observations under the model

e2 <- dbinom(0:2,size = 2, prob = theta.hat) * N2


e6 <- dbinom(0:6,size = 6, prob = theta.hat) * N6

and find the normalized residuals

r2 <- (n2 - e2)/sqrt(e2)

r6 <- (n6 - e6)/sqrt(e6)

The goodness-of-fit statistic is the sum of squared residuals

chisq.stat <- sum(r2^2) + sum(r6^2)
chisq.stat

## [1] 123.633

which should be compared with a $\chi^2$ distribution with degrees of freedom
equal to the number of cells (10) minus the number of estimated parameters (1),
i.e. we can calculate the p-value by

pchisq(chisq.stat,df=10-1,lower.tail=FALSE)

## [1] 2.407376e-22

hence this is significant at any reasonable level.


In order to examine where the problem occurs we can make tables like the
one given on page 76 of the textbook

round(rbind(n2,e2,r2),digits=1)

## [,1] [,2] [,3]
## n2 42860.0 89213.0 47819.0
## e2 42421.9 89871.5 47598.6
## r2 2.1 -2.2 1.0

round(rbind(n6,e6,r6),digits=1)

## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## n6 1096.0 6233.0 15700.0 22221.0 17332.0 7908.0 1579.0
## e6 945.1 6006.7 15906.7 22465.7 17847.7 7562.1 1335.0
## r6 4.9 2.9 -1.6 -1.6 -3.9 4.0 6.7

We see that there are too many observations in the tails of the distribution
(in particular for families of size 6). One explanation could be that the
probability of a boy is not constant between families.
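A quick way to see this heterogeneity is to compare the empirical variance of the number of boys per family with the binomial variance $n\theta(1-\theta)$; the sketch below (not part of the original solution) does this for families of size 6, using the joint estimate theta.hat:

## Overdispersion check for families of size 6
k <- 0:6
mean.k <- sum(n6 * k) / N6
var.k <- sum(n6 * (k - mean.k)^2) / N6
c(empirical = var.k, binomial = 6 * theta.hat * (1 - theta.hat))
## the empirical variance (about 1.57) exceeds the binomial one (about 1.50),
## consistent with a probability that varies between families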

Exercise 4.12

Following the arguments above we can implement the negative log-likelihood as

## Negative log-likelihood
nll <- function(theta, nk){
  k <- 0:(length(nk) - 1)
  -sum(nk * dpois(k, lambda = theta, log = TRUE))
}

## Estimate
nk <- c(109,65,22,3,1,0)
opt <- optimize(nll,c(0,100),nk=nk)
opt$minimum

## [1] 0.6099907
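As a check, the Poisson MLE has the closed form $\hat{\lambda} = \sum_k k\, n_k / \sum_k n_k$ (the sample mean), which agrees with the numerical optimum:

## Closed-form MLE: the sample mean
sum((0:5) * nk) / sum(nk)

## [1] 0.61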

## Expected number
e <- c(dpois(0:3,lambda = opt$minimum), 1 - ppois(3,lambda = opt$minimum)) *
sum(nk)
e

## [1] 108.6711840 66.2884121 20.2176576 4.1108611 0.7118853

## Collapse the last two cells (expected counts should not be much below 5)
e <- c(e[1:3],sum(e[4:5]))
obs <- c(nk[1:3], sum(nk[4:6])) ## the empty count-5 cell goes in the tail too

## chi.sq statistics
sum((e-obs)^2/e)

## [1] 0.3235224

## Compare with a chi-square distribution with 3 df (4 cells - 1 parameter),
## i.e. we cannot reject that the data follow a Poisson distribution
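For completeness (not in the original), the corresponding p-value:

pchisq(sum((e - obs)^2/e), df = 3, lower.tail = FALSE) ## approximately 0.96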

Exercise 4.26

We have

$$X_i \sim N(\mu, \sigma_x^2) \qquad (16)$$

$$Y_i \sim N(\mu + \delta, \sigma_y^2) \qquad (17)$$

We can read the data and make a simple t-test by

y <- as.vector(t(read.table("../IALscript/LKPACK/michel.dat")))
x <- y[1:20]
y <- y[-c(1:20)]
t.test(x,y,var.equal=TRUE)

##
## Two Sample t-test
##
## data: x and y
## t = 3.8197, df = 98, p-value = 0.0002344
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 33.99336 107.50664
## sample estimates:
## mean of x mean of y
## 909.00 838.25

a)

Compute the profile likelihood for $\delta$ assuming $\sigma_x^2 = \sigma_y^2 = \sigma^2$. We have $\theta = [\mu, \delta, \sigma^2]$ and

$$l(\theta) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{m}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (19)$$

$$= -\frac{n+m}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (20)$$

i.e.

$$\frac{\partial l(\theta)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) + \frac{1}{\sigma^2}\sum_{i=1}^{m}(y_i-\mu-\delta) \qquad (22)$$

$$= \frac{1}{\sigma^2}\left(n(\bar{x}-\mu) + m\bar{y} - m(\mu+\delta)\right) \qquad (23)$$

$$= \frac{1}{\sigma^2}\left(n\bar{x} - \mu(n+m) + m\bar{y} - m\delta\right) = 0 \qquad (24)$$

giving

$$\hat{\mu}(\delta) = \frac{n\bar{x} + m\bar{y} - m\delta}{n+m} \qquad (25)$$

Solving for $\sigma^2$ gives

$$\frac{\partial l(\theta)}{\partial\sigma^2} = -\frac{n+m}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 + \frac{1}{2\sigma^4}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (26)$$

$$= \frac{1}{2\sigma^2}\left(-(n+m) + \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 + \frac{1}{\sigma^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2\right) \qquad (27)$$

giving

$$\hat{\sigma}^2(\delta,\mu) = \frac{\sum_{i=1}^{n}(x_i-\mu)^2 + \sum_{i=1}^{m}(y_i-\mu-\delta)^2}{n+m} \qquad (28)$$

and

$$\hat{\sigma}^2(\delta) = \frac{\sum_{i=1}^{n}(x_i-\hat{\mu}(\delta))^2 + \sum_{i=1}^{m}(y_i-\hat{\mu}(\delta)-\delta)^2}{n+m} \qquad (29)$$

and we can write the profile likelihood as

$$l_P(\delta) = l([\delta, \hat{\mu}(\delta), \hat{\sigma}^2(\delta)]) \qquad (30)$$
The above is implemented and plotted below

lp.delta <- function(delta, x, y){
  n <- length(x)
  m <- length(y)
  mu.hat <- (n * mean(x) + m * (mean(y) - delta)) / (m + n)
  sigmasq.hat <- (sum((x - mu.hat)^2) +
                  sum((y - mu.hat - delta)^2)) / (m + n)
  sum(dnorm(x, mean = mu.hat, sd = sqrt(sigmasq.hat), log = TRUE)) +
    sum(dnorm(y, mean = mu.hat + delta, sd = sqrt(sigmasq.hat), log = TRUE))
}

delta <- seq(-140, 0, by = 0.1)

lp.a <- sapply(delta, lp.delta, x = x, y = y)
lp.a <- lp.a - max(lp.a) ## normalise

## plot
par(mfrow=c(1,2))
plot(delta,lp.a,type="l")
lines(range(delta),-qchisq(0.95,df=1)/2*c(1,1),col=2,lwd=2,lty=2)
plot(delta,exp(lp.a),type="l")
lines(range(delta),exp(-qchisq(0.95,df=1)/2)*c(1,1),col=2,lwd=2,lty=2)
## Compare with the t-test
rug(-as.numeric(t.test(x,y,var.equal=TRUE)$conf.int),lwd=2)

[Figure: left panel, the normalized log-profile likelihood lp.a against delta with the 95% cut-off (dashed red); right panel, exp(lp.a) against delta with the same cut-off and the equal-variance t-test CI shown as a rug.]
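The endpoints of the 95% profile likelihood interval can also be found numerically; the sketch below (not part of the original solution) uses uniroot on the log-profile, splitting the search interval at the MLE $\hat{\delta} = \bar{y} - \bar{x}$:

## Numerical profile likelihood CI for delta
d.hat <- mean(y) - mean(x) ## MLE of delta under the equal-variance model
lmax <- lp.delta(d.hat, x = x, y = y)
f <- function(d) lp.delta(d, x = x, y = y) - lmax + qchisq(0.95, df = 1)/2
c(uniroot(f, c(-140, d.hat))$root, uniroot(f, c(d.hat, 0))$root)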

b)

Compute the profile likelihood for $\delta$ without further assumptions. Now

$$l(\theta) = -\frac{n}{2}\log(\sigma_x^2) - \frac{1}{2\sigma_x^2}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{m}{2}\log(\sigma_y^2) - \frac{1}{2\sigma_y^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (31)$$

hence

$$\frac{\partial l(\theta)}{\partial\mu} = \frac{1}{\sigma_x^2}\sum_{i=1}^{n}(x_i-\mu) + \frac{1}{\sigma_y^2}\sum_{i=1}^{m}(y_i-\mu-\delta) \qquad (32)$$

$$= \frac{1}{\sigma_x^2}(n\bar{x} - n\mu) + \frac{1}{\sigma_y^2}(m\bar{y} - m\mu - m\delta) \qquad (33)$$

$$= -\mu\left(\frac{n}{\sigma_x^2} + \frac{m}{\sigma_y^2}\right) + \frac{n}{\sigma_x^2}\bar{x} + \frac{m}{\sigma_y^2}(\bar{y}-\delta) \qquad (34)$$

and we have

$$\hat{\mu}(\delta) = \frac{\frac{n}{\sigma_x^2}\bar{x} + \frac{m}{\sigma_y^2}(\bar{y}-\delta)}{\frac{n}{\sigma_x^2} + \frac{m}{\sigma_y^2}} \qquad (35)$$

$$= \frac{n\sigma_y^2\bar{x} + m\sigma_x^2(\bar{y}-\delta)}{n\sigma_y^2 + m\sigma_x^2} \qquad (36)$$

Hence the solution depends on the parameters $\sigma_x^2$ and $\sigma_y^2$, and we have to rely on a numerical solution. The variance parameters can however easily be written as functions of $\mu$ and $\delta$:

$$\hat{\sigma}_x^2(\mu,\delta) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 \qquad (37)$$

$$\hat{\sigma}_y^2(\mu,\delta) = \frac{1}{m}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (38)$$

and the profile likelihood can be written as

$$l_p(\delta) = \max_{\mu}\ l([\delta, \mu, \hat{\sigma}_x^2(\mu,\delta), \hat{\sigma}_y^2(\mu,\delta)]) \qquad (39)$$

i.e. a one-dimensional optimization for each value of $\delta$.


The above is implemented and plotted below (including a comparison with the
answer in part a).

lp.delta <- function(delta, x, y){
  n <- length(x)
  m <- length(y)
  fun.tmp <- function(mu, delta, x, y){
    sigmasq.hat.x <- sum((x - mu)^2) / n
    sigmasq.hat.y <- sum((y - mu - delta)^2) / m
    sum(dnorm(x, mean = mu, sd = sqrt(sigmasq.hat.x), log = TRUE)) +
      sum(dnorm(y, mean = mu + delta, sd = sqrt(sigmasq.hat.y), log = TRUE))
  }
  optimize(fun.tmp, lower = -1000, upper = 1000, delta = delta,
           x = x, y = y, maximum = TRUE)$objective
}

delta <- seq(-140, 0, by = 0.1)

lp.b <- sapply(delta, lp.delta, x = x, y = y)
lp.b <- lp.b - max(lp.b) ## normalise

## Plot
par(mfrow=c(1,2))
plot(delta,lp.b,type="l")
lines(delta,lp.a,col=2)
plot(delta,exp(lp.b),type="l")
lines(delta,exp(lp.a),col=2)
lines(range(delta),exp(-qchisq(0.95,df=1)/2)*c(1,1),col=2,lwd=2,lty=2)
## Compare with the t-test
rug(-as.numeric(t.test(x,y)$conf.int),lwd=2)
rug(-as.numeric(t.test(x,y,var.equal=TRUE)$conf.int),lwd=2,col=2)

[Figure: left panel, the normalized log-profile likelihoods lp.b (black) and lp.a (red) against delta; right panel, exp(lp.b) and exp(lp.a) with the 95% cut-off (dashed red) and the Welch (black) and equal-variance (red) t-test CIs shown as rugs.]

c)
Set $\beta = \sigma_x^2/\sigma_y^2$; we have

$$l(\theta) = -\frac{n}{2}\log(\beta\sigma_y^2) - \frac{1}{2\beta\sigma_y^2}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{m}{2}\log(\sigma_y^2) - \frac{1}{2\sigma_y^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (40)$$

From above we have

$$\hat{\mu}(\delta) = \frac{n\sigma_y^2\bar{x} + m\beta\sigma_y^2(\bar{y}-\delta)}{n\sigma_y^2 + m\beta\sigma_y^2} \qquad (41)$$

$$= \frac{n\bar{x} + m\beta(\bar{y}-\delta)}{n + m\beta} \qquad (42)$$
Further we have

$$\frac{\partial l(\theta)}{\partial\delta} = \frac{1}{\sigma_y^2}\sum_{i=1}^{m}(y_i-\mu-\delta) \qquad (43)$$

$$= \frac{1}{\sigma_y^2}(m\bar{y} - m\mu - m\delta) \qquad (44)$$

hence $\hat{\delta}(\mu) = \bar{y} - \mu$, and inserting in the above we get

$$\hat{\mu} = \frac{n\bar{x} + m\beta(\bar{y} - \bar{y} + \hat{\mu})}{n + m\beta} \qquad (45)$$

$$= \frac{n\bar{x} + m\beta\hat{\mu}}{n + m\beta} \qquad (46)$$

and hence $\hat{\mu} = \bar{x}$ and $\hat{\delta} = \bar{y} - \bar{x}$.
Differentiating with respect to $\sigma_y^2$,

$$\frac{\partial l(\theta)}{\partial\sigma_y^2} = -\frac{n}{2\sigma_y^2} + \frac{1}{2\beta\sigma_y^4}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{m}{2\sigma_y^2} + \frac{1}{2\sigma_y^4}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (47)$$

$$= \frac{1}{2\sigma_y^2}\left(-n + \frac{1}{\beta\sigma_y^2}\sum_{i=1}^{n}(x_i-\mu)^2 - m + \frac{1}{\sigma_y^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2\right) \qquad (48)$$

Inserting $\hat{\mu}$ and $\hat{\delta}$ we get

$$\frac{\partial l(\theta)}{\partial\sigma_y^2} = \frac{1}{2\sigma_y^2}\left(-(n+m) + \frac{1}{\beta\sigma_y^2}\sum_{i=1}^{n}(x_i-\bar{x})^2 + \frac{1}{\sigma_y^2}\sum_{i=1}^{m}(y_i-\bar{y})^2\right) \qquad (49)$$

$$= 0 \qquad (50)$$

with the solution

$$\hat{\sigma}_y^2(\beta) = \frac{1}{n+m}\left(\frac{1}{\beta}\sum_{i=1}^{n}(x_i-\bar{x})^2 + \sum_{i=1}^{m}(y_i-\bar{y})^2\right) \qquad (51)$$

The above is implemented below

lp.beta <- function(beta, x, y){
  n <- length(x)
  m <- length(y)
  delta.hat <- mean(y) - mean(x)
  mu.hat <- mean(x)
  sigmasq.hat.y <- (sum((x - mu.hat)^2) / beta +
                    sum((y - mu.hat - delta.hat)^2)) / (n + m)
  sum(dnorm(x, mean = mu.hat, sd = sqrt(sigmasq.hat.y * beta), log = TRUE)) +
    sum(dnorm(y, mean = mu.hat + delta.hat, sd = sqrt(sigmasq.hat.y), log = TRUE))
}

beta <- seq(1, 7, by = 0.001)

lp.c <- sapply(beta, lp.beta, x = x, y = y)
lp.c <- lp.c - max(lp.c) ## normalise

par(mfrow=c(1,2))
plot(beta,lp.c,type="l")
plot(beta,exp(lp.c),type="l")
lines(range(beta),exp(-qchisq(0.95,df=1)/2)*c(1,1),
col=2,lwd=2,lty=2)
## Compare with the observed variance ratio var(x)/var(y)
lines(var(x)/var(y)*c(1,1),c(0,1),col=4)

[Figure: left panel, the normalized log-profile likelihood lp.c against beta; right panel, exp(lp.c) against beta with the 95% cut-off (dashed red) and the observed variance ratio var(x)/var(y) (blue vertical line).]
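For comparison (not part of the original solution), a classical F-based confidence interval for the variance ratio $\sigma_x^2/\sigma_y^2$ can be obtained with var.test, and should roughly match the likelihood-based interval in the plot:

## F-test CI for the variance ratio
var.test(x, y)$conf.int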

Exercise 4.30

Show that $E_\theta[X] = A'(\theta)$ and $V_\theta[X] = A''(\theta)$. Use Eq. (4.3),

$$p_\theta(x) = e^{\theta x - A(\theta) + c(x)} \qquad (52)$$

Since

$$\int p_\theta(x)\,dx = 1 \qquad (53)$$

we also have

$$\frac{\partial}{\partial\theta}\int p_\theta(x)\,dx = 0 \qquad (54)$$

and when the function is well behaved we can write

$$\frac{\partial}{\partial\theta}\int p_\theta(x)\,dx = \int \frac{\partial}{\partial\theta}\, p_\theta(x)\,dx = \int \left(x - A'(\theta)\right) p_\theta(x)\,dx \qquad (55)$$

Hence we have

$$\int \left(x - A'(\theta)\right) p_\theta(x)\,dx = -A'(\theta)\int p_\theta(x)\,dx + \int x\, p_\theta(x)\,dx \qquad (56)$$

$$= -A'(\theta) + E_\theta[X] = 0 \qquad (57)$$

which shows the first part.


For the variance we can write

$$\frac{\partial^2}{\partial\theta^2}\int p_\theta(x)\,dx = \frac{\partial}{\partial\theta}\int \left(x - A'(\theta)\right) p_\theta(x)\,dx \qquad (58)$$

$$= -A''(\theta)\int p_\theta(x)\,dx + \int \left(x - A'(\theta)\right)^2 p_\theta(x)\,dx \qquad (59)$$

$$= -A''(\theta) + \int \left(x - E_\theta[X]\right)^2 p_\theta(x)\,dx \qquad (60)$$

$$= -A''(\theta) + V_\theta[X] = 0 \qquad (61)$$

and we are done.
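As a numerical illustration (not part of the original solution), consider the Poisson family in canonical form: with $\theta = \log\lambda$ we have $A(\theta) = e^\theta = \lambda$, so $A'(\theta) = A''(\theta) = \lambda$, which should match the mean and variance:

## Check E[X] = A'(theta) and V[X] = A''(theta) for a Poisson example
lambda <- 2.5   ## hypothetical rate
x.grid <- 0:200 ## effectively the full support for this lambda
p <- dpois(x.grid, lambda)
c(mean = sum(x.grid * p), A.prime = lambda)
c(var = sum((x.grid - lambda)^2 * p), A.dprime = lambda)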
