
02418 Week 3, solution

Exercise 4.1

In this exercise we study the number of boys in families of size 2 and 6.


Using the data

n2 <- c(42860,89213,47819)
n6 <- c(1096,6233,15700,22221,17332,7908,1579)
(N2 <- sum(n2))

## [1] 179892

(N6 <- sum(n6))

## [1] 72069

i.e. there are, for example, 42860 families of size 2 with 0 boys.

a)

Under the assumption of a constant probability of observing boys and independence, the probability mass function is given by

$$p_k = \binom{n}{k}\theta^k(1-\theta)^{n-k} \qquad (1)$$

where $k$ is the number of observed boys in the family (e.g. 0, 1, or 2) and $n$ is the size of the family (e.g. 2). The likelihood contribution from the $n_k$ families with $k$ boys is proportional to

$$p_k^{n_k} \qquad (2)$$

and hence the likelihood in each of the two cases is given by

$$L(\theta) = \prod_{k=0}^{n} p_k(\theta)^{n_k} \qquad (3)$$

Hence the log-likelihood is

$$l(\theta) = \sum_{k=0}^{n} n_k \log(p_k(\theta)) \qquad (4)$$

$$= \sum_{k=0}^{n} n_k \left(k\log(\theta) + (n-k)\log(1-\theta)\right) + \tilde{c}(n_k, n, k) \qquad (5)$$

where $n$ is the number of children in the family and $n_k$ is the number of families with $k$ boys.
The derivative is

$$\frac{\partial l}{\partial\theta} = \sum_{k=0}^{n} n_k \left(\frac{k}{\theta} - \frac{n-k}{1-\theta}\right) \qquad (6)$$

Setting the derivative equal to zero we get

$$\frac{1}{\theta(1-\theta)} \sum_{k=0}^{n} n_k \left(k(1-\theta) - (n-k)\theta\right) = 0 \qquad (7)$$

or

$$0 = \sum_{k=0}^{n} n_k(k - n\theta) = \sum_{k=0}^{n} n_k k - n\theta \sum_{k=0}^{n} n_k \qquad (8)$$

$$= \sum_{k=0}^{n} n_k k - n\theta N \qquad (9)$$

and hence the MLE is

$$\hat{\theta} = \frac{\sum_{k=0}^{n} n_k k}{nN} \qquad (10)$$

where $N$ is the total number of families of the specific size (2 or 6).
Hence the MLE in each case can be calculated as

(theta1 <- sum(n2 * c(0:2)) / (N2 * 2))

## [1] 0.5137833

(theta2 <- sum(n6 * c(0:6)) / (N6 * 6))

## [1] 0.5148723

We can also formulate the likelihood function as an R-function

## Negative log-likelihood (up to an additive constant)
nll <- function(theta, nk){
  n <- length(nk) - 1  ## family size
  -sum(nk * ((0:n) * log(theta) + (n:0) * log(1 - theta)))
}

## Optimize
optimize(nll,c(0.01,0.99),nk=n2)$minimum ## Family size = 2

## [1] 0.5137949

optimize(nll,c(0.01,0.99),nk=n6)$minimum ## Family size = 6

## [1] 0.5148818

which gives (almost) the same numbers.
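The residual discrepancy is due to the convergence tolerance of optimize; a quick check (not part of the original solution) with a tighter tolerance recovers the closed-form values to more digits:

## Tighter tolerance brings the numerical optimum closer to theta1 and theta2
optimize(nll, c(0.01, 0.99), nk = n2, tol = 1e-10)$minimum
optimize(nll, c(0.01, 0.99), nk = n6, tol = 1e-10)$minimum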

b)

The joint log-likelihood is simply the sum of the log-likelihoods, i.e.

$$l(\theta) = l_2(\theta) + l_6(\theta) \qquad (11)$$

and we can find the derivative as

$$\frac{\partial l}{\partial\theta} = \sum_{k=0}^{n} n_k \left(\frac{k}{\theta} - \frac{n-k}{1-\theta}\right) + \sum_{k=0}^{m} m_k \left(\frac{k}{\theta} - \frac{m-k}{1-\theta}\right) \qquad (12)$$

where $n$ and $m$ are used for families of size 2 and 6 respectively. Setting the derivative equal to zero we get

$$0 = \sum_{k=0}^{n} n_k(k - n\theta) + \sum_{k=0}^{m} m_k(k - m\theta) \qquad (13)$$

$$= \left(\sum_{k=0}^{n} n_k k - nN_2\theta\right) + \left(\sum_{k=0}^{m} m_k k - mN_6\theta\right) \qquad (14)$$

or

$$\hat{\theta} = \frac{\sum_{k=0}^{n} n_k k + \sum_{k=0}^{m} m_k k}{nN_2 + mN_6} \qquad (15)$$

which gives the estimate

(sum(n2 * (0:2)) + sum(n6 * (0:6))) / (2 * N2 + 6 * N6)

## [1] 0.5143777

We could also implement this in R:

## Joint negative log-likelihood
jnll <- function(theta, n1, n2){
  nll(theta, n1) + nll(theta, n2)
}

## MLE
opt <- optimize(jnll, c(0.01,0.99), n1 = n2, n2 = n6)
(theta.hat <- opt$minimum)

## [1] 0.5143882

Hence there is a small difference due to the finite tolerance of the numerical optimizer.


We could calculate the formula for the information, but here we will just
find it by numerical differentiation

library(numDeriv)

H <- hessian(jnll, opt$minimum, n1 = n2, n2 = n6) ## observed information (Hessian of the negative log-likelihood)

(sd.theta <- as.numeric(sqrt(1/H))) ## standard error

## [1] 0.00056153
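From the standard error a Wald 95% confidence interval follows directly; a small addition, using only quantities computed above:

## Wald 95% CI for theta
theta.hat + qnorm(0.975) * sd.theta * c(-1, 1) ## approximately (0.5133, 0.5155)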

We can now plot the likelihood

theta <- seq(theta.hat- 4 * sd.theta, theta.hat + 4 * sd.theta, length=100)


jointnll <- sapply(theta, jnll, n1 = n2, n2 = n6)

plot(theta,exp(-jointnll+opt$objective),type="l")
## Compare with estimates based on the two datasets
lines(theta1 * c(1,1), c(0,1), col = 2)
lines(theta2 * c(1,1), c(0,1), col = 2)
## Cut-off defining the profile likelihood CI
lines(range(theta),exp(-qchisq(0.95,df=1)/2)*c(1,1),lty=2,col=2)
## Compare with the Wald CI
rug(theta.hat+1.96*sd.theta*c(-1,1),col=3,lwd=2)

[Figure: the normalized joint likelihood exp(-jointnll + opt$objective) plotted against theta; red vertical lines mark the estimates from the two separate datasets, the dashed red line marks the 95% profile likelihood cut-off, and the green rug marks the Wald CI.]
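As a sketch (not part of the original code), the quadratic (Wald) approximation of the normalized likelihood can be overlaid on the plot above for comparison:

## Overlay the normal approximation exp(-(theta - theta.hat)^2 / (2 se^2))
lines(theta, exp(-(theta - theta.hat)^2 / (2 * sd.theta^2)), lty = 3, col = 4)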

c)

In order to answer the question we need a goodness-of-fit test. We calculate the
expected number of observations under the model

e2 <- dbinom(0:2,size = 2, prob = theta.hat) * N2


e6 <- dbinom(0:6,size = 6, prob = theta.hat) * N6

and find the normalized residuals

r2 <- (n2 - e2)/sqrt(e2)

r6 <- (n6 - e6)/sqrt(e6)

The goodness-of-fit statistic is the sum of squared residuals

chisq.stat <- sum(r2^2) + sum(r6^2)
chisq.stat

## [1] 123.633

which should be compared with a $\chi^2$ distribution with degrees of freedom
equal to the number of cells (10) minus the number of estimated parameters (1),
i.e. we can calculate the p-value by

pchisq(chisq.stat,df=10-1,lower.tail=FALSE)

## [1] 2.407376e-22

hence this is significant at any reasonable level.


In order to examine where the problem occurs we can make tables like the
one given on page 76 of the textbook

round(rbind(n2,e2,r2),digits=1)

## [,1] [,2] [,3]
## n2 42860.0 89213.0 47819.0
## e2 42421.9 89871.5 47598.6
## r2 2.1 -2.2 1.0

round(rbind(n6,e6,r6),digits=1)

## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## n6 1096.0 6233.0 15700.0 22221.0 17332.0 7908.0 1579.0
## e6 945.1 6006.7 15906.7 22465.7 17847.7 7562.1 1335.0
## r6 4.9 2.9 -1.6 -1.6 -3.9 4.0 6.7

We see that there are too many observations in the tails of the distribution
(in particular for families of size 6). One explanation could be that the
probability of a boy is not constant between families.
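A quick way to see this heterogeneity is to compare the empirical variance of the number of boys per family with the binomial variance $n\theta(1-\theta)$; the sketch below (not part of the original solution) does this for families of size 6, using the joint estimate theta.hat:

## Overdispersion check for families of size 6
k <- 0:6
mean.k <- sum(n6 * k) / N6
var.k <- sum(n6 * (k - mean.k)^2) / N6
c(empirical = var.k, binomial = 6 * theta.hat * (1 - theta.hat))
## the empirical variance (about 1.57) exceeds the binomial one (about 1.50),
## consistent with a probability that varies between families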

Exercise 4.12

Following the arguments above we can implement the negative log-likelihood as

## Negative log-likelihood
nll <- function(theta, nk){
  k <- 0:(length(nk) - 1)
  -sum(nk * dpois(k, lambda = theta, log = TRUE))
}

## Estimate
nk <- c(109,65,22,3,1,0)
opt <- optimize(nll,c(0,100),nk=nk)
opt$minimum

## [1] 0.6099907
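As a check, the Poisson MLE has the closed form $\hat{\lambda} = \sum_k k\, n_k / \sum_k n_k$ (the sample mean), which agrees with the numerical optimum:

## Closed-form MLE: the sample mean
sum((0:5) * nk) / sum(nk)

## [1] 0.61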

## Expected number
e <- c(dpois(0:3,lambda = opt$minimum), 1 - ppois(3,lambda = opt$minimum)) *
sum(nk)
e

## [1] 108.6711840 66.2884121 20.2176576 4.1108611 0.7118853

## Collapse the last two cells (expected counts should not be much below 5)
e <- c(e[1:3],sum(e[4:5]))
obs <- c(nk[1:3], sum(nk[4:6])) ## the empty count-5 cell goes in the tail too

## chi.sq statistics
sum((e-obs)^2/e)

## [1] 0.3235224

## Compare with a chi-square distribution with 3 df (4 cells - 1 parameter),
## i.e. we cannot reject that the data follow a Poisson distribution
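For completeness (not in the original), the corresponding p-value:

pchisq(sum((e - obs)^2/e), df = 3, lower.tail = FALSE) ## approximately 0.96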

Exercise 4.26

We have

$$X_i \sim N(\mu, \sigma_x^2) \qquad (16)$$

$$Y_i \sim N(\mu + \delta, \sigma_y^2) \qquad (17)$$

We can read the data and make a simple t-test by

y <- as.vector(t(read.table("../IALscript/LKPACK/michel.dat")))
x <- y[1:20]
y <- y[-c(1:20)]
t.test(x,y,var.equal=TRUE)

##
## Two Sample t-test
##
## data: x and y
## t = 3.8197, df = 98, p-value = 0.0002344
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 33.99336 107.50664
## sample estimates:
## mean of x mean of y
## 909.00 838.25

a)

Compute the profile likelihood for $\delta$ assuming $\sigma_x^2 = \sigma_y^2 = \sigma^2$. We have $\theta = [\mu, \delta, \sigma^2]$ and

$$l(\theta) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{m}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (19)$$

$$= -\frac{n+m}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (20)$$

i.e.

$$\frac{\partial l(\theta)}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) + \frac{1}{\sigma^2}\sum_{i=1}^{m}(y_i-\mu-\delta) \qquad (22)$$

$$= \frac{1}{\sigma^2}\left(n(\bar{x}-\mu) + m\bar{y} - m(\mu+\delta)\right) \qquad (23)$$

$$= \frac{1}{\sigma^2}\left(n\bar{x} - \mu(n+m) + m\bar{y} - m\delta\right) = 0 \qquad (24)$$

giving

$$\hat{\mu}(\delta) = \frac{n\bar{x} + m\bar{y} - m\delta}{n+m} \qquad (25)$$

Solving for $\sigma^2$ gives

$$\frac{\partial l(\theta)}{\partial\sigma^2} = -\frac{n+m}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 + \frac{1}{2\sigma^4}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (26)$$

$$= \frac{1}{2\sigma^2}\left(-(n+m) + \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 + \frac{1}{\sigma^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2\right) \qquad (27)$$

giving

$$\hat{\sigma}^2(\delta,\mu) = \frac{\sum_{i=1}^{n}(x_i-\mu)^2 + \sum_{i=1}^{m}(y_i-\mu-\delta)^2}{n+m} \qquad (28)$$

and

$$\hat{\sigma}^2(\delta) = \frac{\sum_{i=1}^{n}(x_i-\hat{\mu}(\delta))^2 + \sum_{i=1}^{m}(y_i-\hat{\mu}(\delta)-\delta)^2}{n+m} \qquad (29)$$

and we can write the profile likelihood as

$$l_P(\delta) = l([\delta, \hat{\mu}(\delta), \hat{\sigma}^2(\delta)]) \qquad (30)$$
The above is implemented and plotted below

lp.delta <- function(delta, x, y){
  n <- length(x)
  m <- length(y)
  mu.hat <- (n * mean(x) + m * (mean(y) - delta)) / (m + n)
  sigmasq.hat <- (sum((x - mu.hat)^2) +
                  sum((y - mu.hat - delta)^2)) / (m + n)
  sum(dnorm(x, mean = mu.hat, sd = sqrt(sigmasq.hat), log = TRUE)) +
    sum(dnorm(y, mean = mu.hat + delta, sd = sqrt(sigmasq.hat), log = TRUE))
}

delta <- seq(-140, 0, by = 0.1)

lp.a <- sapply(delta, lp.delta, x = x, y = y)
lp.a <- lp.a - max(lp.a) ## normalise

## plot
par(mfrow=c(1,2))
plot(delta,lp.a,type="l")
lines(range(delta),-qchisq(0.95,df=1)/2*c(1,1),col=2,lwd=2,lty=2)
plot(delta,exp(lp.a),type="l")
lines(range(delta),exp(-qchisq(0.95,df=1)/2)*c(1,1),col=2,lwd=2,lty=2)
## Compare with the t-test
rug(-as.numeric(t.test(x,y,var.equal=TRUE)$conf.int),lwd=2)

[Figure: left panel, the normalized log-profile likelihood lp.a against delta with the 95% cut-off (dashed red); right panel, exp(lp.a) against delta with the same cut-off and the equal-variance t-test CI shown as a rug.]
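The endpoints of the 95% profile likelihood interval can also be found numerically; the sketch below (not part of the original solution) uses uniroot on the log-profile, splitting the search interval at the MLE $\hat{\delta} = \bar{y} - \bar{x}$:

## Numerical profile likelihood CI for delta
d.hat <- mean(y) - mean(x) ## MLE of delta under the equal-variance model
lmax <- lp.delta(d.hat, x = x, y = y)
f <- function(d) lp.delta(d, x = x, y = y) - lmax + qchisq(0.95, df = 1)/2
c(uniroot(f, c(-140, d.hat))$root, uniroot(f, c(d.hat, 0))$root)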

b)

Compute the profile likelihood for $\delta$ without further assumptions. Now

$$l(\theta) = -\frac{n}{2}\log(\sigma_x^2) - \frac{1}{2\sigma_x^2}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{m}{2}\log(\sigma_y^2) - \frac{1}{2\sigma_y^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (31)$$

hence

$$\frac{\partial l(\theta)}{\partial\mu} = \frac{1}{\sigma_x^2}\sum_{i=1}^{n}(x_i-\mu) + \frac{1}{\sigma_y^2}\sum_{i=1}^{m}(y_i-\mu-\delta) \qquad (32)$$

$$= \frac{1}{\sigma_x^2}(n\bar{x} - n\mu) + \frac{1}{\sigma_y^2}(m\bar{y} - m\mu - m\delta) \qquad (33)$$

$$= -\mu\left(\frac{n}{\sigma_x^2} + \frac{m}{\sigma_y^2}\right) + \frac{n}{\sigma_x^2}\bar{x} + \frac{m}{\sigma_y^2}(\bar{y}-\delta) \qquad (34)$$

and we have

$$\hat{\mu}(\delta) = \frac{\frac{n}{\sigma_x^2}\bar{x} + \frac{m}{\sigma_y^2}(\bar{y}-\delta)}{\frac{n}{\sigma_x^2} + \frac{m}{\sigma_y^2}} \qquad (35)$$

$$= \frac{n\sigma_y^2\bar{x} + m\sigma_x^2(\bar{y}-\delta)}{n\sigma_y^2 + m\sigma_x^2} \qquad (36)$$

Hence the solution depends on the parameters $\sigma_x^2$ and $\sigma_y^2$, and we have to rely on a numerical solution. The variance parameters can however easily be written as functions of $\mu$ and $\delta$:

$$\hat{\sigma}_x^2(\mu,\delta) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 \qquad (37)$$

$$\hat{\sigma}_y^2(\mu,\delta) = \frac{1}{m}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (38)$$

and the profile likelihood can be written as

$$l_p(\delta) = \max_{\mu}\ l([\delta, \mu, \hat{\sigma}_x^2(\mu,\delta), \hat{\sigma}_y^2(\mu,\delta)]) \qquad (39)$$

i.e. a one-dimensional optimization for each value of $\delta$.


The above is implemented and plotted below (including a comparison with the
answer in part a).

lp.delta <- function(delta, x, y){
  n <- length(x)
  m <- length(y)
  fun.tmp <- function(mu, delta, x, y){
    sigmasq.hat.x <- sum((x - mu)^2) / n
    sigmasq.hat.y <- sum((y - mu - delta)^2) / m
    sum(dnorm(x, mean = mu, sd = sqrt(sigmasq.hat.x), log = TRUE)) +
      sum(dnorm(y, mean = mu + delta, sd = sqrt(sigmasq.hat.y), log = TRUE))
  }
  optimize(fun.tmp, lower = -1000, upper = 1000, delta = delta,
           x = x, y = y, maximum = TRUE)$objective
}

delta <- seq(-140, 0, by = 0.1)

lp.b <- sapply(delta, lp.delta, x = x, y = y)
lp.b <- lp.b - max(lp.b) ## normalise

## Plot
par(mfrow=c(1,2))
plot(delta,lp.b,type="l")
lines(delta,lp.a,col=2)
plot(delta,exp(lp.b),type="l")
lines(delta,exp(lp.a),col=2)
lines(range(delta),exp(-qchisq(0.95,df=1)/2)*c(1,1),col=2,lwd=2,lty=2)
## Compare with the t-test
rug(-as.numeric(t.test(x,y)$conf.int),lwd=2)
rug(-as.numeric(t.test(x,y,var.equal=TRUE)$conf.int),lwd=2,col=2)

[Figure: left panel, the normalized log-profile likelihoods lp.b (black) and lp.a (red) against delta; right panel, exp(lp.b) and exp(lp.a) with the 95% cut-off (dashed red) and the Welch (black) and equal-variance (red) t-test CIs shown as rugs.]

c)
Set $\beta = \sigma_x^2/\sigma_y^2$; we have

$$l(\theta) = -\frac{n}{2}\log(\beta\sigma_y^2) - \frac{1}{2\beta\sigma_y^2}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{m}{2}\log(\sigma_y^2) - \frac{1}{2\sigma_y^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (40)$$

From above we have

$$\hat{\mu}(\delta) = \frac{n\sigma_y^2\bar{x} + m\beta\sigma_y^2(\bar{y}-\delta)}{n\sigma_y^2 + m\beta\sigma_y^2} \qquad (41)$$

$$= \frac{n\bar{x} + m\beta(\bar{y}-\delta)}{n + m\beta} \qquad (42)$$
Further we have

$$\frac{\partial l(\theta)}{\partial\delta} = \frac{1}{\sigma_y^2}\sum_{i=1}^{m}(y_i-\mu-\delta) \qquad (43)$$

$$= \frac{1}{\sigma_y^2}(m\bar{y} - m\mu - m\delta) \qquad (44)$$

hence $\hat{\delta}(\mu) = \bar{y} - \mu$, and inserting in the above we get

$$\hat{\mu} = \frac{n\bar{x} + m\beta(\bar{y} - \bar{y} + \hat{\mu})}{n + m\beta} \qquad (45)$$

$$= \frac{n\bar{x} + m\beta\hat{\mu}}{n + m\beta} \qquad (46)$$

and hence $\hat{\mu} = \bar{x}$ and $\hat{\delta} = \bar{y} - \bar{x}$.
Differentiating with respect to $\sigma_y^2$,

$$\frac{\partial l(\theta)}{\partial\sigma_y^2} = -\frac{n}{2\sigma_y^2} + \frac{1}{2\beta\sigma_y^4}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{m}{2\sigma_y^2} + \frac{1}{2\sigma_y^4}\sum_{i=1}^{m}(y_i-\mu-\delta)^2 \qquad (47)$$

$$= \frac{1}{2\sigma_y^2}\left(-n + \frac{1}{\beta\sigma_y^2}\sum_{i=1}^{n}(x_i-\mu)^2 - m + \frac{1}{\sigma_y^2}\sum_{i=1}^{m}(y_i-\mu-\delta)^2\right) \qquad (48)$$

Inserting $\hat{\mu}$ and $\hat{\delta}$ we get

$$\frac{\partial l(\theta)}{\partial\sigma_y^2} = \frac{1}{2\sigma_y^2}\left(-(n+m) + \frac{1}{\beta\sigma_y^2}\sum_{i=1}^{n}(x_i-\bar{x})^2 + \frac{1}{\sigma_y^2}\sum_{i=1}^{m}(y_i-\bar{y})^2\right) \qquad (49)$$

$$= 0 \qquad (50)$$

with the solution

$$\hat{\sigma}_y^2(\beta) = \frac{1}{n+m}\left(\frac{1}{\beta}\sum_{i=1}^{n}(x_i-\bar{x})^2 + \sum_{i=1}^{m}(y_i-\bar{y})^2\right) \qquad (51)$$

The above is implemented below

lp.beta <- function(beta, x, y){
  n <- length(x)
  m <- length(y)
  delta.hat <- mean(y) - mean(x)
  mu.hat <- mean(x)
  sigmasq.hat.y <- (sum((x - mu.hat)^2) / beta +
                    sum((y - mu.hat - delta.hat)^2)) / (n + m)
  sum(dnorm(x, mean = mu.hat, sd = sqrt(sigmasq.hat.y * beta), log = TRUE)) +
    sum(dnorm(y, mean = mu.hat + delta.hat, sd = sqrt(sigmasq.hat.y), log = TRUE))
}

beta <- seq(1, 7, by = 0.001)

lp.c <- sapply(beta, lp.beta, x = x, y = y)
lp.c <- lp.c - max(lp.c) ## normalise

par(mfrow=c(1,2))
plot(beta,lp.c,type="l")
plot(beta,exp(lp.c),type="l")
lines(range(beta),exp(-qchisq(0.95,df=1)/2)*c(1,1),
col=2,lwd=2,lty=2)
## Compare with the observed variance ratio var(x)/var(y)
lines(var(x)/var(y)*c(1,1),c(0,1),col=4)

[Figure: left panel, the normalized log-profile likelihood lp.c against beta; right panel, exp(lp.c) against beta with the 95% cut-off (dashed red) and the observed variance ratio var(x)/var(y) (blue vertical line).]
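For comparison (not part of the original solution), a classical F-based confidence interval for the variance ratio $\sigma_x^2/\sigma_y^2$ can be obtained with var.test, and should roughly match the likelihood-based interval in the plot:

## F-test CI for the variance ratio
var.test(x, y)$conf.int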

Exercise 4.30

Show that $E_\theta[X] = A'(\theta)$ and $V_\theta[X] = A''(\theta)$. Use Eq. (4.3),

$$p_\theta(x) = e^{\theta x - A(\theta) + c(x)} \qquad (52)$$

Since

$$\int p_\theta(x)\,dx = 1 \qquad (53)$$

we also have

$$\frac{\partial}{\partial\theta}\int p_\theta(x)\,dx = 0 \qquad (54)$$

and when the function is well behaved we can write

$$\frac{\partial}{\partial\theta}\int p_\theta(x)\,dx = \int \frac{\partial}{\partial\theta}\, p_\theta(x)\,dx = \int \left(x - A'(\theta)\right) p_\theta(x)\,dx \qquad (55)$$

Hence we have

$$\int \left(x - A'(\theta)\right) p_\theta(x)\,dx = -A'(\theta)\int p_\theta(x)\,dx + \int x\, p_\theta(x)\,dx \qquad (56)$$

$$= -A'(\theta) + E_\theta[X] = 0 \qquad (57)$$

which shows the first part.


For the variance we can write

$$\frac{\partial^2}{\partial\theta^2}\int p_\theta(x)\,dx = \frac{\partial}{\partial\theta}\int \left(x - A'(\theta)\right) p_\theta(x)\,dx \qquad (58)$$

$$= -A''(\theta)\int p_\theta(x)\,dx + \int \left(x - A'(\theta)\right)^2 p_\theta(x)\,dx \qquad (59)$$

$$= -A''(\theta) + \int \left(x - E_\theta[X]\right)^2 p_\theta(x)\,dx \qquad (60)$$

$$= -A''(\theta) + V_\theta[X] = 0 \qquad (61)$$

and we are done.
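As a numerical illustration (not part of the original solution), consider the Poisson family in canonical form: with $\theta = \log\lambda$ we have $A(\theta) = e^\theta = \lambda$, so $A'(\theta) = A''(\theta) = \lambda$, which should match the mean and variance:

## Check E[X] = A'(theta) and V[X] = A''(theta) for a Poisson example
lambda <- 2.5   ## hypothetical rate
x.grid <- 0:200 ## effectively the full support for this lambda
p <- dpois(x.grid, lambda)
c(mean = sum(x.grid * p), A.prime = lambda)
c(var = sum((x.grid - lambda)^2 * p), A.dprime = lambda)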
