
Problem set 5

Numerical Methods for EOR

07/03/2023

Week 5-Simulation
This week we focus on simulation, and we continue to use the data of the assignment.

Reading material
Read chapter 20 of Jones et al.

Problem 0
First, note that if Y follows a translated Gamma distribution, i.e. Y = X + c with X Gamma distributed, we have
$$F_Y(y) = \Pr(Y \le y) = \Pr(X + c \le y) = \Pr(X \le y - c) = F_X(y - c),$$
with $F_X(\cdot)$ the distribution function of a Gamma distribution with parameters $\alpha$ and $\beta$. The probability of
finding an observation in the $i$-th row of the table (with lower bound $l_i$ and upper bound $u_i$) is then
$$\Pr(l_i < Y \le u_i) = F_X(u_i - c; \alpha, \beta) - F_X(l_i - c; \alpha, \beta).$$
Note that for the last row we have
$$\Pr(Y > l_{10}) = 1 - \Pr(Y \le l_{10}) = 1 - F_X(l_{10} - c; \alpha, \beta).$$
As a consequence, the likelihood function takes the standard multinomial form
$$L(\alpha, \beta, c) = \prod_i \left( F_X(u_i - c; \alpha, \beta) - F_X(l_i - c; \alpha, \beta) \right)^{n_i},$$

with $n_i$ the number of observations in row $i$. The loglikelihood to be optimized is then
$$\ell(\alpha, \beta, c) = \sum_i n_i \log\left( F_X(u_i - c; \alpha, \beta) - F_X(l_i - c; \alpha, \beta) \right).$$

We implement this below.


table1 <- cbind(c(0,2.5,7.5,12.5,17.5,22.5,32.5,47.5,67.5,87.5),
                c(2.5,7.5,12.5,17.5,22.5,32.5,47.5,67.5,87.5,Inf),
                c(41,48,24,18,15,14,16,12,6,23))

# p = (alpha, beta, c); d = table with lower bounds, upper bounds and counts
loglik <- function(p,d){
  upper <- d[,2]
  lower <- d[,1]
  n     <- d[,3]
  # for the open-ended last interval, F_X(Inf) is simply 1
  ll <- n*log(ifelse(upper<Inf,pgamma(upper-p[3],p[1],p[2]),1)-
              pgamma(lower-p[3],p[1],p[2]))
  sum( ll )
}

We need decent starting values. The minimum of the domain of a Gamma distribution is 0, so we take that
as the starting value for c. Then, we take a very rough approach: suppose all observations in a row lie at the
center of that interval. Then it is easy to estimate the mean and variance of the resulting pseudo-data, and to
obtain starting values for $\alpha$ and $\beta$ from $E[X] = \alpha/\beta$ and $\mathrm{var}(X) = \alpha/\beta^2$.
# pseudo-data: put every observation at its interval midpoint
# (for the open-ended last interval we use its lower bound)
interval.center <- c((table1[1:9,1]+table1[1:9,2])/2,table1[10,1])
pseudo.data <- rep(interval.center,table1[,3])
mean.p.d <- mean(pseudo.data)
var.p.d  <- var(pseudo.data)
# method-of-moments starting values: beta = mean/var, alpha = beta*mean
beta0  <- mean.p.d/var.p.d
alpha0 <- beta0*mean.p.d

p0 <- c(alpha=alpha0,beta=beta0,c=0)
m <- optim(p0,loglik,control=list(fnscale=-1),
d=table1,hessian=T)
print(m)

## $par
## alpha beta c
## 0.36449625 0.01257362 1.88088830
##
## $value
## [1] -468.4725
##
## $counts
## function gradient
## 154 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
##
## $hessian
## alpha beta c
## alpha -1531.2802 16670.3179 -107.19179
## beta 16670.3179 -404466.4366 204.15174
## c -107.1918 204.1517 -21.66854
rbind(par=m$par,
      se=sqrt(diag(solve(-m$hessian))))

## alpha beta c
## par 0.36449625 0.012573623 1.8808883
## se 0.05052261 0.002519416 0.3161037
The optimizer has converged, and the estimate for c seems reasonable. Note that it differs significantly from 0,
so in this case, a translated Gamma distribution should provide a better fit than just a Gamma distribution.
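As a quick check of that claim, one can compute the Wald statistic for c from the stored optim output (a minimal sketch, reusing the object m and its Hessian from above):

# Wald statistic: estimate divided by its standard error
z.c <- m$par["c"]/sqrt(diag(solve(-m$hessian)))["c"]
z.c   # about 6, well above the usual 1.96 cutoff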

Problem 1
In problem 4 of the assignment, you were asked to estimate the parameters of a translated Gamma distribution
based on grouped data. Use a simulated data set to check your likelihood function as follows. First, generate
data according to the Gamma distribution that you estimated in the first part of problem 4 of the assignment.
Use the same lower and upper bounds as in Table 1 of the assignment (hint: to create the groups, use
ggplot2::cut_width). Generate 400 random numbers and assign them to the ten groups. Use this
simulated table to estimate the parameters of your Gamma distribution. If you increased the
sample size to 40000, would you expect the standard deviations of your estimated parameters to decrease by
a factor of 10? Assess this, and explain your result.
table2.10 <- cbind(lower = c(0,2.5,7.5,12.5,17.5,22.5,32.5,47.5,67.5,87.5),
                   upper = c(2.5,7.5,12.5,17.5,22.5,32.5,47.5,67.5,87.5,Inf),
                   freq  = c(41,48,24,18,15,14,16,12,6,23))

loglik <- function(p,d){
  upper <- d[,2]
  lower <- d[,1]
  n     <- d[,3]
  ll <- n*log(ifelse(upper<Inf,pgamma(upper-p[3],p[1],p[2]),1)-
              pgamma(lower-p[3],p[1],p[2]))
  sum( ll )
}

p0 <- c(alpha=0.47,beta=0.014,c=0)
m <- optim(p0,loglik,hessian=T,control=list(fnscale=-1),
d=table2.10)

## Warning in pgamma(upper - p[3], p[1], p[2]): NaNs produced


## Warning in pgamma(lower - p[3], p[1], p[2]): NaNs produced
## Warning in pgamma(upper - p[3], p[1], p[2]): NaNs produced
## Warning in pgamma(lower - p[3], p[1], p[2]): NaNs produced
m

## $par
## alpha beta c
## 0.36448483 0.01256196 1.88139535
##
## $value
## [1] -468.4725
##
## $counts
## function gradient
## 124 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
##
## $hessian
## alpha beta c
## alpha -1531.3511 16685.7389 -107.26790
## beta 16685.7389 -405229.6073 204.16466
## c -107.2679 204.1647 -21.70944

Now we simulate 400 observations from the estimated Gamma distribution, and bin them in the same type of
table:
x400 <- m$par[3]+rgamma(400,m$par[1],m$par[2])

# use cut to bin


x.table <- cut(x400,breaks=c(0,table2.10[,"upper"]))
table(x.table)

## x.table
## (0,2.5] (2.5,7.5] (7.5,12.5] (12.5,17.5] (17.5,22.5] (22.5,32.5]
## 90 94 31 34 27 23
## (32.5,47.5] (47.5,67.5] (67.5,87.5] (87.5,Inf]
## 24 30 14 33
replication.table2.10 <- table2.10
replication.table2.10[,"freq"] <- table(x.table)
m2 <- optim(p0,loglik,hessian=T,control=list(fnscale=-1),
d=replication.table2.10)

## Warning in pgamma(upper - p[3], p[1], p[2]): NaNs produced


## Warning in pgamma(lower - p[3], p[1], p[2]): NaNs produced
## Warning in pgamma(upper - p[3], p[1], p[2]): NaNs produced
## Warning in pgamma(lower - p[3], p[1], p[2]): NaNs produced
x40000 <- m$par[3]+rgamma(40000,m$par[1],m$par[2])
x.table <- cut(x40000,breaks=c(0,table2.10[,"upper"]))
table(x.table)

## x.table
## (0,2.5] (2.5,7.5] (7.5,12.5] (12.5,17.5] (17.5,22.5] (22.5,32.5]
## 7743 9067 4043 2788 2107 3054
## (32.5,47.5] (47.5,67.5] (67.5,87.5] (87.5,Inf]
## 3126 2590 1575 3907
replication.table2.10 <- table2.10
replication.table2.10[,"freq"] <- table(x.table)
m3 <- optim(p0,loglik,hessian=T,control=list(fnscale=-1),
d=replication.table2.10)

## Warning in pgamma(upper - p[3], p[1], p[2]): NaNs produced

## Warning in pgamma(upper - p[3], p[1], p[2]): NaNs produced


## Warning in pgamma(upper - p[3], p[1], p[2]): NaNs produced
## Warning in pgamma(lower - p[3], p[1], p[2]): NaNs produced
cbind(original=m$par,sd.original=sqrt(diag(solve(-m$hessian))),
n400=m2$par,sd.n400=sqrt(diag(solve(-m2$hessian))),
n40000=m3$par,sd.n40000=sqrt(diag(solve(-m3$hessian))))

## original sd.original n400 sd.n400 n40000 sd.n40000
## alpha 0.36448483 0.050510339 0.33504832 0.036188115 0.36233994 0.0037501113
## beta 0.01256196 0.002516809 0.01284136 0.001931353 0.01242269 0.0001824151
## c 1.88139535 0.315744218 1.85136595 0.250673632 1.86751787 0.0241354584

Going from n = 400 to n = 40000 multiplies the number of observations by 100, so under standard maximum
likelihood asymptotics the standard errors should shrink by roughly a factor of sqrt(100) = 10. The columns
sd.n400 and sd.n40000 above are indeed about a factor 10 apart for all three parameters.
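As a quick numerical check (a minimal sketch, reusing the fitted objects m2 and m3 from above), one can look at the ratio of the two sets of standard errors directly:

# ratio of standard errors: n = 400 fit versus n = 40000 fit
sqrt(diag(solve(-m2$hessian)))/sqrt(diag(solve(-m3$hessian)))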

Problem 2
Suppose $X_1, \dots, X_n \sim \text{Exp}(\lambda)$, i.i.d. Then we know that
$$\sqrt{n}(\hat\lambda - \lambda_0) \overset{\text{asy}}{\sim} N\left(0, I(\lambda_0)^{-1}\right),$$

with $\hat\lambda$ the maximum likelihood estimator of $\lambda$, $\lambda_0$ the true value, and
$$I(\lambda) = -E\left[\frac{\partial^2 \ell_i}{\partial \lambda^2}\right].$$
In the case of the exponential distribution we have
$$\ell(\lambda; x_i) = \log \lambda - \lambda x_i,$$
so $\partial^2 \ell_i / \partial \lambda^2 = -1/\lambda^2$ and
$$-E\left[\frac{\partial^2 \ell_i}{\partial \lambda^2}\right] = -\frac{\partial^2 \ell_i}{\partial \lambda^2} = \frac{1}{\lambda^2}.$$
Summarizing, for the exponential distribution, we have
$$\sqrt{n}(\hat\lambda - \lambda_0) \overset{\text{asy}}{\sim} N\left(0, \lambda_0^2\right).$$

Take λ0 = 1 and show that this relation holds approximately for n = 10, n = 1000, and n = 10000. Use a
qq-plot to assess (approximate) normality (hint: ggplot2::stat_qq).

Solution
We start with n = 10 and generate B samples, from each of which we calculate the ML estimator
$\hat\lambda = 1/\bar{x}$. The B standardized values $\sqrt{n}(\hat\lambda - 1)$ should then be approximately standard normal.
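For completeness, this closed form follows from setting the derivative of the log-likelihood $\sum_i \ell(\lambda; x_i)$ to zero:
$$\frac{\partial}{\partial \lambda} \sum_{i=1}^n \left(\log \lambda - \lambda x_i\right) = \frac{n}{\lambda} - \sum_{i=1}^n x_i = 0 \quad\Longrightarrow\quad \hat\lambda = \frac{n}{\sum_{i=1}^n x_i} = \frac{1}{\bar{x}}.$$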
set.seed(123456)
B <- 100000 # number of replications
n <- 10
l.hat10 <- rep(NA,B)
for (b in 1:B){
x.b <- rexp(n,rate=1)
l.hat10[b] <- 1/mean(x.b)
}
z10 <- sqrt(n)*(l.hat10-1) # should be approximately normal

qq10 <- data.frame(lambda.hat=z10,n=10)

n <- 100
l.hat100 <- rep(NA,B)
for (b in 1:B){
x.b <- rexp(n,rate=1)
l.hat100[b] <- 1/mean(x.b)
}
z100 <- sqrt(n)*(l.hat100-1) # should be approximately normal

qq100 <- data.frame(lambda.hat=z100,n=100)

n <- 1000
l.hat1000 <- rep(NA,B)

for (b in 1:B){
x.b <- rexp(n,rate=1)
l.hat1000[b] <- 1/mean(x.b)
}
z1000 <- sqrt(n)*(l.hat1000-1) # should be approximately normal

qq1000 <- data.frame(lambda.hat=z1000,n=1000)


library(dplyr)    # for bind_rows()
library(ggplot2)

qq <- bind_rows(qq10,qq100,qq1000)
ggplot(qq,aes(sample=lambda.hat)) + stat_qq() +
  facet_wrap(~n,ncol=2) + geom_abline(intercept=0,slope=1,col="grey")

[Figure: normal QQ-plots of the standardized estimates, one panel per n (10, 100, 1000); theoretical quantiles on the horizontal axis, sample quantiles on the vertical axis, with the 45-degree reference line.]
Clearly, the asymptotic distribution is not reasonable at all for n = 10. As the number of observations increases,
the normal approximation improves. It is perhaps more instructive to look at the densities of the standardized
estimates for the different sample sizes.
ggplot(qq) + geom_density(aes(x=lambda.hat)) + facet_wrap(~n) +
stat_function(fun=dnorm,args=list(mean=0,sd=1),col="red")

[Figure: kernel density estimates of the standardized estimates, one panel per n (10, 100, 1000), with the standard normal density overlaid in red; lambda.hat on the horizontal axis, density on the vertical axis.]

Problem 3
We continue with the exponential distribution with λ = 1, but now we look at the maximum
$$M_n = \max(X_1, \dots, X_n),$$
with $X_i$ i.i.d. Exp(λ).


1. Show that $\lim_{n\to\infty} \Pr(M_n - \log n \le x) = \exp(-\exp(-x))$ (so the asymptotic distribution of the
maximum follows a generalized extreme value distribution, and not a normal distribution).
2. Show by means of a qq-plot that
$$\Pr(M_n - \log n \le x) \approx \exp(-\exp(-x))$$
for fixed n. Also compare the simulated distribution to a normal distribution.


First, we derive the asymptotic distribution of the maximum. Since the $X_i$ are independent,
$$\Pr(M_n - \log n \le x) = \Pr(M_n \le x + \log n) = \Pr(X_1 \le x + \log n, \dots, X_n \le x + \log n)$$
$$= \Pr(X_1 \le x + \log n)^n = \left(1 - \exp(-x - \log n)\right)^n = \left(1 - \frac{1}{n}\exp(-x)\right)^n.$$

A standard limit in analysis is
$$\lim_{y\to\infty}\left(1 - \frac{z}{y}\right)^y = \exp(-z),$$
so we have
$$\lim_{n\to\infty} \Pr(M_n - \log n \le x) = \lim_{n\to\infty}\left(1 - \frac{1}{n}\exp(-x)\right)^n = \exp(-\exp(-x)).$$

This is a special case of the so-called Generalized Extreme Value distribution, with shape parameter ξ = 0.
This distribution function is also known as the Gumbel distribution.
We use the same setup as above to see whether the small-sample distribution of the maximum is well
approximated by the limit distribution.
pgumbel <- function(x){ exp(-exp(-x)) }  # Gumbel distribution function

set.seed(123456)
B <- 10000 # number of replications
n <- 10
m10 <- rep(NA,B)
for (b in 1:B){
x.b <- rexp(n,rate=1)
m10[b] <- max(x.b)
}
z10 <- m10-log(10) # should be approximately Gumbel

qq10 <- data.frame(centered.max=z10,n=10)

n <- 100
m100 <- rep(NA,B)
for (b in 1:B){
x.b <- rexp(n,rate=1)
m100[b] <- max(x.b)
}
z100 <- m100-log(100) # should be approximately Gumbel

qq100 <- data.frame(centered.max=z100,n=100)

n <- 1000
m1000 <- rep(NA,B)
for (b in 1:B){
x.b <- rexp(n,rate=1)
m1000[b] <- max(x.b)
}
z1000 <- m1000-log(1000) # should be approximately Gumbel
qq1000 <- data.frame(centered.max=z1000,n=1000)

n <- 10000
m10000 <- rep(NA,B)
for (b in 1:B){
x.b <- rexp(n,rate=1)
m10000[b] <- max(x.b)
}
z10000 <- m10000-log(10000) # should be approximately Gumbel

qq10000 <- data.frame(centered.max=z10000,n=10000)


qq <- bind_rows(qq10,qq100,qq1000,qq10000)
ggplot(qq,aes(sample=centered.max)) + stat_qq() +
facet_wrap(~n,ncol=2) + geom_abline(intercept=0,slope=1,col="grey")

[Figure: normal QQ-plots of the centered maxima, one panel per n (10, 100, 1000, 10000); theoretical quantiles on the horizontal axis, sample quantiles on the vertical axis, with the 45-degree reference line.]
Clearly, the distribution of $M_n - \log n$ is not well approximated by a standard normal distribution. However,
we derived above that it should be well approximated by a Gumbel distribution. To make the corresponding
qq-plot we need the quantiles of the Gumbel distribution: solving $p = \exp(-\exp(-x))$ for $x$ gives
$$x = -\log(-\log p)$$
as the inverse of the distribution function.


qgumbel <- function(p){
  p <- ifelse(p>0,p,1e-8)   # avoid p = 0, which would give -Inf
  -log(-log(p))
}

ggplot(qq,aes(sample=centered.max)) + stat_qq(distribution = qgumbel) +
  facet_wrap(~n,ncol=2) + geom_abline(intercept=0,slope=1,col="grey")

[Figure: Gumbel QQ-plots of the centered maxima, one panel per n (10, 100, 1000, 10000); theoretical Gumbel quantiles on the horizontal axis, sample quantiles on the vertical axis, with the 45-degree reference line.]
One could argue that the asymptotic approximation is better in this case than for the maximum likelihood
estimator above.
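One way to make this comparison concrete is to measure how far each simulated distribution is from its limit, for example via the largest distance between the empirical and the limiting distribution function. Below is a minimal sketch of such a check at n = 1000; it assumes the objects l.hat1000 (Problem 2) and m1000 (Problem 3), as well as the pgumbel function above, are still in the workspace.

# rough comparison at n = 1000: maximum absolute distance between the
# empirical CDF (evaluated at the simulated values) and the limiting CDF
z.mle <- sqrt(1000)*(l.hat1000 - 1)   # standardized ML estimates, limit N(0,1)
z.max <- m1000 - log(1000)            # centered maxima, limit Gumbel
max(abs(ecdf(z.mle)(sort(z.mle)) - pnorm(sort(z.mle))))
max(abs(ecdf(z.max)(sort(z.max)) - pgumbel(sort(z.max))))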

