Sampling
(Instructor: Nishant Panda) (Additional reference (IMC): Introducing Monte Carlo Methods with R, Christian P. Robert & George Casella, Springer)
Introduction
The core step in simulation is drawing samples from probability distributions. This will be the topic of
interest in this lecture.
How does R generate samples from a named distribution? For example, rnorm generates samples from a
normal distribution.
# Generates 10,000 samples from a normal distribution
# with mean 2 and standard deviation 0.5
norm.samples <- rnorm(10000, mean = 2, sd = 0.5)
hist(norm.samples, breaks = 21, col = "steelblue")
[Figure: Histogram of norm.samples; x-axis: norm.samples, y-axis: Frequency]
# Plot the "density"
plot(density(norm.samples))
polygon(density(norm.samples), col="steelblue", border="red")
[Figure: density.default(x = norm.samples), the estimated density of norm.samples; y-axis: Density]
For a nice overview see https://en.wikipedia.org/wiki/Monte_Carlo_method
(Home Assignment!) Let $X \sim \text{Exp}(1)$, i.e., $X$ is exponentially distributed with rate 1. Show that the C.D.F. of $X$ is
$$F(x) = 1 - e^{-x}, \quad x \ge 0.$$
Example 1: Using the inverse transform method, generate 10000 samples from the exponential
distribution with rate 1, i.e., generate samples from $X \sim \text{Exp}(1)$.
First we need the inverse function $F^{-1}$. This means that if $F(x) = y$, then we solve for $x$. From Home
Assignment 1 in this lecture, we know that $F(x) = 1 - e^{-x}$. Thus, solving for $x$,
$$y = 1 - e^{-x} \implies e^{-x} = 1 - y \implies -x = \log(1 - y) \implies x = -\log(1 - y).$$
So $F^{-1}(y) = -\log(1 - y)$ for $y \in (0, 1)$.
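In R, the derivation above becomes two steps: draw uniforms, then apply $F^{-1}$. A minimal sketch, assuming runif for the uniform draws and an arbitrary seed; the name my.exp.samples is chosen to match the plots that follow, while num.samples is our own illustrative name.

```r
# Inverse transform sampling for X ~ Exp(1):
# if U ~ U(0, 1), then X = F^{-1}(U) = -log(1 - U) has C.D.F. F(x) = 1 - exp(-x)
set.seed(1)                    # fixed seed, for reproducibility only
num.samples <- 10000
u <- runif(num.samples)        # step 1: uniform draws on (0, 1)
my.exp.samples <- -log(1 - u)  # step 2: push them through F^{-1}

# Quick checks: an Exp(1) variable has mean 1 and variance 1
mean(my.exp.samples)
var(my.exp.samples)
```

Since $1 - U$ is also $U(0, 1)$, -log(u) would work just as well; -log(1 - u) is kept here only to mirror $F^{-1}$ exactly.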
[Figure: index plot of my.exp.samples; x-axis: Index, y-axis: my.exp.samples]
Oops! What happened? This doesn't look like an exponential distribution, does it? We are plotting the samples
themselves (against their index), not their distribution! Let us look at the histogram.
# Plot histogram
hist(my.exp.samples, breaks = 21, col = "steelblue", main = "Histogram of Exp(1) samples" )
[Figure: Histogram of Exp(1) samples; x-axis: my.exp.samples, y-axis: Frequency]
# Plot density (xlim belongs to plot(), not to polygon())
plot(density(my.exp.samples), xlim = c(0.5, 5), main = "Density plot for Exp(1)")
polygon(density(my.exp.samples), col = "steelblue", border = "red")
[Figure: Density plot for Exp(1); y-axis: Density]
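# Compare with samples from R's built-in rexp; compute the density estimate once
# so that plot() and polygon() draw the same sample
rexp.density <- density(rexp(10000))
plot(rexp.density, xlim = c(0.5, 5), main = "Density plot for Exp(1) from rexp")
polygon(rexp.density, col = "steelblue", border = "red")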
[Figure: Density plot for Exp(1) from rexp; y-axis: Density]
(Home Assignment!) Let $X$ follow the logistic distribution with the C.D.F. given by
$$F(x) = \frac{1}{1 + e^{-(x - \mu)/\beta}}$$
for two parameters $\mu$ and $\beta$. Generate 1000 random samples using the inverse transform
method. Take $\mu = 5$ and $\beta = 2$.
R uses the inverse transform method to generate samples from the normal distribution. Can we use the
inverse transform method to generate samples from discrete distributions like the Binomial? Yes, we can! But
we won't delve further into this.
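For the curious, a minimal sketch of the discrete case, assuming Binomial(10, 0.3) as an arbitrary illustrative target. The C.D.F. is now a step function, so instead of solving $F(x) = y$ we return the smallest $k$ with $F(k) \ge U$; the lookup uses base R's findInterval, and the names size, prob, cdf and binom.samples are our own.

```r
# Inverse transform sampling for a discrete distribution: Binomial(10, 0.3)
# Rule: X = smallest k with F(k) >= U, where F is the binomial C.D.F.
set.seed(2)                         # fixed seed, for reproducibility only
size <- 10
prob <- 0.3
cdf <- pbinom(0:size, size, prob)   # F(0), F(1), ..., F(size); the last value is 1
u <- runif(10000)
# With left.open = TRUE, findInterval() returns the number of C.D.F. values
# strictly less than u, which equals the smallest k with F(k) >= u
binom.samples <- findInterval(u, cdf, left.open = TRUE)

# Compare empirical proportions with the exact probabilities from dbinom
emp <- as.numeric(table(factor(binom.samples, levels = 0:size))) / length(u)
round(rbind(empirical = emp, exact = dbinom(0:size, size, prob)), 3)
```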
3. Let $e(x)$ be the envelope function, an upper bound for $f$ with the property $e(x) = M g(x) \ge f(x)$ for all $x$, where $M \ge 1$ is some constant.
Goal: Draw a sample of $X$, i.e., simulate from $f$.
The accept-reject algorithm can be compactly stated as follows:
1. Generate $Y \sim g$ and $U \sim U(0, 1)$.
2. Set $X = Y$ if $U < \frac{f(Y)}{e(Y)}$,
3. else, go back to 1.
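The three steps above can be sketched as a reusable R function. This is a minimal sketch, not a base R function: the name accept.reject and its arguments (f, rg, dg, M) are our own illustrative choices, and the Beta(4, 3) call at the end reuses the density and constant worked out in Example 2.

```r
# Hypothetical helper (not part of base R): accept-reject sampler.
#   f  : target density (may be unnormalized, as long as f <= M * g)
#   rg : function(n) drawing n candidates from g
#   dg : candidate density g
#   M  : constant with f(x) <= M * g(x) for all x
accept.reject <- function(n, f, rg, dg, M) {
  out <- numeric(n)
  count <- 0
  tries <- 0
  while (count < n) {
    y <- rg(1)                      # 1. Y ~ g
    u <- runif(1)                   #    U ~ U(0, 1)
    tries <- tries + 1
    if (u < f(y) / (M * dg(y))) {   # 2. accept if U < f(Y) / e(Y)
      count <- count + 1
      out[count] <- y
    }                               # 3. otherwise go back to step 1
  }
  attr(out, "tries") <- tries       # total number of candidates drawn
  out
}

# Example: Beta(4, 3) target, U(0, 1) candidate, M = 2.0736
set.seed(3)
draws <- accept.reject(1000,
                       f  = function(x) 60 * x^3 * (1 - x)^2,
                       rg = runif,
                       dg = dunif,
                       M  = 2.0736)
mean(draws)   # Beta(4, 3) has mean 4/7, about 0.571
```

Drawing one candidate per iteration mirrors the algorithm line by line but is slow in R; drawing candidates in batches and keeping the accepted ones is the usual vectorized alternative.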
We see that picking an envelope (i.e., picking M and a candidate density) is crucial for this method to work.
Let us illustrate this with the following example.
Example 2: Generate 1000 samples from Beta(4, 3) using the accept-reject method.
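We know that if $X \sim \text{Beta}(\alpha, \beta)$, then its density $f$ is given by
$$f(x \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 \le x \le 1.$$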
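For $\alpha = 4$ and $\beta = 3$ the constant is $\Gamma(7)/(\Gamma(4)\Gamma(3)) = 6!/(3!\,2!) = 720/12 = 60$, so $f(x \mid 4, 3) = 60x^3(1-x)^2$, which is the expression used in the R code of this example. A quick check using only base R's gamma and dbeta (the names norm.const and x.grid are illustrative):

```r
# Normalizing constant of the Beta(4, 3) density:
# Gamma(7) / (Gamma(4) * Gamma(3)) = 6! / (3! * 2!) = 720 / 12 = 60
norm.const <- gamma(4 + 3) / (gamma(4) * gamma(3))
norm.const

# Compare 60 * x^3 * (1 - x)^2 with R's built-in Beta density on a grid
x.grid <- seq(0, 1, by = 0.01)
all.equal(norm.const * x.grid^3 * (1 - x.grid)^2, dbeta(x.grid, 4, 3))
```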
[Figure: plot_f, the Beta(4, 3) density; x-axis: x, y-axis: f(x | 4,3)]
Now, our candidate density function $g$ should have the same support as $f$ and should be a valid density. Let's
take $g$ to be the uniform density $U(0, 1)$.
A naive choice for the envelope $e(x) = M g(x)$ would be to take $M = \max_x f(x)$. Since $g(x) = 1$ on $[0, 1]$, this
guarantees that $e(x) \ge f(x)$. This naive choice works whenever $f$ has compact support and we can find its global
maximum. Let's find $M$. We can use the optimize function in R to find the maximum!
max_p <- optimize(function(x) 60*x^3*(1-x)^2, c(0.05, 0.95), maximum = TRUE)
round(max_p$maximum, 4)
## [1] 0.6
Thus, $M = f(0.6)$.
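As a check on the numerical optimizer, the maximum can also be found by hand. Setting the derivative of the unnormalized density to zero,
$$\frac{d}{dx}\left[x^3(1-x)^2\right] = 3x^2(1-x)^2 - 2x^3(1-x) = x^2(1-x)(3 - 5x),$$
which vanishes in the interior of $(0, 1)$ only at $x = 3/5 = 0.6$. This agrees with the Beta mode formula $(\alpha - 1)/(\alpha + \beta - 2) = 3/5$, and
$$M = f(0.6) = 60 \cdot (0.6)^3 \cdot (0.4)^2 = 60 \cdot 0.216 \cdot 0.16 = 2.0736.$$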
# Let's create our Beta(4, 3) density f
beta.f <- function(x) {
60*x^3*(1-x)^2
}
M <- beta.f(0.6)
print(M)
## [1] 2.0736
Our candidate density function is $g(x) = 1$ for any $x$ in $[0, 1]$ because we chose it to be $U(0, 1)$. Hence,
our envelope function is $e(y) = M$. Now, let us write a simple accept-reject algorithm for this example.
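num.samp <- 1000
beta.samples <- rep(0, num.samp)
count.samp <- 0
# keep running the accept-reject method until you have drawn 1000 samples
while (count.samp < num.samp) {
  # draw a candidate Y ~ g = U(0, 1) and U ~ U(0, 1)
  y <- runif(1)
  u <- runif(1)
  # accept if U < f(Y)/e(Y), where e(Y) = M
  if (u < beta.f(y) / M) {
    # increase the counter
    count.samp <- count.samp + 1
    # set X = Y
    beta.samples[count.samp] <- y
  }
}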
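Let us check if this worked.
# grid on [0, 1] for overlaying the true density
x.seq <- seq(0, 1, length.out = 200)
hist(beta.samples, prob = TRUE, ylab = "f(x)", xlab = "x", ylim = c(0, max(beta.f(x.seq))),
     main = "Histogram of draws from Beta(4,3)")
lines(x.seq, beta.f(x.seq), lty = 2, col = "red")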
[Figure: Histogram of draws from Beta(4,3) with the true density overlaid; x-axis: x, y-axis: f(x)]
Let us visualize which samples were accepted and which were rejected.
# Total number of candidate draws
num.sim <- 3000
# candidates Y ~ g = U(0, 1)
y <- runif(num.sim)
# e is just M, so ue is u * M
ue <- M * runif(num.sim)
# put them in a data frame
mat <- cbind(y, ue)
plot_data <- data.frame(mat)
colnames(plot_data) <- c("y", "ue")
# logical variable recording which candidates were accepted
plot_data$Accepted <- ue < beta.f(y)
[Figure: scatter plot of ue against y for the candidates, colored by Accepted (FALSE/TRUE); x-axis: y, y-axis: ue]
Let us check what proportion of the samples were accepted:
round(sum(plot_data$Accepted)/nrow(plot_data),2)
## [1] 0.48
round((1/M),2)
## [1] 0.48
This is not a coincidence! $1/M$ is the expected proportion of candidates that are accepted. If we find a smaller
$M$ (i.e., a tighter envelope), we will on average have fewer rejections.
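Why $1/M$? A short derivation, assuming $f$ is a normalized density (true for the Beta(4, 3) density here). Conditioning on $Y = y$, the acceptance probability is $f(y)/(M g(y))$, which is at most 1 because $f \le M g$. Averaging over $Y \sim g$,
$$P(\text{accept}) = \int \frac{f(y)}{M g(y)}\, g(y)\, dy = \frac{1}{M}\int f(y)\, dy = \frac{1}{M}.$$
Each candidate is accepted independently with probability $1/M$, so the number of candidates needed per accepted sample is geometric with mean $M$; here that is about $2.07$ candidates per Beta(4, 3) draw.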
A good envelope should have the following properties:
1. It exceeds the target $f$ everywhere.
2. Its candidate density $g$ is easy to sample from.
3. It generates few rejections.
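If $f$ is supported on a bounded interval $[a, b]$, then a naive but often good choice is the constant envelope $e(x) = \max_x f(x)$,
i.e., we take the candidate density to be uniform $U(a, b)$, so $g(x) = \frac{1}{b - a}$, and $M = (b - a)\max_x f(x)$.
Will this method work if $f$ is the $N(\mu, \sigma^2)$ density?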
Use the Accept-Reject algorithm to sample from $f$ by filling out the following steps:
1. We don't need to know the normalizing constant $c$ to sample from $f$! We can disregard $c$ in this question! The support of $f$ is $(-\infty, \infty)$. Take $g(x)$ to be the standard normal density.
2. Find a suitable $M$ (Hint: use optimize!).
3. Plot $M g(x)$ and $f$ (without the $c$).
4. Generate 1000 samples from $f$ (without the $c$!).
5. Mimic the plot in this lecture to show the accepted and rejected samples.