
Lecture 10: Monte Carlo Simulations

10.1 Recommended textbook chapters for this section


• Frenkel and Smit Chapter 3

10.2 Topics in this lecture


• Sampling from a canonical ensemble

• Stochastic sampling of determinate equations

• Importance sampling

10.3 Announcements
• Exam 1 Thursday from 6-7:30 PM in EH 3024. Bring equation sheet, no calculator.

• Simulation project to be assigned Thursday

10.4 Interactions in molecular simulations


10.4.1 Non-bonded potentials
We ended the last lecture by introducing pairwise non-bonded interactions between particles that
are commonly used in molecular simulations. The most common non-bonded interaction in molec-
ular simulations is the Lennard-Jones potential, or the 12-6 potential, which is typically written
as:

" 12  6 #
σij σij
ELJ (rij ) = 4ij − (10.1)
rij rij

$r_{ij}$ is the (scalar) distance between particles $i$ and $j$, $\epsilon_{ij}$ is a characteristic interaction energy, and $\sigma_{ij}$ is a characteristic interaction length scale describing the approximate diameter of a particle. The Lennard-Jones potential is broken into an attractive interaction that scales with $r^{-6}$ and a repulsive potential that scales with $r^{-12}$. The attractive potential represents three contributions to typical van der Waals interactions that are all attractive and scale with $r^{-6}$: London dispersion forces, which are related to interactions between instantaneous dipoles that arise from quantum mechanical considerations; dipole-induced dipole interactions, which are related to attractions between dipoles
on a molecule and induced dipoles that arise from the polarizability of a different molecule; and Keesom interactions, which emerge from the orientation dependence of dipole-dipole interactions. In general, we do not attempt to divide the LJ potential into contributions from these three distinct interactions, but rather empirically identify parameters for $\epsilon$ and $\sigma$ that capture all three effects. The repulsive potential represents Pauli exclusion, which acts to ensure that particle wave functions do not overlap. There is no single scaling relation for Pauli exclusion other than that it must be a strongly repulsive force, so $r^{-12}$ is chosen for computational convenience: the calculated $r^{-6}$ term can simply be squared.
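
As a concrete illustration, eq. (10.1) takes only a few lines of Python. This is a minimal sketch in reduced units (the function name and default parameters are our own, not from any particular simulation package), and it uses the trick just described of squaring the $r^{-6}$ term:

def lennard_jones(r_ij, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 pair energy, eq. (10.1), in reduced units."""
    sr6 = (sigma / r_ij) ** 6                  # the attractive r^-6 term
    return 4.0 * epsilon * (sr6 * sr6 - sr6)   # the r^-12 term is the square of sr6

# Sanity check: the minimum lies at r = 2^(1/6) sigma with depth -epsilon
print(lennard_jones(2.0 ** (1.0 / 6.0)))       # prints -1.0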
In addition to van der Waals interactions, it is typical to assign charges (or partial charges, to account for the unequal distribution of electrons throughout a molecule that leads to dipoles) to atoms or particles in a system. These charges interact via a long-range Coulombic potential:
E_{\mathrm{coulomb}}(r_{ij}) = \frac{1}{4\pi\epsilon_0\epsilon_r}\frac{q_i q_j}{r_{ij}}    (10.2)

where $\epsilon_0$ is the permittivity of free space, $\epsilon_r$ is the relative dielectric constant (1 in vacuum, 2-4 in oil, 80 in water), and $q_i$ is the charge on particle $i$. In practice, Coulombic interactions are difficult to calculate in simulations with periodic boundary conditions because they decay slowly, and the minimum image convention severely underestimates the total magnitude of electrostatic interactions. Instead, advanced techniques, such as Ewald summations, have been derived to handle their calculation. Such methods are outside the scope of this discussion but are covered in the Frenkel and Smit textbook if you would like to review them on your own.
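
For reference, a direct evaluation of eq. (10.2) for a single pair is straightforward; the sketch below (an illustrative function of our own, in SI units) is exactly the naive form whose slow $r^{-1}$ decay makes Ewald-type methods necessary in periodic systems:

import math

EPS0 = 8.8541878128e-12   # permittivity of free space, F/m

def coulomb(r_ij, q_i, q_j, eps_r=1.0):
    """Coulomb pair energy, eq. (10.2), in joules; charges in coulombs,
    distance in meters, eps_r the relative dielectric constant."""
    return q_i * q_j / (4.0 * math.pi * EPS0 * eps_r * r_ij)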

Many other non-bonded interactions are possible and in common use, but these represent the
two most common functional forms used in most atomistic simulations.

10.4.2 Interactions in MC simulations


The interactions described above are given by potentials that are continuous and differentiable, and thus can be used in molecular dynamics simulations. However, Monte Carlo simulations can use any potential energy function regardless of whether it is differentiable, expanding the repertoire of possible interactions. Some examples of interactions used in MC simulations are noted here. The first is a hard-sphere interaction, where particles are strictly not allowed to overlap (akin to an infinitely strong version of the repulsive part of the Lennard-Jones interaction):
E_{\mathrm{hard}}(r_{ij}) = \begin{cases} \infty, & r_{ij} < \sigma \\ 0, & r_{ij} \geq \sigma \end{cases}    (10.3)

2
University of Wisconsin-Madison Lecture 10
CBE 710, Fall 2019 - Prof. R. C. Van Lehn October 8, 2019

Another example is the interaction between nearest neighbors used in the Ising model, which can be easily represented in MC simulations:

E_{\mathrm{Ising}}(r_{ij}) = \begin{cases} -J, & \text{if } i, j \text{ are neighbors} \\ 0, & \text{otherwise} \end{cases}    (10.4)
In principle, many other interactions could be defined; here we include only a subset that are commonly found in the simulation literature and map directly to many experimental problems.
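
Because MC needs only energies, not forces, both discontinuous potentials above are trivial to code. A minimal sketch (illustrative functions of our own; the Ising case assumes sites on a 1D lattice indexed by integers):

def hard_sphere(r_ij, sigma=1.0):
    """Hard-sphere pair energy, eq. (10.3): infinite on overlap, zero otherwise."""
    return float("inf") if r_ij < sigma else 0.0

def ising_pair(i, j, J=1.0):
    """Nearest-neighbor pair energy, eq. (10.4), for lattice sites i and j."""
    return -J if abs(i - j) == 1 else 0.0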

10.4.3 Periodic boundary conditions


A challenge in molecular simulations is defining a representative system volume, $V$, because we typically simulate very small volumes relative to physical systems (i.e., box lengths on the order of 10-100 nm in an atomistic or coarse-grained system). In such small boxes, the interface, or box walls, constitutes a substantial fraction of the system, and hence there must be some method for dealing with the behavior of molecules near the walls. If we do not treat the boundaries differently - i.e., if we have a free boundary - then molecules at the edge of the box will interact with vacuum, which may not be intended.

While the volume of a macroscopic system is clearly inaccessible to atomistic simulations, it is still possible to define a simulation volume in which we sample the same ensemble quantities as at the macroscale. We define the simulation box by three box vectors, $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, and by three angles between these vectors, $\alpha$, $\beta$, and $\gamma$. While these parameters can vary, for our purposes we will assume that the simulation box is cubic, i.e. $a = b = c = L$ and $\alpha = \beta = \gamma = 90^\circ$. The most
common way of treating the boundary of a simulation box is by imposing periodic boundary
conditions (PBCs), in which one side of the simulation box is connected to the opposite side
so that the simulation box is effectively infinite. Using PBCs is akin to using an infinite set of
equivalent simulation boxes that are offset by a box length L in a given Cartesian direction from
the “actual” simulation box. Each of these equivalent simulation boxes has equivalent particles as well, with their positions again offset by $L$ - the particle located at position $\mathbf{r}(x, y, z)$ in the simulation box is mirrored across all of these simulation boxes such that it is the same as the particle at positions $\mathbf{r}(x + L, y, z)$, $\mathbf{r}(x, y + L, z)$, $\mathbf{r}(x, y, z + L)$, and so on. This treatment is analogous to defining the simulation box as a unit cell in an infinitely periodic material: if a particle leaves the box along the negative x-axis, a particle from one of the adjacent simulation boxes immediately re-enters the box along the positive x-axis. Note that the boundaries of the simulation box themselves have no particular significance. The system is also translationally invariant: if all particle positions are shifted uniformly, no system properties change. If particles interact via pairwise intermolecular potentials (see above), periodic boundary conditions allow us to write:


E(\mathbf{r}^N) = \frac{1}{2}\sum_i^N \sum_j^N E(|\mathbf{r}_{ij}|) \qquad \text{with free boundaries}    (10.5)

E(\mathbf{r}^N) = \frac{1}{2}\sum_{\mathbf{n}} \sum_i^N \sum_j^N E(|\mathbf{r}_{ij} + \mathbf{n}L|) \qquad \text{with periodic boundaries}    (10.6)

Here, $\mathbf{r}^N$ is a vector representing the set of all $N$ particle positions, $\mathbf{r}_{ij}$ is the (vector) distance between particles $i$ and $j$, $\mathbf{n}$ is a vector of 3 arbitrary integers, and $L$ is the size of the box. The factor of 1/2 eliminates overcounting of pairs of atoms. In other words, each particle can interact with every possible periodic image since the system is infinite. This may not be desirable, so typically interactions use the minimum image convention, meaning that the distance used in computing pair potentials is the shortest possible distance between two particles, taking into account periodic boundary conditions. Under this convention, the separation in any one dimension of the box can never exceed $L/2$.
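
For a cubic box, the minimum image convention can be implemented by wrapping each component of the separation vector into $[-L/2, L/2]$; the sketch below (a standard idiom, with names of our own choosing) assumes positions stored as NumPy arrays:

import numpy as np

def minimum_image_distance(r_i, r_j, L):
    """Shortest scalar distance between particles i and j in a cubic box of side L."""
    d = r_i - r_j
    d -= L * np.round(d / L)    # wrap each component into [-L/2, L/2]
    return np.linalg.norm(d)

# Particles near opposite walls are close through the periodic boundary:
r_a = np.array([0.1, 0.0, 0.0])
r_b = np.array([9.9, 0.0, 0.0])
print(minimum_image_distance(r_a, r_b, L=10.0))   # 0.2, not 9.8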
In practice, PBCs are used for the vast majority of simulations, but you must be careful that
they do not introduce artifacts. If the value of L is small relative to long-wavelength modes of
the system, for example, then the presence of PBCs could limit these modes. For example, lipid
bilayers, while largely planar, undulate out-of-plane over length scales of hundreds of nanometers;
these undulations are often damped in simulations that are too small to properly capture them.

10.5 Sampling from a canonical ensemble


In the last lecture, we introduced two types of simulation methodologies: molecular dynamics simu-
lations, in which statistical quantities are computed by evolving the positions of a series of particles
in time; and Monte Carlo simulations, in which statistical quantities are computed by sampling a
thermodynamically relevant probability distribution. We then discussed common simulation rep-
resentations of particles, boundary conditions, and interactions between particles. In this lecture,
we will build upon these fundamentals to describe the principles of Monte Carlo simulations.
In general, the goal of a simulation might be to estimate the ensemble average of a quantity
which will be denoted as Y . The ensemble-average value of Y can be obtained readily if we know
the partition function for the system. Recall from our study of the Ising model that a general
expression for the partition function of a system with N non-independent particles is:

Z = \sum_{\mathbf{r}_1}\sum_{\mathbf{r}_2}\sum_{\mathbf{r}_3}\cdots\sum_{\mathbf{r}_N} e^{-\beta E(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots, \mathbf{r}_N)}    (10.7)

In this notation, each sum accounts for all possible positions ($\mathbf{r}_i$) of one of the particles, and the energy of each configuration is a function of all particle positions. The bold notation indicates that $\mathbf{r}_i$ is a vector; in this case a vector with 3 coordinates referring to the x/y/z positions of the
particle. We cannot factorize this partition function because the particles are interacting and hence
a single-particle partition function cannot be written without knowledge of the positions/states of
the other particles. If we now assume that particle positions are continuous, rather than discrete,
we can transform our sums to integrals in the classical limit and write the expression as:


Z = \int_{\mathbf{r}_1}\int_{\mathbf{r}_2}\int_{\mathbf{r}_3}\cdots\int_{\mathbf{r}_N} d\mathbf{r}_1\, d\mathbf{r}_2\, d\mathbf{r}_3 \ldots d\mathbf{r}_N\, e^{-\beta E(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots, \mathbf{r}_N)}    (10.8)

\equiv \int_{V^N} d\mathbf{r}^N \exp\left[-\beta E(\mathbf{r}^N)\right]    (10.9)
Here, each integral runs over some volume accessible to each particle in the system since the particle positions have units of length. We simplify the notation by writing the integral over a single vector, $\mathbf{r}^N$, which contains the positions of all $N$ particles; in three dimensions, this is then a vector with $3N$ coordinates, and integrating over all possible positions is equivalent to integrating over a $3N$-dimensional volume $V^N$, which we call the volume of phase space accessible to the $N$ particles (often this partition function is written with a normalizing prefactor with units of $1/\mathrm{volume}^N$ to ensure that the partition function is unitless; we omit this prefactor here).
With this new notation, the ensemble average in the classical canonical ensemble (with $NVT$ fixed) is given as:
\langle Y \rangle = \frac{\int d\mathbf{r}^N \exp\left[-\beta E(\mathbf{r}^N)\right] Y(\mathbf{r}^N)}{\int d\mathbf{r}^N \exp\left[-\beta E(\mathbf{r}^N)\right]}    (10.10)
This notation mirrors the notation used to sum over all states accessible to a system using a
discrete partition function, but here the sum is replaced by an integral over phase space. It is only
a notation change, and conceptually the quantities are the same.
In principle, the integrals in eq. (10.10) could be calculated in a brute-force manner by determining the value of $Y(\mathbf{r}^N)$ for every set of particle coordinates and integrating numerically. However, such an approach would be computationally impossible because the number of system configurations becomes effectively infinite for even a small number of particles. Moreover, the vast majority of system configurations would likely have a high energy, $E(\mathbf{r}^N) \gg k_B T$, and as a result the Boltzmann factor for most configurations would be effectively zero. In other words, a large portion of the phase space ($V^N$) possible for a simulation will be inaccessible due to its high energy - those configurations are vanishingly unlikely. Performing such a calculation would thus be not only nearly impossible, but also highly inefficient. Finally, notice that to calculate $\langle Y \rangle$ it is not necessary to compute the integrals in both the numerator and denominator of eq. (10.10); only their ratio must be determined. This observation will form the basis of the Metropolis Monte Carlo algorithm. We will now describe Monte Carlo sampling in general, then discuss the Metropolis algorithm.

10.6 Stochastic sampling of determinate equations


The main idea of the Monte Carlo method is the following: a determinate mathematical expression (like the integrals in the equations for the canonical ensemble) is reformulated as a probabilistic analogue and then solved by stochastic sampling. Consider evaluating a quantity $F$ defined as the integral of some function $f(x)$ over the interval $[a, b]$:
F = \int_a^b f(x)\, dx    (10.11)
To follow the idea of equating a determinate function (the integral) with a stochastic solution
method, we can define an arbitrary probability density function, p(x), which represents the prob-
ability of finding any particular value of x. We assume that this probability density is known for
the function of interest. We can then write:


F = \int_a^b f(x)\, dx    (10.12)

= \int_a^b \left[\frac{f(x)}{p(x)}\right] p(x)\, dx    (10.13)

= \left\langle \frac{f(x)}{p(x)} \right\rangle    (10.14)

This expression is the ensemble average of an observable, but in the continuum limit - in other words, this is the continuum version of the expression $\langle Y \rangle = \sum_i p_i Y_i$, where we have replaced the summation (for discrete states) with an integral and the observable we are computing is $\frac{f(x)}{p(x)}$.

Now, we can calculate the average value of $\frac{f(x)}{p(x)}$ by randomly selecting points within the interval $[a, b]$ according to the probability distribution $p(x)$ and calculating $\frac{f(x)}{p(x)}$ for each randomly selected value of $x$. This stochastic sampling of $x$ from all possible values defines the Monte Carlo method. If we have an infinite number of trials, then each value of $x$ will be sampled exactly according to $p(x)$, and the average of $\frac{f(x)}{p(x)}$ computed from the infinite number of trials will be exactly equal to the value of the integral above. We can thus approximate $F$ by:


F = \left\langle \frac{f(x)}{p(x)} \right\rangle    (10.15)

\approx \left\langle \frac{f(x)}{p(x)} \right\rangle_{\mathrm{trials}}    (10.16)

= \frac{1}{\tau} \sum_i^{\tau} \frac{f(x_i)}{p(x_i)}    (10.17)

where for each of the $\tau$ trials (samples), $x_i$ is chosen according to the probability $p(x)$.
Let’s consider a simple example of how we might apply this idea. First, we will choose p(x) to
be a uniform probability density:
p(x) = \frac{1}{b - a} \quad \text{for } a \leq x \leq b    (10.18)
Then, we can generally approximate F as:
F \approx \frac{b - a}{\tau} \sum_i^{\tau} f(x_i)    (10.19)

This expression approximates $F$ by randomly, uniformly sampling values of $x$ between $a$ and $b$, calculating $f(x)$, and taking the average, with larger values of $\tau$ increasing the accuracy of our approximation. Of course, for many one-dimensional functions this may not be particularly efficient relative to simply performing numerical quadrature, but in an $N$-dimensional space, such as the configurational space of the partition function, this type of methodology can be efficient.
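
As a toy demonstration of eq. (10.19), the sketch below estimates $\int_0^\pi \sin x\, dx = 2$ by uniform sampling (an illustrative one-dimensional example of our own choosing, not one from the lecture):

import numpy as np

rng = np.random.default_rng(0)

def mc_integrate_uniform(f, a, b, n_trials=100_000):
    """Estimate F = int_a^b f(x) dx by uniform sampling, eq. (10.19)."""
    x = rng.uniform(a, b, n_trials)     # each x drawn from p(x) = 1/(b - a)
    return (b - a) * np.mean(f(x))

print(mc_integrate_uniform(np.sin, 0.0, np.pi))   # ~2.0; accuracy improves with n_trials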
Let’s consider applying this uniform sampling methodology to the calculation of the canonical
partition function. We could write:


Z = \int_{V^N} d\mathbf{r}^N \exp\left[-\beta E(\mathbf{r}^N)\right]    (10.20)

\approx \frac{V^N}{\tau} \sum_i^{\tau} \exp\left[-\beta E(\mathbf{r}_i^N)\right]    (10.21)

$V^N$ is again the volume of phase space, which is the $3N$-dimensional analogue to the interval $[a, b]$; $\tau$ is the total number of samples used for the approximation, and $E(\mathbf{r}_i^N)$ is the potential energy of the system for the specific configuration denoted by $i$. There are two major problems with this approach in practice. First, it is difficult to estimate the total phase space volume, $V^N$. This problem can be avoided, however, by recognizing that calculating the ensemble-average value of an observable requires only the ratio of two quantities within the phase space $V^N$. So we can write:

\langle Y \rangle_{NVT} \approx \frac{\frac{V^N}{\tau} \sum_i^{\tau} Y(\mathbf{r}_i^N) \exp\left[-\beta E(\mathbf{r}_i^N)\right]}{\frac{V^N}{\tau} \sum_i^{\tau} \exp\left[-\beta E(\mathbf{r}_i^N)\right]}    (10.22)

\approx \frac{\sum_i^{\tau} Y(\mathbf{r}_i^N) \exp\left[-\beta E(\mathbf{r}_i^N)\right]}{\sum_i^{\tau} \exp\left[-\beta E(\mathbf{r}_i^N)\right]}    (10.23)

We use the notation $\langle Y \rangle_{NVT}$ as a reminder that the ensemble-average value of $Y$ is calculated in the canonical ensemble. Due to the ratio, the term involving the total phase space drops out. But we are left with a second problem, which is that the vast majority of configurations in most systems will have a near-zero contribution to the ensemble average, since the Boltzmann weight $\exp\left[-\beta E(\mathbf{r}_i^N)\right] \approx 0$ for any configuration with an unfavorable energy. For example, consider sampling configurations from a set of hard spheres representing a fluid - any configuration in which there is even slight overlap between spheres has an infinite system energy, and the corresponding Boltzmann weight would be zero. Randomly selecting particle positions would thus lead to the vast majority of configurations not contributing to the average, inhibiting an accurate calculation. Instead, we would like to perform importance sampling, examining only configurations with finite contributions to the ensemble average.
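
The severity of this problem is easy to demonstrate numerically. The sketch below (illustrative parameters of our own; overlaps are checked without the minimum image convention for brevity) inserts hard spheres uniformly at random and counts the fraction of configurations with a nonzero Boltzmann weight:

import numpy as np

rng = np.random.default_rng(0)

def no_overlap(positions, sigma=1.0):
    """True if no pair of spheres of diameter sigma overlaps."""
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) < sigma:
                return False
    return True

# Fraction of random placements of N spheres in an L^3 box with no overlaps
N, L, trials = 20, 5.0, 10_000
hits = sum(no_overlap(rng.uniform(0.0, L, size=(N, 3))) for _ in range(trials))
print(hits / trials)   # a tiny fraction, vanishing rapidly as density increases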

10.7 Importance sampling and Markov chains


To evaluate ensemble averages from the canonical ensemble, we can use eq. (10.17) but with the probability density, $p(x)$, chosen to maximize the likelihood of sampling configurations that contribute meaningfully to the calculation of the ensemble average. This is the essence of importance
sampling. We can perform importance sampling for configurations in the canonical ensemble by
recalling that the probability of finding the system in a given microstate of the canonical ensemble,
$p(\mathbf{r}^N)_{NVT}$, is related to the Boltzmann factor for that state normalized by the partition function.
We can then write:

p(\mathbf{r}^N)_{NVT} = \frac{\exp\left[-\beta E(\mathbf{r}^N)\right]}{Z}    (10.24)

\langle Y \rangle_{NVT} = \int d\mathbf{r}^N\, p(\mathbf{r}^N)_{NVT}\, Y(\mathbf{r}^N)    (10.25)

The ensemble average then has the same form as $F = \int_a^b f(x)\, dx$ if we let $x = \mathbf{r}^N$ and $f(x) = p(\mathbf{r}^N)_{NVT}\, Y(\mathbf{r}^N)$. Following the reasoning above, we can then approximate $\langle Y \rangle_{NVT}$ by:
\langle Y \rangle_{NVT} \approx \left\langle \frac{f(x)}{p(x)} \right\rangle_{\mathrm{trials}}    (10.26)

= \left\langle \frac{p(\mathbf{r}^N)_{NVT}\, Y(\mathbf{r}^N)}{p(\mathbf{r}^N)} \right\rangle_{\mathrm{trials}}    (10.27)

From this expression, we see that if we select trials according to the probability distribution $p(\mathbf{r}^N) = p(\mathbf{r}^N)_{NVT}$, then we get:

\langle Y \rangle_{NVT} = \langle Y \rangle_{\mathrm{trials}}    (10.28)
Thus, if we choose configurations according to the canonical ensemble probability distribution, the ensemble-average value of $Y$ can be estimated simply by averaging $Y$ over configurations sampled according to their Boltzmann weight. Finally, we note that we just need to know the probability of sampling a configuration - we do not necessarily need an expression for the partition function itself. In principle, we could choose another probability distribution $p(x)$ from which to sample states, but the choice $p(\mathbf{r}^N) = p(\mathbf{r}^N)_{NVT}$ is the simplest.
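
As a one-dimensional analogue (a toy example of our own, not the canonical-ensemble case itself), consider $F = \int_0^\infty x e^{-x}\, dx = 1$. Choosing $p(x) = e^{-x}$ concentrates samples where the exponential weight is significant, and the estimator of eq. (10.17) reduces to averaging $f(x)/p(x) = x$:

import numpy as np

rng = np.random.default_rng(0)

# Importance sampling: draw x from p(x) = exp(-x), average f(x)/p(x) = x
x = rng.exponential(scale=1.0, size=100_000)
print(np.mean(x))    # ~1.0; almost no samples land where the weight is negligible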
Our problem then boils down to: how do we select states according to the correct probability
distribution without knowing the value of the partition function? To do so, we will generate a
Markov chain of states as a means of sampling our distribution. A Markov chain refers to a
sequence of states (i.e. configurations or trials using our previous nomenclature) that satisfy the
following two conditions:

• Each state generated belongs to a finite set of possible outcomes called the state space. The statistical mechanical analogue to this statement is to say that each microstate generated belongs to a finite ensemble. We can denote the possible states by $\mathbf{r}_1^N, \mathbf{r}_2^N, \mathbf{r}_3^N, \ldots$ for the enormous set of possible microstates within the canonical ensemble that we are sampling. For the canonical ensemble, this state space is equal to $V^N$, the accessible phase space.

• The probability of sampling state i + 1 in the sequence of states sampled depends only on
state i, and not on previous states in the chain.

Since the likelihood of sampling a new state depends only on the current state, we can define a transition probability, $\Pi(m \to n)$, which defines the likelihood of transitioning from
state $m$ to state $n$. We can then imagine an algorithm in which we start in some state $m$, then transition to a new state $n$ with a probability given by $\Pi(m \to n)$, and repeat this for a large number
of trials. If we do this an infinite number of times, then the state m will appear with an overall
probability given by p(m), where p(m) is the limiting probability distribution that does not depend
on any of the other states (unlike the transition probability). When sampling from the canonical
ensemble, then, we want $p(m)$ to equal $p(\mathbf{r}_m^N)_{NVT}$ - that is, the likelihood of sampling state $m$ if
we take enough states from our Markov chain is equal to the probability of sampling that state
according to the canonical ensemble distribution function. Thus, we need to find an expression for
the transition probability, Π, that yields this correct limiting distribution. We will return to this
problem in the next lecture.
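
In the meantime, the idea of a limiting distribution can be previewed numerically. For a toy two-state chain (an illustrative transition matrix of our own choosing), repeatedly sampling transitions according to $\Pi(m \to n)$ produces visit frequencies that converge to $p(m)$ regardless of the starting state:

import numpy as np

rng = np.random.default_rng(0)

Pi = np.array([[0.9, 0.1],    # Pi[m, n] = probability of transitioning m -> n
               [0.3, 0.7]])

state, counts = 0, np.zeros(2)
for _ in range(100_000):
    state = rng.choice(2, p=Pi[state])   # next state sampled from row `state`
    counts[state] += 1

print(counts / counts.sum())   # ~[0.75, 0.25], the limiting distribution p(m)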
