MS Thesis On Benchmark Test Set For Free Energy Calculations Using Molecular Simulations
Himanshu Paliwal
University of Virginia
July, 2011
APPROVAL SHEET
This thesis is submitted in partial fulfillment of the
requirements for the degree of
Master of Science (Chemical Engineering)
______________________________________________________
Author
This thesis has been read and approved by the Examining Committee:
______________________________________________________
Michael R. Shirts, Thesis Advisor
______________________________________________________
John O'Connell, Committee Chairperson
______________________________________________________
Erik J. Fernandez, Committee Member
______________________________________________________
Kathryn Thornton, Dean, School of Engineering and Applied Science
July, 2011
ABSTRACT
CONTENTS
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
1. Introduction
2. Molecular systems of benchmark set
2.1 Minimal test system
2.2 Charge mutation system
2.3 Large molecule mutation system
3. Comparing free energy methods based on statistical tests, using benchmark set
3.1 Free energy methods and error propagation
3.1.1 Thermodynamic integration
3.1.2 Exponential averaging
3.1.3 Bennett acceptance ratio
3.1.4 Unoptimized Bennett acceptance ratio
3.1.5 Range-based Bennett acceptance ratio
3.1.6 Multistate Bennett acceptance ratio
3.1.7 Gaussian estimate of exponential averaging
3.2 Building molecular systems, setting up simulation framework and designing statistical experiments
(A) System preparation and simulation parameters
3.2.1 λ values and spacing between intermediate states for free energy calculations
3.2.1.1 Full set
3.2.1.2 Sparse set
3.2.2 Generating an ensemble of uncorrelated configurations
(B) Statistical tests
3.2.3 Quantifying accuracy and precision in uncertainty estimate of an estimator
3.2.3.1 Sample standard deviation
3.2.3.2 Analytical estimate
3.2.3.3 Bootstrap estimate
3.2.4 Quantifying bias of free energy estimates
3.2.4.1 Bias due to number of samples
3.2.4.2 Bias due to number of intermediate states
3.2.5 Quantifying reliability of a free energy estimator
3.2.6 Validating the Gaussian distributions of the free energy differences
3.3 Results and discussion
3.3.1 Validation of uncertainty estimates
3.3.2 Analysis of bias in free energy estimates
3.3.3 Overall reliability of a free energy estimator
3.3.4 Testing the Gaussian distribution of free energies
3.3.5 Convergence properties of free energy estimates
3.4 Conclusions
ACKNOWLEDGEMENTS
I would like to express my heartfelt gratitude to my advisor Prof. Michael R. Shirts.
Without his help, support and vision, this work would not have been possible. I would
like to thank my lab members, Kai, Christoph, Arjan, Joe, Tri and Jon. Their comments
and discussions have contributed in bringing this effort to its final form.
I thank UVA ITC and UVA XCG for their computing resources support. I thank Karolina
from the UVA Computer Science department. Her help during the beta test of the cross-campus
grid gave me ideas about setting up the framework for running high-throughput
simulations. I thank James Watney at D. E. Shaw Research for significant help in running
Desmond.
Last but not least, I thank my parents, my family and my wife Priya for all the moral
support they have provided me throughout the work.
LIST OF FIGURES
Figure 1. (a) In the coupled state or solvated state both intermolecular and intramolecular interactions for anthracene are turned on. (b) In the decoupled state or vacuum state the intermolecular interactions with water molecules are turned off.
Figure 3. (a) Gaussians are plotted on mutually perpendicular axes. Both have the ΔG calculated using a method (here TI for methane solvation) as their mean and RMSE1 and RMSE2 as their standard deviations. (b) These are fused to generate a bivariate Gaussian plot. (c) Top view of the bivariate Gaussian plot.
Figure 6. Bias plots for different test cases for the full set. DEXP, TI and TI3 show numerically significant bias for methane, while all methods show moderate bias with respect to the number of states for anthracene.
Figure 7. Bias plots for different test cases for the sparse set. DEXP and IEXP show large biases both due to the number of samples and due to the number of intermediate states.
Figure 8. Bivariate Gaussian plots for UA methane solvation. Note how TI and TI3 fail the reliability test for the sparse set. GDEL and GINS are not at all reliable. MBAR and BAR are accurate, but the estimate of the precision in BAR is misleading as it underestimates uncertainty, as shown in Figures 4 and 5. SP after the method name indicates the results of the sparse set.
Figure 9. Bivariate Gaussian plots for dipole inversion. TI is reliable for this molecule, but DEXP, IEXP, GDEL and GINS again are the least accurate and precise of all the methods studied.
Figure 10. Bivariate Gaussian plots for anthracene hydration free energy. The effect of bias due to intermediate states is evident in almost all methods, as all spreads are elliptical in the vertical direction. TI3 and MBAR both appear reliable, especially for the full set.
Figure 11. Each subplot compares the distribution of estimated free energies from 100 repetitions (in blue), a Gaussian with the mean ΔG and standard deviation σ(ΔG) from 100 repetitions (in black), a Gaussian with the mean (ΔG)450ns and standard deviation σ((ΔG)450ns) (in magenta), and a Gaussian with the mean (ΔG)51states and standard deviation σ((ΔG)51states) (in cyan) for the full set for methane solvation.
Figure 12. Each subplot compares the distribution of estimated free energies from 100 repetitions (in blue), a Gaussian with the mean ΔG and standard deviation σ(ΔG) from 100 repetitions (in black), a Gaussian with the mean (ΔG)450ns and standard deviation σ((ΔG)450ns) (in magenta), and a Gaussian with the mean (ΔG)51states and standard deviation σ((ΔG)51states) (in cyan) for the sparse set for methane solvation.
Figure 13. Free energies estimated using a large number of intermediate states and a large number of samples converge to a single value for all free energy estimators.
Figure 14. Accuracy in methane solvation free energy estimates as a function of different PME parameters.
Figure 15. Inaccuracy in free energy estimates increases with increasing phase space overlap.
Figure 18. Simulation time for methane solvation as a function of different PME parameters.
Figure 19. Subplot at row 2, column 2 in Figure 14 is expanded to show how ΔG estimates of methane solvation converge for different Fourier spacings and cutoffs.
Figure 20. Subplot at row 2, column 2 in Figure 17 is expanded to show how ΔHvap estimates converge for different Fourier spacings and cutoffs.
Figure 21. Subplot at row 2, column 2 in Figure 18 is expanded to show how simulation time for methane solvation varies for different Fourier spacings and cutoffs.
Figure S 1. Plots comparing the distribution of free energies with the Gaussians for the full set for dipole inversion.
Figure S 2. Plots comparing the distribution of free energies with the Gaussians for the sparse set for dipole inversion.
Figure S 12. Simulation time for anthracene solvation as a function of different PME parameters.
Figure S 13. Subplot at row 2, column 2 in Figure S10 is expanded to show how ΔG estimates of anthracene solvation vary for different Fourier spacings and cutoffs.
Figure S 14. Subplot at row 2, column 2 in Figure S12 is expanded to show how simulation time for anthracene solvation varies for different Fourier spacings and cutoffs.
LIST OF TABLES
Table 3. Free energy estimates and corresponding uncertainty estimates for a large number of samples (450 ns) and a large number of intermediate states (51 states) for UA methane solvation. Bootstrap estimates are reported as they are better than analytical estimates. All quantities are in kJ/mol.
Table 4. Bias estimates due to the number of samples and the number of lambda states for the full and sparse sets for UA methane solvation. All quantities are in kJ/mol.
Table 6. Time consumed by different methods to calculate the anthracene solvation free energy 201 times.
Table 12. Comparison of lead parameter sets with the parameter set used in the simulations done to test free energy estimators for methane solvation.
Table S 5. Free energies estimated using a large number of samples and a large number of intermediate states for dipole inversion.
Table S 6. Bias in free energy estimates due to the number of samples and the number of intermediate states for dipole inversion. None of the methods shows a significantly large bias for this molecular test set.
Table S 7. RMSEs and statistical uncertainties in RMSEs in the free energy change of dipole inversion for IEXP, DEXP, UBAR, GDEL, and GINS are significantly higher (indicating lower reliability) than for the other methods, specifically in the sparse set.
Table S 8. Free energies estimated using a large number of samples and a large number of intermediate states for anthracene hydration free energy.
Table S 9. GDEL and GINS have the largest bias in free energy estimates for anthracene solvation due to the number of samples and the number of intermediate states, even for the full set. For the sparse set, GDEL and GINS have even larger bias than for the full set. DEXP and IEXP also show significantly large biases in the sparse set.
Table S 10. High RMSEs and statistical uncertainties in RMSEs for GDEL and GINS in both the full and the sparse set indicate low reliability. DEXP, IEXP, and UBAR also become unreliable in the sparse set.
Table A 1. Single point simulation parameters for MD_sim_parm energies for methane solvation, dipole inversion, and anthracene solvation. The second set of parameters for the number of grid points is for dipole inversion, which had a larger box.
Table A 2. Single point simulation parameters for high cutoff energies for methane solvation and anthracene solvation.
Table A 3. Single point simulation parameters for high cutoff energies for dipole inversion.
1. Introduction
Simulation and theory communities have developed substantial interest in using free
energy calculations for molecular design problems. Specifically, free energy calculations
can guide experimental screening techniques for measuring biological interaction
energies and offer the potential of a faster and cheaper way to get thermodynamic
information over large chemical spaces in a variety of molecular contexts.1 For example,
drug design2 requires prediction of binding affinities, tautomers, protonation states,
membrane permeabilities, and solubilities, all of which involve free energy calculations.3-6
Similarly, free energy calculations could become useful tools in material design
standardization of simulations. During the late 1990s, substantial research efforts in the
field of computational fluid dynamics (CFD) were focused on establishing validation
benchmarks to improve the reliability of CFD simulations in various design
applications.14-16 This research helped to bring down costs, increase data fidelity, and
reduce design cycle time in the early development phases of new airplanes,17 Formula 1
cars,18 treatment and diagnosis of cardiovascular diseases19,20 and off-shore oil rigs.21
Similar validation benchmarks were developed for simulations in the nuclear industry, to
improve the reliability in nuclear reactor safety, underground nuclear waste storage, and
nuclear weapon safety.22 For molecular simulation to play a similarly useful role in
molecular engineering design,23 benchmarks and validation sets must be established. In
this paper we provide tools for improved standardization of molecular simulations
through the first version of a benchmark set for free energy calculations.
A large number of free energy methods are available;23-30 by early 2011, the papers describing
them had been cited collectively over 4600 times. As of the time of writing, 20% of those citations
occurred within the last 18 months.1 However, the simulation field lacks consensus in
choosing a method most appropriate for a given molecular design situation. At least
three fundamental issues contribute to this confusion.
These numbers were generated by adding the primary citations for each of the listed
methods (23-30) and searches for Thermodynamic Integration and Free Energy
Perturbation in ISI Web of Science, as the original papers for these methods are older
than the ISI database.
First, there is a lack of standard test cases for rigorous comparisons between different free
energy methods. Studies of new methods frequently use relatively trivial model systems,
such as a one- or two-dimensional analytically solvable potential energy function, or a
small Lennard-Jones sphere in water. Such test cases may not be representative of the
problems encountered in actual molecular changes. Alternatively, papers describing new
methods may use extremely complicated systems, such as protein-ligand binding systems
that are hard to converge and therefore make it difficult to accurately gauge true gains in
efficiency. Both of these scenarios yield little knowledge about whether a given method
will be useful in answering actual molecular design questions.
Second, computing
As a first step towards helping solve these problems, we propose the first version of a
molecular test set comprising realistic systems undergoing challenging molecular
transformations. We then use this test set to test the reliability of different methods for
estimating free energy differences. Although the molecular design applications listed in
the introduction seem very different, there are features common to all free energy
calculations performed for these applications. All involve determining the preference of
a molecule to partition between two environments, which can be calculated by way of a
difference in the free energies of molecular transformation between these two
environments. For example, we might wish to design a solute preferentially solvated by
a protein when compared to solvent (pure water) or a different complex medium (another
protein), as in the case of drug design. Alternatively, we might design a solvent that
preferentially solvates a given solute in a mixture, for example, designing ionic liquids25 for
sequestering CO2. These molecular transformations primarily involve either growing or
deleting atoms, changing the size or dispersion interaction between atoms, or altering the
partial charge on mutation sites. Any benchmark test set must include examples of these
transformations which are simultaneously challenging enough to push new methods and
yet possible to evaluate with sufficiently high precision to give meaningful comparisons
between methods in a reasonable amount of computer time.
The most important features of a property estimation method to understand are the
statistical errors inherent in the method, both bias and statistical uncertainty, and the
reliability of the method's estimate of the property. Without such data, we cannot trust
our calculations or compare two different calculations for validation purposes. Studies
commissioned by U.S. science funding agencies on future directions for simulation based
engineering and science have emphasized the fundamental need for improved uncertainty
verification and validation.26,27 Almost all estimators of statistical quantities, like free
energy and ensemble average calculation methods, have some bias, a systematic
deviation from the true answer that would be obtained with perfect sampling.
A few valuable studies have compared28,29 free energy methods, but not necessarily in a
systematic way. We use our proposed benchmark set to directly compare the estimated
uncertainties with the sample uncertainty. We compute the change in the mean square
error as a function of the number of intermediate states and number of samples to capture
both bias and uncertainty. We test whether the distribution of free energy estimates is
indeed Gaussian, a condition usually assumed when using statistical uncertainty estimates
to calculate error. Finally, we evaluate the bootstrap method30 for computing
statistical uncertainty, as it can be easily implemented for all of the free energy
algorithms described in this thesis, and indeed for most statistical estimates of
observables.
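The bootstrap mentioned above can wrap any of the estimators described in this thesis without deriving an analytical variance. The sketch below is a minimal illustration, assuming the samples are already uncorrelated and are stored as one list per state, so that each state is resampled independently; the function name `bootstrap_std` and its arguments are illustrative and are not taken from the code used in this work.

```python
import random
import statistics


def bootstrap_std(samples_per_state, estimator, n_boot=200, seed=0):
    """Bootstrap standard deviation of an arbitrary free energy estimator.

    samples_per_state: list of lists, one list of (assumed uncorrelated)
                       samples for each state.
    estimator:         callable mapping such a list of lists to a number.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample with replacement within each state, keeping each N_k fixed
        resampled = [[s[rng.randrange(len(s))] for _ in range(len(s))]
                     for s in samples_per_state]
        estimates.append(estimator(resampled))
    return statistics.stdev(estimates)
```

For example, `bootstrap_std(data, lambda states: ti_or_bar_estimate(states), n_boot=200)` would return the spread of 200 bootstrap replicates, where `ti_or_bar_estimate` stands for any hypothetical estimator function. Resampling per state, rather than pooling all samples, preserves the number of samples drawn at each intermediate state.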
Bias and uncertainty associated with the free energy algorithm are not the only sources of
errors in free energy estimates. Simulation parameters associated with the calculations of
non-bonded interactions, like electrostatic potential, also contribute to the errors in free
energy estimates. These parameters specify the extent of approximations allowed during
force and potential calculations. Molecular dynamics simulations require accurate force
calculations at each step. Similarly, Monte Carlo simulations require accurate potential
energies at each step for generating samples. The selection criteria for these simulation
parameters are not well defined, and it remains a challenge to tune them efficiently
for an entirely new system. The most common technique is to tune the parameters so that,
for a given accuracy in force calculations, the computational cost is minimized.31-33
We introduce a new way of approaching this problem. We explore the electrostatic
potential parameter space using our benchmark test
set. We search for the set of parameters that results in the least computational cost for a
given accuracy in the free energy estimates of the systems in the benchmark set, in the
estimated molar enthalpy of vaporization of water, and in the estimated enthalpy of
transformation of each system in the benchmark set.
In this paper, we first explain the molecular test set and the rationale behind the
molecular choices. Next, we use this set to test and compare the accuracy, precision and
reliability of ten free energy methods. We then present a summary of the method
comparison, with much of the data presented as supplementary material because of its
length, and present our recommendations for methods for performing free energy
calculations. We then discuss the parameter space search, explain our methodology and
protocols for exploring it, and finally present our recommendations for electrostatic
potential parameters for performing molecular simulations.
The systems in this benchmark set are designed to represent alchemical changes, or
changes of molecular identity, common to most molecular design applications.
Alchemical transformations frequently require the deletion or introduction of atoms and
large changes in the partial charges. Changes in torsional, angle, or dihedral parameters
usually result in smaller changes in phase space, as do small changes in dispersion
strength, atomic radius, or charge. We therefore focus on atomic introduction/deletion and
large changes in partial charge.
2.1 Minimal test system: OPLS UA methane in TIP3P water (MS): The solvation of a
Lennard-Jones (LJ) sphere, representing methane, is perhaps the simplest molecular free
energy system that can truly be called molecularly realistic. There are no bond,
angle, or torsion terms, nor are there solute/solvent charge terms. This system
represents a minimal test of whether the free energy method is at all valid or applicable
for molecular systems. We examine the transformation of coupling the sphere into water,
which corresponds to the solvation free energy of this molecule (Figure 1(a)).
2.2 Charge mutation system: dipole inversion with OPLS UA ethane molecule in
TIP3P water (DI): We chose two LJ methyl spheres, tethered together, with +1/-1
charges and the bond length of a C-C bond as illustrated in Figure 1(b). This setup avoids
computing free energies of ions directly, as changing the total charge of a system with
periodic boundary conditions is not always handled completely correctly in many codes.35
This test measures whether the method can handle large water rearrangements around
charges. The system is a null transform; the free energy change is zero as the final state is
identical to the initial state by symmetry.
Figure 1. (a) Methane solvation (b) Dipole inversion (c) In the coupled state or solvated
state both intermolecular and intramolecular interactions for anthracene are turned on. (d)
In the decoupled state or vacuum state the intermolecular interactions with water
molecules are turned off.
3. Comparing free energy methods based on statistical tests, using benchmark set.
We use the benchmark set to test a total of ten free energy methods.
In the following
Brackets indicate an
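$$\sigma_i^2 = \langle x^2 \rangle - \langle x \rangle^2, \quad \text{where } x = \left( \frac{\partial U(\lambda)}{\partial \lambda} \right)_i \qquad (1)$$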
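The $\langle \partial U(\lambda)/\partial \lambda \rangle_i$ values at the different intermediate states are interpolated and then integrated to get the overall free energy change:
$$\Delta G_{10} = G(\lambda = 1) - G(\lambda = 0) = \int_0^1 \left\langle \frac{\partial U(\lambda)}{\partial \lambda} \right\rangle d\lambda \qquad (2)$$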
For TI we have used linear interpolation, which leads to the standard trapezoid rule for integrating the total free energy. For TI3, the $\langle \partial U(\lambda)/\partial \lambda \rangle_i$ vs. $\lambda_i$ curve is fit piecewise to a natural cubic spline, which is then integrated analytically using the coefficients of the cubic polynomials (see Appendix 1 for the derivation). Both the trapezoidal and the cubic spline integration can be expressed as a weighted sum of the individual $\langle \partial U(\lambda)/\partial \lambda \rangle_i$:
$$\Delta G_{10} = \sum_{i=1}^{K} W_i \left\langle \frac{\partial U(\lambda)}{\partial \lambda} \right\rangle_i \qquad (3)$$
Here the $W_i$ are the weights corresponding to each state and $K$ is the total number of intermediate states. The variance of this estimate of the free energy can be calculated by the following variance propagation formula:
$$\sigma_{10}^2 = \sum_{i=1}^{K} W_i^2 \sigma_i^2 \qquad (4)$$
Occasionally in the literature, the variance of the free energy over each interval i to i+1 is
computed individually, and then propagated into the total variance by the sum of squares.
This is incorrect, since the variance of each interval is then correlated to the variance of
its neighbors. For example, the free energy difference between state 1 and state 2 and
between state 2 and state 3 both contain statistical information from state 2. A number
of alternative thermodynamic integration schemes have been proposed.37,38 However,
such schemes require some knowledge of the magnitude of the statistical uncertainty for
optimality. Other schemes use nonlinear fits to two different functional forms separately
describing Lennard-Jones and Coulomb contributions to the free energy.39 Such schemes
are not particularly flexible and introduce integration bias that is difficult to quantify. By
using cubic splines, we can obtain a higher order formula independent of the functional form
of $dU/d\lambda$, while propagating error using the same formalism as is used in standard TI
(Eq. 4).
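Because both integration schemes are linear in the data, Eqs. 3 and 4 reduce to computing a weight vector once per λ schedule. The sketch below illustrates this under stated assumptions: the spline weights are obtained by integrating the natural cubic spline through each unit vector (equivalent to, but not necessarily how, Appendix 1 derives them), and `sem_dudl` is assumed to hold the standard error of each mean $\langle \partial U/\partial \lambda \rangle_i$ from independent sampling at each state. All function names are hypothetical.

```python
import math


def trapezoid_weights(lam):
    """Weights W_i such that sum(W_i * y_i) is the trapezoid-rule integral (TI)."""
    w = [0.0] * len(lam)
    for i in range(len(lam) - 1):
        h = lam[i + 1] - lam[i]
        w[i] += h / 2
        w[i + 1] += h / 2
    return w


def _natural_spline_integral(x, y):
    """Exact integral over [x_0, x_{n-1}] of the natural cubic spline through (x, y)."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    M = [0.0] * n          # second derivatives; natural spline has M_0 = M_{n-1} = 0
    m = n - 2              # number of interior unknowns M_1 .. M_{n-2}
    if m > 0:
        # Row k couples nodes k, k+1, k+2; solved with the Thomas algorithm
        sub = [h[k] for k in range(m)]
        diag = [2 * (h[k] + h[k + 1]) for k in range(m)]
        sup = [h[k + 1] for k in range(m)]
        rhs = [6 * ((y[k + 2] - y[k + 1]) / h[k + 1] - (y[k + 1] - y[k]) / h[k])
               for k in range(m)]
        for k in range(1, m):                     # forward elimination
            f = sub[k] / diag[k - 1]
            diag[k] -= f * sup[k - 1]
            rhs[k] -= f * rhs[k - 1]
        sol = [0.0] * m
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(m - 2, -1, -1):            # back substitution
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        M[1:n - 1] = sol
    # Integral of each cubic piece: trapezoid term minus curvature correction
    return sum(h[j] * (y[j] + y[j + 1]) / 2 - h[j] ** 3 * (M[j] + M[j + 1]) / 24
               for j in range(n - 1))


def spline_weights(lam):
    """Weights W_i for natural-cubic-spline (TI3-style) integration."""
    K = len(lam)
    return [_natural_spline_integral(lam, [1.0 if j == i else 0.0 for j in range(K)])
            for i in range(K)]


def ti_estimate(lam, mean_dudl, sem_dudl, weights_fn=trapezoid_weights):
    """ΔG_10 from Eq. 3 and its standard error from Eq. 4."""
    W = weights_fn(lam)
    dG = sum(w * y for w, y in zip(W, mean_dudl))
    var = sum(w ** 2 * s ** 2 for w, s in zip(W, sem_dudl))
    return dG, math.sqrt(var)
```

For three equally spaced states, `trapezoid_weights([0.0, 0.5, 1.0])` gives `[0.25, 0.5, 0.25]`, while `spline_weights` gives `[0.1875, 0.625, 0.1875]`; the same `ti_estimate` call then serves TI and TI3 by swapping `weights_fn`.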
3.1.2 Exponential averaging (EXP) in two forms: deletion (DEXP) and insertion
(IEXP)28,40:
In exponential averaging schemes, the free energy change $\Delta G_{ij}$ is calculated using the exponential average of the difference of the potential energies $\Delta U_{ij}$ between two states $i$ and $j$ over one of the ensembles. The free energy difference as a function of the potential energy difference $\Delta U_{ij}$ and $N$ samples is then
$$\Delta G_{ij} = -\beta^{-1} \ln \left[ \frac{1}{N} \sum_{n=1}^{N} \exp\left[ -\beta\, \Delta U_{ij}(x_n) \right] \right] \qquad (5)$$
This averaging is performed using samples from state $i$ to compute potential energy differences $\Delta U_{ij}$ from state $i$ to state $j$. The free energy of the reverse process can be computed using samples from state $j$ and computing potential energy differences to state $i$. Since the labels themselves are arbitrary, to remove ambiguity in the direction we describe such computations as either deletion or insertion. As inspired by Wu et al.,41 we call a $\Delta U_{ij}$ taken in the direction of decreasing entropy an insertion step, and a $\Delta U_{ij}$ taken in the direction of increasing entropy a deletion step. Hence the free energy method using a $\Delta U_{ij}$ that steps in the direction of increasing entropy in Eq. 5 is labeled deletion exponential averaging (DEXP), and the method using a $\Delta U_{ij}$ that steps in the direction of decreasing entropy in Eq. 5 is labeled insertion exponential averaging (IEXP).
adjacent intermediate states can be estimated using standard point estimation theory as:
$$\sigma_{ij}^2 = \frac{1}{\beta^2 N} \frac{\sigma_x^2}{\langle x \rangle^2}, \quad x = \exp\left[ -\beta\, \Delta U_{ij}(x_n) \right] \qquad (6)$$
In both exponential averaging methods, the overall free energy change $\Delta G_{10}$ is the sum of the intermediate free energy changes $\Delta G_{ij}$, and so the variance $\sigma_{10}^2$ is simply the sum of the associated variances $\sigma_{ij}^2$. In some cases the changes from state $i$ to state $i-1$ and to state $i+1$ might both be deletion or both be insertion steps; in this case, both estimates use the sampling performed at $i$, and the two estimates of the free energy difference will not be statistically independent. Complicated molecular changes will frequently involve both addition and subtraction of phase space, and thus will fall somewhere in between these two general schemes.
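Eqs. 5 and 6 for a single pair of states can be sketched as follows. This is a minimal illustration, assuming uncorrelated samples and using the sample (N−1) variance for $\sigma_x^2$; the log-sum-exp shift is a standard numerical safeguard and not necessarily what the original analysis code did. The function name is hypothetical.

```python
import math


def exp_estimate(dU, beta):
    """Exponential averaging (Eq. 5) and its delta-method variance (Eq. 6).

    dU:   N uncorrelated samples of ΔU_ij = U_j - U_i drawn from state i
          (N >= 2).
    beta: 1/(k_B T) in inverse units of dU.
    Returns (ΔG_ij, σ²_ij).
    """
    N = len(dU)
    a = [-beta * u for u in dU]
    a_max = max(a)
    # x_n = exp(-β ΔU_n) stored scaled by exp(-a_max) to avoid overflow;
    # the scale is restored in ΔG and cancels in var(x)/<x>².
    x = [math.exp(ai - a_max) for ai in a]
    mean_x = sum(x) / N
    dG = -(a_max + math.log(mean_x)) / beta                 # Eq. 5
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (N - 1)   # sample variance of x
    var_dG = var_x / (N * beta ** 2 * mean_x ** 2)          # Eq. 6
    return dG, var_dG
```

Whether the result is DEXP or IEXP depends only on which direction's $\Delta U_{ij}$ samples are passed in. Summing the returned variances over neighboring windows assumes independence, which, as noted above, fails when two steps draw on the sampling performed at the same state.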
When we calculate the overall free energy change using a method with inherent directionality, like exponential averaging, then, given our definitions, we need to make sure that the direction of entropy change remains the same throughout the process in order to interpret the whole transformation as a deletion or an insertion process. Methane solvation and anthracene solvation involve moving molecules from the vapor to the liquid phase, resulting in a decrease in total entropy. However, in the dipole inversion case, we have a symmetric transformation, and thus deletion and insertion happen within a single process. In dipole inversion, going from a very large magnitude dipole to a small uncharged intermediate, we have an increase of entropy as the water around the particle becomes less structured, and thus we use the term deletion. From the uncharged intermediate state to the reversed -/+ dipolar state (during the second half of the inversion), we have the reverse process, and we use the term insertion, consistent with the entropy direction definition. To use the terminology of IEXP, GINS, DEXP, and GDEL pathways, we therefore need to combine mixed halves of what would typically be called the forward and reverse pathways, as illustrated in Figure 2.
Although these particular sums are nonzero, they provide a consistent definition of the
statistical variance of insertion and deletion.
transformations will simply be the average of the variance for the deletion and insertion
processes.
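$$\Delta G_{ij} = \beta^{-1} \ln \left[ \frac{\sum_{k=1}^{N_j} \frac{1}{1 + \exp\left[ \beta \left( \Delta U_{ji}(x_k) + C \right) \right]}}{\sum_{l=1}^{N_i} \frac{1}{1 + \exp\left[ \beta \left( \Delta U_{ij}(x_l) - C \right) \right]}} \right] + C - \beta^{-1} \ln \frac{N_j}{N_i} \qquad (7)$$
$$C = \Delta G_{ij} + \beta^{-1} \ln \frac{N_j}{N_i} \qquad (8)$$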
Eq. 7 holds for any constant $C$, but when Eqs. 7 and 8 are solved self-consistently, $\Delta G_{ij}$ has minimal variance. There are many ways to solve the equations self-consistently; comparing them is beyond the scope of this paper. The variance $\sigma_{ij}^2$ in the free energy change can be estimated from:
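$$\sigma_{ij}^2 = \frac{1}{\beta^2} \left[ \frac{1}{N_i} \left( \frac{\langle f^2(x) \rangle_i}{\langle f(x) \rangle_i^2} - 1 \right) + \frac{1}{N_j} \left( \frac{\langle f^2(x) \rangle_j}{\langle f(x) \rangle_j^2} - 1 \right) \right] \qquad (9)$$
where $f(x) = 1/(1 + e^{x})$ is the Fermi function, with $x = \beta(\Delta U_{ij} - C)$ for samples from state $i$ and $x = \beta(\Delta U_{ji} + C)$ for samples from state $j$. The total free energy change is the sum over changes between consecutive intermediate states. Typically, the variance in the full free energy is computed by assuming independent errors and summing the variances for consecutive intermediate states. However, the assumption that the errors add independently is not correct, since the free energy differences from $i-1$ to $i$ and from $i$ to $i+1$ both depend on the potential energy at $i$, so their variances are not independent.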
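Eqs. 7 to 9 can be sketched for a single pair of states with the simplest self-consistent scheme: substitute Eq. 8 into Eq. 7 and iterate. This is one of the many possible solution schemes mentioned above, not necessarily the one used in this work; the function names and tolerance are hypothetical, and samples are assumed uncorrelated.

```python
import math


def fermi(x):
    """Fermi function f(x) = 1/(1 + e^x), evaluated without overflow."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar(dU_ij, dU_ji, beta, tol=1e-10, max_iter=1000):
    """Self-consistent BAR (Eqs. 7-8) with the variance of Eq. 9.

    dU_ij: samples of U_j - U_i drawn from state i.
    dU_ji: samples of U_i - U_j drawn from state j.
    Returns (ΔG_ij, σ²_ij).
    """
    Ni, Nj = len(dU_ij), len(dU_ji)
    shift = math.log(Nj / Ni) / beta          # β⁻¹ ln(N_j / N_i)
    dG = 0.0
    for _ in range(max_iter):
        C = dG + shift                                    # Eq. 8
        num = sum(fermi(beta * (u + C)) for u in dU_ji)
        den = sum(fermi(beta * (u - C)) for u in dU_ij)
        new_dG = math.log(num / den) / beta + C - shift   # Eq. 7
        converged = abs(new_dG - dG) < tol
        dG = new_dG
        if converged:
            break
    C = dG + shift
    f_i = [fermi(beta * (u - C)) for u in dU_ij]
    f_j = [fermi(beta * (u + C)) for u in dU_ji]

    def term(f):
        # (<f²>/<f>² - 1)/N for one state's samples
        n = len(f)
        m1 = sum(f) / n
        m2 = sum(v * v for v in f) / n
        return (m2 / m1 ** 2 - 1.0) / n

    return dG, (term(f_i) + term(f_j)) / beta ** 2        # Eq. 9
```

The update has the form $\Delta G \leftarrow \Delta G + \beta^{-1}\ln(\text{num}/\text{den})$, whose fixed point balances the two Fermi sums; in practice it typically settles in a handful of iterations, consistent with the 5-10 self-consistent iterations quoted for BAR below.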
calculating the self-consistent solution. To apply this method, a range of starting values of $C$ is chosen. Each trial $C$ is fed in as an initial guess, and $C$ is recalculated using a single iteration of Eq. 11, with the corresponding $\Delta G$ and its uncertainty then calculated. Accumulated averages are maintained for each trial value of $\Delta G$. A reasonable estimate of the range of $C$ and $\Delta G$ is therefore required to use this method. In some cases, it may end up being more costly than BAR, as accumulated averages must be maintained for a certain number of trial free energy values, instead of simply performing 5-10 self-consistent iterations. However, the advantage of what we call in this paper RBAR (range-based Bennett acceptance ratio) is that data from each simulation step does not need to be retained for post-processing, as is required with BAR; only the accumulated averages need to be maintained.
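The storage advantage of RBAR can be illustrated with a streaming accumulator. The sketch below is a simplified, hypothetical variant: it keeps only running Fermi-function sums for a fixed grid of trial $C$ values, and at the end reports the $\Delta G$ from Eq. 7 whose trial $C$ is most consistent with Eq. 8. It is an assumption-laden illustration of the idea, not the exact update procedure described above, and the class and method names are invented for this example.

```python
import math


def fermi(x):
    """Fermi function f(x) = 1/(1 + e^x), evaluated without overflow."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


class RangeBAR:
    """Hypothetical streaming accumulator over a fixed grid of trial C values.

    Only running sums are stored; the raw ΔU samples are discarded.
    """

    def __init__(self, trial_C, beta):
        self.trial_C = list(trial_C)
        self.beta = beta
        self.sum_i = [0.0] * len(self.trial_C)  # Σ f(β(ΔU_ij - C)), samples from i
        self.sum_j = [0.0] * len(self.trial_C)  # Σ f(β(ΔU_ji + C)), samples from j
        self.n_i = 0
        self.n_j = 0

    def add_sample_i(self, dU_ij):
        for m, C in enumerate(self.trial_C):
            self.sum_i[m] += fermi(self.beta * (dU_ij - C))
        self.n_i += 1

    def add_sample_j(self, dU_ji):
        for m, C in enumerate(self.trial_C):
            self.sum_j[m] += fermi(self.beta * (dU_ji + C))
        self.n_j += 1

    def estimate(self):
        """ΔG from Eq. 7 at the trial C that best satisfies Eq. 8."""
        shift = math.log(self.n_j / self.n_i) / self.beta
        best_gap, best_dG = None, None
        for m, C in enumerate(self.trial_C):
            dG = math.log(self.sum_j[m] / self.sum_i[m]) / self.beta + C - shift
            gap = abs(C - (dG + shift))          # mismatch with Eq. 8
            if best_gap is None or gap < best_gap:
                best_gap, best_dG = gap, dG
        return best_dG
```

Memory scales with the number of trial values rather than the number of samples, which mirrors the trade-off stated above: each sample costs one Fermi evaluation per trial $C$, and the grid must bracket the true $\Delta G$.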
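$$G_i = -\beta^{-1} \ln \sum_{k=1}^{K} \sum_{n=1}^{N_k} \frac{\exp\left[ -\beta U_i(x_{kn}) \right]}{\sum_{k'=1}^{K} N_{k'} \exp\left[ \beta \left( G_{k'} - U_{k'}(x_{kn}) \right) \right]} \qquad (11)$$
is solved self-consistently for each $G_i$. $\Delta G_{ij} = G_j - G_i$ gives the free energy change between two states $i$ and $j$. The statistical variance of $\Delta G_{ij}$, $\sigma_{ij}^2$, is calculated using Eqs. 8 and 12 in the paper by Shirts and Chodera.45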
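Eq. 11 can be sketched with plain fixed-point iteration in reduced units ($f_i = \beta G_i$, $u_i = \beta U_i$). This is a minimal illustration, assuming all samples are pooled and every state has at least one sample; production implementations use faster solvers and also compute the covariance matrix, which this sketch omits. The names `mbar` and `logsumexp` are hypothetical helpers written for this example.

```python
import math


def logsumexp(values):
    """ln Σ exp(v), shifted by the maximum for numerical stability."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mbar(u_kn, N_k, tol=1e-10, max_iter=10000):
    """Fixed-point solution of Eq. 11 in reduced units.

    u_kn[i][n]: reduced potential β U_i(x_n) of pooled sample n in state i.
    N_k[k]:     number of samples drawn from state k (all > 0).
    Returns reduced free energies f_i = β G_i with f_0 fixed at 0.
    """
    K = len(N_k)
    n_total = len(u_kn[0])
    log_N = [math.log(N) for N in N_k]
    f = [0.0] * K
    for _ in range(max_iter):
        # ln of the denominator of Eq. 11 for every pooled sample
        log_den = [logsumexp([log_N[k] + f[k] - u_kn[k][n] for k in range(K)])
                   for n in range(n_total)]
        new_f = [-logsumexp([-u_kn[i][n] - log_den[n] for n in range(n_total)])
                 for i in range(K)]
        new_f = [v - new_f[0] for v in new_f]   # G is defined up to a constant
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f
```

Free energy differences in energy units follow as $\Delta G_{ij} = (f_j - f_i)/\beta$. Fixing $f_0 = 0$ at every iteration removes the arbitrary additive constant, so convergence is judged on the differences only.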
3.1.7 Gaussian estimate of exponential averaging in two forms: deletion (GDEL) and
insertion (GINS)29:
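$$\Delta G_{ij} = \langle \Delta U_{ij} \rangle - \frac{\beta}{2} \sigma_{\Delta U_{ij}}^2 \qquad (12)$$
The variance over $N$ samples of this free energy difference is given by:
$$\sigma_{ij}^2 = \frac{\sigma_{\Delta U_{ij}}^2}{N} + \frac{\beta^2 \sigma_{\Delta U_{ij}}^4}{2(N-1)} \qquad (13)$$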
If the distribution of ΔU_ij is close to Gaussian, this estimator can minimize the statistical effect of rare events, resulting in a more efficient and substantially simpler estimate. To remove ambiguity with respect to the direction of the process, we use the same convention of deletion and insertion as described for exponential averaging. In Eq. 12, when ΔU_ij is in the direction of increasing entropy, we refer to this as the Gaussian estimate with deletion, or GDEL. If we use ΔU_ij in the direction of decreasing entropy, we refer to this estimate as the Gaussian estimate with insertion, or GINS. The total free energy change is again the sum of the free energy changes between neighboring intermediate states. The total variance is calculated assuming independent sampling at each state. This is not an approximation here, since each calculation depends on samples from only one state.
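Eqs. 12 and 13 translate directly into code. This is a minimal sketch, assuming ΔU_ij samples collected in a single state and β = 1 by default. Applying it to the energy differences collected in each of the two directions, with the sign reversed for the reverse direction, gives the two variants. The function name `gaussian_estimate` is hypothetical. The data are synthetic Gaussian samples, for which the estimate is exact up to statistical noise.

```python
import numpy as np


def gaussian_estimate(du, beta=1.0):
    """Second-order cumulant (Gaussian) free energy estimate, Eqs. 12 and 13.

    du: potential energy differences Delta U_ij sampled in one state.
    Returns (Delta G_ij, standard error of Delta G_ij).
    """
    du = np.asarray(du, dtype=float)
    n = du.size
    var = du.var(ddof=1)                                   # sample variance of Delta U_ij
    dg = du.mean() - 0.5 * beta * var                      # Eq. 12
    var_dg = var / n + beta**2 * var**2 / (2.0 * (n - 1))  # Eq. 13
    return dg, np.sqrt(var_dg)


# Synthetic Gaussian data in kT units; both directions should recover Delta G = 3.
rng = np.random.default_rng(3)
dg_fwd, err_fwd = gaussian_estimate(rng.normal(5.0, 2.0, 20000))
dg_rev, err_rev = gaussian_estimate(rng.normal(-1.0, 2.0, 20000))
dg_rev = -dg_rev  # the reverse direction estimates -Delta G
```

When the real ΔU distribution has heavy tails or is multimodal, the variance term in Eq. 12 no longer captures it. This is the situation in which the GDEL and GINS failures reported later arise.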
constrained using the SHAKE algorithm to a relative tolerance of 10^-12. The systems were then equilibrated for 100 ps at a constant pressure of 1 atm using a Parrinello-Rahman barostat43 and a Nosé-Hoover thermostat44. A coupling time constant of 5 ps was used for both the thermostat and the barostat. A switching function was used for both the PME and van der Waals potentials. For electrostatics, the PME switch started at 0.88 nm, with a Coulomb cutoff distance of 0.9 nm. Other PME parameters were a Fourier spacing of 0.12 nm, fourth-order B-spline interpolation, and an Ewald tolerance of 10^-8. For van der Waals interactions, a switch starting at 0.8 nm and a cutoff distance of 0.9 nm were used. A long-range dispersion correction was applied to both energy and pressure.
3.2.1 λ values and spacing between intermediate states for free energy calculations:
Initial simulations (5 ns long, including 0.5 ns equilibration) were performed with 21 equally spaced λ values to guide the selection of the λ values for the main study. Intermediate states were chosen so that each window contributed almost equally to the total error, specifically ensuring that the maximum variance among all windows was no larger than twice the variance among all windows.
change in bias, statistical error, and mean square error as a function of the spacing between coupling parameter values, we chose two sets of states for each model: a full set and a sparse set.
which is simply the reverse process and so includes a sign reversal. For RBAR, this spacing means that the largest free energy difference between neighboring states is approximately 35 kJ/mol, so we used a range of trial values from -40 to 40 kJ/mol in increments of 1 kJ/mol.
The starting coordinate files for each intermediate state simulation, other than the first state, were generated by running short 10 ps equilibration runs in series. The first state, at λ = 0, used one of the 100 uncorrelated starting configurations as its starting coordinate file. Each subsequent state used the final configuration of the previous equilibrated state. After this initial equilibration round, 5 ns of NPT simulation were performed for each initial configuration and each state. Data corresponding to the first 500 ps were discarded as equilibration. The remaining 4.5 ns of equilibrium data for each model at each intermediate state were used for all subsequent calculations.
3.2.3.1 Sample standard deviation: To compute the sample standard deviation, we take the simulations started from 100 initial configurations and compute the free energy difference from each simulation, giving a distribution of free energy differences. We then directly compute the sample standard deviation for each individual estimator from:
$$ \sigma(\Delta G) = \sqrt{\frac{\sum_{i=1}^{N} (\langle \Delta G \rangle - \Delta G_i)^2}{N-1}} \qquad (14) $$
Where N = 100 and ⟨ΔG⟩ is the mean of the 100 values of ΔG_i. Crucially, a standard deviation computed from a finite sample is itself a statistical quantity and therefore has an associated uncertainty. Rigorously, to compute the sample standard deviation of the uncertainty, σ(σ(ΔG)), we would need to repeat our 100-simulation experiment 100 times. Instead, we used the bootstrap method (described in the bootstrap estimate section) to estimate σ(σ(ΔG)). From this exercise we obtain ⟨ΔG⟩ and σ(ΔG) ± σ(σ(ΔG)). Here ⟨ΔG⟩ is not an ensemble average but the average over the 100 repetitions.
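A sketch of Eq. 14 and of the bootstrap estimate of σ(σ(ΔG)) follows. It assumes, as the text suggests, that the bootstrap resamples the 100 per-run ΔG values themselves. The function names are hypothetical, and the 100 ΔG values below are synthetic stand-ins, not results from this study.

```python
import numpy as np


def sample_std(values):
    """Eq. 14: sample standard deviation with the N - 1 denominator."""
    values = np.asarray(values, dtype=float)
    return np.sqrt(np.sum((values.mean() - values) ** 2) / (values.size - 1))


def bootstrap_std_of_std(values, n_boot=200, rng=None):
    """Bootstrap estimate of sigma(sigma(Delta G)), the uncertainty of the sample std."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    n = values.size
    stds = [sample_std(values[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return np.std(stds, ddof=1)


# Hypothetical stand-in for 100 independent Delta G values (kJ/mol).
rng = np.random.default_rng(4)
dg_runs = rng.normal(8.93, 0.11, 100)
sigma = sample_std(dg_runs)
sigma_of_sigma = bootstrap_std_of_std(dg_runs, rng=rng)
```

For Gaussian data, σ(σ) should come out near σ/sqrt(2(N - 1)), about 7% of σ for N = 100. This is why sample standard deviations from 100 runs can only resolve differences of roughly that size.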
3.2.3.2 Analytical estimate: Each free energy estimator has an associated uncertainty estimator, as discussed in previous sections, namely the square root of the estimated variance of the total free energy. From 100 uncorrelated starting configurations, we obtain not only 100 values of ΔG but also 100 error estimates from the method's analytical uncertainty estimate. We denote the average and standard deviation of these estimated uncertainties over all 100 independent runs as δ(ΔG) ± σ(δ(ΔG)). We call these the analytical uncertainty estimate and the standard deviation of the analytical estimate distribution.
3.2.3.3 Bootstrap estimate: For each of the 100 independent free energy calculations, we also calculate a bootstrap error estimate; bootstrapping is a rigorous and robust statistical estimation technique.30 The bootstrap error is constructed as follows. From each set of potential energy differences or dU/dλ values, we generate a number of bootstrap sets from the original sample of molecular simulation data. To generate a bootstrap set, we first subsample the data using an estimate of the autocorrelation time to obtain N statistically uncorrelated values. For each bootstrap set, we then draw N samples with replacement from this set of N uncorrelated measurements. For example, if our set were the integers {3, 6, 8, 9}, then a bootstrap set would consist of four random selections from these numbers, with replacement. {3, 3, 8, 9}, {9, 6, 6, 3}, and {8, 8, 8, 8} would all be valid sets, though clearly the last one would be the rarest. We repeat this resampling process many times; in this case, we draw 200 bootstrap sets. For each of these 200 bootstrap sets, we compute the free energy and uncertainties using the estimators as if they were the original data set. This gives 200 values of ΔG, one for each bootstrap set; standard rules of thumb suggest using 50-200 bootstrap sets to get robust estimates of uncertainty.30 The average of these 200 values ΔGbs gives ⟨ΔG⟩bs. The bootstrap estimate of the error, δ(ΔG)bs, is the sample standard deviation of the 200 ΔGbs values for each of the initial configurations. The average of δ(ΔG)bs over all 100 initial configurations is the bootstrap error estimate. The statistical uncertainty of this bootstrap uncertainty estimate is estimated by computing the sample standard deviation over the 100 values of δ(ΔG)bs, and is denoted by σ(δ(ΔG)bs).
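The two stages of this procedure, decorrelation and resampling, can be sketched for a single time series, such as the dU/dλ values at one state. The statistical inefficiency g = 1 + 2τ is estimated with a common heuristic: summing the normalized autocorrelation function until it first drops to zero. This is an assumption, not necessarily the estimator used in this work, and the function names are hypothetical. For estimators that need data from two states (BAR) or many states (MBAR), each state's series would be subsampled and resampled separately in the same way. The test input is a synthetic AR(1) series whose exact statistical inefficiency is (1 + φ)/(1 - φ) = 19.

```python
import numpy as np


def statistical_inefficiency(x):
    """g = 1 + 2 * sum_t (1 - t/N) C(t), truncated at the first non-positive autocorrelation."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    g = 1.0
    for t in range(1, n // 2):
        corr = np.dot(dx[:-t], dx[t:]) / ((n - t) * var)
        if corr <= 0.0:
            break
        g += 2.0 * corr * (1.0 - t / n)
    return g


def bootstrap_estimate(x, estimator, n_boot=200, rng=None):
    """Subsample to roughly uncorrelated data, then resample with replacement n_boot times.

    Returns (mean of bootstrap estimates, bootstrap standard deviation).
    """
    rng = np.random.default_rng() if rng is None else rng
    stride = int(np.ceil(statistical_inefficiency(x)))
    x_unc = np.asarray(x, dtype=float)[::stride]
    m = x_unc.size
    boot = np.array([estimator(x_unc[rng.integers(0, m, size=m)]) for _ in range(n_boot)])
    return boot.mean(), boot.std(ddof=1)


# Correlated AR(1) series standing in for dU/dlambda at one state (phi = 0.9, so g = 19).
rng = np.random.default_rng(5)
noise = rng.normal(size=20000)
series = np.empty_like(noise)
series[0] = noise[0]
for i in range(1, series.size):
    series[i] = 0.9 * series[i - 1] + noise[i]
g_hat = statistical_inefficiency(series)
mean_bs, err_bs = bootstrap_estimate(series, np.mean, rng=rng)
```

Skipping the subsampling step and resampling all 20000 correlated points would shrink the bootstrap error by roughly a factor of sqrt(g), which is why decorrelation comes first.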
A good free energy estimator should have an analytical uncertainty estimate that is consistent with the sample standard deviation. If the analytical uncertainty estimate is not accurate, then we have no way of knowing how accurate our free energy estimate is when we run only a single calculation. This can be very problematic. For example, the analytical estimate of the uncertainty of EXP diverges from the true uncertainty well before the error in EXP itself does.28 The smaller the difference between the direct sample standard deviation σ(ΔG) and the predicted or analytical error estimate δ(ΔG), the better we know the method's ability to predict its error without having to run multiple trials. If the statistics are well behaved, the bootstrap estimate of the statistical uncertainty should also agree with the sample estimate of the uncertainty. If we can show that this agreement holds, then the bootstrap error can substitute for sample uncertainty estimates even when the analytical estimate fails.
[(⟨A⟩ − A_i)²]. This estimate can be shown to be slightly too small on average. If N is replaced by N − 1, however, the estimator becomes unbiased. In the limit of very large N, the difference is negligible. Thus the naïve estimator of the variance is not unbiased, but it is asymptotically unbiased. Thermodynamic integration does not have asymptotic bias, because at each intermediate state the simple average of dU/dλ is unbiased for any number of samples. Exponential averaging, BAR, and MBAR are only asymptotically unbiased, though the bias of exponential averaging is usually significantly higher than that of BAR and MBAR.28 Careful design of the pathway can reduce this large bias to some extent.55 However, unlike the simple case of the estimator of the variance, there are no known unbiased versions of these estimators.
Bias also arises from using a limited number of intermediate states. This happens either because of a lack of phase space overlap between intermediates, as in acceptance ratio methods and exponential averaging, or because of numerical integration error in thermodynamic integration methods. Even for a large benchmark set like the present study, computational expense and storage limits make it very difficult to approach the large-sample limit and the many-intermediate-state limit simultaneously. Therefore, we estimate the two contributions to bias independently. For asymptotic bias, we compare the results from combining a fixed amount of data into either one large data set or a series of shorter data sets. For bias as a function of the number of intermediate states, we vary the number of intermediates for fixed-length simulations.
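The intermediate-state bias of TI can be illustrated with a hypothetical smooth ⟨dU/dλ⟩ curve whose exact integral is known. With the trapezoidal rule, the quadrature error shrinks roughly as the square of the λ spacing. It never vanishes at finite spacing, however long each state is simulated. The curve `mean_dudl` is invented for illustration, and trapezoidal integration serves only as an example of the kind of quadrature TI relies on.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule for samples y at (possibly uneven) points x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def mean_dudl(lam):
    """Hypothetical smooth <dU/dlambda> curve; its exact integral over [0, 1] is 2."""
    return 10.0 * lam**4


# Quadrature error (estimate minus exact) for 3, 21, and 51 equally spaced states.
errors = {}
for n_states in (3, 21, 51):
    lam = np.linspace(0.0, 1.0, n_states)
    errors[n_states] = trapezoid(mean_dudl(lam), lam) - 2.0
```

Here the error falls from 0.8125 with 3 states to about 0.0083 with 21 states and 0.0013 with 51 states. The ratio between the last two cases is (0.05/0.02)² = 6.25, as expected for a second-order rule.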
3.2.4.1 Bias due to number of samples: Data from all 100 5 ns runs are stitched into a single trajectory to mimic one long trajectory. The data corresponding to equilibration were not included when estimating free energies. Since each simulation contributed 4.5 ns, stitching results in 450 ns of total simulation data. The bias due to number of samples is the difference between the free energy estimate computed with all 450 ns of data, (ΔG)450, and the average ⟨ΔG⟩ of the estimates from the 100 individual 4.5 ns trajectories.
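The sample-size bias measured this way can be illustrated with exponential averaging on synthetic data. The average of many short-run estimates retains the finite-sample bias of each short run, whereas pooling the same data into one long data set reduces it. In this hypothetical sketch, ΔU is Gaussian with mean 5 kT and standard deviation 2 kT (exact ΔG = 3 kT), and 100 runs of 45 samples stand in for the 100 trajectories. The numbers are illustrative only.

```python
import numpy as np


def exp_average(du):
    """Exponential averaging (Zwanzig) in kT units: Delta G = -ln <exp(-Delta U)>."""
    du = np.asarray(du, dtype=float)
    shift = du.min()  # shift by the minimum so that no exponential overflows
    return shift - np.log(np.mean(np.exp(-(du - shift))))


rng = np.random.default_rng(6)
exact = 3.0  # Gaussian Delta U ~ N(5, 2^2) in kT gives Delta G = 5 - 2^2 / 2
runs = rng.normal(5.0, 2.0, size=(100, 45))         # 100 short runs of 45 samples each
mean_of_short = np.mean([exp_average(run) for run in runs])
pooled = exp_average(runs.ravel())                  # all 4500 samples as one data set
bias_estimate = pooled - mean_of_short              # analogue of (Delta G)450 - <Delta G>
```

The short-run average sits well above the exact value, while the pooled estimate is much closer. Their difference is therefore a lower bound on the magnitude of the short-run bias, not the bias itself, since the pooled estimate is also slightly biased.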
3.2.4.2 Bias due to number of intermediate states: We also ran a set of simulations with 51 states for each model, as discussed above. We can estimate the bias due to the number of intermediate states as the difference between the free energy estimate computed using 51 states, (ΔG)51, and ⟨ΔG⟩ estimated using either the full or the sparse set of states. Besides having low and consistent error estimates, ideal free energy estimation methods would show little or no bias in these tests. In many cases, statistical uncertainty limits how accurately we can determine the bias, since it is computationally too demanding to generate 500 ns of simulation for all 51 states for all test sets. In these cases, we can only determine whether the bias is statistically insignificant relative to the statistical uncertainty. For bias with respect to the number of samples, asymptotic bias usually scales as 1/N while statistical uncertainty scales as 1/N^(1/2), so the statistical uncertainty usually dominates.
generated from these two reference states, we obtain two different estimates of mean square error. Neither is a true MSE, because we lack the true reference answer. However, these estimates capture the combined effect of the statistical error and the two different sources of bias. We define the two estimates of mean square error as MSE1 and MSE2:
$$ \mathrm{MSE1}_i = \sigma_i^2 + \mathrm{Bias1}_i^2, \quad \mathrm{Bias1}_i = (\Delta G)_{450} - (\Delta G_{est})_i, \quad 1 \le i \le 100 \qquad (15) $$
$$ \mathrm{MSE2}_i = \sigma_i^2 + \mathrm{Bias2}_i^2, \quad \mathrm{Bias2}_i = (\Delta G)_{51} - (\Delta G_{est})_i, \quad 1 \le i \le 100 \qquad (16) $$
RMSE1 and RMSE2 are the square roots of MSE1 and MSE2, respectively. The errors in RMSE1 and RMSE2, σ(RMSE1) and σ(RMSE2), are the standard deviations over the 100 values of RMSE1_i and of RMSE2_i. With the two quantities RMSE1 and RMSE2, we can extract qualitative information about the reliability of the methods using all the information from our experiments. We plot Gaussians with standard deviations equal to RMSE1 and RMSE2 on mutually perpendicular axes, as shown in Figure 3(a), with the average free energy estimate ⟨ΔG⟩ of the method as the mean. An example bivariate Gaussian RMSE plot is shown in Figure 3(b), and Figure 3(c) shows the top view of this Gaussian. The overall spread of the rings in Figure 3(c) is a measure of overall reliability.
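A sketch of how the per-run quantities in Eqs. 15 and 16 can be turned into RMSE1, RMSE2, and the bivariate Gaussian of Figure 3. Two assumptions are made here and should be checked against the actual analysis. First, σ_i is the uncertainty estimate reported for run i. Second, the reported RMSE is the mean of the 100 per-run values RMSE1_i (or RMSE2_i). The function names are hypothetical, and the two-run input is a toy example.

```python
import numpy as np


def rmse_pair(dg_est, sigma_est, dg_450, dg_51):
    """Return (mean, std) of RMSE1_i and of RMSE2_i over the runs (Eqs. 15 and 16).

    dg_est, sigma_est: per-run free energy and uncertainty estimates.
    dg_450, dg_51: the large-sample and 51-state reference free energies.
    """
    dg_est = np.asarray(dg_est, dtype=float)
    var = np.asarray(sigma_est, dtype=float) ** 2
    rmse1_i = np.sqrt(var + (dg_450 - dg_est) ** 2)  # square root of MSE1_i
    rmse2_i = np.sqrt(var + (dg_51 - dg_est) ** 2)   # square root of MSE2_i
    return ((rmse1_i.mean(), rmse1_i.std(ddof=1)),
            (rmse2_i.mean(), rmse2_i.std(ddof=1)))


def bivariate_density(center, rmse1, rmse2, half_width=4.0, points=201):
    """Product of two Gaussians centred on <Delta G>: RMSE1 horizontal, RMSE2 vertical."""
    dx = np.linspace(-half_width * rmse1, half_width * rmse1, points)
    dy = np.linspace(-half_width * rmse2, half_width * rmse2, points)
    gx = np.exp(-0.5 * (dx / rmse1) ** 2) / (np.sqrt(2.0 * np.pi) * rmse1)
    gy = np.exp(-0.5 * (dy / rmse2) ** 2) / (np.sqrt(2.0 * np.pi) * rmse2)
    # rows index the vertical (RMSE2) axis, columns the horizontal (RMSE1) axis
    return center + dx, center + dy, np.outer(gy, gx)


# Toy input: two runs, both references at 1.1 (all values hypothetical).
(r1, s1), (r2, s2) = rmse_pair([1.0, 1.2], [0.1, 0.1], dg_450=1.1, dg_51=1.1)
x, y, density = bivariate_density(1.1, r1, r2)
```

The returned grid and density can be passed to any contour or surface plotting routine to produce panels like Figure 3(b) and 3(c). Equal RMSE1 and RMSE2 give circular contours; unequal values give ellipses.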
Figure 3. (a) Gaussians plotted on mutually perpendicular axes. Both have the ΔG calculated using a method (here TI for methane solvation) as the mean, and RMSE1 and RMSE2, respectively, as their standard deviations. (b) These are combined to generate a bivariate Gaussian plot. (c) Top view of the bivariate Gaussian plot.
A poor estimator has large and/or unequal spreads in the horizontal and vertical directions, yielding a large circle or an ellipse. A good estimator has small and equal spreads in both directions. A larger spread in one direction indicates dominance of a single type of bias. In Figure 3, the vertical axis corresponds to the Gaussian with standard deviation equal to RMSE2. Vertical spread indicates that bias due to the number of intermediate states dominates the uncertainty estimate, while horizontal spread indicates that bias due to the number of samples dominates.
estimates that are only rigorously true in the Gaussian limit. As long as the variances are bounded (not infinite), the central limit theorem ensures that, with enough samples, the distribution of the estimates will indeed converge to the Gaussian limit. However, for a finite number of samples, this assumption must be tested, not simply assumed. Otherwise we run the risk of underestimating the chance of large deviations ("black swans") from the average value. To test whether the free energy distribution is Gaussian, we plot histograms of the free energy distributions for each method against Gaussians. Each Gaussian uses the free energy estimate as the mean and the analytical uncertainty estimate as the standard deviation.
corresponding to the 450 ns and 51-state runs, bias analysis, and reliability estimates of free energy and uncertainty predictions for UA methane solvation, for the full and sparse sets. Figures 4-7 compare the accuracy and precision of the free energy and uncertainty predictions of all ten methods for the three test cases. Tables of the uncertainty analysis for the dipole inversion and anthracene hydration free energy test cases are presented in the supplementary material as Tables S1-S10. Bivariate Gaussian plots presenting the analysis of the reliability of free energy methods are presented in Figures 8-10. Figures 11 and 12 compare the actual distribution of the free energies (computed from 100 5 ns simulations with the full and sparse sets) with Gaussians using the corresponding mean and variance estimates of the free energy methods. For comparison, they also show the Gaussians for the 450 ns trajectory and the 51-state simulation set. In some plots, we omit GDEL and GINS, as their errors are significantly larger than the scale of the errors of the other methods in the plots.
3.3.1 Validation of uncertainty estimates. The first data column in Table 1 is the free energy change calculated as an average over the 100 uncorrelated repetitions. The next three columns are the average of the analytical estimate of uncertainty over all repetitions, δ(ΔG); the sample standard deviation, σ(ΔG); and the average bootstrap estimate of the uncertainty over all repetitions, δ(ΔG)bs. The last column gives the percent deviation of the analytical estimate of uncertainty from the sample standard deviation, along with the propagated error. Standard deviations {σ(δ(ΔG)), σ(σ(ΔG)), σ(δ(ΔG)bs)} for the error estimate distributions are also reported.
We find that the analytical error estimate of most methods underestimates the true sample standard deviation. For some of the methods this deviation is within one or two standard deviations, and thus likely within the statistical noise. Examining the data in Tables 1, S2, and S3 for TI and TI3, analytical and sample estimates of uncertainty differ by one standard deviation or less for all cases but anthracene solvation with the full set. There they differ by approximately two standard deviations, which is still likely noise. For IEXP and DEXP, analytical and sample estimates also differ by one or, in some cases, two standard deviations. However, analytical and sample uncertainty estimates for BAR, RBAR, and UBAR differ by up to five standard deviations. This indicates that BAR uncertainty estimates can be significantly inaccurate for multiple intermediate states. In contrast, analytical uncertainty estimates with MBAR differ from sample uncertainty estimates by less than one standard deviation. Over all methods, MBAR analytical error estimates deviate least from the sample estimate.
The percentage deviation of the analytical estimate from the sample uncertainty estimate is listed explicitly in column six of Table 1. A negative percent deviation indicates that the analytical uncertainty estimate underestimates the uncertainty, and a positive percent deviation indicates that it overestimates it. The percentage deviation of the analytical uncertainty estimate from the sample uncertainty is -7 ± 8% for TI and -10 ± 8% for MBAR, in both cases within the noise. For BAR, the deviation from the true uncertainty is -31 ± 5%, which is clearly statistically significant. A large deviation of the analytical estimate from the sample standard deviation indicates that the estimator predicts its own uncertainty poorly. Similar deviations between analytical and sample estimates are seen for UBAR and RBAR. As with BAR and its variants, the analytical and sample estimates for GDEL and GINS differ by about 30 ± 5%.
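The propagated error on the percent deviation can be computed with standard first-order propagation for a ratio of two independent quantities. This sketch rests on that assumption, which the text does not spell out. Applied to the rounded BAR entries of Table 1 (δ(ΔG) = 0.075 ± 0.001, σ(ΔG) = 0.109 ± 0.010), it gives about -31.2 ± 6.4%. That is close to the tabulated -31.3 ± 6.3%, and the small differences are plausibly due to rounding. The function name is hypothetical.

```python
import math


def percent_deviation(analytic, analytic_err, sample, sample_err):
    """Percent deviation of the analytical uncertainty from the sample std, with
    first-order propagation of the two (assumed independent) uncertainties."""
    ratio = analytic / sample
    dev = 100.0 * (ratio - 1.0)
    dev_err = 100.0 * ratio * math.sqrt((analytic_err / analytic) ** 2
                                        + (sample_err / sample) ** 2)
    return dev, dev_err


# Rounded BAR entries from Table 1 (full set, methane solvation).
bar_dev, bar_dev_err = percent_deviation(0.075, 0.001, 0.109, 0.010)
```

The propagated error is dominated by the uncertainty of the sample standard deviation, since σ(σ(ΔG)) is roughly ten times larger in relative terms than σ(δ(ΔG)).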
Bootstrap uncertainty is a robust alternative to the sample standard deviation for most methods. Bootstrap estimates of the uncertainty (column five of Table 1) are very close to the sample uncertainty (column four of Table 1), with DEXP as the main exception. Specifically, for BAR, RBAR, UBAR, GDEL, and GINS, bootstrap estimates of uncertainty are better than their analytical counterparts. In cases where the analytical estimate does not accurately predict the sample standard deviation, such as with BAR, RBAR, and UBAR, the bootstrap method provides a useful estimate of the error in the free energy without the need for repeated independent simulations.
Figure 4 visually demonstrates the efficiency of the free energy methods and the consistency of the uncertainty estimators. Short bars indicate precise free energy estimates. Bars of equal height indicate that the analytical and bootstrap uncertainties are consistent with the sample standard deviation. For the full set, MBAR, TI, and TI3 predict free energies with the highest precision and the most reliable error estimates. BAR, RBAR, and UBAR analytical estimates deviate substantially from the sample standard deviation, but their bootstrap estimates closely match it. IEXP and DEXP have the largest deviations, particularly for large transformations such as dipole inversion and anthracene solvation.
Method  ΔG      δ(ΔG) ± σ(δ(ΔG))  σ(ΔG) ± σ(σ(ΔG))  δ(ΔG)bs ± σ(δ(ΔG)bs)  % dev ± σ(% dev)
TI      9.081   0.107 ± 0.002     0.115 ± 0.009     0.106 ± 0.006         -7.1 ± 7.9
TI3     9.008   0.111 ± 0.003     0.119 ± 0.012     0.110 ± 0.007         -6.9 ± 9.5
DEXP    8.984   0.320 ± 0.255     0.556 ± 0.122     0.319 ± 0.267         -42.5 ± 47.5
IEXP    8.928   0.104 ± 0.002     0.110 ± 0.011     0.103 ± 0.006         -5.8 ± 9.6
UBAR    8.936   0.079 ± 0.002     0.106 ± 0.010     0.098 ± 0.005         -25.3 ± 7.5
BAR     8.933   0.075 ± 0.001     0.109 ± 0.010     0.099 ± 0.006         -31.3 ± 6.3
RBAR    8.937   0.075 ± 0.001     0.109 ± 0.010     0.099 ± 0.006         -31.0 ± 6.7
MBAR    8.929   0.095 ± 0.002     0.106 ± 0.010     0.094 ± 0.005         -9.9 ± 8.6
GDEL    7.042   0.093 ± 0.002     0.136 ± 0.011     0.121 ± 0.008         -31.7 ± 5.8
GINS    1.097   0.253 ± 0.008     0.399 ± 0.032     0.400 ± 0.169         -36.5 ± 5.5
Method  ΔG         δ(ΔG) ± σ(δ(ΔG))  σ(ΔG) ± σ(σ(ΔG))  δ(ΔG)bs ± σ(δ(ΔG)bs)  % dev ± σ(% dev)
TI      2.545      0.175 ± 0.007     0.177 ± 0.014     0.175 ± 0.011         -1.5 ± 8.7
TI3     3.792      0.214 ± 0.009     0.217 ± 0.017     0.214 ± 0.014         -1.5 ± 8.7
DEXP    5.631      1.343 ± 0.524     3.179 ± 0.679     1.628 ± 1.186         -57.8 ± 18.8
IEXP    9.091      0.666 ± 0.101     0.638 ± 0.042     0.683 ± 0.108         4.4 ± 17.3
UBAR    8.954      0.344 ± 0.018     0.358 ± 0.024     0.351 ± 0.024         -3.9 ± 8.2
BAR     8.926      0.225 ± 0.006     0.263 ± 0.014     0.232 ± 0.014         -14.4 ± 4.9
RBAR    8.927      0.226 ± 0.005     0.260 ± 0.016     0.233 ± 0.014         -13.2 ± 5.6
MBAR    8.928      0.232 ± 0.006     0.262 ± 0.015     0.232 ± 0.014         -11.4 ± 5.4
GDEL    -3.833     0.112 ± 0.004     0.189 ± 0.014     0.200 ± 0.016         -40.6 ± 4.9
GINS    -1.68E+32  3E+30 ± 34E+30    13E+32 ± 10E+32   1E+32 ± 15E+32        -99.7 ± 2.7
For the sparse set, as shown in Tables 2, S2, and S4 and Figure 5, TI and TI3 still show the lowest percent deviation from the sample standard deviation. However, the free energy estimate of methane solvation is off by 6.5 kJ/mol for TI and 5 kJ/mol for TI3. GDEL and GINS have clearly unconverged free energy and uncertainty estimates. The free energy estimate of methane solvation is off by 12 kJ/mol for GDEL and by ~10^32 kJ/mol for GINS. This clearly indicates the failure of the Gaussian approximation for ΔU. The free energy estimate of methane solvation for DEXP differs from the converged answer by 3 kJ/mol, and its uncertainty estimate is 5 times larger than the largest estimated uncertainty shown in Figure 5. IEXP differs from the converged answer by only 0.2 kJ/mol, but its uncertainty estimate is twice the largest plotted uncertainty. MBAR again has the lowest and most consistent uncertainty estimates. However, unlike with the full set, BAR and RBAR have uncertainty estimates that are as accurate as MBAR's to within statistical noise.
Bootstrap and analytical estimates of the error in the sparse set are slightly lower than the sample standard deviation for the acceptance ratio methods for methane solvation, though not for the other two molecules. For this set of states, MBAR does not provide the same clear advantage over BAR in estimating the uncertainty as it does with the full set. We hypothesize that this advantage may only exist when the overlap between states is relatively high. However, even our full set uses relatively aggressive spacing compared with typical free energy calculations, so the higher-overlap regime is likely the common case in practice. MBAR analytical estimates will generally have the advantage over BAR analytical estimates, since with MBAR it is not necessary to determine whether the calculation is in the low-overlap or high-overlap regime.
3.3.2 Analysis of bias in free energy estimates: Statistical uncertainty is the most important quantity for understanding the reliability of a free energy estimate, but understanding systematic bias is also important. By comparison with the free energy estimate using a large number of samples, we can test whether or not a method is asymptotically biased. By comparison with the free energy estimate using a large number of intermediate states, we can determine how sensitive a method's accuracy is to the overlap between intermediate states. Table 3 shows the free energy and uncertainty estimates predicted by different methods for UA methane solvation, for a large number of samples (450 ns trajectory) and for a large number of intermediate states (51 states). Tables 4 and 5 give both types of bias in the free energy estimates, with respect to the number of samples and to the number of intermediate states, for UA methane solvation. They also give the corresponding root mean square error estimates.
MBAR, BAR, RBAR, UBAR, TI, and TI3 have very low bias in free energy estimates
with respect to number of samples. The acceptance ratio methods, MBAR, BAR, RBAR,
and UBAR have biases with respect to number of states within the statistical noise. TI
and TI3 show larger bias in free energy estimates with respect to number of states than
the other methods. DEXP and IEXP show large biases in free energy estimates both with
respect to number of samples and states. GDEL and GINS show the largest bias with
respect to number of states. All methods show a larger bias with respect to the number of
states compared to the bias with respect to number of samples. However, this may be
an artifact of the lower precision of bias determination as a function of states.
Table 3. Free energy estimates and corresponding uncertainty estimates in the large
number of samples (450 ns) and large number of intermediate states (51 states) for UA
methane solvation. Bootstrap estimates are reported as they are better than analytical
estimates. All quantities are in kJ/mol.
Method  (ΔG ± δ(ΔG))450,full  (ΔG ± δ(ΔG))450,sp  (ΔG ± δ(ΔG))51,full
TI      9.085 ± 0.010         2.541 ± 0.017        8.920 ± 0.041
TI3     9.016 ± 0.010         3.786 ± 0.021        8.923 ± 0.041
DEXP    9.083 ± 0.083         12.657 ± 4.018       8.921 ± 0.043
IEXP    8.932 ± 0.010         8.986 ± 0.074        8.928 ± 0.040
UBAR    8.939 ± 0.010         8.930 ± 0.035        8.922 ± 0.040
BAR     8.939 ± 0.010         8.920 ± 0.023        8.921 ± 0.040
RBAR    8.940 ± 0.010         8.923 ± 0.024        8.922 ± 0.040
MBAR    8.936 ± 0.009         8.921 ± 0.023        8.924 ± 0.036
GDEL    7.048 ± 0.012         -3.837 ± 0.020       8.847 ± 0.043
GINS    1.101 ± 0.041         -1.5E+32 ± 3.2E+29   8.841 ± 0.040
The subscript (450,full) denotes the free energy estimate for 450 ns and full lambda set.
(450,sp) denotes the same for sparse lambda set.
Table 4. Bias estimates due to number of samples and number of lambda states for full
and sparse sets for UA methane solvation. All quantities are in kJ/mol.
Method  (Bias1 ± σ(Bias1))450,full  (Bias2 ± σ(Bias2))51,full  (Bias1 ± σ(Bias1))450,sp  (Bias2 ± σ(Bias2))51,sp
TI      -0.005 ± 0.015              0.160 ± 0.042              0.005 ± 0.024             -6.374 ± 0.045
TI3     -0.005 ± 0.015              0.088 ± 0.042              0.006 ± 0.030             -5.131 ± 0.046
DEXP    -0.100 ± 0.092              0.062 ± 0.059              -7.044 ± 4.021            -3.308 ± 0.150
IEXP    -0.002 ± 0.014              0.002 ± 0.041              0.097 ± 0.100             0.155 ± 0.078
UBAR    -0.003 ± 0.013              0.014 ± 0.041              0.024 ± 0.049             0.032 ± 0.053
BAR     -0.003 ± 0.012              0.015 ± 0.041              0.009 ± 0.032             0.008 ± 0.046
RBAR    -0.005 ± 0.012              0.013 ± 0.041              0.004 ± 0.033             0.005 ± 0.046
MBAR    -0.007 ± 0.013              0.005 ± 0.037              0.007 ± 0.033             0.004 ± 0.043
GDEL    -0.003 ± 0.015              -1.802 ± 0.044             0.001 ± 0.023             -12.683 ± 0.044
GINS    -0.001 ± 0.048              -7.741 ± 0.047             -1.4E+31 ± 3.4E+30        -1.7E+32 ± 3.4E+30
Method  (RMSE1 ± σ(RMSE1))full  (RMSE2 ± σ(RMSE2))full  (RMSE1 ± σ(RMSE1))sp  (RMSE2 ± σ(RMSE2))sp
TI      0.148 ± 0.054           0.208 ± 0.084           0.237 ± 0.076         6.377 ± 0.178
TI3     0.153 ± 0.059           0.172 ± 0.069           0.290 ± 0.093         5.135 ± 0.218
DEXP    0.531 ± 0.474           0.490 ± 0.512           7.596 ± 2.138         4.478 ± 1.837
IEXP    0.142 ± 0.051           0.142 ± 0.052           0.897 ± 0.271         0.904 ± 0.273
UBAR    0.121 ± 0.054           0.121 ± 0.055           0.473 ± 0.147         0.473 ± 0.148
BAR     0.120 ± 0.055           0.120 ± 0.056           0.330 ± 0.103         0.330 ± 0.103
RBAR    0.120 ± 0.055           0.121 ± 0.056           0.331 ± 0.102         0.331 ± 0.102
MBAR    0.132 ± 0.052           0.132 ± 0.052           0.334 ± 0.102         0.334 ± 0.102
GDEL    0.152 ± 0.065           1.807 ± 0.137           0.201 ± 0.090         12.682 ± 0.190
GINS    0.427 ± 0.205           7.748 ± 0.402           3.1E+32 ± 1.5E+33     1.5E+32 ± 1.5E+33
Figures 6 and 7 show the biases of the different methods for all three test cases. When error bars are larger than the bias bars, as for MBAR in methane solvation, comparisons of accuracy between free energy methods become difficult because the bias is lost in the statistical noise. DEXP and IEXP have significant bias in all cases except methane solvation. For UA methane solvation, TI has the longest bar, indicating the largest bias due to the number of states.
MBAR, BAR, UBAR, and RBAR consistently have less bias than the other methods and hence are more accurate for estimating free energies. TI and TI3 rank next in accuracy, followed by IEXP and DEXP, and finally GDEL and GINS. For dipole inversion, DEXP and IEXP show the largest bias relative to both the large-sample results and the many-intermediate-state results. All other methods for dipole inversion, including GINS and GDEL, show almost equal biases within statistical error limits. For anthracene hydration free energies, BAR, UBAR, RBAR, and MBAR show moderate biases for the full set compared to TI and TI3, but these results are again likely to be noise (see Table S9). DEXP and IEXP, as usual, show high biases.
Figure 6. Bias plots for different test cases for the full set. DEXP, TI and TI3 show
numerically significant bias for methane, while all methods show moderate bias with
respect to number of states for anthracene.
Figure 7. Bias plots for different test cases for the sparse set. DEXP and IEXP show large biases due both to the number of samples and to the number of intermediate states.
3.3.3 Overall reliability of a free energy estimator: The knowledge of bias and
uncertainty can now be put together to analyze the reliability of a method in estimating
the free energy. The bivariate Gaussian plots (Figures 8, 9 and 10) show the reliability of
a method in predicting the free energy and the uncertainty in the predicted free energy for
a given test case. For each figure, the first two columns contain reliability plots for full
state runs and the last two show the results of the sparse state runs.
Figure 8, for UA methane solvation, shows that MBAR, RBAR, BAR, and UBAR have small and equal spreads in both the horizontal and vertical directions, indicating that these methods give reliable free energy estimates for both the sparse and full sets. TI and TI3 give reliable estimates of free energy only for the full set; with the sparse set they are dominated by bias due to the number of intermediate states. IEXP has lower RMSE than TI and TI3. GDEL and GINS are unreliable for both the full and sparse sets.
In Figure 9, for dipole inversion, free energy estimates from TI and TI3 are comparable in reliability to those from MBAR, BAR, RBAR, and UBAR. GDEL and GINS work reasonably well for dipole inversion, despite their poor performance in the other test cases. This can be explained in light of the work of Hummer, Pratt, and Garcia on the free energy of ionic hydration,56 in which they found that the electrostatic potential energy distribution follows Gaussian behavior.
In Figure 10, we see that GDEL, GINS, and DEXP are unreliable estimators of anthracene hydration free energy for both the full and sparse sets, with low accuracy and precision in their predicted free energy and uncertainty estimates. Anthracene solvation free energy is a harder problem, and no method is as accurate as for the other two molecular cases. TI, TI3, IEXP, BAR, RBAR, UBAR, and MBAR perform equally well within noise for the full set. In the sparse set, IEXP and UBAR become significantly worse than the other methods, with TI being slightly worse. TI performs better relative to the acceptance ratio methods here than in the methane case. This is because the sparse set for anthracene solvation (4 states between end points) is not as aggressive as that for methane solvation (only 1 state between end points).
Figure 8. Bivariate Gaussian plots for UA methane solvation. Note how TI and TI3 fail the reliability test for the sparse set. GDEL and GINS are not at all reliable. MBAR and BAR are accurate, but the BAR estimate of precision is misleading, as it underestimates the uncertainty, as shown in Figures 4 and 5. SP after the method name indicates results for the sparse set.
Figure 9. Bivariate Gaussian plots for dipole inversion. TI is reliable for this molecule,
but DEXP, IEXP, GDEL and GINS again are the least accurate and precise of all the
methods studied.
Figure 10. Bivariate Gaussian plots for anthracene hydration free energy. The effect of bias due to intermediate states is evident in almost all methods, as all spreads are elongated in the vertical direction. TI3 and MBAR both appear reliable, especially for the full set.
3.3.4 Testing the Gaussian distribution of free energies: Asymptotic error estimates assume a normal distribution of error, as does the use of a standard deviation to describe the error distribution. It is important to check whether this assumption is actually valid. Figure 11 shows the distribution of free energy results for each test case and compares it with a Gaussian with the mean and variance of the distribution. The free energy distributions are plotted as histograms with the optimal bin width calculated from Scott's formula,57 h = 3.5σ/n^(1/3). Here n is the number of samples (100) and σ is the standard deviation. To calculate the optimal bin width for the first eight methods, we used the mean of the standard deviations predicted by MBAR over the 100 repetitions. For GDEL and GINS, the optimal bin width was calculated from the mean of their own predicted uncertainties, because the uncertainties estimated with GDEL and GINS differ drastically from those of MBAR.
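Scott's rule and the histogram comparison can be sketched as follows. The function names are hypothetical, and the 100 free energies are synthetic. `bin_sigma` is the σ used only for the bin width (the mean MBAR-predicted uncertainty in this study). `mean` and `sigma` define the Gaussian overlaid on the histogram.

```python
import numpy as np


def scott_bin_width(sigma, n):
    """Scott's rule: h = 3.5 * sigma / n**(1/3)."""
    return 3.5 * sigma / n ** (1.0 / 3.0)


def histogram_vs_gaussian(dg_values, mean, sigma, bin_sigma):
    """Histogram density of free energies and the Gaussian density at the bin centres."""
    dg_values = np.asarray(dg_values, dtype=float)
    h = scott_bin_width(bin_sigma, dg_values.size)
    edges = np.arange(dg_values.min(), dg_values.max() + h, h)
    density, edges = np.histogram(dg_values, bins=edges, density=True)
    centres = 0.5 * (edges[1:] + edges[:-1])
    gauss = np.exp(-0.5 * ((centres - mean) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
    return centres, density, gauss


# Synthetic stand-in for 100 repetitions of one method (kJ/mol).
rng = np.random.default_rng(7)
dg_runs = rng.normal(8.93, 0.11, 100)
h = scott_bin_width(0.11, 100)
centres, density, gauss = histogram_vs_gaussian(dg_runs, dg_runs.mean(), 0.11, 0.11)
```

With n = 100, Scott's rule gives h ≈ 0.754σ, so a distribution spanning about ±3σ is covered by roughly eight bins. That is coarse enough that only gross departures from Gaussian shape are visible.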
In Figure 11, the blue curve is the histogram of free energies and the black curve is the Gaussian with the mean and the standard deviation estimated by the specified method. The shape of the blue curve should match the black curve within noise if the free energy distribution is indeed Gaussian. Two more Gaussians are also plotted: one (magenta) with mean (ΔG)450ns and standard deviation σ((ΔG)450ns), and the other (cyan) with mean (ΔG)51states and standard deviation σ((ΔG)51states). If the mean of the blue curve matches the means of the cyan and magenta curves, the bias in the free energy estimate due to both the number of samples and the number of states is low. The tighter the spread of the black and blue curves, the more precise the free energy estimate.
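To compare a Gaussian with a count histogram on the same scale, one can convert the Gaussian into the number of samples it predicts in each bin. The sketch below does this with the standard library's math.erf; the function names are hypothetical, and the crude bin-wise deviation check is an illustrative assumption, not the procedure used to draw Figure 11.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a Gaussian with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def expected_counts(edges, mean, sd, n):
    """Counts per histogram bin predicted by a Gaussian(mean, sd) for n samples."""
    return [n * (normal_cdf(b, mean, sd) - normal_cdf(a, mean, sd))
            for a, b in zip(edges[:-1], edges[1:])]


def max_count_deviation(observed, expected):
    """Largest bin-wise |observed - expected|, a crude check of the Gaussian shape."""
    return max(abs(o - e) for o, e in zip(observed, expected))
```

The same function serves for the black, magenta, and cyan curves; only the mean and standard deviation passed in change.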
Figure 11. Each subplot has a comparison between the distribution of estimated free energies from 100 repetitions (in blue), a Gaussian with the mean ΔG and standard deviation σ(ΔG) from 100 repetitions (in black), a Gaussian with the mean (ΔG)450ns and standard deviation σ((ΔG)450ns) (in magenta), and a Gaussian with the mean (ΔG)51states and standard deviation σ((ΔG)51states) (in cyan) for the full set for methane solvation.
Figure 12. Each subplot has a comparison between the distribution of estimated free energies from 100 repetitions (in blue), a Gaussian with the mean ΔG and standard deviation σ(ΔG) from 100 repetitions (in black), a Gaussian with the mean (ΔG)450ns and standard deviation σ((ΔG)450ns) (in magenta), and a Gaussian with the mean (ΔG)51states and standard deviation σ((ΔG)51states) (in cyan) for the sparse set for methane solvation.
In Figures 11, 12, and S1-S4, we see that when the variances are well converged, the distributions of free energies from all methods except DEXP are well approximated by a Gaussian, even when the variances are large. However, in several cases the variances are so large that the comparisons become statistically meaningless. Besides DEXP, which fails in all cases, IEXP and UBAR are noisy enough for dipole inversion that comparison to a Gaussian is problematic, and for the sparse set they fail completely to be Gaussian. Similarly, for anthracene solvation, IEXP and UBAR fail to be Gaussian for the sparse set. The 51-state results are omitted from the GDEL and GINS plots because they lie outside the plot axis limits. Interestingly, the errors are typically distributed normally with the variance given by the analytical estimate, even when the bias in the free energy is very large.
3.3.5 Convergence properties of free energy estimates: The true free energy is not necessarily the experimental value of the free energy change of the process, but rather the infinite-sampling limit for the particular choice of molecular model. With a large number of intermediate states, all methods converge to the same value (Figure 13), whereas the 450 ns results with the sparse data set vary between methods, with, as usual, large deviations for GDEL and GINS, indicating significant bias with respect to the overlap between states. Given sufficient sampling, increasing the number of states appears to be the best way to obtain asymptotic convergence to the true answer for the model.
Figure 13. Free energies estimated using a large number of intermediate states and a large number of samples converge to a single value for all free energy estimators.
3.3.6 Amount of time required for free energy estimation methods: We chose the calculation of anthracene solvation over 4.5 ns with the full set as our test system for comparing the computational time each method requires to compute free energies, because this system has the largest number of intermediate states and large free energy changes between states. We report the time required to calculate the free energies 201 times (once for the original set and once for each of 200 bootstrap sets) to eliminate variability caused by computational overhead in single calculations. The time required to read in data, generate bootstrap samples, and perform the bookkeeping required by all methods was 249.5 s; this common overhead was subtracted from the total time to yield the computational time of each method alone.
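The timing protocol can be sketched as follows. This is an illustration under stated assumptions: the data are a single list of uncorrelated samples (for multistate free energy data one would resample each state separately), the function names are hypothetical, and the stand-in estimator is a plain mean rather than any of the ten methods.

```python
import random
import time


def bootstrap_sets(samples, n_boot=200, seed=None):
    """The original data set followed by n_boot resamples drawn with replacement.

    With n_boot=200 this yields the 201 data sets mentioned in the text.
    """
    rng = random.Random(seed)
    n = len(samples)
    sets = [list(samples)]
    for _ in range(n_boot):
        sets.append([samples[rng.randrange(n)] for _ in range(n)])
    return sets


def time_estimator(estimator, data_sets):
    """Seconds spent inside `estimator` alone, summed over all data sets.

    Data loading and resampling happen before the clock starts, so the shared
    overhead is excluded rather than subtracted afterwards.
    """
    start = time.perf_counter()
    for data in data_sets:
        estimator(data)
    return time.perf_counter() - start


sets = bootstrap_sets([0.1, 0.4, 0.2, 0.3], seed=7)
elapsed = time_estimator(lambda d: sum(d) / len(d), sets)
```

Starting the clock after resampling is a design choice that removes the common overhead directly, instead of measuring it once and subtracting it as done in the text.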
Table 6. Time consumed by different methods to calculate the anthracene solvation free energy 201 times.
Method    Time taken (s)
TI        0.2
TI3       5.8
DEXP      13.5
IEXP      11.5
UBAR      15.3
BAR       93.5
RBAR      1148.2
MBAR      4913.5
GDEL      4.0
GINS      4.3
From Table 6 it is evident that MBAR takes the longest of all methods, as it processes information from all the intermediate states to give an estimate of the free energy and its uncertainty. RBAR is the next most computationally costly. RBAR takes more time than BAR because multiple BAR calculations are performed over a range of free energies at each intermediate stage when a large range of possible values for the self-consistent constants is evaluated. UBAR takes less time than BAR because only a single iteration is performed at each stage. DEXP and IEXP are similar in cost to UBAR. GDEL and GINS take less time than DEXP and IEXP. TI3 takes longer than TI because it fits a spline of higher degree, but both are cheaper than the exponential averaging and acceptance ratio methods. However, the total time required even by MBAR is orders of magnitude less than the time required to perform the sampling, so the higher cost of MBAR is not an obstacle in most cases.
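The difference between TI and TI3 can be illustrated by integrating ⟨∂U/∂λ⟩ over λ with the trapezoidal rule and with a cubic spline. This sketch assumes a natural cubic spline (zero second derivative at both ends of the λ range); the boundary conditions of the TI3 implementation are not stated here, so treat this as one plausible variant with hypothetical function names.

```python
import numpy as np


def ti_trapezoid(lam, dudl):
    """TI: trapezoidal-rule integral of <dU/dlambda> over lambda."""
    x, y = np.asarray(lam, float), np.asarray(dudl, float)
    h = np.diff(x)
    return float(np.sum(h * (y[:-1] + y[1:]) / 2.0))


def ti_natural_spline(lam, dudl):
    """TI with a natural cubic spline through (lambda, <dU/dlambda>), integrated exactly."""
    x, y = np.asarray(lam, float), np.asarray(dudl, float)
    n = len(x)
    h = np.diff(x)
    # Linear system for the spline second derivatives M (natural ends: M0 = M_last = 0).
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    # Exact integral of each cubic piece: h*(y_i + y_{i+1})/2 - h^3*(M_i + M_{i+1})/24.
    return float(np.sum(h * (y[:-1] + y[1:]) / 2.0 - h**3 * (M[:-1] + M[1:]) / 24.0))
```

For a curved integrand such as sin(λ) on 11 evenly spaced points, the spline error is far smaller than the trapezoidal error; for a straight line both are exact, which matches the observation that TI and TI3 agree when ⟨∂U/∂λ⟩ has low curvature.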
3.4. Conclusions:
In this chapter we have proposed the first iteration of a set of test systems that can be used for benchmarking free energy calculation methods for small molecule solvation. We have demonstrated the utility of this test set by comparing ten equilibrium free energy methods on three test cases for molecular solvation, with different spacings between intermediate states.
We calculated three estimates of the uncertainty, the sample standard deviation, the analytical estimate, and the bootstrap estimate, as well as the uncertainty in each of these estimates of the uncertainty. We also calculated the biases in the free energy estimates separately in the large-number-of-samples limit and in the large-number-of-intermediate-states limit. We graphically demonstrated the effect of the variance and of the two separate types of bias with bivariate Gaussian plots expressing the overall reliability of the methods. We demonstrated that bootstrap sampling accurately predicts the properties of the sample distribution observed from 100 independent simulations for all the free energy methods. We also showed that the histogram of free energies from 100 independent simulations has a Gaussian form for TI, TI3, BAR, RBAR, UBAR, MBAR, GDEL, and GINS, but that IEXP and DEXP deviate from this trend.
We have found that MBAR is the most reliable of all the free energy estimators, showing consistent accuracy and precision in both free energy and uncertainty prediction. TI and TI3 are better uncertainty estimators than BAR, UBAR, and RBAR, with performance equal to MBAR when sufficient intermediate states are included, but they are biased with respect to the number of intermediate states. When the ⟨∂U/∂λ⟩ vs. λ curve has low curvature, as in dipole inversion, TI and TI3 are equally reliable. But when the curve is nonlinear, i.e. when LJ spheres grow or disappear, TI3 gives better free energy estimates than TI. BAR and RBAR have negligible bias, but their uncertainty estimates are frequently underestimated by 25 to 30% when overlap between states is not negligible. UBAR is often as good as BAR and RBAR, but can fail with low numbers of intermediate states. IEXP and DEXP are less reliable than TI and the acceptance ratio methods, and should be avoided if samples can be collected from all intermediates in both the forward and reverse directions, or if the derivative of the Hamiltonian along the pathway can be computed. IEXP does work in some cases, but in general IEXP and DEXP give poor estimates of both uncertainties and free energies. GINS and GDEL do not compare well with the other methods in any of the test systems except the dipole inversion test case. They work only if there is a large number of intermediate states, or if the distributions are inherently Gaussian. Even then, they are not as accurate or precise as the other methods.
Computationally, MBAR is the most expensive, but the time it requires is orders of magnitude less than the time required for collecting data. UBAR takes less time than RBAR and BAR and should only be considered as a quick and easy (but less reliable) alternative to BAR and RBAR. RBAR requires some knowledge of the maximum free energy gap, but does not require storing all the energy data. BAR requires storing energy data but no knowledge of the size of the maximum free energy difference. GDEL and GINS take less time than DEXP and IEXP, but they heavily sacrifice accuracy and precision for speed in virtually all cases. Finally, TI takes the least time to estimate free energies and uncertainties, and a little extra computation (fitting cubic splines) in TI3 improves the accuracy of the thermodynamic integration free energy estimate. However, the improvement from cubic splines is not enough to prevent TI3 from still having significant bias when there are few intermediate states. We summarize these conclusions in Table 7.
Table 7. Summary of all statistical tests.
Reliability of free energy estimate (high to low)
MBAR>BAR=RBAR>UBAR>TI3>TI>IEXP>DEXP>GDEL=GINS
Reliability of uncertainty estimate (high to low)
MBAR>TI3=TI>BAR=RBAR>UBAR>IEXP>DEXP>GDEL=GINS
Computational cost (high to low)
MBAR>RBAR>BAR>UBAR>IEXP=DEXP>GDEL=GINS>TI3>TI
Is distribution of estimated free energies Gaussian?
Yes for {MBAR, RBAR, BAR, UBAR, TI, TI3, GDEL, GINS}
No for {IEXP, DEXP}
Is bootstrap better than analytical uncertainty estimate?
Yes for {RBAR, BAR, UBAR, GDEL, GINS}
Both equally good for {MBAR,IEXP,DEXP,TI,TI3}
The main reason to explore the PME parameter space is to choose simulation settings that give sufficient accuracy for the application but are fast enough. Usually this is done through the advice of experienced practitioners or by choosing the defaults suggested in the simulation package. Abraham31 and Wang32 propose rational ways of selecting PME parameters. They tune the PME parameters by minimizing the computational cost for a given accuracy in force calculations. Their technique involves full-scale molecular simulations at different parameter sets; they iterate over parameter sets until they reach a desired accuracy in force calculations. In this study, we introduce an alternative method of selecting PME parameters, which evaluates the computational efficiency of multiple PME parameter sets using data sampled with a single set. The simulation is run only once. New energies are evaluated by rereading the same trajectory with new PME parameter values. The newly evaluated energies, together with the sampled energies, can then be used to evaluate observables at the new PME parameter sets using MBAR. The aim is to find the set of PME parameters which, for a given level of error in the desired observable, results in the minimum simulation time.
We use the three molecular transformations defined in our benchmark set as test systems, since these represent a substantial set of alchemical changes. PME parameters that are good for these three systems should be good for simulating analogous transformations. The observables we chose for this study are (a) the free energy change of the three molecular transformations, (b) the enthalpy change during the molecular transformations, which for methane and anthracene solvation is simply the enthalpy of solvation, and (c) the molar heat of vaporization of water.
4.1 Methods
4.1.1 Difference between free energy estimates calculated with two different PME parameter sets
A free energy estimator requires samples generated using equilibrium simulations
performed at the PME parameter set of interest. Calculating free energy using methods
discussed in Chapter 3 requires simulation at each and every set of PME parameters.
Carrying out these simulations will tremendously increase the computational costs and
make this process impractical. However, MBAR can estimate free energies and
equilibrium expectation values of enthalpies corresponding to different simulation inputs
without running new sets of equilibrium simulations. This is achieved by using the
samples from equilibrium simulations corresponding to just one simulation input set and
energies calculated for different simulation inputs using the same trajectory. Let us look
at the free energy estimating equation of MBAR again.
$$
G_i = -\ln \sum_{k=1}^{K} \sum_{n=1}^{N_k} \frac{\exp[-U_i(x_{kn})]}{\sum_{k'=1}^{K} N_{k'} \exp[G_{k'} - U_{k'}(x_{kn})]}
$$
Here K is the total number of states and Nk is the total number of uncorrelated samples
available from an equilibrium simulation at state k. Gi is the free energy associated with
state i. Ui(xkn) is the potential energy of the nth sample belonging to equilibrium simulation
of state k but evaluated at state i. Now suppose we have equilibrium samples from a
simulation done with simulation input set Si1 which has K states defining full molecular
transformation. We can also calculate the potential energies using simulation set Si2,
which also has K equilibrium states. The coordinates are read from the trajectory and new
energies are calculated according to the new set of simulation inputs. We can then create a three-dimensional matrix of size (2K, 2K, Nk), as defined below.
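$$
U = \begin{pmatrix}
U_0(x_{0,1:N_0}) & \cdots & U_{K-1}(x_{0,1:N_0}) & U_K(x_{0,1:N_0}) & \cdots & U_{2K-1}(x_{0,1:N_0}) \\
\vdots & & \vdots & \vdots & & \vdots \\
U_0(x_{K-1,1:N_{K-1}}) & \cdots & U_{K-1}(x_{K-1,1:N_{K-1}}) & U_K(x_{K-1,1:N_{K-1}}) & \cdots & U_{2K-1}(x_{K-1,1:N_{K-1}}) \\
U_0(x_{K,1:N_K}) & \cdots & U_{K-1}(x_{K,1:N_K}) & U_K(x_{K,1:N_K}) & \cdots & U_{2K-1}(x_{K,1:N_K}) \\
\vdots & & \vdots & \vdots & & \vdots \\
U_0(x_{2K-1,1:N_{2K-1}}) & \cdots & U_{K-1}(x_{2K-1,1:N_{2K-1}}) & U_K(x_{2K-1,1:N_{2K-1}}) & \cdots & U_{2K-1}(x_{2K-1,1:N_{2K-1}})
\end{pmatrix},
\qquad
N = \begin{pmatrix} N_0 \\ \vdots \\ N_{K-1} \\ N_K \\ \vdots \\ N_{2K-1} \end{pmatrix}
\tag{17}
$$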
The top left quarter of the U matrix holds energies calculated at K states using configurations from K different equilibrium simulations. The top right quarter holds energies evaluated at the same K states but with different simulation inputs, using the same configurations from the K equilibrium simulations. These form a new set of K thermodynamic states, so that we now have a total of 2K thermodynamic states. For example, U0(x0,n) evaluated with simulation input set Si1 differs from U0(x0,n) evaluated with simulation input set Si2; the latter appears in the U matrix as a different thermodynamic state, UK(x0,n). Similarly, UK-1(x0,n) evaluated with Si1 differs from UK-1(x0,n) evaluated with Si2, which appears in the last column of the U matrix as a different thermodynamic state, U2K-1(x0,n). The same holds for the other intermediate states, giving a total of 2K thermodynamic states.
Since we only have samples from states 0 to K-1, NK = NK+1 = ... = N2K-1 = 0. Because these sample counts are zero, the energies in the bottom half of the U matrix do not participate in the calculation of Gi in the MBAR formula. In fact, we do not have these energies, as we have not run any equilibrium simulations with simulation input set Si2 and hence have no samples or configurations from those states. We can still estimate the free energies of these states using the information we have. We can rewrite the U matrix and the N vector as:
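$$
U = \begin{pmatrix}
U_0(x_{0,1:N_0}) & \cdots & U_{K-1}(x_{0,1:N_0}) & U_K(x_{0,1:N_0}) & \cdots & U_{2K-1}(x_{0,1:N_0}) \\
\vdots & & \vdots & \vdots & & \vdots \\
U_0(x_{K-1,1:N_{K-1}}) & \cdots & U_{K-1}(x_{K-1,1:N_{K-1}}) & U_K(x_{K-1,1:N_{K-1}}) & \cdots & U_{2K-1}(x_{K-1,1:N_{K-1}}) \\
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix},
\qquad
N = \begin{pmatrix} N_0 \\ \vdots \\ N_{K-1} \\ 0 \\ \vdots \\ 0 \end{pmatrix}
\tag{18}
$$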
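A minimal sketch of how the augmented inputs of Eq. 18 can be fed to a self-consistent MBAR solver. Assumptions: energies are in reduced units (divided by kBT); they are stored as a states-by-pooled-samples array, the layout used by the pymbar package, rather than the (2K, 2K, Nk) array above; and the plain fixed-point iteration shown here stands in for the faster solvers used in practice. All function names are hypothetical. States with Nk = 0 still receive a free energy but never enter the denominator.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    amax = np.max(a, axis=axis, keepdims=True)
    total = amax + np.log(np.sum(np.exp(a - amax), axis=axis, keepdims=True))
    return np.squeeze(total, axis=axis)


def augment_states(u_si1, u_si2, n_k):
    """Stack reduced energies for 2K states, as in Eq. 18.

    u_si1[i, n], u_si2[i, n]: reduced energy of pooled sample n evaluated in
    state i with input set Si1 or Si2. Only the Si1 states were sampled, so
    the Si2 states get zero sample counts.
    """
    u_kn = np.vstack([u_si1, u_si2])
    n_aug = np.concatenate([np.asarray(n_k, float), np.zeros(len(n_k))])
    return u_kn, n_aug


def mbar_free_energies(u_kn, n_k, tol=1e-10, max_iter=100000):
    """Fixed-point solution of the MBAR equation in reduced units, with G_0 = 0."""
    u_kn = np.asarray(u_kn, float)
    n_k = np.asarray(n_k, float)
    sampled = n_k > 0                      # unsampled states never enter the denominator
    log_n = np.log(n_k[sampled])[:, None]
    g = np.zeros(u_kn.shape[0])
    for _ in range(max_iter):
        # log of sum_k' N_k' exp(G_k' - U_k'(x_n)) for every pooled sample n
        log_denom = logsumexp(log_n + g[sampled][:, None] - u_kn[sampled], axis=0)
        g_new = -logsumexp(-u_kn - log_denom, axis=1)
        g_new -= g_new[0]
        if np.max(np.abs(g_new - g)) < tol:
            return g_new
        g = g_new
    raise RuntimeError("MBAR iteration did not converge")
```

With harmonic test states u(x) = x²/(2σ²), the exact reduced free energy differences are -ln(σi/σ0), which makes the unsampled "Si2" states easy to check.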
The U matrix and the N vector are the inputs to the MBAR Eq. 11, which is solved self-consistently for Gi for all 2K states. Once we have solved for the free energies, we can easily write a matrix ΔG of pairwise free energy difference estimates and a matrix δ(ΔG) of their uncertainties.
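$$
\Delta G = \begin{pmatrix}
G_0 - G_0 & \cdots & G_{K-1} - G_0 & G_K - G_0 & \cdots & G_{2K-1} - G_0 \\
\vdots & & \vdots & \vdots & & \vdots \\
G_0 - G_{K-1} & \cdots & G_{K-1} - G_{K-1} & G_K - G_{K-1} & \cdots & G_{2K-1} - G_{K-1} \\
G_0 - G_K & \cdots & G_{K-1} - G_K & G_K - G_K & \cdots & G_{2K-1} - G_K \\
\vdots & & \vdots & \vdots & & \vdots \\
G_0 - G_{2K-1} & \cdots & G_{K-1} - G_{2K-1} & G_K - G_{2K-1} & \cdots & G_{2K-1} - G_{2K-1}
\end{pmatrix}
$$
$$
\delta(\Delta G) = \begin{pmatrix}
\delta(G_0 - G_0) & \cdots & \delta(G_{K-1} - G_0) & \delta(G_K - G_0) & \cdots & \delta(G_{2K-1} - G_0) \\
\vdots & & \vdots & \vdots & & \vdots \\
\delta(G_0 - G_{K-1}) & \cdots & \delta(G_{K-1} - G_{K-1}) & \delta(G_K - G_{K-1}) & \cdots & \delta(G_{2K-1} - G_{K-1}) \\
\delta(G_0 - G_K) & \cdots & \delta(G_{K-1} - G_K) & \delta(G_K - G_K) & \cdots & \delta(G_{2K-1} - G_K) \\
\vdots & & \vdots & \vdots & & \vdots \\
\delta(G_0 - G_{2K-1}) & \cdots & \delta(G_{K-1} - G_{2K-1}) & \delta(G_K - G_{2K-1}) & \cdots & \delta(G_{2K-1} - G_{2K-1})
\end{pmatrix}
\tag{19}
$$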
The uncertainties in the pairwise difference estimates can be estimated using Eq. 12 of Shirts and Chodera45. The free energy and uncertainty estimates can be read directly from these matrices: (GK-1 - G0) ± δ(GK-1 - G0) for simulation input set Si1, and (G2K-1 - GK) ± δ(G2K-1 - GK) for simulation input set Si2. The difference between the two free energy estimates, Δ(ΔG)Si1-Si2, is simply
$$
\Delta(\Delta G)_{S_{i1}-S_{i2}} = (\Delta G)_{S_{i1}} - (\Delta G)_{S_{i2}} = (G_{K-1} - G_0) - (G_{2K-1} - G_K) \tag{20}
$$
The variance of Δ(ΔG)Si1-Si2 is not simply the sum of the variance of (ΔG)Si1, var(ΔG)Si1, and the variance of (ΔG)Si2, var(ΔG)Si2, because the two free energy estimates are correlated: both are computed from the same samples. We derive the variance of Δ(ΔG)Si1-Si2 from the definition of the covariance:
$$
\begin{aligned}
\delta^2\left[\Delta(\Delta G)_{S_{i1}-S_{i2}}\right] &= \mathrm{cov}\big((G_{K-1}-G_0)-(G_{2K-1}-G_K),\,(G_{K-1}-G_0)-(G_{2K-1}-G_K)\big) \\
&= \mathrm{cov}(G_{K-1}-G_0,\,G_{K-1}-G_0) + \mathrm{cov}(G_{2K-1}-G_K,\,G_{2K-1}-G_K) \\
&\quad - 2\,\mathrm{cov}(G_{K-1}-G_0,\,G_{2K-1}-G_K)
\end{aligned}
\tag{21}
$$
where cov denotes the covariance. Expanding each term yields:
$$
\begin{aligned}
\delta^2\left[\Delta(\Delta G)_{S_{i1}-S_{i2}}\right] &= \mathrm{cov}(G_{K-1},G_{K-1}) + \mathrm{cov}(G_0,G_0) + \mathrm{cov}(G_{2K-1},G_{2K-1}) + \mathrm{cov}(G_K,G_K) \\
&\quad - 2\,\{\mathrm{cov}(G_{K-1},G_0) + \mathrm{cov}(G_{2K-1},G_K) + \mathrm{cov}(G_{K-1},G_{2K-1}) \\
&\qquad - \mathrm{cov}(G_{K-1},G_K) - \mathrm{cov}(G_0,G_{2K-1}) + \mathrm{cov}(G_0,G_K)\}
\end{aligned}
\tag{22}
$$
Each term in this equation can easily be extracted from the asymptotic covariance matrix Θij estimated by Eq. 8 in the paper by Shirts and Chodera.45
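Eq. 22 is the quadratic form wᵀΘw for the weight vector that picks out the four free energies involved (+1 for GK-1 and GK, -1 for G0 and G2K-1). The sketch below, with hypothetical function names, checks the term-by-term expansion against that compact form. Θ here is any symmetric covariance matrix indexed by state 0 to 2K-1; with MBAR it would be the asymptotic covariance of the reduced free energies, so the square root of the result is δ[Δ(ΔG)] in units of kBT.

```python
import numpy as np


def var_ddg(theta, K):
    """Eq. 22: variance of (G_{K-1} - G_0) - (G_{2K-1} - G_K) from covariance matrix theta."""
    t = np.asarray(theta, float)
    a, b, c, d = K - 1, 0, 2 * K - 1, K   # Si1 result: G_a - G_b; Si2 result: G_c - G_d
    return float(t[a, a] + t[b, b] + t[c, c] + t[d, d]
                 - 2.0 * (t[a, b] + t[c, d] + t[a, c] - t[a, d] - t[b, c] + t[b, d]))


def var_linear_combination(theta, w):
    """Var(sum_i w_i G_i) = w^T theta w for any weight vector w."""
    w = np.asarray(w, float)
    return float(w @ np.asarray(theta, float) @ w)
```

The compact form generalizes to any linear combination of free energies, which is convenient when more than two parameter sets are compared at once.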
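$$
H = \begin{pmatrix} H_0 \\ \vdots \\ H_{K-1} \\ H_K \\ \vdots \\ H_{2K-1} \end{pmatrix},
\qquad
\delta H = \begin{pmatrix} \delta H_0 \\ \vdots \\ \delta H_{K-1} \\ \delta H_K \\ \vdots \\ \delta H_{2K-1} \end{pmatrix}
$$
$$
\Delta H = \begin{pmatrix}
H_0 - H_0 & \cdots & H_{K-1} - H_0 & H_K - H_0 & \cdots & H_{2K-1} - H_0 \\
\vdots & & \vdots & \vdots & & \vdots \\
H_0 - H_{K-1} & \cdots & H_{K-1} - H_{K-1} & H_K - H_{K-1} & \cdots & H_{2K-1} - H_{K-1} \\
H_0 - H_K & \cdots & H_{K-1} - H_K & H_K - H_K & \cdots & H_{2K-1} - H_K \\
\vdots & & \vdots & \vdots & & \vdots \\
H_0 - H_{2K-1} & \cdots & H_{K-1} - H_{2K-1} & H_K - H_{2K-1} & \cdots & H_{2K-1} - H_{2K-1}
\end{pmatrix}
$$
$$
\delta(\Delta H) = \begin{pmatrix}
\delta(H_0 - H_0) & \cdots & \delta(H_{K-1} - H_0) & \delta(H_K - H_0) & \cdots & \delta(H_{2K-1} - H_0) \\
\vdots & & \vdots & \vdots & & \vdots \\
\delta(H_0 - H_{K-1}) & \cdots & \delta(H_{K-1} - H_{K-1}) & \delta(H_K - H_{K-1}) & \cdots & \delta(H_{2K-1} - H_{K-1}) \\
\delta(H_0 - H_K) & \cdots & \delta(H_{K-1} - H_K) & \delta(H_K - H_K) & \cdots & \delta(H_{2K-1} - H_K) \\
\vdots & & \vdots & \vdots & & \vdots \\
\delta(H_0 - H_{2K-1}) & \cdots & \delta(H_{K-1} - H_{2K-1}) & \delta(H_K - H_{2K-1}) & \cdots & \delta(H_{2K-1} - H_{2K-1})
\end{pmatrix}
\tag{23}
$$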
The difference, or error, between the enthalpy differences computed with two different sets of simulation inputs can then be extracted directly from the H vector or the ΔH matrix:
$$
\Delta(\Delta H)_{S_{i1}-S_{i2}} = (\Delta H)_{S_{i1}} - (\Delta H)_{S_{i2}} = (H_{K-1} - H_0) - (H_{2K-1} - H_K) \tag{24}
$$
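The uncertainty in Δ(ΔH)Si1-Si2 is again not trivial. It is derived using the same formalism as was used for δ[Δ(ΔG)]:
$$
\begin{aligned}
\delta^2\left[\Delta(\Delta H)_{S_{i1}-S_{i2}}\right] &= \mathrm{cov}\big((H_{K-1}-H_0)-(H_{2K-1}-H_K),\,(H_{K-1}-H_0)-(H_{2K-1}-H_K)\big) \\
&= \mathrm{cov}(H_{K-1}-H_0,\,H_{K-1}-H_0) + \mathrm{cov}(H_{2K-1}-H_K,\,H_{2K-1}-H_K) \\
&\quad - 2\,\mathrm{cov}(H_{K-1}-H_0,\,H_{2K-1}-H_K)
\end{aligned}
\tag{25}
$$
$$
\begin{aligned}
\delta^2\left[\Delta(\Delta H)_{S_{i1}-S_{i2}}\right] &= \delta^2(H_{K-1}-H_0) + \delta^2(H_{2K-1}-H_K) \\
&\quad - 2\,\{\mathrm{cov}(H_{K-1},H_{2K-1}) - \mathrm{cov}(H_{K-1},H_K) - \mathrm{cov}(H_0,H_{2K-1}) + \mathrm{cov}(H_0,H_K)\}
\end{aligned}
\tag{26}
$$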
The first two variance terms in Eq. 26 can be read directly from the δ(ΔH) matrix. Unlike in the case of δ[Δ(ΔG)], however, the covariance terms cannot be read directly from the covariance matrix, which is computed anew for the expectation values.
The estimator of uncertainty for equilibrium expectation values is given by Eq. 16 in the
paper by Shirts and Chodera.45 It has the following form
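$$
\delta^2 A = \mathrm{cov}\left(\frac{c_A}{c_a}, \frac{c_A}{c_a}\right) = A^2\left(\Theta_{AA} + \Theta_{aa} - 2\,\Theta_{Aa}\right)
$$
Using Eq. 10 in the paper by Shirts and Chodera45, we can derive an expression for the covariance terms of the form cov(cA/ca, cB/cb):
$$
\mathrm{cov}\left(\frac{c_A}{c_a}, \frac{c_B}{c_b}\right) = A B \left(\Theta_{AB} - \Theta_{Ab} - \Theta_{aB} + \Theta_{ab}\right) \tag{27}
$$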
The four covariance terms in Eq. 26 can be calculated using this relationship. Thus we can obtain Δ(ΔH) ± δ[Δ(ΔH)] for a pair of simulation input sets. For very strict simulation parameters, these should also converge to a constant value.
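Eq. 27 and the four-term bracket of Eq. 26 translate directly into code. In this sketch the function names and the index layout (which rows of Θ hold cA, ca, cB, cb for each state) are hypothetical; in practice that layout is set by whatever MBAR implementation builds the extended covariance matrix. Setting B = A recovers the Eq. 16 variance.

```python
import numpy as np


def cov_ratio(A, B, theta, iA, ia, iB, ib):
    """Eq. 27: asymptotic cov(c_A/c_a, c_B/c_b) = A*B*(T_AB - T_Ab - T_aB + T_ab).

    iA, ia, iB, ib are the rows of theta for c_A, c_a, c_B, c_b (hypothetical layout).
    """
    t = np.asarray(theta, float)
    return A * B * (t[iA, iB] - t[iA, ib] - t[ia, iB] + t[ia, ib])


def cross_term(H, idx, theta, K):
    """The bracket of Eq. 26: cov(H_{K-1} - H_0, H_{2K-1} - H_K).

    H[i] is the expectation <H> in state i and idx[i] = (iA, ia) its theta rows.
    """
    def cv(i, j):
        return cov_ratio(H[i], H[j], theta, idx[i][0], idx[i][1], idx[j][0], idx[j][1])
    return cv(K - 1, 2 * K - 1) - cv(K - 1, K) - cv(0, 2 * K - 1) + cv(0, K)
```

Writing the bracket as a single function keeps the signs in one place, which is where hand expansions of Eq. 26 most easily go wrong.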
methane sphere in the final state, it has only kinetic energy. If we could separate the kinetic energy associated with the ghost sphere from the total enthalpy of the box, we would be left with the enthalpy of water at 300 K. The kinetic energy of a united-atom (UA) methane sphere can be approximated using the equipartition theorem and the degrees of freedom of the sphere. A sphere has three degrees of freedom, one translational degree of freedom in each of the x, y, and z directions. Therefore the kinetic energy of a UA methane sphere is 1.5kBT, where kB is the Boltzmann constant (0.00831657534 kJ/mol/K) and T is the temperature, i.e. 300 K. We are interested in the enthalpy of vaporization at a new simulation input set Si2, so we work with the enthalpy of the final state corresponding to Si2, which is state 2K-1:
$$
H(\text{893 water molecules at 300 K}) = H_{2K-1} - \tfrac{3}{2}\, k_B \times 300 \tag{28}
$$
H(893 water molecules at 300 K) is the enthalpy of 893 water molecules, so we divide it by the number of water molecules to make it independent of system size:
$$
H_{\text{water}@300\,\text{K}} = \frac{H_{2K-1} - \tfrac{3}{2}\, k_B \times 300}{893} \tag{29}
$$
The enthalpy of vaporization is the enthalpy change from the initial liquid state (i.e. Hwater@300K) to the final gas state. If we assume ideal gas behavior, i.e. no potential energy, we can write the enthalpy of the gas state as the sum of the kinetic energy and the PV (pressure times volume of the simulation box) term. Using the equipartition theorem and the degrees of freedom of a water molecule, we can write the kinetic energy of water molecules at 300 K. A water molecule has 6 degrees of freedom: 3 translational, in the x, y, and z directions, and 3 rotational, about the x, y, and z axes. For a mole of ideal gas, PV = RT, where R is the gas constant and T is the temperature (300 K). Hence
$$
\Delta H_{\text{vap}} = H_{\text{gas}@300\,\text{K}} - H_{\text{liquid}@300\,\text{K}} = RT + \tfrac{6}{2}\, k_B \times 300 - H_{\text{water}@300\,\text{K}} \tag{30}
$$
$$
\Delta H_{\text{vap}} = RT + \tfrac{6}{2}\, k_B \times 300 - \frac{H_{2K-1} - \tfrac{3}{2}\, k_B \times 300}{893} \tag{31}
$$
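Eqs. 28 to 31 reduce to a few lines of arithmetic. In the sketch below the function name is hypothetical, kB is the value quoted above, and R is taken equal to that molar kB because the text does not give R separately; with this choice RT + (6/2)kBT is simply 4kBT.

```python
KB = 0.00831657534  # kJ/mol/K, the value quoted in the text
T = 300.0           # K
N_WATER = 893


def delta_h_vap(h_box_final: float, kb: float = KB, temp: float = T,
                n_water: int = N_WATER) -> float:
    """Eq. 31: molar enthalpy of vaporization (kJ/mol) from the box enthalpy H_{2K-1}.

    Assumes R equals kb expressed per mole, as the molar value of kb above implies.
    """
    gas_const = kb
    # Eqs. 28-29: remove the ghost methane's 3/2 kB T, then divide by the number of waters.
    h_liquid = (h_box_final - 1.5 * kb * temp) / n_water
    # Ideal gas: PV = RT plus 6/2 kB T of translational and rotational kinetic energy.
    h_gas = gas_const * temp + 3.0 * kb * temp
    return h_gas - h_liquid
```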
In this way we can calculate ΔHvap ± δ(ΔHvap) for a new set of simulation inputs. Like Δ(ΔG) ± δ[Δ(ΔG)] and Δ(ΔH) ± δ[Δ(ΔH)], ΔHvap ± δ(ΔHvap) should also converge to a value within error at strict simulation parameters. A set of simulation inputs may yield a converged Δ(ΔG) ± δ[Δ(ΔG)] at low computational cost but inaccurate enthalpy estimates. We need simulation parameters that predict ΔHvap to within a reasonable δ(ΔHvap), so that we have reliable estimates not only of the free energy but also of the enthalpy, getting the bulk properties right while incurring minimal computational cost.
We have studied four PME parameters: the Coulomb cut-off, the Fourier spacing, the Ewald tolerance, and the order of B-spline interpolation. Later we will extend our study to other factors such as the switching distance and cut-off type. We have chosen parameters covering the
The simulations performed in Chapter 3 to compare the free energy estimators all used the same PME parameters. A switch at 0.88 nm with a Coulomb cut-off of 0.9 nm was applied. Fourth-order B-spline interpolation was used with a Fourier spacing of 0.12 nm and an Ewald tolerance of 1E-08. We will refer to this PME set as Si1. The energies from these simulations form the top left quarter of the three-dimensional U matrix defined in Eq. 18. We choose a new set of PME parameters from the values given in Table 8. For example, in methane solvation, a strict Coulomb cut-off of 0.6 nm with a Fourier spacing of 0.2 nm, an Ewald tolerance of 1E-06, and B-spline interpolation of order 4 is a new set of simulation parameters. Let us call this simulation set Si2. We do not run new equilibrium simulations with this set of input parameters, but rather evaluate energies using the trajectories from the set of 100 equilibrium simulations performed in Chapter 3. These energies form the top right quarter of the U matrix in Eq. 18. We can now evaluate Δ(ΔG) ± δ[Δ(ΔG)] and Δ(ΔH) ± δ[Δ(ΔH)] between Si1 and Si2, as well as ΔHvap ± δ(ΔHvap) at Si2.
We could repeat the above process for other sets of simulation parameters as well. Table 8 shows that there are 1440 different combinations of the PME parameters for methane solvation. For dipole inversion and anthracene solvation there are 1800 different sets of simulation inputs.
Table 8. PME parameter values combined for each test system.
Methane solvation (1440 sets)
  Coulomb cut-off (nm): 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3
  Fourier spacing (nm): 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20
  Ewald tolerance: 1E-10, 1E-08, 1E-06, 1E-04, 1E-02
  Order of interpolation: 3, 4, 5, 6
Dipole inversion (1800 sets)
  Coulomb cut-off (nm): 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5
  Fourier spacing (nm): 0.045, 0.065, 0.085, 0.105, 0.125, 0.145, 0.165, 0.185, 0.205
  Ewald tolerance: 1E-10, 1E-08, 1E-06, 1E-04, 1E-02
  Order of interpolation: 3, 4, 5, 6
Anthracene solvation (1800 sets)
  Coulomb cut-off (nm): 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5
  Fourier spacing (nm): 0.040, 0.060, 0.080, 0.100, 0.120, 0.140, 0.160, 0.180, 0.200
  Ewald tolerance: 1E-10, 1E-08, 1E-06, 1E-04, 1E-02
  Order of interpolation: 3, 4, 5, 6
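The set counts follow from taking every combination of the listed values. The sketch below enumerates the grids with itertools.product; the function name is hypothetical, and the interpolation orders 3 to 6 are an assumption inferred from the orders discussed in the text and from the totals, since the original table does not list them explicitly.

```python
from itertools import product


def pme_grid(cutoffs, spacings, tolerances, orders):
    """Every (cut-off, Fourier spacing, Ewald tolerance, interpolation order) combination."""
    return list(product(cutoffs, spacings, tolerances, orders))


TOLERANCES = [1e-10, 1e-8, 1e-6, 1e-4, 1e-2]
ORDERS = [3, 4, 5, 6]  # assumed: the interpolation orders discussed in the text

methane = pme_grid(
    cutoffs=[round(0.6 + 0.1 * i, 1) for i in range(8)],      # 0.6 ... 1.3 nm
    spacings=[round(0.04 + 0.02 * i, 2) for i in range(9)],   # 0.04 ... 0.20 nm
    tolerances=TOLERANCES,
    orders=ORDERS,
)
dipole = pme_grid(
    cutoffs=[round(0.6 + 0.1 * i, 1) for i in range(10)],     # 0.6 ... 1.5 nm
    spacings=[round(0.045 + 0.02 * i, 3) for i in range(9)],  # 0.045 ... 0.205 nm
    tolerances=TOLERANCES,
    orders=ORDERS,
)
print(len(methane), len(dipole))  # prints 1440 1800
```

Each tuple is one candidate Si2 whose energies are re-evaluated on the stored trajectories, so the grid size directly sets the number of reruns needed.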
We use a strict Coulomb cut-off instead of a switched cut-off in our parameter search. The switching distance would have introduced an extra dimension into the existing parameter space and increased the computational burden. Instead, we chose first to find the best set of parameters in four dimensions, i.e. the set that, for a given accuracy in the free energy and enthalpy estimates, results in the minimum simulation time, in a smaller parameter space that varies in only four dimensions instead of five. We can then extend our study to find the best switching distance and cut-off strategy, i.e. whether to use a shift function or a switch function to smoothly bring the potential to zero at the cut-off length. We could similarly extend our study to the van der Waals cut-off length and the van der Waals switching/shifting distance. For now we show the application of the test set and MBAR to a search of the PME parameter space consisting of the strict Coulomb cut-off, Fourier spacing, Ewald tolerance, and order of interpolation.
distance for a given Fourier spacing. Each row contains plots for a different Ewald tolerance, and each column contains plots for a different order of interpolation. Plots with an interpolation order of 3 and plots with Ewald tolerances of 1E-02, 1E-10, and 1E-08 are noisy and do not converge well. The even interpolation orders 4 and 6 perform better than the odd orders 3 and 5. An interpolation order of 4 is preferable to 6, because a higher order of interpolation results in higher computational cost. The subplot at row 2, column 2, with an interpolation order of 4 and an Ewald tolerance of 1E-04, has the best convergence of all the subplots. In this subplot, the free energy estimates differ by less than 0.5 kJ/mol even with a large Fourier spacing (0.2 nm) and a short cut-off (0.8 nm). For the strictest parameters, the longest cut-off (1.3 nm) and the finest Fourier spacing (0.04 nm), Δ(ΔG) ± δ[Δ(ΔG)] converges to -0.208 ± 0.032 kJ/mol.
Figure 14. Accuracy in methane solvation free energy estimates as a function of different
PME parameters.
Figure 14 has been corrected for this constant bias, so that convergence occurs at 0 kJ/mol. This is seen more clearly when the subplot at row 2, column 2, with order 4 and Ewald tolerance 1E-4, is expanded in Figure 19. We see that as we reduce the cut-off, the estimates become noisier, i.e. have a higher uncertainty δ[Δ(ΔG)]. A likely reason is that the phase space associated with small cut-offs and large Fourier spacings is very different from the phase space associated with the simulation parameters used in the equilibrium simulations. Such poor phase space overlap would also indicate that these parameters are not at all good for simulation. To test this hypothesis, we first need a measure of the extent of overlap. The paper by Wu and Kofke41 explains the measurement of phase space overlap between two states using (a) the total energy distribution method and (b) relative entropy measurements. We have used method (a) to visualize the extent of overlap. We plot the distribution of total energies of the initial state (λ = 0) for methane solvation at 4 different PME parameter sets, each specified by 4 parameters [order of interpolation, Ewald tolerance, Fourier spacing, cut-off].
Figure 15. Inaccuracy in free energy estimates increases with decreasing phase space overlap.
Figure 15 contains histograms of the total energies of the methane solvation initial state. Only the black and red curves show good overlap, and the corresponding Δ(ΔG) values, 0.2018 and 0.212 kJ/mol, are much smaller than those for the green and blue curves, which have very low overlap with the black and red curves. The Δ(ΔG) values corresponding to the green and blue curves are very large, 31.623 and -18.800 kJ/mol, so these PME parameters are not at all good for simulations.
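One simple way to summarize total energy histograms like those in Figure 15 with a single number is the overlap coefficient, the area shared by the two normalized histograms. This is a swapped-in illustration, not the Wu and Kofke procedure: it is neither their relative entropy measure (b) nor a description of how Figure 15 was produced, and the function name and default bin count are assumptions.

```python
import numpy as np


def histogram_overlap(e1, e2, bins=50):
    """Shared area of two normalized energy histograms on a common grid.

    Returns 1 for identical samples and 0 when the histograms share no bin.
    """
    e1, e2 = np.asarray(e1, float), np.asarray(e2, float)
    edges = np.linspace(min(e1.min(), e2.min()), max(e1.max(), e2.max()), bins + 1)
    p1, _ = np.histogram(e1, bins=edges)
    p2, _ = np.histogram(e2, bins=edges)
    return float(np.minimum(p1 / p1.sum(), p2 / p2.sum()).sum())
```

A value near zero for a candidate parameter set would flag the same situation as the green and blue curves: reweighting to that set is unreliable.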
We can also calculate the mean square error in Δ(ΔG) ± δ[Δ(ΔG)]. The deviation of Δ(ΔG) from its converged value gives the bias, and we already know the uncertainty δ[Δ(ΔG)]. Therefore we can square the bias and add it to the square of the uncertainty to obtain the mean square error:
$$
\mathrm{MSE}_{\Delta G} = \left(\Delta(\Delta G) - \Delta(\Delta G)_{\text{converged}}\right)^2 + \left(\delta[\Delta(\Delta G)]\right)^2
$$
MSEΔG can serve as a measure of how reliably a parameter set predicts accurate and precise free energy estimates.
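The MSE combines bias and uncertainty, and its square root (the RMSE used in the parameter search) is in the same units as the estimate. A minimal sketch with hypothetical function names; the same functions apply unchanged to ΔHvap.

```python
import math


def mse(estimate: float, converged: float, uncertainty: float) -> float:
    """Mean square error: squared bias relative to the converged value plus squared uncertainty."""
    return (estimate - converged) ** 2 + uncertainty ** 2


def rmse(estimate: float, converged: float, uncertainty: float) -> float:
    """Root mean square error, in the same units as the estimate (e.g. kJ/mol)."""
    return math.sqrt(mse(estimate, converged, uncertainty))
```

For example, a bias of 0.3 kJ/mol with an uncertainty of 0.4 kJ/mol gives an RMSE of 0.5 kJ/mol, right at the acceptance threshold used later.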
converge to 42.452 ± 0.089 kJ/mol. The experimental value of the molar enthalpy of vaporization is 40.79 kJ/mol at 300 K. The deviation from the experimental result can be explained by the approximations we made while calculating ΔHvap from simulations: we assumed ideal gas behavior for the gas phase and approximated the kinetic energies using the equipartition theorem and the degrees of freedom of water molecules.
Similarly to MSEΔG, we can also calculate MSEHvap. The deviation of the estimated ΔHvap from its converged value gives the bias; adding the squares of the bias and the uncertainty gives MSEHvap:
$$
\mathrm{MSE}_{H_{\text{vap}}} = \left(\Delta H_{\text{vap}} - \Delta H_{\text{vap,converged}}\right)^2 + \left(\delta(\Delta H_{\text{vap}})\right)^2
$$
MSEHvap measures how reliably a parameter set predicts an accurate and precise molar enthalpy of vaporization of water. This also ensures that the parameters are good enough to simulate the correct bulk properties.
Figure 18. Simulation time for methane solvation as a function of different PME
parameters.
Figure 19. The subplot at row 2, column 2 in Figure 14, expanded to show how the Δ(ΔG) estimates of methane solvation converge for different Fourier spacings and cut-offs.
Figure 20. The subplot at row 2, column 2 in Figure 17, expanded to show how the ΔHvap estimates converge for different Fourier spacings and cut-offs.
Figure 21. The subplot at row 2, column 2 in Figure 18, expanded to show how the simulation time for methane solvation varies for different Fourier spacings and cut-offs.
Simulation time increases with decreasing Fourier spacing, as more and more wave vectors are included in the calculation of the reciprocal sum in Fourier space. The difference in time consumption is relatively small, not more than 0.5 hr/ns, for Fourier spacings from 0.12 nm to 0.20 nm at the same cut-off, but the computational cost increases significantly for Fourier spacings below 0.10 nm. Time consumption also increases with increasing cut-off, as more particle pairs are included in the calculation of the direct sum in real space. Time consumption increases slowly up to a cut-off of 0.9 nm but then increases rapidly with increasing cut-off. Using a MATLAB script, we searched for the parameters that resulted in the smallest simulation time for a given RMSE. We used only RMSEΔG and RMSEHvap in our analysis, because the ΔH estimates have high uncertainties. For RMSEΔG and RMSEHvap less than 0.5 kJ/mol, we found 5 candidate parameter sets, given in Table 9.
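The search itself was done with a MATLAB script; the Python sketch below only illustrates the filter-then-rank logic described above (keep sets whose RMSEs are below the threshold, then sort by simulation time). The class, field, and function names are hypothetical, and any values passed to it are illustrative, not results from Table 9.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParamSet:
    """One PME parameter set with its error metrics (hypothetical field names)."""
    order: int
    ewald_tol: float
    fourier_spacing: float  # nm
    cutoff: float           # nm
    rmse_dg: float          # kJ/mol
    rmse_hvap: float        # kJ/mol
    time_hr_per_ns: float


def fastest_candidates(sets: List[ParamSet], max_rmse: float = 0.5,
                       n_best: Optional[int] = None) -> List[ParamSet]:
    """Keep sets whose RMSEs are both below max_rmse, ranked by simulation time."""
    ok = [s for s in sets if s.rmse_dg < max_rmse and s.rmse_hvap < max_rmse]
    ranked = sorted(ok, key=lambda s: s.time_hr_per_ns)
    return ranked if n_best is None else ranked[:n_best]
```

Filtering before sorting keeps the accuracy requirement a hard constraint, so a fast but inaccurate set can never outrank a slower acceptable one.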
Table 9. Lead parameter sets for methane solvation.
H
MSE Hvap
MSEG
kJ/mol
kJ/mol
0.125
0.173
1.934
0.100
0.072
0.115
4
5
Time
Order
Ew.
F.Sp.
Cut-off
hr/ns
Inter.
Tol.
nm
nm
0.051
4.341
1E-4
0.12
0.8
-0.621
0.155
4.514
1E-4
0.12
0.9
0.139
-3.710
0.347
4.688
1E-4
0.12
0.8
0.095
0.034
0.272
0.212
4.688
1E-4
0.12
0.8
0.091
0.034
-2.005
0.220
4.688
1E-4
0.14
0.9
Rank
kJ/mol kJ/mol
A similar analysis was done for dipole inversion and anthracene solvation.
Table 10. Lead parameter sets for dipole inversion.
MSEG
Time
Order
Ew.
F.Sp.
Cut-off
kJ/mol
kJ/mol
kJ/mol
hr/ns
Inter.
Tol.
nm
nm
3.600
98.123
-36.609
7.576
1e-4
0.145
0.8
3.698
98.562
-37.000
7.828
1e-4
0.145
0.8
3.553
97.838
-36.312
8.586
1e-6
0.145
1.0
3.563
97.669
-36.109
9.217
1e-6
0.105
0.9
3.640
98.341
-36.784
9.217
1e-4
0.185
0.8
Rank
Time
Order
Ew.
F.Sp.
Cut-off
kJ/mol
kJ/mol
kJ/mol
hr/ns
Inter.
Tol.
Nm
nm
0.052
15.998
-1.622
8.964
1e-4
0.12
0.8
0.277
19.004
-1.941
9.343
1e-4
0.1
0.8
0.140
13.128
-1.529
9.596
1e-4
0.14
0.8
0.064
13.849
-1.608
9.722
1e-4
0.14
0.9
0.085
15.726
-1.746
9.974
1e-4
0.1
0.9
Rank
The lead candidate parameter sets include sets with an interpolation order of 3, but we discard these leads because the subplots with an interpolation order of 3 did not converge well. For an interpolation order of 4, the parameter set which has the lowest MSEΔG, a minimum difference between G and H, or a minimum H if H is not well converged (as in the case of anthracene solvation and dipole inversion), would be the best parameter set. Using this rule, rank 4 in Table 9 is the set of parameters with the fastest simulation time for methane solvation. For dipole inversion, the parameter set with rank 2 is the fastest. However, for dipole inversion, simulations with a cut-off of 0.8 nm fail to run because of a GROMACS error: with twin-range cut-offs and SHAKE, the virial and the pressure are incorrect. Therefore the parameter set with interpolation order 4, Ewald tolerance 1E-4, Fourier spacing 0.14 nm, and cut-off 0.9 nm is predicted to be the next best in line. Similarly, for anthracene solvation, rank 4 in Table 11 is the best set of parameters.
We checked whether equilibrium simulations of methane solvation with the lead parameter sets give the same ΔG as predicted. The simulations were run for 6 ns, and the samples from the first 1.5 ns were discarded to account for equilibration. We compared the free energy estimates from equilibrium simulations using an interpolation order of 4, an Ewald tolerance of 1E-8, a Fourier spacing of 0.12 nm and a cutoff of 0.9 nm with the switch starting at 0.88 nm against our methane solvation leads: an interpolation order of 4, an Ewald tolerance of 1E-4, a Fourier spacing of 0.12 nm and a strict cutoff of either 0.8 nm or 0.9 nm.
Table 12. Comparison of the lead parameter sets with the parameter set used in the simulations done to test free energy estimators for methane solvation.
Parameter set | ΔG deviation (Pred.), kJ/mol | ΔG ± σ(ΔG), kJ/mol | ΔHvap ± σ(ΔHvap), kJ/mol | ΔG deviation (Actual), kJ/mol | Time, hr/ns
Set used for sim. in chapter 3 | n/a | 9.093 ± 0.096 | 42.434 ± 0.083 | n/a | 3.328
Lead 1 | 0.212 | 8.800 ± 0.095 | 42.383 ± 0.090 | 0.293 | 2.483
Lead 2 | 0.204 | 9.019 ± 0.096 | 42.434 ± 0.089 | 0.074 | 2.779
In Table 12 we report the ΔG deviation predicted using the formalism explained in Section 4.1.1, along with the deviation evaluated from equilibrium simulations using the lead PME parameter sets. Lead parameter set 1 shows good agreement between the actual and predicted values, but the prediction for lead 2 overestimates the deviation. The last column gives the time consumption of each parameter set. These times were measured from short 10 ps runs of all three sets, using the same starting configuration and the latest version of GROMACS (v4.5.3) on a single core of an Intel Xeon 5410 processor, to remove any mismatch in starting configuration, code version and processing power of the machine. The simulation times in Table 12 cannot be compared with those in Table 9, because the latter (a) were generated from single-point simulations, i.e., just a single step, and (b) used a different starting configuration and an older version of GROMACS on a different machine. The Lead 1 parameter set is 25% faster than the parameter set used in Chapter 3, with minimal loss of accuracy in either the free energy estimate or the enthalpy of vaporization of water. The Lead 2 parameter set is only 16% faster than the Chapter 3 set, but its free energy estimate is more accurate than that of Lead 1, the fastest parameter set.
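The quoted speedups follow directly from the Table 12 timings. The short Python check below recomputes them, together with the actual ΔG deviations, under the assumption that the "Pred." and "Actual" columns are deviations from the Chapter 3 reference set (consistent with 9.093 - 8.800 = 0.293 kJ/mol). The helper name `percent_faster` is ours, not part of any tool used in this work.

```python
def percent_faster(t_ref: float, t: float) -> float:
    """Percentage reduction in wall time per ns relative to the reference set."""
    return 100.0 * (1.0 - t / t_ref)


# Values copied from Table 12 (times in hr/ns, Delta G in kJ/mol).
REF_TIME, REF_DG = 3.328, 9.093
LEADS = {
    "Lead 1": {"time": 2.483, "dG": 8.800, "pred_dev": 0.212},
    "Lead 2": {"time": 2.779, "dG": 9.019, "pred_dev": 0.204},
}

for name, row in LEADS.items():
    speedup = percent_faster(REF_TIME, row["time"])
    actual_dev = REF_DG - row["dG"]  # deviation from the Chapter 3 reference set
    print(f"{name}: {speedup:.1f}% faster; deviation {actual_dev:.3f} "
          f"(predicted {row['pred_dev']:.3f}) kJ/mol")
```

This prints 25.4% for Lead 1 and 16.5% for Lead 2, matching the rounded 25% and 16% in the text, and actual deviations of 0.293 and 0.074 kJ/mol against predicted values of 0.212 and 0.204 kJ/mol.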
4.3 Conclusions
We have used the benchmark test set and free energy estimates calculated using MBAR
as tools for exploring the PME parameter space. This study provides an alternate method
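Appendix:
$$S_i(\lambda) = a_i + b_i(\lambda - \lambda_i) + c_i(\lambda - \lambda_i)^2 + d_i(\lambda - \lambda_i)^3, \qquad 1 \le i \le K-1 \tag{A1}$$
Here K is the total number of intermediate states, creating K-1 intervals for splining. The cubic on each interval has its own set of coefficients $a_i$, $b_i$, $c_i$ and $d_i$, which can be computed by standard linear algebra methods from the conditions that define a natural cubic spline: the spline passes through $\langle \partial U(\lambda)/\partial\lambda \rangle$ at every state, its first and second derivatives are continuous at the interior states, and its second derivative vanishes at both ends.
$$S_i(\lambda_i) = a_i = \langle \partial U(\lambda)/\partial\lambda \rangle_i, \qquad 1 \le i \le K-1 \tag{A2}$$
$$S_i(\lambda_{i+1}) = \langle \partial U(\lambda)/\partial\lambda \rangle_{i+1}, \qquad 1 \le i \le K-1 \tag{A3}$$
$$S_i'(\lambda_{i+1}) = S_{i+1}'(\lambda_{i+1}), \qquad 1 \le i \le K-2 \tag{A4}$$
$$S_i''(\lambda_{i+1}) = S_{i+1}''(\lambda_{i+1}), \qquad 1 \le i \le K-2 \tag{A5}$$
$$S_1''(\lambda_1) = 0 \tag{A6}$$
$$S_{K-1}''(\lambda_K) = 0 \tag{A7}$$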
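When we integrate piecewise over all the intervals, we get the total free energy change:
$$\Delta G = \sum_{i=1}^{K-1} \int_{\lambda_i}^{\lambda_{i+1}} S_i(\lambda)\, d\lambda \tag{A8}$$
$$\Delta G = \sum_{i=1}^{K-1} \left[ a_i(\lambda_{i+1}-\lambda_i) + \frac{b_i}{2}(\lambda_{i+1}-\lambda_i)^2 + \frac{c_i}{3}(\lambda_{i+1}-\lambda_i)^3 + \frac{d_i}{4}(\lambda_{i+1}-\lambda_i)^4 \right] \tag{A9}$$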
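$$\begin{pmatrix} a_1 \\ \vdots \\ a_{K-1} \end{pmatrix} = \begin{pmatrix} A_{1,1} & \cdots & A_{1,K} \\ \vdots & \ddots & \vdots \\ A_{K-1,1} & \cdots & A_{K-1,K} \end{pmatrix} \begin{pmatrix} \langle \partial U(\lambda)/\partial\lambda \rangle_1 \\ \vdots \\ \langle \partial U(\lambda)/\partial\lambda \rangle_K \end{pmatrix} \tag{A10}$$
Here $A_{i,j}$ are the weights in a weight matrix for $a_i$. Similarly, $b_i$, $c_i$ and $d_i$ are expressed as linear weighted sums of $\langle \partial U(\lambda)/\partial\lambda \rangle_j$ with respective $(K-1) \times K$ matrices $B$, $C$ and $D$. There exists a unique solution for $a$, $b$, $c$ and $d$, so these weight matrices are uniquely determined by the spline conditions.
We can then combine these into a single weight matrix. Let $h_i = \lambda_{i+1} - \lambda_i$. Then Eq. A9 can be written as a linear weighted sum:
$$\Delta G = \sum_{i=1}^{K-1} \sum_{j=1}^{K} \left[ h_i A_{i,j} + \frac{h_i^2}{2} B_{i,j} + \frac{h_i^3}{3} C_{i,j} + \frac{h_i^4}{4} D_{i,j} \right] \langle \partial U(\lambda)/\partial\lambda \rangle_j \tag{A11}$$
$$\Delta G = \sum_{i=1}^{K-1} \sum_{j=1}^{K} W_{i,j} \langle \partial U(\lambda)/\partial\lambda \rangle_j \tag{A12}$$
Here
$$W_{i,j} = h_i A_{i,j} + \frac{h_i^2}{2} B_{i,j} + \frac{h_i^3}{3} C_{i,j} + \frac{h_i^4}{4} D_{i,j} \tag{A13}$$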
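Once we have the weights, we can calculate the overall free energy change using Eq. A12. Assuming that the $\langle \partial U(\lambda)/\partial\lambda \rangle_j$ at different states are uncorrelated, with statistical uncertainties $\sigma_j$, the uncertainty estimate follows by propagating these through the linear sum in Eq. A12, after collecting all the weights that multiply each state:
$$\sigma_{\Delta G}^2 = \sum_{j=1}^{K} \left( \sum_{i=1}^{K-1} W_{i,j} \right)^2 \sigma_j^2 \tag{A14}$$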
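Because the natural cubic spline (and therefore its integral) is linear in the data, the collected weights $w_j = \sum_i W_{i,j}$ can be obtained by integrating the spline through each unit vector $e_j$. The Python sketch below does this with the equivalent second-derivative (moment) formulation of the natural spline instead of building the $A$, $B$, $C$, $D$ matrices explicitly; it assumes uncorrelated $\langle \partial U/\partial\lambda \rangle_j$, and the function names are illustrative rather than taken from any package.

```python
import math
from typing import List, Sequence, Tuple


def _solve_tridiagonal(lower: List[float], diag: List[float],
                       upper: List[float], rhs: List[float]) -> List[float]:
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = upper[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def natural_spline_integral(lam: Sequence[float], y: Sequence[float]) -> float:
    """Integral from lam[0] to lam[-1] of the natural cubic spline through (lam, y)."""
    K = len(lam)
    h = [lam[i + 1] - lam[i] for i in range(K - 1)]
    M = [0.0] * K  # second derivatives at the knots; natural ends: M[0] = M[-1] = 0
    if K > 2:
        m = K - 2  # one continuity equation per interior knot
        lower = [h[r] for r in range(m)]
        diag = [2.0 * (h[r] + h[r + 1]) for r in range(m)]
        upper = [h[r + 1] for r in range(m)]
        rhs = [6.0 * ((y[r + 2] - y[r + 1]) / h[r + 1] - (y[r + 1] - y[r]) / h[r])
               for r in range(m)]
        M[1:K - 1] = _solve_tridiagonal(lower, diag, upper, rhs)
    # Exact integral of each cubic piece in terms of its end values and M.
    return sum(h[i] * (y[i] + y[i + 1]) / 2.0 - h[i] ** 3 * (M[i] + M[i + 1]) / 24.0
               for i in range(K - 1))


def spline_ti_weights(lam: Sequence[float]) -> List[float]:
    """w_j = sum_i W_ij, i.e. the integral of the spline through unit vector e_j."""
    K = len(lam)
    return [natural_spline_integral(lam, [1.0 if k == j else 0.0 for k in range(K)])
            for j in range(K)]


def spline_ti(lam: Sequence[float], dudl: Sequence[float],
              sigma: Sequence[float]) -> Tuple[float, float]:
    """Return (Delta G, sigma_Delta G) from <dU/dlambda>_j and their uncertainties."""
    w = spline_ti_weights(lam)
    dG = sum(wj * yj for wj, yj in zip(w, dudl))
    var = sum((wj * sj) ** 2 for wj, sj in zip(w, sigma))  # uncorrelated states
    return dG, math.sqrt(var)
```

A call such as `spline_ti([0.0, 0.25, 0.5, 0.75, 1.0], dudl, sigma)` then returns the free energy change and its uncertainty for hypothetical per-state averages `dudl` and uncertainties `sigma`. The weights depend only on the λ schedule, so they can be computed once and reused for every replicate.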
A2 Supplementary information
Method | ΔG | Estimated uncertainty | Std. dev. of ΔG | Bootstrap uncertainty | % dev
TI | -0.029 | 0.216 ± 0.002 | 0.201 ± 0.014 | 0.216 ± 0.012 | 7.4 ± 7.4
TI3 | -0.030 | 0.219 ± 0.002 | 0.207 ± 0.013 | 0.220 ± 0.012 | 6.0 ± 6.9
DEXP | -0.001 | 0.592 ± 0.093 | 0.648 ± 0.041 | 0.579 ± 0.083 | -8.6 ± 15.5
IEXP | 0.058 | 0.629 ± 0.120 | 0.677 ± 0.040 | 0.617 ± 0.111 | -7.1 ± 18.5
UBAR | -0.004 | 0.540 ± 0.092 | 0.601 ± 0.044 | 0.530 ± 0.083 | -10.3 ± 16.6
BAR | -0.035 | 0.171 ± 0.002 | 0.204 ± 0.016 | 0.219 ± 0.012 | -16.1 ± 6.6
RBAR | -0.034 | 0.172 ± 0.001 | 0.206 ± 0.016 | 0.219 ± 0.012 | -16.6 ± 6.7
MBAR | -0.034 | 0.217 ± 0.002 | 0.202 ± 0.016 | 0.218 ± 0.012 | -7.2 ± 8.4
GDEL | -0.200 | 0.371 ± 0.004 | 0.389 ± 0.029 | 0.372 ± 0.021 | -4.7 ± 7.1
GINS | 0.139 | 0.370 ± 0.004 | 0.374 ± 0.025 | 0.368 ± 0.021 | -1.0 ± 6.7
Method | ΔG | Estimated uncertainty | Std. dev. of ΔG | Bootstrap uncertainty | % dev
TI | -0.021 | 0.278 ± 0.004 | 0.257 ± 0.017 | 0.277 ± 0.016 | 8.2 ± 7.1
TI3 | -0.023 | 0.297 ± 0.005 | 0.278 ± 0.017 | 0.297 ± 0.017 | 7.1 ± 6.6
DEXP | -0.394 | 2.048 ± 0.497 | 3.305 ± 0.231 | 2.287 ± 0.768 | -38.0 ± 15.7
IEXP | 0.425 | 2.052 ± 0.471 | 3.239 ± 0.229 | 2.262 ± 0.716 | -36.7 ± 15.2
UBAR | -0.377 | 2.024 ± 0.487 | 3.230 ± 0.193 | 2.245 ± 0.737 | -37.3 ± 15.5
BAR | 0.000 | 0.367 ± 0.005 | 0.363 ± 0.026 | 0.383 ± 0.020 | 1.2 ± 7.5
RBAR | 0.002 | 0.370 ± 0.005 | 0.368 ± 0.026 | 0.387 ± 0.026 | 0.5 ± 7.2
MBAR | -0.011 | 0.384 ± 0.005 | 0.355 ± 0.026 | 0.383 ± 0.020 | 8.0 ± 7.9
GDEL | -0.082 | 0.935 ± 0.020 | 0.894 ± 0.057 | 0.925 ± 0.051 | 4.6 ± 7.0
GINS | 0.055 | 0.762 ± 0.014 | 0.833 ± 0.067 | 0.773 ± 0.041 | -8.5 ± 7.5
Method | ΔG | Estimated uncertainty | Std. dev. of ΔG | Bootstrap uncertainty | % dev
TI | -6.433 | 0.178 ± 0.003 | 0.208 ± 0.018 | 0.178 ± 0.010 | -14.3 ± 7.7
TI3 | -6.454 | 0.182 ± 0.003 | 0.214 ± 0.016 | 0.182 ± 0.010 | -15.1 ± 6.7
DEXP | -6.508 | 0.432 ± 0.172 | 0.524 ± 0.053 | 0.425 ± 0.164 | -17.6 ± 33.9
IEXP | -6.506 | 0.211 ± 0.005 | 0.236 ± 0.020 | 0.209 ± 0.011 | -10.9 ± 7.9
UBAR | -6.473 | 0.155 ± 0.002 | 0.216 ± 0.019 | 0.186 ± 0.009 | -28.4 ± 6.5
BAR | -6.531 | 0.128 ± 0.002 | 0.207 ± 0.018 | 0.175 ± 0.009 | -38.4 ± 5.3
RBAR | -6.529 | 0.128 ± 0.002 | 0.208 ± 0.017 | 0.175 ± 0.009 | -38.5 ± 5.2
MBAR | -6.517 | 0.171 ± 0.002 | 0.199 ± 0.018 | 0.171 ± 0.009 | -13.8 ± 7.9
GDEL | -9.957 | 0.186 ± 0.004 | 0.238 ± 0.019 | 0.219 ± 0.013 | -22.2 ± 6.4
GINS | -14.631 | 0.333 ± 0.006 | 0.318 ± 0.022 | 0.302 ± 0.018 | 4.7 ± 7.4
Method | ΔG | Estimated uncertainty | Std. dev. of ΔG | Bootstrap uncertainty | % dev
TI | -6.841 | 0.292 ± 0.007 | 0.302 ± 0.021 | 0.292 ± 0.014 | -3.3 ± 7.0
TI3 | -6.237 | 0.288 ± 0.006 | 0.307 ± 0.023 | 0.288 ± 0.014 | -6.2 ± 7.3
DEXP | -12.399 | 1.774 ± 0.581 | 3.652 ± 0.387 | 2.057 ± 1.162 | -51.4 ± 16.7
IEXP | -5.571 | 1.267 ± 0.332 | 1.682 ± 0.134 | 1.373 ± 0.503 | -24.7 ± 20.6
UBAR | -6.221 | 0.828 ± 0.084 | 0.808 ± 0.050 | 0.847 ± 0.088 | 2.5 ± 12.1
BAR | -6.436 | 0.335 ± 0.008 | 0.361 ± 0.028 | 0.357 ± 0.019 | -7.1 ± 7.4
RBAR | -6.439 | 0.338 ± 0.008 | 0.362 ± 0.027 | 0.356 ± 0.026 | -6.7 ± 7.3
MBAR | -6.439 | 0.357 ± 0.008 | 0.357 ± 0.028 | 0.357 ± 0.019 | 0.1 ± 8.2
GDEL | -25.257 | 0.315 ± 0.009 | 0.364 ± 0.027 | 0.402 ± 0.026 | -13.3 ± 6.8
GINS | -201.380 | 4.390 ± 0.146 | 6.016 ± 0.396 | 6.462 ± 0.729 | -27.0 ± 5.4
Table S 5. Free energies estimated using a large number of samples and a large number of intermediate states for dipole inversion.
Method | (ΔG ± σ(ΔG))_450,full | (ΔG ± σ(ΔG))_450,sp | (ΔG ± σ(ΔG))_51
TI | -0.030 ± 0.022 | -0.023 ± 0.027 | -0.128 ± 0.093
TI3 | -0.029 ± 0.022 | -0.025 ± 0.029 | -0.123 ± 0.093
DEXP | -0.012 ± 0.063 | -0.566 ± 0.632 | -0.113 ± 0.094
IEXP | 0.067 ± 0.069 | 0.440 ± 0.455 | -0.161 ± 0.101
UBAR | -0.004 ± 0.057 | -0.332 ± 0.585 | -0.130 ± 0.092
BAR | -0.032 ± 0.022 | -0.001 ± 0.038 | -0.130 ± 0.092
RBAR | -0.032 ± 0.022 | 0.009 ± 0.038 | -0.144 ± 0.093
MBAR | -0.030 ± 0.022 | -0.005 ± 0.038 | -0.130 ± 0.092
GDEL | -0.056 ± 0.036 | -0.101 ± 0.097 | -0.106 ± 0.094
GINS | -0.002 ± 0.035 | 0.089 ± 0.075 | -0.152 ± 0.101
Table S 6. Bias in free energy estimates due to the number of samples and the number of intermediate states for dipole inversion. None of the methods shows a significantly large bias for this molecular test set.
Method | (Bias1 ± σ(Bias1))_450,full | (Bias2 ± σ(Bias2))_51,full | (Bias1 ± σ(Bias1))_450,sp | (Bias2 ± σ(Bias2))_51,sp
TI | -0.001 ± 0.031 | 0.097 ± 0.095 | 0.001 ± 0.039 | 0.106 ± 0.097
TI3 | -0.006 ± 0.031 | 0.088 ± 0.095 | 0.001 ± 0.041 | 0.098 ± 0.097
DEXP | 0.002 ± 0.087 | 0.103 ± 0.112 | 0.167 ± 0.667 | -0.285 ± 0.231
IEXP | -0.003 ± 0.094 | 0.225 ± 0.120 | -0.010 ± 0.501 | 0.592 ± 0.234
UBAR | -0.012 ± 0.079 | 0.114 ± 0.107 | -0.016 ± 0.621 | -0.218 ± 0.228
BAR | 0.001 ± 0.028 | 0.098 ± 0.094 | 0.000 ± 0.053 | 0.129 ± 0.099
RBAR | -0.002 ± 0.028 | 0.110 ± 0.094 | -0.013 ± 0.053 | 0.141 ± 0.100
MBAR | -0.002 ± 0.031 | 0.098 ± 0.095 | -0.002 ± 0.054 | 0.123 ± 0.100
GDEL | 0.007 ± 0.053 | 0.058 ± 0.102 | 0.016 ± 0.135 | 0.021 ± 0.133
GINS | -0.013 ± 0.050 | 0.137 ± 0.107 | -0.036 ± 0.107 | 0.205 ± 0.127
Table S 7. RMSEs, and statistical uncertainties in the RMSEs, of the free energy change of dipole inversion for IEXP, DEXP, UBAR, GDEL and GINS are significantly higher (indicating lower reliability) than for the other methods, especially in the sparse set.
Method | (RMSE1 ± σ(RMSE1))_full | (RMSE2 ± σ(RMSE2))_full | (RMSE1 ± σ(RMSE1))_sp | (RMSE2 ± σ(RMSE2))_sp
TI | 0.284 ± 0.080 | 0.297 ± 0.091 | 0.368 ± 0.097 | 0.378 ± 0.116
TI3 | 0.290 ± 0.082 | 0.302 ± 0.090 | 0.395 ± 0.104 | 0.404 ± 0.117
DEXP | 0.834 ± 0.281 | 0.839 ± 0.284 | 3.591 ± 1.615 | 3.597 ± 1.624
IEXP | 0.887 ± 0.290 | 0.906 ± 0.318 | 3.426 ± 1.769 | 3.459 ± 1.799
UBAR | 0.770 ± 0.273 | 0.779 ± 0.277 | 3.532 ± 1.545 | 3.536 ± 1.551
BAR | 0.251 ± 0.094 | 0.265 ± 0.103 | 0.494 ± 0.150 | 0.503 ± 0.172
RBAR | 0.252 ± 0.093 | 0.270 ± 0.106 | 0.500 ± 0.151 | 0.512 ± 0.177
MBAR | 0.284 ± 0.082 | 0.298 ± 0.092 | 0.506 ± 0.144 | 0.514 ± 0.166
GDEL | 0.525 ± 0.151 | 0.526 ± 0.156 | 1.250 ± 0.346 | 1.250 ± 0.347
GINS | 0.485 ± 0.146 | 0.499 ± 0.164 | 1.069 ± 0.366 | 1.076 ± 0.403
Table S 8. Free energies estimated using a large number of samples and a large number of intermediate states for the anthracene hydration free energy.
Method | (ΔG ± σ(ΔG))_450,full | (ΔG ± σ(ΔG))_450,sp | (ΔG ± σ(ΔG))_51,full
TI | -6.422 ± 0.016 | -6.862 ± 0.028 | -6.306 ± 0.100
TI3 | -6.443 ± 0.016 | -6.258 ± 0.027 | -6.285 ± 0.100
DEXP | -6.433 ± 0.048 | -7.691 ± 1.700 | -6.367 ± 0.113
IEXP | -6.527 ± 0.019 | -6.419 ± 0.266 | -6.192 ± 0.096
UBAR | -6.468 ± 0.016 | -6.303 ± 0.089 | -6.242 ± 0.098
BAR | -6.526 ± 0.016 | -6.482 ± 0.033 | -6.276 ± 0.099
RBAR | -6.524 ± 0.016 | -6.484 ± 0.033 | -6.273 ± 0.100
MBAR | -6.512 ± 0.015 | -6.483 ± 0.033 | -6.237 ± 0.095
GDEL | -9.930 ± 0.020 | -25.239 ± 0.041 | -6.711 ± 0.110
GINS | -14.652 ± 0.030 | -201.768 ± 0.681 | -6.595 ± 0.096
Table S 9. GDEL and GINS have the largest bias in free energy estimates for anthracene solvation due to the number of samples and the number of intermediate states, even for the full set. For the sparse set, GDEL and GINS have even larger biases than for the full set. DEXP and IEXP also show significantly large biases in the sparse set.
Method | (Bias1 ± σ(Bias1))_450,full | (Bias2 ± σ(Bias2))_51,full | (Bias1 ± σ(Bias1))_450,sp | (Bias2 ± σ(Bias2))_51,sp
TI | -0.009 ± 0.024 | -0.125 ± 0.102 | 0.021 ± 0.040 | -0.535 ± 0.104
TI3 | -0.010 ± 0.024 | -0.168 ± 0.102 | 0.021 ± 0.039 | 0.048 ± 0.104
DEXP | -0.079 ± 0.067 | -0.145 ± 0.122 | -4.708 ± 1.710 | -6.032 ± 0.218
IEXP | 0.019 ± 0.029 | -0.316 ± 0.098 | 0.848 ± 0.296 | 0.621 ± 0.162
UBAR | -0.005 ± 0.022 | -0.231 ± 0.099 | 0.082 ± 0.122 | 0.021 ± 0.129
BAR | -0.008 ± 0.020 | -0.257 ± 0.100 | 0.046 ± 0.047 | -0.160 ± 0.105
RBAR | -0.004 ± 0.020 | -0.256 ± 0.101 | 0.045 ± 0.047 | -0.166 ± 0.106
MBAR | -0.005 ± 0.023 | -0.280 ± 0.097 | 0.044 ± 0.049 | -0.202 ± 0.101
GDEL | -0.028 ± 0.028 | -3.247 ± 0.112 | -0.018 ± 0.052 | -18.546 ± 0.114
GINS | 0.025 ± 0.045 | -8.032 ± 0.102 | 0.388 ± 0.810 | -194.785 ± 0.450
Table S 10. High RMSEs and statistical uncertainties in RMSEs for GDEL and GINS in both the full and the sparse set indicate low reliability. DEXP, IEXP and UBAR also become unreliable in the sparse set.
Method | (RMSE1 ± σ(RMSE1))_full | (RMSE2 ± σ(RMSE2))_full | (RMSE1 ± σ(RMSE1))_sp | (RMSE2 ± σ(RMSE2))_sp
TI | 0.257 ± 0.100 | 0.279 ± 0.118 | 0.404 ± 0.124 | 0.637 ± 0.241
TI3 | 0.264 ± 0.099 | 0.303 ± 0.127 | 0.401 ± 0.126 | 0.403 ± 0.126
DEXP | 0.636 ± 0.310 | 0.651 ± 0.300 | 5.915 ± 2.096 | 6.958 ± 2.297
IEXP | 0.297 ± 0.110 | 0.415 ± 0.164 | 2.125 ± 0.875 | 2.054 ± 0.855
UBAR | 0.242 ± 0.110 | 0.322 ± 0.141 | 1.125 ± 0.314 | 1.123 ± 0.309
BAR | 0.218 ± 0.108 | 0.317 ± 0.156 | 0.469 ± 0.160 | 0.491 ± 0.168
RBAR | 0.218 ± 0.107 | 0.316 ± 0.156 | 0.471 ± 0.163 | 0.496 ± 0.167
MBAR | 0.246 ± 0.096 | 0.355 ± 0.148 | 0.484 ± 0.156 | 0.518 ± 0.170
GDEL | 0.285 ± 0.109 | 3.251 ± 0.241 | 0.454 ± 0.160 | 18.551 ± 0.364
GINS | 0.444 ± 0.128 | 8.041 ± 0.321 | 6.948 ± 2.711 | 194.729 ± 6.032
Figure S 1. Plots comparing the distribution of dipole inversion free energies with Gaussians for the full set.
Figure S 2. Plots comparing the distribution of dipole inversion free energies with Gaussians for the sparse set.
Figure S 12. Simulation time for anthracene solvation as a function of different PME
parameters.
Figure S 13. The subplot in row 2, column 2 of Figure S10, expanded to show how the ΔG estimate for anthracene solvation varies with Fourier spacing and cutoff.
Figure S 14. The subplot in row 2, column 2 of Figure S12, expanded to show how the simulation time for anthracene solvation varies with Fourier spacing and cutoff.
benchmark test set is available for distribution and use at
This is intended to be the first version of our benchmark test set. It is not comprehensive and will require significant further expansion to be useful to a wider range of researchers. Future versions will be developed in response to feedback. For example, future versions of the benchmark will ideally include molecules with long correlation times in their internal motion, free energies of well-studied ligand-binding systems, and increasingly tractable protein-ligand binding systems, such as T4 lysozyme.
combination rules makes it difficult to compare force field parameters directly without
significant manipulation of input files. Even the short range van der Waals and Coulomb
interactions can differ as each MD program uses different schemes to perform cutoffs
over charge groups straddling the cutoff. We have therefore reported two different single
point energies from single point simulations; one with
A3.2.1. Agreement in combination rule: AMBER can only use Lorentz-Berthelot combination rules (arithmetic mean for $\sigma_{ij}$ and geometric mean for $\epsilon_{ij}$), while DESMOND and GROMACS can also use the geometric combination rule (geometric mean for both $\sigma_{ij}$ and $\epsilon_{ij}$). The simulations here were performed with the geometric combination rule. In order to obtain equivalent energies in AMBER, we tailored $\sigma_i$ and $\sigma_j$ such that the combined $\sigma_{ij}$ from Lorentz-Berthelot rules matches the $\sigma_{ij}$ obtained from geometric combining rules.
inversion, there are only two particle types in each simulation: m (corresponding to the solute) and o (corresponding to the water oxygen). We calculate an effective $\sigma_{mE}$ as follows. By geometric combining rules, $(\sigma_{mo})_G = (\sigma_m \sigma_o)^{1/2}$, and by Lorentz-Berthelot rules, $(\sigma_{mo})_{LB} = (\sigma_{mE} + \sigma_o)/2$. Setting $(\sigma_{mo})_G = (\sigma_{mo})_{LB}$ gives $\sigma_{mE} = 2(\sigma_{mo})_G - \sigma_o$. In both of these cases there are no solute-solute terms, so the total energy remains the same.
However, for anthracene solvation we have three particle types: CH (aromatic carbon with two aromatic carbon neighbors and one hydrogen neighbor), C (aromatic carbon with three aromatic carbon neighbors) and o (water oxygen). We now have three $\sigma_{ij}$ terms to match, $\sigma_{Co}$, $\sigma_{CHo}$ and $\sigma_{CH\text{-}C}$, and only two of the three can be matched simultaneously. Since anthracene has a nearly rigid structure, a deviation in $\sigma_{CH\text{-}C}$ alters the 1-4 intramolecular interaction by approximately 7 kJ/mol; aside from the slight twisting allowed by improper dihedrals, however, this contribution remains essentially fixed for all configurations of the system. We therefore choose to match $\sigma_{CHo}$ and $\sigma_{Co}$, rather than $\sigma_{CH\text{-}C}$, by calculating $\sigma_{CE}$ and $\sigma_{CHE}$.
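The σ adjustment above is simple enough to check numerically. The Python sketch below computes the effective Lorentz-Berthelot σ that reproduces a geometric-rule cross term and then shows the residual mismatch in the CH-C pair for anthracene. Only σ is adjusted because ε is combined geometrically under both rules; the σ values used are hypothetical placeholders, not the force-field parameters of the benchmark set.

```python
import math


def geometric_sigma(sigma_i: float, sigma_j: float) -> float:
    """Geometric combining rule for sigma_ij."""
    return math.sqrt(sigma_i * sigma_j)


def lb_sigma(sigma_i: float, sigma_j: float) -> float:
    """Lorentz-Berthelot (arithmetic mean) combining rule for sigma_ij."""
    return 0.5 * (sigma_i + sigma_j)


def effective_lb_sigma(sigma_m: float, sigma_o: float) -> float:
    """sigma_mE such that lb_sigma(sigma_mE, sigma_o) == geometric_sigma(sigma_m, sigma_o)."""
    return 2.0 * geometric_sigma(sigma_m, sigma_o) - sigma_o


# Hypothetical sigma values in nm, for illustration only.
sigma_o, sigma_c, sigma_ch = 0.3166, 0.3550, 0.3400
sigma_cE = effective_lb_sigma(sigma_c, sigma_o)
sigma_chE = effective_lb_sigma(sigma_ch, sigma_o)

# C-o and CH-o cross terms now match the geometric rule; CH-C generally does not.
mismatch = lb_sigma(sigma_chE, sigma_cE) - geometric_sigma(sigma_ch, sigma_c)
print(f"sigma_CE = {sigma_cE:.5f} nm, sigma_CHE = {sigma_chE:.5f} nm, "
      f"CH-C mismatch = {mismatch:.2e} nm")
```

Algebraically, the leftover CH-C mismatch equals $-(\sqrt{\sigma_{CH}} - \sqrt{\sigma_o})(\sqrt{\sigma_C} - \sqrt{\sigma_o})$, so it vanishes only if one of the solute σ values equals $\sigma_o$, which is why only two of the three cross terms can be matched.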
A3.2.2. Agreement in Cutoff
All simulation packages use slightly different potential tapering functions, so it is impossible to match simulations between different packages using tapering alone. Even with strict cutoffs, however, there are difficulties. GROMACS uses a group-based cutoff, so its cutoffs are not equivalent to cutoffs of the same length in DESMOND and AMBER. Because of this charge-group-dependent cutoff implementation in GROMACS, we used a switching potential with a very small switching distance (0.000001 nm), which approximates a strict cutoff in GROMACS; smaller switching distances do not further change the energy.
A3.2.3. Agreement in PME Parameters:
Particle Mesh Ewald implementations are sufficiently different between the codes that
cutoffs that are in theory equivalent give energies that may differ by up to 40-50 kJ/mol.
However, if longer cutoffs are used, the differences are significantly reduced, to the 0.5-2
kJ/mol level.
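One source of the mismatch is that the codes specify the Ewald splitting parameter differently: AMBER sets it directly (Ew_coeff), whereas GROMACS derives it from ewald-rtol, which its documentation describes as the relative strength of the Ewald-shifted direct potential at the Coulomb cutoff, i.e. $\mathrm{erfc}(\beta r_c) = $ ewald-rtol. The Python sketch below, assuming that convention, solves for $\beta$ by bisection so that GROMACS settings can be compared with an explicit coefficient; $\beta$ comes out in inverse units of the cutoff length passed in.

```python
import math


def ewald_beta(r_cut: float, rtol: float, iters: int = 200) -> float:
    """Return beta with erfc(beta * r_cut) == rtol, found by bisection.

    erfc is monotonically decreasing, so we bracket the root and then bisect.
    """
    if not (0.0 < rtol < 1.0):
        raise ValueError("rtol must lie in (0, 1)")
    lo, hi = 0.0, 1.0 / r_cut
    while math.erfc(hi * r_cut) > rtol:  # grow the bracket until erfc drops below rtol
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * r_cut) > rtol:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example: the 0.9 nm cutoff with Ewald tolerance 1.0e-08 from Table A 1.
beta = ewald_beta(0.9, 1.0e-8)
print(f"beta = {beta:.3f} nm^-1")
```

Tightening the tolerance or shortening the cutoff both increase $\beta$, shifting more of the electrostatic work into reciprocal space, so nominally equivalent settings can still distribute the energy differently between the real-space and reciprocal-space terms.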
A3.3. Best Matches to Parameters used in the Benchmark Set Tests:
Energies were calculated from single-point MD runs with AMBER and DESMOND parameters that best approximate the parameters used in our GROMACS molecular dynamics simulations (Table 1).
Table A 1. Single-point simulation parameters for MD_sim_parm energies for methane solvation, dipole inversion and anthracene solvation. The second set of grid-point values is for dipole inversion, which had a larger box.
GROMACS | AMBER | DESMOND
PME (switch 0.88 nm, cutoff 0.9 nm) | PME (cutoff = 0.9 nm) | PME (taper 0.88 nm, cutoff 0.9 nm)
Fourier Spacing 0.12 nm | |
(nkx 32, nky 32, nkz 32) | (nfft1 32, nfft2 32, nfft3 32) | (n_k = [32 32 32])
(nkx 36, nky 36, nkz 36) | (nfft1 36, nfft2 36, nfft3 36) | (n_k = [36 36 36])
Order of spline 4 | Order of spline 4 | order = [4 4 4]
Ewald_tolerance 1.0e-08 | Ew_coeff 0.43 | r_spread = 4.0
vdW (switch 0.8 nm, cutoff 0.9 nm) | vdW (cutoff = 0.9 nm) | vdW (taper 0.88 nm, cutoff 0.9 nm)
Table A 2. Single-point simulation parameters for high-cutoff energies for methane solvation and anthracene solvation.
GROMACS | AMBER | DESMOND
PME (switch 1.19999999 nm, cutoff 1.2 nm) | PME (cutoff = 1.2 nm) | PME (cutoff 1.2 nm)
Fourier Spacing 0.06 nm | |
(nkx 54, nky 54, nkz 54) | | n_k = [50 50 50]
Order of spline 6 | Order of spline 6 | order = [6 6 6]
Ewald tol. 10e-08 | Ew_coeff 0.43 | r_spread = 4.0
vdW (switch 1.19999999 nm, cutoff 1.2 nm) | vdW (cutoff = 1.2 nm) | vdW (cutoff 1.2 nm)
Table A 3. Single-point simulation parameters for high-cutoff energies for dipole inversion.
GROMACS | AMBER | DESMOND
PME (cutoff 1.6 nm) | PME (cutoff = 1.5 nm) | PME (cutoff 1.5 nm)
Fourier Spacing 0.06 nm | |
(nkx 60, nky 60, nkz 60) | | n_k = [60 60 60]
Order of spline 6 | Order of spline 6 | order = [6 6 6]
Ewald tol. 10e-08 | Ew_coeff 0.43 | r_spread = 4.0
vdW (switch 1.49999999 nm, cutoff 1.5 nm) | vdW (cutoff = 1.5 nm) | vdW (cutoff 1.5 nm)
One file
a breakdown of the potential energy into its components: bond energy, angle energy,
dihedral energy, Lennard-Jones short range energy, Lennard-Jones dispersion correction
energy beyond the cutoff, total Lennard-Jones energy, 1-4 Lennard-Jones interaction
energy, Coulomb short range interaction energy, Coulomb interaction energy in
reciprocal space, total Coulomb interaction energy and 1-4 Coulomb interaction energy.
Some MD packages do not print out all the energy components in their output, or add two
or three components together into a single term. Comparisons are therefore not always
possible between all the energy components for all packages.
A3.6. File Organization:
The organization of the distribution is as follows.
Methane_Solvation/
    gros/
    ms.mdp
    shortcutoff.mdp
    longcutoff.mdp
Dipole_Inversion/
    gros/
    di.top
    shortcutoff.mdp
    longcutoff.mdp
Anthracene_Solvation/
    gros/
    as.top
    shortcutoff.mdp
    longcutoff.mdp
AMBER/
    Methane_Solvation/
        crds/
        ms.prmtop
        shortcutoff_md.in
        longcutoff_md.in
    Dipole_Inversion/
        crds/
        di.prmtop
        shortcutoff_md.in
        longcutoff_md.in
    Anthracene_Inversion/
        crds/
        as.prmtop
        shortcutoff_md.in
        longcutoff_md.in
DESMOND/
    Methane_Solvation/
        cmss/
        shortcutoff.cfg
        longcutoff.cfg
    Dipole_Inversion/
        cmss/
        shortcutoff.cfg
        longcutoff.cfg
    Anthracene_Inversion/
        cmss/
        shortcutoff.cfg
        longcutoff.cfg