MS Thesis On Benchmark Test Set For Free Energy Calculations Using Molecular Simulations

Using a molecular benchmark set to compare free energy estimators
and explore electrostatic potential parameter space
Himanshu Paliwal
B. Tech. Chemical Engineering, Institute of Technology Banaras Hindu University, 2002
A Thesis presented to the Graduate Faculty
of the University of Virginia in Candidacy for the Degree of
Master of Science
Department of Chemical Engineering
University of Virginia
July, 2011
APPROVAL SHEET
This thesis is submitted in partial fulfillment of the
requirements for the degree of
Master of Science (Chemical Engineering)
______________________________________________________
Author
This thesis has been read and approved by the Examining Committee:
______________________________________________________
Michael R. Shirts, Thesis Advisor
______________________________________________________
John O'Connell, Committee Chairperson
______________________________________________________
Erik J. Fernandez, Committee Member
Accepted for the School of Engineering and Applied Science:
______________________________________________________
Kathryn Thornton, Dean, School of Engineering and Applied Science
July, 2011
ABSTRACT
There is a significant need for improved tools to validate thermo-physical quantities
computed via molecular simulation. In this paper we present the initial version of a
benchmark set for testing methods of calculating free energies of molecular
transformation in solution. This set is based on molecular changes common to many
molecular design problems, such as insertion and deletion of atomic sites and changing
atomic partial charges. We use this benchmark set to compare the statistical efficiency,
reliability and quality of uncertainty estimates for a number of published free energy
methods, including thermodynamic integration, free energy perturbation, the Bennett
acceptance ratio (BAR), and its multistate equivalent MBAR. We identify MBAR as the
consistently best performing method, though other methods are comparable in reliability
and accuracy in many cases. We demonstrate that assumptions of Gaussian distributed
errors in free energies are usually valid for most methods studied. We demonstrate that
bootstrap error estimation is a robust and useful technique for estimating statistical
variance for all free energy methods studied. We also use MBAR to explore electrostatic
potential parameter space in order to find the set of parameters resulting in the shortest
simulation time for a given accuracy in the free energy estimates of the transformations in
the benchmark set and in the enthalpy of vaporization of water. This benchmark set is provided in a
number of different file formats with the hope of becoming a useful and general tool for
method comparisons.
CONTENTS
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
1. Introduction
2. Molecular systems of benchmark set
2.1 Minimal test system
2.2 Charge mutation system
2.3 Large molecule mutation system
3. Comparing free energy methods based on statistical tests, using benchmark set
3.1 Free energy methods and error propagation
3.1.1 Thermodynamic integration
3.1.2 Exponential averaging
3.1.3 Bennett acceptance ratio
3.1.4 Unoptimized Bennett acceptance ratio
3.1.5 Range-based Bennett acceptance ratio
3.1.6 Multistate Bennett acceptance ratio
3.1.7 Gaussian estimate of exponential averaging
3.2 Building molecular systems, setting up simulation framework and designing statistical experiments
(A) System preparation and simulation parameters
3.2.1 values and spacing between intermediate states for free energy calculations
3.2.1.1 Full set
3.2.1.2 Sparse set
3.2.2 Generating an ensemble of uncorrelated configurations
(B) Statistical tests
3.2.3 Quantifying accuracy and precision in uncertainty estimate of an estimator
3.2.3.1 Sample standard deviation
3.2.3.2 Analytical estimate
3.2.3.3 Bootstrap estimate
3.2.4 Quantifying bias of free energy estimates
3.2.4.1 Bias due to number of samples
3.2.4.2 Bias due to number of intermediate states
3.2.5 Quantifying reliability of a free energy estimator
3.2.6 Validating the Gaussian distributions of the free energy differences
3.3 Results and discussion
3.3.1 Validation of uncertainty estimates
3.3.2 Analysis of bias in free energy estimates
3.3.3 Overall reliability of a free energy estimator
3.3.4 Testing the Gaussian distribution of free energies
3.3.5 Convergence properties of free energy estimates
3.4 Conclusions
4. Exploring computational efficiency of multiple electrostatic potential parameter sets using data sampled at a single set
4.1 Methods
4.1.1 Difference between free energy estimates calculated with two different PME parameter sets
4.1.2 Difference in enthalpy of transformation between two PME parameter sets
4.1.3 Enthalpy of vaporization of water calculated at new set of PME parameters
4.2 Results and Discussion
4.3 Conclusions
References:
Appendix:
A1 Thermodynamic integration using cubic splines
A2 Supplementary information
A3 Availability of datasets for other simulation packages
Readme for free energy benchmark V1.0
A3.1. Benchmark Test Set Energies
A3.2. Difficulties in Exact Energy Matching
A3.2.1. Agreement in combination rule
A3.2.2. Agreement in Cutoff
A3.2.3. Agreement in PME Parameters
A3.3. Best Matches to Parameters used in the Benchmark Set Tests
A3.4. Long Cutoff Parameters
A3.5. Format of output
A3.6. File Organization
ACKNOWLEDGEMENTS
I would like to express my heartfelt gratitude to my advisor Prof. Michael R. Shirts.
Without his help, support and vision, this work would not have been possible. I would
like to thank my lab members, Kai, Christoph, Arjan, Joe, Tri and Jon. Their comments
and discussions have contributed in bringing this effort to its final form.
I thank UVA ITC and UVA XCG for their computing resources. I thank Karolina
from the UVA Computer Science department. Her help during the beta test of the cross
campus grid gave me ideas about setting up the framework for running high-throughput
simulations. I thank James Watney at D. E. Shaw Research for significant help in running
Desmond.
Last but not least, I thank my parents, my family and my wife Priya for all the moral
support they have provided me throughout the work.
LIST OF FIGURES
Figure 1. (a) In the coupled state or solvated state both intermolecular and intramolecular
interactions for anthracene are turned on. (b) In the decoupled state or vacuum state the
intermolecular interactions with water molecules are turned off.
Figure 2. Free energy differences of transitions in the direction of increasing and
decreasing entropy should be added separately to get the overall free energy for a dipole
inversion.
Figure 3. (a) Gaussians are plotted on mutually perpendicular axes. Both have the ΔG
calculated using a method (here TI for methane solvation) as mean and RMSE1 and
RMSE2 as their standard deviations. (b) These are fused to generate a bivariate Gaussian
plot. (c) Top view of the bivariate Gaussian plot.
Figure 4. Uncertainty estimates (sample standard deviation, analytical, bootstrap) are
plotted for all the methods in three different test cases for full set. Consistent free
energy methods have bars equal in height, and the most accurate methods have the
shortest bars.
Figure 5. Uncertainty estimates (sample standard deviation, analytical, bootstrap) are
plotted for six methods in three different test cases for sparse set. Four methods
(DEXP, IEXP, GINS and GDEL) are not graphed because they did not converge properly
for any of the three uncertainty estimates.
Figure 6. Bias plots for different test cases for the full set. DEXP, TI and TI3 show
numerically significant bias for methane, while all methods show moderate bias with
respect to number of states for anthracene.
Figure 7. Bias plots for different test cases for sparse set. DEXP and IEXP show large
biases both due to number of samples and due to number of intermediate states.
Figure 8. Bivariate Gaussian plots for UA methane solvation. Note how TI and TI3 fail
the reliability test for sparse set. GDEL and GINS are not at all reliable. MBAR and
BAR are accurate but the estimate of the precision in BAR is misleading as it
underestimates uncertainty, as shown in Figures 4 and 5. SP after the method name
indicates the results of the sparse set.
Figure 9. Bivariate Gaussian plots for dipole inversion. TI is reliable for this molecule,
but DEXP, IEXP, GDEL and GINS again are the least accurate and precise of all the
methods studied.
Figure 10. Bivariate Gaussian plots for anthracene hydration free energy. The effect of
bias due to intermediate states is evident in almost all methods as all spreads are elliptical
in vertical direction. TI3 and MBAR both appear reliable, especially for the full set.
Figure 11. Each subplot has a comparison between the distribution of estimated free
energies from 100 repetitions (in blue), Gaussian with the mean ΔG and standard
deviation (ΔG) from 100 repetitions (in black), Gaussian with the mean (ΔG)450ns and
standard deviation ((ΔG)450ns) (in magenta), Gaussian with the mean (ΔG)51states and
standard deviation ((ΔG)51states) (in cyan) for the full set for methane solvation.
Figure 12. Each subplot has a comparison between the distribution of estimated free
energies from 100 repetitions (in blue), Gaussian with the mean ΔG and standard
deviation (ΔG) from 100 repetitions (in black), Gaussian with the mean (ΔG)450ns and
standard deviation ((ΔG)450ns) (in magenta), Gaussian with the mean (ΔG)51states and
standard deviation ((ΔG)51states) (in cyan) for sparse set for methane solvation.
Figure 13. Free energies estimated using large number of intermediate states and a large
number of samples converge to a single value for all free energy estimators.
Figure 14. Accuracy in methane solvation free energy estimates as a function of different
PME parameters.
Figure 15. Inaccuracy in free energy estimates increases with increasing phase space
overlap.
Figure 16. Molar enthalpy of vaporization of water as a function of different PME
parameters.
Figure 17. Accuracy in methane solvation enthalpy estimates as a function of different
PME parameters.
Figure 18. Simulation time for methane solvation as a function of different PME
parameters.
Figure 19. Subplot at row 2 column 2 in Figure 14 is expanded to explore exactly how
ΔG estimates of methane solvation vary for different Fourier spacings and cutoffs
converge.
ΔHvap estimates vary for different Fourier spacings and cutoffs converge.
simulation time for methane solvation varies for different Fourier spacings and cutoffs.
Figure S 1. Plots comparing the distribution of dipole inversion free energies with the
Gaussians for full set for dipole inversion.
Figure S 2. Plots comparing the distribution of dipole inversion free energies with the
Gaussians for sparse set for dipole inversion.
Figure S 3. Plots comparing the distribution of estimated anthracene solvation free
energies with the Gaussians for full set.
Figure S 4. Plots comparing the distribution of estimated anthracene solvation free
energies with the Gaussians for sparse set.
Figure S 5. Accuracy in free energy estimates of dipole inversion as a function of
different PME parameters.
Figure S 6. Accuracy in the estimate of enthalpy for dipole inversion process as a
function of different PME parameters.
Figure S 7. Simulation time for dipole inversion as a function of different PME
parameters.
Figure S 8. Subplot at row 2, column 2 in Figure S5 is expanded to explore exactly how
ΔG estimates of dipole inversion vary for different Fourier spacings and cutoffs.
simulation time for dipole inversion varies for different Fourier spacings and cutoffs.
Figure S 10. Accuracy in anthracene solvation free energy estimate as a function of
different PME parameters.
Figure S 11. Accuracy in anthracene solvation enthalpy estimates as a function of
different PME parameters.
Figure S 12. Simulation time for anthracene solvation as a function of different PME
parameters.
ΔG estimates of anthracene solvation vary for different Fourier spacings and cutoffs.
Figure S 14. Subplot at row 2, column 2 in Figure S12 is expanded to explore exactly
how simulation time for anthracene solvation varies for different Fourier spacings and
cutoffs.
LIST OF TABLES
Table 1. Statistical uncertainty calculated using three different approaches (analytical
(ΔG), sample standard deviation (ΔG), bootstrap (ΔG)bs) for UA methane
solvation using the full set. All quantities are in kJ/mol. ΔG is not the ensemble
average but the average over 100 repetitions.
Table 2. Statistical uncertainty calculated using three different approaches (analytical
(ΔG), sample standard deviation (ΔG), bootstrap (ΔG)bs) for UA methane
solvation using sparse set. All quantities are in kJ/mol.
Table 3. Free energy estimates and corresponding uncertainty estimates in the large
number of samples (450 ns) and large number of intermediate states (51 states) for UA
methane solvation. Bootstrap estimates are reported as they are better than analytical
estimates. All quantities are in kJ/mol.
Table 4. Bias estimates due to number of samples and number of lambda states for full
and sparse sets for UA methane solvation. All quantities are in kJ/mol.
Table 5. RMSEs and statistical uncertainties in RMSEs in UA methane solvation free
energy estimate are largest for GINS and GDEL in sparse set while the acceptance ratio
methods consistently show low RMSEs and the corresponding uncertainty in the RMSEs.
All quantities are in kJ/mol.
Table 6. Time consumed by different methods to calculate the free energy of
anthracene solvation 201 times.
Table 7. Summary of all statistical tests.
Table 8. PME parameters used to generate input sets.
Table 9. Lead parameter set for methane solvation.
Table 10. Lead parameter sets for dipole inversion.
Table 11. Lead parameter sets for anthracene solvation.
Table 12. Comparison of lead parameter sets with the parameter set used in simulation
done to test free energy estimators for methane solvation.
Table S 1. Statistical uncertainty calculated using three different approaches (analytical
(ΔG), sample standard deviation (ΔG), and bootstrap (ΔG)bs) for dipole inversion
using the full set. All quantities are in kJ/mol.
Table S 2. Statistical uncertainty calculated using three different approaches (analytical
(ΔG), sample standard deviation (ΔG), bootstrap (ΔG)bs) for dipole inversion
using the sparse set. All quantities are in kJ/mol.
Table S 3. Statistical uncertainty calculated using three different approaches (analytical
(ΔG), sample standard deviation (ΔG), bootstrap (ΔG)bs) for anthracene solvation
using the full set. All quantities are in kJ/mol.
Table S 4. Statistical uncertainty calculated using three different approaches (analytical
(ΔG), sample standard deviation (ΔG), bootstrap (ΔG)bs) for anthracene solvation
using sparse set. All quantities are in kJ/mol.
Table S 5. Free energy estimated using large number of samples and large number of
intermediate states for dipole inversion.
Table S 6. Bias in free energy estimates due to number of samples and number of
intermediate states for dipole inversion are presented here. None of the methods show
significantly large bias for this molecular test set.
Table S 7. RMSEs and statistical uncertainties in RMSEs in free energy change of dipole
inversion of IEXP, DEXP, UBAR, GDEL, and GINS are significantly higher (indicating
lower reliability) compared to other methods specifically in the sparse set.
Table S 8. Free energies estimated using a large number of samples and a large number of
intermediate states for anthracene hydration free energy.
Table S 9. GDEL and GINS have largest bias in free energy estimates for anthracene
solvation due to number of samples and number of intermediate states even for full set.
For sparse set GDEL and GINS have even larger bias compared to full set. DEXP and
IEXP also show significantly large biases in sparse set.
Table S 10. High RMSEs and statistical uncertainties in RMSEs for GDEL and GINS in
both the full and the sparse set indicate low reliability. DEXP, IEXP, UBAR also
become unreliable in the sparse set.
Table A 1. Single point simulation parameters for MD_sim_parm energies for methane
solvation, dipole inversion, anthracene solvation. The second set of parameters for
number of grid points is for dipole inversion, which had a larger box.
Table A 2. Single point simulation parameters for high cutoff energies for methane
solvation and anthracene solvation.
Table A 3. Single point simulation parameters for high cutoff energies for dipole
inversion.
1. Introduction
Simulation and theory communities have developed substantial interest in using free
energy calculations for molecular design problems. Specifically, free energy calculations
can guide experimental screening techniques for measuring biological interaction
energies and offer the potential of a faster and cheaper way to get thermodynamic
information over large chemical spaces in a variety of molecular contexts.1 For example,
drug design2 requires prediction of binding affinities, tautomers, protonation states,
membrane permeabilities, and solubilities, all of which involve free energy calculations.3-6
Similarly, free energy calculations could become useful tools in material design
problems ranging from improved protein selectivity and stability on chromatographic
surfaces7 to tailoring metal organic frameworks8 for applications such as gas storage and
separation. Such potential extends to the design of new nanomaterials such as therapeutic
dendrimers, hetero-polymers and hyperbranched polymers for molecular recognition,
imaging, sensing/signaling and controlled payload delivery.9-13 However, substantial
roadblocks to routine use of molecular simulations as a complement to experiment are
confusion over suitability of methods for different molecular problems, as well as lack of
rigorous, validated understanding about the reliability of free energy calculations and
other observables estimated using statistical methods.
Other computational fields have successfully benchmarked and tested computational
methods to improve the reliability and thus the utility of simulations. For example, the
field of computational fluid dynamics (CFD) has also grappled with issues of reliability and
standardization of simulations. During the late 1990s, substantial research efforts in
CFD were focused on establishing validation
benchmarks to improve the reliability of CFD simulations in various design
applications.14-16 This research helped to bring down costs, increase data fidelity, and
reduce design cycle time in the early development phases of new airplanes,17 Formula 1
cars,18 treatment and diagnosis of cardiovascular diseases19,20 and off-shore oil rigs.21
Similar validation benchmarks were developed for simulations in the nuclear industry, to
improve the reliability in nuclear reactor safety, underground nuclear waste storage, and
nuclear weapon safety.22 For molecular simulation to play a similarly useful role in
molecular engineering design,23 benchmarks and validation sets must be established. In
this paper we provide tools for improved standardization of molecular simulations
through the first version of a benchmark set for free energy calculations.
There are a large number of free energy methods available,23-30 which by early 2011 have
been cited collectively over 4600 times. As of the time of writing, 20% of those citations
occurred within the last 18 months.1 However, the simulation field lacks consensus in
choosing a method most appropriate for a given molecular design situation. At least
three fundamental issues contribute to this confusion.
These numbers were generated by adding the primary citations for each of the listed
methods (23-30) and searches for Thermodynamic Integration and Free Energy
Perturbation in ISI Web of Science, as the original papers for these methods are older
than the ISI database.
First, there is a lack of standard test cases for rigorous comparisons between different free
energy methods. Studies of new methods frequently use relatively trivial model systems,
such as a one- or two-dimensional analytically solvable potential energy function, or a
small Lennard-Jones sphere in water. Such test cases may not be representative of the
problems encountered in actual molecular changes. Alternatively, papers describing new
methods may use extremely complicated systems, such as protein-ligand binding systems
that are hard to converge and therefore make it difficult to accurately gauge true gains in
efficiency. Both of these scenarios yield little knowledge about whether a given method
will be useful in answering actual molecular design questions.
Second, computing
thermo-physical properties by molecular simulation involves stochastic sampling of
molecular configurations, and all comparisons must deal with the fact that repeated
independent measurements will have associated statistical error; unlike in quantum
mechanics, it will never be possible to match calculations to fourteen decimal places.24
Finally, direct comparisons between methods can be difficult because of the differences
between simulation code bases. Free energy calculation capabilities are recent additions
to most large-scale molecular simulation codes, and therefore most codes usually support
only a small subset of available free energy methods.
As a first step towards helping solve these problems, we propose the first version of a
molecular test set comprising realistic systems undergoing challenging molecular
transformations. We then use this test set to test the reliability of different methods for
estimating free energy differences. Although the molecular design applications listed in
the introduction seem very different, there are features common to all free energy
calculations performed for these applications. All involve determining the preference of
a molecule to partition between two environments, which can be calculated by way of a
difference in the free energies of molecular transformation between these two
environments. For example, we might wish to design a solute preferentially solvated by
a protein when compared to solvent (pure water) or a different complex medium (another
protein), as in the case of drug design. Alternatively, we might design a solvent which
preferentially solvates a given solute in a mixture, such as designing ionic liquids25 for
sequestering CO2. These molecular transformations primarily involve either growing or
deleting atoms, changing the size or dispersion interaction between atoms, or altering the
partial charge on mutation sites. Any benchmark test set must include examples of these
transformations which are simultaneously challenging enough to push new methods and
yet possible to evaluate with sufficiently high precision to give meaningful comparisons
between methods in a reasonable amount of computer time.
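As a minimal illustration of the partitioning argument above, the sketch below converts two transfer free energies into an equilibrium concentration ratio using the standard relation K = exp(-ΔΔG/RT). The function name, argument names, and numerical values are hypothetical and are not taken from the benchmark set.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def partition_ratio(dg_env_a, dg_env_b, temperature=298.15):
    """Equilibrium concentration ratio [B]/[A] of a solute between two environments.

    dg_env_a, dg_env_b: free energies (kJ/mol) of transferring the solute from a
    common reference state (for example, vacuum) into environments A and B.
    """
    ddg = dg_env_b - dg_env_a  # relative free energy of B with respect to A
    return math.exp(-ddg / (R_KJ_PER_MOL_K * temperature))

# Hypothetical numbers: the solute is 5.7 kJ/mol more stable in B than in A.
ratio = partition_ratio(dg_env_a=-10.0, dg_env_b=-15.7)
print(f"[B]/[A] = {ratio:.1f}")
```

Because only the difference between the two free energies enters the exponent, an error of about RT ln 10 (roughly 5.7 kJ/mol at 298 K) in that difference changes the predicted ratio by an order of magnitude, which is one reason the precision requirements on a benchmark set are demanding.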
The most important features of a property estimation method to understand are the
statistical errors inherent in the method, both bias and statistical uncertainty, and the
reliability of the method's estimate of the property. Without such data, we cannot trust
our calculations or compare two different calculations for validation purposes. Studies
commissioned by U.S. science funding agencies on future directions for simulation based
engineering and science have emphasized the fundamental need for improved uncertainty
verification and validation.26,27 Almost all estimators of statistical quantities, like free
energy and ensemble average calculation methods, have some bias, a systematic
deviation from the true answer that would be obtained with perfect sampling.
Additionally, computing a given observable from independently selected samples gives
different estimates of the observable; this variation is the statistical uncertainty of the
estimate. Most free energy methods include estimates of this statistical uncertainty; but of
course, these uncertainty estimates are themselves statistical quantities and have variation
from sample to sample, and must be validated.
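The distinction between bias and statistical uncertainty can be made concrete with a toy calculation. The sketch below is not one of the benchmark systems: it assumes reduced energy differences drawn from a Gaussian with hypothetical mean `mu` and width `sigma`, for which the exact free energy difference is mu - sigma^2/2 in units of kT, and applies exponential averaging to many independent data sets. All function and parameter names are illustrative.

```python
import numpy as np

def exp_estimate(du):
    """Exponential-averaging estimate -ln<exp(-du)> in units of kT (du is reduced)."""
    shift = du.min()  # subtract the minimum so every exponent is <= 0
    return shift - np.log(np.mean(np.exp(-(du - shift))))

def bias_and_uncertainty(n_samples, n_repeats=2000, mu=2.0, sigma=1.5, seed=0):
    """Repeat the estimate on independent toy data sets of size n_samples.

    Energy differences are drawn from a Gaussian (an assumption of this toy
    model), for which the exact free energy difference is mu - sigma**2 / 2.
    """
    rng = np.random.default_rng(seed)
    exact = mu - 0.5 * sigma**2
    data = rng.normal(mu, sigma, size=(n_repeats, n_samples))
    estimates = np.array([exp_estimate(row) for row in data])
    bias = estimates.mean() - exact        # systematic deviation from the exact value
    uncertainty = estimates.std(ddof=1)    # spread between independent repetitions
    return bias, uncertainty

for n in (10, 100, 1000):
    b, u = bias_and_uncertainty(n)
    print(f"N={n:5d}  bias={b:+.3f} kT  uncertainty={u:.3f} kT")
```

In this toy model both quantities should shrink as the number of samples grows, but they are different things: the bias is the offset of the average estimate from the exact value, while the uncertainty is the scatter of individual estimates around their own average.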
A few valuable studies have compared28,29 free energy methods, but not necessarily in a
systematic way. We use our proposed benchmark set to directly compare the estimated
uncertainties with the sample uncertainty. We compute the change in the mean square
error as a function of the number of intermediate states and number of samples to capture
both bias and uncertainty. We test whether the distribution of free energy estimators is
indeed Gaussian, a condition usually assumed when using statistical uncertainty estimates
to calculate error. Finally, we evaluate the bootstrap method30 as a method for computing
statistical uncertainty, as this method can be easily implemented for all of the free energy
algorithms described in this thesis, and indeed generally for most statistical estimates of
observables.
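The generality of the bootstrap comes from the fact that it only needs the estimator as a black box. The sketch below shows a generic nonparametric bootstrap under the assumption that the samples are one-dimensional and uncorrelated; it is an illustration, not necessarily the exact procedure used for the results in this thesis. The names `bootstrap_uncertainty` and `exp_estimate` and the toy data are hypothetical.

```python
import numpy as np

def bootstrap_uncertainty(samples, estimator, n_boot=500, seed=0):
    """Standard deviation of `estimator` over bootstrap resamples of `samples`."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    n = samples.size
    replicas = np.empty(n_boot)
    for i in range(n_boot):
        # draw n samples with replacement from the original data
        resampled = samples[rng.integers(0, n, size=n)]
        replicas[i] = estimator(resampled)
    return replicas.std(ddof=1)

def exp_estimate(du):
    """Exponential-averaging estimate -ln<exp(-du)> in units of kT."""
    shift = du.min()
    return shift - np.log(np.mean(np.exp(-(du - shift))))

# Toy data: uncorrelated reduced energy differences from a Gaussian (assumed).
rng = np.random.default_rng(1)
du = rng.normal(2.0, 1.0, size=400)
print("estimate:", exp_estimate(du))
print("bootstrap uncertainty:", bootstrap_uncertainty(du, exp_estimate))
```

Any other estimator that maps an array of samples to a single number can be passed in place of `exp_estimate`. Correlated time series would first need to be subsampled to uncorrelated configurations, otherwise the resampling underestimates the uncertainty.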
Bias and uncertainty associated with the free energy algorithm are not the only sources of
errors in free energy estimates. Simulation parameters associated with the calculations of
non-bonded interactions, like electrostatic potential, also contribute to the errors in free
energy estimates. These parameters specify the extent of approximations allowed during
force and potential calculations. Molecular dynamics simulations require accurate force
calculations at each step. Similarly, Monte Carlo simulations require accurate potentials
at each step for generating samples. The selection criteria for these simulation
parameters are not well defined and there still exists a challenge as to how these can be
tuned efficiently for an entirely new system. The most common technique is to tune the
parameters such that for a given accuracy in force calculations the selection results in
minimum computational cost.31-33 We have introduced a new way of looking into this
problem. We explore the electrostatic potential parameter space using our benchmark test
set. We search for the set of parameters which results in the least computational cost for a
given accuracy in free energy estimates of systems in benchmark set, estimates of molar
enthalpy of vaporization of water and estimates of enthalpy of transformation of each
system in benchmark set.
In this paper, we first explain the molecular test set and the rationale behind the
molecular choices. Next, we use this set to test and compare the accuracy, precision and
reliability of ten free energy methods. We then present a summary of the method
comparison, with much of the data presented as supplementary material because of its
length, and present our recommendations for methods for performing free energy
calculations. We then discuss the parameter space search, explain our methodology
and protocols for exploring it, and finally present our recommendations for
electrostatic potential parameters for performing molecular simulations.
2. Molecular systems of benchmark set
The systems in this benchmark set are designed to represent alchemical changes, or
changes of molecular identity, common to most molecular design applications.
Alchemical transformations frequently require the deletion or introduction of atoms and
large changes in the partial charges. Changes in torsional, angle, or dihedral parameters
usually result in smaller changes in phase space, as do small changes in dispersion
strength, atomic radius, or charge. We therefore focus on atomic introduction/deletion and
large changes in partial charge.
2.1 Minimal test system: OPLS UA methane in TIP3P water (MS): The solvation of a
Lennard-Jones (LJ) sphere, representing methane, is perhaps the simplest molecular free
energy system that can be truly defined as molecularly realistic. There are no bond,
angle, or torsion terms, nor are there solute/solvent charge terms. This system
represents a minimal test of whether the free energy method is at all valid or applicable
for molecular systems. We examine the transformation of coupling the sphere into water,
which corresponds to the solvation free energy of this molecule (Figure 1(a)).
2.2 Charge mutation system: dipole inversion with OPLS UA ethane molecule in
TIP3P water (DI): We chose two LJ methyl spheres, tethered together, with +1/-1
charges and the bond length of a C-C bond as illustrated in Figure 1(b). This setup avoids
computing free energies of ions directly, as changing the total charge of a system with
periodic boundary conditions is not always handled completely correctly in many codes.35
This test measures whether the method can handle large water rearrangements around
charges. The system is a null transform; the free energy change is zero as the final state is
identical to the initial state by symmetry.
2.3 Large molecule mutation system: absolute hydration free energy of UA
anthracene in TIP3P water: In our third test, we compute the solvation of anthracene
via the decoupling of the intermolecular interactions from water. This system tests
whether the method can handle introduction or deletion of multiple atomic sites
efficiently. Importantly, there are no internal ligand degrees of freedom to complicate the
analysis. Force field parameters are taken from Pitera & Van Gunsteren.36 Originally, we
chose a null transformation of anthracene to anthracene via a benzene intermediate, but
the simpler solvation problem was eventually chosen because of the difficulty of
supporting such transformations in other codes and the difficulty of interpreting the
statistics of multiple transformations; a key requirement of the benchmark set is to be
simple to use. For this test set, we have used a single topology scheme, slowly turning
off the interactions between the solute and the solvent but keeping the intramolecular
interactions in the solute turned on as shown in Figure 1(c) and (d). The free energy
change of this transformation corresponds to the desolvation free energy of anthracene,
but we report the solvation free energy for ease of interpretation.
Figure 1. (a) Methane solvation (b) Dipole inversion (c) In the coupled state or solvated
state both intermolecular and intramolecular interactions for anthracene are turned on. (d)
In the decoupled state or vacuum state the intermolecular interactions with water
molecules are turned off.
3. Comparing free energy methods based on statistical tests, using benchmark set.
We use the benchmark set to test a total of ten free energy methods.
In the following explanations, we assume that the simulations are performed in the
isothermal-isobaric ensemble, and thus the Gibbs free energy G is the quantity of
interest; the Helmholtz free energy can also be computed if the simulations are performed
in the canonical ensemble. U indicates the generalized potential energy, which in the case
of the isobaric-isothermal ensemble is actually U+PV, $\beta$ is $1/k_BT$, and $\lambda$ is a
coupling parameter connecting the initial and final states in an arbitrary manner.
Brackets $\langle \cdot \rangle$ indicate an ensemble average over the appropriate ensemble.
3.1 Free energy methods and error propagation

3.1.1 Thermodynamic integration24 using a trapezoid rule (TI) and a cubic spline37
(TI3):
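For TI, we compute the ensemble average of the derivative of the potential energy function
with respect to the coupling parameter, i.e. $\langle \partial U(\lambda)/\partial\lambda \rangle_i$, at all $\lambda$ values and
the corresponding variances $\sigma_i^2$ of the $(\partial U(\lambda)/\partial\lambda)_i$ distributions:

$$\sigma_i^2 = \langle x^2 \rangle - \langle x \rangle^2, \quad \text{where } x = (\partial U(\lambda)/\partial\lambda)_i \qquad (1)$$

The $\langle \partial U(\lambda)/\partial\lambda \rangle_i$ values at the different intermediates are interpolated and then
integrated to get the overall free energy change:

$$\Delta G_{10} = G(\lambda=1) - G(\lambda=0) = \int_0^1 \langle \partial U(\lambda)/\partial\lambda \rangle_\lambda \, d\lambda \qquad (2)$$

For TI we have used a linear interpolation, which leads to the standard trapezoid rule to
integrate the total free energy. For TI3, the $\langle \partial U(\lambda)/\partial\lambda \rangle_i$ vs. $\lambda_i$ curve is fit piecewise to a
natural cubic spline and then integrated analytically using the coefficients of the cubic
equation (see Appendix 1 for the derivation). Both the trapezoidal and the cubic spline
integration can be expressed in the form of a weighted sum of the individual $\langle \partial U(\lambda)/\partial\lambda \rangle_i$:

$$\Delta G_{10} \approx \sum_{i=1}^{K} W_i \langle \partial U(\lambda)/\partial\lambda \rangle_i \qquad (3)$$

Here the $W_i$ are the respective weights corresponding to each state and K is the total
number of intermediate states. The variance of this estimate of the free energy can be
calculated by the following variance propagation formula:

$$\sigma_{10}^2 = \sum_{i=1}^{K} W_i^2 \sigma_i^2 \qquad (4)$$
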
Occasionally in the literature, the variance of the free energy over each interval i to i+1 is
computed individually, and then propagated into the total variance by the sum of squares.
This is incorrect, since the variance of each interval is then correlated to the variance of
its neighbors. For example, the free energy difference between state 1 and state 2 and
between state 2 and state 3 both contain statistical information from state 2. A number
of alternative thermodynamic integration schemes have been proposed.37,38 However,
such schemes require some knowledge of the magnitude of the statistical uncertainty for
optimality. Other schemes use nonlinear fits to two different functional forms separately
describing Lennard-Jones and Coulomb contributions to the free energy.39 Such schemes
are not particularly flexible and introduce integration bias that is difficult to quantify. By
using cubic splines, we can obtain a higher order formula independent of the functional form
of $dU/d\lambda$, while propagating error using the same formalism as is used in standard TI
(Eq. 4).
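
As an illustration of Eqs. 3 and 4, the following is a minimal sketch of the trapezoid-rule weights and the corresponding variance propagation. The function names and the three-state input values are hypothetical, and the sketch assumes that the variance supplied for each state is already the variance of the mean of $\partial U/\partial\lambda$ at that state (for example, the distribution variance divided by the number of uncorrelated samples).

```python
import numpy as np

def trapezoid_weights(lambdas):
    """Weights W_i such that sum_i W_i * y_i is the trapezoid-rule integral."""
    lam = np.asarray(lambdas, dtype=float)
    weights = np.zeros_like(lam)
    dlam = np.diff(lam)
    weights[:-1] += 0.5 * dlam   # each interval gives half its width to its left node
    weights[1:] += 0.5 * dlam    # and half its width to its right node
    return weights

def ti_estimate(lambdas, dudl_mean, dudl_var_of_mean):
    """Weighted-sum free energy (Eq. 3) and propagated variance (Eq. 4)."""
    w = trapezoid_weights(lambdas)
    delta_g = np.sum(w * np.asarray(dudl_mean, dtype=float))
    variance = np.sum(w**2 * np.asarray(dudl_var_of_mean, dtype=float))
    return delta_g, variance

# Hypothetical three-state example (not data from this study).
lambdas = [0.0, 0.5, 1.0]
dudl_mean = [0.0, 1.0, 2.0]
dudl_var_of_mean = [0.04, 0.04, 0.04]
delta_g, variance = ti_estimate(lambdas, dudl_mean, dudl_var_of_mean)
```

Because the interior state carries the weight of both neighboring intervals, its variance enters once with the combined weight, which is the reason for propagating error per state rather than per interval.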
3.1.2 Exponential averaging (EXP) in two forms: deletion (DEXP) and insertion
(IEXP)28,40:
In exponential averaging schemes, the free energy change $\Delta G_{ij}$ is calculated using the
exponential average of the difference of the potential energies $\Delta U_{ij}$ between two states i
and j over one of the ensembles. The free energy difference as a function of the potential
energy difference $\Delta U_{ij}$ and N samples is then

$$\Delta G_{ij} = -\beta^{-1} \ln \frac{1}{N} \sum_{n=1}^{N} \exp[-\beta \Delta U(x_n)_{ij}] \qquad (5)$$

This averaging is performed using samples from state i to compute potential energy
differences $\Delta U_{ij}$ from state i to state j. The free energy of the reverse process can be
computed using samples from state j and computing potential energy differences to state
i. Since the labels themselves are arbitrary, to remove ambiguity in the direction we will
describe such computations as being either deletion or insertion. We will call $\Delta U_{ij}$
taken in the direction of decreasing entropy an insertion step, and $\Delta U_{ij}$ taken in the
direction of increasing entropy a deletion step, as inspired by Wu et al.41 Hence the
free energy method using a $\Delta U_{ij}$ which steps in the direction of increasing entropy in Eq.
5 is labeled as deletion exponential averaging (DEXP), and the free energy method which
uses a $\Delta U_{ij}$ which steps in the direction of decreasing entropy in Eq. 5 is labeled as
insertion exponential averaging (IEXP).
In both cases, the variance $\sigma_{ij}^2$ between two adjacent intermediate states can be
estimated using standard point estimation theory as:

$$\sigma_{ij}^2 = \frac{1}{\beta^2 N} \frac{\sigma_x^2}{\langle x \rangle^2}, \quad x = \exp[-\beta \Delta U(x_n)_{ij}] \qquad (6)$$

In both exponential averaging methods the overall free energy change $\Delta G_{10}$ is
the sum of the intermediate free energy changes $\Delta G_{ij}$, and so the variance $\sigma_{10}^2$ is simply the
sum of the associated variances $\sigma_{ij}^2$. In some cases the changes from state i to states i-1
and i+1 might both be deletion or both be insertion cases; in this case, all the sampling
is performed at i, and the two estimates of the free energy difference will not be statistically
independent. Complicated molecular changes will frequently involve both addition and
subtraction of phase space, and thus will fall somewhere in between these two general
schemes.
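
A minimal sketch of the exponential averaging estimate (Eq. 5) and its first-order variance (Eq. 6) is given below. The function name, the reduced units ($\beta$ = 1), and the synthetic Gaussian energy differences are assumptions made for illustration only; the samples are assumed to be uncorrelated. For Gaussian $\Delta U$ with mean $\mu$ and variance $\sigma^2$, the exact answer is $\mu - \beta\sigma^2/2$, which gives a simple check.

```python
import numpy as np

def exp_estimate(delta_u, beta=1.0):
    """Exponential averaging free energy (Eq. 5) and its variance (Eq. 6)."""
    x = np.exp(-beta * np.asarray(delta_u, dtype=float))
    n = x.size
    delta_g = -np.log(x.mean()) / beta
    # First-order error propagation through the logarithm of the mean of x.
    variance = x.var(ddof=1) / (beta**2 * n * x.mean()**2)
    return delta_g, variance

# Synthetic, hypothetical data: Gaussian energy differences sampled in state i.
rng = np.random.default_rng(0)
delta_u = rng.normal(loc=2.0, scale=1.0, size=200_000)
delta_g, variance = exp_estimate(delta_u, beta=1.0)
# The exact value for this distribution is 2.0 - 0.5 * 1.0 = 1.5.
```

The estimate is dominated by the low-energy tail of the $\Delta U$ distribution, so far more samples are needed here than for a simple average of comparable precision.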
When we calculate the overall free energy change using a method which has inherent
directionality, like exponential averaging, then given our definitions, we need to make
sure that the direction of entropy change remains the same throughout the process in order to
interpret the whole transformation as a deletion or an insertion process. Methane
solvation and anthracene solvation involve moving molecules from vapor to liquid phase,
resulting in a decrease in total entropy. However, in the dipole inversion case, we have a
symmetric transformation, and thus deletion and insertion happen within a single process.
In dipole inversion, going from a very large magnitude dipole to a small uncharged
intermediate, we have an increase of entropy as the water around the particle becomes
less structured, and thus we use the term deletion. From the uncharged
intermediate state to the reversed -/+ dipolar state (during the second half of the
inversion), we have the reverse process, and we use the term insertion, consistent with the
entropy direction definition. To use the IEXP, GINS, DEXP, and GDEL terminology for these
pathways, we therefore need to combine mixed halves of what would typically be called
the forward and reverse pathways as illustrated in Figure 2.
Figure 2. Free energy differences of transitions in the direction of increasing and
decreasing entropy should be added separately to get the overall free energy for a dipole
inversion.
Although these particular sums are nonzero, they provide a consistent definition of the
statistical variance of insertion and deletion. The statistical variance for symmetric
transformations will simply be the average of the variance for the deletion and insertion
processes.
3.1.3 Bennett acceptance ratio (BAR)42:

The Bennett acceptance ratio uses samples of the potential energy difference both from i to j
and from j to i to obtain a provably minimum variance estimate of the free energy difference.
Calculation of the free energy change between any two intermediate states through BAR
requires self-consistent solution of the two equations:

$$\Delta G_{ij} = \beta^{-1} \ln \frac{\sum_{k=1}^{N_j} \left(1 + \exp[-\beta(\Delta U_k^{j} - C)]\right)^{-1}}{\sum_{l=1}^{N_i} \left(1 + \exp[\beta(\Delta U_l^{i} - C)]\right)^{-1}} + C - \beta^{-1} \ln \frac{N_j}{N_i} \qquad (7)$$

$$C = \Delta G_{ij} + \beta^{-1} \ln \frac{N_j}{N_i} \qquad (8)$$

Here $\Delta U_k^{j}$ and $\Delta U_l^{i}$ denote the potential energy difference $U_j - U_i$ evaluated on samples
drawn from state j and from state i, respectively. The first equation is true for any constant C,
but when Eqs. (7) and (8) are solved self-consistently, $\Delta G_{ij}$ will have minimized variance.
There exist a large number of ways to solve the equations self-consistently, a discussion of
which is beyond the scope of this paper. The variance $\sigma_{ij}^2$ in the free energy change can be
estimated from:

$$\sigma_{ij}^2 = \frac{1}{\beta^2 N_i} \left[ \frac{\langle f^2(x) \rangle_i}{\langle f(x) \rangle_i^2} - 1 \right] + \frac{1}{\beta^2 N_j} \left[ \frac{\langle f^2(-x) \rangle_j}{\langle f(-x) \rangle_j^2} - 1 \right] \qquad (9)$$

where f(x) is the Fermi function $1/(1+\exp(x))$ and $x = \beta(\Delta U - C)$. The total free energy change
is the sum over changes between consecutive intermediate states. Typically, the variance
in the full free energy is computed by assuming independent error and summing the
variance for consecutive intermediate states. However, the assumption that the errors add
independently is not correct, since the free energy differences from i-1 to i and from i to
i+1 both depend on the potential energy at i, so their variances are not independent.
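
One simple way to solve Eqs. 7 and 8 self-consistently is direct fixed-point iteration, sketched below together with the variance estimate of Eq. 9. This is an illustrative sketch, not the implementation used in this study: the function names, the reduced units ($\beta$ = 1), and the synthetic data are assumptions. The synthetic energy differences are drawn from two Gaussians chosen so that the exact free energy difference is 1.5.

```python
import numpy as np

def fermi(x):
    """Fermi function f(x) = 1 / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(x))

def bar_estimate(du_i, du_j, beta=1.0, tol=1e-10, max_iter=1000):
    """BAR by fixed-point iteration.

    du_i: U_j - U_i evaluated on samples drawn from state i
    du_j: U_j - U_i evaluated on samples drawn from state j
    """
    du_i = np.asarray(du_i, dtype=float)
    du_j = np.asarray(du_j, dtype=float)
    n_i, n_j = du_i.size, du_j.size
    log_ratio = np.log(n_j / n_i) / beta
    delta_g = 0.0
    for _ in range(max_iter):
        c = delta_g + log_ratio                                  # Eq. 8
        numerator = fermi(-beta * (du_j - c)).sum()
        denominator = fermi(beta * (du_i - c)).sum()
        new = np.log(numerator / denominator) / beta + c - log_ratio  # Eq. 7
        converged = abs(new - delta_g) < tol
        delta_g = new
        if converged:
            break
    # Variance estimate (Eq. 9) evaluated at the converged C.
    c = delta_g + log_ratio
    f_i = fermi(beta * (du_i - c))
    f_j = fermi(-beta * (du_j - c))
    variance = (((f_i**2).mean() / f_i.mean()**2 - 1.0) / (beta**2 * n_i)
                + ((f_j**2).mean() / f_j.mean()**2 - 1.0) / (beta**2 * n_j))
    return delta_g, variance

# Synthetic, hypothetical data with exact free energy difference 1.5.
rng = np.random.default_rng(0)
du_i = rng.normal(2.0, 1.0, size=20_000)   # sampled in state i
du_j = rng.normal(1.0, 1.0, size=20_000)   # sampled in state j
delta_g, variance = bar_estimate(du_i, du_j)
```

For large energy differences the exponentials can overflow, so a production implementation would evaluate the Fermi function in a numerically stable form.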
3.1.4 Unoptimized Bennett acceptance ratio (UBAR):

Eq. 7 in Section 3.1.3 is valid for any initial estimate of the free energy, though choices of
C not given by the implicit equation (Eq. 8) will not have minimum variance. If we make
the choice of $C = \beta^{-1}\ln(N_j/N_i)$, we no longer need to solve the equations self-consistently,
which can avoid saving, reading, and reprocessing all of the data, potentially saving
significant resources, at a cost of statistical efficiency. We instead accumulate the
averages in Eqs. 7 through 9 as the simulation progresses. If each intermediate free
energy is relatively near zero, the free energy estimate will be close to optimal. Such an
estimator is directly equivalent to the minimum variance version of transition state Monte
Carlo, where the Barker acceptance probability43 is used.44
3.1.5 Range-based Bennett acceptance ratio (RBAR):

If we keep track of the ensemble averages in Eqs. 7 and 9 for a range of trial values of
C, we will obtain a number of candidate estimates of the free energy.44 Of these
estimates, the one that is closest to the input C will be the least biased and will have
minimum variance. By choosing this particular estimate, we are essentially
precalculating the self-consistent solution. To apply this method, a range of starting values
of C is chosen. Each trial C is fed in as an initial guess and a new C is calculated using a single
iteration of Eq. 7, with the corresponding $\Delta G$ and $\sigma$ then calculated. Accumulated averages
are maintained for each choice of $\Delta G$. A decent estimate of the range of C and $\Delta G$ is
therefore a requirement for using this method. In some cases, it may end up being more
costly than BAR, as accumulated averages must be maintained for a certain number of
trial free energy values, instead of simply performing 5-10 self-consistent iterations.
However, the advantage of what we will call in this paper RBAR (range-based Bennett
acceptance ratio) is that data from each simulation step do not need to be retained for
post-processing as is required with BAR, and only the accumulated averages need to be
maintained.
3.1.6 Multistate Bennett acceptance ratio (MBAR)45:

MBAR is a method to find the free energies of K states simultaneously by minimizing the
$K \times K$ matrix of variances of the free energy differences of these K states simultaneously.
The derivation of MBAR is a straightforward, if mathematically difficult, extension of the
derivation of BAR to more than two states considered simultaneously. This can be a
significant improvement over BAR, which minimizes the variance of the free energy
difference for two states at a time. For MBAR, the equation:

$$G_i = -\beta^{-1} \ln \sum_{k=1}^{K} \sum_{n=1}^{N_k} \frac{\exp[-\beta U_i(x_{kn})]}{\sum_{k'=1}^{K} N_{k'} \exp[\beta G_{k'} - \beta U_{k'}(x_{kn})]} \qquad (11)$$

is solved self-consistently for each $G_i$. $\Delta G_{ij} = G_j - G_i$ gives the free energy change
between two states i and j. The statistical variance of $\Delta G_{ij}$, $\sigma_{ij}^2$, is calculated using Eqs. 8
and 12 in the paper by Shirts and Chodera.45
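
The self-consistent solution of Eq. 11 can be sketched with a plain fixed-point iteration, shown below. This is a minimal illustration rather than the solver used in this study: the function names are hypothetical, energies are in reduced units ($\beta$ = 1, so the returned values are $\beta G_i$), and the test system is a set of three one-dimensional harmonic oscillators, for which the exact result is $\beta(G_i - G_0) = \frac{1}{2}\ln(k_i/k_0)$.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def mbar_free_energies(u_kn, n_k, tol=1e-10, max_iter=10_000):
    """Solve Eq. 11 by fixed-point iteration.

    u_kn[i, n]: reduced potential of state i evaluated on pooled sample n
    n_k[k]:     number of samples drawn from state k
    Returns reduced free energies relative to state 0.
    """
    u_kn = np.asarray(u_kn, dtype=float)
    log_n = np.log(np.asarray(n_k, dtype=float))
    f = np.zeros(u_kn.shape[0])
    for _ in range(max_iter):
        # Log of the denominator of Eq. 11 for every pooled sample.
        log_den = logsumexp(log_n[:, None] + f[:, None] - u_kn, axis=0)
        f_new = -logsumexp(-u_kn - log_den[None, :], axis=1)
        f_new -= f_new[0]          # fix the arbitrary additive constant
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return f

# Hypothetical test system: harmonic oscillators U_k(x) = 0.5 * k * x**2.
rng = np.random.default_rng(1)
spring = np.array([1.0, 2.0, 4.0])
n_k = np.array([5000, 5000, 5000])
x = np.concatenate([rng.normal(0.0, 1.0 / np.sqrt(k), size=n)
                    for k, n in zip(spring, n_k)])
u_kn = 0.5 * spring[:, None] * x[None, :]**2
f = mbar_free_energies(u_kn, n_k)
exact = 0.5 * np.log(spring / spring[0])
```

Every sample is evaluated at every state, which is what allows MBAR to use information from all K states at once instead of only from adjacent pairs.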
3.1.7 Gaussian estimate of exponential averaging in two forms: deletion (GDEL) and
insertion (GINS)29:
If $\sigma_{\Delta U}^2 = \langle \Delta U^2 \rangle - \langle \Delta U \rangle^2$ is finite, and we approximate the $\Delta U_{ij}$ distribution as a Gaussian,
the free energy can be expressed as a sum of moments of the energy differences46 by:

$$\Delta G_{ij} = \langle \Delta U \rangle_{ij} - \frac{\beta}{2} \sigma_{\Delta U_{ij}}^2 \qquad (12)$$

The variance over N samples of this free energy difference is given by:

$$\sigma_{ij}^2 = \frac{\sigma_{\Delta U_{ij}}^2}{N} + \frac{\beta^2 \sigma_{\Delta U_{ij}}^4}{2(N-1)} \qquad (13)$$

If the distribution of $\Delta U_{ij}$ is close to Gaussian, then this estimation method can minimize the
statistical effect of rare events, resulting in a more efficient and substantially simpler
estimation method. To remove ambiguities with respect to the direction of the process, we use
the same convention of deletion and insertion as described for exponential averaging. In
Eq. 12, when $\Delta U_{ij}$ is in the direction of increasing entropy, we refer to this as the
Gaussian estimate with deletion or GDEL, and if we use $\Delta U_{ij}$ in the direction of decreasing
entropy, we refer to this estimate as the Gaussian estimate with insertion or GINS.
Summing the free energy changes between neighboring intermediates again gives the total free
energy change. Total variance is calculated assuming independent sampling at each state, which
is not an approximation here, as each calculation depends on samples from only one state.
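
Eqs. 12 and 13 translate almost directly into code. The sketch below is illustrative only; the function name, the reduced units ($\beta$ = 1), and the synthetic Gaussian data are assumptions. For these synthetic data the exact result is 2.0 - 0.5 = 1.5.

```python
import numpy as np

def gaussian_estimate(delta_u, beta=1.0):
    """Gaussian (second-cumulant) free energy (Eq. 12) and its variance (Eq. 13)."""
    du = np.asarray(delta_u, dtype=float)
    n = du.size
    var_du = du.var(ddof=1)
    delta_g = du.mean() - 0.5 * beta * var_du
    variance = var_du / n + beta**2 * var_du**2 / (2.0 * (n - 1))
    return delta_g, variance

# Synthetic, hypothetical data: Gaussian energy differences with mean 2, variance 1.
rng = np.random.default_rng(2)
delta_g, variance = gaussian_estimate(rng.normal(2.0, 1.0, size=10_000))
```

Only the mean and variance of $\Delta U$ enter, so the estimate is far less sensitive to rare low-energy samples than exponential averaging, at the price of bias whenever the distribution is not Gaussian.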
3.2 Building molecular systems, setting up simulation framework and designing
statistical experiments.
(A) System preparation and simulation parameters.

Topologies for the UA methane, dipole inversion, and anthracene solvation test systems were
created by a combination of automated tools (Dundee PRODRG47, OpenEye libraries,48
ACPYPE49) and manual editing, and are available on the Alchemistry.org website,
http://www.alchemistry.org. Starting configurations were generated using GROMACS
4.0.4. The automated topologies were solvated using GROMACS genbox, and these
solvated systems were minimized with the L-BFGS50 (limited-memory
Broyden-Fletcher-Goldfarb-Shanno) minimization method, followed by steepest descent
minimization. All systems were then equilibrated at constant volume at 300 K for 100 ps,
using Langevin dynamics with a time step of 0.002 ps.51,52 All hydrogen-containing bonds were
constrained using the SHAKE algorithm to a relative tolerance of $10^{-12}$. The systems were
then equilibrated at constant pressure at 1 atm using a Parrinello-Rahman barostat43 and a
Nose-Hoover thermostat44 for 100 ps. A coupling time constant of 5 ps was used for both
thermostat and barostat. A switching function was used for both PME and van der Waals
potentials. The PME switch started at 0.88 nm with a Coulomb cutoff distance of 0.9 nm
for electrostatics. Other PME parameters were: Fourier spacing of 0.12 nm, 4th order
B-spline interpolation and an Ewald tolerance of $10^{-8}$. A van der Waals switch at 0.8 nm and
a cutoff distance of 0.9 nm were used. A long range van der Waals dispersion correction
was used for both energy and pressure.
3.2.1 $\lambda$ values and spacing between intermediate states for free energy calculations:
Initial simulations (5 ns long, including 0.5 ns equilibration) were performed with 21
equally spaced $\lambda$ values to guide the selection of the $\lambda$ values for the main study.
Intermediate states were chosen so that each window contributed almost equally to the
total error, specifically ensuring that the maximum variance among all windows was no
larger than twice the variance among all windows. In order to examine the
change in bias, statistical error, and mean square error as a function of the spacing
between coupling parameter values, we choose two sets of $\lambda$ states for each model: a
full set, and a sparse set.
3.2.1.1 Full set:

For the full intermediate set, states were chosen such that the total uncertainty in MBAR
was no greater than 0.1 kJ/mol for methane solvation and no greater than 0.22 kJ/mol for
dipole inversion and anthracene solvation over the 5 ns initial simulation. For UA
methane solvation (MS), 8 intermediates were selected: $\lambda$ = [0.0, 0.2, 0.4, 0.5, 0.6, 0.7,
0.8, 1.0], where $\lambda$ = 0 denotes fully interacting UA methane in water and $\lambda$ = 1 denotes
ghost UA methane, where there are no interactions with the solvent. For consistency of
sign between test cases, the reported results are in terms of the reverse process, the
solvation of methane. For dipole inversion (DI), we include 11 intermediate states: $\lambda$ =
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]. Here $\lambda$ = 0 denotes a starting +/-
configuration of the dipole; $\lambda$ = 1 denotes the reversed configuration of the dipole, i.e. -/+,
with $\lambda$ = 0.5 a state with zero partial charges. For anthracene solvation (AS), the full set
has 15 total states, with $\lambda$ = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85,
0.9, 1.0]. $\lambda$ = 0 denotes fully interacting united atom anthracene in water; $\lambda$ = 1 denotes
the vacuum state anthracene with no interactions with water. Again this corresponds to a
desolvation process, and in tables and charts, we report the free energy of hydration,
which is simply the reverse process and so includes a sign reversal. For RBAR, this
spacing means that the largest free energy between intervals is approximately 35 kJ/mol,
meaning we must use a range of -40 to 40 kJ/mol, with increments of 1 kJ/mol.
3.2.1.2 Sparse set:

For the sparse sets, states were chosen such that the total uncertainty in MBAR was no
greater than 0.25 kJ/mol for MS and 0.4 kJ/mol for DI and AS over the 5 ns initial
simulation, with no fewer than 3 total states. For methane solvation, there are only three
$\lambda$ states [0, 0.5, 1]. For dipole inversion, the sparse set was generated by picking every
alternate $\lambda$ along with $\lambda$ = 0.5 (which represents zero partial charges) [0, 0.2, 0.4, 0.5, 0.6, 0.8,
1.0], reducing the number of states from 11 to 7. For the AS test set every third $\lambda$ was
chosen to create the sparse set [0.0, 0.2, 0.5, 0.7, 0.85, 1.0], reducing the number of
states from 15 to 6. Note that we did not need to run separate simulations for the
sparse states; we merely select for analysis only a subset of the states from the full set.
For RBAR, this spacing means that the largest free energy between intervals is
approximately 66 kJ/mol, meaning we must use a range of -70 to 70 kJ/mol, with
increments of 1 kJ/mol. In order to also test the bias with respect to number of states, for
each model we conduct a set of 5 ns simulations at 51 equally spaced $\lambda$ states, with a
spacing of 0.02 between two neighboring states.
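
The selection rules for the sparse sets can be written as simple slices of the full sets, as in the sketch below. The variable names are hypothetical; the coupling parameter values are those listed in Sections 3.2.1.1 and 3.2.1.2.

```python
import numpy as np

# Full sets of coupling parameter values from Section 3.2.1.1.
full_ms = [0.0, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 1.0]
full_di = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
full_as = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85,
           0.9, 1.0]

# Methane solvation: end states plus the midpoint.
sparse_ms = [full_ms[0], 0.5, full_ms[-1]]
# Dipole inversion: every alternate state, plus the uncharged midpoint.
sparse_di = sorted(set(full_di[::2]) | {0.5})
# Anthracene solvation: every third state, plus the final end state.
sparse_as = full_as[::3] + [full_as[-1]]

# Reference set used to probe bias: 51 equally spaced states (spacing 0.02).
reference = np.linspace(0.0, 1.0, 51)
```

Because every sparse state is also a member of the full set, the sparse analysis reuses the existing trajectories and requires no additional simulation.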
3.2.2 Generating an ensemble of uncorrelated configurations

The most important use of the benchmark test set in this paper is to compare estimates of
the statistical error of different estimators of the free energy with direct sample error
obtained by repeating the experiment N times. For this purpose, we started by generating
100 uncorrelated starting configurations. Using the $\lambda$ = 0 state from the 5 ns test runs, we
used block averaging53 with the GROMACS program g_analyze to compute the
autocorrelation times of the potential energy, kinetic energy, total energy, Coulomb
interactions, and $dU_{pot}/d\lambda$ for all three systems. The autocorrelation time of the potential
energy was chosen since it was the longest correlation time of all the observables examined. The
autocorrelation times of the potential energy for UA methane solvation, dipole inversion, and
anthracene solvation were 25 ps, 30 ps, and 25 ps respectively.
We selected configurations separated by 2 ns as our uncorrelated starting points; 2 ns is
more than 50 times longer than the 30 ps autocorrelation time of the potential energy.
Each of the three test systems was run for 20 ns and configurations were printed out
every 2 ns. From each of these 10 parent configurations, 20 ns simulations were run
with different random seeds to generate a Maxwell-Boltzmann velocity distribution, and
configurations were printed out every 2 ns, giving a total of 100 uncorrelated starting
configurations for each system. These configurations were generated using a beta version
of GROMACS 4.5, which was also used for the subsequent free energy calculations. Velocity Verlet
(option md-vv) was used as the integrator algorithm. A Nose-Hoover thermostat and the
MTTK54 barostat were used to control temperature and pressure, respectively.
The starting co-ordinate files for each intermediate state simulation, other than the first
state, were generated by running short 10 ps equilibration runs in series. The first state at
$\lambda$ = 0 used one of the 100 uncorrelated starting configurations as the starting co-ordinate
file. The starting configuration for any other state used the final configuration of the
previous equilibrated state. After this initial equilibration round, 5 ns of NPT simulation
were then performed for each initial configuration and each separate state. Data
corresponding to first 500 ps was discarded as equilibration. The remaining 4.5 ns of
equilibrium data for each model at each intermediate state were used for all subsequent
calculations.
(B) Statistical tests.

3.2.3 Quantifying accuracy and precision in uncertainty estimate of an estimator.
We estimate uncertainty for each free energy estimator in three ways. First, we compute
the sample standard deviation $\sqrt{\langle \Delta G^2 \rangle - \langle \Delta G \rangle^2}$ from the $\Delta G$ values computed from the series of
100 uncorrelated simulation runs described above. Second, we compute the analytical
estimates of error corresponding to each of the methods. Finally, we use the bootstrap
estimator for the standard deviation.30
3.2.3.1 Sample standard deviation: To compute the sample standard deviation, we take
the simulations started from 100 initial configurations and compute free energy
differences from each simulation to obtain a distribution of free energy differences. We
then directly compute the sample standard deviation corresponding to each individual
estimator from:

$$\sigma(\Delta G) = \sqrt{\frac{\sum_{i=1}^{N} (\langle \Delta G \rangle - \Delta G_i)^2}{N-1}} \qquad (14)$$

where N = 100, and where $\langle \Delta G \rangle$ is the mean over the 100 values of $\Delta G_i$. Crucially, the
standard deviation computed from a finite sized sample is itself a statistical quantity, and
must therefore have an associated uncertainty. Rigorously, in order to compute the
sample standard deviation of the uncertainty, $\sigma(\sigma(\Delta G))$, we would need to repeat our
100-simulation experiment 100 times. Instead, we have used the bootstrap method (described
in the bootstrap estimate section) to estimate $\sigma(\sigma(\Delta G))$. From this exercise we finally get
$\langle \Delta G \rangle$ and $\sigma(\Delta G) \pm \sigma(\sigma(\Delta G))$. $\langle \Delta G \rangle$ here is not an ensemble average but the average
over 100 repetitions.
3.2.3.2 Analytical estimate: Each free energy estimator has an associated uncertainty
estimator as discussed in previous sections, namely the square root of the estimated
variance of the total free energy. From 100 uncorrelated starting configurations, we will
obtain not only 100 $\Delta G$ values, but also 100 error estimates from the method's analytical
uncertainty estimate. We denote the average and standard deviation of these estimated
uncertainties over all 100 independent runs as $\langle \sigma(\Delta G) \rangle \pm \sigma(\sigma(\Delta G))$, and call these the
analytical uncertainty estimate and the standard deviation of the analytical estimate
distribution.
3.2.3.3 Bootstrap estimate: For each of the 100 independent free energy calculations, we
also calculate a bootstrap error estimate, which is a rigorous and robust statistical
estimation technique.30 The bootstrap error is constructed as follows. From each set of
potential energy differences or $dU/d\lambda$ values, we generate bootstrap sets from the
original sample of molecular simulation data. To generate a bootstrap set, we first
subsample the data using an estimate of the autocorrelation time to obtain N statistically
uncorrelated values. For each bootstrap set, we draw N samples with replacement from this
set of uncorrelated measurements. For example, if our set was the integers {3,6,8,9},
then a bootstrap set would consist of randomly selecting one of the four numbers four
times; {3,3,8,9}, {9,6,6,3}, and {8,8,8,8} would all be valid sets, though clearly the
last one would be the rarest. We repeat this resampling process many times; in this
case, we draw 200 bootstrap sets. For each of these 200 bootstrap sets, we compute the
free energy and uncertainties using the estimators as if they were the original data set.
This gives us 200 $\Delta G_{bs}$ values, one for each of the 200 bootstrapped sets; standard rules of thumb
suggest using 50-200 bootstrapped sets to get robust estimates of uncertainty.30 The
average of these 200 $\Delta G_{bs}$ values gives $\langle \Delta G_{bs} \rangle$. The bootstrap estimate of the
error, $\sigma(\Delta G)_{bs}$, is the sample standard deviation of the 200 $\Delta G_{bs}$ values for each of the
initial configurations. The average $\langle \sigma(\Delta G)_{bs} \rangle$ over all 100 initial configurations is the
bootstrap error estimate. The statistical uncertainty of this bootstrap uncertainty estimate
is estimated by computing the sample standard deviation over the 100 $\sigma(\Delta G)_{bs}$ values, and is
denoted by $\sigma(\sigma(\Delta G)_{bs})$.
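
The bootstrap procedure for a single run can be sketched as follows. The function name, the synthetic data, and the use of a simple mean as the estimator are assumptions for illustration; in practice the estimator argument would be one of the free energy estimators of Section 3.1, and the input is assumed to have already been subsampled to uncorrelated values.

```python
import numpy as np

def bootstrap_std(samples, estimator, n_boot=200, rng=None):
    """Mean and sample standard deviation of an estimator over bootstrap sets.

    samples:   1-D array of statistically uncorrelated measurements
    estimator: function mapping a 1-D array to a scalar estimate
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Draw N values with replacement from the N original values.
        resampled = samples[rng.integers(0, n, size=n)]
        estimates[b] = estimator(resampled)
    return estimates.mean(), estimates.std(ddof=1)

# Synthetic, hypothetical data standing in for uncorrelated energy differences.
rng = np.random.default_rng(3)
delta_u = rng.normal(2.0, 1.0, size=1000)
bs_mean, bs_std = bootstrap_std(delta_u, np.mean, n_boot=200, rng=rng)
# For the mean of 1000 unit-variance samples, bs_std should be near 1/sqrt(1000).
```

Repeating this for each of the 100 runs and then taking the mean and standard deviation of the 100 resulting values gives the bootstrap error estimate and its uncertainty as defined above.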
A good free energy estimator should have an analytical uncertainty estimate that is
consistent with the sample standard deviation. If the analytical uncertainty estimate is not
accurate, then we have no way of knowing how accurate our free energy estimate is when
we run only a single calculation. This can be very problematic; for example, the analytic
estimate of the uncertainty of EXP diverges from the true estimate well before the error
in EXP itself.28 The smaller the difference between the direct sample standard deviation
error estimate $\sigma(\Delta G)$ and the predicted or analytical error estimate $\langle \sigma(\Delta G) \rangle$, the better
we know the method's ability to predict error without having to run multiple trials. If the
statistics are well-behaved, the bootstrap estimate of the statistical uncertainty should also
agree with the sample estimate of the uncertainty. If we can show that the bootstrap
estimate agrees with the sample estimate of the uncertainty, then bootstrap error can
substitute for sample uncertainty estimates even when the analytical estimate fails.
3.2.4 Quantifying bias of free energy estimates

The average estimate from a statistically biased estimator, even if repeated an infinite
number of times, will still deviate from the true estimate by the amount of the bias.
There are typically two types of bias in the free energy estimators considered here.
Asymptotically unbiased estimators have bias with a finite number of samples but
converge to the true answer in the limit of large numbers of samples. An example is the
naive estimator of the variance of a set of numbers, which is
$\mathrm{var}(A) = \frac{1}{N} \sum_i (\langle A \rangle - A_i)^2$. This estimate can be shown to be slightly too small
on average. If N is replaced by N-1, however, the estimator becomes unbiased. In the limit of
very large N, the difference is meaningless. Thus the naive estimator of the variance is not
unbiased, but it is asymptotically unbiased. Thermodynamic integration does not have asymptotic
bias, because at each intermediate state, the simple average of $dU/d\lambda$ is unbiased for any
number of samples. Exponential averaging, BAR, and MBAR are only asymptotically
unbiased, though the bias of exponential averaging is usually significantly higher than
that of BAR and MBAR;28 careful design of the pathway can minimize this large bias to
some extent.55 However, unlike the simple case of the estimator for the variance of an
average, there exist no known unbiased versions of these estimators.
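
The finite-sample bias of the naive variance estimator is easy to demonstrate numerically. The sketch below uses synthetic unit-variance data and hypothetical variable names; with N = 5 samples per trial, the 1/N estimator averages to (N-1)/N = 0.8 while the 1/(N-1) estimator averages to 1.0.

```python
import numpy as np

rng = np.random.default_rng(4)
# 20,000 independent trials, each with N = 5 samples of true variance 1.
samples = rng.normal(0.0, 1.0, size=(20_000, 5))

# Naive estimator: divide by N (ddof=0).
naive = samples.var(axis=1, ddof=0).mean()
# Corrected estimator: divide by N - 1 (ddof=1).
corrected = samples.var(axis=1, ddof=1).mean()
```

The two estimators differ by the factor (N-1)/N, which approaches 1 as N grows; this is what is meant by asymptotically unbiased.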
Bias also occurs due to using a limited number of intermediate states, either because of a
lack of phase space overlap between intermediates, as occurs in
acceptance ratio methods and exponential averaging, or because of numerical integration
error, as occurs in thermodynamic integration methods. Even for a large benchmark set
like the present study, computational expense and storage limits make it very difficult to
approach the limit of large numbers of samples and the limit of large numbers of intermediate
states simultaneously. Therefore, we estimate the two contributions to bias independently. For
asymptotic bias, we compare the results from combining a fixed amount of data in either
one large data set, or a series of shorter data sets. For bias as a function of number of
intermediate states, we vary the number of intermediates given fixed length simulations.
3.2.4.1 Bias due to number of samples: Data from all 100 5 ns runs are stitched into a
single trajectory to mimic a single long trajectory. The data corresponding to
equilibration were not included while estimating free energies, and since each simulation
contained 4.5 ns of equilibrium data, stitching results in 450 ns of total simulation data. The
difference between the free energy estimate computed with 450 ns of data, $\Delta G_{450}$, and the
same data used as an average of 100 4.5 ns trajectories, $\langle \Delta G \rangle$, is the bias due to number of
samples.
3.2.4.2 Bias due to number of intermediate states: We also ran a set of simulations
with 51 states for each model as discussed above. We can estimate the bias due to the
number of intermediate states as the difference between the free energy estimate computed
using 51 states, $\Delta G_{51}$, and $\langle \Delta G \rangle$ estimated using either the full or the
sparse set of states. Besides having low and consistent error estimates, ideal free energy
estimation methods would show little or no bias in these tests. In many cases, we are
limited by the statistical uncertainty in determining the bias with high accuracy, since it is
computationally too demanding to generate 500 ns of simulation for all 51 states for all
test sets. In these cases, we can only determine if the bias is statistically insignificant
with respect to the statistical uncertainty. In the case of bias with respect to number of
samples, in most cases, asymptotic bias scales with 1/N while statistical uncertainty
scales with $1/N^{1/2}$, so statistical uncertainty is usually dominant.
3.2.5 Quantifying reliability of a free energy estimator

Root mean square error (RMSE) is the square root of the mean of the squared differences
between the sample estimates and the true answer. Alternatively, the mean square error can be
written as the sum of the variance estimate ($\sigma^2$) and the square of the bias. Mean square
error is therefore the most useful overall measure of the reliability of any statistically
estimated observable.

Calculating a true RMSE requires collecting an infinite number of samples at an infinite
number of intermediate states to obtain the true answer. Since it is computationally
impossible to reach this limit, and computationally expensive to approach it, we use the
free energy estimate (ΔG)450 from the 450 ns run to approximate the unbiased limit of free
energies given a fixed set of intermediates, and the free energy estimate (ΔG)51 from the 51
state run as the unbiased limit of the free energy estimate with respect to large numbers of
intermediate states, given a fixed amount of sampling. From the two different biases
generated from these two reference values, we obtain two different estimates of mean
square error. Neither of these is a true RMSE because we lack the true reference answer.
However, these estimates of the RMSEs capture the combined effect of the statistical
error and the two different sources of bias. We define the two estimates of mean square
error as MSE1 and MSE2:
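$$\mathrm{MSE1}_i = \sigma_i^2 + \mathrm{Bias1}_i^2, \quad \mathrm{Bias1}_i = (\Delta G)_{450} - (\Delta G_{est})_i, \quad 1 \le i \le 100 \tag{15}$$
$$\mathrm{MSE2}_i = \sigma_i^2 + \mathrm{Bias2}_i^2, \quad \mathrm{Bias2}_i = (\Delta G)_{51} - (\Delta G_{est})_i, \quad 1 \le i \le 100 \tag{16}$$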
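Equations 15 and 16 can be evaluated with a few lines of NumPy. The sketch below is illustrative only: the function name `rmse_summary` and the synthetic inputs are assumptions, not the analysis code used for this work. It takes the square root of each per-repetition MSE and summarizes the resulting RMSE values by their mean and standard deviation.

```python
import numpy as np

def rmse_summary(dg_est, sigma_est, dg_ref):
    """Return (mean, std) of the per-repetition RMSE values.

    dg_est    : free energy estimate from each repetition
    sigma_est : uncertainty estimate reported for each repetition
    dg_ref    : reference free energy (e.g. the 450 ns or 51 state value)
    """
    dg_est = np.asarray(dg_est, dtype=float)
    sigma_est = np.asarray(sigma_est, dtype=float)
    bias = dg_ref - dg_est            # Bias_i; the sign drops out once squared
    mse = sigma_est**2 + bias**2      # MSE_i = sigma_i^2 + Bias_i^2
    rmse = np.sqrt(mse)
    return rmse.mean(), rmse.std(ddof=1)

# Synthetic stand-in for 100 repetitions (not data from this study).
rng = np.random.default_rng(0)
dg = rng.normal(9.0, 0.1, size=100)
sig = np.full(100, 0.1)
rmse1_mean, rmse1_std = rmse_summary(dg, sig, dg_ref=9.0)
```

Calling the same function with the 51 state reference value in place of the 450 ns value would give the second RMSE estimate.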
RMSE1 and RMSE2 are the square roots of MSE1 and MSE2 respectively. The errors in
RMSE1 and RMSE2, σ(RMSE1) and σ(RMSE2) respectively, are the standard deviations
over the 100 RMSE1i and 100 RMSE2i. With the two quantities RMSE1 and RMSE2,
we can examine qualitative information about the reliability of the methods using all the
information from our experiments. We plot a bivariate Gaussian with standard deviations
equal to RMSE1 and RMSE2 on mutually perpendicular axes, as shown in Figure 3(a), with the
average free energy estimate ⟨ΔG⟩ from the method as the mean.
An example bivariate Gaussian RMSE plot is shown in Figure 3(b). Figure 3(c) shows
the top view of this Gaussian. The overall spread of the rings in Figure 3(c) is a measure
of overall reliability.
Figure 3. (a) Gaussians are plotted on mutually perpendicular axes. Both have the ⟨ΔG⟩
calculated using a method (here TI for methane solvation) as mean and RMSE1 and
RMSE2 as their standard deviations. (b) These are fused to generate a bivariate Gaussian
plot. (c) Top view of the bivariate Gaussian plot.
A poor estimator has large and/or unequal spreads in the horizontal and vertical
directions, yielding a large circle or an ellipse. A good estimator has small and equal
spread in both the horizontal and vertical directions. Larger spread in one direction
indicates dominance of a single type of bias. In the above plot, the vertical axis
corresponds to the Gaussian with standard deviation equal to RMSE2. Vertical spread
indicates that bias due to number of intermediate states dominates the uncertainty
estimate, while horizontal spread indicates that bias due to number of samples dominates
the uncertainty estimate.
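As a rough illustration of how such a plot can be built, the sketch below evaluates the product of two independent Gaussians on a grid, with RMSE1 on the horizontal axis and RMSE2 on the vertical axis. The function name and the numerical values are placeholders chosen for illustration, not results from this work.

```python
import numpy as np

def bivariate_gaussian(x, y, mean, rmse1, rmse2):
    """Density of two independent Gaussians centred on `mean`.

    x uses rmse1 (number-of-samples axis, horizontal) and y uses rmse2
    (number-of-states axis, vertical) as standard deviations.
    """
    gx = np.exp(-0.5 * ((x - mean) / rmse1) ** 2) / (rmse1 * np.sqrt(2 * np.pi))
    gy = np.exp(-0.5 * ((y - mean) / rmse2) ** 2) / (rmse2 * np.sqrt(2 * np.pi))
    return gx * gy

mean, rmse1, rmse2 = 9.0, 0.2, 0.6          # illustrative values only
axis = np.linspace(mean - 3.0, mean + 3.0, 601)
X, Y = np.meshgrid(axis, axis)
density = bivariate_gaussian(X, Y, mean, rmse1, rmse2)
```

Contouring `density` over `X` and `Y` with any plotting library gives the top view: with these placeholder values the rings are stretched vertically, which corresponds to bias from the number of intermediate states dominating.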
3.2.6 Validating the Gaussian distributions of the free energy differences.

When we express uncertainty in the free energy estimate in terms of a single number, the
variance or the statistical uncertainty, rather than as a distribution, we are implicitly
assuming that the errors are well described by a normal distribution whose spread is
given by the statistical variance. Most analytical variance methods use propagation
estimates that are only rigorously true in the Gaussian limit. As long as the variances are
bounded (not infinite), the central limit theorem ensures that with enough samples
the error distribution will indeed converge to the Gaussian limit. However, for a finite
number of samples, this assumption must be tested, not simply assumed, or else we run the
risk of underestimating the chance of large deviations (black swans) from the average value.
To test whether the shape of the free energy distribution is Gaussian or not, histograms of
free energy distributions are plotted against Gaussians using the free energy estimate as
mean and the analytical uncertainty estimate as standard deviation for each method.
3.3 Results and discussion:

Results are presented in Tables 1-5 as well as in Figures 4-13. Tables 1 and 2 contain
results of the uncertainty analysis for the full and sparse sets for methane solvation only.
Tables containing full results for dipole inversion and anthracene solvation can be found
in the supplementary material. Tables 3, 4 and 5 contain the free energy estimates
corresponding to the 450 ns and 51 state runs, the bias analysis, and the reliability
estimates of free energy and uncertainty predictions for UA methane solvation, for the full
and sparse sets. Figures 4-7 compare the accuracy and precision of the free energy and
uncertainty predictions for all ten methods for the three test cases. Tables of the
uncertainty analysis for the dipole inversion and anthracene hydration free energy test
cases are presented in the supplementary material as Tables S1-S10. Bivariate Gaussian
plots presenting the analysis of reliabilities of free energy methods are presented in
Figures 8-10. Figures 11 and 12 compare the actual distribution of the free energies
(computed using 100 5 ns simulations with the full and sparse sets) with Gaussians using
the corresponding variance and mean estimates of the free energy methods, along with
the Gaussians of the 450 ns trajectory solution and the 51 state simulation set for
comparison. In some plots, we omit GDEL and GINS, as their errors are significantly
larger than the scale of errors of the other methods in the plots.
3.3.1 Validation of uncertainty estimates. The first data column in Table 1 is the free
energy change calculated as an average over the 100 uncorrelated repetitions. The next three
columns are the average of the analytical estimate of uncertainty over all repetitions,
⟨σ(ΔG)⟩, the sample standard deviation, σ(ΔG), and the average bootstrap estimate of the
uncertainty over all repetitions, ⟨σ(ΔG)⟩bs. The last column gives the percent deviation of
the analytical estimate of uncertainty from the sample standard deviation along with the
propagated error. Standard deviations {σ(⟨σ(ΔG)⟩), σ(σ(ΔG)), σ(⟨σ(ΔG)⟩bs)} for the error
estimate distributions are also reported.
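The percent deviation can be sketched as follows. The inputs are the rounded TI entries from Table 1, and the first-order propagation formula for independent errors is an assumption made for illustration; the exact propagation used for the tabulated values is not stated here.

```python
import math

def percent_deviation(analytical, d_analytical, sample, d_sample):
    """Percent deviation of the analytical uncertainty from the sample
    standard deviation, with a first-order propagated error (assumed form)."""
    ratio = analytical / sample
    pct = 100.0 * (ratio - 1.0)
    # Relative errors of a quotient add in quadrature if the errors are independent.
    rel = math.sqrt((d_analytical / analytical) ** 2 + (d_sample / sample) ** 2)
    return pct, 100.0 * ratio * rel

# Rounded TI values (kJ/mol): analytical 0.107 +/- 0.002, sample 0.115 +/- 0.009
pct, err = percent_deviation(0.107, 0.002, 0.115, 0.009)
```

With these rounded inputs the deviation is about -7% with a propagated error between 7 and 8 percentage points, close to the tabulated TI entry; small differences are expected because the inputs are rounded.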
We find that the analytical error estimate of most methods underestimates the true sample
standard deviation. For some of the methods this deviation is within one or two
standard deviations, and thus likely within the statistical noise. Examining the data in
Tables 1, S2, and S3 for TI and TI3, analytical and sample estimates of uncertainty differ
by one standard deviation or less for all but the anthracene solvation with the full set,
where they differ by approximately two standard deviations, which is therefore likely noise.
For IEXP and DEXP, analytical and sample estimates also differ by one or in some cases
two standard deviations. However, analytical and sample uncertainty estimates for BAR,
RBAR and UBAR have differences of up to five standard deviations, which indicates that
the BAR uncertainty estimates can be significantly inaccurate for multiple intermediate
states. In contrast, analytical uncertainty estimates with MBAR differ by less than one
standard deviation from the sample uncertainty estimates. Over all methods, MBAR
analytical error estimates deviate least from the sample estimate.
The percentage deviation of the analytical estimate from the sample uncertainty estimate is
listed explicitly in column six of Table 1. A negative percent deviation indicates
underestimation of uncertainty and a positive percent deviation indicates overestimation
of uncertainty by the analytical uncertainty estimate. The percentage deviation of the
analytical uncertainty estimate from the sample uncertainty for TI is -7 ± 8% and for MBAR
is -10 ± 8%, in both cases within the noise, while for BAR the deviation from the true
uncertainty is -31 ± 5%, which is clearly statistically significant. Large deviation of the
analytical estimate from the sample standard deviation indicates poor accuracy of the
estimator in estimating uncertainty. Similar deviations in analytical and sample estimates
are seen for UBAR and RBAR. As with BAR and its variants, the analytical and sample
estimates for GDEL and GINS differ by 30 ± 5%.
Bootstrap uncertainty is clearly a robust alternative to the sample standard deviation for
all methods. Bootstrap estimates of the uncertainty (in column five of Table 1) are very
close to sample uncertainty (in column four of Table 1). Specifically in the case of BAR,
RBAR, UBAR, GDEL and GINS bootstrap estimates of uncertainty are better than the
analytical counterparts. In cases where the analytical estimate does not accurately predict
the sample standard deviation, such as with BAR, RBAR, and UBAR, the bootstrap
method provides a useful estimate of the error in the free energy without a need for
performing repeated sampling.
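A minimal sketch of a bootstrap uncertainty estimate is given below, assuming uncorrelated samples. The helper names, the value of kT, and the synthetic energy differences are illustrative assumptions; exponential averaging is used only as a simple stand-in estimator, and any of the free energy estimators could be passed in its place.

```python
import numpy as np

KT = 2.479  # kJ/mol, thermal energy near 298 K (assumed for illustration)

def exp_free_energy(du):
    """Exponential-averaging estimate: -kT ln < exp(-dU/kT) >."""
    du = np.asarray(du, dtype=float)
    return -KT * np.log(np.mean(np.exp(-du / KT)))

def bootstrap_std(samples, estimator, n_boot=200, seed=0):
    """Standard deviation of `estimator` over resampled data sets."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    n = len(samples)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        estimates[b] = estimator(samples[idx])
    return estimates.std(ddof=1)

# Synthetic, uncorrelated energy differences (not data from this study).
rng = np.random.default_rng(1)
du = rng.normal(5.0, 1.0, size=1000)
dg = exp_free_energy(du)
dg_err = bootstrap_std(du, exp_free_energy)
```

Only one simulated data set is needed: the spread of the estimator over the resampled sets stands in for the spread over repeated simulations.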
Figure 4 visually demonstrates the efficiency of free energy methods and the consistency
of uncertainty estimators. Short bars indicate precise free energy estimates. Bars of equal
height indicate that the analytical and bootstrap uncertainties are consistent with the sample
standard deviation. For the full set, MBAR, TI and TI3 predict free energies with the
highest precision and with the most reliable error estimate. BAR, RBAR and UBAR
analytical estimates have large deviations from the sample standard deviation, but the
bootstrap estimate closely matches the sample uncertainty estimate. IEXP and DEXP
have the largest deviations, particularly for large transformations like dipole inversion and
anthracene solvation.

Table 1. Results of the uncertainty analysis for methane solvation using the full set. All
quantities are in kJ/mol. ⟨ΔG⟩ is not the ensemble average but the average over 100 repetitions.
Each uncertainty column is reported as value ± its standard deviation.
Method  ⟨ΔG⟩    ⟨σ(ΔG)⟩          σ(ΔG)            ⟨σ(ΔG)⟩bs        % dev
TI      9.081   0.107 ± 0.002    0.115 ± 0.009    0.106 ± 0.006    -7.1 ± 7.9
TI3     9.008   0.111 ± 0.003    0.119 ± 0.012    0.110 ± 0.007    -6.9 ± 9.5
DEXP    8.984   0.320 ± 0.255    0.556 ± 0.122    0.319 ± 0.267    -42.5 ± 47.5
IEXP    8.928   0.104 ± 0.002    0.110 ± 0.011    0.103 ± 0.006    -5.8 ± 9.6
UBAR    8.936   0.079 ± 0.002    0.106 ± 0.010    0.098 ± 0.005    -25.3 ± 7.5
BAR     8.933   0.075 ± 0.001    0.109 ± 0.010    0.099 ± 0.006    -31.3 ± 6.3
RBAR    8.937   0.075 ± 0.001    0.109 ± 0.010    0.099 ± 0.006    -31.0 ± 6.7
MBAR    8.929   0.095 ± 0.002    0.106 ± 0.010    0.094 ± 0.005    -9.9 ± 8.6
GDEL    7.042   0.093 ± 0.002    0.136 ± 0.011    0.121 ± 0.008    -31.7 ± 5.8
GINS    1.097   0.253 ± 0.008    0.399 ± 0.032    0.400 ± 0.169    -36.5 ± 5.5
Figure 4. Uncertainty estimates plotted for all the methods in three different test cases
for the full set. Consistent free energy methods have bars equal in height, and the most
accurate methods have the shortest bars.

Table 2. Results of the uncertainty analysis for methane solvation using the sparse set. All
quantities are in kJ/mol. Each uncertainty column is reported as value ± its standard deviation.
Method  ⟨ΔG⟩        ⟨σ(ΔG)⟩           σ(ΔG)              ⟨σ(ΔG)⟩bs         % dev
TI      2.545       0.175 ± 0.007     0.177 ± 0.014      0.175 ± 0.011     -1.5 ± 8.7
TI3     3.792       0.214 ± 0.009     0.217 ± 0.017      0.214 ± 0.014     -1.5 ± 8.7
DEXP    5.631       1.343 ± 0.524     3.179 ± 0.679      1.628 ± 1.186     -57.8 ± 18.8
IEXP    9.091       0.666 ± 0.101     0.638 ± 0.042      0.683 ± 0.108     4.4 ± 17.3
UBAR    8.954       0.344 ± 0.018     0.358 ± 0.024      0.351 ± 0.024     -3.9 ± 8.2
BAR     8.926       0.225 ± 0.006     0.263 ± 0.014      0.232 ± 0.014     -14.4 ± 4.9
RBAR    8.927       0.226 ± 0.005     0.260 ± 0.016      0.233 ± 0.014     -13.2 ± 5.6
MBAR    8.928       0.232 ± 0.006     0.262 ± 0.015      0.232 ± 0.014     -11.4 ± 5.4
GDEL    -3.833      0.112 ± 0.004     0.189 ± 0.014      0.200 ± 0.016     -40.6 ± 4.9
GINS    -1.68E+32   3E+30 ± 34E+30    13E+32 ± 10E+32    1E+32 ± 15E+32    -99.7 ± 2.7
Figure 5. Uncertainty estimates plotted for six methods in three different test cases for
the sparse set. Four methods (DEXP, IEXP, GINS and GDEL) are not graphed because they did
not converge properly for any of the three uncertainty estimates.
For the sparse set, as shown in Tables 2, S2 and S4 and Figure 5, TI and TI3 still show
the lowest percent deviation from the sample standard deviation. However, the free energy
estimate of methane solvation is off by 6.5 kJ/mol for TI and 5 kJ/mol for TI3. GDEL
and GINS have clearly unconverged free energy and uncertainty estimates; the free
energy estimate of methane solvation for GDEL is off by 12 kJ/mol and for GINS is off
by ~10^32 kJ/mol. This clearly indicates the failure of the Gaussian approximation of ΔU.
The free energy estimate of methane solvation for DEXP differs from the converged
answer by 3 kJ/mol and its uncertainty estimate is 5 times larger than the largest
estimated uncertainty shown in Figure 5. IEXP differs from the converged answer by
only 0.2 kJ/mol but its uncertainty estimate is twice the largest plotted uncertainty.
MBAR again has the lowest and most consistent uncertainty estimates. However, unlike
with the full set, BAR and RBAR have uncertainty estimates which are as accurate as
those of MBAR to within statistical noise.
Bootstrap and analytical estimates of the error in the sparse set are slightly lower than
the sample standard deviation for acceptance ratio methods for methane solvation, though
not for the other two molecules. For this set of states, MBAR does not provide the
same clear advantage over BAR in estimating the uncertainty as with the full set. We
hypothesize that this advantage may only exist when the overlap between states is
somewhat high. However, even our full set uses relatively aggressive spacing for
typical free energy calculations. MBAR analytical estimates will generally have the
advantage over BAR analytical estimates, as with MBAR it is not necessary to attempt to
determine whether the calculation is in the low overlap or high overlap regime.
3.3.2 Analysis of bias in free energy estimates: Statistical uncertainty is the most
important measure to quantify in order to understand the reliability of the free energy
estimate, but understanding systematic bias is also important. By comparison with the free
energy estimate using a large number of samples we can test whether or not the method is
asymptotically biased. By comparison with the free energy estimate using a large number
of intermediate states we can find how sensitive a method's accuracy is to the overlap
between intermediate states. Table 3 shows the free energy and uncertainty estimates
predicted by different methods for UA methane solvation for a large number of samples
(450 ns trajectory) and a large number of intermediate states (51 states). Tables 4 and 5
include estimates of both types of bias in free energy estimates, with respect to number of
samples and with respect to number of intermediate states, and the corresponding root
mean square error estimates for UA methane solvation.
For UA methane solvation,
MBAR, BAR, RBAR, UBAR, TI, and TI3 have very low bias in free energy estimates
with respect to number of samples. The acceptance ratio methods, MBAR, BAR, RBAR,
and UBAR, have biases with respect to number of states within the statistical noise. TI
and TI3 show larger bias in free energy estimates with respect to number of states than
the other methods. DEXP and IEXP show large biases in free energy estimates both with
respect to number of samples and states. GDEL and GINS show the largest bias with
respect to number of states. All methods show a larger bias with respect to the number of
states compared to the bias with respect to number of samples. However, this may be
an artifact of the lower precision of bias determination as a function of states.
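Whether a bias stands out from the noise can be checked with a short calculation. The sketch below is an assumption about how such a check might be done, not the procedure used for the tables: it combines the standard error of the mean over repetitions with the uncertainty of the reference value in quadrature and flags a bias as significant if it exceeds two combined standard errors. The tabulated biases carry the sign of (average estimate minus reference), so that order is used here; the sign has no effect on the mean square error.

```python
import math

def bias_with_error(dg_mean, sample_std, n_reps, dg_ref, ref_err, n_sigma=2.0):
    """Bias of the averaged estimate relative to a reference value.

    Returns (bias, combined error, is_significant). The quadrature
    combination and the n_sigma threshold are illustrative assumptions.
    """
    bias = dg_mean - dg_ref
    mean_err = sample_std / math.sqrt(n_reps)      # standard error of the mean
    err = math.sqrt(mean_err**2 + ref_err**2)
    return bias, err, abs(bias) > n_sigma * err

# TI, full set, against the 51 state reference (rounded values in kJ/mol)
bias2, err2, significant = bias_with_error(9.081, 0.115, 100, 8.920, 0.041)
```

With these rounded inputs the bias is about 0.16 kJ/mol with a combined error of about 0.04 kJ/mol, so it is flagged as significant under the assumed two-sigma threshold.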
Table 3. Free energy estimates and corresponding uncertainty estimates in the large
number of samples (450 ns) and large number of intermediate states (51 states) limits for UA
methane solvation. Bootstrap estimates are reported as they are better than analytical
estimates. All quantities are in kJ/mol.
Method  (ΔG ± σ(ΔG))450,full   (ΔG ± σ(ΔG))450,sp    (ΔG ± σ(ΔG))51,full
TI      9.085 ± 0.010          2.541 ± 0.017         8.920 ± 0.041
TI3     9.016 ± 0.010          3.786 ± 0.021         8.923 ± 0.041
DEXP    9.083 ± 0.083          12.657 ± 4.018        8.921 ± 0.043
IEXP    8.932 ± 0.010          8.986 ± 0.074         8.928 ± 0.040
UBAR    8.939 ± 0.010          8.930 ± 0.035         8.922 ± 0.040
BAR     8.939 ± 0.010          8.920 ± 0.023         8.921 ± 0.040
RBAR    8.940 ± 0.010          8.923 ± 0.024         8.922 ± 0.040
MBAR    8.936 ± 0.009          8.921 ± 0.023         8.924 ± 0.036
GDEL    7.048 ± 0.012          -3.837 ± 0.020        8.847 ± 0.043
GINS    1.101 ± 0.041          -1.5E+32 ± 3.2E+29    8.841 ± 0.040
The subscript (450,full) denotes the free energy estimate for 450 ns and the full lambda set.
(450,sp) denotes the same for the sparse lambda set.
Table 4. Bias estimates due to number of samples and number of lambda states for full
and sparse sets for UA methane solvation. All quantities are in kJ/mol.
Method  (Bias1 ± σ(Bias1))450,full  (Bias2 ± σ(Bias2))51,full  (Bias1 ± σ(Bias1))450,sp  (Bias2 ± σ(Bias2))51,sp
TI      -0.005 ± 0.015              0.160 ± 0.042              0.005 ± 0.024             -6.374 ± 0.045
TI3     -0.005 ± 0.015              0.088 ± 0.042              0.006 ± 0.030             -5.131 ± 0.046
DEXP    -0.100 ± 0.092              0.062 ± 0.059              -7.044 ± 4.021            -3.308 ± 0.150
IEXP    -0.002 ± 0.014              0.002 ± 0.041              0.097 ± 0.100             0.155 ± 0.078
UBAR    -0.003 ± 0.013              0.014 ± 0.041              0.024 ± 0.049             0.032 ± 0.053
BAR     -0.003 ± 0.012              0.015 ± 0.041              0.009 ± 0.032             0.008 ± 0.046
RBAR    -0.005 ± 0.012              0.013 ± 0.041              0.004 ± 0.033             0.005 ± 0.046
MBAR    -0.007 ± 0.013              0.005 ± 0.037              0.007 ± 0.033             0.004 ± 0.043
GDEL    -0.003 ± 0.015              -1.802 ± 0.044             0.001 ± 0.023             -12.683 ± 0.044
GINS    -0.001 ± 0.048              -7.741 ± 0.047             -1.4E+31 ± 3.4E+30        -1.7E+32 ± 3.4E+30
Table 5. RMSEs and statistical uncertainties in RMSEs in the UA methane solvation free
energy estimate are largest for GINS and GDEL in the sparse set, while the acceptance ratio
methods consistently show low RMSEs and low corresponding uncertainties in the RMSEs.
All quantities are in kJ/mol.
Method  (RMSE1 ± σ(RMSE1))full  (RMSE2 ± σ(RMSE2))full  (RMSE1 ± σ(RMSE1))sp  (RMSE2 ± σ(RMSE2))sp
TI      0.148 ± 0.054           0.208 ± 0.084           0.237 ± 0.076         6.377 ± 0.178
TI3     0.153 ± 0.059           0.172 ± 0.069           0.290 ± 0.093         5.135 ± 0.218
DEXP    0.531 ± 0.474           0.490 ± 0.512           7.596 ± 2.138         4.478 ± 1.837
IEXP    0.142 ± 0.051           0.142 ± 0.052           0.897 ± 0.271         0.904 ± 0.273
UBAR    0.121 ± 0.054           0.121 ± 0.055           0.473 ± 0.147         0.473 ± 0.148
BAR     0.120 ± 0.055           0.120 ± 0.056           0.330 ± 0.103         0.330 ± 0.103
RBAR    0.120 ± 0.055           0.121 ± 0.056           0.331 ± 0.102         0.331 ± 0.102
MBAR    0.132 ± 0.052           0.132 ± 0.052           0.334 ± 0.102         0.334 ± 0.102
GDEL    0.152 ± 0.065           1.807 ± 0.137           0.201 ± 0.090         12.682 ± 0.190
GINS    0.427 ± 0.205           7.748 ± 0.402           3.1E+32 ± 1.5E+33     1.5E+32 ± 1.5E+33
Figures 6 and 7 show the biases for different methods for all three test cases. When error
bars are larger than the bias bars, for example for MBAR in methane solvation,
comparisons of accuracy between free energy methods become difficult because the bias
is lost in the statistical noise. DEXP and IEXP have significant bias for all the cases
except methane solvation. For UA methane solvation TI has the longest bar, indicating
the largest bias due to number of states.
MBAR, BAR, UBAR and RBAR have consistently less bias compared to other methods and
hence are more accurate for estimating free energy. TI and TI3 rank next in accuracy in
prediction of free energy estimates, followed by IEXP and DEXP, and finally GDEL and GINS.
For dipole inversion, DEXP and IEXP show the largest bias both with respect to results
from a large number of samples and a large number of intermediate states. All other methods
for dipole inversion, including GINS and GDEL, show almost equal biases within
statistical error limits. For anthracene hydration free energies BAR, UBAR, RBAR and
MBAR show moderate biases for the full set compared to TI and TI3, but these results
are again likely to be noise (see Table S9). DEXP and IEXP as usual show high biases.
Figure 6. Bias plots for different test cases for the full set. DEXP, TI and TI3 show
numerically significant bias for methane, while all methods show moderate bias with
respect to number of states for anthracene.
Figure 7. Bias plots for different test cases for the sparse set. DEXP and IEXP show large
biases both due to number of samples and due to number of intermediate states.
3.3.3 Overall reliability of a free energy estimator: The knowledge of bias and
uncertainty can now be put together to analyze the reliability of a method in estimating
the free energy. The bivariate Gaussian plots (Figures 8, 9 and 10) show the reliability of
a method in predicting the free energy and the uncertainty in the predicted free energy for
a given test case. For each figure, the first two columns contain reliability plots for the
full state runs and the last two show the results of the sparse state runs.
Figure 8, for UA methane solvation, shows that MBAR, RBAR, BAR and UBAR have
small and equal spreads in both horizontal and vertical directions, indicating that these
give reliable estimates of free energy for both the sparse and full sets. TI and TI3 give
reliable estimates of free energy only in the full state runs and are dominated by bias due
to number of intermediate states with the sparse set. IEXP has lower RMSE compared to TI
and TI3. GDEL and GINS are unreliable in both the full and sparse sets.
In Figure 9, for dipole inversion, free energy estimates from TI and TI3 are comparable in
reliability with MBAR, BAR, RBAR and UBAR. GDEL and GINS work for dipole
inversion despite their poor performance in the other test cases. This can be explained in
the light of the work done by Hummer, Pratt and Garcia on the free energy of ionic
hydration,56 in which they found that the electrostatic potential energy distribution
follows Gaussian behavior.
In Figure 10, we see that GDEL, GINS and DEXP are unreliable estimators of anthracene
hydration free energy for both the full and sparse sets, with both low accuracy and
precision in their predicted free energy and uncertainty estimates. Anthracene solvation
free energy is a harder problem and no method is as accurate as with the other two
molecular cases. TI, TI3, IEXP, BAR, RBAR, UBAR, and MBAR perform equally well
within noise for the full set. In the sparse set IEXP and UBAR become significantly
worse than the other methods, with TI being slightly worse. The improved performance of TI
relative to the acceptance ratio methods arises because the sparse set for the anthracene
solvation case (4 states between end points) is not as aggressive as in the methane
solvation case (only 1 state between end points).
Figure 8. Bivariate Gaussian plots for UA methane solvation. Note how TI and TI3 fail
the reliability test for the sparse set. GDEL and GINS are not at all reliable. MBAR and
BAR are accurate but the estimate of the precision in BAR is misleading as it
underestimates uncertainty, as shown in Figures 4 and 5. SP after the method name
indicates the results of the sparse set.
Figure 9. Bivariate Gaussian plots for dipole inversion. TI is reliable for this molecule,
but DEXP, IEXP, GDEL and GINS again are the least accurate and precise of all the
methods studied.
Figure 10. Bivariate Gaussian plots for anthracene hydration free energy. The effect of
bias due to intermediate states is evident in almost all methods as all spreads are elliptical
in the vertical direction. TI3 and MBAR both appear reliable, especially for the full set.
3.3.4 Testing the Gaussian distribution of free energies: Asymptotic error estimate
methods assume a normal distribution of error, as does the use of a standard deviation to
describe the error distribution. It is important to check whether this assumption is
actually valid. Figure 11 demonstrates graphically the distribution of free energy results
for each test case, comparing it with a Gaussian with the mean and the variance of the
distribution. The free energy distributions are plotted as histograms with the optimal bin
width calculated from Scott's formula57, h = 3.5σ / n^(1/3), where n
is the number of samples, i.e. 100, and σ is the standard deviation. For calculating the
optimum bin width for the first eight methods we have used the mean of the standard
deviations from the 100 repetitions predicted by MBAR. For GDEL and GINS, their own
mean predicted uncertainties are used to calculate the optimum bin width because
uncertainties estimated with GDEL and GINS differ drastically from those of
MBAR.
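Scott's formula is simple to apply. In the sketch below, the value of σ is the rounded average MBAR analytical uncertainty for the full set from Table 1, and the free energies being binned are synthetic placeholders rather than results from this study.

```python
import numpy as np

def scott_bin_width(sigma, n):
    """Scott's optimal histogram bin width, h = 3.5 * sigma / n**(1/3)."""
    return 3.5 * sigma / n ** (1.0 / 3.0)

h = scott_bin_width(0.095, 100)          # sigma in kJ/mol, n = 100 repetitions

# Bin a synthetic set of 100 free energies with that width.
rng = np.random.default_rng(0)
dg = rng.normal(8.93, 0.1, size=100)
edges = np.arange(dg.min(), dg.max() + h, h)
counts, _ = np.histogram(dg, bins=edges)
```

For n = 100 the cube root is about 4.64, so the bin width is roughly three quarters of the standard deviation (about 0.07 kJ/mol here).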
In Figure 11, the blue curve is the histogram of free energies and the black curve is the
Gaussian with the mean and the standard deviation estimated by the specified method.
The shape of the blue curve should match the black curve within noise if the free energy
distribution is indeed Gaussian. Additionally two more Gaussians are plotted, one
(magenta) with the mean (ΔG)450ns and standard deviation σ((ΔG)450ns), and the other
(cyan) with the mean (ΔG)51states and standard deviation σ((ΔG)51states). If the mean of the
blue curve matches the means of the cyan and magenta curves, it indicates low bias in the
free energy estimate due to both number of samples and number of states. The tighter the
spread of the black and blue curves, the more precise the free energy estimate will be.
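Figure 11. Histograms of free energies compared with Gaussians, including the 51 state
reference with mean (ΔG)51states and standard deviation σ((ΔG)51states) (in cyan), for the
full set for methane solvation.
Figure 12. Histograms of free energies compared with Gaussians, including the 51 state
reference with mean (ΔG)51states and standard deviation σ((ΔG)51states) (in cyan), for the
sparse set for methane solvation.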
In Figures 11, 12, and S1-S4, we see that when the variances are well converged then
distributions of free energies from all methods except DEXP well approximate the
Gaussian, even when the variances are large. However, in several cases the variances are
so large that the comparisons become statistically meaningless. Besides DEXP, which
fails in all cases, for dipole inversion IEXP and UBAR are sufficiently noisy that the
comparison to a Gaussian is problematic. For the sparse set IEXP and UBAR fail
completely to be Gaussian. Similarly, for anthracene solvation IEXP and UBAR fail to be
Gaussian for the sparse set. The 51 state results are omitted from GDEL and GINS
plots as they lie outside the plot axis limits. Interestingly, we find that typically errors are
distributed normally with the variance given by the analytical estimate even if the bias in
the free energy is very large.
3.3.5 Convergence properties of free energy estimates: The true free energy estimate is
not necessarily the experimental value of the free energy change of the process, but
instead the infinite sampling limit of the particular choice of molecular model. We see
that with a large number of intermediate states all methods converge to the same value
(Figure 13), whereas the 450 ns results with the sparse data set vary for different
methods, with, as usual, large deviations seen in GDEL and GINS, indicating significant
bias with respect to the overlap between states. Given sufficient sampling, increasing the
number of states appears to be the best way to obtain asymptotic convergence to the
true answer for the model.
Figure 13. Free energies estimated using large number of intermediate states and a large
number of samples converge to a single value for all free energy estimators.
3.3.6 Amount of time required for free energy estimation methods: We chose the
calculation of anthracene solvation over 4.5 ns with the full set as our test system to
compare the computational time required by methods to compute free energies, because the
system has the largest number of intermediate states with large free energy changes
between states. We report the time required to calculate the free energies 201 times (for
the original set and for 200 bootstrap sets) to eliminate variability caused by
computational overhead in single calculations. The time required to read in data, make
bootstrap samples and perform bookkeeping was 249.5 sec for all
methods, and was subtracted from the total time to yield the computational time of each
method alone.
Table 6. Time consumed by different methods to calculate the free energy 201 times of
anthracene solvation with the full set.
Method  Time taken (s)
TI      0.2
TI3     5.8
DEXP    13.5
IEXP    11.5
UBAR    15.3
BAR     93.5
RBAR    1148.2
MBAR    4913.5
GDEL    4.0
GINS    4.3
From Table 6 it is evident that MBAR takes the longest of all methods as it processes
information from all the intermediate states to give an estimate of free energy and
uncertainty. RBAR is the next most computationally costly. RBAR takes more time
compared to BAR because multiple BAR calculations are performed over a range of free
energies at each intermediate stage if a large range of possible values for the
self-consistent constants is evaluated. UBAR takes less time compared to BAR because only
a single iteration is performed at each stage. DEXP and IEXP are similar in cost to UBAR.
GDEL and GINS take less time compared to DEXP and IEXP. TI3 takes slightly longer
compared to TI as it fits a spline of higher degree, but both are much cheaper than any
others. However, the total time required even by MBAR is orders of magnitude less than
the time required to perform the sampling, so the higher cost of MBAR is not an obstacle
in most cases.
3.4. Conclusions:
In this paper we have proposed the first iteration of a collection of test sets which can be
used for benchmarking free energy calculation methods for small molecule solvation. We have
demonstrated the utility of this test set by comparing ten equilibrium free energy methods
on three test cases for molecular solvation, with different spacing between intermediate
states.
We estimated the uncertainty in three different ways (the sample standard
deviation, the analytical estimate, and the bootstrap estimate) as well as the uncertainty
in each of these estimates of the uncertainty. We also calculated biases in free energy
estimates at the large number of samples limit and the large number of intermediate states
limit separately. We graphically demonstrated the effect of the variance and two separate
types of bias by bivariate Gaussian plots expressing the overall reliability of the methods.
We demonstrated that bootstrap sampling accurately predicts the properties of the sample
distribution observed from 100 independent simulations for all the free energy methods.
We also showed that the histogram of free energies from 100 independent simulations
has a Gaussian form for TI, TI3, BAR, RBAR, UBAR, MBAR, GDEL and GINS, but
that IEXP and DEXP deviate from the usual trend.
We have found that MBAR is the most reliable of all free energy estimators, showing
consistency in accuracy and precision in both free energy and uncertainty prediction. TI
and TI3 are better uncertainty estimators compared to BAR, UBAR and RBAR, with equal
performance to MBAR when sufficient intermediate states are included, but are biased
with respect to the number of intermediate states. When the ⟨dU/dλ⟩ vs. λ curve has low
curvature, such as in dipole inversion, both TI and TI3 are equally reliable. But when the
curve is non-linear, i.e. when LJ spheres grow or disappear, TI3 gives better estimates of
free energy than TI. BAR and RBAR have relatively negligible bias but their uncertainty
estimates are frequently underestimated by 25 to 30% when overlap between states is not
negligible. UBAR is often as good as BAR and RBAR, but can fail with low numbers of
intermediate states. IEXP and DEXP are less reliable than TI and acceptance ratio
methods, and should be avoided if samples can be collected from all intermediates in
both the forward and backward directions, or if the derivative of the Hamiltonian along the
pathway can be computed. IEXP does work in some cases but in general IEXP and
DEXP give poor estimates for uncertainty and free energies. GINS and GDEL do not
compare well with the other methods in any of the test systems except the dipole inversion
test case. They only work if there is a large number of intermediate states, or if the
distributions are inherently Gaussian. However, even here they are not as accurate or
precise as the other methods.
Computationally, MBAR is the most expensive, but the amount of time required is orders
of magnitude less than the time required for collecting data. UBAR takes less time
compared to RBAR and BAR and should only be considered as a quick and easy (but not
so reliable) alternative to BAR and RBAR. RBAR requires some knowledge of the
maximum free energy gap, but does not require storing all the energy data. BAR requires
storing energy data but requires no knowledge about the size of the maximum free energy
difference. GDEL and GINS take less time compared to DEXP and IEXP but they
heavily sacrifice accuracy and precision for speed in virtually all cases. Finally, TI takes
the least time to estimate free energies and uncertainties, but a little extra computation
(fitting cubic splines) in TI3 improves the accuracy of the free energy estimate using
thermodynamic integration. However, the improvements from using cubic splines are not
enough to prevent TI3 from still having significant amounts of bias when there are few
intermediate states. We summarize these conclusions in Table 7.
Table 7. Summary of all statistical tests.
Reliability of free energy estimate (high to low)
MBAR>BAR=RBAR>UBAR>TI3>TI>IEXP>DEXP>GDEL=GINS
Reliability of uncertainty estimate (high to low)
MBAR>TI3=TI>BAR=RBAR>UBAR>IEXP>DEXP>GDEL=GINS
Computational cost (high to low)
MBAR>RBAR>BAR>UBAR>IEXP=DEXP>GDEL=GINS>TI3>TI
Is distribution of estimated free energies Gaussian?
Yes for {MBAR, RBAR, BAR, UBAR, TI, TI3, GDEL, GINS}
No for {IEXP, DEXP}
Is bootstrap better than analytical uncertainty estimate?
Yes for {RBAR, BAR, UBAR, GDEL, GINS}
Both equally good for {MBAR,IEXP,DEXP,TI,TI3}
4. Exploring computational efficiency of multiple electrostatic potential parameter

sets using data sampled at a single set.
Simulation parameters associated with the calculation of non-bonded interactions, namely
Coulomb and Lennard-Jones interactions, inherently specify the amount of approximation
we introduce in force and potential calculations. Strict parameters increase simulation
time in exchange for a gain in accuracy. However, gains in accuracy become
insignificant beyond a certain limit. Most of the research effort focused on bringing down
the computational cost is associated with the non-bonded interactions, specifically with
tuning the Particle Mesh Ewald58 (PME) parameters used for calculating the electrostatic
potential. PME is a popular variant of the original Ewald technique. Ewald's trick is to
break down the slowly converging Coulomb potential into two rapidly converging sums,
one in real space and the other in Fourier space. While calculating the Coulomb potential
using the Ewald sums, we assume that the real space sum dies off well before the
Coulomb cutoff, Rc, and that contributions to the Fourier space sum come only from the
first Nc wave vectors59. These approximations always introduce errors in any
thermodynamic property estimate, in amounts that depend on how much
accuracy is sacrificed to gain speed. This tradeoff between speed (how fast the
simulations run) and accuracy (how accurately the forces are calculated) generates a
multidimensional optimization problem.
PME parameter space is multidimensional, and different PME parameters affect
computational cost and accuracy in different ways. PME parameter sets can be
generated using different choices of order of interpolation, Ewald tolerance, Fourier
spacing, Coulomb cutoff, and switching distance before cutoff. Order of interpolation
refers to the charge interpolation function used in PME. Originally it was a Lagrange
interpolation. However, an enhanced PME utilizes the B-spline interpolation function,
which is smoother and allows higher accuracy by simply increasing the order of
interpolation. The smoothness of B-spline interpolation allows the force expressions to be
evaluated analytically, with high accuracy, by differentiating the real and reciprocal
energy equations, rather than using finite difference techniques. Ewald tolerance gives
the relative strength of the Ewald-shifted real space potential at the Coulomb cutoff. A
smaller tolerance gives a more accurate direct sum, but the number of wave vectors for
the reciprocal sum has to be increased. Fourier spacing is the maximum grid spacing for
the Fast Fourier Transform grid when using PME. For Ewald, the highest magnitude of
wave vector to be used in the reciprocal space in each of the x, y, and z directions is
set by the ratio of the box dimensions to the grid spacing. Coulomb cutoff limits the
number of particles participating in the real space sum to the ones falling within the cutoff
sphere. The switching distance before cutoff acts as a buffer region where, with the help
of a polynomial function, the Coulomb potential is brought smoothly and gradually to zero
instead of being cut off abruptly.
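To make the role of the Ewald tolerance concrete, the following minimal Python sketch assumes the common convention (which, to our understanding, is the one GROMACS follows) that the tolerance equals erfc(beta * Rc), where beta is the Ewald splitting parameter. The function names and the bisection approach are illustrative choices and are not taken from this work.

```python
import math


def ewald_beta(r_cutoff_nm, tolerance):
    """Find beta (1/nm) such that erfc(beta * r_cutoff) equals the tolerance."""
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until erfc falls below the requested tolerance.
    while math.erfc(hi * r_cutoff_nm) > tolerance:
        hi *= 2.0
    # Bisection: erfc decreases monotonically with beta.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * r_cutoff_nm) > tolerance:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def real_space_term(r_nm, beta):
    """Real-space Ewald pair factor erfc(beta r)/r (unit charges, no prefactor)."""
    return math.erfc(beta * r_nm) / r_nm


# Assumed example values: the 0.9 nm cutoff with a strict and a loose tolerance.
beta_strict = ewald_beta(0.9, 1e-8)
beta_loose = ewald_beta(0.9, 1e-4)
print(beta_strict, beta_loose)
```

A stricter tolerance gives a larger beta, so the real space term decays faster inside the cutoff and more of the work is shifted to the reciprocal sum.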
The main reason to explore the PME parameter space is to choose simulation parameters that give
sufficient accuracy for the application but are fast enough. Usually this is done through
the advice of experienced practitioners or by choosing defaults suggested in the
simulation package. Abraham31 and Wang32 propose rational ways of selecting PME
parameters. They tune the PME parameters by minimizing the computational cost for a
given accuracy in force calculations. Their technique involves full scale molecular
simulations at different sets of parameters; they iterate over parameter sets until they
reach a desired accuracy in force calculations. In this study, we introduce an alternate
method to select PME parameters, which involves evaluating the computational efficiency
of multiple PME parameter sets with data sampled from a single set. The simulation is
run only once. New energies are evaluated by reading the same trajectory but with new
PME parameter values. The newly evaluated energies, along with the sampled energies,
can be used to evaluate observables at the new PME parameter sets using MBAR. The
aim is to find the set of PME parameters which, for a given level of error in our desired
observable, results in the minimum simulation time.
We use the three molecular transformations defined in our benchmark set as test systems
since these represent a substantial set of alchemical changes. PME parameters good for
these three systems should be good for simulating analogous transformations. The
observables we choose for this study are (a) the free energy estimate of the three molecular
transformations, (b) the enthalpy change during the molecular transformations, which for
methane and anthracene solvation is simply the enthalpy of solvation, and (c) the molar heat
of vaporization of water.
4.1 Methods
4.1.1 Difference between free energy estimates calculated with two different PME
parameter sets
A free energy estimator requires samples generated using equilibrium simulations
performed at the PME parameter set of interest. Calculating free energy using methods
discussed in Chapter 3 requires simulation at each and every set of PME parameters.
Carrying out these simulations will tremendously increase the computational costs and
make this process impractical. However, MBAR can estimate free energies and
equilibrium expectation values of enthalpies corresponding to different simulation inputs
without running new sets of equilibrium simulations. This is achieved by using the
samples from equilibrium simulations corresponding to just one simulation input set and
energies calculated for different simulation inputs using the same trajectory. Let us look
at the free energy estimating equation of MBAR again.
\[
G_i = -\beta^{-1} \ln \sum_{k=1}^{K} \sum_{n=1}^{N_k}
\frac{\exp[-\beta U_i(x_{kn})]}{\sum_{k'=1}^{K} N_{k'} \exp[\beta G_{k'} - \beta U_{k'}(x_{kn})]}
\]
Here K is the total number of states and Nk is the total number of uncorrelated samples
available from an equilibrium simulation at state k. Gi is the free energy associated with
state i. Ui(xkn) is the potential energy of the nth sample from the equilibrium simulation
of state k, evaluated at state i. Now suppose we have equilibrium samples from a
simulation done with simulation input set Si1, which has K states defining the full molecular
transformation. We can also calculate the potential energies using simulation input set Si2,
which also has K states. The coordinates are read from the trajectory and new
energies are calculated according to the new set of simulation inputs. We can then create
a three-dimensional matrix of size (2K, 2K, Nk) as defined below.
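\[
U =
\begin{pmatrix}
U_0(x_{0,1\ldots N_0}) & \cdots & U_{K-1}(x_{0,1\ldots N_0}) & U_K(x_{0,1\ldots N_0}) & \cdots & U_{2K-1}(x_{0,1\ldots N_0}) \\
\vdots & & \vdots & \vdots & & \vdots \\
U_0(x_{K-1,1\ldots N_{K-1}}) & \cdots & U_{K-1}(x_{K-1,1\ldots N_{K-1}}) & U_K(x_{K-1,1\ldots N_{K-1}}) & \cdots & U_{2K-1}(x_{K-1,1\ldots N_{K-1}}) \\
U_0(x_{K,1\ldots N_K}) & \cdots & U_{K-1}(x_{K,1\ldots N_K}) & U_K(x_{K,1\ldots N_K}) & \cdots & U_{2K-1}(x_{K,1\ldots N_K}) \\
\vdots & & \vdots & \vdots & & \vdots \\
U_0(x_{2K-1,1\ldots N_{2K-1}}) & \cdots & U_{K-1}(x_{2K-1,1\ldots N_{2K-1}}) & U_K(x_{2K-1,1\ldots N_{2K-1}}) & \cdots & U_{2K-1}(x_{2K-1,1\ldots N_{2K-1}})
\end{pmatrix},
\qquad
N =
\begin{pmatrix}
N_0 \\ \vdots \\ N_{K-1} \\ N_K \\ \vdots \\ N_{2K-1}
\end{pmatrix}
\]
(17)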
The top left quarter of the U matrix has energies calculated at K states using
configurations from K different equilibrium simulations. The top right quarter of the U
matrix has energies evaluated at K states but with different simulation inputs, using the
same configurations from the previous K equilibrium simulations. These form a new set of K
thermodynamic states, so we now have a total of 2K thermodynamic states.
U0(x0,n) evaluated at simulation input set Si1 is different from U0(x0,n) evaluated at
simulation input set Si2. U0(x0,n) evaluated with Si2 appears in the U matrix as a
different thermodynamic state, UK(x0,n). Similarly, UK-1(x0,n) evaluated at simulation input
set Si1 is different from UK-1(x0,n) evaluated at simulation input set Si2, which
appears in the last column of the U matrix as a different thermodynamic state, U2K-1(x0,n).
The same holds true for the intermediate states as well.
Since we only have samples from states 0 to K-1, NK = NK+1 = ... = N2K-1 = 0. Because
these sample counts are zero, the energies in the bottom half of the U matrix do not
participate in the calculation of Gi in the MBAR formula. In fact we do not have these
energies, as we have not done any equilibrium simulations at simulation input set Si2 and
hence do not have samples or configurations. Still, we can get estimates of the free energy at
these states using the information we have. We can rewrite the U and the N matrices as:
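\[
U =
\begin{pmatrix}
U_0(x_{0,1\ldots N_0}) & \cdots & U_{K-1}(x_{0,1\ldots N_0}) & U_K(x_{0,1\ldots N_0}) & \cdots & U_{2K-1}(x_{0,1\ldots N_0}) \\
\vdots & & \vdots & \vdots & & \vdots \\
U_0(x_{K-1,1\ldots N_{K-1}}) & \cdots & U_{K-1}(x_{K-1,1\ldots N_{K-1}}) & U_K(x_{K-1,1\ldots N_{K-1}}) & \cdots & U_{2K-1}(x_{K-1,1\ldots N_{K-1}}) \\
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix},
\qquad
N =
\begin{pmatrix}
N_0 \\ \vdots \\ N_{K-1} \\ 0 \\ \vdots \\ 0
\end{pmatrix}
\]
(18)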
The U matrix and the N vector are the inputs to the MBAR Eq. 11, which is solved
self-consistently for Gi for all 2K states. Once we have solved for the free energies we
can easily write a matrix ΔG of pairwise free energy difference estimates, together with a
matrix δ(ΔG) of the corresponding uncertainties.
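\[
\Delta G =
\begin{pmatrix}
G_0 - G_0 & \cdots & G_{K-1} - G_0 & G_K - G_0 & \cdots & G_{2K-1} - G_0 \\
\vdots & & \vdots & \vdots & & \vdots \\
G_0 - G_{K-1} & \cdots & G_{K-1} - G_{K-1} & G_K - G_{K-1} & \cdots & G_{2K-1} - G_{K-1} \\
G_0 - G_K & \cdots & G_{K-1} - G_K & G_K - G_K & \cdots & G_{2K-1} - G_K \\
\vdots & & \vdots & \vdots & & \vdots \\
G_0 - G_{2K-1} & \cdots & G_{K-1} - G_{2K-1} & G_K - G_{2K-1} & \cdots & G_{2K-1} - G_{2K-1}
\end{pmatrix}
\]
\[
\delta(\Delta G) =
\begin{pmatrix}
\delta(G_0 - G_0) & \cdots & \delta(G_{K-1} - G_0) & \delta(G_K - G_0) & \cdots & \delta(G_{2K-1} - G_0) \\
\vdots & & \vdots & \vdots & & \vdots \\
\delta(G_0 - G_{K-1}) & \cdots & \delta(G_{K-1} - G_{K-1}) & \delta(G_K - G_{K-1}) & \cdots & \delta(G_{2K-1} - G_{K-1}) \\
\delta(G_0 - G_K) & \cdots & \delta(G_{K-1} - G_K) & \delta(G_K - G_K) & \cdots & \delta(G_{2K-1} - G_K) \\
\vdots & & \vdots & \vdots & & \vdots \\
\delta(G_0 - G_{2K-1}) & \cdots & \delta(G_{K-1} - G_{2K-1}) & \delta(G_K - G_{2K-1}) & \cdots & \delta(G_{2K-1} - G_{2K-1})
\end{pmatrix}
\]
(19)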
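The construction in Eqs. 17 to 19 can be sketched in code. The following is a minimal pure-Python illustration, not the implementation used in this work: it solves the MBAR self-consistent equation in reduced units (beta = 1) for a toy one-dimensional harmonic system, with one sampled state standing in for input set Si1 and one unsampled state (N = 0) standing in for Si2. The function names, the toy potentials and the sample size are all assumptions made for the example.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mbar_free_energies(u, n_samples, n_iter=200, tol=1e-10):
    """Solve the MBAR equation by self-consistent iteration (beta = 1).

    u[i][n]      : reduced energy of pooled sample n evaluated at state i
    n_samples[i] : number of samples drawn from state i (0 if unsampled)
    Returns free energies g[i], shifted so that g[0] = 0.
    """
    n_states, n_total = len(u), len(u[0])
    sampled = [k for k in range(n_states) if n_samples[k] > 0]
    g = [0.0] * n_states
    for _ in range(n_iter):
        # log of sum_k N_k exp(g_k - u_k(x_n)); states with N_k = 0 drop out.
        log_den = [
            logsumexp([math.log(n_samples[k]) + g[k] - u[k][n] for k in sampled])
            for n in range(n_total)
        ]
        g_new = [
            -logsumexp([-u[i][n] - log_den[n] for n in range(n_total)])
            for i in range(n_states)
        ]
        g_new = [value - g_new[0] for value in g_new]
        shift = max(abs(a - b) for a, b in zip(g, g_new))
        g = g_new
        if shift < tol:
            break
    return g


# Toy data: samples from u0(x) = x^2 / 2, re-evaluated with u1(x) = 1.5 x^2 / 2.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
u = [
    [0.5 * xi * xi for xi in x],   # sampled state (plays the role of Si1)
    [0.75 * xi * xi for xi in x],  # unsampled state (plays the role of Si2)
]
g = mbar_free_energies(u, n_samples=[len(x), 0])
delta_g = g[1] - g[0]  # analytical value for this toy system: 0.5 * ln(1.5)
print(delta_g)
```

In the setting of this chapter the energy array would have 2K rows, with the first K rows holding the sampled energies and the last K rows the energies re-evaluated with the new PME parameters; the zero entries of N are what allow free energies to be estimated at states that were never simulated.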
The uncertainties in the pairwise difference estimates can be estimated using Eq. 12 of
Shirts and Chodera45. The free energy and the uncertainty estimates can be read directly
from these matrices: (GK-1 - G0) ± δ(GK-1 - G0) for simulation input set Si1 and
(G2K-1 - GK) ± δ(G2K-1 - GK) for simulation input set Si2. The difference in the free energy
estimates, (ΔΔG)Si1-Si2, is simply the difference between the two free energy estimates.
\[
(\Delta\Delta G)_{Si1-Si2} = (\Delta G)_{Si1} - (\Delta G)_{Si2} = (G_{K-1} - G_0) - (G_{2K-1} - G_K)
\]
(20)
The variance estimate of (ΔΔG)Si1-Si2 is not the trivial sum of the variances of (ΔG)Si1
and (ΔG)Si2, var(ΔG)Si1 and var(ΔG)Si2, since the two free energy estimates are
correlated. We derive the variance of (ΔΔG)Si1-Si2 from the definition of the
covariance.
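\[
\begin{aligned}
\delta^2(\Delta\Delta G)_{Si1-Si2} &= \mathrm{cov}\big((G_{K-1} - G_0) - (G_{2K-1} - G_K),\ (G_{K-1} - G_0) - (G_{2K-1} - G_K)\big) \\
&= \mathrm{cov}(G_{K-1} - G_0,\ G_{K-1} - G_0) + \mathrm{cov}(G_{2K-1} - G_K,\ G_{2K-1} - G_K) \\
&\quad - 2\,\mathrm{cov}(G_{K-1} - G_0,\ G_{2K-1} - G_K)
\end{aligned}
\]
(21)
where cov is the covariance. The above equation finally yields the following:
\[
\begin{aligned}
\delta^2(\Delta\Delta G)_{Si1-Si2} &= \mathrm{cov}(G_{K-1}, G_{K-1}) + \mathrm{cov}(G_0, G_0) + \mathrm{cov}(G_{2K-1}, G_{2K-1}) + \mathrm{cov}(G_K, G_K) \\
&\quad - 2\,\{\mathrm{cov}(G_{K-1}, G_0) + \mathrm{cov}(G_{2K-1}, G_K) + \mathrm{cov}(G_{K-1}, G_{2K-1}) - \mathrm{cov}(G_{K-1}, G_K) \\
&\qquad - \mathrm{cov}(G_0, G_{2K-1}) + \mathrm{cov}(G_0, G_K)\} \\
&= \mathrm{var}(G_{K-1}) + \mathrm{var}(G_0) + \mathrm{var}(G_{2K-1}) + \mathrm{var}(G_K) \\
&\quad - 2\,\{\mathrm{cov}(G_{K-1}, G_0) + \mathrm{cov}(G_{2K-1}, G_K) + \mathrm{cov}(G_{K-1}, G_{2K-1}) - \mathrm{cov}(G_{K-1}, G_K) \\
&\qquad - \mathrm{cov}(G_0, G_{2K-1}) + \mathrm{cov}(G_0, G_K)\}
\end{aligned}
\]
(22)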
Each term in the above equation can be easily extracted from the asymptotic covariance
matrix, Θij, estimated by Eq. 8 in the paper by Shirts and Chodera.45
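As a check on Eq. 22, the same variance can be written as a quadratic form with a contrast vector that has +1 on states K-1 and K and -1 on states 0 and 2K-1. The sketch below compares the two forms on a small, made-up covariance matrix for K = 2; the function names and the numbers are hypothetical and serve only to illustrate the algebra.

```python
def variance_contrast(cov, K):
    """Variance of (G[K-1] - G[0]) - (G[2K-1] - G[K]) as c^T cov c."""
    c = [0.0] * (2 * K)
    c[K - 1] += 1.0      # +G[K-1]
    c[0] -= 1.0          # -G[0]
    c[2 * K - 1] -= 1.0  # -G[2K-1]
    c[K] += 1.0          # +G[K]
    size = 2 * K
    return sum(c[i] * cov[i][j] * c[j] for i in range(size) for j in range(size))


def variance_expanded(cov, K):
    """The same variance written out term by term as in Eq. 22."""
    a, b, c, d = K - 1, 0, 2 * K - 1, K
    return (cov[a][a] + cov[b][b] + cov[c][c] + cov[d][d]
            - 2.0 * (cov[a][b] + cov[c][d] + cov[a][c]
                     - cov[a][d] - cov[b][c] + cov[b][d]))


# Made-up symmetric covariance matrix for K = 2 (four states), illustration only.
cov = [
    [0.04, 0.01, 0.03, 0.01],
    [0.01, 0.09, 0.01, 0.08],
    [0.03, 0.01, 0.05, 0.02],
    [0.01, 0.08, 0.02, 0.10],
]
v_contrast = variance_contrast(cov, 2)
v_expanded = variance_expanded(cov, 2)
# Sum of the two separate variances, ignoring the correlation between them.
v_naive = (cov[1][1] + cov[0][0] - 2 * cov[1][0]) + (cov[3][3] + cov[2][2] - 2 * cov[3][2])
print(v_contrast, v_expanded, v_naive)
```

With strongly correlated estimates, as in this made-up example, the correct variance is much smaller than the naive sum of the two separate variances, which is why the cross terms in Eq. 22 matter.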
4.1.2 Difference in enthalpy of transformation between two PME parameter sets
We now show the protocol for estimating the enthalpy of transformation, i.e., the enthalpy
difference between the initial and the final states, (⟨H⟩λ=1 - ⟨H⟩λ=0) ± δ(⟨H⟩λ=1 - ⟨H⟩λ=0).
MBAR can again be used to calculate the equilibrium expectation values ⟨H⟩λ=1 and ⟨H⟩λ=0
and the associated uncertainties δ(⟨H⟩λ=1) and δ(⟨H⟩λ=0) using Eqs. 15 and 16 from the
paper by Shirts and Chodera.45 The energy term in the U matrix is equal to the total internal
energy (Uint) plus the pressure times volume term (PV). The expectation value of Uint +
PV is the enthalpy of a given state. We can use U as an input matrix to Eqs.
15 and 16 as it contains Uint + PV samples and we need expectation values of enthalpies
at the initial and the final states. Solving Eqs. 15 and 16 we get equilibrium expectation
values of enthalpies at all states and the corresponding uncertainties in the expectation
values. We can also get the pairwise differences in expectation values between two states,
and the associated uncertainties.
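\[
\langle H \rangle =
\begin{pmatrix}
H_0 \\ \vdots \\ H_{K-1} \\ H_K \\ \vdots \\ H_{2K-1}
\end{pmatrix},
\qquad
\delta\langle H \rangle =
\begin{pmatrix}
\delta H_0 \\ \vdots \\ \delta H_{K-1} \\ \delta H_K \\ \vdots \\ \delta H_{2K-1}
\end{pmatrix}
\]
\[
\Delta H =
\begin{pmatrix}
H_0 - H_0 & \cdots & H_{K-1} - H_0 & H_K - H_0 & \cdots & H_{2K-1} - H_0 \\
\vdots & & \vdots & \vdots & & \vdots \\
H_0 - H_{K-1} & \cdots & H_{K-1} - H_{K-1} & H_K - H_{K-1} & \cdots & H_{2K-1} - H_{K-1} \\
H_0 - H_K & \cdots & H_{K-1} - H_K & H_K - H_K & \cdots & H_{2K-1} - H_K \\
\vdots & & \vdots & \vdots & & \vdots \\
H_0 - H_{2K-1} & \cdots & H_{K-1} - H_{2K-1} & H_K - H_{2K-1} & \cdots & H_{2K-1} - H_{2K-1}
\end{pmatrix}
\]
\[
\delta(\Delta H) =
\begin{pmatrix}
\delta(H_0 - H_0) & \cdots & \delta(H_{K-1} - H_0) & \delta(H_K - H_0) & \cdots & \delta(H_{2K-1} - H_0) \\
\vdots & & \vdots & \vdots & & \vdots \\
\delta(H_0 - H_{K-1}) & \cdots & \delta(H_{K-1} - H_{K-1}) & \delta(H_K - H_{K-1}) & \cdots & \delta(H_{2K-1} - H_{K-1}) \\
\delta(H_0 - H_K) & \cdots & \delta(H_{K-1} - H_K) & \delta(H_K - H_K) & \cdots & \delta(H_{2K-1} - H_K) \\
\vdots & & \vdots & \vdots & & \vdots \\
\delta(H_0 - H_{2K-1}) & \cdots & \delta(H_{K-1} - H_{2K-1}) & \delta(H_K - H_{2K-1}) & \cdots & \delta(H_{2K-1} - H_{2K-1})
\end{pmatrix}
\]
(23)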
The difference, or the error, between the enthalpy differences simulated with two different
sets of simulation inputs can then be easily extracted from the ΔH matrix or the ⟨H⟩ vector.
\[
(\Delta\Delta H)_{Si1-Si2} = (\Delta H)_{Si1} - (\Delta H)_{Si2} = (H_{K-1} - H_0) - (H_{2K-1} - H_K)
\]
(24)
The uncertainty in (ΔΔH)Si1-Si2 is again not trivial. It is derived using the same formalism
as was used for calculating δ(ΔΔG).
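\[
\begin{aligned}
\delta^2(\Delta\Delta H)_{Si1-Si2} &= \mathrm{cov}\big((H_{K-1} - H_0) - (H_{2K-1} - H_K),\ (H_{K-1} - H_0) - (H_{2K-1} - H_K)\big) \\
&= \mathrm{cov}(H_{K-1} - H_0,\ H_{K-1} - H_0) + \mathrm{cov}(H_{2K-1} - H_K,\ H_{2K-1} - H_K) \\
&\quad - 2\,\mathrm{cov}(H_{K-1} - H_0,\ H_{2K-1} - H_K)
\end{aligned}
\]
(25)
\[
\begin{aligned}
\delta^2(\Delta\Delta H)_{Si1-Si2} &= \mathrm{var}(H_{K-1} - H_0) + \mathrm{var}(H_{2K-1} - H_K) \\
&\quad - 2\,\{\mathrm{cov}(H_{K-1}, H_{2K-1}) - \mathrm{cov}(H_{K-1}, H_K) - \mathrm{cov}(H_0, H_{2K-1}) + \mathrm{cov}(H_0, H_K)\}
\end{aligned}
\]
(26)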
The first two variance terms in Eq. 26 can be read directly from the δ(ΔH) matrix. Unlike
in the case of δ(ΔΔG), the covariance terms cannot be read directly from the covariance
matrix, which is computed anew for the expectation values.
The estimator of uncertainty for equilibrium expectation values is given by Eq. 16 in the
paper by Shirts and Chodera.45 It has the following form:

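\[
\delta^2 \hat{A} \equiv \mathrm{cov}(\hat{c}_A/\hat{c}_a,\ \hat{c}_A/\hat{c}_a) = \hat{A}^2\,(\hat{\Theta}_{AA} + \hat{\Theta}_{aa} - 2\hat{\Theta}_{Aa})
\]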
Using Eq. 10 in the paper by Shirts and Chodera45 we can derive an expression for the
covariance terms of the form cov(cA/ca, cB/cb).

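\[
\mathrm{cov}(\hat{c}_A/\hat{c}_a,\ \hat{c}_B/\hat{c}_b) = \hat{A}\hat{B}\,(\hat{\Theta}_{AB} - \hat{\Theta}_{Ab} + \hat{\Theta}_{ab} - \hat{\Theta}_{aB})
\]
(27)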
The four covariance terms in Eq. 26 can be calculated using the above relationship. Thus
we can have ΔΔH ± δ(ΔΔH) for a pair of simulation input sets. For very strict
simulation parameters, these should also converge to a constant value.
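Eq. 27 translates directly into a small helper. The sketch below is illustrative only: it assumes the augmented covariance matrix is available as a nested list indexed by the positions of the four normalization constants, and the function and argument names are hypothetical. Setting B = A and b = a recovers the variance expression of Eq. 16 quoted above, which gives a simple consistency check.

```python
def cov_of_ratios(theta, a_hat, b_hat, i_A, i_a, i_B, i_b):
    """cov(c_A/c_a, c_B/c_b) from Eq. 27.

    theta        : symmetric covariance matrix (nested lists), assumed given
    a_hat, b_hat : the two expectation estimates A and B
    i_A, i_a     : row/column indices of c_A and c_a in theta
    i_B, i_b     : row/column indices of c_B and c_b in theta
    """
    return a_hat * b_hat * (theta[i_A][i_B] - theta[i_A][i_b]
                            + theta[i_a][i_b] - theta[i_a][i_B])


# Made-up 2 x 2 covariance block, for illustration only.
theta = [[0.5, 0.1],
         [0.1, 0.3]]
a_hat = 2.0
variance_a = cov_of_ratios(theta, a_hat, a_hat, 0, 1, 0, 1)
print(variance_a)
```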
4.1.3 Enthalpy of vaporization of water calculated at a new set of PME parameters
We now demonstrate the use of the equilibrium expectation value of the enthalpy at the
final state of methane solvation to compute the molar enthalpy of vaporization of water at
300 K. The final state in methane solvation represents a ghost united atom (UA) methane
sphere floating in a water box. There is no potential energy associated with the UA
methane sphere in the final state; it has only kinetic energy. If we separate the
kinetic energy associated with the ghost sphere from the total enthalpy of the box, all we
are left with is the enthalpy of water at 300 K. The kinetic energy of a united atom
methane sphere can be approximated using the equipartition theorem and the degrees
of freedom associated with the sphere. A sphere has three translational degrees of
freedom, one in each of the x, y, and z directions.
Therefore the kinetic energy of a UA methane sphere is 1.5 kbT, where kb is the Boltzmann
constant (0.00831657534 kJ/mol/K) and T is the temperature, i.e. 300 K. We are interested in
finding the enthalpy of vaporization at a new simulation input set Si2. Therefore we
choose to work with the enthalpy of the final state corresponding to Si2, which is state 2K-1.
\[
H_{(893\ \mathrm{water\ molecules}\ @\ 300\,\mathrm{K})} = H_{2K-1} - \tfrac{3}{2}\, k_b \times 300
\]
(28)
H(893 water molecules @ 300K) is the enthalpy of 893 water molecules, so we
need to divide this enthalpy by the number of water molecules to make it independent of
system size.
\[
H_{\mathrm{water}\ @\ 300\,\mathrm{K}} = \left( H_{2K-1} - \tfrac{3}{2}\, k_b \times 300 \right) / 893
\]
(29)
The enthalpy of vaporization is equal to the enthalpy change from the initial liquid
state (i.e. Hwater @ 300K) to the final gas state. If we assume ideal gas conditions, i.e. no
potential energy, then we can write the enthalpy of the gas state as the sum of the kinetic
energy and the PV (pressure times the volume of the simulation box) term. Using the
equipartition theorem and the degrees of freedom of a water molecule, we can write the
kinetic energy of water molecules at 300 K. A water molecule has 6 degrees of freedom,
3 of which are translational in the x, y, and z directions and the other 3 are rotational
about the x, y, and z axes. For a mole of ideal gas we can write PV = RT. Here R is the
gas constant and T is the temperature (300 K).
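\[
\begin{aligned}
\Delta H_{vap} &= H_{\mathrm{gas}\ @\ 300\,\mathrm{K}} - H_{\mathrm{liquid}\ @\ 300\,\mathrm{K}} = RT + \tfrac{6}{2}\, k_b \times 300 - H_{\mathrm{water}\ @\ 300\,\mathrm{K}} \\
&= RT + \tfrac{6}{2}\, k_b \times 300 - \left( H_{2K-1} - \tfrac{3}{2}\, k_b \times 300 \right) / 893
\end{aligned}
\]
(30)
The error in ΔHvap can be estimated by the following expression:
\[
\begin{aligned}
\mathrm{var}(\Delta H_{vap}) &= \mathrm{var}\!\left( RT + \tfrac{6}{2}\, k_b \times 300 - \left( H_{2K-1} - \tfrac{3}{2}\, k_b \times 300 \right) / 893 \right) \\
&= \mathrm{var}(H_{2K-1} / 893) \quad \text{since all other terms are constants} \\
\delta(\Delta H_{vap}) &= \sqrt{\mathrm{var}(H_{2K-1} / 893)} \\
&= \delta(H_{2K-1}) / 893
\end{aligned}
\]
(31)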
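Eqs. 28 to 31 amount to a few lines of arithmetic. The following sketch is a hypothetical helper, not code from this work: it takes the MBAR estimate of the final-state enthalpy and its uncertainty as inputs and uses the constants quoted in the text as default arguments.

```python
KB = 0.00831657534  # kJ/mol/K, value quoted in the text


def heat_of_vaporization(h_final, dh_final, n_water=893, temperature=300.0, kb=KB):
    """Return (dHvap, uncertainty) in kJ/mol following Eqs. 28 to 31.

    h_final  : enthalpy of the final state 2K-1 (water box plus ghost sphere)
    dh_final : uncertainty in h_final
    """
    kt = kb * temperature
    # Eqs. 28 and 29: remove the ghost sphere kinetic energy, then divide by N.
    h_liquid = (h_final - 1.5 * kt) / n_water
    # Ideal gas: PV = RT plus 6/2 kT of kinetic energy per molecule.
    h_gas = kt + 3.0 * kt
    # Eq. 30 for the value, Eq. 31 for the uncertainty.
    return h_gas - h_liquid, dh_final / n_water


# Consistency check with made-up inputs: if the box enthalpy were exactly the
# ghost sphere kinetic energy, the liquid term would vanish and dHvap = 4 kT.
kt = KB * 300.0
hvap, hvap_err = heat_of_vaporization(1.5 * kt, 89.3)
print(hvap, hvap_err)
```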
In this way we can calculate ΔHvap ± δ(ΔHvap) for a new set of simulation inputs. Like
ΔΔG ± δ(ΔΔG) and ΔΔH ± δ(ΔΔH), ΔHvap ± δ(ΔHvap) should also converge to a value
within error at strict simulation parameters. It is possible that a set of simulation inputs
can yield converged ΔΔG ± δ(ΔΔG) with low computational cost but inaccurate
enthalpy estimates. We need simulation parameters which predict ΔHvap to within a
reasonable δ(ΔHvap), so that we have reliable estimates not only of the free energy but also
of the enthalpy, and thus get the bulk properties right while incurring minimal
computational cost.
We have studied 4 PME parameters: Coulomb cutoff, Fourier spacing, Ewald tolerance,
and order of B-spline interpolation. Later we will extend our study to other factors
such as switching distance and cutoff type. We have chosen parameters covering the
range from computationally cheap but inaccurate to accurate but computationally
expensive. Thus, when we vary the simulation inputs using these parameters, we hope
to find a set that strikes a good balance between the accuracy of the free
energy estimate and the computational cost.
The simulations performed in Chapter 3 to compare the free energy estimators all had the
same PME parameters. A switch at 0.88 nm with a Coulomb cutoff of 0.9 nm was
applied. A fourth order B-spline interpolation was used with a Fourier spacing of 0.12
nm and an Ewald tolerance of 1E-08. We will refer to this PME set as Si1. The energies from
these simulations form the top left quarter of the three-dimensional U matrix defined in
Eq. 18. We choose a new set of PME parameters from the set of values given in Table 8.
For example, in methane solvation a strict Coulomb cutoff of 0.6 nm, with a Fourier
spacing of 0.2 nm, an Ewald tolerance of 1E-06 and a B-spline interpolation order of 4 is
a set of new simulation parameters. Let us call this simulation set Si2. We do not run new
equilibrium simulations at this set of input parameters but rather evaluate energies using
the trajectories from the set of 100 equilibrium simulations performed in Chapter 3. These
energies form the top right quarter of the U matrix in Eq. 18. We can now evaluate
(ΔΔG ± δ(ΔΔG))Si1-Si2, (ΔΔH ± δ(ΔΔH))Si1-Si2 and (ΔHvap ± δ(ΔHvap))Si2 using the
protocols described in sections 4.1.1, 4.1.2 and 4.1.3.
We can repeat the above process for other sets of simulation parameters as well. Table
8 shows that for methane solvation there are 1440 different sets from all combinations of
the PME parameters. For dipole inversion and anthracene solvation we have 1800
different sets of simulation inputs.
Table 8. PME parameters used to generate input sets.
Methane solvation
Order of interpolation: 3, 4, 5, 6
Ewald tolerance: 1.E-10, 1.E-08, 1.E-06, 1.E-04, 1.E-02
Fourier spacing (nm): 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20
Coulomb cutoff (nm): 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3
Total number of simulation input sets: 1440
Dipole inversion
Order of interpolation: 3, 4, 5, 6
Ewald tolerance: 1.E-10, 1.E-08, 1.E-06, 1.E-04, 1.E-02
Fourier spacing (nm): 0.045, 0.065, 0.085, 0.105, 0.125, 0.145, 0.165, 0.185, 0.205
Coulomb cutoff (nm): 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5
Total number of simulation input sets: 1800
Anthracene solvation
Order of interpolation: 3, 4, 5, 6
Ewald tolerance: 1.E-10, 1.E-08, 1.E-06, 1.E-04, 1.E-02
Fourier spacing (nm): 0.040, 0.060, 0.080, 0.100, 0.120, 0.140, 0.160, 0.188, 0.200
Coulomb cutoff (nm): 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5
Total number of simulation input sets: 1800
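The totals in Table 8 follow from taking every combination of the listed values. The short sketch below reproduces the counts; the four orders of interpolation (3, 4, 5, 6) are taken from the discussion of Figure 14 and the variable names are illustrative. The Fourier spacing values for dipole inversion differ slightly from those used here, but the number of values, and therefore the count, is the same.

```python
from itertools import product

orders = [3, 4, 5, 6]  # assumed from the discussion of Figure 14
ewald_tolerances = [1e-10, 1e-8, 1e-6, 1e-4, 1e-2]
fourier_spacings = [round(0.04 + 0.02 * i, 2) for i in range(9)]  # 0.04 to 0.20 nm
cutoffs_methane = [round(0.6 + 0.1 * i, 1) for i in range(8)]     # 0.6 to 1.3 nm
cutoffs_extended = [round(0.6 + 0.1 * i, 1) for i in range(10)]   # 0.6 to 1.5 nm

# Every input set is one (order, tolerance, spacing, cutoff) combination.
methane_sets = list(product(orders, ewald_tolerances, fourier_spacings, cutoffs_methane))
extended_sets = list(product(orders, ewald_tolerances, fourier_spacings, cutoffs_extended))
print(len(methane_sets), len(extended_sets))
```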
We use a strict Coulomb cutoff instead of a switched cutoff in our parameter search. The
switching distance would have introduced an extra dimension in our parameter
space and increased the computational burden. Instead we chose to first find
the best set of parameters in 4 dimensions, i.e. the set which, for a given accuracy in the
free energy and the enthalpy estimates, results in the minimum simulation time. Then we can
extend our study to find the best switching distance and cutoff strategy, i.e. whether
to use a shift function or a switch function to smoothly zero the potential at the cutoff
length. We could similarly extend our study to the van der Waals cutoff length and the
van der Waals switching/shifting distance. For now we show the application of the test set
and MBAR to search the PME parameter space consisting of strict Coulomb cutoff,
Fourier spacing, Ewald tolerance, and order of interpolation.
4.2 Results and Discussion
Figure 14 shows ΔΔG ± δ(ΔΔG) for methane solvation as a function of different PME
parameters. Each subplot shows the variation of ΔΔG ± δ(ΔΔG) with respect to cutoff
distance for a given Fourier spacing. Each new row contains plots for a different Ewald
tolerance. Each new column contains plots for a different order of interpolation. Plots
with an order of interpolation of 3 and plots with Ewald tolerances of 1E-02, 1E-10 and
1E-08 are noisy and do not converge well. The even orders of interpolation, 4 and 6, perform
better than the odd orders, 3 and 5. An order of interpolation of 4 is
preferable to 6, as a higher order of interpolation results in higher computational cost.
The subplot at row 2, column 2, with an order of interpolation of 4 and an Ewald tolerance of
1E-04, has the best convergence amongst all subplots. In this subplot, ΔΔG
estimates even with a large Fourier spacing (0.2 nm) and a short cutoff (0.8 nm) are below 0.5
kJ/mol. ΔΔG ± δ(ΔΔG) for strict parameters, i.e. the longest cutoff (1.3 nm) and the finest
Fourier spacing (0.04 nm), converges at -0.208 ± 0.032 kJ/mol.
Figure 14. Accuracy in methane solvation free energy estimates as a function of different
PME parameters.
Figure 14 has been corrected for this constant bias so that convergence happens at
0 kJ/mol. This is better seen when the subplot at row 2, column 2, with order 4 and Ewald
tolerance 1E-4, is expanded in Figure 20. We see that as we reduce the cutoff the estimates
become noisier, i.e. have a high uncertainty δ(ΔΔG). One obvious reason is that the phase
space associated with small cutoffs and large Fourier spacings is very different from the
phase space associated with the simulation parameters used in the equilibrium simulations.
Since these have poor phase space overlap, this indicates that these parameters are not at
all good for simulation. To test this hypothesis we first need a measure of the extent
of overlap. The paper by Wu and Kofke41 explains the measurement of phase space
overlap between two states using (a) the total energy distribution method and (b) relative
entropy measurements. We have used method (a) to visualize the extent of overlap. We
plot the distribution of total energies of the initial state (λ=0) for methane solvation at 4
different PME parameter sets, each specified by 4 parameters [order of interpolation, Ewald
tolerance, Fourier spacing, cutoff].
Figure 15. Inaccuracy in free energy estimates increases with decreasing phase space
overlap.
Figure 15 contains histograms of the total energies of the methane solvation initial state. Only
the black and the red curves show good overlap, and the corresponding ΔΔG values, 0.2018
and 0.212 kJ/mol, are much smaller than those of the green and the blue curves, which have
very low overlap with the black and red curves. The ΔΔG values corresponding to the green
and the blue curves are very large, 31.623 and -18.800 kJ/mol; hence these PME parameters
are not at all good for simulations.
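A visual comparison of energy histograms can be complemented by a single number. The sketch below computes a simple histogram overlap coefficient (the sum over bins of the smaller of the two normalized counts), which is 1 for identical distributions and 0 for disjoint ones. This is an illustrative measure chosen for the example, not necessarily the quantity defined by Wu and Kofke, and the function name and bin count are assumptions.

```python
def histogram_overlap(sample_a, sample_b, n_bins=50):
    """Overlap coefficient of two samples on a shared set of equal-width bins."""
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    width = (hi - lo) / n_bins or 1.0  # guard against identical constant samples

    def normalized_counts(sample):
        counts = [0] * n_bins
        for value in sample:
            index = min(int((value - lo) / width), n_bins - 1)
            counts[index] += 1
        return [count / len(sample) for count in counts]

    p_a = normalized_counts(sample_a)
    p_b = normalized_counts(sample_b)
    return sum(min(a, b) for a, b in zip(p_a, p_b))


# Made-up energies for illustration: identical samples versus well separated ones.
energies = [-100.0 + 0.5 * i for i in range(40)]
same = histogram_overlap(energies, energies)
disjoint = histogram_overlap([0.0, 1.0, 2.0], [100.0, 101.0, 102.0])
print(same, disjoint)
```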
We can also calculate the mean square error in ΔΔG ± δ(ΔΔG). The deviation of ΔΔG
from the converged value gives the bias, and we know the uncertainty δ(ΔΔG).
Therefore we can square the bias and add it to the square of the uncertainty to get the
mean square error.
\[
MSE_{\Delta\Delta G} = (\Delta\Delta G - \Delta\Delta G_{\mathrm{converged}})^2 + (\delta(\Delta\Delta G))^2
\]
MSEΔΔG can serve as a measure of the reliability of the parameter sets in predicting accurate
and precise free energy estimates.
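As a small worked example of the bias-plus-uncertainty definition above, the helper below computes the mean square error and its square root. The function names and the input numbers are made up for illustration and are not results from this study.

```python
import math


def mean_square_error(estimate, converged_value, uncertainty):
    """Squared bias plus squared uncertainty, as defined in the text."""
    bias = estimate - converged_value
    return bias ** 2 + uncertainty ** 2


def root_mean_square_error(estimate, converged_value, uncertainty):
    """Square root of the mean square error, in the units of the estimate."""
    return math.sqrt(mean_square_error(estimate, converged_value, uncertainty))


# Made-up numbers: estimate 0.3, converged value -0.2, uncertainty 0.1 (kJ/mol).
mse = mean_square_error(0.3, -0.2, 0.1)
rmse = root_mean_square_error(0.3, -0.2, 0.1)
print(mse, rmse)
```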
Figure 16 shows ΔHvap ± δ(ΔHvap) as a function of different PME parameters. Figure 17 shows
ΔΔH ± δ(ΔΔH) for methane solvation as a function of PME parameters. Figure 18 shows
computational costs (hr/ns) as a function of different PME
parameters for methane solvation. Figures 16, 17 and 18 share the same layout as Figure
14. In Figure 16 we see that ΔHvap ± δ(ΔHvap) has the best convergence for subplot (2,2), with
order of interpolation 4 and Ewald tolerance 1E-4. An expanded plot for this subplot is
available in Figure 21. Bias as well as noise in the ΔHvap estimates increase as the cutoff
decreases and as the Fourier spacing increases. For strict parameters, ΔHvap ± δ(ΔHvap)
converges at 42.452 ± 0.089 kJ/mol. The experimental value of the molar enthalpy of
vaporization is 40.79 kJ/mol at 300K. The bias from the experimental result can be
explained in light of the approximations we made while calculating ΔHvap from
simulations. We assumed ideal gas behavior for the gas phase and approximated the
kinetic energies using the equipartition theorem and the degrees of freedom of water molecules.
Similar to MSEΔΔG, we can also calculate MSEΔHvap. The deviation of the estimated ΔHvap from
the converged value gives the bias. Squaring and adding the bias and the uncertainty
gives us MSEΔHvap.
\[
MSE_{\Delta H_{vap}} = (\Delta H_{vap} - \Delta H_{vap,\mathrm{converged}})^2 + (\delta(\Delta H_{vap}))^2
\]
MSEΔHvap measures the reliability of the parameter sets in predicting an accurate and precise
molar enthalpy of vaporization of water. This also ensures that the parameters are good enough
to simulate the right bulk properties.
The ΔΔH estimates for methane solvation in Figure 17 are noisier than the estimates in
Figures 14 and 16, and little can be concluded from these plots. However, we can
see a distinct difference in the convergence characteristics of subplot (2,2), with order
of interpolation 4 and Ewald tolerance 1E-4, which has the best convergence.
Figure 16. Molar enthalpy of vaporization of water as a function of different PME
parameters.
Figure 17. Accuracy in methane solvation enthalpy estimates as a function of different
PME parameters.
Figure 18. Simulation time for methane solvation as a function of different PME
parameters.
ΔΔG estimates of methane solvation vary for different Fourier spacings and cutoffs.
ΔHvap estimates vary for different Fourier spacings and cutoffs.
Simulation time for methane solvation varies for different Fourier spacings and cutoffs.
Simulation time increases with decreasing Fourier spacing as more and more wave
vectors are included in the calculation of the reciprocal sum in Fourier space. The
difference in time consumption is relatively small, not more than 0.5 hr/ns, for Fourier
spacings from 0.12 nm to 0.20 nm at the same cutoff. But the computational cost increases
significantly for Fourier spacings below 0.10 nm. Time consumption also increases with
increasing cutoff, as the number of particles in the calculation of the direct sum in
real space increases. Time consumption increases slowly up to a cutoff of 0.9 nm but then
increases rapidly with increasing cutoff. Using a MATLAB script we searched for the parameters
which resulted in the smallest simulation time for a given RMSE. We chose only RMSEΔΔG
and RMSEΔHvap for our analysis, as the ΔΔH estimates have high uncertainties. For RMSEΔΔG
and RMSEΔHvap less than 0.5 kJ/mol we found 5 candidate parameter sets, as
given in Table 9.
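The search described above can be sketched as a filter followed by a sort. The example below is written in Python rather than MATLAB and is not the script used in this work; the record layout, the key names and the numbers are hypothetical and serve only to show the selection logic.

```python
def fastest_parameter_sets(records, rmse_limit=0.5, n_leads=5):
    """Return the n_leads fastest records whose RMSE values are both under the limit."""
    eligible = [
        record for record in records
        if record["rmse_ddg"] < rmse_limit and record["rmse_dhvap"] < rmse_limit
    ]
    return sorted(eligible, key=lambda record: record["time_hr_per_ns"])[:n_leads]


# Hypothetical records, invented for illustration only (not data from this study).
# params = (order of interpolation, Ewald tolerance, Fourier spacing nm, cutoff nm)
records = [
    {"params": (4, 1e-4, 0.12, 0.8), "time_hr_per_ns": 4.3, "rmse_ddg": 0.2, "rmse_dhvap": 0.3},
    {"params": (4, 1e-8, 0.04, 1.3), "time_hr_per_ns": 9.0, "rmse_ddg": 0.05, "rmse_dhvap": 0.1},
    {"params": (3, 1e-2, 0.20, 0.6), "time_hr_per_ns": 3.0, "rmse_ddg": 5.0, "rmse_dhvap": 2.0},
]
leads = fastest_parameter_sets(records)
for rank, record in enumerate(leads, start=1):
    print(rank, record["params"], record["time_hr_per_ns"])
```

The cheapest record is rejected here because it fails the accuracy limit, which mirrors the intent of the search: speed is only compared among parameter sets that are accurate enough.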
Table 9. Lead parameter sets for methane solvation.
H
MSE Hvap
MSEG
kJ/mol
kJ/mol
0.125
0.173
1.934
0.100
0.072
0.115
4
5
Time
Order
Ew.
F.Sp.
Cut-off
hr/ns
Inter.
Tol.
nm
nm
0.051
4.341
1E-4
0.12
0.8
-0.621
0.155
4.514
1E-4
0.12
0.9
0.139
-3.710
0.347
4.688
1E-4
0.12
0.8
0.095
0.034
0.272
0.212
4.688
1E-4
0.12
0.8
0.091
0.034
-2.005
0.220
4.688
1E-4
0.14
0.9
Rank
kJ/mol kJ/mol
A similar analysis was done for dipole inversion and anthracene solvation.
Table 10. Lead parameter sets for dipole inversion.
MSEG
Time
Order
Ew.
F.Sp.
Cut-off
kJ/mol
kJ/mol
kJ/mol
hr/ns
Inter.
Tol.
nm
nm
3.600
98.123
-36.609
7.576
1e-4
0.145
0.8
3.698
98.562
-37.000
7.828
1e-4
0.145
0.8
3.553
97.838
-36.312
8.586
1e-6
0.145
1.0
3.563
97.669
-36.109
9.217
1e-6
0.105
0.9
3.640
98.341
-36.784
9.217
1e-4
0.185
0.8
Rank
Table 11. Lead parameter sets for anthracene solvation.

MSEG
Time
Order
Ew.
F.Sp.
Cut-off
kJ/mol
kJ/mol
kJ/mol
hr/ns
Inter.
Tol.
Nm
nm
0.052
15.998
-1.622
8.964
1e-4
0.12
0.8
0.277
19.004
-1.941
9.343
1e-4
0.1
0.8
0.140
13.128
-1.529
9.596
1e-4
0.14
0.8
0.064
13.849
-1.608
9.722
1e-4
0.14
0.9
0.085
15.726
-1.746
9.974
1e-4
0.1
0.9
Rank
The lead candidate parameter sets include an order of interpolation of 3, but we
discard these leads because subplots with order of interpolation 3 did not have good
convergence. For an order of interpolation of 4, the best parameter set is the one with the
lowest MSEΔΔG and a minimum difference between ΔΔG and ΔΔH, or a minimum ΔΔH if ΔΔH is
not well converged, as in the case of anthracene solvation and dipole inversion.
Using this rule, rank 4 in Table 9 is the set of parameters
with the fastest simulation time for methane solvation. For dipole inversion the parameter set
with rank 2 is the fastest. However, for dipole inversion, simulations with a cutoff of 0.8 nm
fail to run due to a GROMACS error: with twin-range cutoffs and SHAKE, the virial and the
pressure are incorrect. Therefore the parameter set with order of interpolation 4, Ewald
tolerance 1E-4, Fourier spacing of 0.14 nm and cutoff of 0.9 nm is predicted as the next best in
line. Similarly, for anthracene solvation, rank 4 in Table 11 is the best set of
parameters.
We checked whether equilibrium simulations of methane solvation reproduce the ΔΔG
predicted for the lead parameters. The simulations were run for 6 ns and
samples corresponding to the first 1.5 ns were discarded to account for equilibration. We
compared the free energy estimates from the equilibrium simulations done at an order
of interpolation of 4, an Ewald tolerance of 1E-8, a Fourier spacing of 0.12 nm and a cutoff
of 0.9 nm with the switch starting at 0.88 nm against our leads for methane solvation: an
order of interpolation of 4, an Ewald tolerance of 1E-4, a Fourier spacing of 0.12 nm and a
strict cutoff of 0.8 nm as well as 0.9 nm.
Table 12. Comparison of lead parameter sets with the parameter set used in the simulations
done to test free energy estimators for methane solvation.
Parameter set                          ΔΔG (Pred.)   ΔΔG (Actual)   ΔG ± δ(ΔG)      ΔHvap ± δ(ΔHvap)   Time
[Order, E.Tol, F.Sp., Cut, Switch]     kJ/mol        kJ/mol         kJ/mol          kJ/mol             hr/ns
Set used for sim. in chapter 3                                      9.093 ± 0.096   42.434 ± 0.083     3.328
[4, 1E-8, 0.12 nm, 0.9 nm, 0.88 nm]
Lead 1                                 0.212         0.293          8.800 ± 0.095   42.383 ± 0.090     2.483
[4, 1E-4, 0.12 nm, 0.8 nm, 0 nm]
Lead 2                                 0.204         0.074          9.019 ± 0.096   42.434 ± 0.089     2.779
[4, 1E-4, 0.12 nm, 0.9 nm, 0 nm]
Table 12 reports the deviation in ΔG predicted using the formalism explained in
Section 4.1.1 alongside the deviation actually obtained from equilibrium simulations
with the lead PME parameter sets. For lead parameter set 1 the predicted and actual
values agree well, whereas for lead parameter set 2 the prediction overestimates the
actual deviation. The last column gives the simulation time for each parameter set.
These times were measured with short 10 ps runs of all three sets, started from the same
configuration, using GROMACS 4.5.3 on a single core of an Intel Xeon 5410 processor.
This removes any mismatch in simulation conditions arising from the starting
configuration, the version of the code, or the processing power of the machine. The
simulation times in Table 12 cannot be compared with those in Table 9, because the
latter (a) were generated from single point simulations, i.e. just a single step, and
(b) used a different starting configuration and were run with an older version of the
GROMACS code on a different machine. Lead parameter set 1 is 25% faster than the
parameter set used in the Chapter 3 simulations, with minimal loss of accuracy in either
the free energy estimate or the enthalpy of vaporization of water. Lead parameter set 2
is only 16% faster than the Chapter 3 parameter set, but gives a more accurate free
energy estimate than lead parameter set 1, which is the fastest parameter set.
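The speed-ups and actual deviations quoted for the lead parameter sets follow directly from the entries of Table 12. A minimal Python sketch of that arithmetic is given here; the dictionary layout and function names are illustrative and are not part of the original analysis scripts.

```python
# Values transcribed from Table 12 (free energies in kJ/mol, times in hr/ns).
table12 = {
    "chapter3": {"dG": 9.093, "time": 3.328},
    "lead1": {"dG": 8.800, "time": 2.483},
    "lead2": {"dG": 9.019, "time": 2.779},
}


def speedup_percent(reference_time, time):
    """Percentage reduction in simulation time relative to the reference set."""
    return 100.0 * (reference_time - time) / reference_time


def actual_deviation(reference_dg, dg):
    """Absolute difference between a free energy estimate and the reference."""
    return abs(reference_dg - dg)


reference = table12["chapter3"]
summary = {}
for name in ("lead1", "lead2"):
    row = table12[name]
    summary[name] = (
        round(speedup_percent(reference["time"], row["time"]), 1),
        round(actual_deviation(reference["dG"], row["dG"]), 3),
    )
    print(name, "speed-up (%):", summary[name][0],
          "deviation (kJ/mol):", summary[name][1])
```

The deviations computed this way reproduce the "Actual" column of the table, and the speed-ups round to the 25% and 16% quoted in the text.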
4.3 Conclusions
We have used the benchmark test set, together with free energy estimates calculated
using MBAR, as tools for exploring the PME parameter space. This study provides an
alternative method for selecting PME parameters. An order of interpolation of 4, an
Ewald tolerance of 1E-4, a Fourier spacing of 0.12 nm, and a cutoff of 0.9 nm appear to
be an optimal (fast yet accurate) choice of PME parameters for simulations involving
large-scale water rearrangement (dipole inversion) as well as transformations that
involve the growth or deletion of multiple atomic sites. The protocols discussed in
Sections 4.1.1, 4.1.2 and 4.1.3 apply not only to PME parameters but also to other
simulation inputs. For example, we could tune van der Waals cutoff and switching
distances. We could also tune force field parameters to reproduce the correct bulk
properties.
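As an illustration, the recommended parameters can be written out as the electrostatics portion of a GROMACS run-parameter file. The sketch below is in Python. The option names (coulombtype, rcoulomb, fourierspacing, pme_order, ewald_rtol) are standard GROMACS .mdp options, but the mapping of the recommended values onto them and the helper name to_mdp are assumptions made for illustration; they are not the input files used in this work.

```python
# Recommended PME settings from Section 4.3, mapped onto GROMACS .mdp options.
# The mapping is an illustrative assumption; check it against the GROMACS
# manual for the version in use before running a simulation.
recommended_pme = {
    "coulombtype": "PME",
    "rcoulomb": 0.9,         # real-space cutoff, nm
    "fourierspacing": 0.12,  # maximum PME grid spacing, nm
    "pme_order": 4,          # order of interpolation
    "ewald_rtol": 1e-4,      # Ewald tolerance
}


def to_mdp(options):
    """Render a dict of options as aligned 'key = value' lines."""
    width = max(len(key) for key in options)
    return "\n".join(
        f"{key:<{width}} = {value}" for key, value in options.items()
    )


mdp_text = to_mdp(recommended_pme)
print(mdp_text)
```

Keeping the settings in a dictionary makes it straightforward to loop over alternative cutoffs or tolerances when scanning the parameter space.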
References:
(1) Davies, J. W.; Glick, M.; Jenkins, J. L. Curr. Opin. Chem. Biol. 2006, 10, 343-351.
(2) Merz, K. M.; Ringe, D.; Reynolds, C. H. Drug Design: Structure- and Ligand-Based Approaches; Cambridge University Press, 2010.
(3) Mobley, D.; Dill, K. Structure 2009, 17, 489-498.
(4) Bai, H.; Lai, L. Acta Physico-Chimica Sinica 2010, 26, 1988-1997.
(5) Deng, Y.; Roux, B. J. Phys. Chem. B 2009, 113, 2234-2246.
(6) Huang, N.; Kalyanaraman, C.; Bernacki, K.; Jacobson, M. P. Phys. Chem. Chem. Phys. 2006, 8, 5166-5177.
(7) Zamolo, L.; Salvalaglio, M.; Cavallotti, C.; Galarza, B.; Sadler, C.; Williams, S.; Hofer, S.; Horak, J.; Lindner, W. J. Phys. Chem. B 2010, 114, 9367-9380.
(8) Duren, T.; Bae, Y.; Snurr, R. Chem. Soc. Rev. 2009, 38, 1237-1247.
(9) Nel, A. E.; Madler, L.; Velegol, D.; Xia, T.; Hoek, E. M. V.; Somasundaran, P.; Klaessig, F.; Castranova, V.; Thompson, M. Nat. Mater. 2009, 8, 543-557.
(10) Liong, M.; Lu, J.; Kovochich, M.; Xia, T.; Ruehm, S. G.; Nel, A. E.; Tamanoi, F.; Zink, J. I. ACS Nano 2008, 2, 889-896.
(11) Shi, X.; Wang, S.; Shen, M.; Antwerp, M.; Chen, X.; Li, C.; Petersen, E.; Huang, Q.; Weber, W.; Baker, J. Biomacromolecules 2009, 10, 1744-1750.
(12) Nishide, H. Advanced Nanomaterials; Wiley-VCH, 2010.
(13) Yuan, H.; Zhang, S. Appl. Phys. Lett. 2010, 96, 033704.
(14) Fisher, E. H.; Rhodes, N. ARCHIVE: Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 1989-1996 (vols 203-210) 1996, 210, 91-94.
(15) Roache, P. J. Annu. Rev. Fluid Mech. 1997, 29, 123-160.
(16) Oberkampf, W. L.; Trucano, T. G. Prog. Aerosp. Sci. 2002, 38, 209-272.
(17) Johnson, F.; Tinoco, E.; Yu, N. Comput. Fluids 2005, 34, 1115-1151.
(18) Kellar, W.; Savill, A.; Dawes, W. High-Performance Computing And Networking, Proceedings 1999, 1593, 90-98.
(19) del Álamo, J. C.; Marsden, A. L.; Lasheras, J. C. Rev. Esp. Cardiol. 2009, 62, 781-805.
(20) Botar, C. C.; Vasile, T.; Sfrangeu, S.; Clichici, S.; Agachi, P. S.; Badea, R.; Mircea, P.; Cristea, M. V. In 20th European Symposium on Computer Aided Process Engineering; Elsevier, 2010; Vol. 28, pp. 205-210.
(21) Sampson, B. Prof. Eng. 2007, 20, 37.
(22) Oberkampf, W.; Trucano, T. Nucl. Eng. Des. 2008, 238, 716-743.
(23) Historical Perspective and Current Outlook for Molecular Dynamics As a Chemical Engineering Tool - Industrial & Engineering Chemistry Research (ACS Publications).
(24) Cencek, W.; Szalewicz, K. Int. J. Quantum Chem. 2008, 108, 2191-2198.
(25) Gurkan, B.; Goodrich, B.; Mindrup, E.; Ficke, L.; Massel, M.; Seo, S.; Senftle, T.; Wu, H.; Glaser, M.; Shah, J.; Maginn, E.; Brennecke, J.; Schneider, W. J. Phys. Chem. Lett. 2010, 1, 3494-3499.
(26) A Report of the National Science Foundation Blue Ribbon Panel on Simulation-Based Engineering Science: Revolutionizing Engineering Science through Simulation; National Science Foundation, 2006.
(27) WTEC Panel Report on International Assessment of Research and Development in Simulation-Based Engineering and Science. 2009.
(28) Shirts, M. R.; Pande, V. S. J. Chem. Phys. 2005, 122, 144107.
(29) Ytreberg, F. M.; Swendsen, R. H.; Zuckerman, D. M. J. Chem. Phys. 2006, 125, 184114.
(30) Efron, B.; Tibshirani, R. J. An Introduction to the Bootstrap; 1st ed.; Chapman and Hall/CRC, 1994.
(31) Abraham, M. J.; Gready, J. E. J. Comput. Chem. 2011, 32, 2031-2040.
(32) Wang, H.; Dommert, F.; Holm, C. J. Chem. Phys. 2010, 133, 034117.
(33) Petersen, H. G. J. Chem. Phys. 1995, 103, 3668.
(34) Jorgensen, W. L.; Tirado-Rives, J. J. Am. Chem. Soc. 1988, 110, 1657-1666.
(35) Hunenberger, P. H.; McCammon, J. A. J. Chem. Phys. 1999, 110, 1856.
(36) Pitera, J.; Van Gunsteren, W. Mol. Simulat. 2002, 28, 45-65.
(37) Bruckner, S.; Boresch, S. J. Comput. Chem. 2011, 32, 1320-1333.
(38) Resat, H.; Mezei, M. J. Chem. Phys. 1993, 99, 6052.
(39) Jorge, M.; Garrido, N. M.; Queimada, A. J.; Economou, I. G.; Macedo, E. A. J. Chem. Theory Comput. 2010, 6, 1018-1027.
(40) Zwanzig, R. W. J. Chem. Phys. 1955, 23, 1915.
(41) Wu, D.; Kofke, D. A. J. Chem. Phys. 2005, 123, 054103.
(42) Bennett, C. H. J. Comput. Phys. 1976, 22, 245-268.
(43) Barker, A. Aust. J. Phys. 1965, 18, 119-134.
(44) Fenwick, M. K.; Escobedo, F. A. J. Chem. Phys. 2004, 120, 3066.
(45) Shirts, M. R.; Chodera, J. D. J. Chem. Phys. 2008, 129, 124105.
(46) Lelievre, T.; Stoltz, G.; Rousset, M. Free Energy Computations: A Mathematical Perspective; 1st ed.; Imperial College Press, 2010.
(47) Schuttelkopf, A.; van Aalten, D. Acta Crystallogr. D 2004, 60, 1355-1363.
(48) OMEGA | OpenEye Scientific Software.
(49) acpype - Project Hosting on Google Code.
(50) Xiao, Y.; Li, D. Math. Method Oper. Res. 2008, 67, 443-454.
(51) Allen, M. P.; Tildesley, D. J. Computer Simulation of Liquids; New Ed.; Clarendon Press, 1989.
(52) Wu, X.; Brooks, B. R. Chem. Phys. Lett. 2003, 381, 512-518.
(53) Hess, B. J. Chem. Phys. 2002, 116, 209.
(54) Martyna, G. J.; Tuckerman, M. E.; Tobias, D. J.; Klein, M. L. Mol. Phys. 1996, 87, 1117.
(55) Kofke, D. Mol. Phys. 2006, 104, 3701-3708.
(56) Hummer, G.; Pratt, L. R.; García, A. E. J. Phys. Chem. 1996, 100, 1206-1215.
(57) Scott, D. W. Biometrika 1979, 66, 605-610.
(58) Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 1993, 98, 10089.
(59) Schlick, T. Molecular Modeling and Simulation; 1st ed.; Springer, 2002.
Appendix:
A1 Thermodynamic integration using cubic splines

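We fit the $\langle \partial U(\lambda)/\partial\lambda \rangle_i$ vs. $\lambda_i$ curve piecewise to a series of cubic polynomials $S_i(\lambda)$:

\[
S_i(\lambda) = a_i + b_i(\lambda-\lambda_i) + c_i(\lambda-\lambda_i)^2 + d_i(\lambda-\lambda_i)^3, \qquad 1 \le i \le K-1 \tag{A1}
\]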
Here K is the total number of intermediate states, creating K-1 intervals for splining. Each
spline of a given interval has its own set of coefficients ai, bi, ci and di which can be
computed by standard linear algebra methods using the conditions that define a natural
cubic spline.
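\[
S_i(\lambda_i) = a_i = \langle \partial U(\lambda)/\partial\lambda \rangle_i, \qquad 1 \le i \le K-1 \tag{A2}
\]
\[
S_i(\lambda_{i+1}) = \langle \partial U(\lambda)/\partial\lambda \rangle_{i+1}, \qquad 1 \le i \le K-1 \tag{A3}
\]
\[
S_i'(\lambda_{i+1}) = S_{i+1}'(\lambda_{i+1}), \qquad 1 \le i \le K-2 \tag{A4}
\]
\[
S_i''(\lambda_{i+1}) = S_{i+1}''(\lambda_{i+1}), \qquad 1 \le i \le K-2 \tag{A5}
\]
\[
S_1''(\lambda_1) = 0 \tag{A6}
\]
\[
S_{K-1}''(\lambda_K) = 0 \tag{A7}
\]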
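When we integrate piecewise over all the intervals we get the total free energy change.

\[
\Delta G = \sum_{i=1}^{K-1} \int_{\lambda_i}^{\lambda_{i+1}} d\lambda \, S_i(\lambda) \tag{A8}
\]
\[
\Delta G = \sum_{i=1}^{K-1} \left[ a_i(\lambda_{i+1}-\lambda_i) + \frac{b_i}{2}(\lambda_{i+1}-\lambda_i)^2 + \frac{c_i}{3}(\lambda_{i+1}-\lambda_i)^3 + \frac{d_i}{4}(\lambda_{i+1}-\lambda_i)^4 \right] \tag{A9}
\]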
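We need to write $\Delta G$ in the form described in Eq. 3, as a weighted sum of
$\langle \partial U(\lambda)/\partial\lambda \rangle_i$ at each $\lambda_i$, so that we can propagate the error using Eq. 4. We must
solve for the coefficients $a_i$, $b_i$, $c_i$, $d_i$ in such a way that they are expressed as
linear weighted sums of the individual $\langle \partial U(\lambda)/\partial\lambda \rangle_i$.

\[
\begin{pmatrix} a_1 \\ \vdots \\ a_{K-1} \end{pmatrix}
=
\begin{pmatrix}
A_{1,1} & \cdots & A_{1,K} \\
\vdots & \ddots & \vdots \\
A_{K-1,1} & \cdots & A_{K-1,K}
\end{pmatrix}
\begin{pmatrix}
\langle \partial U(\lambda)/\partial\lambda \rangle_1 \\ \vdots \\ \langle \partial U(\lambda)/\partial\lambda \rangle_K
\end{pmatrix} \tag{A10}
\]

Here $A_{i,j}$ are the weights in a weight matrix for $a_i$. Similarly $b_i$, $c_i$, and $d_i$ are
expressed as linear weighted sums of $\langle \partial U(\lambda)/\partial\lambda \rangle_i$ with $B_{i,j}$, $C_{i,j}$ and $D_{i,j}$ as the
weights in the respective K-1 x K matrices. There exists a unique solution for a, b, c
and d. The A, B, C and D matrices are all of rank K-1 and are invertible.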
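We can then finally combine these into a single weight matrix.
Let $h_i = \lambda_{i+1} - \lambda_i$.
Then Eq. A9 can be written as a linear weighted sum.

\[
\Delta G = \sum_{i=1}^{K-1} \sum_{j=1}^{K} \left( h_i A_{i,j} + \frac{h_i^2}{2} B_{i,j} + \frac{h_i^3}{3} C_{i,j} + \frac{h_i^4}{4} D_{i,j} \right) \langle \partial U(\lambda)/\partial\lambda \rangle_j \tag{A11}
\]

Eq. A11 can be further written as:

\[
\Delta G = \sum_{i=1}^{K-1} \sum_{j=1}^{K} W_{i,j} \, \langle \partial U(\lambda)/\partial\lambda \rangle_j \tag{A12}
\]

Here

\[
W_{i,j} = h_i A_{i,j} + \frac{h_i^2}{2} B_{i,j} + \frac{h_i^3}{3} C_{i,j} + \frac{h_i^4}{4} D_{i,j} \tag{A13}
\]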
Once we have the weights we can calculate the overall free energy change using Eq. A12
and the uncertainty estimate using the following equation.

\[
\delta^2(\Delta G) = \sum_{i=1}^{K-1} \sum_{j=1}^{K} W_{i,j}^2 \, \sigma_j^2 \tag{A14}
\]
An implementation of this weighting for TI using GROMACS is included in the

examples section of the pymbar distribution at https://simtk.org/home/pymbar.
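The weighting described in this appendix can be sketched in a few lines of pure Python. This is not the pymbar implementation. It is a minimal illustration that uses the linearity of the natural cubic spline: the integral of the spline through a unit vector at state j gives the total weight of state j directly, without forming the A, B, C and D matrices explicitly. The function names, the example λ values and the example variances are all hypothetical. The sketch also assumes that the states are statistically independent and sums the weights over intervals before squaring, which is one reading of the double sum in Eq. A14.

```python
import math


def _second_derivatives(h, y):
    """Second derivatives of the natural cubic spline at each knot."""
    n = len(y)
    m = [0.0] * n  # natural boundary conditions: m[0] = m[n-1] = 0
    if n < 3:
        return m
    # Tridiagonal system for the interior knots 1..n-2.
    sub = [h[i - 1] for i in range(1, n - 1)]
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
    sup = [h[i] for i in range(1, n - 1)]
    rhs = [6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
           for i in range(1, n - 1)]
    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, len(diag)):
        factor = sub[k] / diag[k - 1]
        diag[k] -= factor * sup[k - 1]
        rhs[k] -= factor * rhs[k - 1]
    x = [0.0] * len(diag)
    x[-1] = rhs[-1] / diag[-1]
    for k in range(len(diag) - 2, -1, -1):
        x[k] = (rhs[k] - sup[k] * x[k + 1]) / diag[k]
    m[1:n - 1] = x
    return m


def natural_spline_ti_weights(lambdas):
    """Weights w_j such that dG = sum_j w_j * <dU/dlambda>_j."""
    n = len(lambdas)
    h = [lambdas[i + 1] - lambdas[i] for i in range(n - 1)]
    weights = []
    for j in range(n):
        y = [1.0 if k == j else 0.0 for k in range(n)]  # unit vector
        m = _second_derivatives(h, y)
        integral = 0.0
        for i in range(n - 1):
            # Exact integral of the cubic piece on [lambda_i, lambda_{i+1}].
            integral += h[i] * (y[i] + y[i + 1]) / 2.0
            integral -= h[i] ** 3 * (m[i] + m[i + 1]) / 24.0
        weights.append(integral)
    return weights


def ti_estimate(lambdas, dudl_means, dudl_variances):
    """Free energy and uncertainty; variances are those of the mean values."""
    w = natural_spline_ti_weights(lambdas)
    dg = sum(wj * mean for wj, mean in zip(w, dudl_means))
    var = sum(wj ** 2 * v for wj, v in zip(w, dudl_variances))
    return dg, math.sqrt(var)


# Hypothetical data: a linear integrand 2*lambda + 1, whose integral is 2.
lams = [0.0, 0.25, 0.5, 0.75, 1.0]
means = [2.0 * lam + 1.0 for lam in lams]
variances = [0.04] * len(lams)
w = natural_spline_ti_weights(lams)
dg, err = ti_estimate(lams, means, variances)
print("weights:", w)
print("dG:", dg, "uncertainty:", err)
```

Because the weights depend only on the λ spacing, they can be computed once and reused for every replicate or bootstrap sample that shares the same set of intermediate states.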
A2 Supplementary information

Table S 1. Statistical uncertainty calculated using three different approaches (analytical
δ(ΔG), sample standard deviation σ(ΔG), and bootstrap δ(ΔG)bs) for dipole inversion

Method   ΔG        δ(ΔG)            σ(ΔG)            δ(ΔG)bs          % dev
                   ± δ(δ(ΔG))       ± δ(σ(ΔG))       ± δ(δ(ΔG)bs)     ± δ(% dev)
TI       -0.029    0.216 ± 0.002    0.201 ± 0.014    0.216 ± 0.012      7.4 ± 7.4
TI3      -0.030    0.219 ± 0.002    0.207 ± 0.013    0.220 ± 0.012      6.0 ± 6.9
DEXP     -0.001    0.592 ± 0.093    0.648 ± 0.041    0.579 ± 0.083     -8.6 ± 15.5
IEXP      0.058    0.629 ± 0.120    0.677 ± 0.040    0.617 ± 0.111     -7.1 ± 18.5
UBAR     -0.004    0.540 ± 0.092    0.601 ± 0.044    0.530 ± 0.083    -10.3 ± 16.6
BAR      -0.035    0.171 ± 0.002    0.204 ± 0.016    0.219 ± 0.012    -16.1 ± 6.6
RBAR     -0.034    0.172 ± 0.001    0.206 ± 0.016    0.219 ± 0.012    -16.6 ± 6.7
MBAR     -0.034    0.217 ± 0.002    0.202 ± 0.016    0.218 ± 0.012     -7.2 ± 8.4
GDEL     -0.200    0.371 ± 0.004    0.389 ± 0.029    0.372 ± 0.021     -4.7 ± 7.1
GINS      0.139    0.370 ± 0.004    0.374 ± 0.025    0.368 ± 0.021     -1.0 ± 6.7
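The % dev column appears to be the percentage deviation of the analytical uncertainty from the sample standard deviation. This reading is inferred from the tabulated values and is not a definition stated in this appendix. A minimal Python sketch applying that assumed definition to three rows of the dipole inversion table:

```python
def percent_deviation(analytical, sample_std):
    """Assumed definition: 100 * (analytical - sample std) / sample std."""
    return 100.0 * (analytical - sample_std) / sample_std


# (analytical uncertainty, sample standard deviation) in kJ/mol,
# transcribed from the dipole inversion table above.
rows = {
    "TI": (0.216, 0.201),
    "DEXP": (0.592, 0.648),
    "GDEL": (0.371, 0.389),
}

pct = {
    method: round(percent_deviation(analytical, sample_std), 1)
    for method, (analytical, sample_std) in rows.items()
}
for method, value in pct.items():
    print(method, value)
```

Values recomputed from the rounded table entries agree with the tabulated % dev only to within about a tenth of a percentage point, which is consistent with the tabulated column having been computed from unrounded data.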

Table S 2. Statistical uncertainty calculated using three different approaches (analytical
δ(ΔG), sample standard deviation σ(ΔG), bootstrap δ(ΔG)bs) for dipole inversion
using the sparse set. All quantities are in kJ/mol.

Method   ΔG        δ(ΔG)            σ(ΔG)            δ(ΔG)bs          % dev
                   ± δ(δ(ΔG))       ± δ(σ(ΔG))       ± δ(δ(ΔG)bs)     ± δ(% dev)
TI       -0.021    0.278 ± 0.004    0.257 ± 0.017    0.277 ± 0.016      8.2 ± 7.1
TI3      -0.023    0.297 ± 0.005    0.278 ± 0.017    0.297 ± 0.017      7.1 ± 6.6
DEXP     -0.394    2.048 ± 0.497    3.305 ± 0.231    2.287 ± 0.768    -38.0 ± 15.7
IEXP      0.425    2.052 ± 0.471    3.239 ± 0.229    2.262 ± 0.716    -36.7 ± 15.2
UBAR     -0.377    2.024 ± 0.487    3.230 ± 0.193    2.245 ± 0.737    -37.3 ± 15.5
BAR       0.000    0.367 ± 0.005    0.363 ± 0.026    0.383 ± 0.020      1.2 ± 7.5
RBAR      0.002    0.370 ± 0.005    0.368 ± 0.026    0.387 ± 0.026      0.5 ± 7.2
MBAR     -0.011    0.384 ± 0.005    0.355 ± 0.026    0.383 ± 0.020      8.0 ± 7.9
GDEL     -0.082    0.935 ± 0.020    0.894 ± 0.057    0.925 ± 0.051      4.6 ± 7.0
GINS      0.055    0.762 ± 0.014    0.833 ± 0.067    0.773 ± 0.041     -8.5 ± 7.5

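Table S 3. Statistical uncertainty calculated using three different approaches (analytical
δ(ΔG), sample standard deviation σ(ΔG), and bootstrap δ(ΔG)bs) for anthracene solvation

Method   ΔG         δ(ΔG)            σ(ΔG)            δ(ΔG)bs          % dev
                    ± δ(δ(ΔG))       ± δ(σ(ΔG))       ± δ(δ(ΔG)bs)     ± δ(% dev)
TI        -6.433    0.178 ± 0.003    0.208 ± 0.018    0.178 ± 0.010    -14.3 ± 7.7
TI3       -6.454    0.182 ± 0.003    0.214 ± 0.016    0.182 ± 0.010    -15.1 ± 6.7
DEXP      -6.508    0.432 ± 0.172    0.524 ± 0.053    0.425 ± 0.164    -17.6 ± 33.9
IEXP      -6.506    0.211 ± 0.005    0.236 ± 0.020    0.209 ± 0.011    -10.9 ± 7.9
UBAR      -6.473    0.155 ± 0.002    0.216 ± 0.019    0.186 ± 0.009    -28.4 ± 6.5
BAR       -6.531    0.128 ± 0.002    0.207 ± 0.018    0.175 ± 0.009    -38.4 ± 5.3
RBAR      -6.529    0.128 ± 0.002    0.208 ± 0.017    0.175 ± 0.009    -38.5 ± 5.2
MBAR      -6.517    0.171 ± 0.002    0.199 ± 0.018    0.171 ± 0.009    -13.8 ± 7.9
GDEL      -9.957    0.186 ± 0.004    0.238 ± 0.019    0.219 ± 0.013    -22.2 ± 6.4
GINS     -14.631    0.333 ± 0.006    0.318 ± 0.022    0.302 ± 0.018      4.7 ± 7.4
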
Table S 4. Statistical uncertainty calculated using three different approaches (analytical
δ(ΔG), sample standard deviation σ(ΔG), and bootstrap δ(ΔG)bs) for anthracene solvation
using sparse set. All quantities are in kJ/mol.

Method   ΔG          δ(ΔG)            σ(ΔG)            δ(ΔG)bs          % dev
                     ± δ(δ(ΔG))       ± δ(σ(ΔG))       ± δ(δ(ΔG)bs)     ± δ(% dev)
TI         -6.841    0.292 ± 0.007    0.302 ± 0.021    0.292 ± 0.014     -3.3 ± 7.0
TI3        -6.237    0.288 ± 0.006    0.307 ± 0.023    0.288 ± 0.014     -6.2 ± 7.3
DEXP      -12.399    1.774 ± 0.581    3.652 ± 0.387    2.057 ± 1.162    -51.4 ± 16.7
IEXP       -5.571    1.267 ± 0.332    1.682 ± 0.134    1.373 ± 0.503    -24.7 ± 20.6
UBAR       -6.221    0.828 ± 0.084    0.808 ± 0.050    0.847 ± 0.088      2.5 ± 12.1
BAR        -6.436    0.335 ± 0.008    0.361 ± 0.028    0.357 ± 0.019     -7.1 ± 7.4
RBAR       -6.439    0.338 ± 0.008    0.362 ± 0.027    0.356 ± 0.026     -6.7 ± 7.3
MBAR       -6.439    0.357 ± 0.008    0.357 ± 0.028    0.357 ± 0.019      0.1 ± 8.2
GDEL      -25.257    0.315 ± 0.009    0.364 ± 0.027    0.402 ± 0.026    -13.3 ± 6.8
GINS     -201.380    4.390 ± 0.146    6.016 ± 0.396    6.462 ± 0.729    -27.0 ± 5.4
Table S 5. Free energies estimated using a large number of samples and a large number of
intermediate states for dipole inversion.

Method   (ΔG ± δ(ΔG))450,full   (ΔG ± δ(ΔG))450,sp   (ΔG ± δ(ΔG))51
TI       -0.030 ± 0.022         -0.023 ± 0.027       -0.128 ± 0.093
TI3      -0.029 ± 0.022         -0.025 ± 0.029       -0.123 ± 0.093
DEXP     -0.012 ± 0.063         -0.566 ± 0.632       -0.113 ± 0.094
IEXP      0.067 ± 0.069          0.440 ± 0.455       -0.161 ± 0.101
UBAR     -0.004 ± 0.057         -0.332 ± 0.585       -0.130 ± 0.092
BAR      -0.032 ± 0.022         -0.001 ± 0.038       -0.130 ± 0.092
RBAR     -0.032 ± 0.022          0.009 ± 0.038       -0.144 ± 0.093
MBAR     -0.030 ± 0.022         -0.005 ± 0.038       -0.130 ± 0.092
GDEL     -0.056 ± 0.036         -0.101 ± 0.097       -0.106 ± 0.094
GINS     -0.002 ± 0.035          0.089 ± 0.075       -0.152 ± 0.101
Table S 6. Bias in free energy estimates due to the number of samples and the number of
intermediate states for dipole inversion. None of the methods show a significantly large
bias for this molecular test set.

Method   (Bias1 ± δ(Bias1))450,full   (Bias2 ± δ(Bias2))51,full   (Bias1 ± δ(Bias1))450,sp   (Bias2 ± δ(Bias2))51,sp
TI       -0.001 ± 0.031                0.097 ± 0.095               0.001 ± 0.039              0.106 ± 0.097
TI3      -0.006 ± 0.031                0.088 ± 0.095               0.001 ± 0.041              0.098 ± 0.097
DEXP      0.002 ± 0.087                0.103 ± 0.112               0.167 ± 0.667             -0.285 ± 0.231
IEXP     -0.003 ± 0.094                0.225 ± 0.120              -0.010 ± 0.501              0.592 ± 0.234
UBAR     -0.012 ± 0.079                0.114 ± 0.107              -0.016 ± 0.621             -0.218 ± 0.228
BAR       0.001 ± 0.028                0.098 ± 0.094               0.000 ± 0.053              0.129 ± 0.099
RBAR     -0.002 ± 0.028                0.110 ± 0.094              -0.013 ± 0.053              0.141 ± 0.100
MBAR     -0.002 ± 0.031                0.098 ± 0.095              -0.002 ± 0.054              0.123 ± 0.100
GDEL      0.007 ± 0.053                0.058 ± 0.102               0.016 ± 0.135              0.021 ± 0.133
GINS     -0.013 ± 0.050                0.137 ± 0.107              -0.036 ± 0.107              0.205 ± 0.127
Table S 7. RMSEs and statistical uncertainties in RMSEs in the free energy change of dipole
inversion. Values for IEXP, DEXP, UBAR, GDEL, and GINS are significantly higher (indicating
lower reliability) than for the other methods, specifically in the sparse set.

Method   (RMSE1 ± δ(RMSE1))full   (RMSE2 ± δ(RMSE2))full   (RMSE1 ± δ(RMSE1))sp   (RMSE2 ± δ(RMSE2))sp
TI       0.284 ± 0.080            0.297 ± 0.091            0.368 ± 0.097          0.378 ± 0.116
TI3      0.290 ± 0.082            0.302 ± 0.090            0.395 ± 0.104          0.404 ± 0.117
DEXP     0.834 ± 0.281            0.839 ± 0.284            3.591 ± 1.615          3.597 ± 1.624
IEXP     0.887 ± 0.290            0.906 ± 0.318            3.426 ± 1.769          3.459 ± 1.799
UBAR     0.770 ± 0.273            0.779 ± 0.277            3.532 ± 1.545          3.536 ± 1.551
BAR      0.251 ± 0.094            0.265 ± 0.103            0.494 ± 0.150          0.503 ± 0.172
RBAR     0.252 ± 0.093            0.270 ± 0.106            0.500 ± 0.151          0.512 ± 0.177
MBAR     0.284 ± 0.082            0.298 ± 0.092            0.506 ± 0.144          0.514 ± 0.166
GDEL     0.525 ± 0.151            0.526 ± 0.156            1.250 ± 0.346          1.250 ± 0.347
GINS     0.485 ± 0.146            0.499 ± 0.164            1.069 ± 0.366          1.076 ± 0.403
Table S 8. Free energies estimated using a large number of samples and a large number of
intermediate states for anthracene hydration free energy.

Method   (ΔG ± δ(ΔG))450,full   (ΔG ± δ(ΔG))450,sp    (ΔG ± δ(ΔG))51,full
TI        -6.422 ± 0.016          -6.862 ± 0.028       -6.306 ± 0.100
TI3       -6.443 ± 0.016          -6.258 ± 0.027       -6.285 ± 0.100
DEXP      -6.433 ± 0.048          -7.691 ± 1.700       -6.367 ± 0.113
IEXP      -6.527 ± 0.019          -6.419 ± 0.266       -6.192 ± 0.096
UBAR      -6.468 ± 0.016          -6.303 ± 0.089       -6.242 ± 0.098
BAR       -6.526 ± 0.016          -6.482 ± 0.033       -6.276 ± 0.099
RBAR      -6.524 ± 0.016          -6.484 ± 0.033       -6.273 ± 0.100
MBAR      -6.512 ± 0.015          -6.483 ± 0.033       -6.237 ± 0.095
GDEL      -9.930 ± 0.020         -25.239 ± 0.041       -6.711 ± 0.110
GINS     -14.652 ± 0.030        -201.768 ± 0.681       -6.595 ± 0.096
Table S 9. GDEL and GINS have the largest bias in free energy estimates for anthracene
solvation due to the number of samples and the number of intermediate states, even for the
full set. For the sparse set GDEL and GINS have even larger bias than for the full set.
DEXP and IEXP also show significantly large biases in the sparse set.

Method   (Bias1 ± δ(Bias1))450,full   (Bias2 ± δ(Bias2))51,full   (Bias1 ± δ(Bias1))450,sp   (Bias2 ± δ(Bias2))51,sp
TI       -0.009 ± 0.024               -0.125 ± 0.102               0.021 ± 0.040               -0.535 ± 0.104
TI3      -0.010 ± 0.024               -0.168 ± 0.102               0.021 ± 0.039                0.048 ± 0.104
DEXP     -0.079 ± 0.067               -0.145 ± 0.122              -4.708 ± 1.710               -6.032 ± 0.218
IEXP      0.019 ± 0.029               -0.316 ± 0.098               0.848 ± 0.296                0.621 ± 0.162
UBAR     -0.005 ± 0.022               -0.231 ± 0.099               0.082 ± 0.122                0.021 ± 0.129
BAR      -0.008 ± 0.020               -0.257 ± 0.100               0.046 ± 0.047               -0.160 ± 0.105
RBAR     -0.004 ± 0.020               -0.256 ± 0.101               0.045 ± 0.047               -0.166 ± 0.106
MBAR     -0.005 ± 0.023               -0.280 ± 0.097               0.044 ± 0.049               -0.202 ± 0.101
GDEL     -0.028 ± 0.028               -3.247 ± 0.112              -0.018 ± 0.052              -18.546 ± 0.114
GINS      0.025 ± 0.045               -8.032 ± 0.102               0.388 ± 0.810             -194.785 ± 0.450
Table S 10. High RMSEs and statistical uncertainties in RMSEs for GDEL and GINS in
both the full and the sparse set indicate low reliability. DEXP, IEXP, and UBAR also
become unreliable in the sparse set.

Method   (RMSE1 ± δ(RMSE1))full   (RMSE2 ± δ(RMSE2))full   (RMSE1 ± δ(RMSE1))sp   (RMSE2 ± δ(RMSE2))sp
TI       0.257 ± 0.100            0.279 ± 0.118            0.404 ± 0.124            0.637 ± 0.241
TI3      0.264 ± 0.099            0.303 ± 0.127            0.401 ± 0.126            0.403 ± 0.126
DEXP     0.636 ± 0.310            0.651 ± 0.300            5.915 ± 2.096            6.958 ± 2.297
IEXP     0.297 ± 0.110            0.415 ± 0.164            2.125 ± 0.875            2.054 ± 0.855
UBAR     0.242 ± 0.110            0.322 ± 0.141            1.125 ± 0.314            1.123 ± 0.309
BAR      0.218 ± 0.108            0.317 ± 0.156            0.469 ± 0.160            0.491 ± 0.168
RBAR     0.218 ± 0.107            0.316 ± 0.156            0.471 ± 0.163            0.496 ± 0.167
MBAR     0.246 ± 0.096            0.355 ± 0.148            0.484 ± 0.156            0.518 ± 0.170
GDEL     0.285 ± 0.109            3.251 ± 0.241            0.454 ± 0.160           18.551 ± 0.364
GINS     0.444 ± 0.128            8.041 ± 0.321            6.948 ± 2.711          194.729 ± 6.032
Figures S1-S4 contain subplots comparing the following:

(1) Histogram (free energy distribution) generated from the ensemble of 100 free
energies from independent simulations (blue).
(2) Gaussian plotted using the free energy estimate as the mean and the analytical
estimate of the uncertainty as the standard deviation (black).
(3) Gaussian plotted with the free energy estimated using a large number of samples
as the mean and the corresponding uncertainty as the standard deviation (magenta).
(4) Gaussian plotted with the free energy estimated using a large number of
intermediate states as the mean and the corresponding uncertainty as the standard
deviation (cyan).
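The comparison in items (1) and (2) can be sketched with the Python standard library alone. The 100 free energies below are synthetic values drawn from a normal distribution purely for illustration (they are not benchmark data), and the mean, uncertainty, bin count and function names are hypothetical choices. The sketch builds a normalized histogram and evaluates the Gaussian at the bin centers, which is the information each subplot overlays.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Normal probability density with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def histogram_density(samples, n_bins):
    """Bin centers and densities of a histogram normalized to unit area."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        # The largest sample falls on the upper edge; keep it in the last bin.
        k = min(int((s - lo) / width), n_bins - 1)
        counts[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    density = [c / (len(samples) * width) for c in counts]
    return centers, density


# Synthetic stand-in for 100 independent free energy estimates (kJ/mol).
random.seed(0)
estimates = [random.gauss(-0.03, 0.2) for _ in range(100)]

# Hypothetical estimate and analytical uncertainty for the Gaussian overlay.
estimate_mean, analytical_std = -0.03, 0.2

centers, density = histogram_density(estimates, n_bins=10)
for center, dens in zip(centers, density):
    model = gaussian_pdf(center, estimate_mean, analytical_std)
    print(f"{center:8.3f}  histogram {dens:6.3f}  gaussian {model:6.3f}")
```

A Gaussian that is visibly narrower than the histogram indicates that the corresponding uncertainty estimate underestimates the true spread, and a Gaussian that is shifted relative to the histogram indicates bias.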
Figure S 1. Plots comparing the distribution of estimated free energies with the
Gaussians for full set for dipole inversion.
Figure S 2. Plots comparing the distribution of estimated free energies with the
Gaussians for sparse set for dipole inversion.
Figure S 3. Plots comparing the distribution of estimated anthracene solvation free
energies with the Gaussians for full set.
Figure S 4. Plots comparing the distribution of estimated anthracene solvation free
energies with the Gaussians for sparse set.
Figure S 5. Accuracy in free energy estimates of dipole inversion as a function of
different PME parameters.
Figure S 6. Accuracy in the estimate of enthalpy for dipole inversion process as a
function of different PME parameters.
Figure S 7. Simulation time for dipole inversion as a function of different PME
parameters.
Figure S 8. Subplot row 2, column 2 in Figure S5 is expanded to explore exactly how
the ΔG estimate of dipole inversion varies for different Fourier spacings and cutoffs.
Figure S 9. Subplot row 2, column 2 in Figure S7 is expanded to explore exactly how
simulation time for dipole inversion varies for different Fourier spacings and cutoffs.
Figure S 10. Accuracy in anthracene solvation free energy estimate as a function of
different PME parameters.
Figure S 11. Accuracy in anthracene solvation enthalpy estimates as a function of
different PME parameters.
Figure S 12. Simulation time for anthracene solvation as a function of different PME
parameters.
Figure S 13. Variation of the ΔG estimate of anthracene solvation for different Fourier
spacings and cutoffs.
Figure S 14. Subplot at row 2, column 2 in Figure S12 is expanded to explore exactly
how simulation time for anthracene solvation varies for different Fourier spacings and
cutoffs.
A3 Availability of datasets for other simulation packages

The benchmark test set is available for distribution and use at
http://www.alchemistry.org. It contains starting configurations and parameter files for all
100 uncorrelated starting configurations in three formats, corresponding to three different
molecular dynamics packages, GROMACS (*.gro, *.top & *.mdp), AMBER (*.inpcrd,
*.prmtop, *.in) and DESMOND (*.cms & *.cfg), as well as detailed instructions for the
test set's use. To ensure that the parameter files are correctly constructed, we have
calculated the single point energies of the 100 structures using the input files and the
corresponding MD package. We invite users of other simulation packages to contact us in
order to add energy comparisons for those packages. More information is provided in a
README file reproduced in the next section.
This is intended to be the first version of our benchmark test set. It is not comprehensive,
and will require significant further expansion to be useful to a wider range of
researchers. Future versions will be developed in response to feedback. For example,
future versions of the benchmark will ideally include molecules with long correlation
times for internal motion, free energies of well studied ligand-binding systems, and
increasingly tractable protein-ligand binding systems, such as T4 lysozyme.
Readme for free energy benchmark V1.0

A3.1. Benchmark Test Set Energies:
The benchmark set contains starting configurations and parameter files for all 100
uncorrelated starting configurations in three formats, corresponding to three different
molecular dynamics packages, GROMACS (*.gro, *.top & *.mdp), AMBER (*.inpcrd,
*.prmtop, *.in) and DESMOND (*.cms & *.cfg). Additionally, we have calculated the
single point energies of the 100 structures using the input files and the corresponding MD
package. Coordinates are matched to 0.00001 Angstroms, the highest precision that all of
the programs can distinguish.
A3.2. Difficulties in Exact Energy Matching:
Significantly, single point configuration energies calculated with typical run parameters
for different MD programs do not match each other. Each simulation package differs in
the way long-range energy corrections are implemented for both van der Waals and
Coulomb interactions, and each uses different tapering functions. Additionally, the choice
of combination rules makes it difficult to compare force field parameters directly without
significant manipulation of input files. Even the short range van der Waals and Coulomb
interactions can differ, as each MD program uses a different scheme to perform cutoffs
over charge groups straddling the cutoff. We have therefore reported two different sets of
single point energies from single point simulations: one with parameters that best match
those used in the benchmark set tests (Section A3.3) and one with long cutoffs
(Section A3.4).
A3.2.1. Agreement in combination rule: AMBER can only use Lorentz-Berthelot
combination rules (arithmetic mean for $\sigma_{ij}$ and geometric mean for $\epsilon_{ij}$), while DESMOND
and GROMACS can also use the geometric combination rule (geometric mean for
both $\sigma_{ij}$ and $\epsilon_{ij}$). The simulation tests performed here used the
geometric combination rule. In order to obtain equivalent energies in AMBER, we
tailored $\sigma_i$ and $\sigma_j$ such that the combined $\sigma_{ij}$ from Lorentz-Berthelot rules matches the $\sigma_{ij}$
obtained from geometric combining rules.
For both methane solvation and dipole inversion, there are only two particle types in
each simulation: m (corresponding to solute) and o (corresponding to water oxygen).
We calculate an effective $\sigma_{mE}$ as follows: by geometric combining rules,
$(\sigma_{mo})_G = (\sigma_m \sigma_o)^{1/2}$, and with Lorentz-Berthelot rules
$(\sigma_{mo})_{LB} = (\sigma_{mE} + \sigma_o)/2$. Setting $(\sigma_{mo})_G = (\sigma_{mo})_{LB}$, we get
$\sigma_{mE} = 2(\sigma_{mo})_G - \sigma_o$. For both of these cases there are no solute-solute terms, so the
total energy remains the same.
However, for anthracene solvation we have 3 particle types: CH (corresponding to
aromatic carbon with two aromatic carbon neighbors and one hydrogen neighbor), C
(corresponding to aromatic carbon with three aromatic carbon neighbors) and o
(corresponding to water oxygen). We now have 3 $\sigma_{ij}$ terms to match: $\sigma_{Co}$, $\sigma_{CHo}$, and $\sigma_{CHC}$.
Among the three it is possible to match only two. Since anthracene has a nearly rigid
structure, deviations in $\sigma_{CHC}$ will alter the 1-4 intramolecular interaction by
approximately 7 kJ/mol. Aside from slight twisting allowed by improper dihedrals, this
contribution will remain essentially fixed for all configurations of the system. We
therefore choose to match $\sigma_{CHo}$ and $\sigma_{Co}$ rather than $\sigma_{CHC}$, by calculating $\sigma_{CE}$ and $\sigma_{CHE}$.
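The effective-σ construction can be checked numerically. In the Python sketch below, the σ values are illustrative numbers in nm chosen only to exercise the formulas; they are not taken from the benchmark parameter files, and the function names are hypothetical. The first part verifies that the effective solute σ reproduces the geometric cross term under Lorentz-Berthelot mixing. The second part shows why, with three particle types, matching both solute-water terms leaves a residual mismatch in the solute-solute cross term.

```python
import math


def geometric(sigma_i, sigma_j):
    """Geometric-mean combination rule for sigma."""
    return math.sqrt(sigma_i * sigma_j)


def lorentz_berthelot(sigma_i, sigma_j):
    """Arithmetic-mean (Lorentz-Berthelot) combination rule for sigma."""
    return 0.5 * (sigma_i + sigma_j)


def effective_sigma(sigma_solute, sigma_o):
    """Solute sigma whose arithmetic mean with sigma_o equals the geometric
    cross term: sigma_E = 2 * sqrt(sigma_solute * sigma_o) - sigma_o."""
    return 2.0 * geometric(sigma_solute, sigma_o) - sigma_o


# Illustrative values in nm (hypothetical, not the benchmark parameters).
sigma_m, sigma_o = 0.373, 0.315

# Two particle types: the single solute-water cross term can be matched exactly.
sigma_mE = effective_sigma(sigma_m, sigma_o)
two_type_error = lorentz_berthelot(sigma_mE, sigma_o) - geometric(sigma_m, sigma_o)
print("two-type mismatch:", two_type_error)

# Three particle types: match C-o and CH-o, then inspect the CH-C cross term.
sigma_C, sigma_CH = 0.355, 0.375
sigma_CE = effective_sigma(sigma_C, sigma_o)
sigma_CHE = effective_sigma(sigma_CH, sigma_o)
chc_mismatch = lorentz_berthelot(sigma_CE, sigma_CHE) - geometric(sigma_C, sigma_CH)
print("CH-C mismatch (nm):", chc_mismatch)
```

Algebraically the residual equals $-(\sqrt{\sigma_C}-\sqrt{\sigma_o})(\sqrt{\sigma_{CH}}-\sqrt{\sigma_o})$, so it vanishes only if one of the solute types has the same σ as the water oxygen.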
A3.2.2. Agreement in Cutoff
All simulation packages use slightly different potential tapering functions, so it is
impossible to match simulations between different packages using tapering alone. Even
with strict cutoffs, however, there are difficulties. GROMACS uses a group-based
cutoff, so cutoffs of equal length in DESMOND and AMBER are not equivalent.
Because of the implementation of charge-group-dependent cutoffs in GROMACS, we
have used a switching potential with a very small switching distance (0.000001 nm),
which approximates a strict cutoff in GROMACS; switching distances any smaller do
not further change the energy.
A3.2.3. Agreement in PME Parameters:
Particle Mesh Ewald implementations are sufficiently different between the codes that
cutoffs that are in theory equivalent give energies that may differ by up to 40-50 kJ/mol.
However, if longer cutoffs are used, the differences are significantly reduced, to the 0.5-2
kJ/mol level.
A3.3. Best Matches to Parameters used in the Benchmark Set Tests:
Energies were calculated by running single point MD runs with AMBER and
DESMOND parameters which best approximate the parameters used in our molecular
dynamics simulations using GROMACS (Table A 1).

Table A 1. Single point simulation parameters for MD_sim_parm energies for methane
solvation, dipole inversion, and anthracene solvation. The second set of parameters for
the number of grid points is for dipole inversion, which had a larger box.

Setting            GROMACS                          AMBER                           DESMOND
PME                switch 0.88nm, cutoff 0.9nm      cutoff = 0.9 nm                 taper 0.88nm, cutoff 0.9nm
Fourier spacing    0.12nm
Grid points        nkx 32, nky 32, nkz 32           nfft1 32, nfft2 32, nfft3 32    n_k = [32 32 32]
Grid points        nkx 36, nky 36, nkz 36           nfft1 36, nfft2 36, nfft3 36    n_k = [36 36 36]
Order of spline    4                                4                               order = [4 4 4]
Ewald setting      Ewald_tolerance 1.0e-08          Ew_coeff 0.43                   r_spread = 4.0
vdW                switch 0.8nm, cutoff 0.9nm       cutoff = 0.9 nm                 taper 0.88nm, cutoff 0.9nm
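GROMACS and AMBER specify the Ewald splitting differently: GROMACS takes a tolerance, while AMBER's Ew_coeff is the Ewald coefficient itself in inverse Angstroms. Assuming the GROMACS definition of ewald_rtol as the relative strength of the Ewald-shifted direct potential at the cutoff, erfc(β r_c) = ewald_rtol, the coefficient β implied by a tolerance and cutoff can be obtained by bisection. The Python sketch below is illustrative; the function name is hypothetical, and the result should be read as an approximate correspondence between the two conventions rather than the procedure used to choose the tabulated value.

```python
import math


def ewald_beta(rcut, rtol, lo=0.0, hi=100.0, iterations=200):
    """Solve erfc(beta * rcut) = rtol for beta by bisection.

    erfc decreases monotonically with beta, so the root is bracketed by
    lo (erfc = 1) and a sufficiently large hi (erfc close to 0).
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * rcut) > rtol:
            lo = mid  # potential still too strong at the cutoff: raise beta
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Short-cutoff settings from Table A 1: 0.9 nm cutoff, tolerance 1.0e-08.
beta_per_nm = ewald_beta(rcut=0.9, rtol=1.0e-8)
beta_per_angstrom = beta_per_nm / 10.0  # 1 nm = 10 Angstrom

print("beta (1/nm):", beta_per_nm)
print("beta (1/Angstrom):", beta_per_angstrom)
```

Small differences between the coefficient obtained this way and the value entered in another package are one reason the reciprocal-space energies do not agree exactly between codes.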
A3.4. Long Cutoff Parameters

A second series of single point energy calculations was performed using simulation
parameters that match as closely as possible across packages. Single point energies were
calculated with increased cutoffs, smaller Fourier spacing, higher order spline
interpolation, the same Fourier space vectors nx, ny, nz, and the same cutoff scheme in
all three packages. The GROMACS van der Waals switching function is used to obtain a
strict, non-group-based cutoff. The PME cutoff is kept slightly larger than the van der
Waals cutoff to match rlist, as required by GROMACS. PME cutoffs are as long as the box
sizes allow for all three programs before the simulation fails to run in one of the
programs (in this case, AMBER).
Table A 2. Single point simulation parameters for high cutoff energies for methane
solvation and anthracene solvation.

Setting            GROMACS                               AMBER                           DESMOND
PME                switch 1.19999999nm, cutoff 1.2nm     cutoff = 1.2 nm                 cutoff 1.2nm
Grid points        nkx 54, nky 54, nkz 54                nfft1 50, nfft2 50, nfft3 50    n_k = [50 50 50]
Order of spline    6                                     6                               order = [6 6 6]
Ewald setting      Ewald tol. 10e-08                     Ew_coeff 0.43                   r_spread = 4.0
vdW                switch 1.19999999nm, cutoff 1.2nm     cutoff = 1.2 nm                 cutoff 1.2nm

Table A 3. Single point simulation parameters for high cutoff energies for dipole
inversion.

Setting            GROMACS                               AMBER                           DESMOND
PME                cutoff 1.6nm                          cutoff = 1.5 nm                 cutoff 1.5nm
Grid points        nkx 60, nky 60, nkz 60                nfft1 60, nfft2 60, nfft3 60    n_k = [60 60 60]
Order of spline    6                                     6                               order = [6 6 6]
Ewald setting      Ewald tol. 10e-08                     Ew_coeff 0.43                   r_spread = 4.0
vdW                switch 1.49999999nm, cutoff 1.5nm     cutoff = 1.5 nm                 cutoff 1.5nm
A3.5. Format of output:

All energies correspond to an unconstrained start with the input coordinate files. These
energies are reported in two different files for each set of parameters. One file
(final_onlypot.txt) contains only potential energies corresponding to each configuration
calculated in GROMACS, AMBER, and DESMOND. The second file (final_full.txt) has
a breakdown of the potential energy into its components: bond energy, angle energy,
dihedral energy, Lennard-Jones short range energy, Lennard-Jones dispersion correction
energy beyond the cutoff, total Lennard-Jones energy, 1-4 Lennard-Jones interaction
energy, Coulomb short range interaction energy, Coulomb interaction energy in
reciprocal space, total Coulomb interaction energy and 1-4 Coulomb interaction energy.
Some MD packages do not print out all the energy components in their output, or add two
or three components together into a single term. Comparisons are therefore not always
possible between all the energy components for all packages.
A3.6. File Organization:
The organization of the distribution is as follows. There are four .tgz files,
energy_comparisons.tgz, GROMACS.tgz, AMBER.tgz, DESMOND.tgz, containing the
single point energy evaluation results and files for generating the single point energy
evaluations using the three molecular simulation packages. The files are laid out as
follows:

energy_comparisons/
    Methane_Solvation/
        final_full_shortcutoff.txt
        final_onlypot_shortcutoff.txt
        final_full_longcutoff.txt
        final_onlypot_longcutoff.txt
    Dipole_Inversion/
        final_onlypot_longcutoff.txt
    Anthracene_Inversion/
        final_onlypot_longcutoff.txt
GROMACS/
    Methane_Solvation/
        gros/
        ms.mdp
        shortcutoff.mdp
        longcutoff.mdp
    Dipole_Inversion/
        gros/
        di.top
        shortcutoff.mdp
        longcutoff.mdp
    Anthracene_Solvation/
        gros/
        as.top
        shortcutoff.mdp
        longcutoff.mdp
AMBER/
    Methane_Solvation/
        crds/
        ms.prmtop
        shortcutoff_md.in
        longcutoff_md.in
    Dipole_Inversion/
        crds/
        di.prmtop
        shortcutoff_md.in
        longcutoff_md.in
    Anthracene_Solvation/
        crds/
        as.prmtop
        shortcutoff_md.in
        longcutoff_md.in
DESMOND/
    Methane_Solvation/
        cmss/
        shortcutoff.cfg
        longcutoff.cfg
    Dipole_Inversion/
        cmss/
        shortcutoff.cfg
        longcutoff.cfg
    Anthracene_Solvation/
        cmss/
        shortcutoff.cfg
        longcutoff.cfg
