TorchMD: A Deep Learning Framework for Molecular Simulations
Ishan Chokshi
BE19B018
July - November 2021
Contents
1 Objective
2 Motivation
3 Method/Procedure
3.1 TorchMD - End to End Simulation
3.2 Training and Simulation of Coarse-Grained Chignolin
4 Advantages
5 Disadvantages
6 Conclusion
7 Reference
1 Objective
A molecular dynamics engine based on PyTorch is proposed that combines
classical and machine learning potentials. This paradigm may be used to improve
on the empirical potentials now in use by means of data-driven models. This
study describes TorchMD's capabilities, highlighting the functional forms that
are supported as well as a successful fitting technique for data-driven DNN
potentials. The approach has been shown to have a variety of practical
applications, ranging from molecular-level simulations and end-to-end parameter
learning to the training of neural network potentials that can be applied in
coarse-grained simulations.
2 Motivation
Molecular dynamics (MD) simulations have matured into a technology that can
be used to explore macromolecular structure-to-function relationships
efficiently. Achievable simulation times are now close to biologically
relevant timescales. The amount of data collected on the dynamic
characteristics of macromolecules is sufficient to shift the traditional
paradigm of structural bioinformatics from investigating single structures to
analysing conformational ensembles.
However, MD simulations face two notable difficulties. The first is the
calculation of the parameters stored in force field libraries, which
traditionally consumes a lot of time and requires significant modification to
tune those parameters.
There is a need for a single framework that can leverage neural network and
machine learning potentials and enable quick and accurate MD simulations.
3 Method/Procedure
3.1 TorchMD - End to End Simulation
At first sight, TorchMD is a typical molecular dynamics programme. It supports
NVT ensemble simulations with a Langevin thermostat. The initial atomic
velocities are drawn from the Maxwell-Boltzmann distribution. The velocity
Verlet method is used for integration, the reaction field approach is used to
approximate long-range electrostatics, and the L-BFGS method is used for
minimization.
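The velocity Verlet scheme named above can be illustrated with a minimal sketch in plain Python. It is not TorchMD's integrator: it propagates a single particle in a one-dimensional harmonic well, it omits the Langevin thermostat, and the names `harmonic_force`, `velocity_verlet`, and `total_energy`, as well as the spring constant and time step, are assumptions chosen for illustration.

```python
def harmonic_force(x, k=1.0):
    """Force of an illustrative 1D harmonic well U(x) = 0.5 * k * x**2."""
    return -k * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, steps=1000):
    """Advance position x and velocity v with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # first half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, harmonic_force)
energy_drift = abs(total_energy(x1, v1) - total_energy(x0, v0))
```

For a small time step the total energy should stay nearly constant over the run, which is the usual quick check that an integrator of this kind is implemented correctly.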
In the next step, a force field file is loaded, and the topology file is used
to extract the relevant parameters for the simulation. The force field file is
in YAML format, which is easy to read. TorchMD can read force field parameters
from the AMBER force field, and it also allows a user to enter the force field
parameters manually into the YAML file. A snapshot of a sample file for water
is attached below:
Figure 2: A sample force field file for water in YAML format
In the next step, a System object is created which will contain the state of
the system during the simulation, including:
(a) The current atom coordinates
(b) The current box size
(c) The current atom velocities (obtained from the Maxwell-Boltzmann
distribution)
(d) The current atom forces
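The four pieces of state listed above, and the Maxwell-Boltzmann initialisation of the velocities, can be sketched as follows. This is a minimal plain-Python sketch and not TorchMD's System class: `SystemState`, `maxwell_boltzmann`, and `kinetic_temperature` are hypothetical names, and unit conversions are left out, so the velocities come out in sqrt(kcal/mol/amu).

```python
import math
import random
from dataclasses import dataclass

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


@dataclass
class SystemState:
    """Hypothetical container for the four items listed above."""
    coords: list      # per-atom [x, y, z]
    box: list         # [lx, ly, lz]
    velocities: list  # per-atom [vx, vy, vz]
    forces: list      # per-atom [fx, fy, fz]


def maxwell_boltzmann(masses, temperature, rng):
    """Draw each velocity component from a Gaussian with variance kB*T/m."""
    velocities = []
    for m in masses:
        sigma = math.sqrt(KB * temperature / m)
        velocities.append([rng.gauss(0.0, sigma) for _ in range(3)])
    return velocities


def kinetic_temperature(masses, velocities):
    """Instantaneous temperature from equipartition: sum(m v^2) = 3 N kB T."""
    twice_ke = sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return twice_ke / (3 * len(masses) * KB)


rng = random.Random(0)                   # fixed seed for reproducibility
masses = [15.999, 1.008, 1.008] * 1000   # illustrative O, H, H masses in amu
state = SystemState(
    coords=[[0.0, 0.0, 0.0] for _ in masses],
    box=[30.0, 30.0, 30.0],
    velocities=maxwell_boltzmann(masses, 300.0, rng),
    forces=[[0.0, 0.0, 0.0] for _ in masses],
)
```

Calling `kinetic_temperature(masses, state.velocities)` should return a value close to the requested 300 K, with statistical scatter that shrinks as the number of atoms grows.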
In the next step, a Force object is created which will be used to evaluate the
potential on a given System state. From here we move on to the dynamics
part.
For performing the dynamics, we will create an Integrator object for inte-
grating the time steps of the simulation as well as a Wrapper object for
wrapping the system coordinates within the periodic cell.
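Wrapping coordinates into the periodic cell can be sketched as below. This is a minimal sketch in plain Python, assuming an orthorhombic box whose primary cell starts at the origin. The function names `wrap_coordinate` and `wrap_positions` are hypothetical, and TorchMD's own Wrapper may use a different convention (for example, wrapping whole molecules around a chosen centre).

```python
import math


def wrap_coordinate(x, box_length):
    """Shift x by whole box lengths so it falls in the primary cell."""
    return x - box_length * math.floor(x / box_length)


def wrap_positions(coords, box):
    """Wrap every atom coordinate along each box dimension."""
    return [
        [wrap_coordinate(x, length) for x, length in zip(atom, box)]
        for atom in coords
    ]


box = [10.0, 10.0, 10.0]
wrapped = wrap_positions([[12.5, -0.5, 3.0]], box)
```

Here the atom at x = 12.5 is shifted back by one box length and the atom at y = -0.5 is shifted forward by one, while z = 3.0 is already inside the cell and is left unchanged.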
In the next step, we minimize our system using the L-BFGS method. Alongside
this, a CSV file logger can be created for the simulation to keep track of the
energies and temperature.
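The kind of CSV log described above can be sketched with Python's standard `csv` module. This is an illustrative sketch only, not TorchMD's logger: the function `write_log` and the column names `step`, `epot`, `ekin`, and `temperature` are assumptions, and the row values are made up for the example.

```python
import csv
import io


def write_log(rows, handle):
    """Write one CSV row per logged step, with a header line first."""
    fieldnames = ["step", "epot", "ekin", "temperature"]  # hypothetical columns
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# An in-memory buffer stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_log(
    [{"step": 0, "epot": -10.5, "ekin": 3.2, "temperature": 298.7}],
    buffer,
)
log_text = buffer.getvalue()
```

A file written this way can later be loaded into a table for plotting the energies against the step number.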
After this, the full dynamics can be run and a trajectory file can be
obtained.
3.2 Training and Simulation of Coarse-Grained Chignolin
$$V^{(\mathrm{prior})}_{\mathrm{harmonic}}(r) = 4\epsilon r^{-6} + V_0$$
In the next step, we calculate a new term called delta-forces, which is
obtained by subtracting the prior forces acting on the atoms from the real
forces. The delta-forces, together with the coordinates, were used as the
training input.
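The delta-force calculation described above is a per-atom, per-component subtraction. The sketch below shows it in plain Python; the function name `delta_forces` and the numerical values are made up for illustration and do not come from the Chignolin data.

```python
def delta_forces(real_forces, prior_forces):
    """Subtract the prior forces from the real forces, atom by atom."""
    return [
        [f_real - f_prior for f_real, f_prior in zip(atom_real, atom_prior)]
        for atom_real, atom_prior in zip(real_forces, prior_forces)
    ]


# Two atoms with three force components each (illustrative numbers).
real_forces = [[1.0, 0.0, -2.0], [0.5, 0.5, 0.5]]
prior_forces = [[0.25, 0.0, -0.5], [0.5, 0.0, 0.25]]
deltas = delta_forces(real_forces, prior_forces)
```

The network is then trained on `deltas` rather than on the full forces, so at simulation time the prior forces and the learned forces are added back together.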
The validation loss, training loss, and learning rate of the neural network
potential (NNP) are shown below:
Figure 3: Plots of validation loss, training loss, and learning rate of the NNP
Figure 4: An input file containing the details for the simulations
Once the MD run is completed, the resulting trajectory of the protein is
analysed. In this case, the energy and the RMSD are examined. The energy is
plotted using the pandas library, while the RMSD is obtained using the
MetricRmsd projection from the moleculekit library, with the initial PDB file
used as the reference structure.
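The quantity being plotted can be made concrete with a small plain-Python sketch of the RMSD between one frame and a reference structure. It is not the moleculekit implementation: the function name `rmsd` is hypothetical, the coordinates are made up, and any structural alignment step is left out.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two equally sized coordinate sets."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(coords))


# Two atoms, each displaced by one length unit along z relative to the reference.
reference = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
frame = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
value = rmsd(frame, reference)
```

Evaluating this for every frame of a trajectory against the initial structure gives the RMSD curve as a function of time.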
The RMSD plots of the training data simulation and the simulation carried
out by TorchMD are shown below:
Figure 5: RMSD plots for different trajectories using standard MD simulations
(True) and TorchMD (Mirror)
4 Advantages
TorchMD is built in PyTorch, so PyTorch models, including machine learning
coarse-grained potentials and ab initio neural network potentials, can be
easily incorporated.
TorchMD can execute end-to-end differentiable simulations, with all of its
parameters being differentiable. This makes it possible to perform experiments
that would take longer to carry out with other molecular dynamics codes.
It uses a YAML-based force field format that is simple to read. If a user does
not want to manually insert all of the parameters in their own force field
file, TorchMD can also read force field parameters from the AMBER force
field.
Analytical gradients need not be derived by hand to obtain the forces, because
PyTorch provides automatic differentiation. Forces may be calculated with a
single PyTorch autograd call on the total energy of the system.
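The idea of obtaining forces from the energy by automatic differentiation can be sketched as follows. This is a minimal sketch, assuming PyTorch is installed. It uses a single illustrative harmonic bond rather than a TorchMD force field; `harmonic_bond_energy`, the force constant, and the equilibrium distance are assumptions.

```python
import torch


def harmonic_bond_energy(coords, k=1.0, r0=1.0):
    """Energy k * (r - r0)**2 of one bond between atoms 0 and 1 (illustrative)."""
    r = torch.linalg.norm(coords[1] - coords[0])
    return k * (r - r0) ** 2


# Two atoms 1.5 length units apart along x; gradients are tracked on coords.
coords = torch.tensor(
    [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]], requires_grad=True
)
energy = harmonic_bond_energy(coords)

# One autograd call gives dE/dx for every coordinate; forces are the negative.
(grad,) = torch.autograd.grad(energy, coords)
forces = -grad
```

Because the bond is stretched beyond its equilibrium length, the resulting forces pull the two atoms towards each other along x, with equal magnitude and opposite sign.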
TorchMD can also execute a batch of similar simulations at the same time by
simply changing the random number generator seed and organising the neural
network potential into a batch for speed, recapturing, at least in part, the
efficiency of optimised molecular dynamics algorithms.
5 Disadvantages
Using TorchMD to train neural network potentials is computationally
expensive.
At present, TorchMD does not support hydrogen bond constraints, and neighbor
lists are missing.
6 Conclusion
Another use of TorchMD is the inference of force field parameters from an MD
trajectory using the automatic differentiation available in the package.
However, more work is required to improve this feature of the framework.
7 Reference
Doerr, Stefan et al. "TorchMD: A Deep Learning Framework for Molecular
Simulations." Journal of Chemical Theory and Computation vol. 17, no. 4 (2021):
2355-2363. doi:10.1021/acs.jctc.0c01343