
TorchMD: A Deep Learning Framework

for Molecular Simulations


BT5420: Computer Simulations of Biomolecular Systems
Advanced Analysis Project

Ishan Chokshi
BE19B018
July - November 2021
Contents
1 Objective

2 Motivation

3 Method/Procedure
  3.1 TorchMD - End-to-End Simulation
  3.2 Training and simulation of coarse-grained Chignolin

4 Advantages

5 Disadvantages

6 Conclusion

7 Reference

1 Objective
This report discusses a molecular dynamics engine built on PyTorch that combines
classical and machine learning potentials. This paradigm can be used to improve on
the empirical potentials currently in use by means of data-driven models. The study
describes TorchMD's capabilities, highlighting the functional forms it supports as
well as a successful fitting technique for data-driven DNN potentials. The approach
is shown to have a variety of practical applications, ranging from molecular-level
simulations and end-to-end parameter learning to the training of neural network
potentials that can be applied in coarse-grained simulations.

2 Motivation
• Molecular dynamics simulations have matured into a technology that can
be used to explore macromolecular structure-function relationships effi-
ciently. Current simulation times are close to biologically relevant timescales.
The amount of data collected on the dynamic characteristics of macro-
molecules is sufficient to shift the traditional paradigm of structural
bioinformatics from investigating single structures to analysing conforma-
tional ensembles.

• However, MD simulations face two notable difficulties. The first is the
calculation of the parameters stored in force field libraries, which tra-
ditionally consumes a lot of time and requires significant effort to tune.

• The second is that MD is computationally intensive: despite tireless
efforts and breakthroughs in increasing the speed of molecular dynamics, it
still fails to reach the timescales required by a number of critical applications.

• Since the introduction of deep neural networks (DNNs), machine learning
has become even more enticing, as it enables the creation of complex functions
and the calculation of their gradients.

• There is a need for a single framework that can leverage neural
network and machine learning potentials and enable quick and accurate
MD simulations.

3 Method/Procedure
3.1 TorchMD - End-to-End Simulation
• At first glance, TorchMD is a typical molecular dynamics program. It in-
cludes a Langevin thermostat and NVT ensemble simulations. The Maxwell-
Boltzmann distribution is used to assign the initial atomic velocities.
The velocity Verlet method is used for integration, the reaction field ap-
proach is used to approximate long-range electrostatics, and the L-BFGS method
is used for minimization.

• In the first step of an MD simulation, the starting coordinates and input
topologies are read using the moleculekit library.

Figure 1: Code to input the starting coordinates and input topologies
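A minimal sketch of this step, following the usage pattern shown in the TorchMD
examples (the file names are placeholders for the actual topology and coordinate
files):

```python
# Sketch: load a topology and starting coordinates with moleculekit.
# "structure.prmtop" and "input.coor" are placeholder file names.
from moleculekit.molecule import Molecule

mol = Molecule("structure.prmtop")  # read the input topology
mol.read("input.coor")              # read the starting coordinates
```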

• In the next step, a force field file is loaded and the topology is used
to extract the relevant parameters that will be used for the simulation. The
force field file is in YAML format. TorchMD allows reading of force field
parameters from the AMBER force field, and it also lets a user enter the
force field parameters manually into the YAML file, which is easy to read. A
snapshot of a sample file for water is attached below:

Figure 2: A sample force field file for water in YAML format
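Loading the force field and extracting the parameters for the loaded topology might
look roughly as follows (continuing from the previous sketch; the YAML file name is
a placeholder, and exact signatures may differ between TorchMD versions):

```python
import torch
from torchmd.forcefields.forcefield import ForceField
from torchmd.parameters import Parameters

precision = torch.float
device = "cpu"  # or "cuda:0" if a GPU is available

# Build a force field from a YAML file (or an AMBER prmtop) and keep only
# the parameters relevant to the atoms present in the topology.
ff = ForceField.create(mol, "water_forcefield.yaml")
parameters = Parameters(ff, mol, precision=precision, device=device)
```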

• In the next step, a System object is created, which will contain the state of
the system during the simulation, including:

a. the current atom coordinates
b. the current box size
c. the current atom velocities (drawn from the Maxwell-Boltzmann distribution)
d. the current atom forces
• In the next step, a Force object is created, which will be used to evaluate the
potential on a given System state. From here we move on to the dynamics
part.
• For performing the dynamics, we create an Integrator object for inte-
grating the time steps of the simulation, as well as a Wrapper object for
wrapping the system coordinates within the periodic cell.
• In the next step, we minimize the system using the L-BFGS method. Alongside,
a CSV logger can be created for the simulation, which keeps track of the
energies and the temperature.
• After this, the entire dynamics can be performed and a trajectory file
obtained; a condensed sketch covering these last few steps is given below.
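The following sketch continues from the snippets above and follows the usage pattern
of the TorchMD examples; temperature, time step, run length, and file names are
placeholder values, and exact signatures may differ between versions:

```python
import numpy as np
from torchmd.systems import System
from torchmd.forces import Forces
from torchmd.integrator import Integrator, maxwell_boltzmann
from torchmd.wrapper import Wrapper
from torchmd.minimizers import minimize_bfgs
from torchmd.utils import LogWriter

# System object: holds coordinates, box, velocities and forces.
system = System(mol.numAtoms, 1, precision, device)  # one replica
system.set_positions(mol.coords)
system.set_box(mol.box)
system.set_velocities(maxwell_boltzmann(parameters.masses, T=300, replicas=1))

# Force object: evaluates the potential on a given System state.
forces = Forces(parameters, cutoff=9, rfa=True, switch_dist=7.5)

# Integrator (Langevin thermostat, velocity Verlet) and Wrapper for the periodic cell.
timestep = 1  # fs
integrator = Integrator(system, forces, timestep, device, gamma=0.1, T=300)
wrapper = Wrapper(mol.numAtoms, mol.bonds if len(mol.bonds) else None, device)

# L-BFGS minimization and a CSV logger for energies and temperature.
minimize_bfgs(system, forces, steps=500)
logger = LogWriter(path="logs/", keys=("iter", "ns", "epot", "ekin", "etot", "T"),
                   name="monitor.csv")

# Production dynamics: integrate in chunks, wrap coordinates, collect the trajectory.
FS2NS = 1e-6
steps, output_period = 1000, 10
traj = []
for i in range(1, steps // output_period + 1):
    Ekin, Epot, T = integrator.step(niter=output_period)
    wrapper.wrap(system.pos, system.box)
    traj.append(system.pos.detach().cpu().numpy().copy())
    logger.write_row({"iter": i * output_period,
                      "ns": FS2NS * i * output_period * timestep,
                      "epot": Epot, "ekin": Ekin, "etot": Epot + Ekin, "T": T})
np.save("trajectory.npy", np.stack(traj, axis=2))
```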

3.2 Training and simulation of coarse-grained Chignolin


• TorchMD-Net, a fully functional Python code for training neural network
potentials, is provided alongside TorchMD. Coarse-grained simulations can also
be carried out using TorchMD.
• In the first step, the PDB file is converted into a topology file (a .psf file)
describing the coarse-grained system. In this example, the coarse-grained
system consists only of C-α atoms connected by "bonds".
• Next, the training data is loaded. It consists of MD simulations of the
chignolin protein and the corresponding trajectories. Before the neural
network potential is trained, the parameters of a set of prior forces are
extracted from the training data; these priors limit the region of
configuration space that the dynamics can visit. For simplicity, the
prior terms are limited to bonds and repulsions.
• Prior potentials for the harmonic (bonded) and repulsive (non-bonded) interactions
were calculated using the following formulae:

V_harmonic^(prior)(r) = k (r − r_0)^2 + V_0

V_repulsive^(prior)(r) = 4ε r^(−6) + V_0
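Written as plain Python functions, the two priors are simply (an illustration only;
k, r_0, ε and the offsets V_0 are fitted or supplied elsewhere):

```python
def harmonic_prior(r, k, r0, V0=0.0):
    # Harmonic prior for the pseudo-bonded terms
    return k * (r - r0) ** 2 + V0

def repulsive_prior(r, eps, V0=0.0):
    # Purely repulsive r^-6 prior for the non-bonded terms
    return 4 * eps * r ** (-6) + V0
```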

• In the next step, we calculate a new term called the delta forces, obtained
by subtracting the prior forces acting on the atoms from the true forces.
The delta forces, together with the coordinates, are used as the training input.
The validation loss, training loss, and learning rate of the NNP are shown
below; a schematic of the delta-force construction follows Figure 3.

Figure 3: Plots of validation loss, training loss, and learning rate of the NNP
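A schematic of the delta-force construction described above (array names, shapes,
and file names are illustrative; in practice the prior forces come from evaluating
the fitted prior terms on the training coordinates):

```python
import numpy as np

# Coordinates and all-atom forces mapped onto the C-alpha beads (placeholder files).
coords = np.load("chignolin_ca_coords.npy")       # shape: (n_frames, n_beads, 3)
true_forces = np.load("chignolin_ca_forces.npy")  # forces from the reference MD data
prior_forces = np.load("prior_forces.npy")        # forces from the fitted prior terms

# Delta forces: the residual part of the force that the network has to learn.
delta_forces = true_forces - prior_forces
np.save("chignolin_ca_deltaforces.npy", delta_forces)
```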

• The new coarse-grained simulations of the C-α system of the chignolin
protein are carried out using a mix of force terms spanning the prior forces and
the trained network. The parameters of the simulation are given as input in
a YAML file. A snapshot of this file is attached below:

Figure 4: An input file containing the details for the simulations

• Once the MD run is completed, the protein is analysed; in this case the
energy and RMSD are examined. The energy is plotted using the pandas
library, while the RMSD is computed using the MetricRmsd projection from the
moleculekit library, with the initial PDB file used as the reference structure.
The RMSD plots of the training-data simulation and the simulation carried
out by TorchMD are shown below:

Figure 5: RMSD plots of trajectories from standard MD simulations (True) and
TorchMD (Mirror)
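The RMSD analysis might look roughly like this (assuming the MetricRmsd projection
is available under moleculekit.projections.metricrmsd; file names and the trajectory
format are placeholders):

```python
from moleculekit.molecule import Molecule
from moleculekit.projections.metricrmsd import MetricRmsd

# Reference structure: the initial PDB; trajectory: the TorchMD output.
ref = Molecule("chignolin_ca.pdb")
traj = Molecule("chignolin_ca.psf")
traj.read("trajectory.xtc")

# RMSD of the C-alpha beads along the trajectory, relative to the reference.
rmsd = MetricRmsd(ref, "name CA").project(traj)
```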

4 Advantages
• TorchMD is built in PyTorch; therefore, PyTorch models, including machine
learning coarse-grained potentials and ab initio neural network potentials,
can be easily incorporated.
• TorchMD can execute end-to-end differentiable simulations, with all of its
parameters being differentiable. This allows experiments that would otherwise
take much longer to carry out with other molecular dynamics codes.
• It uses a YAML-based force field format that is simple to read. If a user does
not want to manually insert all of the parameters in their own force field
file, TorchMD also enables reading force field parameters from the AMBER
force field.
• Analytical gradients for the forces need not be derived by hand, because PyTorch
provides automatic differentiation. Forces can be computed with a single
autograd call on the system's total energy (a generic illustration is given
after this list).
• TorchMD can also execute a batch of similar simulations at the same time
by simply adjusting the random-number-generator seed and organising the
neural network potential into a batch for speed, recapturing, at least in part,
the efficiency of optimised molecular dynamics codes.
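A generic PyTorch illustration of computing forces from a single autograd call on
the total energy (a toy potential, not TorchMD's internal code):

```python
import torch

# Toy potential: sum of inverse pairwise distances over 10 particles.
pos = torch.randn(10, 3, requires_grad=True)
dist = torch.cdist(pos, pos) + torch.eye(10)   # pad the diagonal to avoid 1/0
energy = (1.0 / dist).triu(diagonal=1).sum()   # total "energy" of the system

# Forces are the negative gradient of the total energy with respect to positions,
# obtained with a single autograd call.
forces = -torch.autograd.grad(energy, pos)[0]
```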

5 Disadvantages
• Using TorchMD to train neural network potentials is computationally ex-
pensive.

• Since it uses PyTorch arrays and operations for force computations, it is
slower than other MD codes such as ACEMD.

• At present, the force field file used by TorchMD does not support hydrogen-bond
constraints and lacks neighbor lists.

6 Conclusion
• Another use of TorchMD is the inference of force field parameters from an MD
trajectory using the automatic differentiation feature of the package, although
more work is still required to mature this part of the framework.

• The ability of TorchMD to carry out end-to-end differentiation is a
characteristic that can be incorporated into various open-source projects.

• TorchMD can facilitate collaboration between the ML and MD communities,
shortening the standard cycle of training, validation, and evaluation, and
encouraging the use of data-driven techniques in MD simulations.

7 Reference
Doerr, Stefan, et al. "TorchMD: A Deep Learning Framework for Molecular Simulations."
Journal of Chemical Theory and Computation 17.4 (2021): 2355-2363.
doi:10.1021/acs.jctc.0c01343
