Neural Mode Jump Monte Carlo
ABSTRACT
Markov chain Monte Carlo methods are a powerful tool for sampling equilibrium configurations in complex systems. One problem these methods often face is slow convergence over large energy barriers. In this work, we propose a novel method that accelerates convergence in systems composed of many metastable states. The method connects metastable regions directly, using generative neural networks to propose new configurations in the Markov chain, and optimizes the acceptance probability of large jumps between modes in configuration space. We provide a comprehensive theory as well as a training scheme for the network and demonstrate the method on example systems.
Nonequilibrium path sampling methods14–16 construct reversible moves between equilibrium states as a collection of small out-of-equilibrium trajectories. These moves are typically generated by driving the system along the reaction coordinate, and a system-specific protocol to generate the proposed state must be designed. A recent approach uses machine learning methods to learn such a protocol.17

In situations where states are disconnected and finding a reaction coordinate is challenging, different approaches have been developed. Smart darting Monte Carlo18,19 is a promising method that alternates local moves with long-range moves from one region of configuration space to another that is arbitrarily far away. These moves are attempted between small spheres around local minima. In high dimensions, however, the fraction of the total volume covered by the spheres becomes vanishingly small, and therefore finding a sphere by random exploration becomes unlikely. This problem is circumvented in ConfJump20 by finding the closest energy minimum and attempting long-range moves by translation to another energy minimum.

The generation of long-range moves is challenging when the energy landscape is rough, since the potential energy surface in the region surrounding local minima can change drastically from one minimum to another. In this case, using trivial translations as long-range moves would most likely cause large energy differences, and trial moves would likely be rejected. In fact, to keep the energy difference small, trial moves should be generated with a bijective function that pairs points with similar energy values. Constructing such a bijection manually would require detailed knowledge of the metastable states, which could be obtained from, e.g., x-ray scattering or nuclear magnetic resonance (NMR) experiments, from which starting points in different conformations can be generated.31 Possible applications range from proteins with multiple metastable states (e.g., kinases) to solid-state systems with multiple phases, where configurations in either phase can be generated easily, but observing the transition is rare. Local displacements and neural moves are randomly alternated in a combined scheme to accelerate the convergence rate of Markov chains. Configurations from different metastable states are used to train the networks, which are optimized to produce moves with high acceptance probability. Local exploration ensures ergodicity of the scheme, while neural moves accelerate convergence to equilibrium, realizing an accurate and deep exploration of the configuration space.

II. THEORY

A sufficient condition to ensure that a Markov chain asymptotically samples the equilibrium distribution is ergodicity and detailed balance.32 Given the system in a configuration x, a new state y is added to the chain according to a transition probability p(x → y). The transition probability is defined to satisfy the condition of detailed balance

π(x) p(x → y) = π(y) p(y → x).  (1)
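To make Eq. (1) concrete, here is a minimal sketch (not from the paper; all names are ours) of a Metropolis step with a symmetric Gaussian proposal, for which detailed balance holds when moves are accepted with probability min{1, π(y)/π(x)}:

```python
import numpy as np

def metropolis_step(x, log_pi, rng, sigma=0.1):
    """One local move with a symmetric Gaussian proposal.

    Symmetry means p_prop(x -> y) = p_prop(y -> x), so satisfying
    Eq. (1) only requires accepting with probability min{1, pi(y)/pi(x)}.
    """
    y = x + sigma * rng.standard_normal(x.shape)
    if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
        return y          # accepted
    return x              # rejected: keep the current state

# usage on a 1D double-well density pi(x) ∝ exp(-beta (x^2 - 1)^2)
beta, rng = 1.0, np.random.default_rng(0)
log_pi = lambda x: -beta * np.sum((x ** 2 - 1.0) ** 2)
x, chain = np.zeros(1), []
for _ in range(10_000):
    x = metropolis_step(x, log_pi, rng)
    chain.append(x[0])
```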
With proposal selection probabilities p_αβ and proposal densities p_prop^αβ for moves between cores Ω_α and Ω_β, the acceptance probability reads

p_acc^αβ(x → y) = min{1, [π(y) p_βα p_prop^βα(y → x)] / [π(x) p_αβ p_prop^αβ(x → y)]}.  (2)

We parameterize the neural proposal K_αβ and its inverse K_βα connecting the cores Ω_α and Ω_β as a bijective function μ_αβ(⋅) pairing the states defined in the two cores, i.e., y = μ_αβ(x) and μ_αβ⁻¹(y) = x, with x ∈ Ω_α and y ∈ Ω_β. Thus, for each pair of different cores (Ω_α, Ω_β), a bijective function μ_αβ(⋅) is defined. The probability distribution of neural proposals is then represented with Dirac delta distributions, and the acceptance probability specializes to

p_acc^αβ(x → y) = min{1, [π(y) p_βα δ(x − μ_αβ⁻¹(y))] / [π(x) p_αβ δ(y − μ_αβ(x))]}.  (3)

Using the change-of-variable formula for the Dirac distribution, δ(x − μ_αβ⁻¹(y)) = ∣det J_μαβ(x)∣ δ(y − μ_αβ(x)), where J_μαβ(x) is the Jacobian of the function μ_αβ, the acceptance probability for neural moves simplifies to

p_acc^αβ(x → y) = min{1, [π(y) p_βα / (π(x) p_αβ)] ∣det J_μαβ(x)∣}.  (4)

ALGORITHM 1. Neural MJMC sampling scheme.

input:  ls = []: empty list for samples
        {p_αβ}: proposal selection probabilities
        {μ_αβ}: proposal densities
        x ← x₀: starting point of sampling
        N_iterations: number of generated samples
        σ_local: standard deviation of local moves
while i ≤ N_iterations do
    draw proposal density K_αβ from {p_αβ}
    if α = β then                  // propose local move
        w ← sample from N(0, 𝟙)
        y ← x + w ⋅ σ_local
        p_acc ← p_acc^αα(x → y)    [Eq. (5)]
    else                           // propose neural move
        y ← μ_αβ(x)
        p_acc ← p_acc^αβ(x → y)    [Eq. (4)]
    end
    if r ∼ U(0, 1) < p_acc then
        x ← y
    end
    ls.append(x)
    i ← i + 1
end
output: list of samples ls
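As an illustration, one possible Python rendering of Algorithm 1 is sketched below. The helpers core_of, mus, and p_sel are hypothetical stand-ins for the trained bijections and their bookkeeping, and the local branch uses the plain Metropolis ratio rather than the full core-corrected acceptance of Eq. (5):

```python
import numpy as np

def neural_mjmc(x0, log_pi, core_of, mus, p_sel, n_iter, sigma_local, rng):
    """Sketch of Algorithm 1 with hypothetical helpers.

    core_of(x)     -> core label alpha of configuration x
    mus[(a, b)](x) -> (y, log|det J|) for the trained bijection mu_ab
    p_sel[a][b]    -> probability of selecting proposal K_ab from core a
    """
    x, samples = np.asarray(x0, dtype=float), []
    for i in range(n_iter):
        a = core_of(x)
        targets = list(p_sel[a])
        b = targets[rng.choice(len(targets), p=[p_sel[a][t] for t in targets])]
        if a == b:                              # local Gaussian displacement
            y = x + sigma_local * rng.standard_normal(x.shape)
            log_acc = log_pi(y) - log_pi(x)     # plain Metropolis ratio, cf. Eq. (5)
        else:                                   # neural jump between cores
            y, log_det = mus[(a, b)](x)
            log_acc = (log_pi(y) - log_pi(x)    # Eq. (4)
                       + np.log(p_sel[b][a]) - np.log(p_sel[a][b])
                       + log_det)
        if np.log(rng.uniform()) < log_acc:     # accept with prob min(1, exp(log_acc))
            x = y
        samples.append(np.copy(x))
    return np.asarray(samples)
```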
In case the local proposal (α = β) is selected, the inverse move is only possible with another local proposal K_αα. Note that a local move may leave the current core, and the proposal probability for the inverse move has to be adapted accordingly [cf. Eq. (5)].
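The bijections μ_αβ must come with a tractable Jacobian determinant for Eq. (4). Below is a minimal sketch of one RNVP (real NVP) coupling layer, the kind of building block used in the experiments later in this document; the sizes are illustrative and the conditioner is simplified to two hidden layers:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RNVP coupling layer: a bijection with a cheap log|det J|.

    The first half of the coordinates passes through unchanged and
    conditions an elementwise affine map of the second half.
    """
    def __init__(self, dim, hidden=76):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d)),
        )

    def forward(self, x):
        x1, x2 = x[..., :self.d], x[..., self.d:]
        s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(s) + t                 # scale-and-shift of second half
        return torch.cat([x1, y2], -1), s.sum(-1)  # (y, log|det J|)

    def inverse(self, y):
        y1, y2 = y[..., :self.d], y[..., self.d:]
        s, t = self.net(y1).chunk(2, dim=-1)
        x2 = (y2 - t) * torch.exp(-s)
        return torch.cat([y1, x2], -1), -s.sum(-1)

# round-trip check: inverse(forward(x)) recovers x
layer = AffineCoupling(dim=4)
x = torch.randn(8, 4)
y, log_det = layer(x)
x_rec, _ = layer.inverse(y)   # x_rec ≈ x, log-det factors cancel
```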
The bijections μ_αβ are trained by maximizing the bi-directional acceptance probability,

max_μαβ log E_x∼Ωα[p_acc^αβ(x → y) p_acc^βα(y → x)]
    ≥ max_μαβ E{log[p_acc^αβ(x → y) p_acc^βα(y → x)]}
    = max_μαβ E[min(0, log f) + min(0, −log f)],

where the inequality follows from Jensen's inequality and f denotes the acceptance ratio inside the min of Eq. (4). The expected bi-directional log-acceptance obeys the lower bound

E{log[p_acc^αβ(x → y) p_acc^βα(y → x)]} ≥ −β∣ΔF + ΔR_ij∣.  (8)

This result shows that we can use the freedom in the proposal selection ratio to maximize the bi-directional acceptance.
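The maximization above translates directly into a stochastic training loss. A sketch under the assumption that mu_ab returns the proposal together with log∣det J∣ (both names hypothetical):

```python
import torch

def bidirectional_acceptance_loss(x, mu_ab, log_pi, log_p_ratio=0.0):
    """Negative bi-directional log-acceptance, -E[min(0, log f) + min(0, -log f)].

    mu_ab(x) -> (y, log|det J|) is a hypothetical trained bijection;
    log_p_ratio = log(p_ba / p_ab) is the proposal selection ratio.
    """
    y, log_det = mu_ab(x)
    log_f = log_pi(y) - log_pi(x) + log_p_ratio + log_det  # log of the Eq. (4) ratio
    zero = torch.zeros_like(log_f)
    # min(0, a) + min(0, -a) = -|a|: maximizing the bound minimizes E|log f|
    return -(torch.minimum(zero, log_f) + torch.minimum(zero, -log_f)).mean()
```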
FIG. 3. Top: free energy along the distance between the dimer particles. The corresponding bands represent reference values obtained by umbrella sampling, with the standard error given by their thickness. The neural network has been trained at temperature T = T₀; simulations at different temperatures have then been performed using Neural MJMC. Simulations are run for 1.5 × 10⁷ steps, and error bars are generated from several sampling runs. The figure shows that Neural MJMC correctly samples the free energy along the reaction coordinate of the system at different temperatures. Middle: dimer interaction potential E_d as a function of the dimer distance. Bottom: reference configurations in the closed (left) and open (right) dimer configuration. The dimer particles are displayed in blue, and solvent particles are displayed in gray. The strongly repulsive potential does not allow for significant overlaps between particles at equilibrium.

The bistable dimer potential has a minimum in the closed and open configurations, which are separated by a high energy barrier (see Fig. 3, middle). Opening and closing of the dimer require a concerted motion of the solvent particles, which makes it difficult to sample the physical path connecting the two configurations (see Appendix B for a more detailed description of the system).

The open and closed configurations serve as cores (see Fig. 3, bottom) in Neural MJMC and are distinguished by the distance between the dimer particles. The neural network is trained on states sampled independently in the closed and open configurations at four different bias strengths, with 10⁵ samples for each well and bias. As the system is invariant under permutation of solvent and dimer particles, neural moves would have to be learned independently for each permutation of the system, which is clearly infeasible as the number of permutations scales factorially with the number of particles. This problem is circumvented by permutation reduction, i.e., relabeling the particles such that the distance to the reference configuration is minimized, which is realized using the Hungarian algorithm37 with the reference configurations as the target (see the code sketch below).

Each neural network in the RNVP architecture consists of three hidden layers with 76 nodes. The transformation consists of a total of 20 RNVP layers and contains ∼1.4 × 10⁶ trainable parameters. Neural MJMC is used to generate a single trajectory with 1.5 × 10⁷ steps, where the probability of neural moves is set to 1%. In terms of computational performance, sampling with Neural MJMC is approximately a factor of four slower than MCMC with local displacements for this system. This slowdown arises from the evaluation of the network and the remapping of particles. As a reference value, we use umbrella sampling13 to sample the free energy along the dimer distance. To this end, we use 20 umbrellas along the dimer distance and compute configurations with MCMC. The free energy is calculated using the Multistate Bennett Acceptance Ratio38 method.

Neural moves cause direct transitions between the two metastable states and thus a rapid exploration of the configuration space. The convergence to the Boltzmann distribution is observed as shown in Fig. 3. An estimate for the crossing time with only local moves can be found to be of the order of 10¹² sampling steps at T = 1 from the Kramers problem,39 which makes exhaustive simulations infeasible. In Neural MJMC, many crossings of the energy barrier can be observed (Fig. 4, top). This is also reflected in the autocorrelation.
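The permutation reduction step described above can be sketched with SciPy's assignment solver; the function name and array layout are our choice:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def permutation_reduction(x, x_ref):
    """Relabel particles of configuration x (N x d) to best match x_ref.

    The optimal one-to-one matching minimizing the total squared
    distance is a linear assignment problem (Hungarian algorithm).
    """
    cost = cdist(x_ref, x, metric="sqeuclidean")  # cost[i, j] = |x_ref[i] - x[j]|^2
    _, col = linear_sum_assignment(cost)          # column j assigned to row i
    return x[col]                                 # particle order now matches x_ref

# usage: a shuffled copy is mapped back onto the reference labeling
rng = np.random.default_rng(1)
x_ref = rng.random((16, 2))
x = x_ref[rng.permutation(16)]
assert np.allclose(permutation_reduction(x, x_ref), x_ref)
```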
TABLE III. Parameters of the RNVP networks used in the experiments.

ΔS = S_Y − S_X = −k_B E_x∼p_Ω[log∣det J_μαβ(x)∣].  (A3)
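Given log∣det J_μαβ∣ values collected over samples x ∼ p_Ω, the entropy difference of Eq. (A3) is a plain Monte Carlo average; a minimal sketch in reduced units:

```python
import numpy as np

k_B = 1.0  # Boltzmann constant in reduced units

def entropy_difference(log_det_J):
    """Monte Carlo estimate of Eq. (A3) from log|det J_mu_ab(x)| values
    evaluated at samples x ~ p_Omega (sign convention of the appendix)."""
    return -k_B * np.mean(log_det_J)
```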
REFERENCES

11. A. Laio and M. Parrinello, "Escaping free-energy minima," Proc. Natl. Acad. Sci. U. S. A. 99(20), 12562–12566 (2002).
12. J. Zhang, Y. I. Yang, and F. Noé, "Targeted adversarial learning optimized sampling," J. Phys. Chem. Lett. 10(19), 5791–5797 (2019).
13. G. M. Torrie and J. P. Valleau, "Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling," J. Comput. Phys. 23(2), 187–199 (1977).
14. Y. Chen and B. Roux, "Constant-pH hybrid nonequilibrium molecular dynamics Monte Carlo simulation method," J. Chem. Theory Comput. 11(8), 3919–3931 (2015).
15. J. P. Nilmeier, G. E. Crooks, D. D. L. Minh, and J. D. Chodera, "Nonequilibrium candidate Monte Carlo is an efficient tool for equilibrium simulation," Proc. Natl. Acad. Sci. U. S. A. 108(45), E1009–E1018 (2011).
16. H. A. Stern, "Molecular simulation with variable protonation states at constant pH," J. Chem. Phys. 126(16), 164112 (2007).
17. H. Wu, J. Köhler, and F. Noé, "Stochastic normalizing flows," arXiv:2002.06707 (2020).
18. I. Andricioaei, J. E. Straub, and A. F. Voter, "Smart darting Monte Carlo," J. Chem. Phys. 114(16), 6994–7000 (2001).
19. K. Roberts, R. Sebsebie, and E. Curotto, "A rare event sampling method for diffusion Monte Carlo using smart darting," J. Chem. Phys. 136(7), 074104 (2012).
20. L. Walter and M. Weber, "ConfJump: A fast biomolecular sampling method which drills tunnels through high mountains," Technical Report No. 06-26, ZIB, Takustr., Berlin, 2006.
21. H. Shen, J. Liu, and L. Fu, "Self-learning Monte Carlo with deep neural networks," Phys. Rev. B 97, 205140 (2018).
22. R. Habib and D. Barber, "Auxiliary variational MCMC," in International Conference on Learning Representations (ICLR, La Jolla, CA, 2019).
23. J. Song, S. Zhao, and S. Ermon, "A-NICE-MC: Adversarial training for MCMC," in Advances in Neural Information Processing Systems (Curran Associates, Inc., 2017).
29. M. S. Albergo, G. Kanwar, and P. E. Shanahan, "Flow-based generative models for Markov chain Monte Carlo in lattice field theory," Phys. Rev. D 100, 034515 (2019).
30. K. A. Nicoli, S. Nakajima, N. Strodthoff, W. Samek, K.-R. Müller, and P. Kessel, "Asymptotically unbiased generative neural sampling," Phys. Rev. E 101(2), 023304 (2020).
31. D. L. Parton, P. B. Grinaway, S. M. Hanson, K. A. Beauchamp, and J. D. Chodera, "Ensembler: Enabling high-throughput molecular simulations at the superfamily scale," PLoS Comput. Biol. 12(6), e1004728 (2016).
32. W. K. Hastings, "Monte Carlo sampling methods using Markov chains and their applications," Biometrika 57(1), 97–109 (1970).
33. C. W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, Springer Series in Synergetics Vol. 13, 3rd ed. (Springer-Verlag, Berlin, 2004).
34. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, "Neural ordinary differential equations," in Advances in Neural Information Processing Systems (Curran Associates, Inc., 2018), pp. 6571–6583.
35. P. Mehta, M. Bukov, C.-H. Wang, A. G. R. Day, C. Richardson, C. K. Fisher, and D. J. Schwab, "A high-bias, low-variance introduction to machine learning for physicists," Phys. Rep. 810, 1–124 (2019).
36. G. Voronoi, "Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Deuxième mémoire. Recherches sur les parallélloèdres primitifs," J. Reine Angew. Math. 134, 198–287 (1908).
37. H. W. Kuhn, "The Hungarian method for the assignment problem," Nav. Res. Logist. Q. 2(1–2), 83–97 (1955).
38. M. R. Shirts and J. D. Chodera, "Statistically optimal analysis of samples from multiple equilibrium states," J. Chem. Phys. 129(12), 124105 (2008).
39. H. A. Kramers, "Brownian motion in a field of force and the diffusion model of chemical reactions," Physica 7(4), 284–304 (1940).