
Computer Physics Communications 125 (2000) 167–192

www.elsevier.nl/locate/cpc

A highly vectorised “link-cell” FORTRAN code for the


DL_POLY molecular dynamics simulation package
Kholmirzo Kholmurodov a,1 , William Smith b , Kenji Yasuoka c , Toshikazu Ebisuzaki a
a Computational Science Division, Advanced Computing Center, The Institute of Physical and Chemical Research (RIKEN), Hirosawa 2-1,
Wako, Saitama 351-0198, Japan
b Daresbury Laboratory, Daresbury, Warrington, Cheshire, UK, WA4 4AD
c Department of Mechanical Engineering, Keio University, Yokohama 223-8522, Japan

Received 7 June 1999

Abstract
Highly vectorised FORTRAN subroutines, based on the link-cell algorithm, have been developed for the DL_POLY molecular dynamics simulation package. The efficiency of the proposed codes has been tested on a Fujitsu VPP700/128E vector computer for several specific benchmark systems. It is shown that in constructing the neighbor list and in calculating atomic forces our link-cell method is significantly faster than the original code. © 2000 Elsevier Science B.V. All rights reserved.

1. Introduction

The present paper is devoted to the optimization of the molecular dynamics (MD) simulation code DL_POLY
on a vector computer. DL_POLY is a FORTRAN package of subroutines and data files, designed for the MD study
of a wide range of physico-chemical and biological systems on a distributed memory parallel computer [1]. It
was written at Daresbury Laboratory by Smith and Forester under the auspices of the Engineering and Physical
Sciences Research Council (EPSRC) in the United Kingdom for the EPSRC’s Collaborative Computational Project
(CCP5) for the computer simulation of condensed phases. The code is documented in Ref. [2] and the algorithms
employed in the original code are described in Refs. [3–6]. DL_POLY is a general purpose MD simulation package,
but it was not designed specifically to run on vector machines, and consequently the efficiency of the code on such
machines may be enhanced. Hence, the main purpose of the optimization reported here was to increase the performance of the DL_POLY code on a vector computer. We have used a Fujitsu VPP700E/128 vector
machine and have tested the tuned DL_POLY code for several specific benchmark systems. Optimal performance
was found to require considerable re-coding. Analysis of the code for the different tests shows that a significant
fraction of the computer time (about 1/2) is spent calculating the forces. Many time-consuming loops in the code are
not vectorisable in their original form, and often the compiler was unable to recognize that certain loops are in fact
vectorisable (because it could not determine that the array references in the loop are free of recurrences).

1 Permanent address: Laboratory of Computing Techniques and Automation, Joint Institute for Nuclear Research, Dubna, Moscow region,
141980, Russia. E-mail: mirzo@atlas.riken.go.jp.


The optimization work on the DL_POLY package has concentrated on restructuring the central “link-cell”
MD algorithm for constructing the neighbor list and on improving the procedure for calculating atomic forces. It
is worth noting that DL_POLY calculates the nonbonded pair interactions using a Verlet neighbor list [7,3], which
records the indices of all ‘secondary’ atoms within a certain radius of each ‘primary’ atom; the radius being the
cut-off radius (rcut) normally applied to the nonbonded potential function, plus an additional increment (Δrcut). The
neighbor list removes the need to scan over all pairs of atoms in the simulation at every timestep. The larger radius
(rcut + Δrcut) means the same list can be used for several timesteps without requiring an update. The frequency
at which the list must be updated depends on the thickness of the region Δrcut. In DL_POLY the following two
methods are implemented to construct the neighbor list: the first is based on the Brode–Ahlrichs scheme [8] (viz.,
subroutine “parlst”) and is used when rcut is large in comparison with the simulation cell; the second uses the
link-cell algorithm [9,10] (viz., subroutine “parlink”) when rcut is relatively small. The performance of the above
schemes on vector machines is quite different, depending on their degree of vectorisation. In the original version the
Brode–Ahlrichs scheme is already reasonably vectorisable. As regards the “link-cell” scheme, improving its efficiency on
vector computers is a challenging task and has been the subject of many studies [11,12]. A vectorised version of the
“link-cell” algorithm has been described by Rapaport [11] and realized in practice in [13]. A partially vectorised
“link-cell” subroutine for the DL_POLY package has been proposed by Liem in [13]. Nevertheless, a large proportion
of the time (about 21% of CPU time) was still spent in the tuned “parlink” subroutine, which makes this subroutine a
potential target for further enhancement.
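
To illustrate the role of the increment Δrcut, the following minimal sketch (with hypothetical names and interface; the criterion actually used in DL_POLY may differ) shows the standard test for deciding when the list must be rebuilt: an update is required once any atom has moved further than Δrcut/2 since the last list construction.

      logical function lupdate(natms,xxx,yyy,zzz,xold,yold,zold,
     x  drcut)
c     sketch of the standard skin test: the neighbour list must be
c     rebuilt once any atom has moved further than half the skin
c     width drcut since the last list update (xold,yold,zold hold
c     the positions at the last update)
      implicit none
      integer natms,i
      real*8 xxx(natms),yyy(natms),zzz(natms)
      real*8 xold(natms),yold(natms),zold(natms)
      real*8 drcut,dsqmax,dsq
      dsqmax=0.d0
      do i=1,natms
        dsq=(xxx(i)-xold(i))**2+(yyy(i)-yold(i))**2
     x     +(zzz(i)-zold(i))**2
        dsqmax=max(dsqmax,dsq)
      enddo
      lupdate=(dsqmax.gt.(0.5d0*drcut)**2)
      return
      end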
In this paper we present highly vectorised FORTRAN subroutines for the DL_POLY_2.11 (DL_POLY
version 2.11) package which are based on the above two algorithms. The structure of the codes we have developed
on the Fujitsu VPP700E/128 is described in Section 2. In Section 3 we show the efficiency of the new codes for
several of the DL_POLY benchmark systems. Section 4 gives a short summary.

2. Optimization of neighbor list generation

The basic structure of the MD FORTRAN codes (in the form of the conversion to vector source) is presented
in Appendices A–E. In Appendix A we present the section of the computer code “parlink” for the original
DL_POLY “link-cell” scheme. The “v” and “s” symbols before a FORTRAN statement designate whether that
statement is vectorised. The subroutines “parlink” and “parlst” in DL_POLY construct the neighbor
lists which identify all potential interaction pairs according to a predefined distance criterion. These lists are
updated frequently to account for the dynamic nature of the system.

2.1. Optimization of “parlink” code

The poor performance of the “parlink” subroutine on a vector computer was basically due to the short vector
lengths of most loops, and in many cases most of the execution time is consumed by this subroutine. As can be seen from
Appendix A, this subroutine is substantially non-vectorisable. In its original form (Appendix A), the
inner loop (lines 286–430) cannot be vectorised because of several factors: the existence of two goto loops (labelled
200 and 300), a variable lentry(ik) which is incremented conditionally within the loop, and a short vectorised loop
(lines 389–393). Hence, one of the basic steps in restructuring the previously un-vectorisable loops has been
to increase the vector lengths of the central loops. In Appendix B we show the corresponding
section of the tuned “parlink” subroutine. To vectorise the “parlink” subroutine we have preliminarily constructed
the following two new lists:
list1(icell, i) = i′ and list2(icell, j) = j′.
It should be noted that the conventional link-cell method [9,10] goes through the following steps:

(1) the MD cell is divided into contiguous cubic subcells (icell);
(2) all atoms are allotted, according to their coordinates, to the appropriate subcell;
(3) the interaction between each included atom and its neighbors in the same subcell or in one of the neighboring
subcells (26 in three dimensions) is calculated (one common choice of subcell offsets is sketched below).
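
As an illustration of step (3), one common choice of the 14 subcell offsets (the cell itself plus 13 neighbors, matching the loop do kc = 1, 14 in Appendix B) is the half shell below; the ordering is one possible convention and is not necessarily that of the DL_POLY nix, niy, niz arrays:

      integer nix(14),niy(14),niz(14)
c     the cell itself, half of the z=0 shell, and the whole z=+1
c     layer; every remaining neighbour is the negative of one of
c     these offsets, so each cell pair is visited exactly once
      data nix/ 0, 1, 1, 0,-1, 0, 1, 1, 0,-1,-1,-1, 0, 1/
      data niy/ 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0,-1,-1,-1/
      data niz/ 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1/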
Double counting of interactions is avoided by excluding half of the neighboring cells from consideration. The first
of the above two lists (viz. list1) assigns all the atoms to the appropriate cell (numbered icell) in accordance with
the “particle-cell” scheme, while the second list (list2) defines all the atoms in 13 of the cells neighboring cell icell.
From these two lists we can now construct the final list
list(i, j) = j′,
which contains all potential interacting pairs (i.e. assigns the atoms to “particle-particle” pairs). By defining a
maximum atom number over the cells we have increased the vector length in the main loop of “parlink”: in
the two main do loops over subcells, the loop do i = 1, natms is now the inner one.
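
The restructured main loop can be condensed into the following single-node sketch (array names follow Appendix B; the frozen-atom and excluded-atom tests, the overflow bookkeeping via lchk and ibig, and the multi-node index arithmetic are omitted for brevity):

      subroutine pairlist(natms,ncells,numcell,numatm2,list2,cell,
     x  uxx,uyy,uzz,rcsq,mxlist,lentry,list,mxcell,mxnab)
c     condensed single-node sketch of the restructured "parlink"
c     main loop: the short per-cell loops of Appendix A are replaced
c     by maxjjj passes over one long inner loop of length natms,
c     which the compiler can vectorise
      implicit none
      integer natms,ncells,mxlist,mxcell,mxnab
      integer numcell(natms),numatm2(mxcell),list2(mxcell,mxnab)
      integer lentry(natms),list(natms,mxlist)
      real*8 cell(9),uxx(natms),uyy(natms),uzz(natms),rcsq
      integer i,j,ic,jjj,maxjjj
      real*8 sxd,syd,szd,sxd2,syd2,szd2,xd,yd,zd,rsq
c     the longest list2 row fixes the number of passes
      maxjjj=0
      do ic=1,ncells
        maxjjj=max(maxjjj,numatm2(ic))
      enddo
      do i=1,natms
        lentry(i)=0
      enddo
      do jjj=1,maxjjj
        do i=1,natms
          ic=numcell(i)
          if(jjj.le.numatm2(ic))then
            j=list2(ic,jjj)
c           same-cell pairs occur twice; keep only i.gt.j
            if(.not.(numcell(j).eq.ic.and.i.le.j))then
c             minimum image in fractional coordinates (as Appendix B)
              sxd=uxx(j)-uxx(i)
              syd=uyy(j)-uyy(i)
              szd=uzz(j)-uzz(i)
              sxd2=sxd-int(2.d0*sxd)
              syd2=syd-int(2.d0*syd)
              szd2=szd-int(2.d0*szd)
              xd=cell(1)*sxd2+cell(4)*syd2+cell(7)*szd2
              yd=cell(2)*sxd2+cell(5)*syd2+cell(8)*szd2
              zd=cell(3)*sxd2+cell(6)*syd2+cell(9)*szd2
              rsq=xd**2+yd**2+zd**2
              if(rsq.lt.rcsq.and.lentry(i).lt.mxlist)then
                lentry(i)=lentry(i)+1
                list(i,lentry(i))=j
              endif
            endif
          endif
        enddo
      enddo
      return
      end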
In Appendix C we present another enhancement of the vectorised “parlink” subroutine. In the previous version,
shown in Appendix B, the atomic index i jumps over nodes, as it was originally designed in DL_POLY:
i = 1, 1 + M, 1 + 2M, 1 + 3M, . . .
(where M is the number of processors). So the statement lentry(ik) = lentry(ik) + 1 was not amenable to efficient vectorisation
of the largest loop do i = 1, natms, and we were required to introduce some new do loops and temporary arrays
lentry_tmp(i) and list_tmp(i, lentry_tmp(i)) = j. Now, we have introduced the blocking indices
iatm1 = idnode ∗ natms/mxnode + 1,
iatm2 = (idnode + 1) ∗ natms/mxnode,
so the loop
do i = iatm1, iatm2
and the index
ik = i − iatm1 + 1
provide sequential ordering for lentry(ik) = lentry(ik) + 1. With this modification the large loop
do i = iatm1, iatm2 can easily be vectorised.
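
The following toy program (not part of DL_POLY) illustrates the blocking: for natms = 10 and mxnode = 4 the nodes own the contiguous ranges 1–2, 3–5, 6–7 and 8–10, and within each node ik runs 1, 2, 3, . . . without gaps:

      program blockidx
c     toy illustration of the blocking indices of Appendix C: the
c     integer divisions split the atoms into contiguous per-node
c     ranges, so the local index ik increments sequentially
      implicit none
      integer natms,mxnode,idnode,iatm1,iatm2,i,ik
      natms=10
      mxnode=4
      do idnode=0,mxnode-1
        iatm1=idnode*natms/mxnode+1
        iatm2=(idnode+1)*natms/mxnode
        do i=iatm1,iatm2
          ik=i-iatm1+1
          write(*,'(3i6)') idnode,i,ik
        enddo
      enddo
      end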
A general flow diagram of the algorithms developed above is shown in Fig. 1. It should be
noted that for both proposed versions of the “parlink” subroutine we are able to use all the complicated features
in the construction of the Verlet neighbor list which are present in the conventional DL_POLY (e.g., the tests
for ‘frozen’ atoms, or for excluded atoms in macromolecules, etc.). Similar modifications have been made in
the forces subroutines also, while retaining all the universality possessed by the original code. Originally, the
forces subroutines calculated the interactions for only one specific central atom per call. These now contain the
conventional double loop in each force subroutine, so all interactions can be evaluated with just one call to the
relevant subroutines.
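
A minimal sketch of such a restructured force routine is given below for a single Lennard-Jones potential on one node (so that the list row index coincides with the atom index). This illustrates the loop structure only and is not the actual DL_POLY “srfrce” code; since each j occurs at most once per list row, the scattered updates of the force arrays carry no recurrence within the inner loop.

      subroutine ljfrc(natms,lentry,list,xxx,yyy,zzz,fxx,fyy,fzz,
     x  eps,sig,rcsq,engsrp,mxlist)
c     sketch of a force routine that evaluates all pairs in one call
c     (single node, single Lennard-Jones type, no periodic images):
c     the outer loop runs over atoms, the inner loop over their
c     Verlet-list entries; Newton's third law fills the j side
      implicit none
      integer natms,mxlist
      integer lentry(natms),list(natms,mxlist)
      real*8 xxx(natms),yyy(natms),zzz(natms)
      real*8 fxx(natms),fyy(natms),fzz(natms)
      real*8 eps,sig,rcsq,engsrp
      integer i,m,j
      real*8 xd,yd,zd,rsq,sr2,sr6,gam
      engsrp=0.d0
      do i=1,natms
        fxx(i)=0.d0
        fyy(i)=0.d0
        fzz(i)=0.d0
      enddo
      do i=1,natms
        do m=1,lentry(i)
          j=list(i,m)
          xd=xxx(i)-xxx(j)
          yd=yyy(i)-yyy(j)
          zd=zzz(i)-zzz(j)
          rsq=xd**2+yd**2+zd**2
          if(rsq.lt.rcsq)then
            sr2=sig*sig/rsq
            sr6=sr2**3
c           u(r)=4*eps*(sr6**2-sr6); gam=-(du/dr)/r
            engsrp=engsrp+4.d0*eps*sr6*(sr6-1.d0)
            gam=24.d0*eps*sr6*(2.d0*sr6-1.d0)/rsq
            fxx(i)=fxx(i)+gam*xd
            fyy(i)=fyy(i)+gam*yd
            fzz(i)=fzz(i)+gam*zd
            fxx(j)=fxx(j)-gam*xd
            fyy(j)=fyy(j)-gam*yd
            fzz(j)=fzz(j)-gam*zd
          endif
        enddo
      enddo
      return
      end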

2.2. Optimization of “parlst” code

In Appendix D the section of the “parlst” code for the original DL_POLY Brode–Ahlrichs scheme is shown. It
is easy to see that the largest inner loop
do i = idnode + 1, last, mxnode
is not vectorised. As noted above, in the original version the Brode–Ahlrichs algorithm (viz., subroutine
“parlst”) already possesses a relatively high (about 50%) vectorisation proportion. However, we have tuned this
subroutine to obtain a fully vectorised one. The corresponding section of the modified “parlst” subroutine is
presented in Appendix E. By introducing a new logical array lchk1(ii), which performs the running check of the neighbor
list array capacity, we can now reach full vectorisation of the above largest inner loop.

Table
Vectorisation proportion of the “parlst” and “parlink” codes
Subroutine Original (Percent) Optimized (Percent)
“parlst.f” 51 99
“parlink.f” 0.1 96
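
The essence of the lchk1 modification can be sketched in the following simplified stand-alone form (in Appendix E the test is embedded directly in the pair loop):

      subroutine capchk(npairs,lentry,mxlist,lchk1,lchk,ibig)
c     sketch of the vectorisable capacity check of Appendix E:
c     replacing the single scalar flag by the per-pair logical
c     array lchk1 removes the loop-carried dependence, so the
c     surrounding pair loop can vectorise
      implicit none
      integer npairs,mxlist,ibig
      integer lentry(npairs)
      logical lchk1(npairs),lchk
      integer ii
      do ii=1,npairs
        if(lentry(ii).gt.mxlist)then
          lchk1(ii)=.false.
          ibig=max(ibig,lentry(ii))
        endif
      enddo
c     collapse to the scalar flag once, outside the vector loop
      lchk=.true.
      do ii=1,npairs
        lchk=lchk.and.lchk1(ii)
      enddo
      return
      end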
We have tested both optimized codes (viz., the “parlink” and “parlst” subroutines) for several DL_POLY benchmark
systems (Section 3). The results are in full agreement with the original, so the optimized codes are verified with
respect to the tested systems. A summary of the optimization results for the “parlink” and “parlst” codes is given in
the table above. The efficiency of the subroutine “parlink”, which uses a significant amount of the total CPU time, has
now been improved, and its vectorisation proportion now stands at 96%.

3. The DL_POLY benchmark

In the present section we give the results of the DL_POLY optimization for several standard benchmarks supplied
with the code. These are summarized below:
Benchmark 1: Simulation of a sodium-potassium disilicate glass (1080 atoms, 10 time steps);
Benchmark 2: Simulation of metallic aluminium with Sutton–Chen potential at 300 K (19625 Al atoms, 100 time
steps);
Benchmark 3: Simulation of valinomycin in 1223 water molecules (3837 atoms, 100 time steps);
Benchmark 4: Dynamic Shell model water structure (768 atoms, 1024 sites, 1000 time steps);
Benchmark 5: Dynamic Shell model MgCl2 structure (768 atoms, 1280 sites, 500 time steps);
Benchmark 8: Simulation of a model membrane with 2 membrane chains, 202 solute molecules and 2746 solvent
molecules (3148 atoms, 500 time steps).
It should be noted that we have treated some of the above tests in a rather different manner (for example, we
did not use the “multiple time step” or “neutral groups” algorithms, which are also implemented in the conventional
package). In our calculations the first two benchmarks are treated by the “link-cell” scheme, and the remaining four
benchmarks by the Brode–Ahlrichs one.

Benchmark 1

This simulation uses two- and three-body potentials, and some of the pair potentials are read from the TABLE
file. Three-body forces are assumed to be short ranged and are therefore calculated using the “link-cell” algorithm.
The execution profile for the original code is listed in Table 1 and that for the tuned version in Table 2.
For the analysis of the profiling output we have used the FJSAMP tools; “VL” denotes the average vector
length and “V_Hit” the vectorisation proportion. Most of the CPU time is consumed by the subroutine
“thbfrc”, which calculates the three-body interaction potential and forces. The “parlink” subroutine is used here
to construct the neighbor list. It is easy to see from Table 1 that this subroutine uses a small amount of time
in comparison with the time used to compute the interaction forces and potential. Nevertheless, increasing the
vectorisation proportion of the “parlink” subroutine, as seen from Table 2, substantially reduces the total computation
time (Table 3).

Table 1
Subroutine Count Percent VL V_Hit (Percent)
THBFRC_ 10825 47.3 52 22.9
EWALD2_ 135 0.6 295 49.6
SRFRCE_ 80 0.3 246 46.2
PARLINK_ 33 0.1 – 0.1
EWALD1_ 8 0.0 1080 50.0

Table 2
Subroutine Count Percent VL V_Hit (Percent)
THBFRC_ 6976 47.1 53 35.2
SRFRCE_ 53 0.4 257 98.1
EWALD2_ 35 0.2 302 77.1
PARLINK_ 30 0.2 872 96.7
EWALD1_ 6 0.0 1080 50.0

Table 3
Benchmark 1
Software 1-node 2-node 4-node
Original 469.3 156.7 80.4
Optimized 303.4 154.7 79.5

Benchmark 2

Simulation of metallic aluminium, which contains 19625 atoms and uses the many-body Sutton–Chen interaction
potential. The execution profile is given in Table 4 for the original code and in Table 5 for the optimized one. Most
execution time is consumed by five routines: “parlink”, “suttchen”, “denloc”, “images” and “srfrce”. As
noted in the previous section, in the original code the forces subroutines only calculate interactions for one
specific central atom per call. The subroutine “denloc” calculates the local atomic density around each atom, while
“suttchen” calculates the actual interaction potential and forces on each atom. Subroutine “srfrce” evaluates short
ranged interactions and subroutine “images” finds the minimum image separation of a pair of interacting atoms.
In the tuned codes the conventional double loop is adopted in each force subroutine, and all interactions can be
calculated with just one call to these subroutines. The optimization, as seen from Table 6, has reduced the execution
time significantly.

Benchmark 3

Simulation of valinomycin in 1223 water molecules (with a total of 3837 interaction sites). The water is defined
as a rigid body (TIP4P molecules) and bond constraints are applied to all chemical bonds in the valinomycin.
Both short ranged Lennard-Jones interactions and the long ranged Coulomb interaction are used in this simulation. The
electrostatics are handled by a shifted Coulombic potential. The execution profiles for the original and optimized
codes are given in Tables 7 and 8, respectively. Table 9 shows the computation times for 1, 2 and 4 nodes, from
which a significant improvement in the performance of the optimized codes is evident.

Table 4
Subroutine Count Percent VL V_Hit (Percent)
SUTTCHEN_ 6655 24.5 42 43.5
PARLINK_ 5194 19.2 2048 0.2
DENLOC_ 3656 13.5 51 46.0
SRFRCE_ 1984 7.3 60 14.3
IMAGES_ 1308 4.8 92 61.0
FORCES_ 763 2.8 648 28.8
SCDENS_ 592 2.2 322 43.4

Table 5
Subroutine Count Percent VL V_Hit (Percent)
SUTTCHEN_ 6812 28.4 61 43.2
DENLOC_ 3843 16.0 49 45.8
SRFRCE_ 2558 10.7 57 16.9
IMAGES_ 1297 5.4 56 61.9
PARLINK_ 1018 4.2 1015 98.1
FORCES_ 768 3.2 701 36.6
SCDENS_ 547 2.3 308 48.3

Table 6
Benchmark 2
Software 1-node 2-node 4-node
Original 557.4 306.0 171.2
Optimized 263.0 158.7 107.0

Table 7
Subroutine Count Percent VL V_Hit (Percent)
COUL4_ 6070 45.1 362 50.1
SRFRCE_ 3263 24.2 415 45.9
PARLST_ 1523 11.3 1807 51.5
IMAGES_ 782 5.8 464 90.3
NVTQ_H_ 287 2.1 3 42.2

Table 8
Subroutine Count Percent VL V_Hit (Percent)
SRFRCE_ 2007 36.3 454 89.5
COUL4_ 1289 23.3 380 86.0
IMAGES_ 1247 22.6 463 84.4
NVTQ_H_ 302 5.5 104 42.7
PARLST_ 290 5.2 1078 99.1

Table 9
Benchmark 3
Software 1-node 2-node 4-node
Original 274.2 156.7 80.4
Optimized 111.4 61.5 37.6

Table 10
Subroutine Count Percent VL V_Hit (Percent)
COUL3_ 8547 39.0 236 46.9
SRFRCE_ 5887 26.9 246 43.7
FORCES_ 1747 8.0 294 79.1
IMAGES_ 1268 5.8 293 93.5
NVTQ_B1_ 449 2.1 7 54.8
PARLST_ 238 1.1 1239 51.3

Table 11
Subroutine Count Percent VL V_Hit (Percent)
SRFRCE_ 4584 39.5 279 98.2
COUL3_ 3327 28.7 262 97.9
IMAGES_ 2099 18.1 303 47.1
NVTQ_B1_ 455 3.9 4 60.0
PARLST_ 65 0.6 1026 98.5

Table 12
Benchmark 4
Software 1-node 2-node 4-node
Original 448.4 221.6 133.1
Optimized 236.9 112.9 64.3

Benchmark 4

Shell model of water (768 atoms, 1024 sites, 1000 time steps); 256 molecules of water with a polarizable oxygen
atom. The electrostatic potential and forces are handled by the reaction field method. The execution profiles for the
original and optimized codes are presented in Tables 10 and 11, respectively, and the benchmark results are shown
in Table 12.

Benchmark 5

Dynamic Shell model MgCl2 structure (768 atoms, 1280 sites, 500 time steps). The Ewald sum is used with
cubic periodic boundary conditions. The execution profiles for the original and optimized codes are presented in
Tables 13 and 14, respectively. The computation times for the original and tuned codes are presented in Table 15.

Table 13
Subroutine Count Percent VL V_Hit (Percent)
EWALD2_ 6780 39.6 261 48.8
SRFRCE_ 4989 29.2 237 48.8
IMAGES_ 824 4.8 331 83.6
FORCES_ 642 3.8 433 54.4
EWALD1_ 565 3.3 1271 39.1
EWALD3_ 515 3.0 65 21.7
PARLST_ 100 0.6 1472 52.5

Table 14
Subroutine Count Percent VL V_Hit (Percent)
SRFRCE_ 2881 40.4 282 98.1
EWALD2_ 1948 27.3 283 98.5
IMAGES_ 1214 17.0 327 85.3
EWALD1_ 602 8.4 1266 47.3
EWALD3_ 340 4.8 11 80.0
PARLST_ 25 0.4 968 98.0

Table 15
Benchmark 5
Software 1-node 2-node 4-node
Original 350.3 202.6 127.9
Optimized 145.8 104.3 82.5

Benchmark 8

Simulation of a model membrane with 2 membrane chains, 202 solute molecules and 2746 solvent molecules
(3148 atoms, 500 time steps). Only short ranged forces are used. Tables 16 and 17 show the corresponding
execution profiles for the original and optimized codes, and the computation times for the above cases are
presented in Table 18.

Table 16
Subroutine Count Percent VL V_Hit (Percent)
SRFRCE_ 8043 61.1 71 43.3
IMAGES_ 1018 7.7 187 76.4
PARLST_ 787 6.0 1661 51.2

Table 17
Subroutine Count Percent VL V_Hit (Percent)
SRFRCE_ 3768 69.2 115 92.6
IMAGES_ 1042 19.1 203 61.9
PARLST_ 168 3.1 968 99.4

Table 18
Benchmark 8
Software 1-node 2-node 4-node
Original 269.5 141.2 79.4
Optimized 111.0 69.7 42.3

4. Conclusion

We have presented highly vectorised FORTRAN codes for the DL_POLY MD simulation package, which are
based on the link-cell algorithm. The subroutine “parlink” in DL_POLY, which constructs the neighbor list
identifying all potential interaction pairs according to a predefined distance criterion, has now been vectorised.
The neighbor list must be updated periodically to account for the dynamic nature of the system (diffusion of particles,
density fluctuations, etc.). The poor performance of “parlink” on a vector computer was due to the short vector
length of the central loops, and in many cases this subroutine used a significant amount of CPU time. To vectorise the
“parlink” subroutine we have preliminarily constructed two new lists, which contain all atoms (by a “particle-cell”
scheme) in the appropriate cells and the corresponding 13 neighboring subcells. By defining the maximum atom
number over the cells we have then increased the vector length in the main loops of the “parlink” subroutine.


Fig. 1. General flow diagram for the optimized “link-cell” codes.

The efficiency of the proposed codes has been tested on a Fujitsu VPP700/128E vector computer for several
DL_POLY benchmark systems. The results agree with the original, and thus the optimized codes are verified with
respect to the tested systems. We have shown that in constructing the neighbor list and in calculating the atomic
forces our link-cell method is faster than the original code. The efficiency of the subroutine “parlink” has now
been significantly improved and its vectorisation has reached 96%. We have also tuned the “parlst” subroutine,
which constructs the neighbor list in accordance with the Brode–Ahlrichs scheme and which originally was already
substantially vectorisable. By introducing a new logical array, which checks the neighbor list capacity, we have reached
full vectorisation of this subroutine. Further modifications have been made to the forces subroutines, which
improve the procedures for calculating the atomic forces and potentials. Originally, the forces subroutines
calculated interactions for only one specific central atom per call, while in the modified versions the conventional
double loop has been inserted in each force subroutine, so all interactions can now be calculated with just one call to these
subroutines.

Appendix A

...

00000239 v do i = 1,natms
00000240
00000241 v tx=xxx(i)
00000242 v ty=yyy(i)
00000243 v tz=zzz(i)
00000244
00000245 v uxx(i)=(rcell(1)*tx+rcell(4)*ty+rcell(7)*tz)+0.5d0
00000246 v uyy(i)=(rcell(2)*tx+rcell(5)*ty+rcell(8)*tz)+0.5d0
00000247 v uzz(i)=(rcell(3)*tx+rcell(6)*ty+rcell(9)*tz)+0.5d0
00000248
00000249 v enddo
00000251 c
00000252 c link neighbours
00000253
00000254 m do i = 1,natms
00000255
00000256 v ix = min(int(xdc*uxx(i)),ilx-1)
00000257 v iy = min(int(ydc*uyy(i)),ily-1)
00000258 v iz = min(int(zdc*uzz(i)),ilz-1)
00000259
00000260 v icell = 1+ix+ilx*(iy+ily*iz)
00000261
00000262 m6 j = lct(icell)
00000263 s6 lct(icell)=i
00000264 s6 link(i)=j
00000265
00000266 v enddo
00000268 c
00000269 c set control variables for loop over subcells
00000270
00000271 ix=1
00000272 iy=1
00000273 iz=1
00000274 c
00000275 c primary loop over subcells
00000276
00000277 s do ic = 1,ncells
00000278
00000279 m ii=lct(ic)
00000280 s if(ii.gt.0) then
00000281 c
00000282 c secondary loop over subcells
00000283
00000284 ik=0
00000285
00000286 do kc = 1,nsbcll

00000287
00000288 i=ii
00000289
00000290 cx = 0.d0
00000291 cy = 0.d0
00000292 cz = 0.d0
00000293 jx=ix+nix(kc)
00000294 jy=iy+niy(kc)
00000295 jz=iz+niz(kc)
00000296 c
00000297 c minimum image convention
00000298
00000299 if(jx.gt.ilx) then
00000300
00000301 jx = jx-ilx
00000302 cx = 1.d0
00000303
00000304 elseif(jx.lt.1) then
00000305
00000306 jx = jx+ilx
00000307 cx =-1.d0
00000308
00000309 endif
00000310
00000311 if(jy.gt.ily) then
00000312
00000313 jy = jy-ily
00000314 cy = 1.d0
00000315
00000316 elseif(jy.lt.1) then
00000317
00000318 jy = jy+ily
00000319 cy =-1.d0
00000320
00000321 endif
00000322
00000323 if(jz.gt.ilz) then
00000324
00000325 jz = jz-ilz
00000326 cz = 1.d0
00000327
00000328 elseif(jz.lt.1) then
00000329
00000330 jz = jz+ilz
00000331 cz =-1.d0
00000332
00000333 endif
00000334 c
00000335 c index of neighbouring cell
00000336
00000337 jc =jx+ilx*((jy-1)+ily*(jz-1))
00000338 j=lct(jc)

00000339 c
00000340 c ignore if empty
00000341
00000342 if(j.gt.0) then
00000343
00000344 200 continue
00000345 c
00000346 c test if site is of interest to this node
00000347
00000348 if(mod(i-1,mxnode).eq.idnode) then
00000350 c
00000351 c i’s index for this processor
00000352 ik = ((i-1)/mxnode)+1
00000353 c
00000354 c test if i is a frozen atom
00000355
00000356 lfrzi=(lstfrz(i).ne.0)
00000357
00000358 if(ic.eq.jc) j=link(i)
00000359 if(j.gt.0) then
00000360
00000361 300 continue
00000363 c
00000364 c test of frozen atom pairs
00000365
00000366 ldo=.true.
00000367 if(lfrzi) ldo=(lstfrz(j).eq.0)
00000368
00000369 if(ldo) then
00000370 c
00000371 c distance in real space : minimum image applied
00000372
00000373 sxd = uxx(j)-uxx(i)+cx
00000374 syd = uyy(j)-uyy(i)+cy
00000375 szd = uzz(j)-uzz(i)+cz
00000376
00000377 xd=cell(1)*sxd+cell(4)*syd+cell(7)*szd
00000378 yd=cell(2)*sxd+cell(5)*syd+cell(8)*szd
00000379 zd=cell(3)*sxd+cell(6)*syd+cell(9)*szd
00000380
00000381 rsq = xd**2+yd**2+zd**2
00000382 c
00000383 c test of distance
00000384 if(rcsq.gt.rsq) then
00000385 c
00000386 c test for excluded atom
00000387 c
00000388 linc = .true.
00000389 v do ixl =1,nexatm(ik)
00000390
00000391 v if(lexatm(ik,ixl).eq.j) linc=.false.
00000392

00000393 v enddo
00000394
00000395 if(linc) then
00000396
00000397 lentry(ik)=lentry(ik)+1
00000398
00000399 if(lentry(ik).gt.mxlist) then
00000400
00000401 ibig = max(ibig,lentry(ik))
00000402 lchk = .false.
00000403
00000404 else
00000405
00000406 list(ik,lentry(ik))=j
00000407
00000408 endif
00000409
00000410 endif
00000411
00000412 endif
00000413
00000414 endif
00000415
00000416 j=link(j)
00000417 if(j.ne.0) goto 300
00000418
00000419 endif
00000420
00000421 endif
00000422
00000423 j=lct(jc)
00000424 i=link(i)
00000425
00000426 if(i.ne.0) goto 200
00000427
00000428 endif
00000429
00000430 enddo
00000431
00000432 v endif
00000433
00000434 s ix=ix+1
00000435 s if(ix.gt.ilx) then
00000436
00000437 s ix=1
00000438 s iy=iy+1
00000439
00000440 s if(iy.gt.ily) then
00000441
00000442 s iy=1
00000443 s iz=iz+1
00000444

00000445 v endif
00000446
00000447 v endif
00000448
00000449 v enddo

Appendix B

...

00000289 m do i = 1,natms
00000290
00000291 v ix = min(int(xdc*uxx(i)),ilx-1)
00000292 v iy = min(int(ydc*uyy(i)),ily-1)
00000293 v iz = min(int(zdc*uzz(i)),ilz-1)
00000294
00000295 v icell = 1+ix+ilx*(iy+ily*iz)
00000296
00000297 v icellx(icell)=ix
00000298 v icelly(icell)=iy
00000299 v icellz(icell)=iz
00000300
00000301 v numcell(i)=icell
00000302 s8 numatm1(icell)=numatm1(icell)+1
00000303
00000304 v list1(icell,numatm1(icell))=i
00000305
00000306 v enddo
00000307
00000308 s do ic=1,ncells
00000309
00000310 v ix=icellx(ic)+1
00000311 v iy=icelly(ic)+1
00000312 v iz=icellz(ic)+1
00000313
00000314 v na2=0
00000315
00000316 s do kc=1,14
00000317
00000318 m jx=ix+nix(kc)
00000319 s jy=iy+niy(kc)
00000320 s jz=iz+niz(kc)
00000321 c
00000322 c minimum image convention
00000323
00000324 s if(jx.gt.ilx) then
00000325
00000326 m jx = jx-ilx
00000327
00000328 s elseif(jx.lt.1) then
00000329
00000330 m jx = jx+ilx
00000331
00000332 v endif
00000333
00000334 s if(jy.gt.ily) then

00000335
00000336 m jy = jy-ily
00000337
00000338 s elseif(jy.lt.1) then
00000339
00000340 m jy = jy+ily
00000341
00000342 v endif
00000343
00000344 s if(jz.gt.ilz) then
00000345
00000346 m jz = jz-ilz
00000347
00000348 s elseif(jz.lt.1) then
00000349
00000350 m jz = jz+ilz
00000351
00000352 v endif
00000353 c
00000354 c index of neighbouring cell
00000355
00000356 s jc =jx+ilx*((jy-1)+ily*(jz-1))
00000357
00000358 v do iii=1,numatm1(jc)
00000359
00000360 v na2=na2+1
00000361
00000362 v list2(ic,na2)=list1(jc,iii)
00000363
00000364 v enddo
00000365
00000366 v enddo
00000367
00000368 m numatm2(ic)=na2
00000369
00000370 v enddo
00000371 c
00000372 c maximum atom number over each cell
00000373
00000374 maxjjj=numatm2(1)
00000375
00000376 v do ic = 1,ncells
00000377
00000378 v maxjjj= max(maxjjj, numatm2(ic))
00000379
00000380 v enddo
00000381 c
00000382 c the main loop over subcells
00000383
00000384 do jjj = 1,maxjjj
00000385
00000386 v do i = 1,natms

00000387
00000388 v ic=numcell(i)
00000389
00000390 c test if site is of interest to this node
00000391
00000392 v if(mod(i-1,mxnode).eq.idnode) then
00000393
00000394 c i’s index for this processor
00000395
00000396 v ik = ((i-1)/mxnode)+1
00000397
00000398 v if(list2(ic,jjj).ne.0) then
00000399
00000400 v j=list2(ic,jjj)
00000401
00000402 v if(numcell(i).eq.numcell(j).and.i.le.j) goto 5000
00000403
00000404 c test if i is a frozen atom
00000405
00000406 v lfrzi=(lstfrz(i).ne.0)
00000407 c
00000408 c test of frozen atom pairs
00000409
00000410 v ldo=.true.
00000411
00000412 v if(lfrzi) ldo=(lstfrz(j).eq.0)
00000413
00000414 v if(ldo) then
00000415 c
00000416 c distance in real space : minimum image applied
00000417
00000418 v sxd = uxx(j)-uxx(i)
00000419 v syd = uyy(j)-uyy(i)
00000420 v szd = uzz(j)-uzz(i)
00000421
00000422 v sxd2=sxd-int(2*sxd)
00000423 v syd2=syd-int(2*syd)
00000424 v szd2=szd-int(2*szd)
00000425
00000426 v xd=cell(1)*sxd2+cell(4)*syd2+cell(7)*szd2
00000427 v yd=cell(2)*sxd2+cell(5)*syd2+cell(8)*szd2
00000428 v zd=cell(3)*sxd2+cell(6)*syd2+cell(9)*szd2
00000429
00000430 v rsq = xd**2+yd**2+zd**2
00000431 c
00000432 c test of distance
00000433
00000434 v if(rcsq.gt.rsq) then
00000435 c
00000436 c test for excluded atom
00000437
00000438 v linc = .true.

00000439
00000440 v do ixl =1,nexatm(ik)
00000441
00000442 v if(lexatm(ik,ixl).eq.j) linc=.false.
00000443
00000444 v enddo
00000445
00000446 v if(linc) then
00000447
00000448 v lentry_tmp(i)=lentry_tmp(i)+1
00000449
00000450 v if(lentry_tmp(i).gt.mxlist) then
00000451
00000452 v ibig = max(ibig,lentry_tmp(i))
00000453
00000454 v lchk = .false.
00000455
00000456 v else
00000457
00000458 v list_tmp(i,lentry_tmp(i))=j
00000459
00000460 v endif
00000461
00000462 v endif
00000463
00000464 v endif
00000465
00000466 v endif
00000467
00000468 5000 continue
00000469
00000470 v endif
00000471
00000472 v endif
00000473
00000474 v enddo
00000475
00000476 enddo
00000477
00000478 s do i=1,natms
00000479 v if(mod(i-1,mxnode).eq.idnode) then
00000480 v ik = ((i-1)/mxnode)+1
00000481 v lentry(ik)=lentry_tmp(i)
00000482 v do jjj=1,lentry_tmp(i)
00000483 v list(ik,jjj)=list_tmp(i,jjj)
00000484 v enddo
00000485 v endif
00000486 v enddo

Appendix C

...

00000369 c
00000370 c the main loop over subcells
00000371
00000372 do jjj = 1,maxjjj
00000373
00000374 v do i = iatm1,iatm2
00000375
00000376 v ic=numcell(i)
00000377
00000378 v if(list2(ic,jjj).ne.0) then
00000379
00000380 v j=list2(ic,jjj)
00000381
00000382 v if(numcell(i).eq.numcell(j).and.i.le.j) goto 5000
00000383 c
00000384 c i’s index for this processor
00000385
00000386 v ik=i-iatm1+1
00000387 c
00000388 c test if i is a frozen atom
00000389
00000390 v lfrzi=(lstfrz(i).ne.0)
00000391 c
00000392 c test of frozen atom pairs
00000393
00000394 v ldo=.true.
00000395
00000396 v if(lfrzi) ldo=(lstfrz(j).eq.0)
00000397
00000398 v if(ldo) then
00000399 c
00000400 c distance in real space : minimum image applied
00000401
00000402 v sxd = uxx(j)-uxx(i)
00000403 v syd = uyy(j)-uyy(i)
00000404 v szd = uzz(j)-uzz(i)
00000405
00000406 v sxd2=sxd-int(2*sxd)
00000407 v syd2=syd-int(2*syd)
00000408 v szd2=szd-int(2*szd)
00000409
00000410 v xd=cell(1)*sxd2+cell(4)*syd2+cell(7)*szd2
00000411 v yd=cell(2)*sxd2+cell(5)*syd2+cell(8)*szd2
00000412 v zd=cell(3)*sxd2+cell(6)*syd2+cell(9)*szd2
00000413
00000414 v rsq = xd**2+yd**2+zd**2

00000415 c
00000416 c test of distance
00000417
00000418 v if(rcsq.gt.rsq) then
00000419 c
00000420 c test for excluded atom
00000421
00000422 v linc = .true.
00000423
00000424 v do ixl =1,nexatm(ik)
00000425
00000426 v if(lexatm(ik,ixl).eq.j) linc=.false.
00000427
00000428 v enddo
00000429
00000430 v if(linc) then
00000431
00000432 v lentry(ik)=lentry(ik)+1
00000433
00000434 v if(lentry(ik).gt.mxlist) then
00000435
00000436 v ibig = max(ibig,lentry(ik))
00000437
00000438 v lchk = .false.
00000439
00000440 v else
00000441
00000442 v list(ik,lentry(ik))=j
00000443
00000444 v endif
00000445
00000446 v endif
00000447
00000448 v endif
00000449
00000450 v endif
00000451
00000452 5000 continue
00000453
00000454 v endif
00000455
00000456 v enddo
00000457
00000458 enddo

Appendix D

...

00000078 c
00000079 c outer loop over atoms
00000080
00000081 do m=1,mpm2
00000082
00000083 if(m.gt.npm2)last=min(mpm2,iatm2)
00000085 c
00000086 c inner loop over atoms
00000087
00000088 ii=0
00000089
00000090 v do i=idnode+1,last,mxnode
00000091 c
00000094 c calculate atom indices
00000095
00000096 v j=i+m
00000097 v if(j.gt.natms)j=j-natms
00000098
00000099 v ii=ii+1
00000100 v xdf(ii)=xxx(i)-xxx(j)
00000101 v ydf(ii)=yyy(i)-yyy(j)
00000102 v zdf(ii)=zzz(i)-zzz(j)
00000103
00000104 v enddo
00000106 c
00000107 c apply minimum image convention
00000108
00000109 call images(imcon,0,1,ii,cell,xdf,ydf,zdf)
00000111 c
00000112 c allocate atoms to neighbour list
00000113
00000114 ii=0
00000115
00000116 m do i=idnode+1,last,mxnode
00000118
00000119 v lfrzi=(lstfrz(i).ne.0)
00000120 c
00000121 c calculate atom indices
00000122
00000123 v j=i+m
00000124 v if(j.gt.natms)j=j-natms
00000125
00000126 v ii =ii+1
00000127 c
00000128 c reject atoms in excluded pair list
00000129

00000130 v if((nexatm(ii).gt.0).and.(lexatm(ii,noxatm(ii)).eq.j))
00000131 x then
00000132
00000133 v noxatm(ii)=min(noxatm(ii)+1,nexatm(ii))
00000134 c
00000135 c reject frozen atom pairs
00000136
00000137 v else
00000138
00000139 v ldo = .true.
00000140 v if(lfrzi) ldo = (lstfrz(j).eq.0)
00000142 c
00000143 v if(ldo) then
00000144 c calculate interatomic distance
00000145
00000146 v rsq=xdf(ii)*xdf(ii)+ydf(ii)*ydf(ii)+zdf(ii)*zdf(ii)
00000148 c
00000149 c running check of neighbour list array capacity
00000150
00000151 v if(rsq.lt.rclim)then
00000152
00000153 v lentry(ii)=lentry(ii)+1
00000154
00000155 v if(lentry(ii).gt.mxlist)then
00000156
00000158 m lchk=.false.
00000159
00000160 v ibig = max(lentry(ii),ibig)
00000161
00000162 v endif
00000164 c
00000165 c compile neighbour list array
00000166
00000168 s if(lchk)then
00000169
00000170 v list(ii,lentry(ii))=j
00000172
00000173 v endif
00000174
00000175 v endif
00000176
00000177 v endif
00000178
00000179 v endif
00000180
00000181 v enddo
00000182
00000183 enddo

Appendix E

...

00000092 c
00000093 c outer loop over atoms
00000094
00000095 s do m=1,mpm2
00000096
00000097 v if(m.gt.npm2)last=min(mpm2,iatm2)
00000099 c
00000100 c inner loop over atoms
00000101
00000102 m ii=0
00000103
00000104 v do i=idnode+1,last,mxnode
00000107 c
00000108 c calculate atom indices
00000109
00000110 v j=i+m
00000111 v if(j.gt.natms)j=j-natms
00000112
00000113 v ii=ii+1
00000114 v xdf(ii)=xxx(i)-xxx(j)
00000115 v ydf(ii)=yyy(i)-yyy(j)
00000116 v zdf(ii)=zzz(i)-zzz(j)
00000117
00000118 v enddo
00000120 c
00000121 c apply minimum image convention
00000122
00000123 m call images(imcon,0,1,ii,cell,xdf,ydf,zdf)
00000125 c
00000126 c allocate atoms to neighbour list
00000127
00000128 s ii=0
00000129
00000130 v do i=idnode+1,last,mxnode
00000132
00000133 v lfrzi=(lstfrz(i).ne.0)
00000134 c
00000135 c calculate atom indices
00000136
00000137 v j=i+m
00000138 v if(j.gt.natms)j=j-natms
00000139
00000140 v ii =ii+1
00000141 c
00000142 c reject atoms in excluded pair list
00000143

00000144 v if((nexatm(ii).gt.0).and.(lexatm(ii,noxatm(ii)).eq.j))
00000145 x then
00000146
00000147 v noxatm(ii)=min(noxatm(ii)+1,nexatm(ii))
00000148 c
00000149 c reject frozen atom pairs
00000150
00000151 v else
00000152
00000153 v ldo = .true.
00000154 v if(lfrzi) ldo = (lstfrz(j).eq.0)
00000156
00000157 v if(ldo) then
00000158 c calculate interatomic distance
00000159
00000160 v rsq=xdf(ii)*xdf(ii)+ydf(ii)*ydf(ii)+zdf(ii)*zdf(ii)
00000162 c
00000163 c running check of neighbour list array capacity
00000164
00000165 v if(rsq.lt.rclim)then
00000166
00000167 v lentry(ii)=lentry(ii)+1
00000168
00000169 v if(lentry(ii).gt.mxlist)then
00000172
00000173 v lchk1(ii) = .false.
00000174
00000175 v lchk=lchk1(ii)
00000176
00000177 v ibig = max(lentry(ii),ibig)
00000178
00000179 v endif
00000181 c
00000182 c compile neighbour list array
00000183
00000186 v if(lchk1(ii)) then
00000187
00000188 v list(ii,lentry(ii))=j
00000189
00000190 v endif
00000193
00000194 v endif
00000195
00000196 v endif
00000197
00000198 v endif
00000199
00000200 v enddo
00000201
00000202 v enddo

References

[1] W. Smith, T.R. Forester, Molecular Graphics 14 (1996) 136.
[2] W. Smith, T.R. Forester, The DL_POLY User Manual. Version 2.11 (Daresbury Laboratory, 1999).
See also www.dl.ac.uk/TCS/Software/DL_POLY.
[3] W. Smith, T.R. Forester, Comput. Phys. Commun. 79 (1994) 52.
[4] W. Smith, T.R. Forester, Comput. Phys. Commun. 79 (1994) 63.
[5] T.R. Forester, W. Smith, Molecular Simulation 13 (1994) 195.
[6] W. Smith, Comput. Phys. Commun. 67 (1992) 392.
[7] M.P. Allen, D.J. Tildesley, Computer Simulation of Liquids (Clarendon Press, Oxford, 1989).
[8] S. Brode, R. Ahlrichs, Comput. Phys. Commun. 42 (1986) 51.
[9] R.W. Hockney, J.W. Eastwood, Computer Simulation Using Particles (McGraw-Hill International, 1981).
[10] D.M. Heyes, W. Smith, CCP5 Quarterly 26 (68) (1987).
[11] D.C. Rapaport, Comput. Phys. Rep. 9 (1988) 1.
[12] G.S. Grest, B. Dunweg, K. Kremer, private communication;
K. Kremer, G.S. Grest, I. Carmesin, Phys. Rev. Lett. 61 (1988) 566.
[13] S. Liem, UMIST-FECIT Collaboration Report, Private communication (1998).
