Professional Documents
Culture Documents
The SIESTA Method For Ab Initio Order-N Materials Simulation
The SIESTA Method For Ab Initio Order-N Materials Simulation
net/publication/39402943
CITATIONS READS
9,533 2,134
7 authors, including:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Daniel Sánchez-Portal on 30 May 2014.
2
Department of Earth Sciences, University of Cambridge,
Downing St., Cambridge CB2 3EQ, United Kingdom
3
Department of Chemistry, Imperial College of Science,
Technology and Medicine, South Kensington SW7 2AY, United Kingdom
4
Departamento de Fı́sica de la Materia Condensada,
Universidad del Paı́s Vasco, Apt. 644, 48080 Bilbao, Spain
5
Institut de Physique, Bâtiment B5, Université de Liège, B-4000 Sart-Tilman, Belgium
6
Institut de Ciència de Materials de Barcelona, CSIC,
Campus de la UAB, Bellaterra, 08193 Barcelona, Spain
7
Dep. de Fı́sica de Materiales and DIPC, Facultad de Quı́mica, UPV/EHU, Apt. 1072, 20080 Donostia, Spain
(Dated: February 1, 2008)
We have developed and implemented a self-consistent density functional method using standard
norm-conserving pseudopotentials and a flexible, numerical LCAO basis set, which includes multiple-
zeta and polarization orbitals. Exchange and correlation are treated with the local spin density or
generalized gradient approximations. The basis functions and the electron density are projected on
a real-space grid, in order to calculate the Hartree and exchange-correlation potentials and matrix
elements, with a number of operations that scales linearly with the size of the system. We use a
modified energy functional, whose minimization produces orthogonal wavefunctions and the same
energy and density as the Kohn-Sham energy functional, without the need of an explicit orthogonal-
ization. Additionally, using localized Wannier-like electron wavefunctions allows the computation
time and memory, required to minimize the energy, to also scale linearly with the size of the system.
Forces and stresses are also calculated efficiently and accurately, thus allowing structural relaxation
and molecular dynamics simulations.
I. INTRODUCTION ized basis set appears to be the natural choice. One pro-
posed approach are the ‘blips’ of Hernandez and Gillan10 ,
regularly-spaced Gaussian-like splines that can be sys-
As the improvements in computer hardware and soft- tematically increased, in the spirit of finite-element meth-
ware allow the simulation of molecules and materials ods, although at a considerable computational cost.
with an increasing number of atoms N , the use of so- We have developed a fully self-consistent DFT, based
called order-N algorithms, in which the computer time on a flexible linear combination of atomic orbitals
and memory scales linearly with the simulated system (LCAO) basis set, with essentially perfect O(N ) scal-
size, becomes increasingly important. These O(N ) meth- ing. It allows extremely fast simulations using mini-
ods were developed during the 1970’s and 80’s for long mal basis sets and very accurate calculations with com-
range forces1 and empirical interatomic potentials2 but plete multiple-zeta and polarized bases, depending on the
only in the last 5-10 years for the much more complex required accuracy and available computational power.
quantum-mechanical methods, in which atomic forces are In previous papers11,12 we have described preliminary
obtained by solving the interaction of ions and electrons versions of this method, that we call Siesta (Spanish
together3 . Even among quantum mechanical methods, Initiative for Electronic Simulations with Thousands of
there are very different levels of approximation: empir- Atoms). There is also a review13 of the tens of stud-
ical or semiempirical orthogonal tight-binding methods ies performed with it, in a wide variety of systems, like
are the simplest ones4,5 ; ‘ab-initio’ nonorthogonal-tight- metallic surfaces, nanotubes, and biomolecules. In this
binding and nonself-consistent Harris-functional meth- work we present a more complete description of the
ods are next6,7 ; and fully self-consistent density func- method, as well as some important improvements.
tional theory (DFT) methods are the most complex Apart from that of Born and Oppenheimer, the most
and reliable8 . The implementation of O(N ) methods in basic approximations concern the treatment of exchange
quantum-mechanical simulations has also followed these and correlation, and the use of pseudopotentials. Ex-
steps, with several methods already well established change and correlation (XC) are treated within Kohn-
within the tight-binding formalism5 , but much less so Sham DFT14 . We allow for both the local (spin) den-
in self-consistent DFT9 . The latter also require, in addi- sity approximation15 (LDA/LSD) or the generalized gra-
tion to solving Schrödinger’s equation, the determina- dient approximation16 (GGA). We use standard norm-
tion of the self-consistent Hamiltonian in O(N ) itera- conserving pseudopotentials17,18 in their fully non-local
tions. While this is difficult using plane waves, a local- form19 . We also include scalar-relativistic effects and the
2
nonlinear partial-core-correction to treat exchange and V H and V xc are the Hartree and XC potentials for the
correlation in the core region20 . pseudo-valence charge density, and we are using atomic
The Siesta code has been already tested and applied units (e = h̄ = me = 1) throughout all of this paper.
to dozens of systems and a variety of properties13 . There- The local part of the pseudopotential Vlocal (r) is in
fore, we will just illustrate here the convergence of a principle arbitrary, but it must join the semilocal poten-
few characteristic magnitudes of silicon, the architypi- tials Vl (r) which, by construction, all become equal to the
cal system of the field, with respect to the main preci- (unscreened) all-electron potential beyond the pseudopo-
sion parameters that characterize our method: basis size tential core radius rcore . Thus, δVl (r) = 0 for r > rcore .
(number of atomic basis orbitals); basis range (radius of Ramer and Rappe have proposed that Vlocal (r) be opti-
the basis orbitals); fineness of the real-space integration mized for transferability25 , but most plane wave schemes
grid; and confinement radius of the Wannier-like electron make it equal to one of the Vl (r)’s for reasons of effi-
states. Other parameters, like the k-sampling integration ciency. Our case is different because Vlocal (r) is the only
grid, are common to all similar methods and we will not pseudopotential part that needs to be represented in the
discuss their convergence here. real space grid, while the matrix elements of the non-
local part V̂KB are cheaply and accurately calculated by
two-center integrals. Therefore, we optimize Vlocal (r) for
II. PSEUDOPOTENTIAL smoothness, making it equal to the potential created by
a positive charge distribution of the form26
Although the use of pseudopotentials is not strictly
necessary with atomic basis sets, we find them con- ρlocal (r) ∝ exp[−(sinh(abr)/ sinh(b))2 ], (7)
venient to get rid of the core electrons and, more where a and b are chosen to provide simultane-
importantly, to allow for the expansion of a smooth ously optimal real-space localization and reciprocal-space
(pseudo)charge density on a uniform spatial grid. The convergence27. After some numerical tests we have taken
theory and usage of first principles norm-conserving b =1 and a =1.82/rcore. Figure 1 shows Vlocal (r) for sil-
pseudopotentials17 is already well established. Siesta icon.
reads them in semilocal form (a different radial potential Since Vl (r) = Vlocal (r) outside rcore , χKB
ln (r) is strictly
Vl (r) for each angular momentum l, optionally gener- zero beyond that radius, irrespective of the value of ǫln 28 .
ated scalar-relativistically21,22 ) from a data file that users Generally it is sufficient to have a single projector χKB lm
can fill with their preferred choice. We generally use the for each angular momentum (i.e. a single term in the
Troullier-Martins parameterization23. We transform this sum on n). In this case we follow the normal practice
semilocal form into the fully non-local form proposed by of making ǫln equal to the valence atomic eigenvalue ǫl ,
Kleinman and Bylander (KB)19 : and the function ϕl (r) in Eq. 4 is identical to the cor-
V̂ P S = Vlocal (r) + V̂ KB (1) responding eigenstate ψl (r). In some cases, particularly
for alkaline metals, alkaline earths, and transition met-
als of the first few columns, we have sometimes found
lKB
l Nl KB
max
X X X it necessary to include the semicore states together with
KB
V̂ = |χKB KB KB
lmn ivln hχlmn | (2) the valence states29 . In these cases, we also include two
l=0 m=−l n=1 independent KB projectors, one for the semicore and one
for the valence states. However, our pseudopotentials are
KB
vln =< ϕln |δVl (r)|ϕln > (3) still norm-conserving rather than “ultrasoft”30 . This is
because, in our case, it is only the electron density that
where δVl (r) = Vl (r) − Vlocal (r). χKB
lmn (r) = needs to be accurately represented in a real-space grid,
χKB
ln (r)Ylm (r̂) (with Ylm (r̂) a spherical harmonic) are the rather than each wavefunction. Therefore, the ultrasoft
KB projection functions pseudopotential formalism does not imply in Siesta the
same savings as it does in PW schemes. Also, since the
χKB
ln (r) = δVl (r)ϕln (r). (4)
non-local part of the pseudopotential is a relatively cheap
The functions ϕln are obtained from the eigenstates ψln operator within Siesta, we generally (but not necessar-
KB
of the semilocal pseudopotential (screened by the pseudo- ily) use a larger than usual value of lmax in Eq. (2), mak-
valence charge density) at energy ǫln using the orthogo- ing it one unit larger than the lmax of the basis functions.
nalization scheme proposed by Blöchl24 :
n−1
X < ϕln′ |δVl (r)|ψln > III. BASIS SET
ϕln (r) = ψln (r) − ϕln′ (r) (5)
< ϕln′ |δVl (r)|ϕln′ >
n′ =1
Order-N methods rely heavily on the sparsity of the
Hamiltonian and overlap matrices. This sparsity re-
1 d2 l(l + 1) quires either the neglect of matrix elements that are small
[− 2
r + + Vl (r) +
2r dr 2r2 enough or the use of strictly confined basis orbitals, i.e.,
+ V H (r) + V xc (r)]ψln (r) = ǫln ψln (r) (6) orbitals that are zero beyond a certain radius7 . We have
3
1 d2
l(l + 1)
− r+ + Vl (r) φl (r) = (ǫl + δǫl )φl (r)
2r dr2 2r2
(9)
instead of φ2ζ 1ζ 2ζ
l thus defined, we use φl − φl , which is 2.50
s
zero beyond rl , to reduce the number of nonzero matrix
elements, without any loss of variational freedom. Si
SZ (4)
To achieve well converged results, in addition to the 2.00
atomic valence orbitals, it is generally necessary to also
include polarization orbitals, to account for the defor-
Energy (eV)
DZ (8)
1.50
mation induced by bond formation. Again, using pseu- TZ (12)
doatomic orbitals of higher angular momentum is fre-
quently unsatisfactory, because they tend to be too ex- 1.00
SZP (9)
tended, or even unbound. Instead, consider a valence
DZP (13)
pseudoatomic orbital φlm (r) = φl (r)Ylm (r̂), such that TZDP (22)
there are no valence orbitals with angular momentum 0.50 TZP (17)
TZTP (27)
l + 1. To polarize it, we apply a small electric field E in TZTPF (34)
the z-direction. Using first-order perturbation theory 0.00
5 10 15 20 25 30 35 PW cutoff (Ry)
(25) (71) (130) (201) (280) (369) (464) PW basis size
(H − E)δφ = −(δH − δE)φ, (11)
where δH = Ez and δE = hφ|δH|φi = 0 because δH is FIG. 2: Comparison of convergence of the total energy with
odd. Selection rules imply that the resulting perturbed respect to the sizes of a plane wave basis set and of the LCAO
orbital will only have components with l′ = l±1, m′ = m: basis set used by Siesta. The curve shows the total energy
per atom of silicon versus the cutoff of a plane wave basis, cal-
δHφlm (r) = (Er cos(θ)) (φl (r)Ylm (r̂)) culated with a program independent of Siesta, which uses the
same pseudopotential. The arrows indicate the energies ob-
= Erφl (r)(cl−1 Yl−1,m + cl+1 Yl+1,m ) (12)
tained with different LCAO basis sets, calculated with Siesta,
and the plane wave cutoffs that yield the same energies. The
and numbers in parentheses indicate the basis sizes, i.e. the num-
ber of atomic orbitals or plane waves of each basis set. SZ:
δφlm (r) = ϕl−1 (r)Yl−1,m (r̂) + ϕl+1 (r)Yl+1,m (r̂). (13) single-ζ (valence s and p orbitals); DZ: double-ζ; TZ: triple-ζ;
DZP: double-ζ valence orbitals plus single-ζ polarization d or-
Since in general there will already be orbitals with an- bitals; TZP: triple-ζ valence plus single-ζ polarization; TZDP:
gular momentum l − 1 in the basis set, we select the triple-ζ valence plus double-ζ polarization; TZTP: triple-ζ va-
l + 1 component by substituting (12) and (13) in (11), lence plus triple-ζ polarization; TZTPF: same as TZTP plus
∗
multiplying by Yl+1,m (r̂) and integrating over angular extra single-ζ polarization f orbitals.
variables. Thus we obtain the equation
1 d2 (l + 1)(l + 2)
[− r + in Car-Parrinello molecular dynamics simulations, while
2r dr2 2r2 DZP sets are comparable to the cutoffs used in geom-
+ Vl (r) − El ]ϕl+1 (r) = −rφl (r) (14) etry relaxations and energy comparisons. As expected,
the LCAO is far more efficient, tipically by a factor of
where we have also eliminated the factors E and cl+1 ,
10 to 20, in terms of number of basis orbitals. This ef-
which affect only the normalization of ϕl+1 . The po-
ficiency must be balanced against the faster algorithms
larization orbitals are then added to the basis set:
available for plane waves, and our main motivation for
φl+1,m (r) = N ϕl+1 (r)Yl+1,m (r̂), where N is a normal-
using an LCAO basis is its suitability for O(N ) meth-
ization constant.
ods. Still, we have generally found that, even without
We have found that the previously described proce-
using the O(N ) functional, Siesta is considerably faster
dures generate reasonable minimal single-ζ (SZ) basis
than a plane wave calculation of similar quality.
sets, appropriate for semiquantitative simulations, and
double-ζ plus polarization (DZP) basis sets that yield Figure 3 shows the convergence of the total energy
high quality results for most of the systems studied. We curve of silicon, as a function of lattice parameter, for
thus refer to DZP as the ‘standard’ basis, because it usu- different basis sizes, and table I summarizes the same
ally represents a good balance between well converged information numerically. It can be seen that the ‘stan-
results and a reasonable computational cost. In some dard’ DZP basis offers already quite well converged re-
cases (typically alkali and some transition metals), semi- sults, comparable to those used in practice in most plane
core states also need to be included for good quality re- wave calculations.
sults. More recently36 , we have obtained extremely ef- Figure 4 shows the dependence of the lattice constant,
ficient basis sets optimized variationally in molecules or bulk modulus, and cohesive energy of bulk silicon with
solids. Figure 2 shows the performance of these atomic the range of the basis orbitals. It shows that a cutoff
basis sets compared to plane waves, using the same pseu- radius of 3 Å for both s and p orbitals yields already very
dopotentials and geometries. It may be seen that the SZ well converged results, specially when using a ‘standard’
bases are comparable to planewave cutoffs typically used DZP basis.
5
5.6
2.0 SZ
5.5
Exp
5.4
a (Å)
PW
Total Energy (eV)
1.5 5.3
SZ
DZP
DZ
TZ 5.2
SZP
1.0 DZP 5.1
TZP
TZDP
TZTP
TZTPF
PW
0.5 minima 175
B (GPa)
150
DZP
0.0 125
5.0 5.2 5.4 5.6 5.8 6.0 Exp
100 PW
Lattice Constant (Å) SZ
75
FIG. 3: Total energy per atom versus lattice constant for
bulk silicon, using different basis sets, noted as in Fig. 2. PW
refers to a very well converged (50 Ry cutoff) plane wave PW
5
E c (eV)
calculation. The dotted line joins the minima of the different
DZP
curves. Exp
SZ
4
TABLE I: Comparisons of the lattice constant a, bulk mod-
ulus B, and cohesive energy Ec for bulk Si, obtained with
different basis sets. The basis notation is as in Fig. 2. PW 3
refers to a 50 Ry-cutoff plane wave calculation. The LAPW
results were taken from ref. 37, and the experimental values
from ref. 38. 5.0 6.0 7.0 8.0
Basis a (Å) B (GPa) Ec (eV) cutoff radii (a.u.)
SZ 5.521 88.7 4.722
DZ 5.465 96.0 4.841 FIG. 4: Dependence of the lattice constant, bulk modulus,
TZ 5.453 98.4 4.908 and cohesive energy of bulk silicon with the cutoff radius of
SZP 5.424 97.8 5.227 the basis orbitals. The s and p orbital radii have been made
equal in this case, to simplify the plot. PW refers to a well
DZP 5.389 96.6 5.329 converged plane wave calculation with the same pseudopoten-
TZP 5.387 97.5 5.335 tial.
TZDP 5.389 96.0 5.340
TZTP 5.387 96.0 5.342
TZTPF 5.385 95.4 5.359 where T̂ = − 21 ∇2 is the kinetic energy operator, I is an
PW 5.384 95.9 5.369 atom index, V H (r) and V xc (r) are the total Hartree and
LAPW 5.41 96 5.28 XC potentials, and VIlocal (r) and V̂IKB are the local and
Expt. 5.43 98.8 4.63 non-local (Kleinman-Bylander) parts of the pseudopo-
tential of atom I.
In order to eliminate the long range of VIlocal , we screen
it with the potential VIatom , created by an atomic electron
IV. ELECTRON HAMILTONIAN
density ρatom
I , constructed by populating the basis func-
tions with appropriate valence atomic charges. Notice
Within the non-local-pseudopotential approximation, that, since the atomic basis orbitals are zero beyond the
the standard Kohn-Sham one-electron Hamiltonian may cutoff radius rIc = maxl (rIl
c
), the screened ‘neutral-atom’
be written as (NA) potential VIN A ≡ VIlocal +VIatom is also zero beyond
X X this radius7 (see Fig. 1). Now let δρ(r) be the differ-
Ĥ = T̂ + VIlocal (r) + V̂IKB + V H (r) + V xc (r) ence between the self-consistent electron Pdensity ρ(r) and
I I the sum of atomic densities ρatom = I ρatom I , and let
(15) δV H (r) be the electrostatic potential generated by δρ(r),
6
which integrates to zero and is usually much smaller than elements. We now substitute in (18) the expansion of a
ρ(r). Then the total Hamiltonian may be rewritten as plane wave in spherical harmonics40
l
∞ X
X X
Ĥ = T̂ + V̂IKB + VIN A (r) + δV H (r) + V xc (r) ikr
X
I I
e = 4πil jl (kr)Ylm
∗
(k̂)Ylm (r̂), (22)
(16) l=0 m=−l
The matrix elements of the first two terms involve only to obtain
two-center integrals which are calculated in reciprocal lX
max l
X
space and tabulated as a function of interatomic distance. ψ(k) = ψlm (k)Ylm (k̂), (23)
The remaining terms involve potentials which are calcu- l=0 m=−l
lated on a three-dimensional real-space grid. We consider
these two approaches in detail in the following sections. r ∞
2
Z
ψlm (k) = (−i)l r2 drjl (kr)ψlm (r). (24)
π 0
V. TWO-CENTER INTEGRALS
Substituting now (23) and (22) into (19) we obtain
The overlap matrix and the largest part of the Hamilto- 2lX l
max
nian matrix elements are given by two-center integrals39. X
S(R) = Slm (R)Ylm (R̂) (25)
We calculate these integrals in Fourier space, as proposed
l=0 m=−l
by Sankey and Niklewski7 , but we use some implemen-
tation details explained in this section. Let us consider where
first overlap integrals of the form X X
Z Slm (R) = Gl1 m1 ,l2 m2 ,lm Sl1 m1 ,l2 m2 ,l (R), (26)
S(R) ≡ hψ1 |ψ2 i = ψ1∗ (r)ψ2 (r − R)dr, (17) l1 m 1 l2 m 2
π 2π Nθ Nϕ
This is clearly true for basis functions and KB projec- Z Z
1 X 1 X
tors, which contain a single spherical harmonic, and also sin θdθ dϕ → 4π wi sin θi (30)
0 0 Nθ i=1 Nϕ j=1
for functions like xψ(r), which appear in dipole matrix
7
with Nϕ = 1 + 3lmax , Nθ = 1 + int(3lmax /2), and the where cµi = hφ̃µ |ψi i and φ̃µ is the dual orbital of φµ :
points cos θi and weights wi are calculated as described hφ̃µ |φν i = δµν . We use the compact index notation
in ref. 31. This quadrature is exact in equation (27) for µ ≡ {Ilmn} for the basis orbitals, Eq. (8). The elec-
spherical harmonics Ylm (real or complex) of l ≤ lmax , tron density is then
and it can be used also to find the expansion of ψ(r) in X
spherical harmonics (eq. (21)). ρ(r) = ni |ψi (r)|2 (33)
The coefficients Gl1 m1 ,l2 m2 ,lm are universal and they i
can be calculated and stored once and for all. The func-
tions Sl1 m1 ,l2 m2 ,l (R) depend, of course, on the functions where ni is the occupation of state ψi . If we substitute
ψ1,2 (r) being integrated. For each pair of functions, they (32) into (33) and define a density matrix
can be calculated and stored in a fine radial grid Ri , up X
to the maximum distance Rmax = r1c + r2c at which ψ1 ρµν = cµi ni ciν , (34)
and ψ2 overlap. Their value at an arbitrary distance R i
can then be obtained very accurately using a spline in-
terpolation. where ciν ≡ c∗νi , the electron density can be rewritten as
Kinetic matrix elements T (R) ≡ hψ1∗ | − 12 ∇2 |ψ2 i can X
be obtained in exactly the same way, except for an extra ρ(r) = ρµν φ∗ν (r)φµ (r) (35)
µν
factor k 2 in Eq. (28):
Z ∞
l1 −l2 −l 1 4 We use the notation φ∗µ for generality, despite our use
Tl1 m, l2 m2 ,l (R) = 4πi k dkjl (kR) of real basis orbitals in practice. Then, to calculate the
0 2
density at a given grid point, we first find all the atomic
∗
× i−l1 ψ1,l 1 m1
(k)il2 ψ2,l2 m2 (k). (31)
basis orbitals, Eq. (8), at that point, interpolating the
Since we frequently use basis orbitals with a kink7 , we radial part from numerical tables, and then we use (35)
need rather fine radial grids to obtain accurate kinetic to calculate the density. Notice that only a small num-
matrix elements, and we typically use grid cutoffs of more ber of basis orbitals are non-zero at a given grid point,
than 2000 Ry for this purpose. Once obtained, the fine so that the calculation of the density can be performed
grid does not penalize the execution time, because the in O(N ) operations, once ρµν is known. The storage of
interpolation effort is independent of the number of grid the orbital values at the grid points can be one of the
points. It also affects very marginally the storage require- most expensive parts of the program in terms of memory
ments, because of the one-dimensional character of the usage. Hence, an option is included to calculate and use
tables. However, even though it needs to be done only these terms on the fly, in the the spirit of a direct-SCF
once, the calculation of the radial integrals (24), (28), calculation. The calculation of ρµν itself with Eq. (34)
and (31) is not negligible if performed unwisely. We have does not scale linearly with the system size, requiring in-
developed a special fast radial Fourier transform for this stead the use of special O(N ) techniques to be described
purpose, as explained in appendix B. below. However, notice that in order to calculate the
Dipole matrix elements, such as hψ1 |x|ψ2 i, can also density, only the matrix elements ρµν for which φµ and
be obtained easily by defining a new function χ1 (r) ≡ φν overlap are required, and they can therefore be stored
xψ1 (r), expanding it using (21), and computing hχ1 |ψ2 i as a sparse matrix of O(N ) size. Once the valence den-
as explained above (with the precaution of using lmax + 1 sity is available in the grid, we add to it, if necessary,
instead of lmax ). the non-local core correction20 , a spherical charge den-
sity intended to simulate the atomic cores, which is also
interpolated from a radial grid. With it, we find the
VI. GRID INTEGRALS exchange and correlation potential V xc (r), trivially in
the LDA and using the method described in ref. 42 for
The matrix elements of the last three terms of Eq. (16) the GGA. To calculate δV H (r), we first find ρatom (r)
involve potentials which are calculated on a real-space at the grid points, as a sum of spherical atomic densi-
grid. The fineness of this grid is controlled by a ‘grid cut- ties (also interpolated from a radial grid) and subtract it
off’ Ecut : the maximum kinetic energy of the planewaves from ρ(r) to find δρ(r). We then solve Poisson’s equa-
that can be represented in the grid without aliasing41 . tion to obtain δV H (r) and find the total grid potential
The short-range screened pseudopotentials VIN A (r) in V (r) = V N A (r) + δV H (r) + V xc (r). Finally, at every
(16) are tabulated as a function of the distance to atoms I grid point, we calculate V (r)φ∗µ (r)φν (r)∆r3 for all pairs
and easily interpolated at any desired grid point. The last φµ , φν which are not zero at that point (∆r3 is the vol-
two terms require the calculation of the electron density ume per grid point) and add it to the Hamiltonian matrix
on the grid. Let ψi (r) be the Hamiltonian eigenstates, element Hµν .
expanded in the atomic basis set To solve Poisson’s equation and find δV H (r) we nor-
X mally use fast Fourier transforms in a unit cell that is
ψi (r) = φµ (r)cµi , (32) either naturally periodic or made artificially periodic by
µ a supercell construction. For neutral isolated molecules,
8
∆ Et (meV)
-20
crease rapidly with cell size. For charged molecules we 0
supress the G = 0 Fourier component (an infinite con- -40
-20
stant) of the potential created by the excess of charge. LDA GGA
This amounts to compensating this excess with a uni- -40 -60
∆ P (kbar)
20 20
can solve Poisson’s equation by the multigrid method, 0 0
using finite differences and fixed boundary conditions, -20 -20
obtained from the multipole expansion of the molecular -40 -40
charge density. This can be done in strictly O(N ) opera- 100 300 500 100 300 500
tions, unlike the FFT’s, which scale as N log N . However, Ec (Ry) Ec (Ry)
the cost of this operation is typically negligible and there-
fore has no influence on the overall scaling properties of
FIG. 6: Same as Fig. 5 for the total energy and pressure
the calculation.
of bulk iron. This is presented as a specially difficult case
Figures 5 and 6 show the convergence of different mag- because of the very hard partial core correction (rm = 0.7 au)
nitudes with respect to the energy cutoff of the integra- required for a correct description of exchange and correlation.
tion grid. For orthogonal unit cell vectors this is simply,
in atomic units, Ecut = (π/∆x)2 /2 with ∆x the grid
interval. repeated twice in an almost independent way: only to
calculate V xc (r) need they be combined. However, in
2
the non-collinear spin case44,45,46,47 , the density at every
40
point is not represented by the up and down values, but
20
Si 0
also by a vector giving the spin direction. Equivalently,
∆ d (mAng)
∆ Et (meV)
0
-2 it may be represented by a local spin density matrix
-4
-20 H2O ραβ (r) =
X
ni ψiβ∗ (r)ψiα (r) =
X
ραβ ∗
-6 µν φν (r)φµ (r) (36)
-40
i µν
15 1
X
∆ P (kbar)
µi
0
µ
-5
-15 -1
β
X
10 30 50 70 90 20 60 100 140 180 ραβ
µν = cα
µi ni ciν (38)
Ec (Ry) Ec (Ry) i
↑ ↓
vector. We then find VXC (r), VXC (r) in that direction, where φµ and φν are within the unit cell. The resulting
with the usual local spin density functional15 and we ro- N × N complex eigenvalue problem, with N the num-
αβ
tate back VXC (r) to the original direction. Thus, the ber of orbitals in the unit cell, is then solved at every
grid operations are still basically the same, except that sampled k point, finding the Bloch-state expansion coef-
they need now be repeated three times, for the ↑↑, ↓↓ ficients cµi (k):
αβ
and ↑↓ components. Notice that ραβ (r) and VXC (r) are X
αβ αβ
locally Hermitian, while Hµν and ρµν are globally Her- ψi (k, r) = eikRµ′ φµ′ (r)cµ′ i (k) (43)
βα αβ∗ µ′
mitian (Hνµ = Hµν ), so that their ↓↑ components can
be obtained from the ↑↓ ones. where the sum in µ′ extends to all basis orbitals in space,
i labels the different bands, cµ′ i = cµi if µ′ ≡ µ, and
ψi (k, r) is normalized in the unit cell.
VIII. BRILLOUIN ZONE SAMPLING The electron density is then
XZ
ρ(r) = ni (k)|ψi (k, r)|2 dk
Integration of all magnitudes over the Brillouin zone i BZ
(BZ) is essential for small and moderately large unit cells, X
especially of metals. Although Siesta is designed for = ρµ′ ν ′ φ∗ν ′ (r)φµ′ (r) (44)
large unit cells, in practice it is very useful, especially µ′ ν ′
for comparisons and checks, to be able to also perform where the sum is again over all basis orbitals in space,
calculations efficiently on smaller systems without using and the density matrix
expensive superlattices. On the other hand, an efficient XZ
k-sampling implementation should not penalize, because ρµν = cµi (k)ni (k)ciν (k)eik(Rν −Rµ ) dk (45)
of the required complex arithmetic, the Γ-point calcula- i BZ
tions used in large cells. A solution used in some pro-
grams is to have two different versions of all or part of is real (for real φµ ’s) and periodic, i.e. ρµν = ρµ′ ν ′ if
the code, but this poses extra maintenance requirements. (ν, µ) ≡ (ν ′ , µ′ ) (with ‘≡’ meaning again ‘equivalent by
We have dealt with this problem in the following way: translation’). Thus, to calculate the density at a grid
around the unit cell (and comprising itself) we define an point of the unit cell, we simply find the sum (44) over
auxiliary supercell large enough to contain all the atoms all the pairs of orbitals φµ , φν in the supercell that are
whose basis orbitals are non-zero at any of the grid points non-zero at that point.
of the unit cell, or which overlap with any of the basis In practice, the integral in (45) is performed in a finite,
orbitals in it. We calculate all the non-zero two-center uniform grid of the Brillouin zone. The fineness of this
integrals between the unit cell basis orbitals and the su- grid is controlled by a k-grid cutoff lcut , a real-space ra-
percell orbitals, without any complex phase factors. We dius which plays a role equivalent to the planewave cutoff
also calculate the grid integrals between all the supercell of the real-space grid48 . The origin of the k-grid may be
basis orbitals φµ′ and φν ′′ (primed indices run over all displaced from k = 0 in order to decrease the number of
the supercell), but within the unit cell only. We accumu- inequivalent k-points49 .
late these integrals in the corresponding matrix elements, If the unit cell is large enough to allow a Γ-point-only
thus making use of the relation calculation, the multiplication by phase factors is skipped
and a single real-matrix eigenvalue problem is solved (in
X this case, the real matrix elements are accumulated di-
< φµ |V (r)|φν ′ >= < φµ′ |V (r)f (r)|φν ′′ > .
rectly in the first stage, if multiple overlaps occur). In
(µ′ ν ′′ )≡(µν ′ )
this way, no complex arithmetic penalty occurs, and the
(41) differences between Γ-point and k-sampling are limited to
a very small section of the code, while all the two-center
f (r) = 1 for r within the unit cell and is zero other- and grid integrals use always the same real-arithmetic
wise. φµ is within the unit cell. The notation µ′ ≡ µ code.
indicates that φµ′ and φµ are equivalent orbitals, re-
lated by a lattice vector translation. (µ′ ν ′′ ) ≡ (µν ′ )
means that the sum extends over all pairs of supercell IX. TOTAL ENERGY
orbitals φµ′ and φν ′′ such that µ′ ≡ µ, ν ′′ ≡ ν ′ , and
Rµ − Rν ′ = Rµ′ − Rν ′′ . Once all the real overlap and The Kohn-Sham14 total energy can be written as a
Hamiltonian matrix elements are calculated, we multiply sum of a band-structure (BS) energy plus some correction
them, at every k-point by the corresponding phase fac- terms, sometimes called ‘double count’ corrections. The
tors and accumulate them by folding the supercell orbital BS term is the sum of the energies of the occupied states
to its unit-cell counterpart. Thus ψi :
X X
E BS =
X
Hµν (k) = Hµν ′ eik(Rν ′ −Rµ ) (42) ni hψi |Ĥ|ψi i = Hµν ρνµ = Tr(Hρ) (46)
ν ′ ≡ν i µν
10
where spin and k-sampling notations are omitted here for Defining ρN
I
A
from VIN A , analogously to ρlocal
I , we have
NA
simplicity. At convergence, the ψi ’s are simply the eigen- that ρI = ρlocal
I + ρ atom
I , and equation (47) can be
vectors of the Hamiltonian, but it is important to realize transformed, after some rearrangement of terms, into
that the Kohn-Sham functional is also perfectly well de-
fined outside this so-called ‘Born-Oppenheimer surface’,
X 1 X NA
E KS = KB
(Tµν + Vµν )ρνµ + UIJ (RIJ )
i.e. it is defined for any set of orthonormal ψi ’s. The µν
2
IJ
correction terms are simple functionals of the electron X
local
X
density, which can be obtained from equation (35), and + δUIJ (RIJ ) − UIlocal
the atomic positions. The Kohn-Sham total energy can I<J I
Z
then be written as
+ V N A (r)δρ(r)d3 r (53)
1
X Z
KS
E = Hµν ρνµ − V H (r)ρ(r)d3 r 1
Z Z
µν
2 + δV H (r)δρ(r)d3 r + ǫxc (r)ρ(r)d3 r
Z 2
+ (ǫxc (r) − V xc (r))ρ(r)d3 r where V N A = I VIN A and δρ = ρ − I ρN
P P A
I .
X ZI ZJ Z
+ (47) NA
RIJ UIJ (R) = VIN A (r)ρN A
J (r − R)d r
3
I<J
1
Z
where I, J are atomic indices, RIJ ≡ |RJ − RI |, ZI , ZJ = − VIN A (r)∇2 VJN A (r − R)d3 r (54)
are the valence ion pseudoatom charges, and ǫxc (r)ρ(r) 4π
is the exchange-correlation energy density. In order to is a radial pairwise potential that can be obtained from
avoid the long range interactions of the last term, we VIN A (r) as a two-center integral, by the same method
construct from the local-pseudopotential VIlocal , which described previously for the kinetic matrix elements:
has an asymptotic behavior of −ZI /r, a diffuse ion
charge, ρlocal
I (r), whose electrostatic potential is equal 1
to VIlocal (r): Tµν = hφµ | − ∇2 |φν i
Z 2
1
1 2 local = − φ∗µ (r)∇2 φν (r − Rµν )d3 r (55)
ρlocal
I (r) = − ∇ VI (r). (48) 2
4π
KB
Notice that we define the electron density as positive, Vµν is also obtained by two-center integrals:
and therefore ρlocal
I ≤ 0. Then, we write the last term in X
KB
(47) as Vµν = hφµ |χα ivαKB hχα |φν i (56)
X ZI ZJ α
1 X local X
local
= UIJ (RIJ ) + δUIJ (RIJ )
RIJ 2 where the sum is over all the KB projectors χα that over-
I<J IJ I<J
X lap simultaneously with φµ and φν .
− UIlocal (49) Although (53) is the total energy equation actually
I used by Siesta, its meaning may be further clarified if
the I = J terms of 12 IJ UIJ NA
P
local
(RIJ ) are combined with
where UIJ is the electrostatic interaction between the P local
diffuse ion charges in atoms I and J: I UI to yield
X X
E KS = KB
(Tµν + Vµν )ρνµ + NA
UIJ (RIJ )
Z
local
UIJ (|R|) = VIlocal (r)ρlocal
J (r − R)d3 r, (50) µν I<J
X X
local
local + δUIJ (RIJ ) + UIatom
δUIJ is a small short-range interaction term to correct
I<J I
for a possible overlap between the soft ion charges, which Z
appears when the core densities are very extended: + V NA
(r)δρ(r)d r3
(57)
local ZI ZJ local 1
Z Z
δUIJ (R) = − UIJ (R), (51) + δV H (r)δρ(r)d3 r + ǫxc (r)ρ(r)d3 r
R 2
and UIlocal is the fictitious self interaction of an ion charge where
(notice that the first right-hand sum in (49) includes the
∞
I = J terms):
1 atom
Z
UIatom = VIlocal (r) + VI (r) ρatom
I (r)4πr2 dr
0 2
1 local 1
Z
local
UI = UII (0) = VIlocal (r)ρlocal
I (r)4πr2 dr. (58)
2 2
(52) is the electrostatic energy of an isolated atom.
11
The last three terms in Eq. (53) are calculated using usefulness to improve dramatically the estimate of the
the real space grid. In addition to getting rid of all long- converged total energy, by taking ρin µν as the density ma-
range potentials (except that implicit in δV H (r)), the trix of the (n− 1)’th self-consistency iteration and ρout
µν of
advantage of (53) is that, apart from the relatively slowly- the n’th iteration. In fact, E Harris frequently gives, after
varying exchange-correlation energy density, the grid in- just two or three iterations, a better estimate than E KS
tegrals involve δρ(r), which is generally much smaller after tens of iterations. Unfortunately, we have found
than ρ(r). Thus, the errors associated with the finite that there is hardly any improvement in the convergence
grid spacing are drastically reduced. Critically, the ki- of the atomic forces thus estimated, and therefore the
netic energy matrix elements can be calculated almost self-consistent Harris functional is less useful for geome-
exactly, without any grid integrations. try relaxations or molecular dynamics.
It is frequently desirable to introduce a finite electronic
temperature T and/or a fixed chemical potential µ, either
because of true physical conditions or to accelerate the
self-consistency iteration. Then, the functional that must
be minimized is the free energy50
XI. ATOMIC FORCES
X
F (RI , ψi (r), ni ) = E KS (RI , ψi (r), ni ) − µ ni
i
X Atomic forces and stresses are obtained by direct dif-
−kB T (ni log ni + (1 − ni ) log(1 − ni )). (59) ferentiation of (53) with respect to atomic positions.
i They are obtained simultaneously with the total energy,
mostly in the same places of the code, under the general
Minimization with respect to ni yields the usual Fermi-
paradigm “a piece of energy ⇒ a piece of force/stress”
Dirac distribution ni = 1/(1 + e(ǫi −µ)/kB T ).
(except that some pieces are calculated only in the last
self-consistency step). This ensures that all force contri-
X. HARRIS FUNCTIONAL
butions, including Pulay corrections, are automatically
included. The force contribution from the first term in
(53) is
We will mention here a special use of the Harris energy
functional, that is generally defined as51,52
∂ X
KB
(Tµν + Vµν )ρνµ =
X
E Harris [ρin ] = nout out in out
i hψi |Ĥ |ψi i ∂RI µν
i
1
Z Z in in
ρ (r)ρ (r ) 3 3 ′ ′ X
KB ∂ρνµ X X dTµν
− d rd r (Tµν + Vµν ) +2 ρνµ
2 |r − r′ | µν
∂RI µ
dRµν
ν∈I
Z X ZI ZJ dSαν
+ (ǫin in in 3
XXX
xc (r) − vxc (r))ρ (r)d r + (60) +2 Sµα vαKB ρνµ
RIJ dRαν
I<J µ ν∈I α
XX dSαν
where Ĥ in is the KS Hamiltonian produced by a trial −2 Sµα vαKB ρνµ (62)
dRαν
density ρin and ψiout are its eigenvectors (which in gen- µν α∈I
eral are different from those whose density is ρin ). As
in Eq. (46), the first term in (60) can be written as
Tr(H in ρout ), and the rest are the so-called ‘double count where α are KB projector indices, ∈ I indicates orbitals
corrections’. An important advantage of eq. (60) is that or KB projectors belonging to atom I, and we have con-
it does not require ρin sidered that
i to be obtained from a set of or-
thogonal electron states ψiin , and in fact ρin is frequently
taken as a simple superposition of atomic densities. How- ∂Sµν ∂Sµν dSµν
ever, we will assume here that the states ψiin are indeed =− = , (63)
∂RIν ∂RIµ dRµν
known. In this case, the Kohn-Sham energy E KS [ρin ],
Eq. (47), obeys exactly the same expression (60), except
that ψiout and nout i must be replaced by ψiin and nin i . where RIµ is the position of atom Iµ , to which orbital
Thus, a simple subtraction gives φµ belongs and Rµν = RIν − RIµ .
Leaving aside for appendix A the terms contain-
X
E Harris [ρin ] = E KS [ρin ] + in
ρout in
Hνµ µν − ρµν . (61)
µν
ing ∂ρνµ /∂RI , the other derivatives can be obtained
by straightforward differentiation of their expansion in
Generally the Harris functional is used nonself- spherical harmonics (Eq. (25)). However, instead of us-
consistently, with a trial density given by the sum of ing the spherical harmonics Ylm (r̂) themselves, it is con-
atomic densities. But here we want to comment on its venient to multiply them by rl , in order to make them
12
analytic at the origin. Thus aside the terms with ∂ρνµ /∂RI , the last term in Eq. (65),
as well as those in (66) and (67), have the general form
dSµν (R) X S µν (R)
lm l
= ∇ R Ylm (R̂) Z
dR Rl XX
lm Re ρνµ V (r)φ∗µ (r)∇φν (r)d3 r
X d S µν (R) µ ν∈I
= lm
Rl Ylm (R̂)R̂
dR Rl
XX
lm = Re ρνµ hφµ |V (r)|∇φν i. (70)
X S µν (R) µ ν∈I
+ lm
∇(Rl Ylm (R̂)) (64)
Rl These integrals are calculated on the grid, in the same
lm
way as those for the total energy (i. e. hφµ |V (r)|φν i).
µν µν
In fact, it is Slm (R)/Rl , rather than Slm (R), that is The gradients ∇φν (r) at the grid points are obtained
stored as a function of R on a radial grid. Its deriva- analytically, like those of φν (r) from their radial grid in-
µν
tive, d(Slm (R)/Rl )/dR, is then obtained from the same terpolations of φ(r)/rl :
cubic spline interpolation used for the value itself. The
value and gradient of Rl Ylm (R̂) are calculated analyt-
d φIln (r)
ically from explicit formulae (up to l = 2) or recur- ∇φIlmn (r) = rl Ylm (r̂)r̂
dr rl
rence relations31 . Entirely analogous equations apply to
φIln (r)
dTµν /dRµν . + ∇(rl Ylm (r̂)). (71)
The second and third terms in Eq. (53) are sim- rl
ple interatomic pair potentials whose force contributions In some special cases, with elements that require hard
are calculated trivially from their radial spline inter- partial core corrections or explicit inclusion of the semi-
polations. The fourth term is a constant which does core, the grid integrals may pose a problem for geometry
not depend on the atomic P positions. Taking into ac- relaxations, because they make the energy dependent on
count that V N A (r) = I VI
NA
(r − RI ), and therefore the position of the atoms relative to the grid. This ‘egg-
∂V N A (r)/∂RI = −∇VIN A (r − RI ), the force contribu- box effect’ is small for the energy itself, and it decreases
tion from the fifth term is fast with the grid spacing. But the effect is larger and
∂
Z the convergence slower for the forces, as they are pro-
V N A (r)δρ(r)d3 r = (65) portional to the amplitude of the energy oscillation, but
∂RI
inversely proportional to its period. These force oscilla-
∂δρ(r) 3
Z Z
− ∇VIN A (r)δρ(r)d3 r + V N A (r) d r tions complicate the force landscape, especially when the
∂RI true atomic forces become small, making the convergence
The sixth term is the electrostatic self-energy of the of the geometry optimization more difficult. Of course,
charge distribution δρ(r): the problem can be avoided by decreasing the grid spac-
ing but this has an additional cost in computer time and
∂ 1 ∂δρ(r) 3 memory. Therefore, we have found it useful to minimize
Z Z
δV (r)δρ(r)d r = δV H (r)
H 3
d r (66) this problem by recalculating the forces, at a set of po-
∂RI 2 ∂RI
sitions, determined by translating the whole system by
In the last term, we take into account that d(ρǫxc )/dρ = a set of points in a finer mesh. This procedure, which
v xc to obtain we call ‘grid-cell sampling’, has no extra cost in memory.
And since it is done only at the end of the self-consistency
∂ ∂ρ(r) 3
Z Z
ǫxc (r)ρ(r)d3 r = V xc (r) d r (67) iteration, for fixed ρµν , it has only a moderate cost in
∂RI ∂RI CPU time.
Now, using Eq. (35) and that, for ν ∈ I, ∂φν (r)/∂RI = At finite temperature, the forces are really the deriva-
−∇φν , the change of the self-consistent and atomic den- tives of the free energy with respect to atomic displace-
sities are ments since
The latter are the atomic forces actually calculated. No- origin gets displaced according to (75). From this equa-
tice, however, that dE/dRI 6= ∂E/∂RI , so that the cal- tion, we find that
culated forces are indeed the total derivatives of the free,
not the internal energy. ∂rγ
= δγα rβ (76)
We would like to also mention the calculation of forces ∂ǫαβ
using the non-self-consistent Harris functional, in which
the ‘in’ density is a superposition of atomic densities. We The change in E KS is essentially due to these position
have implemented this as an option for ‘quick and dirty’ displacements, and therefore the calculation of the stress
calculations because, used with a minimal basis set, it is almost perfectly parallel to that of the atomic forces,
makes Siesta competitive with tight binding methods, thus being performed in the same sections of the code.
which are much faster than density functional calcula- For example:
tions. The problem that we address here is that, al-
3 γ
though E Harris is stationary with respect to ρout , it is ∂Tµν X ∂Tµν ∂rµν ∂Tµν β
not so with respect to ρin . In particular, there appears = γ = α
rµν (77)
∂ǫαβ γ=1
∂rµν ∂ǫαβ ∂rµν
a force term
α
in
∂Vxc (r) out Since ∂Tµν /∂rµν is evaluated to calculate the forces, it
Z
ρ (r)d3 r. (73) β
∂RI takes very little extra effort to multiply it also by rµν
for the stress. Equally, force contributions like (70) have
A similar term appears for the electrostatic interaction their obvious stress counterpart
between the input and output density, but it presents
no special problems because of the linear character of
X
ρνµ hφµ |V (r)|(∇α φν )rβ i (78)
the Hartree potential. However, evaluation of (73) re- µν
quires the change of the exchange-correlation potential
with density, a quantity also required to evaluate the lin- However, there are three exceptions to this parallelism.
ear response of the electron gas, but not in normal energy The first concerns the change of the volume per grid point
and force calculations. Finally, notice that, apart from or, in other words, the Jacobian of the transformation
this minor difficulty, the Harris-functional forces are per- (75) in the integrals over the unit cell. This Jacobian is
fectly well defined at the first iteration only. For later simply δαβ , and it leads to a stress contribution
iterations (but still not converged) there is no practi-
cal way to calculate ∂ρin /∂RI and, without the help of
Z
NA 1 H 3 xc
V (r) + δV (r) δρ(r)d r + E δαβ (79)
Hellman-Feynmann’s theorem (which applies only at con- 2
vergence), the forces are not well defined. Of course, the
omission of the terms depending on this quantity pro- Notice that the the renormalization of the density, re-
duces an estimate of the forces, but we have found that quired to conserve the charge when the volume changes,
their convergence is not appreciably faster than those es- enters through the orthonormality constraints, to be dis-
timated from the Kohn-Sham functional. cussed in appendix A. The second special contribu-
tion to the stress lies in the fact that, as we deform
the lattice, there is a change in the factor 1/|r − r′ | of
XII. STRESS TENSOR the electrostatic energy integrals. We deal with this
contribution in reciprocal space, when we calculate the
We define the stress tensor as the positive derivative Hartree potential by FFTs, by evaluating the derivative
of the total energy with respect to the strain tensor of the P
reciprocal-space vectors with respect to ǫαβ . Since
G′α = β Gβ (δβα − ǫβα ):
∂E KS
σαβ = (74)
∂ǫαβ ∂ 1 2Gα Gβ
= (80)
∂ǫαβ G2 G4
where α, β are Cartesian coordinate indices. To trans-
late to standard units of pressure, we must simply divide Finally, the third special stress contribution arises in
by the unit-cell volume and change sign. During the de- GGA exchange and correlation, from the change of the
formation, all vector positions, including those of atoms gradient of the deformed density ρ(r) → ρ(r′ ). The treat-
and grid points (and of course lattice vectors), change ment of this contribution is explained in detail in refer-
according to ence 42.
3
X
rα′ = (δαβ + ǫαβ ) rβ (75)
Electric Polarization
β=1
The shape of the basis functions, KB projectors, and The calculation of the electric polarization, as an in-
atomic densities and potentials do not change, but their tegral in the grid across the unit cell, is standard and
14
almost free for molecules, chains and slabs (in the direc- sectionV. It is interesting to note that, since the dis-
tions perpendicular to the chain axis, or to the surface). cretized formula (83) only holds to O(∆k 2 ), the approxi-
For bulk systems, the electric polarization cannot be mation of the matrix elements in (84) does not introduce
found from the charge distribution in the unit cell alone. any further errors in the calculation of the polarization.
In this case, we need the so-called Berry-phase theory of In a symmetrized version, we approximate equation (84)
polarization53,54 , which allows to compute quantities like as
the dynamical charges53 and piezoeletric constants55,56 .
57
Here we comment some details of our implementation
P3 .
e e
If Rα are the lattice vetors and P = α=1 Pα Rα is ∆k
XX
ciν (k)cµ′ j (k + ∆k)e−i(k+ 2 )·(Rν −Rµ′ ) [ hφν |φµ′ i
the electronic contribution to the macroscopic polariza- ν µ′
tion, then we have
∆k
−i · ( hφν |(r − Rν )|φµ′ i + hφν |(r − Rµ′ )|φµ′ i) ](85)
,
2π Pαe = Gα · P e 2
2e ∂
Z
′
= − 3
dk Gα · ′
Φ(k, k ) ′ (81)
(2π) BZ ∂k k =k
band-structure energy is rewritten as: strict localization decreases rapidly as a function of the
X localization radius Rc , as can be seen in Fig. 7.
E KMG = 2 (2δji − Sji )(Hij − ηSij )
ij 80
XX
= 4 ciµ δHµν cνi
Bulk Modulus
i µν
X X 60 Lattice Constant
− 2 ciα Sαβ cβj cjµ δHµν cνi (86) Cohesive Energy
Memory (Mb)
constant energy, with a time step of 1.5 fs, and a minimal
basis set with a cutoff radius rc = 5 a.u. Rc is the local- 150
ization radius of the Wannier-like wavefunctions used in the
1000
O(N ) functional (see text). N is the number of atoms in the
100
system.
Rc = 4Å Rc = 5Å
50 500
N CG SCF CG SCF
64 5.8 9.3 8.4 8.4
0
512 4.9 11.4 8.8 10.1
1000 4.3 11.5 9.9 11.5 0
0 2000 4000 6000 8000
Number of Atoms
ble II presents the average number of iterations required FIG. 8: CPU time and memory for silicon supercells of 64,
to minimize the O(N ) functional and the average number 512, 1000, 4096, and 8000 atoms. Times are for one aver-
of selfconsistency iterations, during a molecular dynam- age molecular dynamics step at 300 K. This includes 10 SCF
ics simulation of bulk silicon at room temperature. It can steps, each with 10 conjugate gradient minimization steps of
be seen that these numbers are quite small and that they the O(N ) energy functional. Memories are peak ones. Al-
increase very moderately with system size. As might be though the memory requierement for 8000 atoms was deter-
expected, the number of minimization iterations increase mined accurately, the run could not be performed because of
insufficient memory in the PC used.
with the localization radius, i.e. with the number of de-
grees of freedom (cµi coefficients) of the wavefunctions.
But this increase is also rather moderate. conjugate gradients minimization or several other
Figure 8 shows the essentially perfect O(N ) behaviour minimization/annealing algorithms.
of the overall CPU time and memory. This is not surpris-
ing in view of the completely strict enforcement of O(N ) • It is possible to perform a variety of molecular dy-
algorithms everywhere in the code (except the marginal namics simulations, at constant energy or temper-
N log N factor in the FFT used to solve Poisson’s equa- ature, and at constant volume or pressure, also in-
tion, which represents a very small fraction of CPU time cluding Parrinello-Rahman dynamics with variable
even for 4000 atoms). cell shape67 . The geometry relaxation may be re-
stricted, to impose certain positions or coordinates,
or more complex constraints.
XIV. OTHER FEATURES
• The auxiliary program Vibra processes systemat-
ically the atomic forces for sets of displaced atomic
Here we will simply mention some of the possibilities positions, and from them computes the Hessian ma-
and features of the Siesta implementation of DFT: trix and the phonon spectrum. An interface to the
Phonon program69 is also provided within Siesta.
• A general-purpose package68 , the flexible data for-
mat (fdf), initially developped for the Siesta • A linear response program (Linres) to calculate
project, allows the introduction of all the data phonon frequencies has also been developed70 . The
and precision parameters in a simple tag-oriented, code reads the SCF solution obtained by Siesta,
order-independent format which accepts different and calculates the linear response to the atomic
physical units. The data can then be accessed from displacements, using first order perturbation the-
anywhere in the program, using simple subroutine ory. It then calculates the dynamical matrix, from
calls in which a default value is specified for the which the phonon frequencies are obtained.
case in which the data are not present. A simple
call also allows the read pointer to be positioned • A number of auxiliary programs allows various rep-
in order to read complex data ‘blocks’ also marked resentations of the total density, the total and lo-
with tags. cal density of states, and the electrostatic or to-
tal potentials. The representations include both
• The systematic calculation of atomic forces and two-dimensional cuts and three-dimensional views,
stress tensor allows the simultaneous relaxation of which may be colored to simultaneously represent
atomic coordinates and cell shape and size, using a the density and potential.
17
• Thanks to an interface with the Transiesta pro- APPENDIX A: ORTHOGONALITY FORCE AND
gram, it is posible to calculate transport properties STRESS
across a nanocontact, finding self-consistently the
effective potential across a finite voltage drop, at We have yet to comment on the force and stress
a DFT level, using the Keldish Green’s function terms containing ∂ρµν /∂RI . Substituting the first
formalism71 . term of Eq. (68) into Eqs. (65-67) and adding the
first
P term of Eq. (62) we obtain a simple expression:
µν Hνµ ∂ρµν /∂RI . Now, ρµν is a function of the Hamil-
• The optical response can be studied with siesta us- tonian eigenvector coefficients and occupations only
ing different approaches. An approximate dielectric (Eq. (34)). On the Born-Oppenheimer surface (BOS),
function can be calculated from the dipolar tran- E KS is stationary with respect to those coefficients and
sition matrix elements between occupied and un- occupations, and the Hellman-Feynmann theorem guar-
occupied single-electron eigenstates using first or- antees that any change of them will not modify the total
der time-dependent pertubation theory.72 For finite energy to first order, and therefore will not affect the
systems, these are easily calculated from the matrix forces. In other words, the atomic forces are the partial
elements of the position operator between the ba- derivatives ∂E KS /∂RI at constant cµi and ni . Even in
sis orbitals. For infinite periodic systems, we use the Car-Parrinello scheme, in which the system moves
the matrix elements of the momentum operator. out of the BOS, making the Hellman-Feynmann theo-
It is important to notice, however, that the use of rem invalid, the atomic forces are nevertheless defined as
non-local pseudopotentials requires some correction derivatives at constant cµi and ni . Thus, it may seem
terms73 . that the terms ∂ρµν /∂RI are irrelevant for the calcula-
tion of the forces. However, in the previous discussion
We have also implemented a more sophisticated
we have omitted to say that the KS energy must be min-
approach to compute the optical response of fi-
imized under the constraint of orthonormality of the oc-
nite systems, using the adiabatic approximation
cupied states and that, therefore, at the BOS the energy
to time-dependent DFT74,75 . The idea is to in-
is stationary only with respect to changes of ψi which do
tegrate the time-dependent Schrödinger equation
not violate the orthonormality. With an atomic basis set,
when a time depedent perturbation is applied to the
the displacement of atoms (and the deformation of the
system76 . From the time evolution, it is then pos-
unit cell) modifies the basis, and therefore the occupied
sible to extract the optical adsorption and dipole P
states ψi = µ φµ cµi , even at constant cµi ’s. And the
strength functions, including some genuinely many-
change of the states affects their orthonormality. Thus,
body effects, like plasmons. Using this approach we
in order to calculate the new total energy, we need to
have succesfully calculated the electronic response
re-orthonormalize the occupied states, by changing their
of systems such as fullerenes and small metallic
coefficients cµi . Schematically, we must solve
clusters77 .
hψi |δS|ψj i + hδψi |S|ψj i + hψi |S|δψj i = 0 (A1)
where δS represents the change of Sµν due to the atomic
displacements, and δψi the modification of ψi due to the
ACKNOWLEDGMENTS change of cµi . Without lack of generality, we can P expand
δψi in the basis of the eigenvectors ψi as δψi = j ψj λji .
We are deeply indebted to Otto Sankey and David Substituting this expansion into (A1) and using that
Drabold for allowing us to use their code as an initial hψi |S|ψj i = δij we obtain λji = − 21 hψj |δS|ψi i. Thus
seed for this project, and to Richard Martin for contin- 1X
uous ideas and support. We thank Jose Luis Martins |δψi i = − |ψj ihψj |δS|ψi i. (A2)
2 j
for numerous discussions and ideas, and Jürgen Kübler
for helping us implement the noncollinear spin. The
In
P terms of the coefficients cµi , we have hψj |δS|ψi i =
exchange-correlation methods and routines were devel-
oped in collaboration with Carlos Balbas and Jose L. µν cjµ δSµν cνi and
Differentiating now Eq. (34) and using (A3) we obtain Notice that we have extended the integral to the whole
real axis, defining ψl (−r) ≡ (−1)l ψl (r), in accordance to
1 X
δρµν = − −1
ρµη δSηζ Sζν −1
+ Sµη δSηζ ρζν (A5) the behaviour ψl (r) ∼ rl , r → 0. The coefficients cs,c
ln can
2 be obtained by defining a complex polynomial Pl (x) =
ηζ
Plc (x) + iPls (x), which obeys the recurrence relations82
And
√
X
Hµν δρνµ = −
X
Eµν δSνµ (A6) P0 (x) = i ≡ −1
µν µν P1 (x) = i − x (B3)
2
where Eµν is the so-called energy-density matrix: Pl+1 (x) = (2l + 1)Pl (x) − x Pl−1 (x)
A direct translation of this expression into Fortran90 only on the relative positions of iext and jext (i.e. on
code might read jα − iα ), but not on the position of iext within the unit
cell. We can then create a list of neighbor strides ∆ijext ,
Lf(ix,iy,iz) = & and two arrays to translate back and forth between i and
( f(modulo(ix+1,nx),iy,iz) + & iext . One of the arrays maps the unit cell points to the
f(modulo(ix-1,nx),iy,iz) )/dx2 & central region of the extended mesh, while the other one
+ ( f(ix,modulo(iy+1,ny),iz) + & folds back the extended mesh points to their periodically
f(ix,modulo(iy-1,ny),iz) )/dy2 & equivalent points within the unit cell. Then, to access the
+ ( f(ix,iy,modulo(iz+1,nz)) + & neighbors of a point i, we a) translate i → iext ; b) find
f(ix,iy,modulo(iz-1,nz)) )/dz2 & jext = iext + ∆ijext ; and c) translate jext → j. Notice
- f(ix,iy,iz) * (2/dx2+2/dy2+2/dz2) that several points of the extended mesh will map to the
where the indices iα (α = {x, y, z}) of the arrays f and same point within the unit cell and that, in principle, a
Lf run from 0 to nα − 1, as in C. There are two problems unit cell point j may be neighbor of i through different
with this construction. First, the modulo operations are values of jext . In our example, the innermost loop would
required to bring the indices back to the allowed range then read
[0, nα − 1]. And second, the use of three indices to refer Lf(i) = 0
to a mesh point implies implicit arithmetic operations, do neighb = 1,n_neighb
generated by the compiler, to translate them into a single j_ext = i_extended(i) + ij_delta(neighb)
index giving its position in memory. j = i_cell(j_ext)
A straightforward solution to these inefficiencies would Lf(i) = Lf(i) + L(neighb) * f(j)
be to create a neighbor-point list j_neighb(i,neighb), end do
of the size of the number of mesh points times the num-
ber of neighbor points. However, although the latter are where the number of neighbor points would be
only six in our simple example, they may frequently be n_neighb=7, including the central point itself, and
as many as several hundred, what generally makes this
approach unfeasible. A partial solution, addressing only ij_delta(1) = 1 ; L(1) = 1/dx2
the first problem, is to create six (or more for longer ij_delta(2) = -1 ; L(1) = 1/dx2
ranges) one-dimensional tables jα±1 (iα ) = mod(iα ±1, nα ) ij_delta(3) = nx_ext ; L(3) = 1/dy2
to avoid the modulo computations83 . Here, we describe ij_delta(4) = -nx_ext ; L(4) = 1/dy2
a multidimensional generalization of this method, which ij_delta(5) = nx_ext*ny_ext ; L(5) = 1/dz2
solves both problems at the expense of a very reasonable ij_delta(6) = -nx_ext*ny_ext ; L(6) = 1/dz2
amount of extra storage. ij_delta(7) = 0 ; L(7)=-2/dx2-2/dy2-2/dz2
The method is based on an extended mesh, which ex- Notice that the above loop is completely general, for any
tends beyond the periodic unit cell, by as much as re- linear operator, using an arbitrary number of neighbor
quired to cover all the space that can be reached from points for its finite difference representation. In fact, it
the unit cell by the range of the interactions or the is even independent of the space dimensionality. Further-
finite-difference operator. The extended mesh range is more, the index operations required are just one addition
imin
α = −∆nα and imaxα = nα − 1 + ∆nα , where ∆nα = 1 and three memory calls to arrays of range one84 . This in-
in our particular example, in which the Laplacian for- ner loop simplicity comes at the expense of the two extra
mula extends just to first-neighbor mesh points. In prin- arrays i_extended and i_cell (of the size of the normal
ciple, in cases with a small unit cell and a long range, the and extended meshes, respectively) which are generally
mesh extension may be larger than the unit cell itself, an acceptable memory overhead. Notice, however, that
extending over several neighbor cells. However, in the the the neighbor-point list ij_delta is independent of
more relevant case of a large system, we will expect the the mesh index i, what makes this array quite small in
extension region to be small compared to the unit cell. most problems of interest.
We then consider two combined indices, one associated
to the normal unit-cell mesh, and another one associated
to the extended mesh APPENDIX D: SPARSE MATRIX TECHNIQUES
i = i x + nx i y + nx ny i z ,
We will describe here some of the sparse-matrix mul-
tiplication techniques used in evaluating Eqs. (35), (86),
iext = (ix − imin
x ) + next min
x (iy − iy ) + next ext min
x ny (iz − iz ), (87), and (88). There is a large variety of sparse-matrix
representations and algorithms, each one optimized for a
where next
α = imax
α − imin
α + 1 = nα + 2∆nα . The key different kind of sparsity. The main constraint for choos-
observation is that, if iext is a mesh point within the ing our representation and algorithms is that they must
unit cell (0 ≤ iα ≤ nα − 1), and if jext is a neighbor be O(N ) in both memory and CPU time. We enforce
mesh point (within its interaction range, i.e. |jα − iα | ≤ this condition strictly by requiring, for example, that a
∆nα ), then the arithmetic difference jext − iext depends vector of size ∼ N will not be reset to zero a number
20
The combination of these two matrix multiplication al- which are non-zero at the grid point r, into an auxiliary
gorithms allows an efficient evaluation of Eqs. (86-88). matrix array, of size naux × naux , with naux ≥ nr . We
Since these equations involve a trace or a relatively small also create a lookup table pos, of size equal to the to-
subset of a final matrix, it is important to control the tal number of basis orbitals, such that pos(mu) is the
order and sparsity of the intermediate products, in order position, in the auxiliary matrix, of the matrix elements
to keep them as sparse as possible. Notice that, once a of orbital mu (or zero if they have not been copied to
row of A × B has been evaluated, it may be multiplied it). If there are new non-zero orbitals at the next grid
by a third matrix, to obtain a row of the final product, points, we keep copying them into the auxiliary matrix,
without need of storing the whole intermediate matrix. until all its naux slots are full, at which point we erase it
To calculate the density at a grid point using Eq. (35) and restart the process. Since succesive grid points tend
we need to access the matrix elements ρµν , and this is in- to contain the same non-zero basis orbitals, these copies
efficient if they are stored in sparse format. Thus, we first and erasures are not frequent.
copy the matrix elements, between the nr basis orbitals
1
L. Greengard, Science 265, 909 (1994). neither too deep nor too shallow. This tends to maintain
2
R. W. Hockney and J. W. Eastwood, Computer Simulation the separable potentials free of ghost states86 .
28
Using Particles (IOP Publishing, Bristol, 1988). For some atoms, tipically those with semicore states
3
R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985). suitable to be treated together with the valence states,
4
C. M. Goringe, D. R. Bowler, and E. Hernandez, Rep. Vl (r) only assumes the asymptotic coulombic behaviour
Prog. Phys. 60, 1447 (1997). −2Zval /r, and therefore only cancels out exactly with our
5
P. Ordejon, Comp. Mat. Sci. 12, 157 (1998). Vlocal (r), for r larger than certain rC > rcore . In these
6
P. Ordejon, D. A. Drabold, M. P. Grumbach, and R. M. cases, to avoid very extended KB projector functions, we
Martin, Phys. Rev. B 48, 14646 (1993). generate the local potentials with a prescription differ-
7
O. F. Sankey and D. J. Niklewski, Phys. Rev. B 40, 3979 ent from that presented in the text: if rC > 1.3 rcore
(1989). we take, Vlocal (r) = Vl (r) for r > rcore and, Vlocal (r) =
8
M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias, and exp(v1 +v2 r 2 +v3 r 3 ) for r < rcore , where v1 , v2 , and v3 are
J. D. Joannopoulos, Rev. Mod. Phys. 64, 1045 (1992). determined by enforcing the continuity of the potential up
9
S. Goedecker, Rev. Mod. Phys. 71, 1085 (1999). to the second derivative. This simple prescription usually
10
E. Hernandez and M. J. Gillan, Phys. Rev. B 51, 10157 produces smooth local potentials with similar properties
(1995). to those commented in the text (see note 27).
11 29
P. Ordejon, E. Artacho, and J. M. Soler, Phys. Rev. B 53, If there are both semicore and valence electrons with the
R10441 (1996). same angular momentum, the pseudopotential is generated
12
D. Sanchez-Portal, P. Ordejon, E. Artacho, and J. M. for an ion.
30
Soler, Int. J. Quantum Chem. 65, 453 (1997). D. Vanderbilt, Phys. Rev. B 41, 7892 (1990).
13 31
P. Ordejon, Physica Status Solidi (b) 217, 335 (2000), see W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P.
also http://www.uam.es/siesta. Flannery, Numerical Recipes (Cambridge University Press,
14
W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965). Cambridge, 1992).
15 32
J. P. Perdew and A. Zunger, Phys. Rev. B 23, 5048 (1981). D. Sanchez-Portal, J. M. Soler, and E. Artacho, J. Phys.:
16
J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Condens. Matter 8, 3859 (1996).
33
Lett. 77, 3865 (1996). G. Lippert, J. Hutter, P. Ballone, and M. Parrinello, J.
17
D. R. Hamann, M. Schlüter, and C. Chiang, Phys. Rev. Phys. Chem. 100, 6231 (1996).
34
Lett. 43, 1494 (1979). E. Artacho, D. Sanchez-Portal, P. Ordejon, A. Garcia, and
18
G. B. Bachelet, D. R. Hamann, and M. Schlüter, Phys. J. M. Soler, Phys. Stat. Sol. (b) 215, 809 (1999).
35
Rev. B 26, 4199 (1982). S. Huzinaga et al., Gaussian basis sets for molecular cal-
19
L. Kleinman and D. M. Bylander, Phys. Rev. Lett. 48, culations (Elsevier Science, Berlin, 1984).
36
1425 (1982). J. Junquera, O. Paz, D. Sanchez-Portal, and E. Artacho
20
S. G. Louie, S. Froyen, and M. L. Cohen, Phys. Rev. B 26, (2001), cond-mat/0104170. To appear in Phys. Rev. B.
37
1738 (1982). C. Filippi, D. J. Singh, and C. J. Umrigar, Phys. Rev. B
21
L. Kleinman, Phys. Rev. B 21, 2630 (1980). 50, 14947 (1994).
22 38
G. B. Bachelet and M. Schlüter, Phys. Rev. B 25, 2103 C. Kittel, Introduction to Solid State Physics (John Wiley
(1982). and Sons, New York, 1986).
23
N. Troullier and J. L. Martins, Phys. Rev. B 43, 1993 39
Some integrals, like hφIlmn |VINA |φI ′ l′ m′ n′ i could also
(1991). be calculated in this way, but this is not the case
24
P. E. Blöchl, Phys. Rev. B 41, 5414 (1990). of hφIlmn |VINA′′ |φI ′ l′ m′ n′ i, which involve rather cum-
25
N. J. Ramer and A. M. Rappe, Phys. Rev. B 59, 12471 bersome three-center integrals of arbitary numerical
(1999). functions. Therefore, it is simpler to find the total
26
D. Vanderbilt, Phys. Rev. B 32, 8412 (1985). neutral-atom potential and to calculate a single integral
27
The local potentials constructed in this way usually have a hφIlmn |V NA (r)|φI ′ l′ m′ n′ i in the uniform spatial grid.
40
strength (depth) that is an average of the different Vl ’s and J. D. Jackson, Classical Electrodynamics (John Wiley &
22
68
Sons, New York, 1962). A. Garcia and J. M. Soler (2001), unpublished.
41 69
Notice that our grid cutoff to represent the density is not K. Parlinski (2001), unpublished.
70
directly comparable to the energy cutoff in the context of J. M. Pruneda, S. Estreicher, J. Junquera, J. Ferrer, and
plane wave codes, which usually refers to the wavefunc- P. Ordejon (2001), cond-mat/0109306. To appear in Phys.
tions. Strictly speaking, the density requires a value four Rev. B.
71
times larger. M. Brandbyge, K. Stokbro, J. Taylor, J.-L. Mozos, and
42
L. C. Balbas, J. L. Martins, and J. M. Soler, Phys. Rev. B P. Ordejon, Mat. Res. Soc. Symp. Proc. 636, D9.25.1
64, 165110 (2001). (2001).
43 72
G. Makov and M. C. Payne, Phys. Rev. B 51, 4014 (1995). E. N. Economou, Green’s Functions in Quantum Physics
44
L. M. Sandratskii and P. G. Guletskii, J. Phys. F: Metal (Springer-Verlag, Berlin, 1983).
73
Phys. 16, L43 (1986). A. J. Read and R. J. Needs, Phys. Rev. B 44, 13071 (1991).
45 74
J. Kübler, K. H. Höck, J. Sticht, and A. R. Williams, J. E. K. U. Gross, C. A. Ullrich, and U. J. Gossmann, in
Appl. Phys 63, 3482 (1988). Density Functional Theory, edited by E. K. U. Gross and
46
T. Oda, A. Pasquarello, and R. Car, Phys. Rev. Lett. 80, R. M. Dreizler (Plenum Press, New York, 1995), pp. 149–
3622 (1998). 171.
47 75
A. V. Postnikov, P. Engel, and J. M. Soler (2001), cond- E. K. U. Gross, J. F. Dobson, and M. Petersilka, in Den-
mat/0109540. Submitted to Phys. Rev. B. sity Functional Theory II: Relativistic and Time Dependent
48
J. Moreno and J. M. Soler, Phys. Rev. B 45, 13891 (1992). Extensions, edited by R. F. Nalewajski (Springer-Verlag,
49
H. J. Monkhorst and J. D. Pack, Phys. Rev. B 13, 5188 Berlin-Heidelberg, 1996), vol. 181 of Topics in Current
(1976). Chemistry, pp. 81–172.
50 76
N. D. Mermin, Phys. Rev. B 137, A1441 (1965). K. Yabana and G. F. Bertsch, Phys. Rev. B 54, 4484
51
J. Harris, Phys. Rev. B 31, 1770 (1985). (1996).
52 77
W. M. C. Foulkes and R. Haydock, Phys. Rev. B 39, 12520 A. Tsolakidis, D. Sanchez-Portal, and R. M. Martin (2001),
(1989). cond-mat/0109488.
53 78
R. D. King-Smith and D. Vanderbilt, Phys. Rev. B 47, P. Ordejon, D. A. Drabold, R. M. Martin, and M. P. Grum-
1651 (1993). bach, Phys. Rev. B 51, 1456 (1995).
54 79
R. Resta, Rev. Mod. Phys. 66, 899 (1994). B. W. Suter, IEEE Transactions on Signal Processing 39,
55
G. Saghi-Szabo, R. E. Cohen, and H. Krakauer, Phys. Rev. 532 (1991).
80
Lett. 80, 4321 (1998). A. A. Mohsen and E. A. Hashish, Geophysical Prospecting
56
D. Vanderbilt, J. Phys. Chem. Solids 61, 147 (2000). 42, 131 (1994).
57 81
D. Sánchez-Portal, I. Souza, and R. M. Martin, in Fun- T. Ferrari, D. Perciante, and A. Dubra, Journal of the
damental Physics of Ferroelectrics, edited by R. Cohen Optical Society of America A 16, 2581 (1999).
82
(American Institute of Physics, Melville, 2000), vol. 535 M. Abramowitz and I. A. Stegun, Handbook of Mathemath-
of AIP Conference Proceedings, pp. 111–120. ical Functions (Dover Publications, New York, 1964).
58 83
S. Dall’Olio and R. Dovesi, Phys. Rev. B 56, 10105 (1997). K. Binder and D. W. Heermann, Monte Carlo Simulation
59
E. Yaschenko, L. Fu, L. Resca, and R. Resta, Phys. Rev. in Statistical Physics (Springer Verlag, Berlin, 1992).
84
B 58, 1222 (1998). In the present example, further savings can be achieved by
60
F. Mauri, G. Galli, and R. Car, Phys. Rev. B 47, 9973 extending the arrays f(i) and Lf(i) themselves, eliminating
(1993). the index translations and facilitating the paralelization.
61
J. Kim, F. Mauri, and G. Galli, Phys. Rev. B 52, 1640 Another efficient possibility is provided in Fortran90 by
(1995). the intrinsic cshift operation. However, these approaches
62
U. Stephan, D. A. Drabold, and R. M. Martin, Phys. Rev. are more complicated in other cases, like atomic-neighbor
B 58, 13472 (1998). list constructions, while the double index algorithm is quite
63
W. Kohn, Phys. Rev. 115, 809 (1959). general.
64 85
E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Dem- In Fortran, we alternate the order of i and k, to store the
mel, J. Dongarra, J. D. Croz, A. Greenbaum, S. Hammar- row elements consecutively in memory. If the number of
ling, A. McKenney, et al., LAPACK Users’ Guide (SIAM, nonzero elements fluctuates largely for different rows, we
Philadelphia, 1999). also store all the rows consecutively as a large single vector.
65 86
P. Pulay, Chem. Phys. Lett. 73, 393 (1980). X. Gonze, R. Stumpf, and M. Scheffler, Phys. Rev. B 44,
66
P. Pulay, J. Comp. Chem. 13, 556 (1982). 8503 (1991).
67
M. P. Allen and D. J. Tildesley, Computer Simulation of
Liquids (Oxford Univ. Press, Oxford, 1987).