Professional Documents
Culture Documents
On Representing Chemical Environment
On Representing Chemical Environment
are differentiability with respect to moving the atoms, and invariance to the basic symmetries
of physics: rotation, reflection, translation, and permutation of atoms of the same species. We
demonstrate that certain widely used descriptors that initially look quite different are specific cases
of a general approach, in which a finite set of basis functions with increasing angular wave numbers
are used to expand the atomic neighbourhood density function. Using the example system of small
clusters, we quantitatively show that this expansion needs to be carried to higher and higher wave
numbers as the number of neighbours increases in order to obtain a faithful representation, and
that variants of the descriptors converge at very different rates. We also propose an altogether
new approach, called Smooth Overlap of Atomic Positions (SOAP), that sidesteps these difficulties
by directly defining the similarity between any two neighbourhood environments, and show that
it is still closely connected to the invariant descriptors. We test the performance of the various
representations by fitting models to the potential energy surface of small silicon clusters and the
bulk crystal.
constructed such that each term is invariant to the per- of completeness and faithfulness of various descriptors
mutation of identical atoms. Computer code is available and SOAP, and also show an explore their performance
that generates the permutationally invariant polynomi- in fitting models for small silicon clusters and the bulk
als automatically7 (up to ten atoms), but this approach crystal.
still does not allow for varying number of atoms in the
database of configurations.
In order to generate interatomic potentials for solids II. POTENTIAL ENERGY SURFACE FITTING
or large clusters, capable of describing a wide variety
of conditions, the number of neighbours that contribute
to the energetics of an atom has to be allowed to vary, The main motivation behind this paper is to define and
with the symmetry invariant descriptors remaining con- assess a family of invariant descriptors to be used for fit-
tinuous and differentiable. Even though it is possible ting interatomic potentials, or potential energy surfaces
to allow the dimensionality M to change with the num- (PESs). In the construction of potentials for materials
ber of neighbours, for the purpose of function fitting it applications, the short range part of the total energy is
is more practical that M remains the same. None of decomposed into atomic contributions,
the traditional representations fulfil this criterion. Re-
(n) (n)
X
cently, however, a number of new, promising descriptors Eshort = ε(q1 , . . . , qM ),
have been proposed together with potential energy sur- n
faces based on them5,8,24,28,29 . Behler and Parrinello’s
“symmetry functions”5 were used to generate potentials where ε is the contribution of the nth atom, and q(n) =
(n) (n)
for silicon30 , sodium31 , zinc oxide32 and water33 amongst (q1 , . . . , qM ) is a system of descriptors characterizing
others; Bartok et al. employed the bispectrum8 to fit a the local atomic environment.
many-body potential for crystalline phases and defects in Traditionally, such atomic energy functions are defined
diamond. Sanville et al. used a subset of internal coor- in closed form. However, recently, there has been a lot
dinates to fit silicon potentials28 . Rupp et al. used the of interest in using more flexible, nonparametric PESs,
ordered eigenvalues of the Coulomb-matrix to fit atom- derived from computing the total energy and its deriva-
isation energies29 of a set of over 7000 small organic tives at a certain set {q(1) , . . . , q(N ) } of “training” con-
molecules. At this point it is not clear which method figurations using first principles calculations. A crucial
of representing atomic neighbourhoods will prove to be question then is how to fit ε to the computed datapoints.
optimal in the long term. We attempt to disentangle this The simplest approach is to use a linear fit, while Behler
issue from the rather complex details of generating first and Parrniello advocate using artificial neural networks
principles data and fitting PES, and separately consider (NN)5 , and Bartók et al. use Gaussian Approximation
the problem of constructing good descriptors. Potentials8 . However, ultimately, each of these proce-
The most well-known invariants describing atomic dures result in a PES of the form
neighbourhoods are the bond-order parameters originally
proposed by Steinhardt et al.34 Here we show that the N
X
αk K q, q(k) ,
bond-order parameters form a subset of a more general ε(q) = (1)
set of invariants called the bispectrum 35 . The formally in- k=1
finite array of bispectrum components provides an over-
complete system of invariants, and by truncating it one where the coefficient vector α = (α1 , . . . , αN ) is deter-
obtains representations whose sensitivity can be refined mined by the fitting procedure, and K is a fixed (non-
at will. We relate the bispectrum to the representation linear) function, called the kernel, whose role, intuitively,
proposed by Behler et al.5,36 , and show that, together is to capture the degree of similarity between the atomic
with another descriptor set described below, their angu- environments described by its two arguments. Clearly,
lar parts are all simple polynomials of the same set of then, the choice of descriptors, in particular, their invari-
canonical invariants. ance to symmetries, as well as the choice of kernel, are
The paper is organised as follows. In section II we critical ingredients to obtaining good quality PESs.
briefly recall how potential energy surfaces are con- In general, the kernel K can be interpreted as a co-
structed using invariant descriptors. In section III we variance function, and therefore it must be symmet-
describe a number of descriptors, starting with a sim- ric and positive definite (meaning that K(q, q0 ) =
ple distance metric between atomic configurations which K(q0 , q)P and
P for any (k) non-zero vector α of coeffi-
will be used as a reference to assess the faithfulness of cients, k ` αk α` K(q , q(`) ) > 0). Rasmussen and
37
all other descriptors but which itself is not differentiable. Williams present a number of such kernels, some of the
In Section IV we introduce a new, continuous and dif- simplest ones being the following.
ferentiable distance metric for constructing potential en- The dot-product (DP) kernel is defined as
ergy surfaces, called Smooth Overlap of Atomic Func- X
tions (SOAP), which has superior properties. In section KDP q, q0 = qj qj0 , (2)
V we show numerical tests that help assess the degree j
3
which, when substituted into equation (1), results in permutations of atoms change the order of rows and
N N
columns. For example, swapping atoms 1 and 2 results
X X (k)
X X (k) in the transformed matrix
ε(q) = αk qj qj = qj αk qj
k=1 j j k=1
X r2 · r2 r2 · r1 · · · r2 · rN
= qj βj ≡ q · β, r1 · r2 r1 · r1 · · · r1 · rN
j . (6)
.. .. .. ..
. . . .
i.e. the linear regression on the descriptor elements with
rN · r2 rN · r1 · · · rN · rN
coefficient vector β.
When using artificial neural networks with NH hidden To compare two structures using their Weyl matrices
units, the atomic energy function ε is given by Σ and Σ0 , we define a reference distance metric
NH dref = min ||Σ − PΣ0 PT ||, (7)
P
X
ε(q) = b + vj h(q, uj ), (3)
j=1 where P is a permutation matrix and the minimisation
is over all possible permutations. This metric is not dif-
where b is the bias, v the vector of unit weights, h is ferentiable at locations where the permutation that min-
the transfer function, and {uj }N 37
j=1 the unit parameters .
H
imises (7) changes. It would also be intractable to calcu-
In the limit of an infinite number of hidden units, for late exactly for large numbers of atoms, but nevertheless
specific transfer functions it is possible to reformulate we will use this metric to assess the faithfulness of other
equation (3) in the form of equation (1) with well-defined representations for a small system. Other, differentiable
covariance Pfunctions37–39 . For example, for h(q, u) = invariants shown later in this paper are, however, closely
tanh(u0 + uj qj ) the corresponding kernel is39 related to the elements of Σ.
2 One way to generate permutationally invariant dif-
KNN q, q0 ∼ −|q − q0 + const.
ferentiable functions of the Weyl matrix is to compute
its eigenvalues; indeed, a very similar descriptor was re-
Finally, the squared exponential (SE) kernel that we
cently used to fit the atomisation energies of a large set of
have used in the past with the Gaussian Approximation
molecules29 . However, the list of eigenvalues is very far
Potentials8 and in some of the examples below is
from being complete, since there are only N eigenvalues,
X (q − q 0 )2 whereas the dimensionality of the configuration space of
KSE q, q0 = exp −
. (4) N neighbour atoms is 3N − 3 (after the rotational sym-
j
2σj2
metries are removed). It is also unclear how to make the
descriptors based on the eigenspectrum continuous and
differentiable as the number of neighbours varies.
III. DESCRIPTORS
The Z-matrix, or internal coordinates, is a customary
set of rotationally invariant descriptors usually used to
Among the applications mentioned in the introduc- describe the geometry of entire molecules, but it is not
tion, some require representing the geometry of an en- invariant to permutations of atoms. The Z-matrix is com-
tire molecule, while for others one needs to describe the plete, but in contrast to the Weyl matrix of basic invari-
neighbourhood of an atom perhaps within a finite cut- ants which are based solely on bond lengths and bond
off distance. While these two cases are closely related, angles, it is a minimal set of descriptors that also con-
the descriptors for one are not directly suitable for the tains some dihedral angles.
other, although often the same idea can be used to derive Another straightforward way to compare structures is
representations for either case. Below, we focus on repre- based on pairing the atoms from each and finding the
senting the neighbour environment of a single atom, but optimal rotation that brings the two structures into as
for some cases we briefly mention easy generalisations close an alignment as possible. For each pair of structures
that yield global molecular descriptors. {ri }N 0 N
i=1 and {ri }i=1 , it is possible to order the atoms ac-
For N neighbouring atomic position vectors cording to their distance from the central atom (or centre
{r1 , r2 , . . . , rN } taken relative to a central atom, of mass, in case we want to compare entire molecules) and
the symmetric matrix compute
r1 · r1 r1 · r2 · · · r1 · rN N
X
r2 · r1 r2 · r2 · · · r2 · rN ∆(R̂) = |ri − R̂r0i |2 ,
Σ= (5)
.. .. . . .. j
. . . .
rN · r1 rN · r2 · · · rN · rN where R̂ represents an arbitrary rotation (including the
possibility of a reflection). We can then define the dis-
is, according to Weyl40 , an over-complete array of basic tance between two configurations as
invariants with respect to rotation, reflection and trans-
∆ = min ∆(R̂). (8)
lation. However, Σ is not a suitable descriptor, because R̂
4
This distance clearly has all the necessary invariances and list only affects the order of the summation. To simplify
completeness properties, but, like dref , it is not suitable the following derivation, for now we omit the information
for parametrising potential energy surfaces, because it on the radial distance to the neighbours, but will show
is again not differentiable: the reordering procedure and later how the radial information can be included. The
the minimisation over rotations and reflections introduce atomic neighbour density function can then be expanded
cusps. in terms of spherical harmonics:
In the field of molecular informatics, one popular de-
scriptor is based on the histogram of pairwise atomic ∞ X
X l
distances41 , similar to Valle’s crystal fingerprint42 . We ρ(r̂) = clm Ylm (r̂), (10)
will not consider these here, because they are unsuit- l=0 m=−l
able for fitting PES, as they are clearly not complete:
e.g. from six unordered distance values it is not neces- where r̂ is the point on the unit sphere corresponding to
sarily possible to construct a unique tetrahedron, even the direction of the vector r, thus ρ(r̂) is the projection
though the number of degrees of freedom is also six. of ρ(r) onto the unit sphere, S 2 .
Figure 1 shows two tetrahedra that were constructed such The properties of functions defined on the unit sphere
are related to the group theory of SO(3), the group of
2.532 2.532 three dimensional rotations. Spherical harmonics form
an orthonormal basis set for L2 (S 2 ), the class of square
2.6 integrable functions on the sphere:
2.215
61
2.215
2.177
2.
26
17
26
3.
7
3.
2.6
As a first step in deriving continuous invariant repre- The quantities Qlm introduced by Steinhardt et al.34
sentations of atomic environments, we define the atomic are proportional to the coefficients clm . Dividing by N ,
neighbour density function associated with a given atom the number of neighbours of the atom (within a finite
as cut-off distance) provides the atomic order parameters
X
ρ(r) = wZi δ(r − ri ), (9) 1 X
Qlm = Ylm (r̂i ). (12)
i N i
where the index i runs over the neighbours of the atom
within some cutoff distance, wZi is a unique weight fac- Furthermore, averaging (12) over atoms in the entire sys-
tor assigned according to the atomic species of i, and ri tem gives a set of global order parameters
is the vector from the central atom to neighbour i. For
clarity, we will omit the species weights from now on, and 1 X
Q̄lm = Ylm (r̂ii0 ),
assume a single atomic species, but none of our results Nb 0
ii
rely on this. Determining which neighbours to include
in the summation can be done by using a simple binary where Nb is the total number of atom pairs included in
valued, or a smooth real valued cutoff function of the the summation, and we wrote rii0 for the vector connect-
interatomic distance, or via a more sophisticated pro- ing atom i to its neighbour i0 . Both sets are invariant
cedure, e.g. Voronoi analysis34 . The atomic neighbour to permutations of atoms and translations, but still de-
density is already invariant to permuting neighbours, be- pend on the orientation of the reference frame. However,
cause changing the order of the atoms in the neighbour rotationally invariant combinations can be constructed
5
and (13) shows that the second-order bond-order param- Writing out the block diagonal matrix in the square
eters are related to the power spectrum via the simple brackets as
scaling
1/2
D|l1 −l2 |
4π
Ql = pl .
2l + 1
The power spectrum is clearly not a complete descrip-
D|l1 −l2 |+1
lM
1 +l2
tor for a general function f (r̂) on the sphere, for example l
D ≡ ,
consider the two different functions
l=|l1 −l2 |
..
.
f1 = Y22 + Y2−2 + Y33 + Y3−3
and
Dl1 +l2
f2 = Y21 + Y2−1 + Y32 + Y3−2 ,
which both have the same power spectrum, p2 = 2 and
p3 = 2 (with all other components equal to zero). How- we see that each block selects a particular slice of the
ever, for the restricted class of functions which are sums vector in equation (20), which transforms according to a
of a limited number of delta functions (such as the atomic given Dl matrix. We give a new symbol to these slices,
neighbour density ρ in equation (9)), the power spectrum gl l1 l2 , so that the original vector is their direct sum
elements turn out to be polynomials of the basic invari- lM
1 +l2
ants of Weyl. Using numerical experiments we demon-
Cl1 l2 (cl1 ⊗ cl2 ) ≡ gl l1 l2 ,
strate in section V that for a fixed number of neighbours
l=|l1 −l2 |
a certain set of power spectrum components is likely to
be over-complete. and each gl l1 l2 transforms under rotation as
gl l1 l2 → Dl gl l1 l2 .
C. The bispectrum
Analogously to the power spectrum, we can now define
We generalise the concept of the power spectrum to the bispectrum as the collection of scalars
obtain a larger set of invariants via coupling different an-
gular momentum channels35,52 . Let us consider the direct bl l1 l2 = c†l gl l1 l2 ,
product cl1 ⊗ cl2 , which transforms under a rotation as
which are invariant to rotations:
cl1 ⊗ cl2 → Dll ⊗ Dl2 (cl1 ⊗ cl2 ) .
†
It follows from the representation theory of compact bl l1 l2 = c†l gl l1 l2 → cl Dl Dl gl l1 l2 = c†l gl l1 l2 .
groups that the direct product of two irreducible rep-
It follows from equation (17) that those elements of the
resentations can be decomposed into a direct sum of ir-
bispectrum where l1 + l2 + l is odd change sign under
reducible representations53 . In case of the SO(3) group,
reflection about the origin. If invariance to reflection is
the direct product of two Wigner matrices can be decom-
required, we take the absolute value of these components
posed into a direct sum of Wigner matrices,
or omit them from the descriptor.
l1 l2
X
l l l1 l2
∗ l l1 l2 Rewriting the bispectrum formula as
Dm 1m
0 Dm m0 =
2
Dmm 0 Cmm 1 m2
Cm0 m0 m0 ,
1 2 1 2
l,m,m0
l l1 l2
(18) X X X
l l1 l2 bl l1 l2 = c∗lm Cmm
l l1 l2
c c
1 m2 l1 m1 l2 m2
,
where Cmm 1 m 2
denote the Clebsch-Gordan coefficients
m=−l m1 =−l1 m2 =−l2
or, using more compact notation,
(21)
the similarity to the third-order bond-order parameters
lM
1 +l2
†
Dll ⊗ Dl2 = Cl1 l2 Dl Cl1 l2 , becomes apparent. Indeed, the Wigner 3jm-symbols are
(19)
l=|l1 −l2 |
related to the Clebsch-Gordan coefficients through
D. Radial basis
l
X l
X
bl 0 l = N c∗lm δmm2 clm2 =
Thus far we neglected the distance of neighbouring
m=−l m2 =−l
atoms from the central atom by using the unit-sphere
l
X projection of the atomic environment. One way to intro-
=N c∗lm clm = N pl , duce radial information is to complement the spherical
m=−l
harmonics basis in equation (10) with radial basis func-
tions gn 54 ,
therefore, l
XX X
ρ(r) = cnlm gn (r)Ylm (r̂). (24)
n l=0 m=−l
√ p
Ql ∝ pl ∝ bl 0 l
If the set of radial basis functions is not orthonormal, i.e.
W l ∝ bl l l .
hgn |gm i = Snm 6= δnm , after obtaining the coefficients
c0nlm with
The first few terms of the power spectrum and bispec- c0nlm = hgn Ylm |ρi,
trum for an atom with three neighbours are shown below,
the elements cnlm are given by
where θii0 is the angle between the bonds to neighbours
i and i0 , and the sums are over all the neighbours. X
S −1 n0 n c0n0 lm .
cnlm =
n0
g n (r)
g n (r)
0 r cut
r
0 r1 r2 r cut
r FIG. 3. Example of radial basis functions gn (r), as defined in
equation (25) for n = 1, 2, 3, 4.
FIG. 2. Example of weakly overlapping radial basis functions
gn (r), cf. equation (24). Atoms 1 and 2 at distance r1 and
r2 from the centre become decoupled as their contribution to Another way to avoid radial decoupling is to define
the power spectrum or bispectrum elements is weighed down the rotational invariants in such a way that they couple
by the product gn (r1 )gn (r2 ), which is rather small for all n. different radial channels explicitly, for example, as
of s are defined so that in equation (15). In this case the arguments represent a
rotation by θ0 around the vector pointing in the (φ, θ)
s1 = r0 cos θ0 direction, which can be transformed to the conventional
s2 = r0 sin θ0 cos θ Euler-angles, and j takes half-integer values.
s3 = r0 sin θ0 sin θ cos φ The hyperspherical harmonics form an orthonormal
basis for L2 (S 3 ), thus the expansion coefficients cjm0 m
s4 = r0 sin θ0 sin θ sin φ.
can be calculated via
We choose to use the projection cjm0 m = hUm
j
0 m |ρi,
φ = arctan(y/x)
x where h.|.i denotes the inner product defined on the four-
r ≡ y → θ = arccos(z/|r|),
dimensional hypersphere:
z θ0 = π|r|/r0 Z π Z π Z 2π
where r0 > rcut is a parameter, thus rotations in three- hf |gi = dθ0 sin2 θ0 dθ sin θ dφ
0 0 0
dimensional space correspond to a subset of rotations
f ∗ (θ0 , θ, φ) g(θ0 , θ, φ).
in four-dimensional space. This projection is somewhat
similar to a Riemann projection, except in that case θ0 Although the coefficients cjm0 m have two indices besides
would be defined as j, for each j it is convenient to collect them into a sin-
θ0 = 2 arctan(|r|/2r0 ), gle vector cj . Similarly to the three-dimensional case,
rotations act on the hyperspherical harmonic functions
implying as
j
X j j
θ0 ≈ |r|/r0 for |r| r0 . R̂Um 0m =
1
Rm0 m1 m0 m2 Um 0m ,
2
1 1 2 2
m02 m2
In contrast to the Riemann projection, our choice of θ0
j
allows more sensitive representation of the entire radial where the matrix elements Rm 0 m m0 m
1 2
are given by
1 2
range. The limit r0 = rcut projects each atom at the
j j j
cutoff distance to the South pole of the 4-dimensional Rm 0 m m0 m = hUm0 m |R̂|Um0 m i.
1 1 2 1
2 2 1 2
sphere, thus losing all angular information. Too large
an r0 would project all positions onto a small surface Hence the rotation R̂ acting on ρ transforms the coeffi-
area of the sphere around the North pole, requiring a cient vectors cj according to
large number of basis functions to represent the atomic
environment. In practice, a large range of r0 values works cj → R j cj .
well, in particular, we used r0 = 43 rcut .
The unitary Rj matrices are the SO(4) analogues of the
To illustrate the procedure, Figure 4 shows the Rie-
Wigner-matrices Dl of the SO(3) case above, and it can
mann projections for one and two dimensions, which can
be shown that the direct product of the four-dimensional
be easily drawn.
rotation matrices decomposes according to
jM
1 +j2
†
Rjl ⊗ Rj2 = Hj1 j2 Rj Hj1 j2 ,
j=|j1 −j2 |
The SO(4) bispectrum is invariant to rotations of four- where the cutoff function is defined as
dimensional space, which include three-dimensional rota- (h i
tions. However, there are additional rotations, associated cos rπr + 1 /2 for r ≤ rcut
fc (r) = cut .
with the third polar angle θ0 , which, in our case, repre- 0 for r > rcut
sents the radial information. In order to eliminate the
unphysical invariance with respect to rotations along the Different values of the parameters η, Rs , ζ, λ can be used
third polar angle, we modify the atomic density as to generate an arbitrary number of invariants.
X 1
ρ(r) = δ(0) + δ(r − ri ),
i 0.9 η = 0. 001 Boh r− 2
η = 0. 010 Boh r− 2
0.8
i.e. we add the central atom — with the coordinates η = 0. 020 Boh r− 2
(0, 0, 0) — as a fixed reference point, anchoring the neigh- 0.7
η = 0. 035 Boh r− 2
bourhood. The resulting invariants Bj j1 j2 have only η = 0. 060 Boh r− 2
three indices, but contain both radial and angular in- 0.6 η = 0. 100 Boh r− 2
G 2 (r )
formation, and have the required symmetry properties. η = 0. 200 Boh r− 2
0.5
There are no adjustable parameters in the definition of η = 0. 400 Boh r− 2
these invariants, apart from the projection parameter r0 0.4
n
0
Z Z
= dR̂ ρ(r)ρ0 (R̂r)dr , (30)
Z Z Z
l0 i0
X X
0 0 ∗
S(R̂) ≡ S(ρ, R̂ρ ) = dr ρ(r)ρ (R̂r) = Dm 0 m00 (R̂) dr ci∗
lm (r)cl0 m0 (r) dr̂ Ylm (r̂)Yl0 m00 (r̂) =
i,i0 l, m
l0 , m0 , m00
X X X
= I˜mm
l l
0 (α, ri , ri0 )Dmm0 (R̂) =
l
Imm l
0 Dmm0 (R̂),
where the integral of the coefficients is radial basis functions gn (r), the atomic neighbour density
function becomes
I˜mm
l
0 (α, ri , ri0 ) =
X X
∗
ρ(r) = exp −α|r − ri |2 = cnlm gn (r)Ylm (r̂),
4π exp −α(ri2 + ri20 )/2 ιl (αri ri0 ) Ylm (r̂i )Ylm
(r̂i0 ), i nlm
(37)
and and similarly, the ρ0 environment is expanded using co-
X efficients c0nlm . If the radial basis functions form an or-
l
Imm 0 ≡ I˜mm
l
0 (α, ri , ri0 ). (33) thonormal basis, i.e.
i,i0 Z
drgn (r)gn0 (r) = δnn0 ,
The rotationally invariant kernel with n = 2 then be-
comes
the sum in equation (33) becomes
Z
k(ρ, ρ0 ) = dR̂ S ∗ (R̂)S(R̂) = l
Imm
X ∗
cnlm (c0nlm0 ) .
0 = (38)
n
Z
X ∗ λ
= Imm0 Iµµ0 dR̂ D∗ (R̂)lmm0 D(R̂)λµµ0 =
l
l, m, m0
The significance of this result becomes apparent when
λ, µ, µ0 substituting equation (38) into equation (34) to obtain
X
l
∗ l
= Imm 0 Imm 0, (34) X
l,m,m0
k(ρ, ρ0 ) = cnlm (c0nlm0 )∗ (cnlm )∗ c0n0 lm0
n,n0 ,l,m,m0
where we used the orthogonality of the Wigner-matrices.
X
≡ pnn0 l p0nn0 l , (39)
Although in practice we always use n = 2, it is easy n,n0 ,l
to derive the kernel for any arbitrary order n using the
Clebsch-Gordan series in equation (18). For n = 3, using since
the fact that S, as defined in equation (29) is real and X
positive, pnn0 l ≡ cnlm (cn0 lm )∗ , (40)
m
Z
k(ρ, ρ0 ) = dR̂ S(R̂)3 , is exactly the power spectrum (cf. equation (26)), and,
analogously, p0nn0 l is the power spectrum of the primed
which can be shown to be environment. Furthermore, the kernel in equation (39)
is the dot-product of the power spectra, cf. equation (2).
lm0
X
k(ρ, ρ0 ) = l1
Im 0 I
l2
0 I
l lm
0 Cl m l m Cl m0 l m0 .
Analogously, the kernel for n = 3, defined in equation
1 m1 m2 m2 mm 1 1 2 2 1 1 2 2
(35), can be expressed as
(35)
Raising a positive definite function to a positive integer X
power yields a function that is similarly positive defi- k(ρ, ρ0 ) = bn1 n2 nl l1 l2 b0n1 n2 nl l1 l2 , (41)
nite. In our context, raising k to some power ζ ≥ 2 has n1 , n 2 , n
l1 , l2 , l
the effect of accentuating the sensitivity of the kernel to
changing the atomic positions, which we generally found where
to be advantageous in experiments. p Therefore, following X
normalization by dividing by k(ρ, ρ)k(ρ0 , ρ0 ), we define bn 1 n 2 n l 1 l 2 l ≡ cn1 l1 m1 cn2 l2 m2 (cnlm )∗ Cllm
1 m1 l2 m2
,
the general form of what we call the Smooth Overlap of
Atomic Positions (SOAP) kernel as cf. equation (21), and b0 is analogously the bispectrum of
!ζ the primed environment. In Figure 7 we show the conver-
k(ρ, ρ0 ) gence of the similarity kernel (39) with increasing number
0
K(ρ, ρ ) = p , (36) of angular and radial basis functions in the expansion.
k(ρ, ρ)k(ρ0 , ρ0 ) In this section we started out by taking a different ap-
proach to the problem of comparing neighbour environ-
where ζ is any positive integer. ments, defining the SOAP similarity kernel (36) directly,
rather than going via a descriptor. Equations (39) and
(41), however, reveal the relation between SOAP and the
B. Radial basis and relation to spectra SO(3) power spectrum and bispectrum: SOAP is equiv-
alent to using the SO(3) power or bispectrum descriptor
l
Note that Imm 0 needs to be computed for each pair together with Gaussian atomic neighbour density contri-
of neighbours, which can become expensive for a large butions and a dot product covariance kernel. The ad-
number of neighbours. If we expand equation (32) using vantage of SOAP over the previous descriptors is that it
13
eliminates some of the ad hoc choices that were needed surfaces for silicon clusters and the bulk crystal using our
before, while retaining control over the smoothness of the Gaussian Approximation Potentials framework8 .
similarity measure using α, the width of the Gaussians
in defining the atomic neighbourhood density in equation
(32) and its sensitivity using the exponent ζ in equation A. Reconstruction experiments
(36).
Recall that the elements ri · ri0 of Σ defined in equa-
0
10
tion (5) are an over-complete set of basic invariants,
10
−2 which, in the case of atoms scattered on the surface of a
unit sphere (|ri | = 1), are the cosines of the bond angles,
−4
10
θii0 . Thus, the angular part of all descriptors in section
Differ ence in K (ρ, ρ ′ )
10
−6 III are permutationally invariant functions of the basic
invariants in Σ, and depending on the actual number of
−8
10
0 5 10 15 20 25 30 descriptor elements used, they may form an incomplete,
lmax complete or over-complete representation of the atomic
0
10 environment. In practice, one would like to use as few
−2
10
descriptors as possible, partly due to computational cost,
−4
but also because descriptors that use high exponents of
10
the angles are likely to lead to less smooth PESs, as will
−6
10
be shown below.
−8
10 Given N neighbours, the number of independent de-
−10
10 grees of freedom in the neighbourhood configuration is
5 10 15 20 25
nma x 3N − 3, so we need at most this many algebraically in-
dependent descriptor elements. But because the alge-
FIG. 7. Convergence of the similarity kernel K(ρ, ρ0 ) of two braic dependency relationships between the descriptor el-
arbitrary atomic environments with 15 neighbours at different ements is in general complicated, it is unclear how many
sizes of basis expansion. We used the parameters α = 0.4 and descriptor elements are actually needed in order to make
ζ = 1.0, and the converged kernel is K(ρ, ρ0 ) = −0.842735. the descriptor complete and thus able to uniquely specify
The top panel shows the convergence of the kernel, evaluated
an atomic environment of the N neighbours. However,
according to equation (34), with increasing number of spher-
ical harmonics functions. The bottom panel shows the con-
it is possible to conduct numerical experiments in which
vergence of the kernel, evaluated according to equation (39), we compare the descriptors of a fixed target with that of
with increasing number of radial basis functions, while keep- a candidate structure and minimise the difference with
ing lmax = 16. respect to the atomic coordinates of the candidate. In
this way we determine if a representation is likely to be
complete or not, and in the latter case to characterise the
degree of its faithfulness.
Descriptor matching procedure. The global minimum
V. NUMERICAL RESULTS of the descriptor difference between the target and the
candidate is zero and is always attained on a manifold due
We have derived methods to transform atomic neigh- to the symmetries built into the descriptors, but for an
bourhoods to descriptors that are invariant under the re- incomplete descriptor, many inequivalent structures will
quired symmetry operators, however, their relative merit also appear equivalent, thus enlarging the dimensionality
for fitting potential energy surfaces remains to be seen. of the global minimum manifold. Furthermore, it can
The faithfulness of the representation, i.e. that no gen- be expected that the descriptor difference function has a
uinely different configurations should map onto the same number of local minima.
descriptor, is of particular interest. As the inverse trans- In our experiments we tried to recover a given tar-
formation from the descriptor to atomic coordinates — get structure after randomising its atomic coordinates.
apart from the simplest cases — is not available, we de- For each n (4 ≤ n ≤ 19) we used 10 different Sin clus-
scribe numerical experiments in which we attempt to re- ters as targets, obtained from a tight-binding59 molecu-
construct atomic coordinates from descriptors, up to ro- lar dynamics trajectory run at a temperature of 2000 K.
tations, reflections and permutations. Descriptors which For each target cluster, we selected one atom as the ori-
severely fail in this test are unlikely to be good for fit- gin, randomised the positions of its neighbours by some
ting potential energy surfaces, because entire manifolds amount, and then tried to reconstruct the original struc-
of neighbour environments that are genuinely different ture by minimising the magnitude of the difference be-
with widely varying true energies will be assigned the tween the descriptors of the fixed target and the candi-
same descriptor, resulting in fitted PES with many de- date as the atomic positions of the latter were varied.
generate modes. We compare and test the performance In contrast to a general global minimum search prob-
of the various descriptors by generating potential energy lem, we have the advantage of knowing the target value
14
incomplete.
In order to assess the success of the reconstruction
procedure (i.e. whether the target and candidate con- 4 6 8 10 12 14 16 18 20
figurations are genuinely different or not) we employed n
the reference measures defined in equations (7) and (8).
1
However, in some cases it was difficult or impossible to 10
2.1
7
2.1
2
2.3
2.43 2.23
2.0
2.38
63 2.75
2.40
2.65
4. 7
2.67
2.37 4.3
2.67
5
5 2.8
2.3
2 2.4 6
2.3
2.32
2.35
bers of components: 51 in total for the SO(3) bispectrum
and PB descriptors and 50 for the AFS and SO(3) power
2 spectrum. This corresponds to a truncation of the SO(4)
FIG. 8. Two Si8 clusters that differ by dref = 4.1Å . The bispectrum with 2jmax = 5 (the factor of 2 on account
black atoms are taken as the origin in each environment, i.e.
of the half-integer nature of j), the SO(3) bispectrum
the centres of rotations. InPterms of Parrinello-Behler type
0 2 with lmax = 4 and nmax = 3, the PB descriptor with its
descriptors, the difference α (Gα − Gα ) between the two
−7
atomic environments is 6 · 10 . The bond lengths are shown published parameters57 and the AFS and SO(3) power
in Ångströms. spectrum using lmax = 9 and nmax = 5. We note that
in case of the PB descriptor the band limit of the angu-
In the first set of reconstruction experiments, in or- lar descriptors (corresponding to our lmax or jmax ) was
der to provide a fair comparison, the truncation of the ζmax = 16 and only the values ζ = 1, 2, 4, 16 are used.
formally infinite set of descriptors was chosen in such a Figure 9 shows the quality of reconstruction for dif-
way that the finite descriptors had roughly equal num- ferent cluster sizes, based on the PB, AFS, SO(3) power
15
2.6
0
2.6
2.3
3
2.68
the 6Å cutoff, but close to it. In order to separate out
6
2.68
2.6
2.81
3
2.31
2.28
2.28 this effect, we repeated the reconstruction experiments
2.
0
57
0
2. 5.8 with a radial cutoff of 9Å (omitting the PB descriptor
5.8 57
now since there is no published parameter set for this
2.50
2.48
2.50
2.48
2.6
6
2.2
2.6
6
2.2 initial randomisation. The peak near n = 8 is now ab-
2
2
2.48 2.48
sent, and the transition from faithful reconstruction (for
n ≤ 9 for the SO(3) power spectrum and AFS, and for
2 n ≤ 12 for the SO(4) bispectrum) to failure for larger n
FIG. 10. Two Si8 clusters that differ by dref = 0.7Å . The is much clearer.
reference atom, i.e. the centre of the rotation, is coloured
Since all the descriptors are likely to be over-complete
black. The only difference between the two clusters is the
relative position of the furthest atom, A.
when the infinite series of the basis set expansion is not
truncated, the reconstruction quality is expected to in-
crease with increasing descriptor length. To verify this,
Figure 12 shows the reconstruction quality of the AFS
A FS descriptor for varying truncations of the angular part
10
1
S O(4) BS
S O(3) P S
of the basis set. The representation becomes monotoni-
S O(3) BS cally better for higher angular resolutions. However, this
comes at the price of introducing ever more highly os-
Aver age d REF
10
0
cillating basis functions, which might be less and less
suitable for fitting generally smooth potential energy sur-
faces.
−1
Figure 12 also shows the achieved reference values
10
when using the SOAP similarity measure. In this case,
rather than minimising the difference between descrip-
tors, we optimised the candidate structure until its nor-
10
−2
malised similarity to the target as given by equation (36)
was as close to unity as possible. In contrast to the
other descriptors, SOAP with the modest band limit of
4 6 8 10 12 14 16 18 20 lmax = 6 performs perfectly for all structures, without
n showing any degradation for larger numbers of neigh-
bours.
FIG. 11. Difference between the target and reconstructed
To verify that the above results are not affected by
structures after randomisation by 1.6Å and minimisation, as
a function of the number of atoms in the cluster, n, averaged
artefacts of the minimisation procedure, e.g. getting
over 10 targets for each cluster size. The cutoff was 9Å. The stuck, Figure 13 shows the convergence of the reference
line types corresponding to the different descriptors are the measure dref during a minimisation as well as the cor-
same as in Figure 9. responding convergence of the target function (the dif-
ference between the target and candidate descriptors).
There was no difficulty in converging the target function
spectrum, SO(3) bispectrum and SO(4) bispectrum as to zero (the global minimum) for any of the complete (or
given by the reference distance dref achieved, averaged over-complete) descriptors or SOAP, while the reference
over 10 reconstruction trials for each cluster size n. The similarity converged to a non-zero value for incomplete
general trend is the same for all descriptors: as the num- descriptors.
ber of neighbours increases, the average dref increases,
and thus the faithfulness of the reconstruction decreases.
Noting that the stopping criterion for the reconstruc- B. Gaussian Approximation Potentials
tion process was dref < 10−2 Å2 , larger randomisation of
the initial atomic coordinates (bottom panel) reveals the Our main motivation for assessing different approaches
poor representation power for all descriptors using this to representing atomic neighbour environments is to de-
parameter set for n > 10, and the neighbour configura- termine their efficacy for generating interatomic poten-
tion becomes impossible to determine from the descrip- tials. Therefore, as a final test, we fitted a series of inter-
tor. atomic potentials for Si3−19 , based on different descrip-
The poor quality of representation is partly at- tors, using our Gaussian Approximation Potential (GAP)
16
1
10
l m ax = 4
AFS
1
10 l m ax = 6
l m ax = 8
4
0 10
l m ax = 10 10
l m ax = 12
Aver age d REF
0 2
10 10
target
d REF
0
10
−1
10
−1
10 −2
10
PB −4
AF S 10
−2
−2
10 10 SO (4) B S
−6
10
SO (3) PS
SO (3) B S −8
10
−3
10 SOAP 0
10
2
10
4
10
−3
10
0 1 2 3 4
4 6 8 10 12 14 16 18 20 10 10 10 10 10
n numb er of m inim isation steps
1
10
0.4
A FS S O(4) BS S O(3) P S S OA P TABLE III. Quality of the GAP potential energy surface using
0.38 different descriptors and angular band limits, as measured
by the RMS energy and force errors. The fitting database
RMS for ce er r or (eV/ Å)
0.36
contained 8000 atomic neighbourhoods in Sin clusters with
3 ≤ n ≤ 19. The units of the RMS errors of the energy and
0.34
force are meV/atom and eV/Å, respectively.
0.32
Angular band limit RMS(e) RMS(f)
0.3
lmax
6 50.0 0.37
0.28 AFS 8 47.0 0.35
nmax = 6 10 45.7 0.34
0.26 12 44.7 0.34
2jmax
0.24
6 27.6 0.28
0.22
SO(4) BS 8 22.8 0.26
r0 = 34 rcut 10 20.2 0.25
0.2 12 19.2 0.26
2000 3000 4000 5000 6000 7000 8000
Databas e s ize lmax
6 41.5 0.36
SO(3) PS 8 37.1 0.34
FIG. 14. Quality of GAP potentials constructed using differ-
nmax = 6 10 35.7 0.33
ent descriptors, as a function of the size of the database used
12 35.0 0.32
for the fit, with all configurations drawn from Sin clusters
with 3 ≤ n ≤ 19. The angular band limit was lmax = 12 in all lmax
cases (equivalent to 2jmax = 12 for the SO(4) bispectrum). 2 21.4 0.23
SOAP 4 17.6 0.21
α = 2, ζ = 4 6 17.0 0.21
8 15.3 0.22
atomic neighbourhood density from smooth Gaussians
and using the dot product kernel — both directly linked
to the construction of SOAP as a smooth similarity mea-
VI. CONCLUSION
sure — in contrast to using Dirac-delta functions for the
density and a squared exponential kernel for the other
descriptors. In this paper we discussed a number of approaches to
representing atomic neighbour environments within a fi-
nite cutoff such that the representation is a continuous
200 N data = 100 and differentiable function of the atomic positions and
150
is invariant to global rotations, reflections, and permu-
C ij [GPa]
which does not suffer from these difficulties and demon- ACKNOWLEDGMENTS
strates excellent faithfulness for any number of neigh-
bours. We also tested the performance of the descriptors APB gratefully acknowledges funding from Magdalene
for fitting models of small silicon clusters and bulk silicon College, Cambridge. Figures 8 and 10 were generated
crystal and found that SOAP leads to a more accurate using AtomEye60 . This work was partly supported by
and much more robust potential energy surface. the European Framework Programme (FP7/2007-2013)
under grant agreement no. 229205.
∗ 27
ab686@cam.ac.uk W. Cencek, K. Szalewicz, C. Leforestier, R. v. Harrevelt,
1
C. J. Pickard and R. J. Needs, Nature Materials 7, 775 and A. v. d. Avoird, Phys. Chem. Chem. Phys. 10 (2008).
28
(2008). E. Sanville, A. Bholoa, R. Smith, and S. D. Kenny, J.
2
D. J. Wales, Energy Landscapes, Applications to Clusters, Phys.: Condens. Matter 20, 285219 (2008).
29
Biomolecules and Glasses (Cambridge University Press, M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von
2004). Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012).
3 30
O. Obrezanova, G. Csányi, J. M. R. Gola, and M. D. J. Behler, R. Martoňák, D. Donadio, and M. Parrinello,
Segall, J. Chem. Inf. Model. 47, 1847 (2007). Phys. Rev. Lett. 100, 185501 (2008).
4 31
M. Segall, J. Comput. Aided Mol. Des. 26, 121 (2011). H. Eshet, R. Khaliullin, T. Kühne, J. Behler, and M. Par-
5
J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401 rinello, Phys. Rev. Lett. 108, 115701 (2012).
32
(2007). N. Artrith, T. Morawietz, and J. Behler, Phys. Rev. B 83,
6
L. Raff, R. Komanduri, M. Hagan, and S. Bukkapatnam, 153101 (2011).
33
Neural Networks in Chemical Reaction Dynamics (Oxford T. Morawietz, V. Sharma, and J. Behler, J. Chem. Phys.
University Press, USA, 2012). 136, 064103 (2012).
7 34
B. Braams and J. Bowman, Int. Rev. in Phys. Chem. 28, P. Steinhardt, D. Nelson, and M. Ronchetti, Phys. Rev.
577 (2009). B 28, 784 (1983).
8 35
A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, R. Kondor, “A novel set of rotationally and transla-
Phys. Rev. Lett. 104, 136403 (2010). tionally invariant features for images based on the non-
9
D. Tromans, Hydrometallurgy 48, 327 (1998). commutative bispectrum,” (2007), arxiv:cs/0701127.
10 36
J. Ischtwan and M. A. Collins, J. Chem. Phys. 100, 8080 J. Behler, J. Chem. Phys. 134, 074106 (2011).
37
(1994). C. E. R. Williams and C. K. I, Gaussian Processes for
11
M. A. Collins, Theor. Chem. Acc. 108, 313 (2002). Machine Learning (MIT Press, 2007).
12 38
T.-S. Ho, T. Hollebeek, H. Rabitz, L. B. Harding, and D. Mackay, Information Theory, Inference, and Learning
G. C. Schatz, J. Chem. Phys. 105, 10472 (1996). Algorithms (Cambridge University Press, 2003).
13 39
G. G. Maisuradze, D. L. Thompson, A. F. Wagner, and R. M. Neal, Bayesian Learning for Neural Networks, Ph.D.
M. Minkoff, J. Chem. Phys. 119, 10002 (2003). thesis, Department of Computer Science, University of
14
Y. Guo, A. Kawano, D. L. Thompson, A. F. Wagner, and Toronto (1995).
40
M. Minkoff, J. Chem. Phys. 121, 5091 (2004). H. Weyl, The Classical Groups, Their Invariants and Rep-
15
X. Huang, B. J. Braams, S. Carter, and J. M. Bowman, resentations (Princeton University Press, 1946).
41
J. Am. Chem. Soc. 126, 5042 (2004). S. J. Swamidass, J. Chen, J. Bruand, P. Phung,
16
X. Zhang, B. J. Braams, and J. M. Bowman, J. Chem. L. Ralaivola, and P. Baldi, Bioinformatics 21, i359 (2005).
42
Phys. 124, 021104 (2006). M. Valle and A. Oganov, Acta Cryst. A66, 507 (2010).
17 43
T. B. Blank, S. D. Brown, A. W. Calhoun, and D. J. D. A. Varshalovich, A. N. Moskalev, and V. K. Kherson-
Doren, J. Chem. Phys. 103, 4129 (1995). skii, Quantum theory of angular momentum (World Scien-
18
H. Gassner, M. Probst, A. Lauenstein, and K. Hermans- tific, 1987).
44
son, J. Phys. Chem. A 102, 4596 (1998). J. P. K. Doye, M. A. Miller, and D. J. Wales, J. Chem.
19
S. Lorenz, A. Groß, and M. Scheffler, Chem. Phys. Lett. Phys. 110, 6896 (1999).
45
395, 210 (2004). L. B. Pártay, A. P. Bartók, and G. Csányi, J. Phys. Chem.
20
A. Brown, B. Braams, K. Christoffel, Z. Jin, and J. Bow- B 114, 10502 (2010).
46
man, J. Chem. Phys. 119, 8790 (2003). C. Chakravarty and R. M. Lynden-Bell, J. Chem. Phys.
21
X. Huang, B. J. Braams, and J. M. Bowman, J. Chem. 113, 9239 (2000).
47
Phys. 122, 44308 (2005). R. D. Mountain and A. C. Brown, J. Chem. Phys. 80, 2730
22
S. Manzhos and T. Carrington, J. Chem. Phys. 125, (1984).
48
194105 (2006). J. Duijneveldt and D. Frenkel, J. Chem. Phys. 96, 4655
23
S. Manzhos, X. Wang, R. Dawes, and T. Carrington, J. (1992).
Phys. Chem. A 110, 5295 (2006). 49
E. R. Hernández and J. Íñiguez, Phys. Rev. Lett. 98,
24
C. Handley and P. Popelier, Journal of Chemical Theory 055501 (2007).
and Computation 5, 1474 (2009). 50
A. van Blaaderen and P. Wiltzius, Science 270, 1177
25
H. Partridge and D. W. Schwenke, J. Chem. Phys. 106 (1995).
(1997). 51
R. Biswas and D. Hamann, Phys. Rev. B 36, 6434 (1987).
26
R. H. Tipping and A. Forbes, J. Mol. Spectrosc. 39, 65
(1971).
19
52 56
R. Kakarala, Triple correlation on groups, Ph.D. thesis, M. A. Caprio, K. D. Sviratcheva, and A. E. McCoy, J.
Department of Mathematics, UC Irvine (1992). Math. Phys. 51, 093518 (2010).
53 57
H. Maschke, Math. Ann. 50, 492 (1898). N. Artrith and J. Behler, Phys. Rev. B 85, 045439 (2012).
54 58
C. D. Taylor, Phys. Rev. B 80, 024104 (2009). K. Kaufmann and W. Baumeister, Journal of Physics B:
55
A. V. Meremianin, J. Phys. A: Math. Gen. 39, 3099 (2006). Atomic, Molecular and Optical Physics 22, 1 (1989).
59
D. Porezag, T. Frauenheim, T. Köhler, G. Seifert, and
R. Kaschner, Phys. Rev. B 51, 12947 (1995).
60
J. Li, Modell. Simul. Mater. Sci. Eng. 11, 173 (2003).