
An Algorithm for Geometry Optimization Without

Analytical Gradients

Jon Baker*
Research School of Chemistry, Australian National University, Canberra, ACT 2601 Australia

Received 27 November 1985; accepted 11 April 1986

A numerical algorithm for locating both minima and transition states designed for use in the ab initio
program package GAUSSIAN 82 is presented. It is based on the RFO method of Simons and coworkers
and is effectively the numerical version of an analytical algorithm (OPT = EF) previously published in
this journal. The algorithm is designed to make maximum use of external second derivative information
obtained from prior optimizations at lower levels of theory. It can be used with any wave function for
which an energy can be calculated and is about two to three times faster than the default DFP algorithm
(OPT = FP) supplied with GAUSSIAN 82.

INTRODUCTION

Geometry optimization to determine the location of minima and transition states on potential energy surfaces has now become standard practice. This is due in no small part to the availability, in recent years, of efficient computational algorithms to obtain analytical gradients.1-4 Several ab initio program packages now incorporate analytical gradients for a wide range of wavefunctions, e.g., RHF, UHF, MP2, CID, and CISD in GAUSSIAN 82;5 RHF, UHF, and ROHF (restricted open shell SCF) in CADPAC;6 and even CASSCF gradients in GAMESS.7 However, there are still several types of wavefunction, e.g., MP3, MP4, for which analytical gradients are not currently available and, even if they were to become so, are likely to be very expensive to calculate. For these and other types of wavefunction, nonderivative methods for geometry optimization, based on function (energy) evaluations only, must be used.

It is generally accepted that the best nonderivative optimization methods are those based on approximating the gradient by finite differences and then proceeding in essentially the same way as with an analytical gradient optimization, with additional modifications to take into account the fact that the gradient is not "exact". Perhaps the most commonly used algorithm of this type, when searching for minima, is a numerical version of the Davidon-Fletcher-Powell method,8 first presented over 20 years ago. The nonderivative optimization algorithm in GAUSSIAN 82 (OPT = FP) is a modified version9 of this.

The standard approach for investigating the structure and energetics of molecules and ions in the gas phase is to locate the relevant stationary points on the potential energy surface at a modest level of theory, e.g., HF/3-21G or HF/6-31G*, determine their exact characteristics (whether minima, transition states or higher-order saddle points) by vibrational analysis, and then perform single-point calculations at higher theoretical levels, e.g., MP3/6-31G**, on the optimized structures to give "more accurate" estimates of barrier heights and dissociation energies, the assumption being that the geometries of the stationary points found at the low level are more-or-less what they would be at the higher level. In many cases this assumption is perfectly valid, and a large body of evidence exists10 suggesting that, for "standard" molecules in their ground state, geometries obtained at the SCF level are similar to those obtained at correlated levels and, indeed, comparable to geometries deter-

*Present Address: Department of Theoretical Chemistry, University Chemical Laboratory, Lensfield Road, Cambridge CB2 1EW, UK.

Journal of Computational Chemistry, Vol. 8, No. 5, 563-574 (1987)


© 1987 by John Wiley & Sons, Inc. CCC 0192-8651/87/050563-12$04.00

mined experimentally. However, there is always the possibility that reoptimization at the higher level could significantly alter estimated barrier heights, particularly when the barrier is small, and in certain cases inclusion of correlation may change the geometry quite dramatically.

In general, though, geometries should not change too much at higher levels of theory, and if a post-SCF optimization is contemplated in order to improve the energetics it is obviously sensible to start the optimization off from the SCF geometry. In fact, it is usually advisable to utilize as much information as possible from the SCF optimization as input to the post-SCF optimization and, in particular, to use the final SCF hessian as the initial starting hessian for a higher level treatment. This is easy to do in GAUSSIAN 82 since all useful information from a gradient optimization, such as the final geometry, hessian and SCF wave function, is stored on a "checkpoint file" at the end of the job, and this can be read in as input to start off another (better) optimization.

The purpose of this article is to present the numerical version of a previously presented gradient algorithm13 (based on the RFO method of Simons and coworkers14) which can be used to efficiently locate both minima and transition states on potential energy surfaces for those wavefunctions for which analytical gradients are not available (the algorithm can, of course, be used with any wavefunction for which an energy can be calculated). The algorithm has been designed for use in the GAUSSIAN 82 program package and, in line with the comments given above, to make maximum use of second derivative information available from previous lower level optimizations, although it can be used without such information.

Section I discusses some of the problems associated with determining numerical, as opposed to analytical, gradients; Section II gives some details of the algorithm and its implementation within the framework of GAUSSIAN 82; Section III presents numerical results for a variety of systems at various levels of theory with comparisons, where applicable, between the gradient version of the algorithm and the standard DFP algorithm supplied with GAUSSIAN 82; and in Section IV the CH3F+ potential energy surface, previously studied at HF/6-31G* by Bouma, Yates, and Radom,15 is reexamined at MP3/6-31G*.

I. USE OF NUMERICAL GRADIENTS

The Taylor series expansion of a smooth, continuously differentiable function F(X) at the point X + h about X is given by

F(X + h) = F(X) + hF'(X) + (h²/2)F''(X) + ...   (1)

By ignoring the quadratic and higher terms, (1) can be rearranged to give the well known forward-difference approximation for the gradient at X

F'(X) = [F(X + h) - F(X)]/h + O(h)   (2)

The major source of error in an approximation such as (2) is due to truncation of the Taylor series; as indicated, this error is of order h, where h is known as the finite-difference interval. The computed function values used to calculate F'(X) are also subject to error. This gives rise to cancellation (or condition) error when the two values are subtracted. Such error is proportional to 1/h, and consequently truncation and cancellation errors oppose each other, i.e., decreasing one means an increase in the other. When choosing a suitable finite-difference interval one has to balance these two errors; h should be small, to reduce the truncation error, but not so small that the function values will be subject to large cancellation error.

By expanding the function at the point X - h

F(X - h) = F(X) - hF'(X) + (h²/2)F''(X) - ...   (3)

the analogous backward-difference formula can be obtained

F'(X) = [F(X) - F(X - h)]/h + O(h)   (4)

As in the case of the forward-difference approximation, the truncation error for backward-differences is also of order h.

If the expansions given by (1) and (3) are subtracted and divided by 2h, the central-difference approximation to the gradient is obtained

F'(X) = [F(X + h) - F(X - h)]/(2h) + O(h²)   (5)
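The behavior of these formulas is easy to verify numerically. The following short Python sketch (not part of the original paper; the test function and step size are arbitrary choices for illustration) compares the forward-difference approximation (2) with the central-difference approximation (5) on a smooth one-dimensional function:

```python
import math

def forward_diff(f, x, h):
    # Eq. (2): one extra function evaluation, truncation error O(h)
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    # Eq. (5): two extra evaluations per variable, truncation error O(h^2)
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Toy "energy" function F(x) = exp(x); the exact gradient at x = 1 is e
f, x, h = math.exp, 1.0, 1.0e-3
err_forward = abs(forward_diff(f, x, h) - math.e)
err_central = abs(central_diff(f, x, h) - math.e)
```

Halving h roughly halves err_forward but quarters err_central, reflecting the O(h) versus O(h²) truncation errors; pushing h very much smaller lets cancellation error dominate instead, which is the trade-off discussed above.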

The truncation error in the central-difference formula is of order h², since the terms involving F''(X) cancel. Central-differences are thus inherently more accurate than forward-differences, but have the disadvantage that each variable has to be stepped twice in order to estimate its gradient, as opposed to only once with the other two formulae.

From a practical point of view, therefore, gradients should be estimated by a one-step finite-difference approximation, e.g., forward-differences, until this becomes unreliable, and only then should a switch be made to central-differences. Forward-differences usually provide approximate gradients of acceptable accuracy unless ||F'(X)|| is small. Since ||F'(X)|| approaches zero at the stationary point, this means that as the optimization converges, the forward-difference approximation becomes less and less accurate and may be inappropriate during the last few iterations. That forward-differences are in error at the stationary point can be clearly seen from Figure 1: a forward-difference approximation to the gradient at X0 (the bottom of the well) would give a non-zero value and indicate that the minimum should be shifted (incorrectly) towards X0 - h; the central-difference approximation is obviously much more reliable.

Figure 1. Forward-difference (fd) versus central-difference (cd) estimate of the gradient at a stationary point X0 (see the text for more details).

In general, forward-differences cease to be accurate when:
(i) The difference in function values (energy) is small relative to the finite-difference interval.
(ii) The change in a given variable during an iteration is less than the finite-difference interval.
If either of the above holds, it is probably advisable to switch to central-differences.

If a reliable estimate of the second derivative is available, it is possible to improve the accuracy of the simple forward-difference approximation. Truncating (1) after the quadratic term gives a corrected forward-difference formula

F'(X) = [F(X + h) - F(X)]/h - (h/2)F''(X)   (6)

By including second derivative information in this way it is possible to extend the lifetime of the forward-difference approximation and, provided the convergence criteria are not too stringent, to use forward-differences throughout the entire optimization.

II. THE ALGORITHM

As mentioned in the Introduction, the gradient version of the algorithm has already been presented13; the major changes in the numerical version involve consideration as to what finite-difference approximation to use when estimating the gradient and what the associated finite-difference interval, h, should be. Several of the features incorporated into the algorithm have been taken from ref. 16, which contains useful discussions about the problems associated with numerical evaluation of gradients.

Three options for specifying the finite-difference approximation are supported:
(i) Use the corrected forward-difference formula [eq. (6)] throughout the optimization.
(ii) Use central-differences [eq. (5)] throughout.
(iii) Use corrected forward-differences with a test for accuracy and switch to central-differences when this test fails.

There are also three options for specifying the finite-difference interval:
(i) Use a default interval calculated internally by the program [see eq. (7)].
(ii) Read in a finite-difference interval.

(iii) Estimate the "best" finite-difference interval at the first point before starting the optimization.

In general, the above options can be applied independently to each variable; thus one variable could use the corrected forward-difference formula with a default interval while another uses central-differences with the finite-difference interval read in.

The default finite-difference interval is calculated according to ref. 16.

(7)

where the error parameter appearing in (7) is an estimate of the error in the function (energy) values F(X). This is probably the hardest quantity to estimate in the whole algorithm; after some experimentation a value of 10⁻⁷ was taken, which, with p = 10, gives finite-difference intervals of 5 × 10⁻³ to 2 × 10⁻⁴ a.u. for systems with energies between 1 and 1000 hartrees. All arithmetic was in double precision.

By seeking a balance between truncation and cancellation errors it is possible to estimate a "best" initial finite-difference interval. A procedure that attempts to select h as the smallest value that results in an acceptable cancellation error has been included in the algorithm; this is a slightly modified version of "Algorithm FD" given in section 8.6.2.2 of ref. 16. Estimation of the best h involves several additional function evaluations before the optimization starts, however, and in most cases the default interval is sufficient.

The finite-difference interval is allowed to change during the course of the optimization as the variables themselves change. The interval for the jth variable at the kth iteration is given by

(8)

where the superscript 0 refers to the initial value of the quantity. In order to start the optimization properly, an initial estimate of the Hessian matrix must be supplied. Ideally, this should be the final Hessian from a previous optimization at a lower level, but if such a Hessian is unavailable then diagonal hessian matrix elements can be estimated according to

F''(X) = [F(X + h) - 2F(X) + F(X - h)]/h²   (9)

Equation (9) can be obtained by adding (1) and (3) so as to eliminate the term in F'(X); the terms of odd order cancel, so the truncation error is of order h².

If no other options are specified, then by default the initial finite-difference interval is calculated according to (7), an initial (diagonal) Hessian is formed using (9), and gradients are calculated using the corrected forward-difference formula (6), with a switch to central-differences (5) if either of the two criteria given in Section I is satisfied (the energy change is considered to be "small" if it is less than 10% of the square of the finite-difference interval). Since estimating second derivatives involves calculating F(X + h) and F(X - h), the gradient can be calculated using central-differences on the first cycle at no extra cost.

An additional feature of the algorithm is a line search option which can be used when looking for minima. This involves calculating the energy at additional points along the search direction and fitting a quadratic to give an estimate of the minimum, i.e., the greatest energy decrease along the line. The line search is fairly simple: the new step and energy are calculated in the normal manner and then, provided the original step was not scaled, an additional step of length 30% of the original step is taken and the energy found at this new point along the line. If the new energy is higher than the previous energy, a parabola is fitted through the original point, the point after the first step and the new point after the additional step; the minimum of this parabola forms the next estimate of the stationary point (minimum) being sought. If the new energy is lower, then a further step along the line is taken (of length 30% of the original step + total increment) and this process is repeated until the energy rises, with a parabola being fitted through the last three points found. A check is made that the energy at the estimated minimum of the parabola is the lowest energy found along the line; if not, then a new parabola is fitted through this point and two of the others. If the original step was scaled because its length was greater than the maximum allowed steplength

(DMAX), then all subsequent steps along the line are of length DMAX; if the original step resulted in the energy rising, then the steplength is halved repeatedly until a lower energy is found, with, again, a parabola fitted through the last three points. Note that the line search is only attempted if the hessian at the start of the search is positive-definite (since only then is the search direction guaranteed to be a descent direction) and if the RMS gradient is greater than a user-supplied tolerance (default 0.003).

The line search as described above requires at least two additional function evaluations per cycle, and sometimes more (although rarely more than three); consequently it is not really worthwhile if the number of variables to be optimized is small, since the gain due to the decrease in the number of cycles to achieve convergence is offset by the increase in the number of function evaluations per cycle; however, there can be considerable savings with a large number of parameters. The coding in GAUSSIAN 82 is such that the time taken to evaluate the gradient analytically is comparable to the time for a single function evaluation, at least for SCF wavefunctions, so that there is nothing to be gained from a line search, regardless of the number of variables. This is why the analytical version of the algorithm does not incorporate a line search.

As well as the features discussed above, most of the options available to the gradient version of the algorithm are also available in the numerical version, including alternative Hessian updates and mode following for transition states (although, as mentioned in the Introduction, the idea behind the numerical algorithm is to start an optimization from a previously located stationary point at a lower level and not to explore the energy surface in any detail). A full list of options is given in Table I. Again, as with the gradient version, most of these options can be invoked by specifying simple keywords on the route card.

III. APPLICATIONS

In order to test the performance of the algorithm, a series of optimizations were carried out at the SCF level. In this way a direct comparison between the numerical and analytical versions of the algorithm could be made. Initially, a series of minimizations at the 3-21G level were done, starting from STO-3G geometries. Two sets of calculations were performed: one in which either an approximate or an exact STO-3G hessian was used to start off the optimization, the other where the initial second derivatives were obtained numerically; and within each set, calculations were done with and without a line search and with and without finite-difference accuracy checking. Comparisons were also made with the default GAUSSIAN 82 algorithm (OPT = FP). Results are shown in Tables II and III.

The convergence criteria were the standard default values of 0.0003 on the RMS gradient, 0.00045 on the maximum gradient, 0.0012 on the RMS displacement and 0.0018 on the maximum displacement. These criteria are normally sufficient to give convergence on bond lengths to three decimal places and on angles to one decimal place. In all the examples reported in this section, both for minima and transition states, final structures obtained from the analytical and numerical versions of the algorithm agree with each other to this accuracy.

Table II presents the minimization results with the initial hessian read in. Comparison of the analytical with the numerical algorithm without a line search shows that, in every case, the number of cycles to converge was the same, suggesting that the numerical algorithm emulates the analytical very well; this is further borne out by the convergence pattern, which is essentially the same for both algorithms.

As anticipated, there are more disadvantages than advantages with the line search, even when the number of variables is fairly large. Only for NH2CH+ were there any significant savings. Of the smaller systems, only NH3 benefitted from the line search. Judging from these results (admittedly not really enough to form any general conclusions), for systems with less than about five variables it is probably advisable to switch off the line search.

Switching off the finite-difference accuracy checking generally results in savings of only one or two function evaluations, since forward-differences are normally adequate right up to the last cycle. In fact, as mentioned

Table I. Options available for the numerical RFO algorithm.

C OPTIONS COMMON/IOP/
C
C IOP(5)   nature of the required stationary point
C          0  find a TS (default)
C          1  find a minimum
C
C IOP(6)   maximum number of steps allowed
C          0  min(40, nvar + 20) (default)
C             (where nvar = number of variables)
C          N  N steps
C
C IOP(7)   convergence criteria on RMS gradient
C          0  0.0003 (default)
C          N  0.001/N
C          note: the other convergence criteria are
C                maximum gradient = 1.5 * RMS gradient
C                RMS displacement = 4.0 * RMS gradient
C                max displacement = 1.5 * RMS displacement
C
C IOP(8)   maximum stepsize allowed during optimization
C          0  DMAX = 0.3 (default)
C          N  DMAX = 0.01 * N
C
C IOP(10)  input of initial second derivative matrix
C          all values must be in atomic units
C          0  estimate diagonal matrix elements numerically
C          1  read in full FRCNST matrix (lower triangle)
C             format (8F10.6)
C          2  read in selected elements of FRCNST
C             READ I,J,FRCNST(I,J)  format (2I3, F20.0)
C          3  read FRCNST matrix from the checkpoint file
C          5  read cartesian force constants from checkpoint file
C
C IOP(13)  type of Hessian update
C          0  Powell update (TS default)
C          1  BFGS update (MIN default)
C          2  BFGS update with safeguards to ensure retention
C             of positive definiteness
C
C IOP(14)  minimum RMS gradient for which a linear search will
C          be attempted (MIN search only)
C          0  FSWTCH = 0.003 (default)
C          N  FSWTCH = 0.0001*N
C         -1  switch off linear search
C
C IOP(16)  maximum allowable magnitude of the Hessian eigenvalues
C          if this magnitude is exceeded, the eigenvalue is replaced
C          0  EIGMAX = 25.0 (default)
C          N  EIGMAX = 0.1*N
C
C IOP(17)  minimum allowable magnitude of the Hessian eigenvalues
C          similar to IOP(16)
C          0  EIGMIN = 0.0001 (default)
C          N  EIGMIN = 1.0/N
C
C IOP(19)  search selection
C          0  P-RFO or RFO step only (default)
C          1  P-RFO or RFO step for "wrong" Hessian
C             otherwise Newton-Raphson
C          (note: depending on IOP(14) a linear search along the
C           search direction may also be attempted)
C
C IOP(21)  expert switch
C          0  normal mode (default)
C          1  expert mode
C             certain cutoffs to control optimization relaxed
C
C IOP(33)  print option
C          0  on (default)
C          1  off  turns off extra printing
C             (default of "on" by popular request)
C
C IOP(34)  dump option
C          0  off (default)
C          1  on  turns on debug printing
C
C IOP(35)  restart option
C          0  normal optimization (default)
C          1  first point of a restart
C             recover geometry etc. from checkpoint file
C
C IOP(36)  checkpoint option
C          0  checkpointing as normal (default)
C          1  suppress checkpointing
C
C IOP(38)  finite-difference control flag
C          0  forward-difference + accuracy check (default)
C             forward-differences unless considered inaccurate,
C             in which case central differences
C          1  initial finite-difference approximation
C             maintained throughout the optimization
C             (see the IC array)
C
C IOP(39)  stepsize control for finite-differencing
C          0  use internal default or IC array (default)
C          1  calculate "best" interval for all variables
C             before starting optimization proper
C
C MODE FOLLOWING: mode following is turned on via the IC array, which is
C input with the Z-matrix variables in Link 101. Adding 4 to the IC
C entry for a particular variable will cause a transition state search
C to follow the Hessian mode with the largest magnitude component for
C that variable. Adding 10 to the IC entry for the Kth variable will
C follow the Kth Hessian mode. ONLY ONE IC ENTRY SHOULD BE MODIFIED IN
C THIS WAY, I.E., ONLY ONE MODE SHOULD BE FOLLOWED AT A TIME.
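The finite-difference policy of Section II, corrected forward-differences [eq. (6)] with a fallback to central-differences [eq. (5)], can be sketched as follows. This is an illustrative reconstruction in Python, not the GAUSSIAN 82 code; the switch shown uses the "10% of the square of the interval" energy-change criterion quoted in Section II, and the quadratic test function is an arbitrary stand-in for an energy surface.

```python
def corrected_forward_diff(f, x, h, d2):
    # Eq. (6): forward difference with the (h/2)F''(X) term removed,
    # where d2 is an available estimate of the second derivative
    return (f(x + h) - f(x)) / h - 0.5 * h * d2

def central_diff(f, x, h):
    # Eq. (5): fallback when forward differences become unreliable
    return (f(x + h) - f(x - h)) / (2.0 * h)

def gradient(f, x, h, d2):
    # Switch to central differences when the energy change over the
    # step is "small" (less than 10% of the square of the interval)
    if abs(f(x + h) - f(x)) < 0.1 * h * h:
        return central_diff(f, x, h)
    return corrected_forward_diff(f, x, h, d2)

# Near the bottom of the quadratic well F(x) = (x - 1)^2 a plain
# forward difference returns h instead of 0; the corrected formula,
# given the exact second derivative F'' = 2, is essentially exact
f = lambda x: (x - 1.0) ** 2
plain = (f(1.0 + 0.01) - f(1.0)) / 0.01
corrected = corrected_forward_diff(f, 1.0, 0.01, 2.0)
```

This is the situation of Figure 1: at the stationary point the plain forward difference is in error by the full interval h, while the corrected formula removes the quadratic term and recovers a near-zero gradient.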

Table II. Minimization for various systems at 3-21G starting from STO-3G geometry and an approximate or exact Hessian (see the text for more details).a

                                        No. Cycles to Converge
                                        (Function Evaluations in Brackets)
             No. of                 with FD Checking      Forward-Differences    Analytical
System       Variables  Hessian       No LS      LS         No LS      LS        Gradients
H2O          2          approximate   4 (13)    3 (13)     4 (12)    3 (12)      4
NH3          2          exact         5 (19)    3 (15)     5 (15)    3 (14)      5
H2CO         3          exact         3 (13)    3 (17)     3 (12)    3 (15)      3
MgCH3Cl      4          approximate   5 (25)    5 (35)     5 (25)    5 (32)      5
C4H4         4          approximate   5 (29)    5 (37)     5 (25)    4 (26)      5
(BH)4N2 b    5          approximate   5 (35)    4 (37)     5 (30)    3 (24)      5
H2NCH+       7          exact         4 (39)    3 (29)     4 (32)    3 (28)      4
HNCHOH c     9          approximate   4 (47)    4 (51)     4 (40)    4 (43)      4

a All systems are RHF except for H2NCH+ (UHF).
b Cyclic.
c Formimidic acid, HN=C(H)OH.
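The first-cycle economy described in Section II, where the energies needed for the diagonal second derivative (9) also furnish the central-difference gradient (5) for free, can be illustrated with a small sketch (illustrative only; the cosine test function is an arbitrary stand-in for an energy):

```python
import math

def first_cycle(f, x, h):
    # Three energies give both the central-difference gradient (5)
    # and the diagonal second derivative (9) for this variable,
    # so the first cycle gets the gradient at no extra cost
    fp, f0, fm = f(x + h), f(x), f(x - h)
    grad = (fp - fm) / (2.0 * h)
    hess = (fp - 2.0 * f0 + fm) / (h * h)
    return grad, hess

# For F(x) = cos(x): F'(0.5) = -sin(0.5) and F''(0.5) = -cos(0.5)
grad, hess = first_cycle(math.cos, 0.5, 1.0e-3)
```

Note that the second-difference formula (9) divides by h², so cancellation error is amplified much more strongly than for the gradient, which is why h cannot be made arbitrarily small here.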

Table III. Minimization for various systems at 3-21G starting from STO-3G geometry and a diagonal hessian calculated numerically via equation (9) (see the text for more details).a

                                        No. Cycles to Converge
                                        (Function Evaluations in Brackets)
             No. of                 with FD Checking      Forward-Differences       DFP
System       Variables  Hessian       No LS      LS         No LS      LS        Algorithm
H2O          2          numerical     4 (14)    4 (20)     4 (14)    4 (17)      6 (34)
NH3          2          numerical     5 (18)    4 (21)     5 (17)    4 (20)      4 (22)
H2CO         3          numerical     3 (15)    3 (18)     3 (15)    3 (17)      5 (40)
MgCH3Cl      4          numerical     5 (32)    4 (32)     5 (29)    4 (30)      8 (70)
C4H4         4          numerical     3 (21)    3 (23)     3 (21)               11 (107) F
(BH)4N2 b    5          numerical     6 (50)    5 (45)     6 (41)    5 (44)     11 (125+) F
H2NCH+       7          numerical     5 (51)    5 (63)     5 (47)    5 (51)      5 (55)
HNCHOH c     9          numerical     5 (73)    5 (73)     5 (59)    5 (64)      6 (93)

a All systems are RHF except for H2NCH+ (UHF); F indicates failure to converge.
b, c See comments under Table II.
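The parabola fitting at the heart of the line search described in Section II can be sketched as below. This is an illustrative reconstruction, not the actual GAUSSIAN 82 routine: given three (step, energy) pairs along the search direction, it fits a quadratic and returns the step at its minimum.

```python
def parabola_minimum(steps, energies):
    # Fit a quadratic E(s) through three points using Newton's
    # divided differences and return the step minimizing it
    (s0, s1, s2), (e0, e1, e2) = steps, energies
    d1 = (e1 - e0) / (s1 - s0)
    d2 = ((e2 - e1) / (s2 - s1) - d1) / (s2 - s0)
    if d2 <= 0.0:
        raise ValueError("no interior minimum: parabola is not convex")
    # E'(s) = d1 + d2*(2s - s0 - s1) = 0 at the minimum
    return 0.5 * (s0 + s1 - d1 / d2)

# Energies of E(s) = (s - 0.7)^2 sampled at the original point (s = 0),
# the full step (s = 1) and the additional 30% step (s = 1.3)
s_min = parabola_minimum((0.0, 1.0, 1.3), (0.49, 0.09, 0.36))
```

For an exactly quadratic energy profile the fitted minimum (here s = 0.7) is recovered exactly; on a real surface it is only the next estimate, which is why the algorithm checks that the predicted point really is the lowest energy found along the line.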

at the end of Section I, forward-differences can be used throughout the optimization, even when the displacements are small. However, although there were no problems with any of the systems studied here, numerical instability can easily arise if forward-differences are persisted with when they are really inappropriate, and it is dangerous to switch off the accuracy checking, particularly if the convergence criteria are set any tighter than their default values. Assuming central-differences will be used only on the last or possibly the penultimate cycle, accuracy checking should result in around N extra function evaluations or less, where N is the number of variables. It is probably worth accepting the extra energy calculations rather than risking the possibility of numerical instability.

Table III presents the same series of results, this time calculating (diagonal) second derivatives numerically. There is very little difference between the two tables for the smaller systems, the number of cycles to converge being about the same and the total number of function evaluations being only slightly greater with numerical second derivatives than with a full initial Hessian. For the three largest systems, however, there are considerable savings in using an initial hessian. Surprisingly, for C4H4, convergence was faster with numerical second derivatives (this was probably due to the approximate STO-3G hessian being a poor starting Hessian in this case). In general, however, it is certainly worthwhile utilizing external derivative information if it is available, since at least N function evaluations (the extra energy calculations required to estimate the second derivatives on the first cycle) can be saved, and often considerably more.

The last column of Table III gives the results for the Davidon-Fletcher-Powell algorithm. Clearly this algorithm is markedly inferior to the one presented here, taking about twice as many function evaluations (and sometimes more) to reach convergence and, in the case of the two cyclic systems, failing to achieve convergence after 107 and more than 125 function evaluations. Only for NH3 and NH2CH+ are the two algorithms comparable. If comparisons are made with Table II, then the DFP algorithm is worse in every case.

The DFP algorithm,8 as modified for GAUSSIAN 82,9 involves an initial cycle to calculate second derivatives numerically (2N + 1 function evaluations) and then subsequent cycles in which gradients are calculated via a corrected backward-difference formula, similar to (6), with a three step line search (N + 4 function evaluations) or, if backward-differences are considered inaccurate, by central-differences with a line search (2N + 4 function evaluations). Central-differences are used if the RMS displacement is less than RMSLO (0.001) or greater than RMSHI (0.16), in which event second derivatives need to be recalculated. Additionally, the variables are scaled so that the second derivatives are of order unity.

On the face of it, this seems a fairly reasonable procedure, so why does the DFP algorithm perform so poorly? There are a number of reasons for this, as follows:

(a) The DFP algorithm itself is inherently parisons can be made with the DFP algorithm
inferior to the RFO method, as evidenced by for these systems since this algorithm cannot
the fact that more cycles are required to reach locate transition states.
convergence. Again, as with the minimizations, the nu-
(b) Many of the cycles involve at least 2N merical and analytical RFO algorithms follow
function evaluations either because second each other closely, taking identical numbers
derivatives need to be recalculated or because of cycles to reach convergence. The results
the RMS displacement is too small, and starting with a numerical (diagonal) hessian
central-differences are needed for the gra- are much worse than those with an initial
dient. In this latter case, the situation is ex- (STO-3G) hessian read in, the former taking
acerbated by the fact that there are no tests more cycles and about twice as many function
on individual displacements, i.e., central- evaluations to converge than the latter, and,
differences are either not used or used for all in one case-FCN-failing to converge at
the variables. all, although this was due to problems with
(c) A t least three additional function the SCF rather than with the optimization.
evaluations are required each cycle for the Clearly, there is more to be gained by using
line search, which in many cases may not be external second derivative information for
necessary. transition state searches than for minimiza-
(d) The method is poor for variables which tion. This is doubtless due to the fact that, in
are strongly coupled, i.e., have large off- many cases, a diagonal second derivative
diagonal hessian matrix elements. This is matrix simply does not have the correct
why the two cyclic systems fail to converge. eigenvalue structure (one negative eigen-
(e) The finite-difference intervals for value) appropriate to a transition state, since
OPT = FP are fixed at 0.01 A for bond it is often the off-diagonal matrix elements
lengths and 1" for angles. These intervals are that are "responsible" for the negative eigen-
at least an order of magnitude greater than value, all the diagonal second derivatives
those used in the numerical RFO algorithm, being positive. For the systems examined
and consequently are likely to lead to numer- here, only CHsN had an appropriate diagonal
ical instabilities when estimating the gra- Hessian; H,CO had two negative eigenvalues
dient by backward-differences much earlier initially and all the others had none. The fact
in the optimization,particularly if the second that convergence is really fairly rapid indi-
derivatives are not accurate enough. cates that the RFO method is quickly able to
Table IV presents a similar series of results recover from the poor starting Hessian.
for transition states, again at the 3-21G level Finally, Table V presents results for a selec-
starting from STO-3G geometries. Compari- tion of the systems from Tables 11-IV, this
sons are made between the analytical algo- time at MP2/3-21G starting from the final
rithm with an initial hessian, the numerical 3-21G geometries and approximate hessians
algorithm with an initial hessian, and the obtained from the previous optimizations. All
numerical algorithm with diagonal second MOs, including the core, were included in
derivatives calculated internally. No com- the correlation treatment (MP2 = FULL) in

Table IV. Transition state searches for various systems at 3-21G starting from STO-3G geometry and either an
approximate/exact hessian or a diagonal hessian calculated numerically via equation (6) (see the text for more
details).a

                                               No. of Cycles to Converge (Function Evaluations in Brackets)
                     No. of                    With Hessian Read In                     Numerical Hessian
System               Variables   Hessian       Analytical   Numerical + FD Check       Numerical + FD Check
HCN <--> HNC         3           approximate   4            4 (17)                     5 (27)
FCN <--> FNC         3           approximate   8            8 (34)                     6 (27) F
HCCH <--> H2CC       5           approximate   3            3 (19)                     5 (45)
H2CO <--> H2 + CO    5           exact         6            6 (40)                     10 (71)
H2CN <--> H2CNH      6           approximate   3            3 (23)                     5 (49)

a All systems are RHF except for H2CN (UHF triplet); F indicates failure to converge.
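The diagonal-Hessian failures in Table IV reflect the eigenvalue argument made in the text: off-diagonal coupling can supply the single negative eigenvalue required of a transition state even when every diagonal element is positive, so a diagonal guess then has the wrong eigenvalue structure. A small NumPy illustration (the matrix is made up for the demonstration, not taken from the paper):

```python
import numpy as np

# Symmetric model Hessian: all diagonal elements positive, but with
# strong off-diagonal coupling (illustrative numbers only).
H = np.array([[0.5, 0.8],
              [0.8, 0.4]])

full_eigs = np.linalg.eigvalsh(H)   # eigenvalues of the full matrix
diag_eigs = np.diag(H)              # what a purely diagonal guess keeps

# det(H) = 0.5*0.4 - 0.8*0.8 < 0, so exactly one eigenvalue is negative:
# the correct structure for a transition state. The diagonal guess has none.
print("full:", np.round(full_eigs, 3), "  diagonal:", diag_eigs)
```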
572 Baker
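Point (e) above, on finite-difference intervals, is easy to demonstrate on a model surface. The sketch below is illustrative Python, not the paper's FORTRAN, and the quadratic "energy" stands in for a real ab initio surface: the forward-difference gradient error grows linearly with the interval h, so an interval an order of magnitude too large costs an order of magnitude in gradient accuracy, while central differences are accurate to O(h**2) at twice the price.

```python
import numpy as np

def energy(x):
    """Toy quadratic 'potential energy' surface (illustrative only)."""
    return 0.5 * x[0] ** 2 + 0.3 * x[1] ** 2 + 0.2 * x[0] * x[1]

def fd_gradient(f, x, h, central=False):
    """Finite-difference gradient estimate.

    Forward differences cost N extra energy evaluations per point;
    central differences cost 2N but have O(h**2) error instead of O(h).
    """
    g = np.zeros_like(x)
    f0 = f(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        if central:
            g[i] = (f(x + step) - f(x - step)) / (2.0 * h)
        else:
            g[i] = (f(x + step) - f0) / h
    return g

x = np.array([0.5, -0.2])
exact = np.array([x[0] + 0.2 * x[1], 0.6 * x[1] + 0.2 * x[0]])
for h in (1e-2, 1e-3, 1e-4):   # 1e-2 corresponds to the fixed OPT = FP interval
    err = np.abs(fd_gradient(energy, x, h) - exact).max()
    print(f"h = {h:.0e}   max forward-difference error = {err:.1e}")
```

For this quadratic the forward-difference error is exactly h/2 times the diagonal second derivative, so shrinking h from 1e-2 to 1e-3 cuts the error tenfold.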

Table V. Optimizations of various systems at MP2/3-21G starting from 3-21G geometry and approximate hessian
(see the text for more details).a

                                         No. of Cycles to Converge (Function Evaluations in Brackets)
System              No. of Variables   Type   Analytical   Numerical + FD Check   DFP Algorithm
H2O                 2                  MIN    3                                   3 (15)
NH3                 2                  MIN    3                                   10 (58)
HCN <--> HNC        3                  TS     4                                   n/a
H2CO                3                  MIN    4                                   7 (55)
MgCH3Cl             4                  MIN    3                                   4 (38)
H2CO <--> H2 + CO   5                  TS     6                                   n/a
HCCH <--> H2CC      5                  TS     5                                   n/a
H2NCH+              7                  MIN    3                                   4 (60)

a All systems are RHF except for H2NCH+ (UHF).

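The 4(N + 1) and 6(N + 1) estimates quoted in the discussion of Table V can be wrapped up as a quick planning aid. The constants are the paper's empirical averages and assume a reasonable starting geometry and hessian; this helper function is a sketch, not part of the published code:

```python
def estimated_energy_evaluations(n_variables, transition_state=False):
    """Rough number of energy evaluations for the numerical RFO algorithm:
    about 4(N + 1) for a minimum and 6(N + 1) for a transition state
    (empirical rule of thumb; assumes a good initial geometry and hessian)."""
    cycles = 6 if transition_state else 4
    return cycles * (n_variables + 1)

print(estimated_energy_evaluations(5))                         # minimum, N = 5 -> 24
print(estimated_energy_evaluations(5, transition_state=True))  # TS, N = 5 -> 36
```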
order for a direct comparison with the gradient algorithm to be made. The last column in the table gives results using the DFP algorithm where this is possible (i.e., for the minima). Table V should give a better indication as to how the numerical algorithm is likely to perform in practice, since the geometrical changes on going from 3-21G to MP2/3-21G are probably closer to the kind of changes that will occur between, say, 6-31G* and MP3/6-31G*, which is an example of the type of optimization for which the algorithm was intended, than those between STO-3G and 3-21G, which are often quite significant, particularly for transition states.

As with all the previous optimizations, the numerical and analytical versions of the algorithm perform in essentially an identical manner. The minima converge in three to four cycles; the transition states take slightly more. Assuming these systems are fairly typical, then (for the numerical algorithm) a reasonable estimate of the number of function evaluations required to reach convergence, starting from a suitable initial geometry and hessian, would be around 4(N + 1) for minima and, say, 6(N + 1) for transition states, where N is the number of variables. The DFP algorithm, if anything, shows up even worse than before: only H2O is comparable; NH3 takes ten cycles and over six times as many function evaluations to converge, and H2CO takes almost four times as many.

IV. THE CH3F+ ENERGY SURFACE

This is quite an interesting energy surface, showing at least two distinct minima - the fluoromethane ion, CH3F+ (1), and the methylene-fluoronium radical cation, CH2FH+ (2) - two transition states, and various decomposition pathways. As mentioned in the introduction, this surface has previously been characterized at HF/6-31G* by Bouma, Yates, and Radom,15 who also optimized the minima at MP2/6-31G*, showing the importance of correlation for determining the structure of CH3F+, which has a very long C-F bond length at the SCF level that shortens dramatically on reoptimization at MP2. They were not able to optimize the transition states at correlated levels, however, and their final energies for these structures were based on SCF geometries.

In this section all the stationary points on the CH3F+ energy surface as characterized by Bouma, Yates, and Radom15 were reoptimized at MP3/6-31G* (with frozen core). In all cases the optimizations were started from the HF/6-31G* structures - even for CH3F+ (to see how the algorithm handled the very poor initial geometry) - and in most cases an approximate 6-31G* hessian was also available. The exceptions were the dissociative transition state (4), for which an approximate 3-21G hessian was used, and CH3+, for which the hessian was estimated numerically.

The structures of the various species, both at HF/6-31G* and MP3/6-31G*, are shown in Figure 2. Total and relative energies at these levels are given in Table VI. Column 5 of this table gives the "best" relative energies of Bouma, Yates, and Radom15 (taken from Table II of ref. 15) and column 6 the relative energies from column 4 incorporating Bouma's estimated zero-point vibrational energy corrections. Most of the MP3 optimizations converged in six cycles or less, except for the CH3F+ minimum, which took 12 cycles (fairly good, considering the bad starting

[Figure 2. Optimized geometries for CH3F+ minima (1, 2), transition states (3, 4), and dissociation products. Optimized parameters are shown in the order HF/6-31G*, MP3/6-31G*. Lengths in Å, angles in degrees. Legible values include: bridge TS (3), ∠HCH 128.5, 129.6; CH3+, rCH 1.078, 1.091; CH2+·, rCH 1.080, 1.094, ∠HCH 138.6, 139.9; HF, rHF 0.911, 0.932.]

geometry), and, surprisingly, the dissociative TS, which took 11 cycles. In this latter case the fact that only a 3-21G hessian was used to start off the optimization may have been a contributory factor to the unexpectedly slow convergence.

Examination of Figure 2 shows that - apart from CH3F+ (1), for which the changes are major - there are only really minor changes between the HF/6-31G* and MP3/6-31G* structures. The geometry of CH3F+ at the MP3 level is very close to that found by Bouma, Yates, and Radom at MP2.

Energetically, the MP3/6-31G* results are in good agreement with those of ref. 15, as can be seen by comparing the last two columns of Table VI. The various stationary points have the same energy ordering in both cases, and, apart from a slight overall lowering throughout column 6, the absolute values for the energy differences also agree very well. Only for the relative energy of the CH2F+ + H pair is there a significant difference.

Basically, these results support those of Bouma, Yates, and Radom and indicate that, at least for the CH3F+ energy surface, single-point calculations based on SCF structures give a reasonable picture of the energetics,
Table VI. Total and relative energies for minima, transition states and decomposition products on the CH3F+
energy surface: SCF values from ref. 15; MP3 values this work. Values in column 5 are the "best" values from Table II
of ref. 15 (see the text for more details).

                  Total Energies (hartree)       Relative Energies (kJ mol^-1)
System            HF/6-31G*     MP3/6-31G*       HF     MP3    ref. 15   MP3 + ZPE
CH3F+ (1)         -138.60754    -138.89178       36     33     46        41
CH2FH+ (2)        -138.62133    -138.90446       0      0      0         0
CH2F+ + H·        -138.59202    -138.87305       77     82     96        64
bridge TS (3)     -138.55361    -138.85439       178    131    129       123
CH2+ + HF         -138.56910    -138.83356       137    186    162       162
CH3+ + F·         -138.59560    -138.83762       68     175    179       169
dissoc TS (4)     -138.53535    -138.83236       226    189    192       176
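The relative energies in Table VI are reproduced by differencing the total energies and converting from hartree (assuming, consistently with the tabulated numbers, a conversion factor of about 2625.5 kJ mol^-1 per hartree). A short Python check on two entries of the MP3/6-31G* column:

```python
HARTREE_TO_KJ_PER_MOL = 2625.5   # approximate hartree -> kJ/mol conversion

# MP3/6-31G* total energies (hartree) from Table VI
mp3 = {
    "CH2FH+ (2)":    -138.90446,   # reference structure (relative energy 0)
    "CH3F+ (1)":     -138.89178,
    "bridge TS (3)": -138.85439,
}

ref = mp3["CH2FH+ (2)"]
for name, total in mp3.items():
    rel = (total - ref) * HARTREE_TO_KJ_PER_MOL
    print(f"{name:15s} {rel:4.0f} kJ mol^-1")   # 0, 33, 131 as in Table VI
```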

except in those cases where the geometry changes significantly with inclusion of correlation (i.e., for CH3F+ itself).

SUMMARY AND FINAL DISCUSSION

An efficient numerical algorithm for the location of both minima and transition states has been presented. It is designed for use in the GAUSSIAN 82 program package and, utilizing second derivative information from previous (gradient) optimizations at lower levels of theory, is considerably faster than the standard DFP algorithm supplied with GAUSSIAN 82. With a reasonable starting geometry and initial hessian, minima should be located in about 4(N + 1) energy evaluations and transition states in 6(N + 1), where N is the number of variables.

Although designed for use with wavefunctions for which analytical gradients are unavailable, the numerical algorithm does have certain advantages over the analytical for post-SCF wavefunctions, e.g., MP2. The current post-SCF gradient coding in GAUSSIAN 82 makes use of the fact that the energy is invariant to rotations amongst the occupied or amongst the virtual orbitals. Consequently, all the orbitals used in the SCF must be included in the post-SCF wave function, even when those orbitals have negligible effect on structures or relative energies, e.g., core orbitals or the highest virtuals. Such orbitals can easily be left out of a numerical optimization. Another factor is that, unlike SCF gradients, the CPU time required for a post-SCF gradient is often two to three times that for the corresponding energy evaluation; consequently - given the ability of the numerical algorithm to emulate the analytical - you are actually better off performing the optimization numerically when the number of variables is small. A further consideration is file storage; for the numerical algorithm no more disk space beyond that required for a single energy calculation is needed, whereas the additional storage for an analytical gradient calculation is considerable. In terms of CPU time, for a typical MP2 optimization, the numerical algorithm is usually faster than the analytical for three variables or less, particularly for second-row systems, when many more "core orbitals" can be frozen. However, the numerical algorithm is heavily influenced by the time taken for the SCF, and if there are SCF convergence problems this will favor use of the analytical algorithm.

AVAILABILITY OF CODE

The FORTRAN code for the above algorithm is available and can be supplied, along with the modifications needed to run in a GAUSSIAN 82 environment, on request. Both VAX and IBM versions exist. In the near future the source and full instructions for use for both the numerical and analytical13 RFO algorithms will be sent to QCPE for distribution; in the meantime all interested users should contact the author direct.

Thanks to P. M. W. Gill for useful discussions.

References

1. P. Pulay, in Applications of Electronic Structure Theory, H. F. Schaefer, Ed., Plenum, New York, 1977.
2. J. A. Pople, R. Krishnan, H. B. Schlegel, and J. S. Binkley, Int. J. Quantum Chem., S13, 225 (1979).
3. N. C. Handy and H. F. Schaefer, J. Chem. Phys., 81, 5031 (1984).
4. P. Jorgensen and J. Simons, J. Chem. Phys., 79, 334 (1983).
5. J. S. Binkley, M. J. Frisch, D. J. DeFrees, K. Raghavachari, R. A. Whiteside, H. B. Schlegel, E. M. Fluder, and J. A. Pople, GAUSSIAN 82, Carnegie-Mellon University, Pittsburgh, PA 15213, USA.
6. R. D. Amos, Publication No. CCP1/84/4, SERC, Daresbury, Warrington, WA4 4AD, United Kingdom.
7. GAMESS (modified version from SERC, Daresbury, UK); original ref.: M. Dupuis, D. Spangler, and J. J. Wendoloski, Nat. Resour. Comput. Chem. Software Cat., Vol. 1, Prog. QG01, 1980.
8. R. Fletcher and M. J. D. Powell, Comput. J., 6, 163 (1963).
9. J. B. Collins, P. von R. Schleyer, J. S. Binkley, and J. A. Pople, J. Chem. Phys., 64, 5148 (1976).
10. W. J. Hehre, L. Radom, P. v. R. Schleyer, and J. A. Pople, Ab Initio Molecular Orbital Theory, Wiley, New York, in press.
11. K. Raghavachari, R. A. Whiteside, J. A. Pople, and P. von R. Schleyer, J. Am. Chem. Soc., 103, 5649 (1981). (See C2H7+.)
12. M. J. Frisch, A. C. Scheiner, H. F. Schaefer, and J. S. Binkley, J. Chem. Phys., 82, 4194 (1985).
13. J. Baker, J. Comp. Chem., 7, 385 (1986).
14. A. Banerjee, N. Adams, J. Simons, and R. Shepard, J. Phys. Chem., 89, 52 (1985).
15. W. J. Bouma, B. F. Yates, and L. Radom, Chem. Phys. Lett., 92, 620 (1982).
16. P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Academic Press, London, 1981.
