
Numer. Math. 18, 289-297 (1972)
© by Springer-Verlag 1972

Derivative Free Analogues of the Levenberg-Marquardt
and Gauss Algorithms for Nonlinear Least Squares Approximation

KENNETH M. BROWN*
IBM Philadelphia Scientific Center and University of Minnesota

J. E. DENNIS, JR.**
Cornell University

Received August 7, 1970

Abstract. In this paper we give two derivative-free computational algorithms for nonlinear least squares approximation. The algorithms are finite difference analogues of the Levenberg-Marquardt and Gauss methods. Local convergence theorems for the algorithms are proven. In the special case when the residuals are zero at the minimum, we show that certain computationally simple choices of the parameters lead to quadratic convergence. Numerical examples are included.

1. Introduction
Among the techniques used for unconstrained optimization of a nonlinear
objective function in the least squares sense, two of the most popular have been
Gauss' method (also called the Gauss-Newton method) [1, 14] and the Levenberg-
Marquardt algorithm [1, 11, 12, 14]. In this paper we give derivative-free (finite
difference) analogues of these methods and prove local convergence for the algo-
rithms. We show how to select parameters in such a way that quadratic conver-
gence is guaranteed for any function which is zero at the minimum. Results of
computer experiments show how the methods compare to other currently used
techniques.

2. Description of the Methods


Let f_1, ..., f_M be real valued functions of N real unknowns. We consider the problem of finding an x* ∈ E^N which minimizes

\[ \varphi(x) = \|F(x)\|_2^2 = \sum_{i=1}^{M} |f_i(x)|^2. \]

(F(x) is often the residual vector obtained when approximating a data set by a nonlinear form.) The minimum, x*, is contained among the zeros of (1/2)∇φ(x) = J(x)^T F(x), where ∇φ(x) denotes the gradient of φ and J(x) is the Jacobian matrix of F.
* On leave 1970-71 at Yale University
** The work of this author was supported in part by the National Science Founda-
tion under Grant GJ-844.

Let x_0 be an initial approximation to a minimum point of φ. Generate a sequence of approximations to this minimum point by

\[ x_{n+1} = x_n - \bigl[\mu_n I + J(x_n)^T J(x_n)\bigr]^{-1} J(x_n)^T F(x_n), \]

where {μ_n} is a sequence of nonnegative real constants. We shall refer to this as the Levenberg-Marquardt or L.M. iteration [1, 11, 12, 14].
The special case in which μ_n ≡ 0 for all n is called Gauss', or the Gauss-Newton, method [1, 14].
The L.M. method requires the evaluation of the Jacobian of F at each iteration point. Powell [17] replaces the L.M. iteration with a technique in which Broyden's single rank update [5] is used to approximate J(x_n). We will use the more obvious approach of approximating J(x_n) by the corresponding matrix of difference quotients.
Let h be a real number and let ΔF(x, h) denote the matrix whose i-th, j-th element is given by

\[ f_i(x + h u_j) - f_i(x), \]

where u_j is the j-th unit vector. We consider the following finite difference analogue of the Levenberg-Marquardt algorithm (f.d.L.M.):

\[
\begin{aligned}
x_{n+1} &= x_n - \bigl[\mu_n I + h_n^{-2} \Delta F(x_n, h_n)^T \Delta F(x_n, h_n)\bigr]^{-1} h_n^{-1} \Delta F(x_n, h_n)^T F(x_n) \\
        &= x_n - h_n \bigl[h_n^2 \mu_n I + \Delta F(x_n, h_n)^T \Delta F(x_n, h_n)\bigr]^{-1} \Delta F(x_n, h_n)^T F(x_n).
\end{aligned}
\]

The finite difference analogue of Gauss' method (f.d.G.) results when μ_n = 0 for each n.
In the sequel we show that if μ_n and |h_n| are O(||F(x_n)||_2), then f.d.L.M. converges quadratically whenever ||F(x*)||_2 = 0; moreover, we show that whenever |h_n| is O(||F(x_n)||_2) and ||F(x*)||_2 = 0, f.d.G. is quadratically convergent. We assume that h^{-1} ΔF(x, h) → J(x) as h → 0, so 0^{-1} ΔF(x, 0) ≡ J(x) is a notational convenience.
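To make the f.d.L.M. step concrete, the following is a minimal sketch in Python/NumPy. It is purely illustrative (the paper's own computations were done in APL\360), and the names forward_difference_jacobian and fdlm_step are ours: the code builds A = h^{-1} ΔF(x, h) column by column from forward differences and then solves the damped normal equations; taking mu = 0 gives the f.d.G. step.

```python
import numpy as np

def forward_difference_jacobian(F, x, h):
    """Approximate J(x) by h^{-1} * DeltaF(x, h), whose (i, j) entry is
    [f_i(x + h*u_j) - f_i(x)] / h, u_j being the j-th unit vector."""
    x = np.asarray(x, dtype=float)
    Fx = np.asarray(F(x), dtype=float)
    A = np.empty((Fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        A[:, j] = (np.asarray(F(x + step), dtype=float) - Fx) / h
    return A, Fx

def fdlm_step(F, x, mu, h):
    """One f.d.L.M. step: x_next = x - [mu*I + A^T A]^{-1} A^T F(x),
    where A = h^{-1} DeltaF(x, h).  Setting mu = 0 gives the f.d.G. step."""
    x = np.asarray(x, dtype=float)
    A, Fx = forward_difference_jacobian(F, x, h)
    lhs = mu * np.eye(x.size) + A.T @ A
    return x - np.linalg.solve(lhs, A.T @ Fx)
```

A driver loop would simply call fdlm_step repeatedly, shrinking μ_n and |h_n| with ||F(x_n)||_2 in the manner required by the convergence results of the next section.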

3. Convergence Results
It is well known (see [12] and [14]) that the nonnegative sequence {μ_n} can be chosen so that φ(x_n) forms a monotone decreasing sequence. If we make some standard assumption, for example, that the level sets of φ are bounded, and we are careful in our choice of {μ_n}, then some subsequence of {x_n} is convergent to an extreme point, say x*, of φ. If the method is locally convergent, as soon as some term of the convergent subsequence gets sufficiently close to x*, the entire sequence will converge to it.
Lemma 1. Let D_0 be an open convex set and let K > 0. If, for all x, y ∈ D_0,

\[ \|J(x) - J(y)\|_2 \le K \|x - y\|_2, \]

then the following inequalities hold for every x, y ∈ D_0:
i) ||J(x)^T - J(y)^T||_2 ≤ K ||x - y||_2;
ii) there exist nonnegative constants C and C' such that

\[ \|J(x) - J(y)\|_1 \le C K \|x - y\|_1 \quad\text{and}\quad \|A\|_2 \le C' \|A\|_1 \]

for any real rectangular matrix A;
iii) ||F(x) - F(y) - J(y)(x - y)||_1 ≤ (CK/2) ||x - y||_1^2.

Proof. A proof of (i) follows from the fact that ||A||_2 = ||A^T||_2 for any rectangular matrix A. This in turn follows from page 54 of [22].
Statement (ii) follows from a straightforward application of the equivalence of the l_1 and l_2 vector norms.
Statement (iii) can be found in [7].
The following result is inspired by and closely related to the work in [3].
Theorem 1. Let J(x*)^T F(x*) = 0 for some x* ∈ D_0 and let

\[ \|J(x) - J(y)\|_2 \le K \|x - y\|_2 \quad\text{for } x, y \in D_0. \]

If K||F(x*)||_2 is a strict lower bound for the spectrum of J(x*)^T J(x*) and {μ_n} is any bounded nonnegative sequence of real numbers, then there is an η > 0 such that if every |h_n| ≤ η and h_n → 0, the f.d.L.M. method converges locally to x*.
Proof. For each x ∈ D_0, let λ(x, h) denote the least eigenvalue of

\[ h^{-2} \Delta F(x, h)^T \Delta F(x, h). \]

For |h| sufficiently small and x in some smaller neighborhood (than D_0) of x*, λ is a nonnegative jointly continuous function of x and h. Since by hypothesis

\[ \lambda(x^*, 0) > K \|F(x^*)\|_2 > 0, \]

we can find η' > 0 and D_1, an open convex neighborhood of x*, such that D_1 ⊂ D_0 and, for x ∈ D_1 and |h| ≤ η',

\[ \lambda(x, h) > \lambda' = K \|F(x^*)\|_2 + \bigl(\lambda(x^*, 0) - K \|F(x^*)\|_2\bigr)/2. \]

Let B denote the uniform bound on ||h^{-1} ΔF(x, h)^T||_2 for x ∈ D_1, h ∈ [-η', η']. (B exists by continuity.) Now set e_n = ||x_n - x*||_2 and

\[ L_n = \mu_n I + h_n^{-2} \Delta F(x_n, h_n)^T \Delta F(x_n, h_n), \]

and notice that the smallest eigenvalue of L_n is μ_n + λ(x_n, h_n) ≥ μ_n + λ' > 0. Hence L_n^{-1} exists and its l_2 norm is bounded by (μ_n + λ')^{-1}; moreover x_{n+1} (obtained from the f.d.L.M. iteration) exists and, for some θ ∈ (0, 1), we can write:

\[
\begin{aligned}
e_{n+1} &\le (\mu_n + \lambda')^{-1} \Bigl\{ \bigl\| L_n - h_n^{-1} \Delta F(x_n, h_n)^T J\bigl(x_n + \theta(x^* - x_n)\bigr) \bigr\|_2 \cdot \|x_n - x^*\|_2 \\
&\qquad\quad + \bigl\| J(x^*) - h_n^{-1} \Delta F(x_n, h_n) \bigr\|_2 \cdot \|F(x^*)\|_2 \Bigr\} \\
&\le (\mu_n + \lambda')^{-1} \Bigl\{ \bigl[ \mu_n + B \bigl\| h_n^{-1} \Delta F(x_n, h_n) - J(x_n) \bigr\|_2 + B \bigl\| J(x_n) - J\bigl(x_n + \theta(x^* - x_n)\bigr) \bigr\|_2 \bigr] e_n \\
&\qquad\quad + \bigl[ \bigl\| J(x^*) - J(x_n) \bigr\|_2 + \bigl\| J(x_n) - h_n^{-1} \Delta F(x_n, h_n) \bigr\|_2 \bigr] \|F(x^*)\|_2 \Bigr\}.
\end{aligned}
\]
It is shown in [7] that

\[ \bigl\| h_n^{-1} \Delta F(x_n, h_n) - J(x_n) \bigr\|_1 \le C K |h_n| / 2. \]

Hence by Lemma 1,

\[ e_{n+1} \le (\mu_n + \lambda')^{-1} \bigl( \mu_n + B K e_n + B C' C K |h_n|/2 + K \|F(x^*)\|_2 \bigr) e_n + (\mu_n + \lambda')^{-1} C' C K \|F(x^*)\|_2 \, |h_n| / 2. \qquad (1) \]
Now choose r > 0 and η'' > 0 so small that N(x*, r) ⊂ D_1, η'' < η', and

\[ \lambda(x^*, 0) - K \|F(x^*)\|_2 > 2 B K r + B C' C K \eta''. \]

This insures that δ < 1, where

\[ \delta = \sup_n \, (\mu_n + \lambda')^{-1} \bigl( \mu_n + B K r + B C' C K \eta''/2 + K \|F(x^*)\|_2 \bigr). \]

Finally select η ≤ η'' such that

\[ \eta \le \frac{2 (1 - \delta) \lambda' r}{C' C K \|F(x^*)\|_2} \quad\text{for } \|F(x^*)\|_2 \ne 0, \]

or

\[ \eta \le \frac{2 (\lambda' - B K r)}{B C' C K} \quad\text{if } \|F(x^*)\|_2 = 0. \]

Now let us assume that for n ≥ 0 we have e_n ≤ r and |h_n| ≤ η. From (1) we obtain

\[ e_{n+1} \le \delta e_n + (1 - \delta) r < r. \]

This shows that the f.d.L.M. iteration is locally well-defined. The convergence of {x_n} to x*, as long as h_n → 0, follows directly from the discussion on page 41 of Traub [21].
Remark 1. For computational convenience, if some estimate of F(x*) is known, the μ_n and h_n may be chosen to be O(||F(x_n) - F(x*)||_2). Local convergence can be proven for this choice of the parameters; moreover, as we shall see later, this choice guarantees quadratic convergence whenever F(x*) = 0.
Corollary 1. Under the hypotheses of the previous theorem, the L.M. iteration converges locally for any bounded nonnegative sequence {μ_n} and the convergence is of order at least one.
Proof. The L.M. iteration is the special case of the f.d.L.M. iteration with h_n = 0 for every n. Hence η can be taken to be zero and, from the proof of the previous theorem, e_{n+1} ≤ δ e_n, so the convergence is at least linear.
If F(x*) = 0, the situation simplifies considerably.
Theorem 2. Let F satisfy the differentiability assumptions of Theorem 1 in a neighborhood D_0 of x*, a zero of F. If {μ_n} is a bounded nonnegative sequence, there exists an η > 0 such that if |h_n| ≤ η, f.d.L.M. converges locally. If, in addition, |h_n| and μ_n are O(||F(x_n)||_2), the method is quadratically convergent.
Proof. The proof is the same as the proof of Theorem 1 until equation (1). Eq. (1) now becomes

\[ e_{n+1} \le (\mu_n + \lambda')^{-1} \bigl( \mu_n + B K e_n + B C' C K |h_n|/2 \bigr) e_n. \qquad (1') \]

The local convergence is trivial. To see that the convergence is quadratic, note that since there is a uniform upper bound on ||J(x)||_2 for ||x - x*||_2 ≤ r,

\[ \|F(x_n)\|_2 = \|F(x_n) - F(x^*)\|_2 \le \|J(\bar{x})\|_2 \, \|x_n - x^*\|_2, \]

where x̄ ∈ [x_n, x*] ⊂ N(x*, r). Hence, using the above and the order equivalence of norms, |h_n| and μ_n are O(e_n), and so from (1'), e_{n+1} = O(e_n^2).

Corollary 2. Under the hypotheses of Theorem 2, the L.M. iteration converges quadratically.

Remark 2. Theorems 1 and 2 and Corollaries 1 and 2 hold for the f.d.G. and Gauss methods, respectively, by merely setting μ_n ≡ 0 in each statement and proof.

4. Numerical Results
In some twenty numerical experiments run, the f.d.L.M. algorithm behaved almost identically to its analytic counterpart (the L.M. method). When convergence occurred, both algorithms converged at the same rate; when one of the algorithms diverged (or failed to converge in the prescribed number of iterations, 100), its analogue usually did likewise. The same behavior was observed for the f.d.G. algorithm and Gauss' method. These results answer in the affirmative sense one of the questions posed by Bard [1]. We give a portion of the computational investigations below and show a comparison with other methods based on the Box survey paper [4]. All computations were done in APL\360 on an IBM 360/50. (The APL\360 system performs all calculations in double precision.)
Although Theorem 2 dictates that h_n and μ_n be chosen as O(||F(x_n)||_2), these choices must be tempered in the actual implementation of the algorithm to prevent an absurd choice of h_n relative to x_n; e.g., suppose ||x_0||_2 = 0.001 but ||F(x_0)||_2 = 1000. In the computer program the algorithm was implemented as follows:
\[ x_{n+1} = x_n - \bigl[ \mu_n I + J(x_n, h_n)^T J(x_n, h_n) \bigr]^{-1} J(x_n, h_n)^T F(x_n), \]

where the i-th, j-th element of J(x_n, h_n) is given by

\[ \frac{f_i(x_n + h_n^j u_j) - f_i(x_n)}{h_n^j}, \]

where again u_j denotes the j-th unit column vector and where μ_n and h_n were chosen according to the rules:

\[ \mu_n = c \cdot \|F(x_n)\|_2, \qquad
c = \begin{cases}
10 & \text{whenever } 10 \le \|F(x_n)\|_2, \\
1 & \text{whenever } 1 \le \|F(x_n)\|_2 < 10, \\
0.04 & \text{whenever } \|F(x_n)\|_2 < 1;
\end{cases} \]

where h_n and x_n are vectors of length N and the superscript j denotes the j-th component of these vectors and where, moreover,

\[ h_n^j = \begin{cases}
10^{-5} & \text{if } |x_n^j| < 10^{-5}, \\
0.001 \times x_n^j & \text{if } 10^{-5} \le |x_n^j|;
\end{cases} \]

that is, the components x_n^j of x_n are incremented by (possibly) different amounts h_n^j, as opposed to using the uniform increment h_n of Theorem 2. Asymptotically, however, the definition of the h_n^j assures us that the conditions of the theorem will be met as the zero x* is approached.
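These parameter rules translate into only a few lines of code. The sketch below (Python, continuing the illustrative names introduced in Section 2) is ours, and the numeric constants are transcribed from the rules as printed above, so it should be read as a sketch under those assumptions rather than as a definitive specification of the original APL program.

```python
import numpy as np

def choose_mu(F_norm):
    """mu_n = c * ||F(x_n)||_2, with c taken from the piecewise rule above
    (the constant for ||F(x_n)||_2 < 1 is transcribed as printed)."""
    if F_norm >= 10.0:
        c = 10.0
    elif F_norm >= 1.0:
        c = 1.0
    else:
        c = 0.04
    return c * F_norm

def choose_increments(x, floor=1e-5, rel=1e-3):
    """Component-wise increments h_n^j: a fixed small step for tiny components,
    otherwise the relative step 0.001 * x_n^j (thresholds as transcribed above)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < floor, floor, rel * x)
```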
Remark 3. The selection procedure for c (and hence μ_n) is motivated by the fact that the method of descent [14] has global convergence properties not held by Gauss' method. When one is far away from the solution, μ_n is chosen to be large in order to weight the descent part of the correction. As the iterates proceed toward the solution, μ_n is decreased to weight the Gauss part of the correction. When we are far from the solution we are interested in stability; when we are close, stability is guaranteed by the local convergence results and we strive for rapidity of convergence.
Example.

\[ \varphi(x_1, x_2, x_3) = \sum_p \bigl[ (e^{-x_1 p} - e^{-x_2 p}) - x_3 (e^{-p} - e^{-10 p}) \bigr]^2 \qquad (2) \]

where the summation is over the values p = 0.1 (0.1) 1.0. This problem has a zero residual at (1, 10, 1), (10, 1, -1), and whenever x_1 = x_2 with x_3 = 0.
Case A. The minimum of φ(x_1, x_2, 1) is to be found; i.e., x_3 is held fixed at x_3 = 1.
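For reference, a minimal sketch of test problem (2) as a residual vector in Python; the function names box_residuals, phi and box_residuals_case_a are ours. One can check directly that the residuals vanish at (1, 10, 1) and (10, 1, -1).

```python
import numpy as np

def box_residuals(x):
    """Residual vector for problem (2):
    f_p(x) = (exp(-x1*p) - exp(-x2*p)) - x3*(exp(-p) - exp(-10p)),
    for p = 0.1, 0.2, ..., 1.0, so that phi(x) = ||F(x)||_2^2."""
    x1, x2, x3 = np.asarray(x, dtype=float)
    p = np.arange(1, 11) / 10.0
    return (np.exp(-x1 * p) - np.exp(-x2 * p)) - x3 * (np.exp(-p) - np.exp(-10.0 * p))

def phi(x):
    """Sum of squared residuals."""
    r = box_residuals(x)
    return float(r @ r)

def box_residuals_case_a(x12):
    """Case A: x3 is held fixed at 1 and only (x1, x2) vary."""
    return box_residuals([x12[0], x12[1], 1.0])
```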

Table A. Number of function evaluations required to reduce φ to less than 10^-5 (Case A: 2 dimensions)

Method                                   Starting point x_0 and φ(x_0)
                                         (0, 0)   (0, 20)   (5, 0)   (5, 20)   (2.5, 10)
                                          3.064     2.087   19.588     1.808       0.808

Swann [20]                                   78        57      231        53           6
Rosenbrock [18]                              96        68      103       103         109
Nelder and Mead [13],                        41        43       39        39          40
  Spendley, Hext and Himsworth [19]
Powell (1964) [15]                           64        51       84        77          23
Fletcher and Reeves [9]                      93       144      102        84          21
Davidon [6],                                 51        48       45        63          27
  Fletcher and Powell [8]
Powell (1965) [16]                           38        22       46        29          12
Barnes [2]                                   27        15        F        36          24
Greenstadt I [10, p. 14]                     73        47        F        49          12
Greenstadt II [10, p. 16]                    46        53        F        54          17
Gauss                                         F         F        F         F          16
F.D.G.                                        F         F        F         F          16
L.M.                                         22        25       25        31          16
F.D.L.M.                                     22        25       25        31          16

Case B. The minimum of φ(x_1, x_2, x_3) as given in (2) is to be found.

Table B. Number of function evaluations required to reduce φ to less than 10^-5 (Case B: 3 dimensions)

Method                                   Starting point x_0 and φ(x_0)

Swann [20]                               448     F     F   313     F     F    25     F     F
Rosenbrock [18]                          347   281   200   198   292   350   273   460   246
Nelder and Mead [13],                    119   128   112    73   110   307    79   164   315
  Spendley, Hext and Himsworth [19]
Powell (1964) [15]                        78     F     F    56     F     F    19     F     F
Fletcher and Reeves [9]                  564   112   104    56    92    92   344   608   188
Davidon [6],                              92    68   144    24   148   140    96   148   140
  Fletcher and Powell [8]
Powell (1965) [16]                        34  N.R.  N.R.    29    28    28    43    46    33
Barnes [2]                                42  N.R.  N.R.    15    48  N.R.    37    51    59
Greenstadt I [10, p. 17]                  58    31   46*    26    64    81    8*    84    92
Greenstadt II [10, p. 18]                 54    45   32*    17   101    90    4*   103   106
Gauss                                     21 21(a)     F    17    17    17    21    21    21
F.D.G.                                    21 21(a) 13(b)    17    17    17    21    21    21
L.M.                                      41    33 41(c)    17    41    93    41    61   109
F.D.L.M.                                  41    33 41(c)    17    41    93    41    61   109

Notes concerning the tables.

1. The starting points used are given in the tables.
2. For the Gauss, f.d.G., L.M. and f.d.L.M. methods, the number of evaluations was arrived at by counting "one" each time F was evaluated and "N" each time J(x) or J(x, h) was evaluated.
3. Unless noted, convergence was to the extremum (1, 10) in Table A and to (1, 10, 1) in Table B. The exceptions are identified as:
* convergence to another solution,
(a) convergence to (10, 1, -1),
(b) convergence to (0.338..., 0.338..., 2.9E-13),
(c) convergence to (1.43..., 1.43..., 7.14E-13).
4. The notations "F" and "N.R." mean failed and not reported, respectively.

Remark 4. The convergence obtained by the Gauss and f.d.G. methods to the extremum (10, 1, -1) is interesting because Box states [4, p. 69] that this optimum has never been found with any combination of the eight methods and all starting guesses which he tried.

Remark 5. In order to see just how closely the f.d.L.M. algorithm approximates the L.M. method when one uses the criteria for selecting h_n and μ_n given above, we list below the errors ||x_n - x*||_2 produced by each method for the starting guess (0, 10, 20).

F.D.L.M.               L.M.
1.858102805E01         1.858102806E01
1.817164122E01         1.817164391E01
1.778061116E01         1.778062546E01
1.742186983E01         1.742191531E01
1.710192030E01         1.710199933E01
1.680234391E01         1.680242170E01
1.650763651E01         1.650769860E01
1.621436158E01         1.621440580E01
1.592190436E01         1.592193053E01
1.563014585E01         1.563015403E01
1.317099176E01         1.317089027E01
1.082759847E01         1.082738312E01
8.623265064E00         8.622953369E00
6.588826077E00         6.588442804E00
4.763925553E00         4.763503998E00
3.197509939E00         3.197093467E00
1.943350326E00         1.942990431E00
1.156717605E00         1.156687275E00
7.703632711E-01        7.710080718E-01
7.070932946E-01        7.077186346E-01
3.161890351E-01        3.167722772E-01
2.755539832E-02        2.779075637E-02
2.031211656E-04        2.103924422E-04
1.265326821E-08        1.430255381E-08

N o t e how n i c e l y t h e a s y m p t o t i c q u a d r a t i c convergence p r e d i c t e d b y T h e o r e m 2
is borne o u t b y this e x a m p l e .
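One way to see the quadratic rate directly in these numbers is to form the ratios e_{n+1}/e_n^2 from the last few tabulated f.d.L.M. errors: they stay roughly constant (about 0.27 to 0.31) instead of growing or shrinking, which is the signature of quadratic convergence. A small illustrative check in Python:

```python
# Last few f.d.L.M. errors ||x_n - x*||_2, copied from the table above.
errors = [3.161890351e-01, 2.755539832e-02, 2.031211656e-04, 1.265326821e-08]

# For quadratic convergence e_{n+1} ~ const * e_n^2, so these ratios stay bounded.
for e_n, e_next in zip(errors, errors[1:]):
    print(round(e_next / e_n**2, 3))   # prints approximately 0.276, 0.268, 0.307
```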

Conclusions.
F.d.G. may be used instead of Gauss' method and f.d.L.M. instead of the L.M. method with little, if any, difference in the qualitative results. The L.M. and f.d.L.M. methods appear more stable than the Gauss and f.d.G. methods; however, when the latter do converge they seem to be much faster than the L.M. and f.d.L.M. algorithms.

References
1. Bard, Y.: Comparison of gradient methods for the solution of nonlinear parameter estimation problems. SIAM J. Numer. Anal. 7, 157-186 (1970).
2. Barnes, J. G. P.: An algorithm for solving nonlinear equations based on the secant method. The Computer Journal 8, 66-72 (1965).
3. Boggs, P. T., Dennis, J. E., Jr.: Function minimization by descent along a difference approximation to the gradient. To appear.
4. Box, M. J.: A comparison of several current optimization methods, and the use of transformations in constrained problems. The Computer Journal 9, 67-77 (1966).
5. Broyden, C. G.: A class of methods for solving nonlinear simultaneous equations. Math. Comp. 19, 577-593 (1965).
6. Davidon, W. C.: Variable metric methods for minimization. A.E.C. Research and Development Report, ANL-5990 (1959) (Rev.).
7. Dennis, J. E., Jr.: On the convergence of Newton-like methods. To appear in: Numerical methods for nonlinear algebraic equations, ed. P. Rabinowitz. London: Gordon and Breach 1970.
8. Fletcher, R., Powell, M. J. D.: A rapidly convergent descent method for minimization. The Computer Journal 6, 163-168 (1963).
9. Fletcher, R., Reeves, C. M.: Function minimization by conjugate gradients. The Computer Journal 7, 149-154 (1964).
10. Greenstadt, J.: Variations of variable-metric methods. Math. Comp. 24, 1-22 (1970).
11. Levenberg, K.: A method for the solution of certain nonlinear problems in least squares. Quart. Appl. Math. 2, 164-168 (1944).
12. Marquardt, D. W.: An algorithm for least squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431-441 (1963).
13. Nelder, J. A., Mead, R.: A simplex method for function minimization. The Computer Journal 7, 308-313 (1965).
14. Ortega, J. M., Rheinboldt, W. C.: Iterative solution of nonlinear equations in several variables, Chapt. 8. New York: Academic Press 1970.
15. Powell, M. J. D.: An efficient method of finding the minimum of a function of several variables without calculating derivatives. The Computer Journal 7, 155-162 (1964).
16. Powell, M. J. D.: A method for minimizing a sum of squares of nonlinear functions without calculating derivatives. The Computer Journal 7, 303-307 (1965).
17. Powell, M. J. D.: A hybrid method for nonlinear equations. U.K.A.E.R.E. Technical Paper No. 364, January 1969.
18. Rosenbrock, H. H.: An automatic method for finding the greatest or least value of a function. The Computer Journal 3, 175-184 (1960).
19. Spendley, W., Hext, G. R., Himsworth, F. R.: Sequential application of simplex designs in optimisation and evolutionary operation. Technometrics 4, 441-461 (1962).
20. Swann, W. H.: Report on the development of a new direct searching method of optimisation. I.C.I. Ltd., Central Instrument Laboratory Research Note 64/3 (1964).
21. Traub, J. F.: Iterative methods for the solution of equations. Englewood Cliffs: Prentice-Hall 1964.
22. Wilkinson, J. H.: The algebraic eigenvalue problem, p. 54. Oxford: Clarendon Press 1965.

Kenneth M. Brown
Associate Prof. of Computer Science
Institute of Technology
University of Minnesota
Minneapolis, Minnesota 55455
USA

J. E. Dennis, Jr.
Associate Prof. of Computer Science
Cornell University
Ithaca, New York 14850
USA
