pp. 27-38, 1995
Copyright © 1995 Elsevier Science Ltd (Pergamon). Printed in Great Britain. All rights reserved.
0898-1221/95 $9.50 + 0.00
0898-1221(95)00027-5
The Convergence of
Quasi-Gauss-Newton Methods
for Nonlinear Problems
S. KIM
Department of Mathematics, Ewha Women's University
Dahyun-Dong, Seoul, Korea, 120-750
skim@ams.sunysb.edu

R. P. TEWARSON
Institute for Mathematical Modeling, Department of Applied Mathematics
State University of New York at Stony Brook, Stony Brook, NY 11794-3600, U.S.A.
tewarson@ams.sunysb.edu
1. INTRODUCTION
In this paper, we consider methods for finding a solution, $x^*$ say, to a nonlinear system of algebraic equations
$$f(x) = 0, \qquad (1)$$
where the function $f : \mathbb{R}^n \to \mathbb{R}^n$ is nonlinear in $x \in \mathbb{R}^n$.

The classical method to determine $x^*$ for (1) is Newton's method, which approximates each $f_i$, $i = 1, \dots, n$, by a linear function. Thus,
$$f(x+s) = f(x) + J(x)s + O\big(\|s\|^2\big),$$
and a step $s$ is obtained by solving
$$J(x)s = -f(x), \qquad (2)$$
or, equivalently, by solving the normal equations
$$B^T B s = -B^T f(x) \qquad (3)$$
of the linear least-squares problem $\min_s \frac{1}{2}\|f(x) + Bs\|_2^2$, where $B$ denotes $J(x)$ or an approximation to it.
One well-known approximation for $J(x)$ is obtained by updating the initial Jacobian at each step with Broyden's update. It has been shown in [1] that using the $LDL^T$ factorization of $B^T B$ leads to superior computational results compared with the so-called SQRT method for a given set of test problems. It is also shown that, if the modified Cholesky factorization in [2] is used, the number of operations is reduced from $O(n^3)$ to $O(n^2) + n$. In this paper, a convex combination of Broyden's update and a weighted update is used for the Jacobian approximation $B$ in (3) instead of Broyden's update alone. This leads to a better convergence rate. We now describe Broyden's update and its convex combination with another update.
Jacobian Approximations
Solving the system of nonlinear equations (1) involves the computation of the Jacobian. The computation of the Jacobian is expensive, especially when the functions are difficult to evaluate, so Jacobian approximations have been widely used to save time. One of the most successful approximations is known as Broyden's update [3,4].
Using linearization, we have
$$f(x + s) \approx f(x) + Bs,$$
where $B$ is the current approximation to the Jacobian. Let $x_k$ be an approximation to $x^*$, $B_k$ the Jacobian approximation at the $k$th step, and $x^* \approx x_k + s$. Then the $k$th step is obtained from
$$B_k s = -f(x_k).$$
At the $(k+1)$th step,
$$x^* \approx x_{k+1} = x_k + s,$$
or
$$x_k = x_{k+1} - s,$$
and the new approximation $B_{k+1}$ is required to satisfy the secant equation $B_{k+1}s = f(x_{k+1}) - f(x_k)$. Since $B_{k+1}s$ can be written as $(B_k + \Delta B)s$ and $B_k s = -f(x_k)$, from the last equation we have
$$\Delta B\, s = f(x_{k+1}). \qquad (4)$$
Broyden's update is the rank-one solution
$$\Delta B_1 = \frac{f(x_{k+1})\, s^T}{s^T s}. \qquad (5)$$
Since $-B^T f(x)$ in (3) is the steepest descent direction computed at each iteration, we will utilize this information in approximating the Jacobian to get a better estimate. A solution of (4) is
$$\Delta B_2 = f(x_{k+1})\, \frac{s^T B^T B}{s^T B^T B s} = f(x_{k+1})\, \frac{t^T}{s^T t},$$
where $t = -B^T f$; the second equality follows from $Bs = -f(x_k)$, which gives $s^T B^T B = t^T$.
We now combine the two updates to approximate the update $\Delta B$ to the Jacobian. It was shown in [5] that this leads to a better update. The convex combination of the updates is
$$\Delta B = (1 - \mu)\Delta B_1 + \mu\, \Delta B_2. \qquad (6)$$
The weight is chosen from the relative sizes of the two updates: since
$$\frac{\|\Delta B_1\|_F}{\|\Delta B_2\|_F} = \frac{s^T t}{\|s\|_2\, \|t\|_2},$$
therefore, $\mu = \dfrac{(s^T t)^2}{(s^T s)(t^T t)}$.
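The following sketch (ours, not the authors') collects the pieces above: Broyden's update (5), the weighted update, and their convex combination (6) with the weight $\mu$ as reconstructed; it assumes $s$ was obtained from (3), so that $t = -B^T f(x_k) = B^T B s$.

```python
import numpy as np

def convex_update(B, f_new, s):
    """Convex combination (6) of Broyden's update (5) and the
    weighted update; B is the current Jacobian approximation,
    f_new = f(x_{k+1}), and s = x_{k+1} - x_k solves (3)."""
    t = B.T @ (B @ s)                        # t = -B^T f(x_k), since B^T B s = -B^T f(x_k)
    dB1 = np.outer(f_new, s) / (s @ s)       # Broyden's update (5)
    dB2 = np.outer(f_new, t) / (s @ t)       # weighted update; also solves (4)
    mu = (s @ t) ** 2 / ((s @ s) * (t @ t))  # reconstructed weight
    return (1.0 - mu) * dB1 + mu * dB2       # convex combination (6)
```

Both terms satisfy (4), and since the weights sum to one, so does the combination.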
Next, we describe how equation (3) can be solved efficiently when the update is given by (5).
2. QUASI-GAUSS-NEWTON METHODS
In this section, we describe how the $LDL^T$ factorization in [2] can be utilized for solving (3) with the Jacobian approximation given in the previous section.

The method uses an algorithm in [2], which is for a symmetric matrix $A$ modified by a symmetric matrix of rank one,
$$\bar A = A + \alpha z z^T, \qquad (7)$$
and finds the Cholesky factors of $\bar A = \bar L \bar D \bar L^T$ from the factors of $A = L D L^T$. If $A$ is replaced by $B^T B$ in (7) and $B^T B$ is modified by a rank-one update, then, writing $Lp = z$ and factoring
$$D + \alpha p p^T = \tilde L \tilde D \tilde L^T,$$
the required modified Cholesky factors are of the form
$$\bar B^T \bar B = L \tilde L \tilde D \tilde L^T L^T. \qquad (8)$$
Therefore,
$$\bar L = L \tilde L, \qquad \bar D = \tilde D.$$
Initially, the orthogonal factorization of $B$ is such that $B^T B = R^T R$, and the initial $L$ and $D$ can be obtained from $R^T R$. The algorithm of [2] for updating $L$ and $D$ is:
ALGORITHM 2.1. Given $L$, $D = \mathrm{diag}(d_1, \dots, d_n)$, $\alpha_1 = \alpha$, and $w^{(1)} = z$:
Do for $j = 1, \dots, n$:
  $p_j = w_j^{(j)}$,
  $\bar d_j = d_j + \alpha_j p_j^2$,
  $\beta_j = p_j \alpha_j / \bar d_j$,
  $\alpha_{j+1} = d_j \alpha_j / \bar d_j$,
  Do for $r = j+1, \dots, n$:
    $w_r^{(j+1)} = w_r^{(j)} - p_j l_{rj}$,
    $\bar l_{rj} = l_{rj} + \beta_j w_r^{(j+1)}$.
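A direct transcription of Algorithm 2.1 into NumPy (a sketch; the variable names are ours). Here L is unit lower triangular and d holds the diagonal of D.

```python
import numpy as np

def ldl_rank_one_update(L, d, alpha, z):
    """Algorithm 2.1: given A = L diag(d) L^T, return the factors of
    A + alpha * z z^T (the rank-one modification of [2])."""
    L = L.copy(); d = d.copy(); w = z.copy()
    n = d.size
    for j in range(n):
        p = w[j]                          # p_j = w_j^(j)
        d_new = d[j] + alpha * p * p      # d_bar_j = d_j + alpha_j p_j^2
        beta = p * alpha / d_new          # beta_j = p_j alpha_j / d_bar_j
        alpha = d[j] * alpha / d_new      # alpha_{j+1} = d_j alpha_j / d_bar_j
        d[j] = d_new
        for r in range(j + 1, n):
            w[r] -= p * L[r, j]           # w_r^(j+1) = w_r^(j) - p_j l_rj
            L[r, j] += beta * w[r]        # l_bar_rj = l_rj + beta_j w_r^(j+1)
    return L, d
```

For the rank-two modifications below, the routine is applied twice: once with alpha = +1 and z = z_1, and once with alpha = -1 and z = z_2.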
If Broyden's update
$$\bar B = B + \frac{(y - Bs)\, s^T}{s^T s}$$
is used in (8), then, with $\bar f = f(x_{k+1}) = y - Bs$ and $\hat s = s/(s^T s)$,
$$\bar B^T \bar B = B^T B + \frac{B^T \bar f\, s^T + s\, \bar f^T B}{s^T s} + \frac{\bar f^T \bar f}{(s^T s)^2}\, s s^T. \qquad (9)$$
From the above equation, we can see that $B^T B$ is modified by a rank-two update, and (9) can be rewritten as
$$\bar B^T \bar B = B^T B + z_1 z_1^T - z_2 z_2^T,$$
where
$$z_1 = \frac{\bar B^T \bar f + \left(1 - \bar f^T \bar f/2\right)\hat s}{\sqrt 2}$$
and
$$z_2 = \frac{\bar B^T \bar f - \left(1 + \bar f^T \bar f/2\right)\hat s}{\sqrt 2}.$$
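A quick numerical check (ours) of the rank-two rewrite with $z_1$, $z_2$ as reconstructed above; random data stand in for $B$, $s$, and $f(x_{k+1})$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
s = rng.standard_normal(n)
f_new = rng.standard_normal(n)            # stands in for f(x_{k+1})

s_hat = s / (s @ s)
B_bar = B + np.outer(f_new, s_hat)        # Broyden update of B
gamma = f_new @ f_new                     # f_bar^T f_bar
a = B_bar.T @ f_new                       # B_bar^T f_bar
z1 = (a + (1 - gamma / 2) * s_hat) / np.sqrt(2)
z2 = (a - (1 + gamma / 2) * s_hat) / np.sqrt(2)

lhs = B_bar.T @ B_bar
rhs = B.T @ B + np.outer(z1, z1) - np.outer(z2, z2)
print(np.allclose(lhs, rhs))              # True: the rank-two rewrite of (9) holds
```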
The algorithm for the Quasi-Gauss-Newton method [1] using Broyden's update is as follows.

ALGORITHM 2.2.
Given $f : \mathbb{R}^n \to \mathbb{R}^n$, $x_0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n \times n}$:
  Get $Q_0 R_0 = B_0$,
  $L_0$ from $R_0^T$,
  $D_0 = (r_{11}^2, \dots, r_{nn}^2)$.
Do for $k = 0, 1, \dots$:
  Solve $L_k D_k L_k^T s_k = -B_k^T f(x_k)$ for $s_k$,
  $x_{k+1} := x_k + s_k$,
  $y_k := f(x_{k+1}) - f(x_k)$,
  $B_{k+1} := B_k + (y_k - B_k s_k)\, s_k^T / (s_k^T s_k)$,
  Get $L_{k+1}$, $D_{k+1}$ by applying Algorithm 2.1 twice, with $z_1$ and $z_2$ as above.
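The initialization of Algorithm 2.2 in NumPy (a sketch): from $Q_0 R_0 = B_0$ we have $B_0^T B_0 = R_0^T R_0$, and the $LDL^T$ factors follow by scaling $R_0^T$.

```python
import numpy as np

def initial_ldl(B0):
    """Initial factors for Algorithms 2.2 and 2.4:
    B0^T B0 = R0^T R0 = L0 diag(d0) L0^T."""
    _, R = np.linalg.qr(B0)
    r = np.diag(R)
    L0 = R.T / r        # divide the i-th column of R^T by r_ii
    d0 = r ** 2         # D0 = (r_11^2, ..., r_nn^2)
    return L0, d0
```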
We now describe the Quasi-Gauss-Newton method using the convex update:
$$\Delta B = (1 - \mu)\Delta B_1 + \mu\, \Delta B_2 = \bar f z^T, \qquad (10)$$
where $z = (1 - \mu)\dfrac{s}{s^T s} + \mu\, \dfrac{t}{s^T t}$. If (10) is used for the Jacobian approximation in (8), then we are led to an updating scheme to get $\bar B^T \bar B$ from $B^T B$ as follows.
LEMMA 2.3. Let $\bar f = f(x_{k+1})$, $\tau = \bar f^T \bar f$, and $t = -\bar B^T \bar f$. If we let
$$z_1 = \frac{-t + (1 - \tau/2)\, z}{\sqrt 2} \qquad \text{and} \qquad z_2 = \frac{-t - (1 + \tau/2)\, z}{\sqrt 2},$$
then
$$\bar B^T \bar B = B^T B + z_1 z_1^T - z_2 z_2^T. \qquad (11)$$

PROOF.
$$\bar B^T \bar B = (B + \Delta B)^T (B + \Delta B) = B^T B + \Delta B^T B + B^T \Delta B + \Delta B^T \Delta B.$$
Since $B = \bar B - \Delta B$ and $\Delta B = \bar f z^T$,
$$B^T \Delta B = (\bar B - \Delta B)^T \bar f z^T = -t z^T - \tau z z^T.$$
Similarly, $\Delta B^T B = -z t^T - \tau z z^T$. From $\Delta B^T \Delta B = \tau z z^T$ and the above equations, in view of
$$z_1 z_1^T = \frac{1}{2}\left[t t^T - \left(1 - \frac{\tau}{2}\right)\left(t z^T + z t^T\right) + \left(1 - \frac{\tau}{2}\right)^2 z z^T\right]$$
and
$$z_2 z_2^T = \frac{1}{2}\left[t t^T + \left(1 + \frac{\tau}{2}\right)\left(t z^T + z t^T\right) + \left(1 + \frac{\tau}{2}\right)^2 z z^T\right],$$
we have
$$z_1 z_1^T - z_2 z_2^T = -t z^T - z t^T - \tau z z^T,$$
and (11) follows.
Since the normal equation (3) must be solved for $s$ at each iteration, and a direct solution involves $O(n^3)$ operations per iteration, we apply the techniques of Algorithm 2.2 for implementing this method. The initial $L$ and $D$ are obtained from
$$B^T B = R^T Q^T Q R = R^T R, \qquad D_{ii} = r_{ii}^2,$$
and $L$ is obtained from $R^T$ by dividing the $i$th column of $R^T$ by the $i$th diagonal element of $R$, $i = 1, \dots, n$.

Algorithm 2.1 is for a rank-one update, and the update to $B^T B$ is of rank two, as shown in Lemma 2.3; hence, Algorithm 2.1 is applied twice. The algorithm for the proposed method is as follows.
ALGORITHM 2.4.
Given $f : \mathbb{R}^n \to \mathbb{R}^n$, $x_0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n \times n}$; get $Q_0 R_0 = B_0$, $L_0$ from $R_0^T$, and $D_0 = (r_{11}^2, \dots, r_{nn}^2)$.
Do for $k = 0, 1, \dots$:
  Solve $L_k D_k L_k^T s_k = -B_k^T f(x_k)$ for $s_k$,
  $x_{k+1} := x_k + s_k$,
  $y_k := f(x_{k+1}) - f(x_k)$,
  $t_k := -B_k^T f(x_k)$,
  $B_{k+1} := B_k + (1 - \mu)\dfrac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k} + \mu\, \dfrac{(y_k - B_k s_k)\, t_k^T}{t_k^T s_k}$,
  Get $L_{k+1}$, $D_{k+1}$ by applying Algorithm 2.1 twice, with $z_1$ and $z_2$ of Lemma 2.3.
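Putting the pieces together, a sketch (ours) of Algorithm 2.4 built on initial_ldl and ldl_rank_one_update above; the $z_1$, $z_2$ of Lemma 2.3 are applied as two rank-one modifications. Tolerances are illustrative, and safeguards (line search, downdating checks) are omitted.

```python
import numpy as np
from scipy.linalg import solve_triangular

def qgn_convex(f, x0, B0, tol=1e-10, max_iter=100):
    """Sketch of Algorithm 2.4 (the proposed method)."""
    x, B = x0.astype(float), B0.astype(float)
    L, d = initial_ldl(B)
    fx = f(x)
    for _ in range(max_iter):
        rhs = -B.T @ fx
        # Solve L D L^T s = -B^T f(x) by two triangular solves and a scaling
        u = solve_triangular(L, rhs, lower=True, unit_diagonal=True)
        s = solve_triangular(L.T, u / d, lower=False, unit_diagonal=True)
        x = x + s
        f_new = f(x)
        if np.linalg.norm(f_new) < tol:
            break
        t = -B.T @ fx                                 # t_k = -B_k^T f(x_k)
        mu = (s @ t) ** 2 / ((s @ s) * (t @ t))
        z = (1 - mu) * s / (s @ s) + mu * t / (s @ t)
        B = B + np.outer(f_new, z)                    # convex update (10)
        tau = f_new @ f_new
        t_bar = -B.T @ f_new                          # -B_{k+1}^T f(x_{k+1}), the t of Lemma 2.3
        z1 = (-t_bar + (1 - tau / 2) * z) / np.sqrt(2)
        z2 = (-t_bar - (1 + tau / 2) * z) / np.sqrt(2)
        L, d = ldl_rank_one_update(L, d, +1.0, z1)    # rank-two change of B^T B,
        L, d = ldl_rank_one_update(L, d, -1.0, z2)    # applied as two rank-one updates
        fx = f_new
    return x
```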
3. CONVERGENCE ANALYSIS

In this section, we prove that the methods defined by (9) and (11) are well defined and converge to a solution of (1). We also give a comparison of the convergence rates of the two methods.
Convergence of the QGN Method
THEOREM 3.1. (THE BOUNDED DETERIORATION THEOREM.) Let $D \subseteq \mathbb{R}^n$ be an open convex set containing $x$ and $\bar x = x + s$, with $x \ne x^*$. Let $f : \mathbb{R}^n \to \mathbb{R}^n$, $B \in \mathbb{R}^{n \times n}$, and $\bar B^T \bar B$ be defined by (9). If $x^* \in D$ and $J(x)$ obeys the weaker Lipschitz condition
$$\|J(x) - J(x^*)\| \le \gamma\, \|x - x^*\|, \qquad \text{for all } x \in D,$$
then, for both the Frobenius and $l_2$ matrix norms,
$$\left\|\left(\bar B - J(x^*)\right)^T\left(\bar B - J(x^*)\right)\right\| \le \left[\|B - J(x^*)\| + \frac{\gamma}{2}\left(\|\bar x - x^*\|_2 + \|x - x^*\|_2\right)\right]^2. \qquad (12)$$

PROOF. Write $J_* = J(x^*)$. From (9),
$$\bar B - J_* = B - J_* + \frac{(y - Bs)\, s^T}{s^T s} = (B - J_*)\left(I - \frac{s s^T}{s^T s}\right) + \frac{(y - J_* s)\, s^T}{s^T s},$$
so that
$$\left(\bar B - J_*\right)^T\left(\bar B - J_*\right) = \left[(B - J_*)\left(I - \frac{s s^T}{s^T s}\right) + \frac{(y - J_* s)\, s^T}{s^T s}\right]^T \left[(B - J_*)\left(I - \frac{s s^T}{s^T s}\right) + \frac{(y - J_* s)\, s^T}{s^T s}\right]. \qquad (13)$$
Using
$$\left\|I - \frac{s s^T}{s^T s}\right\|_2 = 1$$
and
$$\|y - J_* s\|_2 \le \frac{\gamma}{2}\left(\|\bar x - x^*\|_2 + \|x - x^*\|_2\right)\|s\|_2,$$
we have (12).
LEMMA 3.2. Let $s \in \mathbb{R}^n$ be nonzero, $E \in \mathbb{R}^{n \times n}$, and let $\|\cdot\|$ denote the $l_2$ vector norm. Then
$$\left\|\left(I - \frac{s s^T}{s^T s}\right)^T E^T E \left(I - \frac{s s^T}{s^T s}\right)\right\|_F \le \left(\|E^T E\|_F^2 - \left\|E^T E\, \frac{s s^T}{s^T s}\right\|_F^2\right)^{1/2} \le \|E^T E\|_F - \frac{1}{2\|E^T E\|_F}\, \frac{\|E^T E s\|_2^2}{\|s\|_2^2}.$$

PROOF. We have
$$\left\|\left(I - \frac{s s^T}{s^T s}\right)^T E^T E \left(I - \frac{s s^T}{s^T s}\right)\right\|_F \le \left\|E^T E \left(I - \frac{s s^T}{s^T s}\right)\right\|_F = \left(\|E^T E\|_F^2 - \left\|E^T E\, \frac{s s^T}{s^T s}\right\|_F^2\right)^{1/2},$$
since $I - s s^T/(s^T s)$ is an orthogonal projector, and
$$\left(\|E^T E\|_F^2 - \left\|E^T E\, \frac{s s^T}{s^T s}\right\|_F^2\right)^{1/2} \le \|E^T E\|_F - \frac{1}{2\|E^T E\|_F}\left\|E^T E\, \frac{s s^T}{s^T s}\right\|_F^2 = \|E^T E\|_F - \frac{1}{2\|E^T E\|_F}\, \frac{\|E^T E s\|_2^2}{\|s\|_2^2},$$
because $\left\|E^T E\, s s^T/(s^T s)\right\|_F = \|E^T E s\|_2 / \|s\|_2$.
THEOREM 3.3. Let all the assumptions of Theorem 3.1 hold, and let $E_k = B_k - J_*$ and $e_k = x_k - x^*$. Then $\{x_k\}$ generated by Algorithm 2.2 converges superlinearly; that is,
$$\lim_{k \to \infty} \frac{\|E_k s_k\|_2}{\|s_k\|_2} = 0. \qquad (14)$$

PROOF. From (13),
$$\|E_{k+1}^T E_{k+1}\|_F \le \left\|\left(I - \frac{s_k s_k^T}{s_k^T s_k}\right)^T E_k^T E_k \left(I - \frac{s_k s_k^T}{s_k^T s_k}\right)\right\|_F + 2\|E_k\|_F\, \frac{\|y_k - J_* s_k\|_2}{\|s_k\|_2} + \left(\frac{\|y_k - J_* s_k\|_2}{\|s_k\|_2}\right)^2.$$
Using $\|y_k - J_* s_k\|_2 / \|s_k\|_2 \le (\gamma/2)(\|e_k\|_2 + \|e_{k+1}\|_2)$, Lemma 3.2, and $\|e_{k+1}\| \le \frac{1}{2}\|e_k\|$,
$$\|E_{k+1}^T E_{k+1}\|_F \le \|E_k^T E_k\|_F - \frac{1}{2\|E_k^T E_k\|_F}\, \frac{\|E_k^T E_k s_k\|_2^2}{\|s_k\|_2^2} + \frac{3\gamma}{2}\|E_k\|_F \|e_k\| + \frac{9\gamma^2}{16}\|e_k\|^2.$$
From the proof of the linear convergence, $\|E_k^T E_k\|_F \le 4\delta^2$ and $\|E_k\| \le 2\delta$ for all $k \ge 0$, $\sum_{k=0}^{\infty}\|e_k\| \le 2\varepsilon$, and $\sum_{k=0}^{\infty}\|e_k\|^2 \le (4/3)\varepsilon^2$, so that, summing for $k = 0, 1, \dots, i$,
$$\sum_{k=0}^{i} \frac{\|E_k^T E_k s_k\|_2^2}{\|s_k\|_2^2} \le 4\delta^2\left[\|E_0^T E_0\|_F - \|E_{i+1}^T E_{i+1}\|_F + 3\gamma\delta \sum_{k=0}^{i}\|e_k\| + \frac{9\gamma^2}{16}\sum_{k=0}^{i}\|e_k\|^2\right]$$
is finite. This implies (14), and hence the superlinear convergence.
Convergence of the Proposed Method

Next, we show the convergence of the method defined by (11) by starting with the bounded deterioration theorem.
THEOREM 3.4. (THE BOUNDED DETERIORATION THEOREM.) Let all the assumptions of Theorem 3.1 hold, and let $B \in \mathbb{R}^{n \times n}$ and $\bar B^T \bar B$ be defined by (11). If $x^* \in D$ and $J(x)$ obeys the weaker Lipschitz condition, then, for both the Frobenius and $l_2$ matrix norms,
$$\left\|\left(\bar B - J(x^*)\right)^T\left(\bar B - J(x^*)\right)\right\|_F \le \left[\|B - J(x^*)\|_F + \frac{\gamma}{2}\left(\|\bar x - x^*\|_2 + \|x - x^*\|_2\right)\right]^2. \qquad (15)$$

PROOF. From (10),
$$\bar B - J_* = B - J_* + (1 - \mu)\frac{(y - Bs)\, s^T}{s^T s} + \mu\, \frac{(y - Bs)\, t^T}{t^T s} = (B - J_*)(I - P) + (y - J_* s)z^T,$$
so that
$$\left(\bar B - J_*\right)^T\left(\bar B - J_*\right) = \left[(B - J_*)(I - P) + (y - J_* s)z^T\right]^T\left[(B - J_*)(I - P) + (y - J_* s)z^T\right], \qquad (16)$$
where
$$P = s z^T = \frac{s s^T}{s^T s}\left[(1 - \mu)I + \frac{t t^T}{t^T t}\right].$$
Then,
$$\left\|\left(\bar B - J_*\right)^T\left(\bar B - J_*\right)\right\|_F \le \left[\|(B - J_*)(I - P)\|_F + \frac{\|y - J_* s\|_2}{\|s\|_2}\right]^2.$$
Using
$$\|I - P\|_2 = 1$$
and
$$\|y - J_* s\|_2 \le \frac{\gamma}{2}\left(\|\bar x - x^*\|_2 + \|x - x^*\|_2\right)\|s\|_2,$$
we have (15).
We need the following lemmas to prove the convergence of the method defined by (11).

LEMMA 3.5. If
$$P = \frac{s s^T}{s^T s}\left[(1 - \mu)I + \frac{t t^T}{t^T t}\right],$$
where $t = -B^T f$ and $\mu = \dfrac{(s^T t)^2}{(s^T s)(t^T t)}$, then $(I - P)(I - P)^T$ is a projector.

PROOF. We first show that $\left[(I - P)(I - P)^T\right]^2 = (I - P)(I - P)^T$. Using
$$P P^T = \left(1 - \mu^2 + \mu\right)\frac{s s^T}{s^T s}$$
and $P^2 = P$, the claim follows by direct multiplication.

LEMMA 3.6. Let $s, t \in \mathbb{R}^n$ be nonzero, $E \in \mathbb{R}^{n \times n}$, and let $P$ be as in Lemma 3.5. Then
$$\left\|(I - P)^T E^T E (I - P)\right\|_F \le \|E^T E\|_F - \frac{1 - \mu^2 + \mu}{2\|E^T E\|_F}\, \frac{\|E^T E s\|_2^2}{\|s\|_2^2}.$$

PROOF. As in Lemma 3.2,
$$\left\|(I - P)^T E^T E (I - P)\right\|_F^2 \le \left\|E^T E (I - P)\right\|_F^2.$$
Since $P^2 = P$, we have
$$\left\|E^T E (I - P)\right\|_F \le \|E^T E\|_F - \frac{1 - \mu^2 + \mu}{2\|E^T E\|_F}\, \frac{\|E^T E s\|_2^2}{\|s\|_2^2},$$
from $\|E^T E\|_F \ge \|E^T E P\|_F$ and $\|E^T E P\|_F^2 = \left(1 - \mu^2 + \mu\right)\|E^T E s\|_2^2 / \|s\|_2^2$.
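A numerical spot-check (ours) of the ingredients of Lemma 3.5: with $s$ from (3), so that $t = B^T B s$, $P$ is idempotent and $P P^T$ collapses to the stated multiple of $s s^T/(s^T s)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n))
fx = rng.standard_normal(n)

t = -B.T @ fx
s = np.linalg.solve(B.T @ B, t)          # step from (3), hence t = B^T B s
mu = (s @ t) ** 2 / ((s @ s) * (t @ t))
P = np.outer(s, s) / (s @ s) @ ((1 - mu) * np.eye(n) + np.outer(t, t) / (t @ t))

print(np.allclose(P @ P, P))             # P is idempotent
print(np.allclose(P @ P.T,
                  (1 - mu ** 2 + mu) * np.outer(s, s) / (s @ s)))
```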
THEOREM 3.7. Let all the assumptions of Theorem 3.1 hold, and let $E_k = B_k - J_*$. Then $\{x_k\}$ generated by Algorithm 2.4 converges superlinearly.

PROOF. From (16),
$$\|E_{k+1}^T E_{k+1}\|_F \le \left\|(I - P)^T E_k^T E_k (I - P)\right\|_F + 2\|E_k(I - P)\|_F\, \frac{\|y_k - J_* s_k\|_2}{\|s_k\|_2} + \left(\frac{\|y_k - J_* s_k\|_2}{\|s_k\|_2}\right)^2.$$
By $\|y_k - J_* s_k\|_2 / \|s_k\|_2 \le (\gamma/2)(\|e_k\|_2 + \|e_{k+1}\|_2)$ and $\|e_{k+1}\| \le \frac{1}{2}\|e_k\|$, we have
$$\|E_{k+1}^T E_{k+1}\|_F \le \left\|(I - P)^T E_k^T E_k (I - P)\right\|_F + \frac{3\gamma}{2}\|E_k(I - P)\|_F \|e_k\| + \frac{9\gamma^2}{16}\|e_k\|^2,$$
and, by Lemma 3.6 and $\|E_k(I - P)\|_F \le \|E_k\|_F$,
$$\|E_{k+1}^T E_{k+1}\|_F \le \|E_k^T E_k\|_F - \frac{1 - \mu^2 + \mu}{2\|E_k^T E_k\|_F}\, \frac{\|E_k^T E_k s_k\|_2^2}{\|s_k\|_2^2} + \frac{3\gamma}{2}\|E_k\|_F \|e_k\| + \frac{9\gamma^2}{16}\|e_k\|^2.$$
Since $\|E_k\|_F \le 2\delta$ for all $k \ge 0$, $\sum_{k=0}^{\infty}\|e_k\| \le 2\varepsilon$, and $\sum_{k=0}^{\infty}\|e_k\|^2 \le (4/3)\varepsilon^2$,
$$\frac{\|E_k^T E_k s_k\|_2^2}{\|s_k\|_2^2} \le \frac{4\delta^2}{1 - \mu^2 + \mu}\left[\|E_k^T E_k\|_F - \|E_{k+1}^T E_{k+1}\|_F + 3\gamma\delta\|e_k\| + \frac{9\gamma^2}{16}\|e_k\|^2\right].$$
Summing for $k = 0, 1, \dots, i$,
$$\sum_{k=0}^{i}\frac{\|E_k^T E_k s_k\|_2^2}{\|s_k\|_2^2} \le \frac{4\delta^2}{1 - \mu^2 + \mu}\left[\|E_0^T E_0\|_F - \|E_{i+1}^T E_{i+1}\|_F + 3\gamma\delta\sum_{k=0}^{i}\|e_k\| + \frac{9\gamma^2}{16}\sum_{k=0}^{i}\|e_k\|^2\right] \qquad (17)$$
$$\le \frac{4\delta^2}{1 - \mu^2 + \mu}\left[\|E_0^T E_0\|_F + 6\gamma\delta\varepsilon + \frac{3\gamma^2}{4}\varepsilon^2\right]$$
is finite. This implies (14). Therefore, Algorithm 2.4 converges superlinearly. Since $1 - \mu^2 + \mu \ge 1$, the bound (17) for the new algorithm is smaller than the corresponding bound for Algorithm 2.2.
Comparison of the Convergence Rates

We show that the new algorithm's approximation to $J(x_k)$ at each iteration is better than that of Broyden's update. In the next theorem, we compare the bounds of the bounded deterioration theorems of the two methods.

THEOREM 3.8. If $B_k$ and $\tilde B_k$ are, respectively, approximations for the Jacobians of the methods defined by (9) and (11), then the single-step deterioration bound for $\tilde B_k$ does not exceed that for $B_k$.

PROOF. By Lemma 3.2 and Lemma 3.6, the two bounds differ only in the factor multiplying the decrease term $\|E_k^T E_k s_k\|_2^2 / \|s_k\|_2^2$, namely $1$ for (9) and $1 - \mu^2 + \mu$ for (11). Since $\mu \le 1$ and $1 - \mu^2 + \mu \ge 1$, we have the result.
Computational Results
Results of computational experiments are summarized in Table 1. The computations were done on a Sun Sparc II. All the nonlinear nonsymmetric problems in the set of test problems [7,8] were utilized. The problems are numbered as in [7], and Problem 31 was modified as in [1]. Initial Jacobians were evaluated numerically by finite differences. In Table 1, the performance of the proposed method is compared with the Quasi-Gauss-Newton method when n = 100. It is clear from Table 1 that the proposed method shows better performance than the Quasi-Gauss-Newton method. This agrees with the analysis in the previous section. It may appear that the number of operations for the proposed method is larger than that of the Quasi-Gauss-Newton method, since it combines two updates for the Jacobian approximation; however, the Quasi-Gauss-Newton method also includes the operations to compute $t = -B^T f$ according to (3). Hence, the number of operations for the two methods is almost the same. For all problems tested, the proposed method's rate of convergence and run time are the same as or better than those of the Quasi-Gauss-Newton method. For Problem 31, the proposed method performs increasingly better than the Quasi-Gauss-Newton method as the scaling gets worse.
Table 1. Comparison of the Quasi-Gauss-Newton (QGN) method and the proposed method (n = 100).

Problem    QGN iterations   QGN time   Proposed iterations   Proposed time
21         Diverged         -          Diverged              -
22         19               4.900      19                    4.900
26         83               20.740     82                    20.630
27         Diverged         -          Diverged              -
28         2                2.390      2                     2.390
29         4                2.810      4                     2.810
30         8                3.230      8                     3.230
31(1)      14               5.840      14                    5.830
31(2)      18               6.500      18                    6.490
31(3)      28               8.110      18                    6.510
31(4)      30               8.440      22                    7.140
31(5)      6                9.420      30                    8.450
31(6)      55               12.520     37                    9.600
31(7)      139              26.280     44                    10.870
REFERENCES

1. H. Wang and R.P. Tewarson, Quasi-Gauss-Newton methods for nonlinear algebraic equations, Computers Math. Applic. 25 (1), 53-63 (1993).
2. P.E. Gill, G.H. Golub, W. Murray and M.A. Saunders, Methods for modifying matrix factorizations, Math. Comp. 28 (126), 505-535 (1974).
3. C.G. Broyden, A class of methods for solving nonlinear simultaneous equations, Math. Comp. 19, 577-593 (1965).
4. J.E. Dennis and J.J. Moré, Quasi-Newton methods, motivation and theory, SIAM Review 19, 46-89 (1977).
5. S. Kim and R.P. Tewarson, A Quasi-Newton method for solving nonlinear algebraic equations, Computers Math. Applic. 24 (4), 93-97 (1992).
6. J.E. Dennis and R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ (1983).
7. J.J. Moré, B.S. Garbow and K.E. Hillstrom, Fortran subroutines for testing unconstrained optimization software, ACM Trans. Math. Software 7, 136-140 (1981).
8. J.J. Moré, B.S. Garbow and K.E. Hillstrom, Testing unconstrained optimization software, ACM Trans. Math. Software 7, 17-41 (1981).
9. K.L. Hiebert, An evaluation of mathematical software that solves systems of nonlinear equations, ACM Trans. Math. Software 8 (1), 5-20 (1982).