SIAM J. NUMER. ANAL.                         © 1982 Society for Industrial and Applied Mathematics
Vol. 19, No. 2, April 1982                   0036-1429/82/1902-0014 $01.00/0

NEWTON'S METHOD WITH A MODEL TRUST REGION MODIFICATION*

D. C. SORENSEN†
Abstract. A modified Newton method for unconstrained minimization is presented and analyzed. The
modification is based upon the model trust region approach. This report contains a thorough analysis of
the locally constrained quadratic minimizations that arise as subproblems in the modified Newton iteration.
Several promising alternatives are presented for solving these subproblems in ways that overcome certain
theoretical difficulties exposed by this analysis. Very strong convergence results are presented concerning
the minimization algorithm. In particular, the explicit use of second order information is justified by
demonstrating that the iterates converge to a point which satisfies the second order necessary conditions
for minimization. With the exception of very pathological cases this occurs whenever the algorithm is
applied to problems with continuous second partial derivatives.

1. Introduction. The problem of minimizing a real-valued function $f$ of several real variables is generally attacked by some variant of Newton's method for finding a zero of the gradient of $f$. The term variant here is meant to include any method based upon maintaining an approximation to the Hessian matrix of mixed second order partial derivatives of $f$. When this matrix is actually computable, then Newton's method is probably the method of choice for the minimization problem.
As we shall point out in § 3, there are several things to consider when attempting
to provide a practical implementation of Newton's method for general use. Not the
least of these is the problem of forcing convergence of the method when a good initial
guess at the solution is not available. The main purpose of this report is to describe
and analyze a technique for the solution of this problem. The approach we shall
present is well known. It is appropriately called a model trust region approach in that
the step to a new iterate is obtained by minimizing a local quadratic model to the
objective function over a restricted ellipsoidal region centered about the current
iterate. The diameter of this region is expanded and contracted in a controlled way
based upon how well the local model predicts behavior of the objective function. It
is possible to control the iteration in this way so that convergence is forced from any
starting value assuming reasonable conditions on the objective function. In fact, we
shall prove some very strong convergence properties for this method in § 4. There it
is shown that one can expect (but not ensure) that the iteration will converge to a
point which satisfies the second order necessary conditions for a minimum.
The origin of this method properly lies with the work of Levenberg [17] and
Marquardt [18] for nonlinear least squares calculations. The method was first discussed
in connection with general minimization by Goldfeld, Quandt, and Trotter [14]. Powell
[26] applied the modification in a more general situation of a quasi-Newton iteration.
Hebden [15] made some important computational observations. This paper is most
heavily influenced by the work of Moré [21] for the nonlinear least squares case.
The current interest stems from several recent efforts to obtain a practical
implementation of a modified Newton method that takes full advantage of the second
order information. Several of the more recent works have attempted to explicitly use

*Received by the editors November 11, 1980, and in final revised form April 23, 1981. This work
was supported by the Applied Mathematical Sciences Research Program (KC-04-02) of the Office of
Energy Research, U.S. Department of Energy under contract W-31-109-Eng-38.
† Argonne National Laboratory, Argonne, Illinois 60439.

directions of negative curvature to accomplish various tasks such as escape from saddle
points [6], [16], search along more general paths [13], [19], [22], [23], [31], obtain convergence to points that satisfy second order necessary conditions [13], [19], [22], [23], etc. We observe along with Gay [10], [11] that the method proposed here will
accomplish these things in a very elegant and intuitively appealing way.
It is hoped that this report will present a succinct but thorough analysis of this
method. In particular, we feel it is important to clearly describe the theoretical nature
of the locally constrained quadratic minimization in § 2. The analysis given in § 4 is
made sufficiently general to apply to several possible implementations. These
possibilities are described in § 5, where particular attention is paid to overcoming a practical problem of implementation exposed by the theoretical discussion in § 2. We
make an effort to offer several alternatives to implementation but shall make no
recommendations until there is numerical evidence to present.
2. Constrained quadratic minimization. An important portion of the unconstrained minimization procedure presented in § 3 will be concerned with the solution of the following problem. Let $\psi(w) = f + g^T w + \tfrac12 w^T B w$.

(2.1)   Find $p \in \mathbb{R}^n$ such that $\psi(p) = \min\{\psi(w) : \|w\| \le \Delta\}$.

In (2.1), $B = B^T \in \mathbb{R}^{n\times n}$; $w, g \in \mathbb{R}^n$; $f, \Delta \in \mathbb{R}$ with $\Delta > 0$, and $\|\cdot\|$ throughout is the 2-norm.
There are some important subtleties to this problem. The purpose of this section is
to give a complete discussion of the theoretical aspects of problem (2.1) and to expose
the nature of the computational difficulties that may be present.
Several authors have considered problem (2.1) or related problems. This problem appears implicitly as a subsidiary calculation in Levenberg–Marquardt type algorithms for nonlinear least squares [17], [18]. The computational aspect of this calculation was fully discussed by Moré in [21]. A relatively early paper by Forsythe and Golub [8] considers a closely related problem concerning minimization of the form

(2.2)   $\min\{(x - b)^T B (x - b) : \|x\| = 1\}$.

While their work gives an extensive study of problem (2.2), it is not fully applicable to problem (2.1) since $g \in \operatorname{range}(B)$ may not hold, and the interior is not considered. Gander [9] has also done closely related work, but does not explicitly consider problem (2.1). Problem (2.1) first appeared as a subsidiary calculation in unconstrained minimization in the work of Goldfeld, Quandt and Trotter [14]. Hebden [15] made an important contribution concerning the practical computation of a solution to (2.1). More recently the problem has been discussed by Gay [10]. He gives a different proof of the results in Lemmas 2.4 and 2.8.
If the method of Lagrange is applied to the equivalent problem

(2.3)   $\min \psi(w)$ subject to $w^T w \le \Delta^2$,

it is a straightforward conclusion of the first order necessary conditions [4, Chap. 2] that $p$ solves (2.3), and hence (2.1), only if $p$ satisfies an equation of the form $(B + \lambda I)p = -g$ with $\lambda(p^T p - \Delta^2) = 0$, where $\lambda \ge 0$ is the Lagrange multiplier associated with the constraint $w^T w \le \Delta^2$.

LEMMA 2.4. If $p$ is a solution to (2.1) then $p$ is a solution to an equation of the form

(2.5)   $(B + \lambda I)p = -g$

with $\lambda \ge 0$, $\lambda(p^T p - \Delta^2) = 0$ and $B + \lambda I$ positive semidefinite.



Proof. It has been noted that $p$ must solve an equation of the form (2.5) with $\lambda \ge 0$ and $\lambda(p^T p - \Delta^2) = 0$. It remains to show that $B + \lambda I$ is positive semidefinite. Suppose that $p \ne 0$. Since $p$ solves (2.1), it also solves $\min\{\psi(w) : \|w\| = \|p\|\}$. It follows that $\psi(w) \ge \psi(p)$ for all $w$ such that $\|w\| = \|p\|$. This inequality together with (2.5) gives

(2.6)   $f - p^T(B + \lambda I)w + \tfrac12 w^T B w \ge f - p^T(B + \lambda I)p + \tfrac12 p^T B p$.

Rearranging terms in (2.6), and using $\|w\| = \|p\|$, gives

(2.7)   $\tfrac12 (w - p)^T (B + \lambda I)(w - p) \ge 0$

for all $w$ such that $\|w\| = \|p\|$. Since $p \ne 0$, it follows readily from (2.7) that $B + \lambda I$ is positive semidefinite. If $p = 0$, it follows from (2.5) that $g = 0$. Therefore $p = 0$ solves $\min\{\tfrac12 w^T B w : \|w\| \le \Delta\}$ and we must conclude that $B$ is positive semidefinite. Since $\lambda = 0$ is necessary when $p = 0$, $B + \lambda I$ is positive semidefinite. ∎
Lemma 2.4 establishes necessary conditions concerning the pair $\lambda$, $p$ when $p$ solves (2.1). Our next result establishes sufficient conditions that will ensure $p$ is a solution to (2.1). These results are essentially given in [14]. However, we wish to present a statement and proof of these results that is more complete and better suited to this presentation.
LEMMA 2.8. Let $\lambda \in \mathbb{R}$, $p \in \mathbb{R}^n$ satisfy

(2.9)   $(B + \lambda I)p = -g$ with $B + \lambda I$ positive semidefinite.

(i) If $\lambda = 0$ and $\|p\| \le \Delta$ then $p$ solves (2.1).
(ii) If $\|p\| = \Delta$ then $p$ solves $\min\{\psi(w) : \|w\| = \Delta\}$.
(iii) If $\lambda \ge 0$ and $\|p\| = \Delta$ then $p$ solves (2.1).

If, in fact, $B + \lambda I$ is positive definite then $p$ is unique in each of the cases (i), (ii), (iii).

Proof. If $\lambda$, $p$ satisfy (2.9) then

(2.10)   $\psi(w) = \psi(p) + \tfrac12 (w - p)^T (B + \lambda I)(w - p) + \tfrac{\lambda}{2}\,(p^T p - w^T w)$

holds for any $w \in \mathbb{R}^n$. It follows that

(2.11)   $\psi(w) \ge \psi(p) + \tfrac{\lambda}{2}\,(p^T p - w^T w)$.

Statements (i), (ii), (iii) are immediate consequences of (2.11). The uniqueness statement follows from (2.10) because the inequality is strict when $B + \lambda I$ is positive definite and $w \ne p$. ∎
The solution of problem (2.1) is closely related to solving the nonlinear equation

(2.12)   $\phi(\alpha) = \Delta$, where $\phi(\alpha) = \|(B + \alpha I)^{-1} g\|$.

Using the eigensystem of the symmetric matrix $B$ together with the invariance of $\|\cdot\|$ under orthogonal transformations, it is easy to show if $g \ne 0$ that $\phi^2(\alpha)$ is a rational function with second order poles all belonging to a subset of the eigenvalues of $-B$. Since $\lim_{\alpha\to+\infty}\phi(\alpha) = 0$, it follows that (2.12) has a solution whenever $\Delta > 0$ and $g \ne 0$.

We can construct a solution to problem (2.1) using a particular solution of (2.12). Let $\lambda_1$ be the smallest eigenvalue of $B$; let $S_1 = \{q \in \mathbb{R}^n : Bq = \lambda_1 q\}$; let $\bar\alpha$ be the largest root of (2.12) when $g \ne 0$ and $\bar\alpha = 0$ when $g = 0$. If there is any $q \in S_1$ such that $g^T q \ne 0$ then $\bar\alpha > -\lambda_1$ must hold. If $g \in S_1^\perp$ then $-\lambda_1$ is not a pole of $\phi$. Thus $\phi(-\lambda_1)$ is well defined when $g \in S_1^\perp$, and this is the only possibility for $\bar\alpha \le -\lambda_1$ to occur. Put

$\lambda = \max(0, -\lambda_1, \bar\alpha)$

and let

$\theta^2 = \Delta^2 - \phi^2(-\lambda_1)$ if $\lambda = -\lambda_1$, $\theta = 0$ otherwise.

We construct a solution $p$ to problem (2.1) by the formula

(2.13)   $p = -(B + \lambda I)^\dagger g + \theta q$,

where $q \in S_1$, $\|q\| = 1$, and $(\dagger)$ denotes pseudo-inverse [28]. Note $B + \lambda I$ must be positive semidefinite with this choice of $\lambda$. Since $q^T(B + \lambda I)q = 0$ when $\lambda = -\lambda_1$, it is easily checked that $p$ is a solution to (2.9) and satisfies either condition (i) or (iii) of Lemma 2.8. Thus $p$ solves (2.1), and $\|p\| = \Delta$ whenever $\lambda_1 \le 0$. The solution given by (2.13) shows that $p$ is not unique whenever $g \in S_1^\perp$ and $\phi(-\lambda_1) < \Delta$, due to the arbitrary choice of sign in defining $\theta$.
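The construction (2.12)–(2.13) translates almost directly into code when the eigensystem of $B$ is available. The following Python sketch is our own illustration, not part of the paper: the routine name `solve_tr_subproblem` is hypothetical, and plain bisection on $\phi(\alpha) = \Delta$ is used for clarity rather than the rational iteration developed in § 5.

```python
import numpy as np

def solve_tr_subproblem(B, g, delta, tol=1e-12):
    """Minimal sketch of (2.12)-(2.13): minimize psi(w) = g'w + 0.5*w'Bw
    subject to ||w|| <= delta.  Returns (p, lam)."""
    evals, Q = np.linalg.eigh(B)              # B = Q diag(evals) Q^T
    lam1 = evals[0]                           # smallest eigenvalue lambda_1
    gam = Q.T @ g                             # g in the eigenbasis
    live = np.abs(gam) > tol                  # components of g that matter

    def phi(alpha):                           # phi(alpha) = ||(B + alpha*I)^{-1} g||
        d = evals + alpha
        if np.any(np.abs(d[live]) < tol):
            return np.inf                     # alpha sits on a pole of phi
        return np.linalg.norm(gam[live] / d[live])

    def step(alpha):                          # -(B + alpha*I)^+ g, original coordinates
        d = evals + alpha
        c = np.zeros_like(gam)
        ok = live & (np.abs(d) > tol)
        c[ok] = -gam[ok] / d[ok]
        return Q @ c

    if lam1 > tol and phi(0.0) <= delta:      # interior Newton point: lambda = 0
        return step(0.0), 0.0

    lo = max(0.0, -lam1)
    if phi(lo) > delta:                       # "easy" case: a root lies in (lo, inf)
        hi = lo + 1.0
        while phi(hi) > delta:                # bracket the largest root, then bisect
            hi *= 2.0
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if phi(mid) > delta else (lo, mid)
        return step(hi), hi

    # "hard" case: g is orthogonal to S_1 and phi(-lambda_1) <= delta.
    # Take the pseudo-inverse step plus theta*q as in (2.13); the sign of
    # theta is arbitrary -- exactly the nonuniqueness discussed above.
    lam = max(0.0, -lam1)
    p = step(lam)
    theta = np.sqrt(max(delta**2 - float(p @ p), 0.0))
    return p + theta * Q[:, 0], lam
```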
This discussion of the theoretical subtleties of solving (2.1) indicates numerical difficulties may arise when a solution to problem (2.1) is sought. The case $g \in S_1^\perp$, $\lambda = -\lambda_1$ in (2.13) will give rise to a very sensitive numerical problem. Any computational technique for solving (2.9) will introduce roundoff error. However, in this sensitive case, small perturbations in the quantities $B$, $g$, $\Delta$ can lead to large perturbations of the solution $p$ due to the fact that $B + \lambda I$ will be nearly singular. Apparently the true nature of the difficulty here is the nonuniqueness of the solution $p$ given by (2.13). We illustrate this point with a simple example. Let

$B = \begin{pmatrix} -\eta & 0 \\ 0 & -1 \end{pmatrix}$, $g^T = (1, 0)$, with $\eta < 1$.

The difficult case arises for values of $\eta$ such that $1/(1 - \eta)^2 \le \Delta^2$. For these values of $\eta$ the solutions to (2.1) are of the form

$p^T = \Big(\dfrac{-1}{1 - \eta},\ \theta\Big)$ with $\dfrac{1}{(1 - \eta)^2} + \theta^2 = \Delta^2$.

The perturbation $g_\varepsilon^T = (1, \varepsilon)$ gives a solution

$p_\varepsilon^T = \Big(\dfrac{-1}{\lambda_\varepsilon - \eta},\ \dfrac{-\varepsilon}{\lambda_\varepsilon - 1}\Big)$ with $\dfrac{1}{(\lambda_\varepsilon - \eta)^2} + \dfrac{\varepsilon^2}{(\lambda_\varepsilon - 1)^2} = \Delta^2$,

where $\lambda_\varepsilon > 1$ is the corresponding multiplier. Clearly for any choice of sign for $\theta$ there is a perturbation $\varepsilon$ of opposite sign such that $\|p - p_\varepsilon\|/\|p\|$ is "large". In case $\eta < 0$, we must have $\|p\| = \Delta$ to solve (2.1), and we can be led to extremely different solutions as a result of error introduced by roundoff.
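A small numerical experiment makes this sensitivity concrete. The data below are our own two-dimensional instance of the sensitive case (not the example above): $g$ is orthogonal to $S_1$, and flipping the sign of an $O(10^{-8})$ perturbation of $g$ moves the computed solution by an $O(1)$ amount.

```python
import numpy as np

# g0 is orthogonal to S_1, the eigenspace of the smallest eigenvalue -1 of B.
B = np.diag([0.0, -1.0])
g0 = np.array([1.0, 0.0])
delta = 2.0

def boundary_solution(g):
    """Largest root of ||(B + a*I)^{-1} g|| = delta, then p = -(B + a*I)^{-1} g."""
    phi = lambda a: np.linalg.norm(g / (np.diag(B) + a))
    lo, hi = 1.0 + 1e-13, 2.0
    while phi(hi) > delta:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > delta else (lo, mid)
    return -g / (np.diag(B) + hi)

p_plus = boundary_solution(g0 + np.array([0.0, +1e-8]))
p_minus = boundary_solution(g0 + np.array([0.0, -1e-8]))
print(p_plus)                               # approximately (-1, -sqrt(3))
print(p_minus)                              # approximately (-1, +sqrt(3))
print(np.linalg.norm(p_plus - p_minus))     # O(1) jump from an O(1e-8) change in g
```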
The convergence analysis to be given in § 4 will depend heavily upon the following technical result concerning the amount of decrease in the local quadratic model. A geometric interpretation of the result is that, for a quadratic function, any solution $p$ to (2.1) produces a decrease $f - \psi(p)$ that is at least as much as the decrease a search along the steepest descent direction $-g$ would provide.

LEMMA 2.14. Let $p$ be a solution to (2.1). Then

$f - \psi(p) \ge \tfrac12 \|g\| \min\Big(\Delta,\ \dfrac{\|g\|}{\|B\|}\Big).$

A proof of this result may be found in [23].
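The bound is already achieved by the best point along $-g$ inside the ball (the Cauchy point), which is the geometric interpretation just given. The following short script, our own illustration, checks the inequality for the Cauchy point on random indefinite data:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    A = rng.standard_normal((6, 6))
    B = 0.5 * (A + A.T)                      # symmetric, possibly indefinite
    g = rng.standard_normal(6)
    delta = rng.uniform(0.1, 10.0)

    # Minimize psi(-t*g) - f = -t*||g||^2 + 0.5*t^2*(g'Bg) over 0 <= t <= delta/||g||.
    gBg = g @ B @ g
    t_max = delta / np.linalg.norm(g)
    t = t_max if gBg <= 0 else min((g @ g) / gBg, t_max)
    decrease = t * (g @ g) - 0.5 * t**2 * gBg

    bound = 0.5 * np.linalg.norm(g) * min(delta,
                                          np.linalg.norm(g) / np.linalg.norm(B, 2))
    assert decrease >= bound - 1e-10         # the inequality of Lemma 2.14
```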


In fact, the inequality in Lemma 2.14 is attained by Powell's "dog-leg" step [25]. This inequality is the main ingredient used to show that the sequence of gradients tends to zero for the modified Newton's method we are about to present. The reason for solving (2.1) rather than using the dog-leg step is that second order information is used to greater advantage. This will become evident as we present some very strong convergence results in § 4.

A particular method for obtaining numerical solutions to (2.1) will be suggested in § 5. For the moment we assume that a numerical solution $p$ to problem (2.1) can be obtained which satisfies:
(i) $(B + \lambda I)p = -g + \delta g$ with $B + \lambda I$ positive semidefinite,

(ii) $\|\delta g\| \le \varepsilon_1 \|g\|$,

(iii) $\big|\,\|p\| - \Delta\,\big| \le \varepsilon_2 \Delta$ when $\lambda > 0$, and $\|p\| \le (1 + \varepsilon_2)\Delta$ when $\lambda = 0$,

for some fixed $\varepsilon_1, \varepsilon_2 \in (0, 1)$ that are consistent with the finite precision arithmetic. The results of Lemma 2.8 imply that such a $p$ solves the modified problem

$\min\big\{f + \hat g^T w + \tfrac12 w^T B w : \|w\| \le \hat\Delta\big\},$

where $(1 - \varepsilon_2)\Delta \le \hat\Delta \le (1 + \varepsilon_2)\Delta$ and $\hat g = g + \delta g$ with $\|\delta g\| \le \varepsilon_1 \|g\|$ when $g \ne 0$. In our analysis we shall assume $\varepsilon_1 = \varepsilon_2 = 0$. A trivial but tedious modification of the analysis would apply to a computed step $p$ which satisfies the above criteria. This is primarily because the crucial inequality of Lemma 2.14 will become

(2.16)   $f - \psi(p) \ge \tfrac12 \|\hat g\| \min\Big(\hat\Delta,\ \dfrac{\|\hat g\|}{\|B\|}\Big) \ge \tfrac12 (1 - \varepsilon_1)\|g\| \min\Big((1 - \varepsilon_2)\Delta,\ (1 - \varepsilon_1)\dfrac{\|g\|}{\|B\|}\Big).$

It is straightforward to see that the inequality of (2.16) is sufficient for purposes of the ensuing analysis, but we wish to refrain from including such complicated expressions at each stage of the analysis.
3. A modified Newton iteration. A well-known method for solving the unconstrained minimization problem is Newton's method applied to finding a zero of the gradient of the objective function. However, this iteration is not suitable as a general algorithm without modification. The basic iteration is

(3.1)   $x_{k+1} = x_k - G_k^{-1}\nabla f(x_k),$

where an initial iterate $x_0$ must be specified, $\nabla f(x_k)$ is the gradient of $f$, and $G_k = \nabla^2 f(x_k)$ is the $n \times n$ (symmetric) Hessian matrix of mixed second partial derivatives of $f$. The algorithm we shall discuss will require that $f$ is twice differentiable at any point $x$ in the domain of $f$ and that these derivatives can be evaluated explicitly.
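For reference, the unmodified iteration (3.1) takes only a few lines; the sketch below and its example function are our own illustration, not the paper's.

```python
import numpy as np

def newton_min(grad, hess, x0, tol=1e-10, max_iter=50):
    """The unmodified iteration (3.1): x_{k+1} = x_k - G_k^{-1} grad f(x_k).
    No globalization: it may diverge, be attracted to a saddle point, or
    break down when G_k is singular -- the defects discussed below."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        gk = grad(x)
        if np.linalg.norm(gk) <= tol:
            break
        x = x - np.linalg.solve(hess(x), gk)   # fails if hess(x) is singular
    return x

# Example: f(x) = x1^4/4 + x2^2/2, grad f = (x1^3, x2), Hessian = diag(3*x1^2, 1).
print(newton_min(lambda x: np.array([x[0]**3, x[1]]),
                 lambda x: np.diag([3.0 * x[0]**2, 1.0]),
                 [1.0, 1.0]))
```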
There are three fundamental reasons why this basic method must be modified. First, the initial iterate may have to be very "close" to a local minimizer in order to be assured that the iteration will converge. Second, even if the iteration converges to a stationary value $x^*$ ($\nabla f(x^*) = 0$), there is no guarantee that $x^*$ will be a local minimizer. Third, the iterate $x_{k+1}$ may not be well defined by (3.1) if the Hessian $G_k$ is singular, or it may not be a sensible move if $G_k$ is indefinite. Our purpose here is to discuss certain theoretical properties of a modification of the basic iteration (3.1). Our approach is not a new one; however, we feel that the theoretical and numerical properties of the proposed method should be fully treated, and that is the main goal of this discussion. The method we shall consider is called the model trust region method. We have already mentioned the history of this approach. The main concern here is the implementation of this type of algorithm. Therefore, this discussion is intended to apply to several possible implementations. Specific implementations are presented in § 5.
Before the iteration is defined let us set out some of the properties desired of a modified Newton iteration:

(3.2)
a) For a sufficiently general class of functions the iteration should be well defined and convergent given any initial iterate $x_0$.
b) When the iteration converges to a point $x^*$, this point should satisfy as many necessary conditions for a minimizer as possible.
c) The modification should not detract from the local quadratic rate of convergence enjoyed by Newton's method.
d) The method should be invariant under linear affine scalings of the variables. That is, if we replace $f(x)$ by $\hat f(w) = f(Jw + z)$, where $J \in \mathbb{R}^{n\times n}$ is nonsingular and $w, z \in \mathbb{R}^n$, then applying the iteration to $\hat f$ with initial guess $w_0$ satisfying $x_0 = Jw_0 + z$ should produce a sequence $\{w_k\}$ related to the sequence $\{x_k\}$ by $x_k = Jw_k + z$, where $\{x_k\}$ is produced by applying the algorithm to $f$ with initial guess $x_0$.
The algorithm we are about to define will be shown to meet criteria a), b), c) for all practical purposes. The last criterion d) will be discussed in § 6. To begin we introduce a factorization of the Hessian matrix. For each $k$ we let

$B_k = J_k^T G_k J_k$

be a factorization of $G_k$ with $B_k = B_k^T \in \mathbb{R}^{n\times n}$ and $J_k$ nonsingular. Let $p$, $q$, $z$ be the number of positive, negative and zero eigenvalues respectively of the symmetric matrix $G_k$. The triple $(p, q, z)$ is called the inertia of $G_k$, and Sylvester's theorem [20, p. 377] shows that $B_k$ has the same inertia as $G_k$.
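Sylvester's law is easy to confirm numerically; the sketch below (our own illustration) compares the inertia of a symmetric $G$ with that of a random congruence $J^T G J$:

```python
import numpy as np

def inertia(M, tol=1e-10):
    """(number of positive, negative, zero eigenvalues) of a symmetric M."""
    e = np.linalg.eigvalsh(M)
    return (int(np.sum(e > tol)), int(np.sum(e < -tol)),
            int(np.sum(np.abs(e) <= tol)))

rng = np.random.default_rng(1)
G = rng.standard_normal((5, 5)); G = 0.5 * (G + G.T)   # symmetric G
J = rng.standard_normal((5, 5))                        # nonsingular with probability one
assert inertia(J.T @ G @ J) == inertia(G)              # Sylvester's law of inertia
```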
For each $k$, we put $\phi_k(w) = f(x_k + J_k w)$. This function $\phi_k(w)$ may be regarded as a locally scaled objective function. The first three terms of the Taylor series of $\phi_k$ about $w = 0$ will define a local quadratic model

$\psi_k(w) = f_k + g_k^T w + \tfrac12 w^T B_k w,$

where $g_k^T = \nabla f(x_k)^T J_k$ and $f_k = f(x_k) = \phi_k(0)$. Along with the local quadratic model we shall maintain a control parameter $\Delta_k > 0$ which defines a local region of trust $\{w : \|w\| \le \Delta_k\}$ where the model is considered valid. This parameter $\Delta_k$ will be revised during the iteration according to specific rules which are designed to force convergence of the iterates $\{x_k\}$.
We are potentially considering any symmetric factorization of the matrix $G_k$, but certain requirements should be kept in mind. For example, $g_k^T = \nabla f(x_k)^T J_k$ should be easily computed either explicitly or by solving $\nabla f(x_k)^T = g_k^T J_k^{-1}$. Also, it will be an advantage if the eigensystem of $B_k$ is relatively inexpensive to compute, or if the smallest eigenvalue and corresponding eigenvector(s) are easy to obtain. The reason for this is that the solution to problem (2.1) will play an important role in this iteration and, as we have seen, the eigensystem information may be required. This is especially true at points $x_k$ where $G_k$ is indefinite or singular.
Now we are ready to define the iteration.

ALGORITHM 3.3.
(1) Let $k = 1$, and let $0 < \eta_1 < \eta_2 < 1$, $0 < \gamma_1 < 1 < \gamma_2$ be prespecified constants;
(2) Let $x_1 \in \mathbb{R}^n$, $\Delta_1 > 0$ be given;
(3) If "convergence" then STOP;
(4) Evaluate $f_k := f(x_k)$; $G_k := \nabla^2 f(x_k)$;
    Factor $B_k := J_k^T G_k J_k$; Evaluate $g_k := J_k^T \nabla f(x_k)$;
(5) Compute a solution $w_k$ to the problem $\min\{\psi_k(w) : \|w\| \le \Delta_k\}$;
    Comment: $\psi_k(w) = f_k + g_k^T w + \tfrac12 w^T B_k w$;
(6) Put ared $:= \phi_k(0) - \phi_k(w_k)$; pred $:= \psi_k(0) - \psi_k(w_k)$;
    Comment: $\phi_k(w) = f(x_k + J_k w)$;
(7) If ared/pred $< \eta_1$ then begin $\Delta_k := \gamma_1 \Delta_k$; goto 5; end;
(8) If $\eta_1 \le$ ared/pred then
    1) $x_{k+1} := x_k + J_k w_k$;
    2) if ared/pred $> \eta_2$ then $\Delta_k := \gamma_2 \Delta_k$;
    3) $\Delta_{k+1} := \Delta_k$; $k := k + 1$;
(9) Go to 3;
end.
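The loop structure is compact enough to show in full. The following Python sketch is our own rendering of Algorithm 3.3 with the simplest choice $J_k = I$ (so $B_k = G_k$ and $g_k = \nabla f(x_k)$); it assumes a routine such as the `solve_tr_subproblem` sketched in § 2, or any routine meeting the criteria (i)–(iii) there.

```python
import numpy as np

def trust_region_newton(f, grad, hess, x, delta=1.0, tol=1e-8,
                        eta1=0.01, eta2=0.75, gamma1=0.5, gamma2=2.0,
                        max_iter=200):
    """Minimal sketch of Algorithm 3.3 with J_k = I."""
    x = np.array(x, dtype=float)
    for _ in range(max_iter):
        fk, gk, Bk = f(x), grad(x), hess(x)               # step 4
        if np.linalg.norm(gk) <= tol:                     # step 3: "convergence"
            break
        while True:
            w, lam = solve_tr_subproblem(Bk, gk, delta)   # step 5
            pred = -(gk @ w + 0.5 * w @ Bk @ w)           # psi_k(0) - psi_k(w_k)
            ared = fk - f(x + w)                          # phi_k(0) - phi_k(w_k)
            if ared >= eta1 * pred:                       # step 8: accept the step
                break
            delta *= gamma1                               # step 7: contract and retry
        if ared > eta2 * pred:                            # step 8.2: expand
            delta *= gamma2
        x = x + w                                         # step 8.1
    return x
```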
There are ways to update the value of $\Delta$ at step 7 and step 8.2 which make better use of the information available at the current iterate $x_k$. For example, the cubic polynomial that fits $\Phi_k(\alpha) = \phi_k(\alpha w_k)$ by interpolating $\Phi_k(0)$, $\Phi_k'(0)$, $\Phi_k''(0)$ and $\Phi_k(1)$ will have a minimum $\hat\alpha$ in $(0, 1)$ when the test at step 7 is passed. The region is contracted by setting $\gamma_1 = \hat\alpha$ if $\hat\alpha$ is not "too close" to 0 or 1. Details of this type of idea appear in [4], [8], [12], [18]. Similar ideas may be applied at step 8.2 to obtain an expansion factor $\gamma_2 \ge 1$ that depends upon available information. Other variations involving step 7 include accepting the predicted minimizer if $0 < \eta_0 \le$ ared/pred $\le \eta_1$ but reducing the trust region. The analysis we shall perform on Algorithm 3.3 can be adapted to cover these possibilities in a fairly straightforward way. However, the gain in generality will result in a substantial loss in clarity of exposition, so we shall analyze the simple choices set forth in Algorithm 3.3.
Finally, it should be pointed out that this iteration is well defined, because step 7 will produce a sufficiently small $\Delta_k$ to obtain ared/pred $> \eta_1$ after a finite number of steps, since the quadratic function $\psi_k(w)$ is defined by the first three terms of the Taylor series for $\phi_k(w)$. Our statement of the strategy is slightly different from the usual description in that $x_{k+1}$ is always different from $x_k$. By doing this we avoid having to distinguish between "successful" and "unsuccessful" iterates in the analysis. With this exception, the statement of the algorithm and the ensuing analysis are in the spirit of the paper presented by Powell [26]. Numerical schemes for producing the constrained quadratic minimization at step 5 will be presented in § 5.
4. Convergence of the modified Newton iteration. In this section we shall establish that some very strong convergence properties are possessed by Algorithm 3.3. The first result is a slight modification of Powell's result in [26]. Our proof is much simpler due to the fact that here second order information is explicitly available. Throughout the analysis the notation for a level set of $f$ is

$\mathscr{L}(\bar x) = \{x \in \mathbb{R}^n : f(x) \le f(\bar x)\}.$

THEOREM 4.1. Let $f : \mathbb{R}^n \to \mathbb{R}$ be bounded below and let $G(x) = \nabla^2 f(x)$ be continuous and satisfy $\|G(x)\| \le \beta$ for all $x \in \mathscr{L}(x_0)$. Let $\{x_k\} \subset \mathbb{R}^n$ be the sequence produced by Algorithm 3.3 applied to $f$ given starting value $x_0$. Assume $\|J_k\|, \|J_k^{-1}\| \le \sigma$, $k = 0, 1, 2, \ldots$, for some $\sigma \ge 1$. Then there is no constant $\varepsilon > 0$ such that $\|\nabla f(x_k)\| \ge \varepsilon$ for all $k$.

Proof. Assume there is an $\varepsilon > 0$ such that $\|\nabla f(x_k)\| \ge \varepsilon$ for all $k$. Since $g_k = J_k^T \nabla f(x_k)$,
we have $\|g_k\| \ge \|\nabla f(x_k)\|/\|J_k^{-1}\| \ge \varepsilon/\sigma \equiv \gamma > 0$. From step 8 of Algorithm 3.3 and from Lemma 2.14 we have

(4.2)   $f_k - f_{k+1} \ge \eta_1\big[\psi_k(0) - \psi_k(w_k)\big] \ge \dfrac{\eta_1}{2}\|g_k\| \min\Big(\Delta_k,\ \dfrac{\|g_k\|}{\|B_k\|}\Big) \ge \dfrac{\eta_1\gamma}{2}\min\Big(\Delta_k,\ \dfrac{\gamma}{\sigma^2\beta}\Big),$

where $\|B_k\| = \|J_k^T G_k J_k\| \le \sigma^2\beta$. Since $f$ is bounded below and $f_{k+1} < f_k$, $k = 0, 1, 2, \ldots$, we have $f_k - f_{k+1} \to 0$. Since $\|g_k\| \ge \gamma$, it follows from (4.2) that $\Delta_k \to 0$. However,

(4.3)   $\lambda^{(k)} \ge \dfrac{\gamma}{\Delta_k} - \sigma^2\beta$

is obtained from the inequality

$\|g_k\| = \|(B_k + \lambda^{(k)} I)w_k\| \le \|w_k\|\big(\|B_k\| + \lambda^{(k)}\big) \le \Delta_k\big(\sigma^2\beta + \lambda^{(k)}\big),$

where $\lambda^{(k)}$ is the multiplier associated with the solution to step 5 of Algorithm 3.3. Thus inequality (4.3) shows $\lambda^{(k)} \to +\infty$. Now, from Taylor's theorem and Lemma 2.4 it readily follows (for $k$ sufficiently large) that

(4.4)   $\left|\dfrac{\text{ared}(k)}{\text{pred}(k)} - 1\right| = \dfrac{\big|\int_0^1 w_k^T\big[B_k(\theta) - B_k\big]w_k (1-\theta)\,d\theta\big|}{\tfrac12 w_k^T(B_k + \lambda^{(k)} I)w_k + \tfrac12\lambda^{(k)} w_k^T w_k} \le \dfrac{1}{\lambda^{(k)}} \sup_{0\le\theta\le 1}\|B_k(\theta) - B_k\|,$

where $B_k(\theta) = J_k^T\big[G(x_k + \theta s_k)\big]J_k$ with $s_k = x_{k+1} - x_k$. Since $\lambda^{(k)} \to +\infty$, we obtain ared$(k)$/pred$(k) \to 1$, and thus the test at step 8 of Algorithm 3.3 is passed the first time through for all $k$ sufficiently large. This implies the existence of a $K > 0$ such that $\Delta_k \ge \Delta_K$ for all $k \ge K$. Therefore, the assumption $\|\nabla f(x_k)\| \ge \varepsilon$ for all $k$ has led to a contradiction. ∎
We remark that the continuity of $G(x)$ is only used to obtain the numerator on the right-hand side of (4.4), and that the theorem can also be established without this assumption. See Powell [26] for an example.
This result has shown that at least one subsequence of $\{x_k\}$ converges to a critical point of $f$. The next result, which is due to Thomas [29], will establish the much stronger fact that every accumulation point of the sequence $\{x_k\}$ is a critical point of $f$.
THEOREM 4.5. Let the hypotheses of Theorem 4.1 hold. Then $\lim_{k\to\infty}\|\nabla f(x_k)\| = 0$.

Proof. Suppose there is a subsequence $\{x_{k_j}\} \subset \{x_k\}$ such that $\|\nabla f(x_{k_j})\| \ge \varepsilon > 0$ for all $j = 1, 2, \ldots$. As in Theorem 4.1, this implies $\|g_{k_j}\| \ge \gamma \equiv \varepsilon/\sigma > 0$. Moreover, due to Theorem 4.1, we may select an integer $l_j$ corresponding to each $j$ such that

(4.6)   $l_j = \max\big\{l \in [k_j, k_{j+1}) : \|g_i\| \ge \gamma/2,\ i = k_j, \ldots, l\big\}.$

From inequality (4.2) we obtain that

(4.7)   $f_l - f_{l+1} \ge \dfrac{\eta_1\gamma}{4}\min\Big(\Delta_l,\ \dfrac{\gamma}{2\sigma^2\beta}\Big)$

will hold for all $k_j \le l \le l_j$, $j = 1, 2, \ldots$. From (4.7) it follows that

(4.8)   $f_{k_j} - f_{l_j+1} = \sum_{l=k_j}^{l_j} (f_l - f_{l+1}) \ge \dfrac{\eta_1\gamma}{4}\min\Big(\sum_{l=k_j}^{l_j} \Delta_l,\ \dfrac{\gamma}{2\sigma^2\beta}\Big).$

From inequality (4.8) it follows that $\|x_{k_j} - x_{l_j+1}\| \to 0$ as $j \to \infty$, because

(4.9)   $\|x_{k_j} - x_{l_j+1}\| \le \sum_{l=k_j}^{l_j} \|J_l w_l\| \le \sigma \sum_{l=k_j}^{l_j} \Delta_l$

and the right-hand side of (4.9) is forced to zero due to (4.8), since $f_{k_j} - f_{l_j+1} \to 0$. The uniform bound on $G(\cdot)$ implies the uniform continuity of $\nabla f(x)$ on $\mathscr{L}(x_0)$, and it follows that

$\|\nabla f(x_{k_j}) - \nabla f(x_{l_j+1})\| \le \varepsilon/4$

for all $j$ sufficiently large. Therefore, by the definition (4.6) of $l_j$,

$\varepsilon \le \|\nabla f(x_{k_j})\| \le \|\nabla f(x_{k_j}) - \nabla f(x_{l_j+1})\| + \|\nabla f(x_{l_j+1})\| \le \dfrac{\varepsilon}{4} + \sigma\|g_{l_j+1}\| < \dfrac{\varepsilon}{4} + \dfrac{\sigma\gamma}{2} = \dfrac{3\varepsilon}{4}$

for all $j$ sufficiently large. The assumption that $\|\nabla f(x_{k_j})\| \ge \varepsilon > 0$ has led to a contradiction and we must conclude that $\lim_{k\to\infty}\|\nabla f(x_k)\| = 0$. ∎
This result has established that every limit point of the sequence $\{x_k\}$ satisfies the first order necessary conditions for a minimum. Now we shall establish results which give added justification to the use of second order information when it is available. Several authors [13], [19], [22], [23] have proposed modified Newton methods which guarantee convergence to a critical point $x^*$ with the additional feature that the Hessian $G(x^*)$ be positive semidefinite. Thus second order necessary conditions for a minimum are satisfied by $x^*$. The following series of results shows that Algorithm 3.3 shares this property.
LEMMA 4.10. Let the hypotheses of Theorem 4.1 be satisfied. If $G(x)$ is uniformly continuous on $\mathscr{L}(x_{k_0})$, then there is no positive number $\Lambda > 0$ such that $\lambda^{(k)} \ge \Lambda$ for all $k \ge k_0$.

Proof. If $\lambda^{(k)} \ge \Lambda$, then $\|w_k\| = \Delta_k$, due to Lemma 2.4. We conclude from inequality (4.4) that

$\left|\dfrac{\text{ared}(k)}{\text{pred}(k)} - 1\right| \le \dfrac{1}{\Lambda}\sup_{0\le\theta\le 1}\|B_k(\theta) - B_k\|,$

where $B_k(\theta) = J_k^T\big[G(x_k + \theta s_k)\big]J_k$. Since pred$(k) \ge \tfrac12\lambda^{(k)}\Delta_k^2 \ge \tfrac12\Lambda\Delta_k^2$, it follows that $\Delta_k \to 0$ because pred$(k) \to 0$. Now the boundedness of $\|J_k\|$, $\|J_k^{-1}\|$, together with the uniform continuity of $G(x)$ on $\mathscr{L}(x_{k_0})$, gives

$\sup_{0\le\theta\le 1}\|B_k(\theta) - B_k\| \to 0.$

We must conclude, as in the proof of Theorem 4.1, that there exists a $K > 0$ such that $\Delta_k \ge \Delta_K$ for all $k \ge K$. This contradiction establishes the result. ∎

Since $\lambda_1^{(k)} \ge -\lambda^{(k)}$, where $\lambda_1^{(k)}$ is the smallest eigenvalue of $B_k$, the next theorem follows easily from the boundedness of $\|J_k\|$, $\|J_k^{-1}\|$ together with Lemma 4.10.
THEOREM 4.11. Let the hypotheses of Lemma 4.10 be satisfied. If the sequence $\{x_k\}$ is convergent to a limit $x^*$, say, then $\nabla f(x^*) = 0$ and $G(x^*)$ is positive semidefinite.
At this point we should remark that failure of this iteration to converge would require an extremely pathological situation. A moment's reflection will convince the reader that every limit point of the sequence $\{x_k\}$ must be a critical point of $f$, and $f$ must have the same value at each of these critical points. Moreover, at least one of these critical points has a positive semidefinite Hessian.

The next result shows that every limit point of the sequence $\{x_k\}$ satisfies the second order necessary optimality conditions under the stronger assumption of a bounded level set containing finitely many critical points.
THEOREM 4.12. Let the hypotheses of Theorem 4.1 hold. Moreover, assume $\mathscr{L}(x_0)$ is bounded and that there are only finitely many critical points of $f$ in $\mathscr{L}(x_0)$. If $x^*$ is any limit point of the sequence $\{x_k\}$ then $\nabla f(x^*) = 0$ and $G(x^*)$ is positive semidefinite.

Proof. Let $x^*$ be a limit point of $\{x_k\}$ with $x_{k_j} \to x^*$ a convergent subsequence. Suppose that a subsequence of $\{x_{k_j+1}\}$ converges to $z^*$, with $s^* = z^* - x^* \ne 0$. Then the corresponding subsequence of $\lambda^{(k_j)}$ converges to zero, since for every $k$

$\text{ared}(k) \ge \eta_1\,\text{pred}(k) = \dfrac{\eta_1}{2}\big[w_k^T(B_k + \lambda^{(k)} I)w_k + \lambda^{(k)}\|w_k\|^2\big].$

Let $J_*^T G(x^*) J_* = B^*$. This inequality shows that the corresponding subsequence of $B_k + \lambda^{(k)} I$ converges to $B^*$. Thus $B^*$ is positive semidefinite and singular, since $w_*^T B^* w_* = 0$, where $J_* w_* = s^*$.

Now, let $x_1^* = x^*$ and $x_i^*$, $2 \le i \le m$, be the finitely many limit points of the sequence $\{x_k\}$. Since there are only a finite number of points $x_i^*$ and since the sequence $\{x_k\}$ is bounded, it is possible to partition the sequence $\{x_k\}$ into the disjoint union of subsequences $\Lambda_i = \{x_{k(i,j)}\}$, $1 \le i \le m$, such that

(1) $x_{k(i,j)} \to x_i^*$;
(2) $k(i, j+1) > k(i, j)$, for $1 \le i \le m$;
(3) $\{x_k\} = \bigcup_{i=1}^m \Lambda_i$;
(4) $\Lambda_i \cap \Lambda_l = \varnothing$ for $i \ne l$.

Define $\mathscr{K}_i = \{j : x_{k(1,j)+1} \in \Lambda_i\}$, $2 \le i \le m$. If some $\mathscr{K}_i$ is an infinite set then $G(x^*)$ is positive semidefinite and singular due to the first part of this proof. If, on the other hand, every $\mathscr{K}_i$ is finite then there is an $N$ such that $j > N$ implies $x_{k(1,j)+1} \in \Lambda_1$. Therefore, $x_{k(1,j)+1} = x_{k(1,j+1)}$. A simple induction gives $x_{k(1,N+l)} = x_{k(1,N)+l}$ for all positive integers $l$. We must conclude that, in fact, $x_k \to x^*$, and that $G(x^*)$ is positive semidefinite from Lemma 4.10. ∎
COROLLARY 4.13. Let the hypotheses of Theorem 4.12 be satisfied. If $x^*$ is a limit point of $\{x_k\}$ and $G(x^*)$ is positive definite then the entire sequence converges to $x^*$.

Proof. The proof of Theorem 4.12 shows that if there is more than one limit point of the sequence $\{x_k\}$ then $G(x^*)$ must be singular. ∎
It would be more desirable to obtain a result that would ensure convergence of the sequence $\{x_k\}$ without assuming a subsequence converges to a strong local minimum. However, just extending this argument to the case of an isolated local minimum with singular Hessian would be difficult, since one can no longer rely on the Newton step. Our final result will show, in conjunction with Corollary 4.13, that if there is a subsequence which converges to a strong local minimum then the entire sequence converges and, ultimately, the rate of convergence is quadratic.
THEOREM 4.14. Let the hypotheses of Theorem 4.1 be satisfied. Suppose further that $x_k \to x^*$ with $G(x^*)$ positive definite and

(4.15)   $\|G(x) - G(x^*)\| \le L\|x - x^*\|$

for all $x$ in some neighborhood of $x^*$. Then there is a constant $\kappa > 0$ such that

$\|x_{k+1} - x^*\| \le \kappa\|x_k - x^*\|^2$

for all $k$ sufficiently large.

Proof. Since $x_k \to x^*$ and $G(x^*)$ is positive definite, it follows from continuity that there are positive constants $\beta_1$, $\beta_2$ such that $\beta_1\|y\|^2 \le y^T G(x_k) y \le \beta_2\|y\|^2$ for all $y$, for $k$ sufficiently large. Thus $|\text{pred}(k)| \ge \tfrac12 s_k^T G(x_k) s_k \ge \tfrac12 \beta_1 \|s_k\|^2$, and

$|\text{ared}(k) - \text{pred}(k)| \le \|s_k\|^2 \int_0^1 \|G_k(\theta) - G_k\|(1-\theta)\,d\theta,$

where $G_k(\theta) = G(x_k + \theta s_k)$ with $s_k = x_{k+1} - x_k$. It follows easily from (4.15) that

$\left|\dfrac{\text{ared}(k)}{\text{pred}(k)} - 1\right| \to 0$ as $k \to \infty$,

and we must conclude that there is some $K > 0$ such that $\Delta_k \ge \Delta_K$ for all $k \ge K$. Since $x_k \to x^*$ with $\nabla f(x^*) = 0$, it follows that eventually $\|G(x_k)^{-1}\nabla f(x_k)\| < \Delta_K/\sigma$, so the Newton step is accepted for all $k$ sufficiently large. Hence the tail of the sequence $\{x_k\}$ is the unmodified Newton iteration, which is quadratically convergent to $x^*$ since $G(x^*)$ is positive definite [24, p. 421]. ∎
While these results hold little computational meaning in the presence of roundoff
error, it is satisfying to have established such strong results about the iteration. This
is especially true since the method has such an intuitive appeal. Our aim in this section
has been to establish these theoretical results in a framework that is general enough
to encompass many possible implementations. We shall consider some of these
implementations in the next section.
5. Implementation. Numerical performance of the algorithm described in § 3 and analyzed in § 4 is obviously going to depend upon a careful implementation of the locally constrained minimization of the quadratic model. In § 2 we pointed out several theoretical facts that indicate great care should be exercised in this computation. In this section we shall put forth several possible implementations. Each of these will have certain advantages and disadvantages depending upon the nature of the optimization problem at hand. The convergence theory provided in § 4 was purposely made sufficiently general to apply to all of the alternative implementations to be presented here.

Our main concern is to provide an efficient and stable method for the solution of problem (2.1). To this end we consider factorizations

$B = J^T G J$

of the symmetric $n \times n$ matrix $G$. We are assuming that $\|J\|, \|J^{-1}\| \le \sigma$, where $\sigma > 1$ is some fixed number that is independent of $G$. Recall that the matrix $B$ is also symmetric and must have the same inertia as $G$. Some specific examples are: (a) $J$ orthogonal and $B$ diagonal; (b) $J$ orthogonal and $B$ tridiagonal; (c) $J^T = L^{-1}P$, where $L$ is unit lower triangular, $P$ is a permutation matrix and $B$ is either tridiagonal [1] or block diagonal with $1 \times 1$ or $2 \times 2$ diagonal blocks [2]. We shall also consider the case when $J$ is just a diagonal nonsingular matrix.

If the eigensystem of $B$ is easily obtained (i.e., in case (a), or case (c) when $B$ is block diagonal) then we are able to solve problem (2.1) directly by solving the nonlinear equation (2.12) for the largest root and then constructing a solution to (2.1) using formula (2.13). This method of solution has the particular advantage that the case when $g \in S_1^\perp$ is explicitly revealed. (Recall $S_1 = \{q \in \mathbb{R}^n : Bq = \lambda_1 q\}$, where $\lambda_1$ is the smallest eigenvalue of $B$.)

A disadvantage of using factorization (a) is that it is relatively expensive to compute. One of the reasons for introducing generality into the model trust region calculation was to allow use of the Bunch–Parlett factorization [2]. This factorization is very efficient due to the fact that symmetry is exploited. The matrix $B$ for this factorization has an eigensystem that is easily computed. Moreover, the matrices $J$ satisfy the criteria $\|J\|, \|J^{-1}\| \le \sigma$, so in theory all of the results of § 4 apply. There may be some cause for concern regarding the effect of the transformation $J$ on the descent direction, because the triangular coordinate system may be very skewed even though the matrix $J$ is well conditioned.
Nevertheless, our main concern with either of these factorizations is the efficient and reliable solution of an equation of the form

(5.1)   $\|(B + \alpha I)^{-1} g\| = \Delta$

for the largest root $\alpha$. The left-hand side of (5.1) is precisely the form of $\phi(\alpha) = \|(B + \alpha I)^{-1} g\|$ in (2.12), regardless of whether or not $B$ is diagonal. Several authors [15], [21], [27] discuss the solution of equations that closely resemble (5.1). The key observation is that Newton's method, which is based on a local linear approximation to $\phi(\alpha)$, is not likely to be the best method for solving (5.1) because the rational structure of $\phi^2(\alpha)$ is ignored. Instead, an iteration for solving (5.1) can be derived based upon a local rational approximation to $\phi$. The iteration is obtained by requiring $\hat\phi(\alpha) = \gamma/(\beta + \alpha)$ to satisfy

$\hat\phi(\alpha) = \phi(\alpha), \qquad \hat\phi'(\alpha) = \phi'(\alpha),$

where we regard $\alpha$ as the current approximation to the root $\lambda$. This approximation is then improved by solving for an $\bar\alpha$ that satisfies $\hat\phi(\bar\alpha) = \Delta$. The resulting iteration is

(5.2)   $\bar\alpha = \alpha - \dfrac{\phi(\alpha)}{\phi'(\alpha)} \cdot \dfrac{\phi(\alpha) - \Delta}{\Delta}.$

If the form of $\phi(\alpha)$ is known explicitly then it is straightforward to safeguard (5.2). The local rate of convergence of this iteration is quadratic, but the most important feature of (5.2) is that usually the number of iterations required to produce an acceptable approximation to $\lambda$ is very small, because the iteration is based upon the rational structure of $\phi^2$.
Iteration (5.2) can be implemented without explicit knowledge of the eigensystem of $B$. This important observation, which is due to Hebden [15], makes it possible to implement (5.2) merely by solving linear systems with $B + \alpha I$ as the coefficient matrix. This is easy to see since $\phi(\alpha) = \|p_\alpha\|$ and $\phi'(\alpha) = -(1/\phi(\alpha))\,p_\alpha^T(B + \alpha I)^{-1}p_\alpha$, where $(B + \alpha I)p_\alpha = -g$. Hebden [15] suggests a way to obtain $\alpha > -\lambda_1$ during the process of attempting to compute the Cholesky factorization of $B + \alpha I$. This is discussed in more detail by Gay in [10], where the difficult case $g \in S_1^\perp$ is addressed. Within this context we could allow $J$ to be taken as a nonsingular diagonal matrix for scaling purposes. Moré has used this idea in his adaptation of Hebden's work to the nonlinear least squares problem [21]. The result of Moré's work is a very elegant robust algorithm for nonlinear least squares. In [21] careful attention is paid to safeguarding the step calculation. The safeguarding task is somewhat more difficult in the present setting due to the fact that $B$ may have negative eigenvalues. The essential difficulty seems to stem from the fact that without explicit knowledge of the eigensystem it is difficult to detect the case $g \in S_1^\perp$. Moreover, it seems to be necessary to have an estimate of the smallest eigenvalue and a corresponding eigenvector in order to obtain a solution to (2.1) in case $g \in S_1^\perp$ (see (2.13)). This was recognized by Hebden but he did not provide a suitable solution. Gay [10] suggests obtaining an eigenvector using inverse iteration if the case $g \in S_1^\perp$ is detected, because a factorization of the (nearly) singular matrix $B + \lambda I$ will be available.
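Hebden's observation reduces each step of (5.2) to two triangular solves against one Cholesky factorization. The sketch below is our own rendering of that idea; as noted above, the safeguarding needed to keep $\alpha > -\lambda_1$ (and to detect $g \in S_1^\perp$) is deliberately omitted.

```python
import numpy as np

def hebden(B, g, delta, alpha, n_steps=10):
    """Iteration (5.2) using only Cholesky solves with B + alpha*I, no
    eigensystem.  The starting alpha must satisfy alpha > -lambda_1."""
    n = len(g)
    for _ in range(n_steps):
        L = np.linalg.cholesky(B + alpha * np.eye(n))   # B + alpha*I = L L^T
        p = np.linalg.solve(L.T, np.linalg.solve(L, -g))
        q = np.linalg.solve(L, p)
        phi = np.linalg.norm(p)                         # phi(alpha) = ||p_alpha||
        dphi = -(q @ q) / phi                           # phi'(alpha)
        alpha -= (phi / dphi) * (phi - delta) / delta   # iteration (5.2)
    return alpha, p
```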
Here we suggest an alternative to the methods which have been proposed previously. In the following we are considering $J$ to be a diagonal nonsingular matrix. Let us return to the derivation of iteration (5.2). Another way to obtain this iteration is to apply Newton's method to the problem

(5.3)   $\dfrac{1}{\Delta} - \dfrac{1}{\phi(\alpha)} = 0.$

From this observation we can see that iteration (5.2) is closely related to Newton's method applied to the problem

(5.4)   $r(p, \alpha) = 0, \qquad \dfrac{1}{\Delta} - \dfrac{1}{\phi(\alpha)} = 0,$

where we use the notation $r(p, \alpha) = B_\alpha p + g$ with $B_\alpha = B + \alpha I$. There is a serious disadvantage to this iteration when $g \in S_1^\perp$ or nearly so. This is because the Jacobian of (5.4) is

(5.5)   $\begin{pmatrix} B_\alpha & p \\ 0^T & \phi'(\alpha)/\phi^2(\alpha) \end{pmatrix}$

and this matrix is singular at a solution $\lambda$, $p_\lambda$ of (2.1) in the sensitive case $g \in S_1^\perp$, $\|B_\lambda^\dagger g\| < \Delta$, where $\lambda = -\lambda_1$. Of course, this situation impairs the local rate of convergence. Moreover, as the iteration converges to such a solution the method requires solving linear systems which have increasingly ill-conditioned coefficient matrices.

As an alternative, we suggest removing the explicit dependence of $\phi(\alpha)$ on the variable $\alpha$ in (5.4). Instead of (5.4) we shall apply Newton's method to solve

(5.6)   $B_\alpha p + g = 0, \qquad \tfrac12\big(p^T p - \Delta^2\big) = 0.$

Due to Lemma 2.8 a solution $\alpha = \lambda$, $p = p_\lambda$ to (5.6) provides a solution to problem (2.1) whenever $B_\lambda$ is positive (semi)definite and $\lambda \ge 0$. The Jacobian of (5.6) is

(5.7)   $\begin{pmatrix} B_\alpha & p \\ p^T & 0 \end{pmatrix}$

and this matrix is nonsingular at a solution to (2.1) in the cases that are most likely to occur. This is important since it follows that Newton's method applied to (5.6) will usually enjoy a quadratic rate of convergence. A precise statement of when (5.7) is nonsingular at a solution is given in the following lemma.
LEMMA 5.7. Let $\alpha = \lambda \ge 0$, $p = p_\lambda \ne 0$ be a solution to problem (5.6) with $B_\lambda$ positive semidefinite. If $B_\lambda$ is positive definite, or if $\|B_\lambda^\dagger g\| < \Delta$ and $\dim(S_1) = 1$, then the Jacobian matrix (5.7) is nonsingular.

Proof. Let $p = p_\lambda$, $\alpha = \lambda$. It is sufficient to show

(5.8)   $\begin{pmatrix} B_\lambda & p \\ p^T & 0 \end{pmatrix}$

is nonsingular. Suppose that

(5.9)   $\begin{pmatrix} B_\lambda & p \\ p^T & 0 \end{pmatrix}\begin{pmatrix} z \\ \xi \end{pmatrix} = 0.$

Then

(i) $B_\lambda z + p\xi = 0$ and (ii) $p^T z = 0$.

These two equations imply $z^T B_\lambda z = 0$. Therefore, either $z = 0$, or $B_\lambda$ is singular and $z \in S_1$. Both of these possibilities imply $\xi = 0$, since $p \ne 0$ and $B_\lambda z = 0$. Thus, when $B_\lambda$ is positive definite the only solution to (5.9) is $z = 0$, $\xi = 0$, so (5.8) is nonsingular. If, on the other hand, $B_\lambda$ is singular and $\|B_\lambda^\dagger g\| < \Delta$, then $p = -B_\lambda^\dagger g + \theta q$ with $q \in S_1$, $\|q\| = 1$, and $\theta \ne 0$. Since $\dim(S_1) = 1$ it follows that $z = \gamma q$, and thus $0 = z^T p = \theta\gamma$, which shows $\gamma = 0$. Again we conclude that (5.9) has only the trivial solution, so (5.8) is nonsingular. ∎
The basic iteration (without safeguards) for solving (5.6) will be given now. The details of various suggested implementations will follow.

ALGORITHM 5.10.
(1) Obtain an initial guess $p_0$ and $\alpha_0$ such that $B_0 = B_{\alpha_0}$ is positive definite;
(2) for $k = 0, 1, 2, \ldots$
    1) $r_k := B_k p_k + g$;  $\rho_k := \tfrac12\big(p_k^T p_k - \Delta^2\big)$;
    2) Solve $\begin{pmatrix} B_k & p_k \\ p_k^T & 0 \end{pmatrix}\begin{pmatrix} \delta p \\ \delta\alpha \end{pmatrix} = -\begin{pmatrix} r_k \\ \rho_k \end{pmatrix}$;
    3) $p_{k+1} := p_k + \delta p$;  $\alpha_{k+1} := \alpha_k + \delta\alpha$;  $B_{k+1} := B + \alpha_{k+1} I$.
We must address several computational questions concerning this iteration. These include what initial guess should be used, how to solve the linear systems at step 2.2, how to safeguard the basic iteration, and finally how to stop the iteration.

First of all we shall discuss some methods for solving the linear system at step 2.2. For matrices $B$ that are of moderate size and those which have no particular structure we recommend the following. Compute an orthogonal matrix $Q$ through a product of Householder transformations such that

(5.11)   $\begin{pmatrix} Q & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} B & p \\ p^T & 0 \end{pmatrix}\begin{pmatrix} Q^T & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} T & \tau e_n \\ \tau e_n^T & 0 \end{pmatrix},$

where $T$ is tridiagonal, $Qp = \tau e_n$, and $e_n^T = (0, \ldots, 0, 1)$. The details of this factorization are given in Algorithm 1.2 of Stewart's book [28]. Initially this factorization is more expensive than some alternatives (such as the Bunch–Kaufman [3] factorization).
However, it presents several advantages as we shall see. First of all, since $T = QBQ^T$ has the same eigenvalues as $B$, we can easily compute a Sturm sequence for $T$ to tell very reliably whether or not $T$ is positive definite [30]. Moreover, since good upper and lower bounds for the smallest eigenvalue of $T$ are available, a good safeguarding scheme can be obtained. After applying transformation (5.11) to the linear system at step 2.2 of Algorithm 5.10, a solution can be obtained using ordinary Gaussian elimination with partial pivoting. It is preferable to ignore symmetry in this case for the same reason it is preferable in the case of inverse iteration for the computation of an eigenvector. See Wilkinson [30] for more detail. A more important observation to make here is that the iteration in Algorithm 5.10 is invariant under transformation (5.11). Once the correction $\delta\hat p = Q\,\delta p$ is obtained we have the updated matrix

(5.12)   $\begin{pmatrix} T + \alpha_{k+1} I & Q p_{k+1} \\ p_{k+1}^T Q^T & 0 \end{pmatrix}.$

The form of the matrix on the right-hand side of (5.12) is tridiagonal in its leading $n \times n$ block, bordered by a full last row and column; for example,

(5.13)
x x . . . x
x x x . . x
. x x x . x
. . x x x x
. . . x x x
x x x x x 0

where the x's denote nonzero elements. Gaussian elimination with partial pivoting preserves this structure if the pivots are taken from the tridiagonal part until the very last elimination step. The result of this strategy applied to (5.13) will be of the form

(5.14)
x x + . . x
m x x + . x
. m x x + x
. . m x x x
. . . m x x
m m m m m x

where the m's denote multipliers which have overwritten the original matrix elements, and the +'s denote possible fill-in due to pivoting.

With this scheme only one expensive factorization is required. The rest of the iteration is performed under the transformation (5.11), and only after convergence to a vector $\hat p$ is obtained do we transform back to get $p = Q^T \hat p$ as a solution to problem (2.1).
Since the factorization given in (5.11) is roughly four times as expensive as a Cholesky factorization, we might wish to consider the following alternative scheme. The system at step 2.2 of Algorithm 5.10 is equivalent (via symmetric permutation) to one of the form

(5.15)   $\begin{pmatrix} 0 & p^T \\ p & B_\alpha \end{pmatrix}\begin{pmatrix} \delta\alpha \\ \delta p \end{pmatrix} = -\begin{pmatrix} \rho \\ r \end{pmatrix}.$

Use a single Householder transformation $Q_1$, chosen so that $Q_1 p = \tau e_1$ with $\tau = \|p\|$, to obtain

(5.16)   $\begin{pmatrix} 1 & 0 \\ 0 & Q_1 \end{pmatrix}\begin{pmatrix} 0 & p^T \\ p & B_\alpha \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & Q_1^T \end{pmatrix} = \begin{pmatrix} 0 & \tau & 0 \\ \tau & \theta & v^T \\ 0 & v & \hat B \end{pmatrix},$

where

$\begin{pmatrix} \theta & v^T \\ v & \hat B \end{pmatrix} = Q_1 B_\alpha Q_1^T$,

with $v \in \mathbb{R}^{n-1}$, $\theta \in \mathbb{R}$, $\hat B \in \mathbb{R}^{(n-1)\times(n-1)}$. The matrix on the right of (5.16) has the exact factorization

(5.17)   $\begin{pmatrix} E & C^T \\ C & \hat B \end{pmatrix} = \begin{pmatrix} I & 0 \\ CE^{-1} & I \end{pmatrix}\begin{pmatrix} E & 0 \\ 0 & \hat B \end{pmatrix}\begin{pmatrix} I & E^{-1}C^T \\ 0 & I \end{pmatrix}$, where $E = \begin{pmatrix} 0 & \tau \\ \tau & \theta \end{pmatrix}$, $C = (0 \ \ v)$,

since the Schur complement is exactly $\hat B$ (the $(2,2)$ entry of $E^{-1}$ is zero). The eigenvalues of $\hat B$ separate the eigenvalues of $B$, so $\hat B$ is positive definite when $B$ is. Moreover, $\hat B$ is better conditioned than $B$ whenever the separation is strict. (For a proof of separation see Wilkinson [30, pp. 95–104].) A solution to (5.15) is now possible using a Cholesky factorization of $\hat B$ together with factorization (5.17). The purpose of arranging the calculation this way is to avoid "pivoting" on the matrix $B$, which is the essential result of factoring forward at step 2.2 of Algorithm 5.10.
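Since the factorization (5.17) as reconstructed here determines the whole solve, a compact sketch is possible. The routine below is our own illustration under that reconstruction (names hypothetical, no safeguards): one Householder reflection, a 2×2 solve with $E$, and a Cholesky factorization of $\hat B$.

```python
import numpy as np

def solve_bordered(B_alpha, p, rho, r):
    """Solve the permuted system (5.15) via one Householder reflection and
    the block factorization (5.17).  Assumes p != 0 and B-hat positive
    definite.  Returns (d_alpha, d_p)."""
    n = len(p)
    u = p.astype(float).copy()
    u[0] += np.copysign(np.linalg.norm(p), p[0])     # Householder vector
    Q1 = np.eye(n) - 2.0 * np.outer(u, u) / (u @ u)  # Q1 p = tau * e_1
    tau = (Q1 @ p)[0]
    M = Q1 @ B_alpha @ Q1.T
    theta, v, Bhat = M[0, 0], M[1:, 0], M[1:, 1:]

    E = np.array([[0.0, tau], [tau, theta]])         # leading 2x2 block
    C = np.column_stack([np.zeros(n - 1), v])        # coupling block (0 v)
    rhs = -np.concatenate([[rho], Q1 @ r])

    a_bot = rhs[2:] - C @ np.linalg.solve(E, rhs[:2])    # forward substitution
    L = np.linalg.cholesky(Bhat)                         # Schur complement = B-hat
    z_bot = np.linalg.solve(L.T, np.linalg.solve(L, a_bot))
    z_top = np.linalg.solve(E, rhs[:2]) - np.linalg.solve(E, C.T @ z_bot)
    return z_top[0], Q1.T @ np.concatenate([[z_top[1]], z_bot])
```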
This second scheme is much better suited to the problem of obtaining an initial guess $\alpha_0$, $p_0$ at step 1 of Algorithm 5.10. If $B$ is positive definite then we want to compute $p = -B^{-1}g$ and check to see if $\|p\| \le \Delta$. If $\|p\| > \Delta$ then take $p_0 = (\Delta/\|p\|)p$. Thus, it will be advantageous to attempt the computation of the Cholesky factorization of $B$. If $B$ is not positive definite then we should compute $\alpha_0$ so that $B_{\alpha_0}$ is positive definite and then take

$p_0 = -\dfrac{\Delta}{\|B_{\alpha_0}^{-1}g\|}\,B_{\alpha_0}^{-1}g$

as an initial guess. Various schemes for computing $\alpha_0$ are possible. See Gay [10], for example.

Safeguarding this iteration is possible. At present several schemes are being considered, but none of these are elegant. Therefore we shall postpone discussion of safeguarding at this time.
The decision to stop the iteration should be based upon the following tests. Require $p_{k+1}$ to satisfy:

(a) $B_{k+1} := B + \alpha_{k+1} I$ positive semidefinite;

(b) $\|\delta\alpha\,\delta p\| \le \varepsilon_1\|g\|$, where $\delta\alpha = \alpha_{k+1} - \alpha_k$, $\delta p = p_{k+1} - p_k$;

(c) $\big|\,\|p_{k+1}\| - \Delta\,\big| \le \varepsilon_2\Delta$.

Note (from step 2.2 of Algorithm 5.10) that $(B + \alpha_{k+1} I)p_{k+1} = -g + \delta\alpha\,\delta p$. Therefore, if $g \ne 0$ then conditions (a) and (b) together with Lemma 2.8 imply that $p_{k+1}$ solves

$\min\big\{f + \hat g^T w + \tfrac12 w^T B w : \|w\| \le \hat\Delta\big\},$

where $(1 - \varepsilon_2)\Delta \le \hat\Delta \le (1 + \varepsilon_2)\Delta$ and $\hat g = g + \delta g$ with $\|\delta g\| \le \varepsilon_1\|g\|$, and $\|p_{k+1}\| = \hat\Delta$. On the other hand, if $g = 0$ then $p_{k+1}$ will be an approximate eigenvector for $B$ which satisfies

$\|(B + \alpha_{k+1} I)p_{k+1}\| = \|\delta\alpha\,\delta p\|,$

with $\alpha_{k+1}$ an approximation to $-\lambda_1$. Thus, $\varepsilon_1 > 0$ should be taken quite small and $\varepsilon_2 > 0$ moderately small.

When these stopping rules are in effect the remarks at the end of § 2 will apply. Therefore, the analysis of § 4 will apply to the modified Newton iteration when the step is computed in the way described here.
6. Conclusions. The main purpose of this work has been to discuss the theory of the model trust region modification of Newton's method with an aim towards understanding the best way to implement it. Because of this goal we introduced sufficient generality into the analysis so that it would apply to many possible implementations based upon various factorizations of the Hessian matrix. Results similar to the second order properties given in § 4 have been stated without proof by Gay in [11]. Very recently Fletcher [7, pp. 78–80] gave a proof of the existence of an accumulation point of $\{x_k\}$ which satisfies first and second order necessary conditions. He also showed that if this accumulation point satisfies second order sufficiency conditions then the sequence $\{x_k\}$ will converge. The results given here are stronger, and we have also introduced the more general analysis which allows for many variations on a practical implementation of the method. It is of particular interest to give the strongest possible results, because no proof has been given that ensures convergence of the entire sequence (unless we make the assumption of a nonsingular Hessian at any critical point). This is despite the fact that the situation would have to be extremely pathological, even in theory, for convergence not to occur.

The basic ideas for possible implementations we have set forth in § 5 are new alternatives which have been directed towards overcoming the theoretical difficulties of the locally constrained quadratic minimization discussed in § 2. In particular, we considered using the Bunch–Parlett factorization, and we also considered basing our method of solution on a more properly posed problem. It will be interesting to examine the behavior of these implementations in practice.

Finally, we have not overcome the problem of invariance under linear affine scalings of the variables. There is sufficient generality in the method to introduce uniformly bounded diagonal scalings of the variables. Ways to choose these scalings have been discussed by Fletcher [5], Gay [11] and Moré [21]. It is most appropriate to note here that the reason is that our method of proof of convergence is essentially based upon not doing worse than steepest descent at any step, and this introduces a term that makes calculation of the step scale dependent. Nevertheless, we expect good performance on practical problems, especially in the case that the variables can be well scaled.
7. Acknowledgment. Much of this work was done originally in 1977 at the Applied Mathematics Division of Argonne National Laboratory. Since that time it has undergone several revisions due to the encouragement of Jorge Moré. It is a pleasure to take this opportunity to thank him for the numerous discussions we have had over this work and to acknowledge what a valuable contribution his advice has been. I would also like to thank Roger Fletcher for several penetrating discussions concerning the material in § 5. In particular, the factorization (5.17) came directly from one of these discussions. Finally I would like to thank the referee for a very prompt, careful reading of the manuscript.

REFERENCES

[1] J. O. AASEN, On the reduction of a symmetric matrix to tridiagonal form, BIT, 11 (1971), pp. 233–242.
[2] J. R. BUNCH AND B. N. PARLETT, Direct methods for solving symmetric indefinite systems of linear equations, this Journal, 8 (1971), pp. 639–655.
[3] J. R. BUNCH AND L. KAUFMAN, Some stable methods for calculating inertia and solving symmetric linear systems, Math. Comp., 31 (1977), pp. 163–179.
[4] A. V. FIACCO AND G. P. MCCORMICK, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, John Wiley, New York, 1968.
[5] R. FLETCHER, A modified Marquardt subroutine for nonlinear least squares, Atomic Energy Research Establishment Report R6799, Harwell, England, 1971.
[6] R. FLETCHER AND T. L. FREEMAN, A modified Newton method for minimization, J. Optim. Theory Appl., 23 (1977), pp. 357–372.
[7] R. FLETCHER, Practical Methods of Optimization, Vol. 1, John Wiley, New York, 1980.
[8] G. E. FORSYTHE AND G. H. GOLUB, On the stationary values of a second-degree polynomial on the unit sphere, J. Soc. Indust. Appl. Math., 13 (1965), pp. 1050–1068.
[9] W. GANDER, On the linear least squares problem with a quadratic constraint, Dept. Comp. Sci. Rept. STAN-CS-78-697, Stanford University, Stanford, CA, 1978.
[10] D. M. GAY, Computing optimal locally constrained steps, MRC Rept. 2013, Mathematics Research Center, Univ. of Wisconsin, Madison, October 1979.
[11] ——, On solving robust and generalized linear regression problems, MRC Rept. 2000, Mathematics Research Center, Univ. of Wisconsin, Madison, September 1979.
[12] P. E. GILL AND W. MURRAY, Newton type methods for unconstrained and linearly constrained optimization, Math. Prog., 7 (1974), pp. 311–350.
[13] D. GOLDFARB, Curvilinear path steplength algorithms for minimization which use directions of negative curvature, Math. Prog., 18 (1980), pp. 31–40.
[14] S. M. GOLDFELD, R. E. QUANDT AND H. F. TROTTER, Maximization by quadratic hill-climbing, Econometrica, 34 (1966), pp. 541–551.
[15] M. D. HEBDEN, An algorithm for minimization using exact second derivatives, Atomic Energy Research Establishment Report TP515, Harwell, England, 1973.
[16] S. KANIEL AND A. DAX, A modified Newton's method for unconstrained minimization, this Journal, 16 (1979), pp. 324–331.
[17] K. LEVENBERG, A method for the solution of certain nonlinear problems in least squares, Quart. Appl. Math., 2 (1944), pp. 164–168.
[18] D. W. MARQUARDT, An algorithm for least squares estimation of nonlinear parameters, SIAM J. Appl. Math., 11 (1963), pp. 431–441.
[19] G. MCCORMICK, A modification of Armijo's step-size rule for negative curvature, Math. Prog., 13 (1977), pp. 111–115.
[20] L. MIRSKY, An Introduction to Linear Algebra, Oxford University Press (Clarendon), London and New York, 1955.
[21] J. J. MORÉ, The Levenberg–Marquardt algorithm: implementation and theory, in Numerical Analysis, Lecture Notes in Mathematics 630, G. A. Watson, ed., Springer-Verlag, Berlin–Heidelberg–New York, 1978, pp. 105–116.
[22] J. J. MORÉ AND D. C. SORENSEN, On the use of directions of negative curvature in a modified Newton method, Math. Prog., 16 (1979), pp. 1–20.
[23] H. MUKAI AND E. POLAK, A second order method for unconstrained optimization, J. Optim. Theory Appl., 26 (1978), pp. 501–513.
[24] J. M. ORTEGA AND W. C. RHEINBOLDT, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[25] M. J. D. POWELL, A hybrid method for nonlinear equations, in Numerical Methods for Nonlinear Algebraic Equations, P. Rabinowitz, ed., Gordon and Breach, London, 1970, pp. 87–114.
[26] ——, Convergence properties of a class of minimization algorithms, in Nonlinear Programming 2, O. L. Mangasarian, R. R. Meyer and S. M. Robinson, eds., Academic Press, New York, 1975.
[27] C. H. REINSCH, Smoothing by spline functions. II, Numer. Math., 16 (1971), pp. 451–454.
[28] G. W. STEWART, Introduction to Matrix Computations, Academic Press, New York, 1973.
[29] S. W. THOMAS, Sequential estimation techniques for quasi-Newton algorithms, Ph.D. thesis, Report TR75-227, Department of Computer Science, Cornell University, January 1975.
[30] J. H. WILKINSON, The Algebraic Eigenvalue Problem, Oxford University Press (Clarendon), London and New York, 1965.
[31] J.-P. VIAL AND I. ZANG, Unconstrained optimization by approximation of the gradient path, Math. Oper. Res., 2 (1977), pp. 253–265.