TOPICS IN OPTIMIZATION

BY

MICHAEL OSBORNE

STAN-CS-72-279

APRIL 1972

Reproduced by
NATIONAL TECHNICAL INFORMATION SERVICE
U.S. Department of Commerce
Springfield, VA 22151

COMPUTER SCIENCE DEPARTMENT
School of Humanities and Sciences
STANFORD UNIVERSITY

Unclassified
Security Classification

DOCUMENT CONTROL DATA - R&D
(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

1. ORIGINATING ACTIVITY: Stanford University
2a. REPORT SECURITY CLASSIFICATION: Unclassified
3. REPORT TITLE: Topics in Optimization
4. DESCRIPTIVE NOTES (Type of report and inclusive dates): Technical Report
5. AUTHOR(S): Michael Osborne
6. REPORT DATE: April 1972
7a. TOTAL NO. OF PAGES: 141
7b. NO. OF REFS: 148
8a. CONTRACT OR GRANT NO.: ONR N00014-67-A-0112-0029, NR 044-211
8b. PROJECT NO.: STAN-CS-72-279
8c.: NSF GJ 29988X
10. DISTRIBUTION STATEMENT: Approved for public release; distribution unlimited.
12. SPONSORING MILITARY ACTIVITY: Office of Naval Research, Pasadena Branch

13. ABSTRACT

These notes are based on a course of lectures given at Stanford, and cover three major topics relevant to optimization theory. First, an introduction is given to those results in mathematical programming which appear to be most important for the development and analysis of practical algorithms. Next, unconstrained optimization problems are considered. The main emphasis is on that subclass of descent methods which (a) requires the evaluation of first derivatives of the objective function, and (b) has a family connection with the conjugate direction methods. Numerical results obtained using a program based on this material are discussed in an Appendix. In the third section, penalty and barrier function methods for mathematical programming problems are studied in some detail, and possible methods for accelerating their convergence are indicated.

14. KEY WORDS: mathematical programming, unconstrained optimization, descent methods, penalty functions, barrier functions, rate of convergence

DD FORM 1473    S/N 0101-807-6801    Unclassified

TOPICS IN OPTIMIZATION

Michael Osborne
Computer Science Department
Stanford University

and

Australian National University


Canberra, Australia

Abstract

These notes are based on a course of lectures given at Stanford, and cover three major topics relevant to optimization theory. First, an introduction is given to those results in mathematical programming which appear to be most important for the development and analysis of practical algorithms. Next, unconstrained optimization problems are considered. The main emphasis is on that subclass of descent methods which (a) requires the evaluation of first derivatives of the objective function, and (b) has a family connection with the conjugate direction methods. Numerical results obtained using a program based on this material are discussed in an Appendix. In the third section, penalty and barrier function methods for mathematical programming problems are studied in some detail, and possible methods for accelerating their convergence are indicated.

This research was supported in part by the National Science Foundation under grant number GJ 29988X, and the Office of Naval Research under contract number N00014-67-A-0112-0029, NR 044-211. Reproduction in whole or in part is permitted for any purpose of the United States Government.
Introduction

These notes were prepared for a course on optimization given in

the Computer Science Department at Stanford University during the fall

quarter of 1971. In part they are based on lectures given during the

year of study in numerical analysis funded by the United Kingdom Science

Research Council at the University of Dundee, and on courses given at the

Australian National University.

The choice of material has been regulated by limitations of time as

well as by personal preference. Also, much material appropriate to the


development of algorithms for linearly constrained optimization problems

was covered in the parallel course on numerical linear algebra given by

Professor Golub. Thus, despite some ambition to cover a larger range,

the course eventually consisted of three main sections. These notes

cover these sections and have been supplemented by brief additional

comments and a list of references. A more extensive bibliography is

also included. This is an amended version of a bibliography prepared

by my former student Dr. D. M. Ryan.

The first section is intended to provide a solid introduction to

the main results in mathematical programming (or at least to those results

which appear to be the most important for the development and analysis

of practical algorithms). The main aim has been to characterize local

extrema, so that convexity and duality theory are not treated in any

great detail. However, the material given is more than adequate for the

purposes of the remaining sections. Opportunity has been taken to


present the recent results of Gould and Tolle which provide an accessible

and rather complete description of the first order conditions for an

extremum. The second order conditions are also considered in detail.



The second section on unconstrained optimization is largely restricted to that subclass of descent methods which (a) requires the evaluation of first derivatives of the objective function, and (b) has some family connection with the so-called conjugate direction methods. This is an area in which there has been considerable recent activity, and here an attempt is made both to summarize significant recent developments and to indicate their algorithmic possibilities. An appendix (prepared with the help of M. A. Saunders) summarizes numerical results obtained with a program based on this material. One significant omission from this section is any detailed discussion of convergence. However, the convergence of certain algorithms (those that reset the Hessian estimate periodically or according to appropriate criteria) is an easy consequence of the material given.

In the third section, penalty and barrier function methods for nonlinear programming are considered. This turns out to be a very nice application, in particular, of the results of the first section. These methods have advantages of robustness and simplicity but carry a definite cost penalty. However, attempts to remedy this situation show some promise. The material presented in this section has important connections with other areas: for example, with the method of regularization for the approximate solution of improperly posed problems.

Acknowledgments
The material presented here has benefited greatly from discussions

with Roger Fletcher, Gene Golub, John Greenstadt, Shmuel Oren, Mike Powell,

and Mike Saunders. The presentation of the course was shaped more by the


determination of the lecturer to cover as large a field as possible

than by any consideration for the audience. In spite of this the level

of continued interest was most gratifying, and it is a pleasure to single

out the credit students Linda Kaufman, John Lewis and Margaret Wright

in this respect.

A special vote of thanks is required for George Forsythe who was

responsible for the invitation to Stanford, and for completing the

subsequent arrangements despite the efforts of the Australian and U.K.

Post Offices. In spite of this he was still prepared to combine with

John Herriot in keeping the lecturer in reasonable check, and to provide

the incentive to produce these notes as a CS report.


Contents

I. Introduction to Mathematical Programming 5

1. Minimum of a constrained function 6

2. Some properties of linear inequalities 12

3. Multiplier relations 14

4. Second order conditions 20

5. Convex programming problems 25

II. Descent Methods for Unconstrained Optimization 36

1. General properties of descent methods 37

2. Methods based on conjugate directions 44

Appendix. Numerical questions relating to Fletcher's algorithm 68

III. Barrier and Penalty Function Methods 78

1. Basic properties of barrier functions 79

2. Multiplier relations (first order analysis) 83

3. Second order conditions 86

4. Rate of convergence results 91

5. Analysis of penalty function methods 111

6. Accelerated penalty and barrier methods 117

Note. Equations, Theorems, etc. are numbered in sequence according to subsection. Cross references between sections are indicated explicitly. For example, equation 3 of subsection 4 in the Mathematical Programming section is referenced as MP equation 4.3 in other sections.



I. Introduction to Mathematical Programming


1. Minimum of a constrained function.

Consider a function f(x) defined for x on S ⊂ E_n , f : S → E_1 , where S is a given point set.

Definition: x* is the global minimum of f on S if

    f(x*) ≤ f(x) ∀x ∈ S .                                     (1.1)

Remark: x* exists, for example, if S is finite, or if S is compact and f(x) continuous on S .

Definition: x* is a local minimum of f on S if ∃ δ > 0 such that

    f(x*) ≤ f(x) ∀x ∈ N(x*,δ)                                 (1.2)

where

    N(x,δ) = S ∩ {t : ||t − x|| < δ} .†                       (1.3)

If strict inequality holds in either (1.1) or (1.2) whenever x ≠ x* then the minimum is said to be isolated.

Definition: S is convex if x_1, x_2 ∈ S ⇒ θx_1 + (1−θ)x_2 ∈ S for 0 ≤ θ ≤ 1 .

Example: If S is convex, every finite convex combination of points in S is again in S . That is, Σ_{i=1}^m λ_i x_i ∈ S where x_i ∈ S , Σ_{i=1}^m λ_i = 1 , λ_i ≥ 0 , 1 ≤ m < ∞ .

Definition: f(x) is a convex function on the convex set S if

    f(θx_1 + (1−θ)x_2) ≤ θf(x_1) + (1−θ)f(x_2) , 0 ≤ θ ≤ 1 .  (1.4)

† ||t|| = (Σ_{i=1}^n t_i²)^{1/2} , the euclidean vector norm of t .

If strict inequality holds when 0 < θ < 1 then f is strictly convex. Say g(x) is concave (strictly concave) if −g is convex (strictly convex).

Lemma 1.1: If f(x) is a convex function on the convex set S then a local minimum of f is the global minimum. If f is strictly convex then the minimum is unique.

Proof: It is necessary to consider only the case f bounded below. If x* is a local minimum but not the global minimum, ∃ x** such that f(x**) < f(x*) . Now, by assumption, ∃ δ > 0 such that f(x) ≥ f(x*) for x ∈ N(x*,δ) . Choose θ > 0 sufficiently small for

    θx** + (1−θ)x* ∈ N(x*,δ) ;

then

    (i) f(x*) ≤ f(θx** + (1−θ)x*) as x* is a local minimum, and

    (ii) f(θx** + (1−θ)x*) ≤ θf(x**) + (1−θ)f(x*) by convexity
                           < f(x*) unless f(x**) = f(x*) .

Now assume x* , x** both are global minima and that f is strictly convex. Then

    f(θx* + (1−θ)x**) < θf(x*) + (1−θ)f(x**) , 0 < θ < 1 ,

which gives a contradiction. □
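A minimal numerical sketch of Lemma 1.1 (the problem data and the projected-gradient iteration are mine, not from the notes): a strictly convex quadratic minimized over the convex box [0,1]² reaches the same point from every starting guess, since any local minimum is global.

```python
import numpy as np

# Strictly convex quadratic over the convex box [0,1]^2: every run of a
# projected gradient iteration lands on the same (hence global) minimum.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite, so f is strictly convex
c = np.array([1.5, -0.3])                # unconstrained minimizer lies outside the box

def grad(x):
    return 2.0 * Q @ (x - c)             # gradient of f(x) = (x-c)'Q(x-c)

def project(x):
    return np.clip(x, 0.0, 1.0)          # projection onto the convex set [0,1]^2

def minimize(x0, step=0.1, iters=2000):
    x = x0
    for _ in range(iters):
        x = project(x - step * grad(x))
    return x

starts = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([0.2, 0.9])]
sols = [minimize(s) for s in starts]
spread = max(np.linalg.norm(s - sols[0]) for s in sols)   # all runs agree
```

The agreement of the runs is the content of the lemma, not of the solver; any convergent method for this convex problem would exhibit it.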

Definition: A set C is a cone with vertex at the origin if x ∈ C ⇒ λx ∈ C , λ ≥ 0 . C is a cone with vertex at p if {x − p : x ∈ C} is a cone with vertex at the origin.


Definition: x is in the tangent cone T(S,x_0) to S at x_0 if ∃ sequences {λ_n} ≥ 0 , {x_n} → x_0 , {x_n} ⊂ S such that

    lim_{n→∞} ||λ_n(x_n − x_0) − x|| = 0 .                    (1.5)

Example: (i) S = {x : ||x − w|| = r} , T(S,x_0) = {x : xᵀ(x_0 − w) = 0} .
(ii) S = {x : ||x − w|| ≤ r} . T(S,x_0) = E_n if x_0 is in the interior of S ; otherwise T(S,x_0) = {x : xᵀ(x_0 − w) ≤ 0} .

Lemma 1.2: T(S,x_0) is closed.

Proof: Consider a sequence {t_i} ⊂ T(S,x_0) such that ||t_i − t|| → 0 , i → ∞ . It is required to show that t ∈ T(S,x_0) . Now t_i ∈ T(S,x_0) ⇒ ∃ {λ_n^i} ≥ 0 , {x_n^i} ⊂ S such that lim_n ||λ_n^i(x_n^i − x_0) − t_i|| = 0 . Prescribe {ε_i} ↓ 0 . Select t_i such that ||t_i − t|| < ε_i/2 , and j = j(i) such that ||λ_j^i(x_j^i − x_0) − t_i|| < ε_i/2 . Then ||λ_j^i(x_j^i − x_0) − t|| < ε_i ⇒ t ∈ T(S,x_0) . □

Lemma 1.3: (Necessary condition for a local minimum.) If f(x) ∈ C¹ at x_0 † and if x_0 is a local minimum of f on S then ∇f(x_0)x ≥ 0 , ∀x ∈ T(S,x_0) .

Proof: Let x be defined by sequences {λ_n} , {x_n} . As x_0 is a local minimum ∃ δ > 0 such that f(x) ≥ f(x_0) ∀x ∈ N(x_0,δ) . Consider now the restriction of the sequences {λ_n} , {x_n} such that x_n ∈ N(x_0,δ) . We have

    0 ≤ f(x_n) − f(x_0) ,

whence (note it is sufficient to consider x such that ||x|| = 1 )

    0 ≤ ∇f(x_0)λ_n(x_n − x_0) + o(λ_n||x_n − x_0||)
      ≤ ∇f(x_0)x + o(1) as n → ∞ . □

† f ∈ C¹ at x_0 if f(x) = f(x_0) + ∇f(x_0)(x − x_0) + o(||x − x_0||) . Higher order continuity classes are defined similarly. For example, f ∈ C² if the o( ) term can be estimated in the form ½(x − x_0)ᵀ∇²f(x_0)(x − x_0) + o(||x − x_0||²) .
Example: (i) If x_0 ∈ S° (the interior of S ) then T(S,x_0) = E_n . Thus x can be chosen arbitrarily so that ∇f(x_0) = 0 .

(ii) If S = {x : ||x − w|| = r} then T(S,x_0) = {x : xᵀ(x_0 − w) = 0} . In particular if x ∈ T(S,x_0) then −x ∈ T(S,x_0) . Thus we must have ∇f(x_0)x = 0 ∀x such that xᵀ(x_0 − w) = 0 . Thus ∇f(x_0) = α(x_0 − w)ᵀ for some α .

(iii) If S = {x : ||x − w|| ≤ r} and ||x_0 − w|| = r then T(S,x_0) = {x : xᵀ(x_0 − w) ≤ 0} . In this case we have ∇f(x_0)x ≥ 0 ∀x such that xᵀ(x_0 − w) ≤ 0 . Thus ∇f(x_0) = α(w − x_0)ᵀ for some nonnegative α .
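Example (iii) can be checked numerically. In this sketch (the data c, w, r are arbitrary choices of mine) the linear objective cᵀx on the ball ||x − w|| ≤ r attains its minimum at the boundary point x_0 = w − rc/||c||, and the gradient satisfies ∇f(x_0) = α(w − x_0) with α = ||c||/r ≥ 0 as the example predicts.

```python
import numpy as np

# Linear objective on a ball: check grad f = alpha (w - x0), alpha >= 0.
c = np.array([3.0, -4.0])
w = np.array([1.0, 2.0])
r = 0.5

x0 = np.asarray(w - r * c / np.linalg.norm(c))   # boundary minimizer
alpha = np.linalg.norm(c) / r                    # predicted multiplier (here 10 > 0)
residual = np.linalg.norm(c - alpha * (w - x0))  # grad f minus alpha (w - x0)

# sampled feasible points never beat x0
rng = np.random.default_rng(0)
pts = w + r * rng.uniform(-1.0, 1.0, size=(500, 2)) / np.sqrt(2)  # inside the ball
worst = (pts @ c).min() - x0 @ c                 # >= 0: x0 attains the minimum
```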

Let A be a set in E_n .

Definition: The polar cone to A is the set A* = {x : xᵀy ≤ 0 ∀y ∈ A} .

A* has the following properties:

(i) A* is a closed convex cone.

(ii) If A_1 ⊂ A_2 then A_2* ⊂ A_1* .

(iii) A** = A if and only if A is a closed convex cone.

(iv) A* equals the polar cone of the closure of the convex hull of A . The convex hull A_c of a set is the smallest convex set containing it. Thus A_c = ∩ X , A ⊂ X , X convex.

(v) If A is a subspace then A* = A^⊥ .
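The definition and property (iii) are easy to probe by sampling. A small sketch (the quadrant example is mine): for A the nonnegative quadrant of E_2, a closed convex cone, the polar A* is the nonpositive quadrant, and xᵀy ≤ 0 holds for every sampled pair, while a point outside the nonpositive quadrant fails the polar test.

```python
import numpy as np

# Polar cone of the nonnegative quadrant in E_2 is the nonpositive quadrant.
rng = np.random.default_rng(1)
A_pts = rng.uniform(0.0, 1.0, size=(200, 2))       # sample of A = {x : x >= 0}
Astar_pts = -rng.uniform(0.0, 1.0, size=(200, 2))  # sample of the claimed A*

max_ip = (Astar_pts @ A_pts.T).max()               # x'y <= 0 for all pairs

bad = np.array([0.5, -0.1])                        # not in A*
witness_ip = (bad @ A_pts.T).max()                 # some inner product is positive
```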

Remark: Lemma 1.3 can be restated: if x* is a local minimum of f on S then −∇f(x*) ∈ T(S,x*)* .

Lemma 1.4: If y ∈ T(S,x_0)* then −y is the gradient of a function having a local minimum on S at x_0 .

Remark: It is sufficient to consider the case ||y|| = 1 , x_0 = 0 .

Proof: Let C_e = {x : xᵀy ≤ ||x||/e} , e = 1,2,... . We first show that for each e , ∃ ε(e) > 0 such that N(0,ε(e)) ⊂ C_e . For assume this is not the case. Then ∃ {x_p} ⊂ E_n − C_e with x_p ∈ N(0,1/p) , p = 1,2,... , such that

    x_pᵀ y / ||x_p|| > 1/e , p = 1,2,... .                    (1.6)

The sequence {x_p/||x_p||} is bounded and therefore contains a subsequence converging to some z* . By definition z* ∈ T(S,0) , but, by (1.6),

    z*ᵀ y ≥ 1/e > 0

which contradicts y ∈ T(S,0)* .

Now let ε_k'' = sup{ε : N(0,ε) ⊂ C_k} . We define

    ε_1 = min(1, ε_1'') ,
    ε_k = min(ε_{k−1}/2 , ε_k'') , k > 1 ,

and

    P(z) = 2||z|| ,  ||z|| ≥ ε_2 ,
    P(z) = [(||z|| − ε_{k+1})/(ε_k − ε_{k+1})] (2||z||/(k−1)) + [(ε_k − ||z||)/(ε_k − ε_{k+1})] (2||z||/k) ,  ||z|| ∈ [ε_{k+1}, ε_k] ,
    P(z) = 0 ,  ||z|| = 0 .

It is clear that ε_k > 0 and ε_k is monotonically decreasing. Further P(z) ≥ 0 , P(z) is an increasing function of ||z|| , and

    ||z|| ≤ ε_k ⇒ P(z)/||z|| ≤ 2/(k−1) .

Thus P(z) = o(||z||) so that ∇P(0) = 0 .

Now let z = x − (xᵀy)y . We show that, under appropriate conditions, xᵀy ≤ P(z) . It is sufficient to consider xᵀy > 0 , and in this case

    ||x|| − xᵀy ≤ ||z|| ≤ ||x|| .                             (1.7)

If x ∈ C_k then xᵀy ≤ ||x||/k . Using (1.7) we have

    (1 − 1/k)||x|| ≤ ||z|| ≤ ||x|| .                          (1.8)

Now assume x ∈ N(0,ε) , ε ≤ ε_3 . Then ||x|| ∈ [ε_{k+1}, ε_k] for some k ≥ 3 , whence x ∈ C_k . This gives

    xᵀy ≤ ||x||/k .                                           (1.9)

However, ||z|| ≥ (1 − 1/k)ε_{k+1} ≥ ε_{k+2} , whence

    P(z) ≥ 2||z||/(k+1) ≥ (2/(k+1))(1 − 1/k)||x||             (1.10)

so that, combining (1.9) and (1.10),

    xᵀy ≤ [(k+1)/(2(k−1))] P(z) ≤ P(z) , k ≥ 3 .              (1.11)

Thus the function

    f(x) = P(x − (xᵀy)y) − xᵀy

has a local minimum on S at x = 0 . Further f ∈ C¹ at 0 , and ∇f(0) = −y . □

2. Some properties of linear inequalities.

Definition: The set H(u,v) = {x : uᵀx = v} is a hyperplane. Note that the hyperplane separates E_n into two disjoint half spaces

    R_+ = {x : uᵀx ≥ v} , R_− = {x : uᵀx < v} .

Lemma 2.1: (lemma of the separating hyperplane). Let S be a closed convex set in E_n , and let x_0 ∉ S . Then ∃ a hyperplane separating x_0 and S .

Proof: Let x_1 be any point in S . Then min_{x∈S} ||x − x_0|| ≤ ||x_1 − x_0|| = r . The function ||x − x_0|| is continuous on the closed bounded set S ∩ {x : ||x − x_0|| ≤ r} and hence the minimum is attained. Let this point be x* . From Figure 2.1 it is suggested that

    (x − x*)ᵀ(x* − x_0) = 0                                   (2.1)

[Figure 2.1: the hyperplane (x − x*)ᵀ(x* − x_0) = 0]

is an appropriate hyperplane. To verify this, note that x_0 ∈ R_− so that it remains to show that S ⊂ R_+ . Let x ∈ S ; then for 0 ≤ θ ≤ 1 ,

    ||θx + (1−θ)x* − x_0||² ≥ ||x* − x_0||²

so that

    θ²||x − x*||² + 2θ(x − x*)ᵀ(x* − x_0) ≥ 0

and, letting θ → 0 ,

    (x − x*)ᵀ(x* − x_0) ≥ 0

whence x ∈ R_+ . □
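The projection argument of Lemma 2.1 can be exercised numerically; for a box the nearest point is obtained by clipping each coordinate. In this sketch (set and exterior point are my test data) every sampled point of S lands in R_+ while x_0 lands in R_−.

```python
import numpy as np

# Separating hyperplane for the box S = [0,1]^2 and an exterior point x0.
x0 = np.array([2.0, -0.5])
xstar = np.clip(x0, 0.0, 1.0)             # closest point of S to x0

rng = np.random.default_rng(2)
S_pts = rng.uniform(0.0, 1.0, size=(400, 2))

side_S = (S_pts - xstar) @ (xstar - x0)   # >= 0 for every sampled x in S
side_x0 = (x0 - xstar) @ (xstar - x0)     # strictly negative: x0 is in R-
```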

Definition: C is finitely generated if

    C = {x : x = Σ_{i=1}^p λ_i c_i , λ_i ≥ 0 , i = 1,2,...,p} .

It is clear that C is a cone. It can be shown that C is closed.

Lemma 2.2: (Farkas' lemma). Let A be a p×n matrix. If for every solution y of the system of linear inequalities

    Ay ≥ 0                                                    (2.2)

it is true that

    aᵀy ≥ 0                                                   (2.3)

then ∃ x ≥ 0 such that Aᵀx = a .

Proof: Let C be the cone generated by the rows ρ_i(A) , i = 1,2,...,p . Then the result of Farkas' lemma is that if (2.2) ⇒ (2.3) then a ∈ C . We assume a ∉ C and seek a contradiction. By Lemma 2.1 there exists a separating hyperplane. To construct it let x* be the closest point in C to a . Then ||λx* − a||² has a minimum at λ = 1 . Differentiating and setting λ = 1 gives

    (x* − a)ᵀx* = 0 .                                         (2.4)

By (2.1) the equation of the separating hyperplane is

    (x − x*)ᵀ(x* − a) = xᵀ(x* − a) = 0                        (2.5)

which shows that it passes through the origin.

By Lemma 2.1, C ⊂ R_+ whence

    vᵀA(x* − a) ≥ 0

for arbitrary v ≥ 0 so that

    A(x* − a) ≥ 0 ,                                           (2.6)

but a ∈ R_− whence

    aᵀ(x* − a) < 0                                            (2.7)

which gives the desired contradiction. □


Remark: Another way of looking at this result is that at most one of the following pair of systems can have a solution:

    (i) Ax = b , x ≥ 0 ;
    (ii) Aᵀy ≥ 0 , bᵀy < 0 .

This is an example of a 'theorem of the alternative'.
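Farkas' lemma can be checked computationally: a lies in the cone of the rows of A exactly when min ||Aᵀu − a|| over u ≥ 0 has zero residual; otherwise system (ii) has a solution. In the sketch below the matrix, vectors, and the tiny projected-gradient nonnegative least squares solver are my own; a library NNLS routine would normally be used.

```python
import numpy as np

# Nonnegative least squares by projected gradient: u >= 0 minimizing ||A'u - a||.
def nnls_pg(AT, a, step=0.1, iters=20000):
    u = np.zeros(AT.shape[1])
    for _ in range(iters):
        u = np.maximum(u - step * AT.T @ (AT @ u - a), 0.0)
    return u

A = np.array([[1.0, 0.0], [1.0, 1.0]])    # rows generate a cone in E_2
AT = A.T

a_in = np.array([2.0, 1.0])               # = 1*(1,0) + 1*(1,1), inside the cone
res_in = np.linalg.norm(AT @ nnls_pg(AT, a_in) - a_in)    # essentially zero

a_out = np.array([-1.0, 0.0])             # outside the cone
res_out = np.linalg.norm(AT @ nnls_pg(AT, a_out) - a_out) # strictly positive

y = np.array([1.0, -1.0])                 # witness for system (ii) when a = a_out
check_ii = bool((A @ y >= 0).all() and a_out @ y < 0)
```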

3. Multiplier relations.

We consider now the mathematical programming problem (MPP):

    min f(x) subject to

    g_i(x) ≥ 0 , i ∈ I_1 ,
    h_i(x) = 0 , i ∈ I_2 .

We assume that f , g_i , i ∈ I_1 , and h_i , i ∈ I_2 , are in C¹ and that the constraints on the problem are not contradictory. This corresponds to the problem discussed in Section 1 with S given by

    S = {x : g_i(x) ≥ 0 , i ∈ I_1 , h_i(x) = 0 , i ∈ I_2} .   (3.1)

At any point x_0 ∈ S let B_0 be the index set for the constraints satisfying g_i(x_0) = 0 . If i ∈ B_0 we say that g_i is active at x_0 .

Definition: S is Lagrange regular at x_0 iff for every f such that (i) f has a minimum on S at x_0 , and (ii) f ∈ C¹ at x_0 (i.e., f ∈ F_0 ), ∃ u , v such that

    (i) ∇f(x_0) = Σ_{i∈B_0} u_i ∇g_i(x_0) + Σ_{i∈I_2} v_i ∇h_i(x_0) ,   (3.2)

    (ii) u_i ≥ 0 , i ∈ B_0 .

This can also be written

    (i) ∇f(x_0) = Σ_{i∈I_1} u_i ∇g_i(x_0) + Σ_{i∈I_2} v_i ∇h_i(x_0) ,

    (ii) uᵀg(x_0) = 0 , and

    (iii) u ≥ 0 ,

where zero multipliers are introduced corresponding to the inactive constraints.

Remark: If (3.2) holds for f ∈ F_0 , then f satisfies the Kuhn-Tucker conditions.

Example: It is important to realize that (3.2) need not hold. Consider the MPP

    min f = −x_1 ,
    subject to g_1 = x_1 ≥ 0 , g_2 = x_2 ≥ 0 , g_3 = (1 − x_1)³ − x_2 ≥ 0 .

From Figure 3.1 it is clear that the minimum is attained at x_1 = 1 , x_2 = 0 , and here g_2 and g_3 are active. We have

    ∇g_2 = −∇g_3 = e_2

while

    ∇f = −e_1

so that a relation of the form (3.2) is impossible.

[Figure 3.1]
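The failure of the multiplier relation in this example is easy to confirm numerically (the multiplier grid is my device): at the minimizer (1,0) both active gradients lie along e_2, so no nonnegative combination comes within distance 1 of ∇f = −e_1.

```python
import numpy as np

# Active gradients at (1,0) of g2 = x2 and g3 = (1-x1)^3 - x2 both lie on
# the x2 axis, so they cannot reproduce grad f = -e1.
grad_f = np.array([-1.0, 0.0])
grad_g2 = np.array([0.0, 1.0])            # grad g2 at (1,0)
grad_g3 = np.array([0.0, -1.0])           # grad g3 at (1,0)

u = np.linspace(0.0, 5.0, 101)            # grid of candidate multipliers
combos = u[:, None, None] * grad_g2 + u[None, :, None] * grad_g3
best = np.linalg.norm(combos - grad_f, axis=-1).min()   # never below 1
```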

Let

    H_0 = {x : ∇h_i(x_0)x = 0 , i ∈ I_2} ,
    G_0 = {x : ∇g_i(x_0)x ≥ 0 , i ∈ B_0} .

Lemma 3.1: S is Lagrange regular at x_0 iff −∇f(x_0) ∈ (G_0 ∩ H_0)* for all f ∈ F_0 .

Proof: If −∇f(x_0) ∈ (G_0 ∩ H_0)* then

    ∇f(x_0)y ≥ 0

∀y such that

    ∇h_i(x_0)y = 0 , i ∈ I_2 ,
    ∇g_i(x_0)y ≥ 0 , i ∈ B_0 .

Thus, by Farkas' lemma, ∇f(x_0) is a linear combination with nonnegative weights of ∇g_i(x_0) , i ∈ B_0 , and ∇h_i(x_0) , −∇h_i(x_0) , i ∈ I_2 . Thus (3.2) holds. On the other hand, if (3.2) holds then ∇f(x_0)y ≥ 0 for all y ∈ G_0 ∩ H_0 . □

Remark: Lemma 3.1 shows the difficulty with the above example. Here T = {x = −αe_1 , α ≥ 0} , G_0 ∩ H_0 = {x = αe_1 , α unconstrained} . We have T* = the right half plane, (G_0 ∩ H_0)* = the x_2 axis. By Lemma 1.4, for every x ∈ T* there is a function with a minimum at (1,0) and such that −∇f = x . Thus the conditions of Lemma 3.1 are not met in this case.



Lemma 3.2: (G_0 ∩ H_0)* ⊂ T(S,x_0)* .

Proof: This result follows if we show that T(S,x_0) ⊂ H_0 ∩ G_0 . If x ∈ T(S,x_0) , ∃ {x_n} → x_0 , {x_n} ⊂ S , {λ_n} ≥ 0 such that λ_n(x_n − x_0) → x . We have

    0 = h_i(x_n) = h_i(x_0) + ∇h_i(x_0)(x_n − x_0) + o(||x_n − x_0||) , i ∈ I_2 ,

and

    0 ≤ g_i(x_n) = g_i(x_0) + ∇g_i(x_0)(x_n − x_0) + o(||x_n − x_0||) , i ∈ B_0 .

Multiplying by λ_n and repeating the argument used in Lemma 1.3 we have

    ∇h_i(x_0)x = 0 , i ∈ I_2 , ∇g_i(x_0)x ≥ 0 , i ∈ B_0 ,

so that x ∈ G_0 ∩ H_0 . □

Theorem 3.1: The set S is Lagrange regular at x_0 iff T(S,x_0)* = (G_0 ∩ H_0)* .

Proof: If T(S,x_0)* = (G_0 ∩ H_0)* then −∇f(x_0) ∈ (G_0 ∩ H_0)* ∀f ∈ F_0 by Lemma 1.3. Thus (3.2) holds by Lemma 3.1. If S is Lagrange regular at x_0 then by Lemma 3.1, −∇f(x_0) ∈ (G_0 ∩ H_0)* ∀f ∈ F_0 . Therefore, by Lemma 1.4, T(S,x_0)* ⊂ (G_0 ∩ H_0)* . Thus T(S,x_0)* = (G_0 ∩ H_0)* by Lemma 3.2. □

Remark: Conditions which ensure that S is Lagrange regular at x_0 are called restraint conditions. Theorem 3.1 gives a necessary and sufficient restraint condition.

Corollary 3.1: (Kuhn-Tucker restraint condition). If ∇g_i(x_0)t ≥ 0 , i ∈ B_0 , and ∇h_i(x_0)t = 0 , i ∈ I_2 ⇒ t is tangent at x_0 to a once differentiable arc x = x(θ) , x(0) = x_0 , contained in N(x_0,δ) for some δ > 0 , then S is Lagrange regular at x_0 .

Proof: It is clear that t ∈ T(S,x_0) , for consider a sequence {θ_n} ↓ 0 and define {x_n} = {x(θ_n)} , {λ_n} = {1/θ_n} ; then

    λ_n(x_n − x_0) → dx(0)/dθ = t .

Thus the Kuhn-Tucker restraint condition implies (G_0 ∩ H_0) ⊂ T(S,x_0) . The result now follows from Lemma 3.2 and Theorem 3.1. □

Lemma 3.3: Let k_i(x) ∈ C¹ , k_i(x_0) = 0 , and ∇k_i(x_0)t = 0 , i = 1,2,...,s ≤ n . We assume ∃ ε > 0 such that the ∇k_i(x) , i = 1,2,...,s , are linearly independent for ||x − x_0|| ≤ ε . Then ∃ a smooth arc x = x(θ) , x(0) = x_0 , such that k_i(x(θ)) = 0 , i = 1,2,...,s , for ||x(θ) − x_0|| < ε , and dx(0)/dθ = t .

Proof: Let P(x) = Kᵀ(KKᵀ)⁻¹K where ρ_i(K) = ∇k_i(x) , i = 1,2,...,s . Then x(θ) can be found by integrating the differential equation

    dx/dθ = (I − P(x))t                                       (3.3)

subject to the initial condition x(0) = x_0 . □
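The differential equation in the proof of Lemma 3.3 can be integrated directly. In this sketch (the single constraint k(x) = ||x||² − 1 and plain Euler integration are my choices) the projected field keeps x essentially on the constraint surface and starts out in the prescribed tangent direction t.

```python
import numpy as np

# dx/dtheta = (I - P(x)) t with P = K'(K K')^{-1} K, K = grad k(x):
# the flow stays on the level set k = 0 and begins in the direction t.
def grad_k(x):
    return 2.0 * x                         # gradient of k(x) = ||x||^2 - 1

def field(x, t):
    K = grad_k(x)[None, :]                 # 1 x n constraint Jacobian
    P = K.T @ np.linalg.inv(K @ K.T) @ K   # projector onto span{grad k}
    return (np.eye(2) - P) @ t

x0 = np.array([1.0, 0.0])
t = np.array([0.0, 1.0])                   # tangent: grad_k(x0) . t = 0

start_dir = field(x0, t)                   # equals t at theta = 0

h, x = 1e-3, x0.copy()
for _ in range(1000):                      # Euler steps out to theta = 1
    x = x + h * field(x, t)
drift = abs(x @ x - 1.0)                   # constraint violation stays small
```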

Remark: Let the k_i be as given in the statement of Lemma 3.3. Then the linear independence of the ∇k_i in a region containing x_0 is a consequence of the linear independence at x_0 . For consider the matrix KKᵀ . At x = x_0 this matrix is positive definite as K has rank s . Thus the smallest eigenvalue is positive. Clearly it is a continuous function of x , so that it remains positive in a small enough neighborhood of x_0 , and in this neighborhood the ∇k_i(x) are linearly independent.

Lemma 3.4: (Restraint condition A). S is Lagrange regular at x_0 if the set of vectors ∇g_i(x_0) , i ∈ B_0 , ∇h_i(x_0) , i ∈ I_2 , are linearly independent.

Proof: This is a consequence of Corollary 3.1 and Lemma 3.3. For let t ∈ G_0 ∩ H_0 , and let B(t) be the index set such that ∇g_i(x_0)t = 0 , i ∈ B(t) . Then by Lemma 3.3 a smooth arc can be constructed such that x = x(θ) , g_i(x(θ)) = 0 , i ∈ B(t) , h_i(x(θ)) = 0 , i ∈ I_2 , g_i(x(θ)) > 0 , i ∈ I_1 − B(t) , x(θ) ∈ N(x_0,δ) for some δ > 0 , and dx(0)/dθ = t . □

Lemma 3.5: (Restraint condition B). If ∇h_i(x_0) , i ∈ I_2 , are linearly independent, and if ∃ t̂ such that ∇g_i(x_0)t̂ > 0 , i ∈ B_0 , ∇h_i(x_0)t̂ = 0 , i ∈ I_2 , then S is Lagrange regular at x_0 .

Proof: Let w ∈ G_0 ∩ H_0 . Prescribe {ε_k} ↓ 0 and set w_k = w + ε_k t̂ . Then ∇g_i(x_0)w_k > 0 , i ∈ B_0 , ∇h_i(x_0)w_k = 0 , i ∈ I_2 . Now construct x_k = x_k(θ) such that x_k(0) = x_0 , dx_k(0)/dθ = w_k , h_i(x_k(θ)) = 0 , i ∈ I_2 , for x_k(θ) in some neighborhood of x_0 . By continuity there will be a subneighborhood (say N(x_0,δ_k) for some δ_k > 0 ) such that (i) ∇g_i(x_k(θ)) dx_k(θ)/dθ > 0 , i ∈ B_0 , and (ii) g_i(x_k(θ)) > 0 , i ∉ B_0 , for x_k(θ) ∈ N(x_0,δ_k) . The argument of Corollary 3.1 now gives w_k ∈ T(S,x_0) . But, by construction, {w_k} → w . Thus w ∈ T(S,x_0) as T is closed. □

4. Second order conditions.

In certain cases it is possible to further characterize local minima of f on S by looking at second derivative information.

Lemma 4.1: Let w(x) ∈ C² , w have a local minimum on S at x_0 , and ∇w(x_0) = 0 . Then tᵀ∇²w(x_0)t ≥ 0 ∀t ∈ T(S,x_0) . If tᵀ∇²w(x_0)t > 0 ∀t ∈ T(S,x_0) , t ≠ 0 , then ∃ δ > 0 , m > 0 such that

    w(x) ≥ w(x_0) + m||x − x_0||² , x ∈ N(x_0,δ) .

Proof: Let {x_n} , {λ_n} be defining sequences for t ∈ T(S,x_0) . Then for n large enough we have, as ∇w(x_0) = 0 ,

    0 ≤ w(x_n) − w(x_0) = ½(x_n − x_0)ᵀ∇²w(x_0)(x_n − x_0) + o(||x_n − x_0||²)

    ∴ 0 ≤ λ_n²(w(x_n) − w(x_0)) = ½ tᵀ∇²w(x_0)t + o(1) as x_n → x_0 .

Now assume tᵀ∇²w(x_0)t > 0 ∀t ∈ T(S,x_0) , t ≠ 0 , and that there is no m > 0 such that w(x) ≥ w(x_0) + m||x − x_0||² for x in any neighborhood of x_0 . This implies that for any integer q , ∃ x_q ∈ S such that (i) x_q ∈ N(x_0,1/q) , (ii) w(x_q) − w(x_0) < (1/q)||x_q − x_0||² . Select a subsequence of the x_q such that (x_q − x_0)/||x_q − x_0|| → t ∈ T(S,x_0) . Then (ii) ⇒ tᵀ∇²w(x_0)t ≤ 0 which gives a contradiction. □

Definition: The Lagrangian function associated with the MPP is given by

    ℒ(x,u,v) = f(x) − Σ_{i∈I_1} u_i g_i(x) − Σ_{i∈I_2} v_i h_i(x) .   (4.1)

It will frequently be convenient to suppress the dependence of ℒ on u and v in the case where these are implied by the Kuhn-Tucker conditions. In this case (3.2) becomes

    ∇ℒ(x_0) = 0 .                                             (4.2)

Lemma 4.2: Let S be Lagrange regular at x_0 , f(x) have a local minimum on S at x_0 , and S_1 = {x : x ∈ S , g_i(x) = 0 , i ∈ B_0} ; then

    tᵀ∇²ℒ(x_0)t ≥ 0 , t ∈ T(S_1,x_0) .                        (4.3)

Proof: Note that ℒ = f on S_1 so that ℒ has a minimum on S_1 at x_0 . Also, as S is Lagrange regular at x_0 , ∇ℒ(x_0) = 0 . Thus the result follows from Lemma 4.1. □

Remark: If S_1 is Lagrange regular at x_0 then tᵀ∇²ℒ(x_0)t ≥ 0 ∀t such that ∇g_i(x_0)t = 0 , i ∈ B_0 , and ∇h_i(x_0)t = 0 , i ∈ I_2 .
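The remark suggests a practical test: build a basis Z for {t : ∇g_i(x_0)t = 0, ∇h_i(x_0)t = 0} and check that Zᵀ∇²ℒ(x_0)Z is positive (semi)definite. A sketch on a problem of my own (min xᵀx subject to eᵀx = 1, minimizer x_0 = e/3, multiplier v = 2/3):

```python
import numpy as np

# Reduced-Hessian test: restrict the Hessian of the Lagrangian to the
# null space of the active constraint gradients.
grad_h = np.array([1.0, 1.0, 1.0])
hess_L = 2.0 * np.eye(3)                  # Hessian of L = x'x - v(e'x - 1)

_, _, Vt = np.linalg.svd(grad_h[None, :]) # rows 2,3 of Vt span the null space
Z = Vt[1:].T                              # columns: basis of {t : grad_h . t = 0}

reduced = Z.T @ hess_L @ Z                # 2 x 2 reduced Hessian
eigs = np.linalg.eigvalsh(reduced)        # all positive here
```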

Example: Consider

    g_1 = x_1² + (x_2 + 1)² − 1 ≥ 0 , g_2 = 1 − x_1² − (x_2 − 1)² ≥ 0 .

S is illustrated diagrammatically in Figure 4.1. At x_1 = x_2 = 0 ,

    ∇g_1 = ∇g_2 = (0,2) .

However S is Lagrange regular at the origin; for example, e_2 satisfies ∇g_1 e_2 > 0 , ∇g_2 e_2 > 0 , so that restraint condition B applies. In this case S_1 is the single point x = 0 so that T(S_1,0) is null.

[Figure 4.1]

Lemma 4.3: If tᵀ∇²ℒ(x_0)t > 0 ∀t ∈ T(S,x_0) , t ≠ 0 , such that ∇g_i(x_0)t = 0 ∀i ∈ B_0 such that u_i > 0 , then ∃ m , δ > 0 such that

    f(x) ≥ f(x_0) + m||x − x_0||² , x ∈ N(x_0,δ) .            (4.4)

Proof: Assume there are no m , δ > 0 such that (4.4) holds. Then for each integer q , ∃ x_q such that (i) x_q ∈ N(x_0,1/q) , (ii) f(x_q) − f(x_0) < (1/q)||x_q − x_0||² . Select a subsequence of the x_q such that (x_q − x_0)/||x_q − x_0|| → t ∈ T(S,x_0) . Set G = Σ_{i∈B_0} u_i g_i(x) . Then G ≥ 0 on S , G(x_0) = 0 , and f = ℒ + G . For the subsequence defining t we have

    [ℒ(x_q) − ℒ(x_0)]/||x_q − x_0||² + G(x_q)/||x_q − x_0||² < 1/q .   (4.5)

Thus

    ½ tᵀ∇²ℒ(x_0)t + lim sup_{q→∞} G(x_q)/||x_q − x_0||² ≤ 0 .   (4.6)

As G(x_q) ≥ 0 , the second term is bounded and nonnegative. Therefore

    lim_{q→∞} G(x_q)/||x_q − x_0|| = Σ_{i∈B_0} u_i ∇g_i(x_0)t = 0 .   (4.7)

Thus

    ∇g_i(x_0)t = 0 , ∀i ∈ B_0 such that u_i > 0 ,             (4.8)

so that (4.6) states that ∃ t ∈ T(S,x_0) such that t satisfies (4.8) and tᵀ∇²ℒ(x_0)t ≤ 0 . This gives a contradiction. □


)l|.!i'.ii..niPMiviniiiii,iw,w>.tili,i.lwl,.wwwl|, ————, mmmmmmmmmmmmmmmmmmmmmmmmmmmmimimmmmmmmmHt&IBK^^

Consider now the system

    ∇ℒ(x_0)^T = 0 ,
    u_i g_i(x_0) = 0 ,  i = 1,2,...,m ,
    h_i(x_0) = 0 ,  i = 1,2,...,p ,    (4.9)

where explicit enumerations of I_1 and I_2 are assumed.

Definition: J(x_0) is the Jacobian of the system (4.9) with respect to (x, u, v):

              [ ∇²ℒ(x_0)       −∇g_1(x_0)^T ⋯ −∇g_m(x_0)^T    −∇h_1(x_0)^T ⋯ −∇h_p(x_0)^T ]
              [ u_1 ∇g_1(x_0)                                                             ]
              [      ⋮          diag(g_1(x_0), ..., g_m(x_0))              0              ]
    J(x_0) =  [ u_m ∇g_m(x_0)                                                             ]    (4.10)
              [ ∇h_1(x_0)                                                                 ]
              [      ⋮                        0                            0              ]
              [ ∇h_p(x_0)                                                                 ]

Lemma 4.4: If J(x_0) is nonsingular, then x_0 is an isolated local minimum of f on S.

Remark: Note that the condition J(x_0) nonsingular imposes strong conditions on the problem. For example,

(i) the active constraint gradients must be linearly independent, and

(ii) if g_i(x_0) = 0 then u_i > 0 (this condition is called strict complementarity).

In particular S is Lagrange regular at x_0.

Proof: If J is singular there is a vector (y, a, b) satisfying

    J(x_0) (y, a, b)^T = 0 .    (4.11)

This relation gives

(i) ∇h_i(x_0) y = 0 ,  i = 1,2,...,p ,

(ii) u_i ∇g_i(x_0) y + a_i g_i(x_0) = 0 ,  i = 1,2,...,m , and

(iii) ∇²ℒ(x_0) y − Σ_{i=1}^{m} a_i ∇g_i(x_0)^T − Σ_{i=1}^{p} b_i ∇h_i(x_0)^T = 0 .

From (ii) we see that u_i > 0 ⇒ ∇g_i(x_0) y = 0, while u_i = 0 ⇒ a_i = 0.

Now consider the problem

    min_y  y^T ∇²ℒ(x_0) y

subject to ∇g_i(x_0) y = 0, i ∈ B_0, ∇h_i(x_0) y = 0, i ∈ I_2, and ‖y‖² = 1.

Clearly the constraint gradients are linearly independent, as 2y = ∇(‖y‖²)^T is in the orthogonal complement of the set spanned by the other constraint gradients. Thus the set of feasible y is Lagrange regular at every point by restraint condition A. Let y_0 minimize the objective function (the minimum exists as the constraint set is compact); then the Lagrange regularity ensures that there exist multipliers λ, a_i, i ∈ B_0, and b_i, i ∈ I_2, such that

    ∇²ℒ(x_0) y_0 − λ y_0 − Σ_{i∈B_0} a_i ∇g_i(x_0)^T − Σ_{i∈I_2} b_i ∇h_i(x_0)^T = 0 ,    (4.12)

whence

    λ = y_0^T ∇²ℒ(x_0) y_0 = min_y y^T ∇²ℒ(x_0) y .

Now if λ = 0, (4.12) shows that conditions (i) - (iii) above are satisfied, and hence J(x_0) is singular. Thus if J(x_0) is nonsingular, then λ > 0. In this case Lemma 4.3 shows that the minimum of the MPP is isolated. □

5. Convex programming problems.

If the g_i(x) are concave, i ∈ I_1, then the set S = {x ; g_i(x) ≥ 0, i ∈ I_1} is convex. The problem of minimizing a convex function on S is called a convex programming problem. In this section certain properties of this problem are studied. We require the following characterization of convex functions.

Lemma 5.1: If f(x) ∈ C¹ then f(x) is convex on S iff

    f(x) + ∇f(x)(y − x) ≤ f(y) ,  x, y ∈ S .    (5.1)

Proof: If f is convex then, for 0 ≤ λ ≤ 1,

    f(x + (1−λ)(y−x)) ≤ f(x) + (1−λ)(f(y) − f(x)) ,

whence, if λ < 1,

    [f(x + (1−λ)(y−x)) − f(x)]/(1−λ) ≤ f(y) − f(x) .

The necessity follows on letting λ → 1. Now if (5.1) holds then

    f(λx + (1−λ)y) + λ ∇f(λx + (1−λ)y)(y − x) ≤ f(y) ,    (5.2)

    f(λx + (1−λ)y) − (1−λ) ∇f(λx + (1−λ)y)(y − x) ≤ f(x) .    (5.3)

Multiplying (5.2) by (1−λ), (5.3) by λ, and adding gives (1.4), which demonstrates sufficiency. □

Lemma 5.2: If S = {x ; g_i(x) ≥ 0, g_i concave, i ∈ I_1} has an interior point x*, then every point of S is Lagrange regular.

Proof: Consider x_0 ∈ S. Let i ∈ B_0; then Lemma 5.1 gives

    ∇g_i(x_0)(x* − x_0) ≥ g_i(x*) > 0    (5.4)

as g_i(x_0) = 0, i ∈ B_0. Thus restraint condition B is satisfied. □

Lemma 5.3: If f convex satisfies the Kuhn Tucker conditions at x_0 ∈ S, then f has a minimum on S at x_0.

Proof: In this case (3.2) gives

    ∇f(x_0) = Σ_{i∈I_1} u_i ∇g_i(x_0) ,  u_i g_i(x_0) = 0 ,  u_i ≥ 0 .

Let x be any other point of S; then

    f(x) ≥ f(x) − Σ_{i∈I_1} u_i g_i(x) = ℒ(x) ,    (5.5)

where ℒ(x) is convex on S as the g_i(x), i ∈ I_1, are concave. Thus

    f(x) ≥ ℒ(x_0) + ∇ℒ(x_0)(x − x_0)
         = f(x_0) . □

Remark: If S has an interior, the Kuhn Tucker conditions are both necessary and sufficient for a minimum of the convex programming problem.

Definition: The primal function for the convex programming problem is

    ω(z) = inf_{x∈S_z} f(x) ,  S_z = {x ; g(x) ≥ z} .    (5.6)

Note that if z_1 ≥ z_2 then S_{z_1} ⊆ S_{z_2}, so that ω(z_1) ≥ ω(z_2), and that if S has an interior then S_z is nonempty for z ≥ 0 and small enough.

Lemma 5.4: ω(z) is convex.

Proof: If x_1 ∈ S_{z_1} and x_2 ∈ S_{z_2} then, by concavity of the g_i, i ∈ I_1,

    g(λx_1 + (1−λ)x_2) ≥ λz_1 + (1−λ)z_2 ,  0 ≤ λ ≤ 1 .

Thus λx_1 + (1−λ)x_2 ∈ S_{λz_1+(1−λ)z_2}. We have

    ω(λz_1 + (1−λ)z_2) ≤ inf_{x_1∈S_{z_1}, x_2∈S_{z_2}} f(λx_1 + (1−λ)x_2)

    ≤ inf_{x_1∈S_{z_1}, x_2∈S_{z_2}} (λ f(x_1) + (1−λ) f(x_2))   by convexity

    = λ inf_{x_1∈S_{z_1}} f(x_1) + (1−λ) inf_{x_2∈S_{z_2}} f(x_2)

    = λ ω(z_1) + (1−λ) ω(z_2) . □

Definition: The dual function is

    φ(z*) = inf_{x∈Ω} {f(x) − g^T(x) z*} ,  z* ≥ 0 ,    (5.7)

where Ω is the region on which f and the −g_i, i ∈ I_1, are convex.

Lemma 5.5: φ(z*) is concave.

Proof: Let 0 ≤ λ ≤ 1 and z_1*, z_2* ≥ 0; then

    φ(λz_1* + (1−λ)z_2*) = inf_x {f(x) − g^T(x)(λz_1* + (1−λ)z_2*)}

    = inf_x {λ(f − g^T z_1*) + (1−λ)(f − g^T z_2*)}

    ≥ λ inf_x (f − g^T z_1*) + (1−λ) inf_x (f − g^T z_2*)

    = λ φ(z_1*) + (1−λ) φ(z_2*) . □

Lemma 5.6: Let Γ = {z ; ∃ x ∈ Ω such that g(x) ≥ z}. Then

    φ(z*) = inf_{z∈Γ} (ω(z) − z^T z*) .    (5.8)

Proof: For any z ∈ Γ,

    φ(z*) = inf_{x∈Ω} (f(x) − g(x)^T z*)
          ≤ inf_{x∈S_z} (f(x) − z^T z*)      (using g(x) ≥ z, z* ≥ 0)
          = ω(z) − z^T z* ,

whence

    φ(z*) ≤ inf_{z∈Γ} (ω(z) − z^T z*) .    (5.9)

Now let g(x_1) = z_1. Then

    f(x_1) − g^T(x_1) z* ≥ ω(z_1) − z_1^T z* ≥ inf_{z∈Γ} (ω(z) − z^T z*) ,

whence

    inf_{x_1∈Ω} (f(x_1) − g^T(x_1) z*) ≥ inf_{z∈Γ} (ω(z) − z^T z*) .    (5.10)

The result follows from the inequalities (5.9) and (5.10). □

Theorem 5.1: (Duality theorem). (i) sup_{z*≥0} φ(z*) ≤ inf_{x∈S} f(x).

(ii) If S has an interior, and there exists x_0 such that the Kuhn Tucker conditions are satisfied, then there exists z_0* maximizing φ(z*) and equality holds in (i).

Proof: From (5.8) we have that

    φ(z*) ≤ ω(0) = inf_{x∈S} f(x)

holds for each z* ≥ 0. Thus

    sup_{z*≥0} φ(z*) ≤ inf_{x∈S} f(x) .    (5.11)

If there exists x_0 such that the Kuhn Tucker conditions are satisfied then x_0 minimizes f on S. Defining z_0* = (u_1, ..., u_m)^T, where the u_i ≥ 0 are the multipliers in the Kuhn Tucker conditions, we see that φ(z_0*) = f(x_0). □

Corollary 5.1: (Wolfe's form of the duality theorem). Consider the primal problem: minimize the convex function f(x) subject to the concave constraints g_i(x) ≥ 0, i = 1,2,...,m; and the dual problem: maximize ℒ(x,u) subject to ∇_x ℒ = 0, u ≥ 0. If a solution to the primal exists then the dual problem has a solution and the objective function values are equal.

Remark: (i) The linear programming problem

    min a^T x subject to Ax − b ≥ 0    (5.12)

is a special case of a convex programming problem, as linear functions have the special property of being both convex and concave — this is an immediate consequence of Lemma 5.1. This property of linear constraints permits the previous discussion to be extended to allow linear equality constraints. Note that if the linear equality constraints are not to be contradictory, then their gradients must be linearly independent.

(ii) If restraint condition B is satisfied at x_0, and f(x) has a minimum on S at x_0, then x_0 also solves the linear programming problem

    min f(x_0) + ∇f(x_0)(x − x_0)

subject to

    (i) g_i(x_0) + ∇g_i(x_0)(x − x_0) ≥ 0 ,  i ∈ I_1 , and

    (ii) h_i(x_0) + ∇h_i(x_0)(x − x_0) ≥ 0 ,
         −h_i(x_0) − ∇h_i(x_0)(x − x_0) ≥ 0 ,  i ∈ I_2 ,

as the Kuhn Tucker conditions are both necessary and sufficient for a solution to the linear programming problem. That the converse need not be true is readily seen from the example min −x subject to 1 − x² − y² ≥ 0, which has a minimum at x = 1, y = 0. The associated linear programming problem is min −x subject to 1 − x ≥ 0, which has the solution (1, y) for any y. Thus additional conditions are required if the converse is to hold (for example, Lemmas 4.3 or 4.4 could be used).

Example: (i) (Duality in linear programming). Consider the primal problem:

    minimize a^T x subject to Ax − b ≥ 0 .

The corresponding dual is:

    maximize b^T u subject to A^T u − a = 0 , u ≥ 0 .

If the primal has a solution then so does the dual, and the objective function values are equal.
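This primal-dual pair can be checked numerically. A minimal sketch using scipy.optimize.linprog; the data (a, A, b) are illustrative, and since linprog minimizes, the dual is posed as min −b^T u:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data (not from the notes): min a^T x subject to Ax - b >= 0.
a = np.array([1.0, 1.0])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])

# Primal: linprog takes A_ub x <= b_ub, so Ax >= b becomes -Ax <= -b.
primal = linprog(a, A_ub=-A, b_ub=-b, bounds=[(None, None)] * 2)

# Dual: maximize b^T u subject to A^T u - a = 0, u >= 0
# (linprog minimizes, so the cost is -b).
dual = linprog(-b, A_eq=A.T, b_eq=a, bounds=[(0.0, None)] * 3)

assert primal.success and dual.success
assert abs(primal.fun + dual.fun) < 1e-8   # a^T x_0 = b^T u_0
```

Both solves report the same optimal value, as the duality theorem asserts.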

(ii) (The cutting plane algorithm).

(a) Consider the set S = {x ; g_i(x) ≥ 0, g_i concave, i ∈ I_1}. If x* ∉ S then g_i(x*) < 0 for at least one i. Let α satisfy g_α(x*) ≤ g_i(x*), i ∈ I_1. Consider the half space

    U = {x ; g_α(x*) + ∇g_α(x*)(x − x*) ≥ 0} .

Then x* ∉ U. Now if g_α(x) ≥ 0 then, as g_α is concave,

    g_α(x*) + ∇g_α(x*)(x − x*) ≥ g_α(x) ≥ 0 .

Thus g_α(x) ≥ 0 ⇒ x ∈ U, so that

    S_α = {x ; g_α(x) ≥ 0} ⊆ U .

We have S ⊆ S_α ⊆ U. Thus the hyperplane g_α(x*) + ∇g_α(x*)(x − x*) = 0 separates x* and S.

(b) The convex programming problem minimize f(x) subject to x ∈ S is equivalent to the problem minimize x_{n+1} subject to x ∈ S, x_{n+1} − f(x) ≥ 0, where x_{n+1} is a new independent variable (note that the new constraint is concave). This equivalence follows from the Kuhn Tucker conditions by noting that the new constraint must be active. Thus a convex programming problem can be replaced by the problem of minimizing a linear objective function subject to an enlarged constraint set.

(c) Consider the problem of minimizing c^T x subject to x ∈ S with S bounded. In particular we assume that S ⊆ R_0 = {x ; Ax − b ≥ 0}. We can now state the cutting plane algorithm:

(0) i = 0 .
(i) Let x_i minimize c^T x subject to x ∈ R_i .
(ii) Determine α such that g_α(x_i) ≤ g_j(x_i) , j ∈ I_1 .
(iii) If g_α(x_i) ≥ 0 go to (v).
(iv) Set R_{i+1} = R_i ∩ {x ; g_α(x_i) + ∇g_α(x_i)(x − x_i) ≥ 0} , i := i+1 , go to (i).
(v) Stop.

Note that step (i) requires the solution of a linear programming problem.
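Steps (0)-(v) can be sketched directly, with the LP of step (i) solved by scipy.optimize.linprog. The constraint set (a single concave constraint describing the unit disk) and the box R_0 are illustrative assumptions, not data from the notes:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: S is the unit disk, given by the single concave
# constraint g(x) = 1 - x^T x >= 0, and R_0 is the box |x_j| <= 2.
def g(x):
    return 1.0 - x @ x

def grad_g(x):
    return -2.0 * x

c = np.array([-1.0, -1.0])        # minimize c^T x over S
cuts_A, cuts_b = [], []           # accumulated cuts, stored as rows of A x <= b
x = None
for _ in range(100):
    res = linprog(c,
                  A_ub=np.array(cuts_A) if cuts_A else None,
                  b_ub=np.array(cuts_b) if cuts_b else None,
                  bounds=[(-2.0, 2.0)] * 2)      # step (i): LP over R_i
    x = res.x
    if g(x) >= -1e-8:             # step (iii): x_i (essentially) feasible
        break
    # step (iv): append the cut g(x_i) + grad g(x_i)(x - x_i) >= 0,
    # rewritten as  -grad g(x_i) . x <= g(x_i) - grad g(x_i) . x_i
    cuts_A.append(-grad_g(x))
    cuts_b.append(g(x) - grad_g(x) @ x)

# The true minimum of -x_1 - x_2 over the disk is -sqrt(2), at (1,1)/sqrt(2).
assert abs(c @ x + np.sqrt(2)) < 1e-3
```

With only one constraint, step (ii) is trivial; with several g_i the most violated one would be selected before cutting.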

(d) The cutting plane algorithm generates a sequence of points x_i with the property that

    c^T x_0 ≤ c^T x_1 ≤ ... ≤ c^T x_i ≤ ... ≤ min_{x∈S} c^T x ,

as R_0 ⊇ R_1 ⊇ ... ⊇ S. Thus the sequence {c^T x_i} is increasing and bounded above and therefore convergent. Let x* be a limit point of the {x_i}. Then x* ∈ S and therefore solves the convex programming problem. To prove this, assume x* ∉ S. Then

    min_i g_i(x*) = g_γ(x*) = −μ < 0 .

Let a subsequence {x_i} → x*; then, by continuity, there exists k in the subsequence such that

    (i) ‖x_k − x*‖ < μ/(4C) , and (ii) g_γ(x_k) < −μ/2 ,

where C ≥ ‖∇g_i(x)‖, x ∈ R_0, i ∈ I_1. Let β be defined by

    min_i g_i(x_k) = g_β(x_k) ,

so that g_β(x_k) ≤ g_γ(x_k) < −μ/2. Now x* is a limit point of the {x_i}, and x_i ∈ R_i ⊆ R_{k+1} for i ≥ k+1; in particular, x* ∈ R_{k+1}, whence

    g_β(x_k) + ∇g_β(x_k)(x* − x_k) ≥ 0 .

But

    |∇g_β(x_k)(x* − x_k)| ≤ C‖x* − x_k‖ < μ/4 ,

so that

    g_β(x_k) + ∇g_β(x_k)(x* − x_k) < −μ/2 + μ/4 < 0 ,

which gives a contradiction.

Notes

1. For properties of tangent cones, see Hestenes. Luenberger discusses polar cones (which he calls negative conjugate cones) on pp. 157-159. Lemma 1.4 is due to Gould and Tolle. The proof is due to Nashed et al.

2. Hestenes is a good general reference for this section and includes a proof that a finitely generated cone is closed. The proof given here of Farkas' Lemma is standard (see for example Vajda's paper). An extensive list of theorems of the alternative is given in Mangasarian.

3. The main result is due to Gould and Tolle. The treatment of the other restraint conditions follows Fiacco and McCormick.

4. The treatment of second order conditions is based on Hestenes. Similar material is given in Fiacco and McCormick.

5. The treatment of duality is based on Luenberger. A related treatment is given by Whittle, who is good value on applications. Vajda is a good reference for the mathematical programming application. Wolfe's papers in both the Abadie books discuss various aspects of the cutting plane method.

References

J. Abadie (Editor) (1967): Nonlinear Programming; (1970): Integer and Nonlinear Programming. North Holland.

M. S. Bazaraa, J. J. Goode, M. Z. Nashed, and C. M. Shetty (1971): Nonlinear programming without differentiability in Banach spaces: necessary and sufficient constraint qualification. To appear in Applicable Analysis.

A. V. Fiacco and G. P. McCormick (1968): Nonlinear Programming and Sequential Unconstrained Minimization Techniques. Wiley.

F. J. Gould and J. W. Tolle (1971): A necessary and sufficient constraint qualification for constrained optimization. SIAM J. Applied Math., 20, pp. 164-172.

M. R. Hestenes (1966): Calculus of Variations and Optimal Control Theory. Wiley.

O. L. Mangasarian (1969): Nonlinear Programming. McGraw-Hill.

Peter Whittle (1971): Optimization under Constraints. Wiley.

II. Descent Methods for Unconstrained Minimization


1. General properties of descent methods.

The class of descent methods for minimizing an unconstrained function F(x) solves the problem iteratively by means of a sequence of one dimensional minimizations. The main idea is illustrated in Figure 1.1. At the current point x_i a direction t_i is provided, and the closest minimum to x_i of the function

    G_i(λ) = F(x_i + λ t_i)

is sought. At x_{i+1} we have

    G_i'(λ_i) = ∇F(x_{i+1}) t_i = 0 ,    (1.1)

where x_{i+1} = x_i + λ_i t_i.

Figure 1.1

Definition: A step in which x_{i+1} is determined by satisfying the above conditions is said to satisfy the descent condition. We consider t_i a profitable search direction if F(x_i + λ t_i) decreases initially as λ increases from zero. This condition is formalized as follows.

Definition: (i) The vector t is downhill for minimizing F at x if ∇F(x) t < 0. (ii) The sequence of unit vectors {t_i} is downhill for minimizing F at the sequence of points {x_i} if there exists δ > 0, independent of i, such that ∇F(x_i) t_i ≤ −δ‖∇F(x_i)‖.

Example: The sequence of vectors t_i = −∇F(x_i)^T / ‖∇F(x_i)‖ satisfies the downhill condition with δ = 1. In this case we say that t_i is in the direction of steepest descent.

An estimate of the value of λ minimizing G_i is readily given. We have

    0 = ∇F(x_{i+1}) t_i = ∇F(x_i) t_i + λ_i t_i^T ∇²F(x̄_i) t_i ,

where x̄_i = x_i + θ λ_i t_i is an appropriate mean value. Thus, if {t_i} is downhill and t^T ∇²F(x) t ≤ K‖t‖²,

    λ_i = −∇F(x_i) t_i / (t_i^T ∇²F(x̄_i) t_i) ≥ δ‖∇F(x_i)‖ / K .    (1.2)

Theorem 1.1: (Ostrowski's descent theorem). Let R = {x ; F(x) ≤ F(x_1)}, and assume that F is bounded below and t^T ∇²F(x) t ≤ K‖t‖², x ∈ R. Define

    x_{i+1} = x_i + (δ‖∇F(x_i)‖/K) t_i ,  where ∇F(x_i) t_i ≤ −δ‖∇F(x_i)‖ ,  ‖t_i‖ = 1 ,

for i = 1,2,..., where δ > 0. Then {F(x_i)} converges, and the limit points of {x_i} are stationary values of F.

Proof: As {t_i} is downhill, {x_i} ⊆ R. Expanding by the mean value theorem we obtain

    F(x_{i+1}) = F(x_i) + (δ‖∇F(x_i)‖/K) ∇F(x_i) t_i + (1/2)(δ‖∇F(x_i)‖/K)² t_i^T ∇²F(x̄_i) t_i ,

where x̄_i is a mean value. We have

    F(x_{i+1}) ≤ F(x_i) − δ²‖∇F(x_i)‖²/K + δ²‖∇F(x_i)‖²/(2K)
              = F(x_i) − δ²‖∇F(x_i)‖²/(2K)
              ≤ F(x_i) .    (1.3)

Thus the sequence {F(x_i)} is decreasing and bounded below and therefore convergent. Further, from (1.3),

    ‖∇F(x_i)‖² ≤ (2K/δ²)(F(x_i) − F(x_{i+1}))    (1.4)
               → 0 ,  i → ∞ .

Thus ∇F(x*) = 0 if x* is a limit point of {x_i}. □

Remark: By (1.2) the step taken in the direction t_i underestimates the step to the minimum of G_i. Thus (1.3) holds if the descent condition is satisfied, so that the conclusions of the theorem are valid also in this case.
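The iteration of Theorem 1.1 can be sketched numerically. A minimal example on a two-variable quadratic; the matrix C and the starting point are illustrative assumptions:

```python
import numpy as np

# Fixed-step descent of Theorem 1.1, with steepest descent (delta = 1),
# on F(x) = 0.5 x^T C x; the data are illustrative.
C = np.array([[3.0, 1.0],
              [1.0, 2.0]])
K = np.linalg.eigvalsh(C).max()      # t^T grad^2 F t <= K ||t||^2 everywhere
F = lambda x: 0.5 * x @ C @ x
grad = lambda x: C @ x

x = np.array([4.0, -3.0])
values = [F(x)]
for _ in range(500):
    gnorm = np.linalg.norm(grad(x))
    if gnorm < 1e-10:
        break
    t = -grad(x) / gnorm             # unit steepest descent direction
    x = x + (gnorm / K) * t          # step length delta ||grad F|| / K
    values.append(F(x))

# F(x_i) decreases monotonically and the gradient tends to zero.
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))
assert np.linalg.norm(grad(x)) < 1e-6
```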

Theorem 1.2: (Goldstein's descent theorem). Let R = {x ; F(x) ≤ F(x_1)} be bounded, and assume F ∈ C¹ and bounded below on R. Define

    Δ(x_i, λ) = F(x_i) − F(x_i + λ t_i) ,

    ψ(x_i, λ) = Δ(x_i, λ) / (−λ ∇F(x_i) t_i) ,

where {t_i} is downhill, and the {x_i} are generated by the algorithm:

(i) x_{i+1} = x_i if Δ(x_i, λ) = 0 for all λ ≥ 0 .

(ii) If ψ(x_i, 1) < α, where 0 < α < 1/2, then choose λ_i such that α ≤ ψ(x_i, λ_i) ≤ 1 − α; else choose λ_i = 1 .

(iii) x_{i+1} = x_i + λ_i t_i .

Then the limit points of {x_i} are stationary points of F.

Proof: Δ(x_i, λ) = −λ ∇F(x_i) t_i + o(λ). Thus Δ(x_i, λ) ≡ 0 ⇒ ‖∇F(x_i)‖ = 0, as {t_i} is downhill, so that x_i is a stationary point. Otherwise ∇F(x_i) t_i < 0, so that ψ(x_i, λ) = 1 + o(1) as λ → 0, whence ψ(x_i, 0) = 1. Also the boundedness of R implies that Δ(x_i, λ) < 0 for some λ large enough, so that, as ψ(x_i, λ) is continuous, λ_i can be found to satisfy condition (ii) of the algorithm. Note that {x_i} ⊆ R. We have

    F(x_i) − F(x_{i+1}) = −λ_i ψ(x_i, λ_i) ∇F(x_i) t_i
                        ≥ λ_i α δ ‖∇F(x_i)‖ .    (1.5)

Thus {F(x_i)} is decreasing and bounded below and therefore convergent. To show that the limit points of {x_i} are stationary points of F, consider a subsequence {x_{μ_i}} → x* and assume ‖∇F(x*)‖ > ε > 0. Then ‖∇F(x_{μ_i})‖ > ε for i ≥ i_0. This implies that inf_i λ_{μ_i} = λ_0 > 0, as otherwise sup_i ψ(x_{μ_i}, λ_{μ_i}) = 1, contradicting ψ(x_i, λ_i) ≤ 1 − α. Thus, from (1.5),

    ‖∇F(x_{μ_i})‖ ≤ (F(x_{μ_i}) − F(x_{μ_i + 1})) / (λ_0 α δ) .    (1.6)

The right hand side → 0 as i → ∞, which establishes a contradiction. □

Remark: There are two aspects of this theorem which are of particular interest. (i) It is necessary to assume only that F ∈ C¹ in R. However, the boundedness of R is used explicitly. (ii) The algorithm for determining the step length λ_i is readily implemented. A value of λ satisfying condition (ii) of the algorithm will be said to satisfy the Goldstein condition.
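Condition (ii) asks only for a λ inside a band, which a simple bracketing/bisection finds, since ψ is near 1 for small λ and falls below α for overlong steps. A minimal sketch; the objective F and starting point are illustrative assumptions:

```python
import numpy as np

# A sketch of a step-length routine enforcing condition (ii):
# bisect until alpha <= psi(x, lam) <= 1 - alpha.
def F(x):
    return (x[0] - 1.0) ** 4 + (x[0] - 1.0) ** 2 + x[1] ** 2

def gradF(x):
    return np.array([4.0 * (x[0] - 1.0) ** 3 + 2.0 * (x[0] - 1.0),
                     2.0 * x[1]])

def goldstein_step(x, t, alpha=0.25):
    slope = gradF(x) @ t                 # < 0 since t is downhill
    psi = lambda lam: (F(x) - F(x + lam * t)) / (-lam * slope)
    lam, lo, hi = 1.0, 0.0, 1.0          # psi(0+) = 1 > 1 - alpha
    if psi(lam) >= alpha:
        return lam                       # condition (ii): take lam = 1
    while True:
        p = psi(lam)
        if alpha <= p <= 1.0 - alpha:
            return lam
        if p < alpha:
            hi = lam                     # step too long
        else:
            lo = lam                     # step too short
        lam = 0.5 * (lo + hi)

x = np.array([3.0, 2.0])
t = -gradF(x)                            # downhill, but overshoots at lam = 1
lam = goldstein_step(x, t)
psi_val = (F(x) - F(x + lam * t)) / (-lam * (gradF(x) @ t))
assert 0.25 <= psi_val <= 0.75 and F(x + lam * t) < F(x)
```

Continuity of ψ guarantees the bracket always contains an acceptable λ, which is the content of the existence argument in the proof above.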

Theorem 1.3: (i) Let the vector sequence in the Goldstein algorithm be defined by

    s_i = −A_i ∇F(x_i)^T ,  t_i = s_i/‖s_i‖ ,    (1.7)

where A_i is positive definite, bounded, and λ(A_i) = λ_min(A_i)/λ_max(A_i) ≥ ω > 0, i = 1,2,... . Then {t_i} is downhill with constant δ = ω.

(ii) Assume that {x_i} → x*, and that ‖A_i^{-1} − ∇²F(x_i)‖ = o(1); then λ_i = ‖s_i‖ satisfies the Goldstein condition for i large enough.

(iii) The ultimate rate of convergence of the algorithm is superlinear for this choice of λ_i.

Proof: (i)

    ∇F(x_i) t_i = −∇F(x_i) A_i ∇F(x_i)^T / ‖A_i ∇F(x_i)^T‖
                ≤ −λ_min(A_i) ‖∇F(x_i)‖² / (λ_max(A_i) ‖∇F(x_i)‖)
                ≤ −ω ‖∇F(x_i)‖ .    (1.8)

(ii) With λ_i = ‖s_i‖ (so that λ_i t_i = s_i),

    ψ(x_i, λ_i) = 1 − s_i^T ∇²F(x̃_i) s_i / (−2 ∇F(x_i) s_i) ,

where x̃_i is a mean value dependent on λ. Now, writing

    ∇²F(x̃_i) = A_i^{-1} + E_i

and noting that ‖E_i‖ → 0 as i → ∞, we have

    ψ(x_i, λ_i) = 1/2 − s_i^T E_i s_i / (2 ∇F(x_i) A_i ∇F(x_i)^T) ,

so that

    |ψ(x_i, λ_i) − 1/2| ≤ ‖E_i‖ ‖A_i‖ / (2ω) .    (1.9)

In particular

    |ψ(x_i, λ_i) − 1/2| → 0 ,  i → ∞ ,

so that λ_i = ‖s_i‖ eventually satisfies α ≤ ψ(x_i, λ_i) ≤ 1 − α.

(iii) Another application of the mean value theorem gives

    s_i = −A_i ∇F(x_i)^T = −A_i (∇F(x_i) − ∇F(x*))^T
        = −A_i (∇²F(x̄_i)(x_i − x*) + o(‖x_i − x*‖))
        = −(x_i − x*) + o(‖x_i − x*‖) .    (1.10)

Thus, writing γ_i = λ_i/‖s_i‖,

    x_{i+1} − x* = x_i + γ_i s_i − x*
                 = (1 − γ_i)(x_i − x*) + o(‖x_i − x*‖) .    (1.11)

From (1.11) the choice γ_i = 1 (λ_i = ‖s_i‖) gives superlinear convergence. □

Remark: Theorem 1.3 shows that if ∇²F is positive definite in the neighbourhood of an unconstrained minimum, then it is possible to have algorithms with superlinear convergence without the necessity of satisfying the descent condition. It is not generally considered economic to compute the second partial derivatives of F, and considerable emphasis has been placed on developing approximations to the inverse Hessian using only first derivative information. Although the steepest descent direction is initially the direction of most rapid decrease of the function, it gives in general only linear convergence.

2. Methods based on conjugate directions.

The problem of minimizing a positive definite quadratic form is an important special case of the general unconstrained optimization problem. In particular it is frequently used as a model problem for the development of new algorithms. It is argued that in a neighborhood of the minimum, a general function having a positive definite Hessian at the minimum will be well represented by a quadratic form, so that methods which work well in this particular case should work well in general.

Let F be given by

    F(x) = a + b^T x + (1/2) x^T C x ,    (2.1)

where C is a positive definite, necessarily symmetric matrix. We have

    ∇F(x) = b^T + x^T C .    (2.2)

Consider now a descent step from x_i in the direction t_i. The descent condition gives

    0 = ∇F(x_{i+1}) t_i = t_i^T (C(x_i + λ_i t_i) + b) ,

whence

    λ_i = −t_i^T g_i / (t_i^T C t_i) ,    (2.3)

where g_i = ∇F(x_i)^T. To calculate the change in the value of F in a descent step we have

    F(x_i + λ_i t_i) = F(x_i) + λ_i t_i^T g_i + (1/2) λ_i² t_i^T C t_i
                     = F(x_i) − (1/2) (g_i^T t_i)² / (t_i^T C t_i) .    (2.4)

Example: (Linear convergence of the method of steepest descent).

Let F = (1/2) x^T C x. Then, taking t_i along g_i = Cx_i, (2.4) gives

    F(x_{i+1}) − F(x_i) = −(1/2) (w_i^T w_i)² / (w_i^T C w_i) ,

where w_i = Cx_i. We have F(x_i) = (1/2) w_i^T C^{-1} w_i, so that

    F(x_{i+1})/F(x_i) = 1 − (w_i^T w_i)² / ((w_i^T C w_i)(w_i^T C^{-1} w_i)) .

The Kantorovich inequality gives

    (w^T w)² / ((w^T C^{-1} w)(w^T C w)) ≥ 4σ_1 σ_n / (σ_1 + σ_n)² ,

where σ_1 and σ_n are the smallest and largest eigenvalues of C respectively, whence

    F(x_{i+1}) ≤ ((σ_n − σ_1)/(σ_n + σ_1))² F(x_i) ,

which shows that the rate of convergence of steepest descent is at least linear.

To show that it is exactly linear, consider the particular case in which

    x_i = α_1^i v_1 + α_n^i v_n ,

where v_1 and v_n are the normalized eigenvectors associated with σ_1 and σ_n respectively. We have

    x_{i+1} = α_1^{i+1} v_1 + α_n^{i+1} v_n = (1 − λ_i σ_1) α_1^i v_1 + (1 − λ_i σ_n) α_n^i v_n ,

with (from (2.3))

    λ_i = (σ_1² (α_1^i)² + σ_n² (α_n^i)²) / (σ_1³ (α_1^i)² + σ_n³ (α_n^i)²) .

In particular

    α_1^{i+1}/α_n^{i+1} = [(1 − λ_i σ_1)/(1 − λ_i σ_n)] (α_1^i/α_n^i) ,

and, as λ_i depends on x_i only through the ratio α_1^i/α_n^i, the ratios α_1^i/α_n^i assume just two values for all i (depending on i even or odd). Now

    F(x_i) = (1/2)(σ_1 (α_1^i)² + σ_n (α_n^i)²) ,

so that the reduction factor F(x_{i+1})/F(x_i) also assumes just two values, and

    F(x_{i+1}) ≥ γ F(x_i) ,

where γ is the smaller of the two values, 0 < γ < 1, and γ is independent of i. This inequality shows that the rate of convergence of steepest descent is linear.
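The example can be checked numerically. A short sketch with an illustrative diagonal C, verifying the Kantorovich bound and that the reduction ratio settles to a constant:

```python
import numpy as np

# Steepest descent with exact line search on F = 0.5 x^T C x (illustrative
# data): the per-step reduction ratio obeys the Kantorovich bound and stays
# constant, i.e. convergence is linear, not superlinear.
C = np.diag([1.0, 10.0])                  # sigma_1 = 1, sigma_n = 10
bound = ((10.0 - 1.0) / (10.0 + 1.0)) ** 2
F = lambda x: 0.5 * x @ C @ x

x = np.array([10.0, 1.0])                 # weight on both extreme eigenvectors
ratios = []
for _ in range(25):
    g = C @ x
    lam = (g @ g) / (g @ C @ g)           # minimizing step, from (2.3)
    x_new = x - lam * g
    ratios.append(F(x_new) / F(x))
    x = x_new

assert all(r <= bound + 1e-12 for r in ratios)
assert abs(ratios[-1] - ratios[0]) < 1e-8   # ratio constant: linear rate
```

For this starting point the gradient has equal components along v_1 and v_n, which is the equality case of the Kantorovich inequality, so the bound is attained.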

Definition: Directions t_1, t_2 are conjugate with respect to C if

    t_1^T C t_2 = 0 .    (2.5)

In what follows it will frequently be convenient to speak about a 'direction of search' without intending to imply that its norm is unity. However, the null vector is excluded from any set of mutually conjugate directions. It is clear that any set of mutually conjugate directions is linearly independent.

Example: The eigenvectors of C are conjugate. The property of being both conjugate and orthogonal characterizes the eigenvectors.

Lemma 2.1: Let t_1, ..., t_n be a set of mutually conjugate directions (with respect to C). Starting from x_1, let x_2, x_3, ..., x_{i+1} be points produced by descent steps applied to (2.1). Then

    g_i^T t_j = 0 ,  j = 1,2,...,i−1 .    (2.6)

Proof: The descent condition gives g_i^T t_{i−1} = 0, so it is necessary only to verify the result for j < i−1. We have

    g_i^T t_j = g_{j+1}^T t_j + Σ_{s=j+1}^{i−1} λ_s t_s^T C t_j = 0 . □

Corollary 2.1: The minimum of a positive definite quadratic form can be found by making at most one descent step along each of n mutually conjugate directions.

Proof: From Lemma 2.1 we have

    g_{n+1}^T t_i = 0 ,  i = 1,2,...,n .

Thus g_{n+1} is orthogonal to n linearly independent directions and therefore vanishes identically. □

Remark: A method which minimizes a quadratic form in a finite number of steps is said to have a quadratic termination property.

Example: The sequence of vectors

    t_1 = −g_1 ,  t_i = −g_i + (‖g_i‖²/‖g_{i−1}‖²) t_{i−1} ,  i = 2,...,n ,    (2.7)

is conjugate. The algorithm based on this choice is called the method of conjugate gradients.
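The recurrence (2.7), combined with the descent step (2.3), can be sketched as follows; the quadratic data are randomly generated for illustration:

```python
import numpy as np

# The conjugate gradient recurrence (2.7) on a positive definite quadratic:
# the minimum of F = b^T x + 0.5 x^T C x is reached after at most n descent
# steps (quadratic termination). Data are illustrative.
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)          # positive definite
b = rng.standard_normal(n)

x = np.zeros(n)
g = b + C @ x                        # g_i = grad F(x_i)^T
t = -g                               # t_1 = -g_1
for _ in range(n):
    lam = -(t @ g) / (t @ C @ t)     # descent step (2.3)
    x = x + lam * t
    g_new = b + C @ x
    t = -g_new + ((g_new @ g_new) / (g @ g)) * t   # recurrence (2.7)
    g = g_new

assert np.linalg.norm(C @ x + b) < 1e-8   # g_{n+1} = 0
```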
We now consider the generation of sequences of conjugate directions to provide a basis for a descent calculation. To do this we note that the minimum of (2.1) is at x = −C^{-1} b, so that if we minimize in the direction t_1 = −C^{-1}(Cx_1 + b) = −C^{-1} ∇F(x_1)^T then the minimum is found in a single step. In general C^{-1} is not known in advance, so we are led to consider processes in which each step consists of two parts: (i) a descent calculation in the direction

    t_i = −H_i g_i ,    (2.8)

where H_i is the current estimate of C^{-1}, and (ii) the calculation of a correction to H_i which serves both the purpose of making the t_i conjugate and that of making H_i approach C^{-1}. It is convenient in what follows to assume that the H_i are symmetric. This seems a natural condition given the symmetry of C, but is in fact not necessary.

If we assume that the t_s, s < i, are mutually conjugate, then the condition that each be conjugate to t_i is

    t_i^T C t_s = −g_i^T H_i C t_s = 0 ,  s < i ,

and, by Lemma 2.1, this is certainly satisfied if

    H_i C t_s = ρ_s t_s ,  s < i .

We write this equation in the equivalent form (multiplying both sides by λ_s, and noting that y_s = C d_s for the quadratic (2.1))

    H_i y_s = ρ_s d_s ,  s < i ,    (2.9)

where

    d_i = x_{i+1} − x_i ,  y_i = g_{i+1} − g_i .    (2.10)

Consider the symmetric updating formula

    H_{i+1} = H_i + φ_i d_i d_i^T + η_i (d_i y_i^T H_i + H_i y_i d_i^T) + ω_i H_i y_i y_i^T H_i ,    (2.11)

where φ_i, η_i, ω_i are to be determined (or prescribed). We have H_{i+1} y_s = H_i y_s = ρ_s d_s, s < i, provided (2.9) holds, as

    d_i^T y_s = d_i^T C d_s = 0 ,  y_i^T H_i y_s = ρ_s y_i^T d_s = ρ_s (g_{i+1} − g_i)^T d_s = 0 .

Thus (2.9) is satisfied for i := i+1 if

    0 = 1 + η_i (d_i^T y_i) + ω_i (y_i^T H_i y_i) ,    (2.12)

and

    ρ_i = φ_i (d_i^T y_i) + η_i (y_i^T H_i y_i) .    (2.13)

If φ_i and ω_i are expressed in terms of ρ_i and the remaining free parameter ζ_i = −η_i (d_i^T y_i), from (2.12) and (2.13) we have

    φ_i = ρ_i/(d_i^T y_i) + ζ_i (y_i^T H_i y_i)/(d_i^T y_i)² ,  ω_i = (ζ_i − 1)/(y_i^T H_i y_i) ,

so that equation (2.11) becomes

    H_{i+1} = H_i + ρ_i d_i d_i^T/(d_i^T y_i) − H_i y_i y_i^T H_i/(y_i^T H_i y_i) + ζ_i τ_i v_i v_i^T
            = D(ρ_i, H_i) + ζ_i τ_i v_i v_i^T ,    (2.14)

where D(ρ_i, H_i) denotes the first three terms,

    v_i = d_i/(d_i^T y_i) − H_i y_i/(y_i^T H_i y_i) ,    (2.15)

and

    τ_i = y_i^T H_i y_i .    (2.16)

Example: The particular case ρ_i = 1, ζ_i = 0, i = 1,2,..., gives the variable metric or DFP formula, which is the most frequently used member of the family.

The class of formulae described by (2.14) generates recursively a set of conjugate directions, so that the first of our aims is satisfied. It still remains to show the relationship between the H_i and C^{-1}. To do this, note that (2.9) can be written (introducing the symmetric square root C^{1/2} of the positive definite matrix C)

    C^{1/2} H_i C^{1/2} (C^{1/2} t_s) = ρ_s (C^{1/2} t_s) ,  s = 1,2,...,i−1 ,

or, more briefly,

    Ĥ_i t̂_s = ρ_s t̂_s ,  s = 1,2,...,i−1 .    (2.9a)

Defining the matrix T̂ whose columns are

    t̂_s = C^{1/2} t_s / (t_s^T C t_s)^{1/2} ,  s = 1,2,...,n ,

and the diagonal matrix P by P_{ss} = ρ_s, s = 1,2,...,n, we can write (2.9a) in the case i = n+1 in the form

    Ĥ_{n+1} T̂ = T̂ P .    (2.9b)

Now T̂ is an orthogonal matrix (its columns are orthonormal by the conjugacy of the t_s), so that

    Ĥ_{n+1} = T̂ P T̂^T ,

whence

    H_{n+1} = C^{-1/2} T̂ P T̂^T C^{-1/2} .    (2.17)

In particular, if P = ρI,

    H_{n+1} = ρ C^{-1} .    (2.18)
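The identification (2.18) can be checked numerically for the DFP member of the family (ρ = 1, ζ = 0); the quadratic data are illustrative:

```python
import numpy as np

# The update (2.14) with rho = 1, zeta = 0 (the DFP formula), applied along
# exact descent steps on F = b^T x + 0.5 x^T C x; by (2.18), H_{n+1}
# should reproduce C^{-1}. Data are illustrative.
rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

x = np.zeros(n)
H = np.eye(n)
for _ in range(n):
    g = b + C @ x
    t = -H @ g                           # search direction (2.8)
    lam = -(t @ g) / (t @ C @ t)         # exact descent step (2.3)
    d = lam * t                          # d_i = x_{i+1} - x_i
    x = x + d
    y = C @ d                            # y_i = g_{i+1} - g_i
    H = (H + np.outer(d, d) / (d @ y)
           - H @ np.outer(y, y) @ H / (y @ H @ y))    # D(1, H)

assert np.allclose(H, np.linalg.inv(C), atol=1e-8)   # H_{n+1} = C^{-1}
assert np.linalg.norm(C @ x + b) < 1e-8              # and x_{n+1} is optimal
```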

Remark: Remember that the motivation for developing the recursion (2.14) is the search for efficient descent directions. Specifically, we are looking not only for conjugate directions but also for good estimates of the inverse Hessian. This indicates that ρ = 1 is the natural choice (or at least ρ = constant), and almost all published methods use ρ = 1. However, from (2.17), the choice of ρ variable may well have scaling advantages in the initial phases of a computation with a general objective function. Presumably the strategy for choosing ρ should make ρ → constant to ensure a fast rate of ultimate convergence.

Lemma 2.2: Provided the descent condition is satisfied, H_{i+1} g_{i+1} is a multiple of v_i (possibly null).

Remark: In what follows it is convenient to drop the subscript i; quantities subscripted i+1 will be starred. We also assume that ρ is constant.

Proof: We have (using the descent condition d^T g* = 0, the definition t = −Hg, and d = λt)

    D(ρ, H) g* = (H + ρ dd^T/(d^T y) − H y y^T H/(y^T H y)) g*
               = Hy + Hg − Hy y^T H(y + g)/(y^T H y)
               = Hg − Hy (y^T H g)/(y^T H y)
               = −(1/λ)(d − (y^T d/(y^T H y)) Hy) ,

whence

    H* g* = (γ + ζ τ v^T g*) v ,  γ = −d^T y/λ .    (2.19)  □

* *
Remark; (i) The condition that H g =0 when v ^ 0 gives a

condition which determines £ . We have

T* 1 *T „ 1*T*
vg =-rg Hy=--g Hg

so that (from (2.19))

1 (2.20)
C =
Xg Hg

53

.■„^■■^.,^j-..l,rfinj,.w,,-:,^......,....,.■„..-■^■■. ...■ ■- . .
MiaaüBcataitijaatiaMartBBiaiaMMflMaaMliBB ■-•••,: - ■ 1 •;.
^W.^M^^H'IIF ^^'w^wtimmmmmimmmmmm'QmimmmBm 1 i 11
1 . ^; ffw^pBBffmniwgpi Try" iF<rrjT.Twm TCy Wi'JiyTI TfunrnffBrnw^ f TOI- ■»7-T7f»?r j iTJT'^-^vT-^rT.f«- .^^^j—",»,^^-^.^^^^ ^

Provided this value of ζ is excluded from consideration then x* is

independent of ζ . Note that this result is true for a general

function as no properties specific to a quadratic form have been used

in its derivation.

(ii) We can only have v = 0 with d and g nonnull if H is

singular, and in this case H* is also singular and the null space of

H* is at least as large as that of H . This follows from (2.15), which

can vanish only if (a) Hg* = 0 and τ = (y^T d)/(y^T H y) , or (b) Hg and

Hg* are parallel. Now if H is singular ∃ w , w^T H = 0 . Thus

w^T d = 0 , and hence w^T H* = 0 .
Clearly it is important that H_i positive definite ⇒ H_{i+1} positive

definite, i = 1,2,... , in order that premature termination should be

avoided ( Hg = 0 and H positive definite ⇒ g = 0 , whence x is

a stationary point). Conditions which ensure this are given in the

following lemma (due to Powell).

Lemma 2.3: If 0 < ρ,τ < ∞ , H is positive semidefinite, and

H H^+ v = v (where H^+ is the generalized inverse of H ), then H* is

positive semidefinite, and the null space of H* is equal to that of H

provided

    ζ τ > - (y^T H y)/((d^T H^+ d)(y^T H y) - (d^T y)^2) .   (2.21)

Proof: We first note the identity

    D(ρ,H) = (I + u y^T) H (I + y u^T)   (2.22)

where

54


    u = (1/((d^T y)(y^T H y))^{1/2}) (ρ^{1/2} d - ((d^T y)/(y^T H y))^{1/2} H y) ,   (2.23)

and

    det(I + u y^T) = 1 + y^T u = (ρ (d^T y)/(y^T H y))^{1/2}   (2.24)

so that, by the assumptions, I + u y^T is nonsingular. Now y^T v = 0

so that H* can be written

    H* = (I + u y^T)(H + ζ τ v v^T)(I + y u^T) .   (2.25)

Thus the problem reduces to considering H + ζ τ v v^T . We have

    H + ζ τ v v^T = H(I + ζ τ H^+ v v^T) .

The null spaces of H and H* will agree provided I + ζ τ H^+ v v^T is

nonsingular. The condition for singularity is

    0 = det(I + ζ τ H^+ v v^T)

      = 1 + ζ τ v^T H^+ v .

Noting that H H^+ H = H , and H H^+ v = v ⇒ H H^+ d = d , we have

    1 + ζ τ v^T H^+ v = 1 + ζ τ (d^T H^+ d - 2 (d^T y)^2/(y^T H y) + (d^T y)^2/(y^T H y))

                      = 1 + ζ τ (d^T H^+ d - (d^T y)^2/(y^T H y)) ,

and this vanishes provided

    ζ τ = - (y^T H y)/((d^T H^+ d)(y^T H y) - (y^T d)^2) .

55


The stated result is a consequence of this and the observation that

decreasing ζ below this value will make H* indefinite. □
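The product form (2.22), with u given by (2.23), can be verified numerically; in the following sketch the matrix H and the vectors d , y are arbitrary test data of ours, and the determinant value (2.24) is checked as well.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 5, 1.0
A = rng.standard_normal((n, n))
H = A @ A.T + np.eye(n)              # positive definite H
d = rng.standard_normal(n)
y = rng.standard_normal(n)
if d @ y < 0:
    y = -y                           # keep d^T y > 0

delta, Y = d @ y, y @ H @ y
# D(rho, H) from the direct formula
D = H + rho * np.outer(d, d) / delta - np.outer(H @ y, H @ y) / Y

# u from (2.23)
u = (np.sqrt(rho) * d - np.sqrt(delta / Y) * (H @ y)) / np.sqrt(delta * Y)
E = np.eye(n) + np.outer(u, y)       # I + u y^T

assert np.allclose(E @ H @ E.T, D)                              # (2.22)
assert np.isclose(np.linalg.det(E), np.sqrt(rho * delta / Y))   # (2.24)
```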

Remark: (i) The condition on τ is automatically satisfied if

H is positive definite and the descent condition is satisfied, for then

d^T y = -g^T d = τ g^T H g > 0 . However the lemma does not require that the

descent condition be satisfied and remains valid even though the exact

minimum in the direction t is not found. In this case the condition

on τ is necessary.

Corollary 2.2: If H_1 is positive definite, and H_{i+1} = D(ρ,H_i) ,

i = 1,2,... , then provided the descent condition is satisfied for

i = 1,2,... , H_{i+1} is positive definite.

Proof: This is a consequence of (2.22) and the above remark, which shows

that if H is positive definite, and if the descent condition is

satisfied, then I + u y^T is nonsingular. □

Theorem 2.1: (Dixon's equivalence theorem). If (i) the formula (2.14)

is used to generate descent directions, (ii) ζ_i satisfies (2.21) for

i = 1,2,... and H_1 is positive definite, and (iii) the descent

condition is satisfied in each descent step, then the sequence of points

generated by the algorithm depends only on F , H_1 , ρ , and x_1 , and

is independent of ζ_i , i = 1,2,... .

Remark: It is important to note that F is not restricted to be a

quadratic form in this result.

56


Proof: Let D_1 = H_1 , D_i = D(ρ,D_{i-1}) , i = 2,3,... . We show that

if H_i = D_i + α d_i d_i^T , then H_{i+1} = D_{i+1} + β d_{i+1} d_{i+1}^T . By Lemma 2.2 we

have H* = D(ρ,H) + γ d* d*^T . Now, writing H = D + α d d^T ,

    D(ρ,H) = D + ρ (d d^T)/(d^T y) + α d d^T - ((D + α d d^T) y y^T (D + α d d^T))/(y^T (D + α d d^T) y)

           = D + ρ (d d^T)/(d^T y) + α d d^T

             - (D y y^T D + α(y^T d)(D y d^T + d y^T D) + α^2 (d^T y)^2 d d^T)/(y^T D y + α(y^T d)^2)

           = D(ρ,D) + (α(y^T D y)/(y^T D y + α(y^T d)^2))(d - ((y^T d)/(y^T D y)) D y)(d - ((y^T d)/(y^T D y)) D y)^T .   (2.26)

By Lemma 2.2, d* ∥ D(ρ,H)g* . By (2.26), D(ρ,H)g* ∥ D* g* . Thus

d_j ∥ D_j g_j , j = 1,2,...,i ⇒ d_{i+1} ∥ D_{i+1} g_{i+1} . But the case j = 1

is a consequence of Lemma 2.2, so the result follows by induction. □

Example: Equivalence results for a wide class of conjugate direction

algorithms applied to a given positive definite quadratic form can be

demonstrated by noting that at the i-th stage we find the minimum in the

translation to x_1 of the subspace spanned by t_1,...,t_i , and that this

subspace is also spanned by H_1 g_1,...,H_1 g_i . Thus x_{i+1} depends only on

57


x_1,...,x_i and not on the particular updating formula for the inverse

Hessian estimate. If H_1 = I this equivalence extends to the conjugate

gradient algorithm (2.7).

Lemma 2.4: If the descent condition is satisfied at each stage then

the sequence g_i^T D_i g_i , i = 1,2,... , is strictly decreasing provided D_1

is positive definite.

Proof: We have (as g*^T d = 0 and, since d is in the direction -Dg ,

also g*^T D g = 0 )

    g*^T D* g* = g*^T D g* - (g*^T D y)^2/(y^T D y)

               = g*^T D g* - (g*^T D g*)^2/(g*^T D g* + g^T D g)

               = (g*^T D g*)(g^T D g)/(g*^T D g* + g^T D g) .

Thus

    1/(g*^T D* g*) = 1/(g*^T D g*) + 1/(g^T D g) .   (2.27)

By Corollary 2.2, the D_i are positive definite so that the desired

result follows from (2.27). □
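The relation (2.27) uses only g*^T d = 0 and d ∥ Dg , so it can be observed (to rounding error) along a DFP run with exact line searches on a quadratic. A minimal check, with test data of our choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)          # positive definite quadratic Hessian
b = rng.standard_normal(n)

x = rng.standard_normal(n)
D = np.eye(n)                        # D_1 positive definite
g = C @ x - b
for _ in range(n - 1):
    s = -D @ g
    alpha = -(g @ s) / (s @ C @ s)   # exact line search, so g_new^T d = 0
    d = alpha * s
    x = x + d
    g_new = C @ x - b
    y = g_new - g
    D_new = D + np.outer(d, d) / (d @ y) \
              - np.outer(D @ y, D @ y) / (y @ D @ y)
    # the reciprocal relation (2.27)
    lhs = 1.0 / (g_new @ D_new @ g_new)
    rhs = 1.0 / (g_new @ D @ g_new) + 1.0 / (g @ D @ g)
    assert np.isclose(lhs, rhs)
    D, g = D_new, g_new
```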

Remark: This result indicates a potential defect of the DFP algorithm.

For if the choice of D_1 is poor in the sense that it leads to too

small a value of g_1^T D_1 g_1 then the algorithm has no mechanism to correct

this, and must initially generate a sequence of directions which are

nearly orthogonal to the gradient. This must also happen if, for any

58


reason, an abnormally small value of g^T D g is generated at some stage.

A possible cause of such behaviour is poor scaling of the problem.

Lemma 2.5: Corresponding to the formula (2.14) for updating H there

is a similar formula for updating H^{-1} . Specifically we have

    H*^{-1} = D(ρ,H)^{-1} + γ w w^T   (2.28)

where

    D(ρ,H)^{-1} = H^{-1} + (1/ρ + μ)(y y^T)/(d^T y) - (1/(d^T y))(y d^T H^{-1} + H^{-1} d y^T) ,   (2.29)

    μ = (d^T H^{-1} d)/(d^T y) ,   (2.30)

    w = y - (1/μ) H^{-1} d ,   (2.31)

and γ is related to ζ by

    γ = - ζ τ μ^2/(1 + ζ τ v^T H^{-1} v) .   (2.32)

Proof: From (2.22) we have

    D(ρ,H)^{-1} = (I - ((y^T H y)/(ρ d^T y))^{1/2} y u^T) H^{-1} (I - ((y^T H y)/(ρ d^T y))^{1/2} u y^T)

and (2.29) follows from this by an elementary calculation. From (2.25)

    H*^{-1} = (I - ((y^T H y)/(ρ d^T y))^{1/2} y u^T)(H + ζ τ v v^T)^{-1}(I - ((y^T H y)/(ρ d^T y))^{1/2} u y^T)

           = (I - ((y^T H y)/(ρ d^T y))^{1/2} y u^T)(H^{-1} - (ζ τ/(1 + ζ τ v^T H^{-1} v)) H^{-1} v v^T H^{-1})(I - ((y^T H y)/(ρ d^T y))^{1/2} u y^T) .

                                                                    (2.33)

59
iaiMiiiiiiiiilHiiit ''M'M'M«»^*1"*'**'*'"^^ 1 ...„.A ■■'*-'■"*---''-—■'-
ipillSPiPPIIII^^

Now

    (I - ((y^T H y)/(ρ d^T y))^{1/2} y u^T) H^{-1} v = - μ w ,   (2.34)

so that (2.28) is a direct consequence of (2.33) and (2.34). □


Remark: If we take γ = - μ/(y^T d) in (2.28) then we obtain

    G(ρ,H^{-1}) ≡ H^{-1} + (1/ρ)(y y^T)/(y^T d) - (H^{-1} d d^T H^{-1})/(d^T H^{-1} d)

               = (I + z d^T) H^{-1} (I + d z^T)   (2.35)

where

    z = (1/((y^T d)(d^T H^{-1} d))^{1/2}) ((1/ρ)^{1/2} y - ((y^T d)/(d^T H^{-1} d))^{1/2} H^{-1} d) .   (2.36)

We have

    D(ρ,H)^{-1} = G(ρ,H^{-1}) + (μ/(y^T d)) w w^T

               = (I + z d^T)(H^{-1} + (μ/(y^T d)) w w^T)(I + d z^T)   (2.37)

as d^T w = 0 .

60


To summarize these results we have the following:

(i)  D(ρ,H) = (I + u y^T) H (I + y u^T) ,

     G(ρ,H^{-1}) = (I + z d^T) H^{-1} (I + d z^T) ;

(ii) D(ρ,H)^{-1} = G(ρ,H^{-1}) + (μ/(y^T d)) w w^T ,

     G(ρ,H^{-1})^{-1} = D(ρ,H) + (ν/(y^T d)) v v^T ,

where μ = (d^T H^{-1} d)/(d^T y) and ν = (y^T H y)/(d^T y) .

                update formula                          update formula for inverse

D(ρ,H) :      H + ρ (d d^T)/(d^T y)                   H^{-1} + (1/ρ + μ)(y y^T)/(d^T y)
                - (H y y^T H)/(y^T H y)                  - (1/(d^T y))(y d^T H^{-1} + H^{-1} d y^T)

G(ρ,H^{-1}) :   H^{-1} + (1/ρ)(y y^T)/(y^T d)             H + (ρ + ν)(d d^T)/(d^T y)
                - (H^{-1} d d^T H^{-1})/(d^T H^{-1} d)          - (1/(d^T y))(d y^T H + H y d^T)

D(ρ,H) , G(ρ,H^{-1}) have been called dual formulae by Fletcher.
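The dual relations in (ii) can be checked numerically. In the sketch below (test data and seed are ours) both identities are verified for a random positive definite H :

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 4, 0.5
A = rng.standard_normal((n, n))
H = A @ A.T + np.eye(n)              # positive definite test matrix
Hinv = np.linalg.inv(H)
d = rng.standard_normal(n)
y = rng.standard_normal(n)
if d @ y < 0:
    y = -y                           # keep d^T y > 0

delta = d @ y
mu = (d @ Hinv @ d) / delta          # mu of (2.30)
nu = (y @ H @ y) / delta             # the dual quantity nu
w = y - (Hinv @ d) / mu              # w of (2.31)
v = d - (H @ y) / nu                 # v = d - (y^T d / y^T H y) H y

D = H + rho * np.outer(d, d) / delta - np.outer(H @ y, H @ y) / (y @ H @ y)
G = Hinv + (1.0 / rho) * np.outer(y, y) / delta \
         - np.outer(Hinv @ d, Hinv @ d) / (d @ Hinv @ d)

assert np.allclose(np.linalg.inv(D), G + (mu / delta) * np.outer(w, w))
assert np.allclose(np.linalg.inv(G), D + (nu / delta) * np.outer(v, v))
```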

Lemma 2.6: Let A be a symmetric matrix, A = T Λ T^T where Λ is diagonal

( Λ_ii = λ_i , i = 1,2,...,n , with λ_i ≤ λ_{i+1} ) and T is orthogonal. Let λ_i* , i = 1,2,...,n ,

be the eigenvalues of A + σ a a^T ; then either σ > 0 and

λ_i ≤ λ_i* ≤ λ_{i+1} , i = 1,2,...,n , or σ < 0 and λ_{i-1} ≤ λ_i* ≤ λ_i

(with the conventions λ_0 = -∞ , λ_{n+1} = +∞ ).

61


Proof: We have

    det(A + σ a a^T - λI) = Π_{i=1}^n (λ_i - λ) det(I + σ(Λ - λI)^{-1}(T^T a)(T^T a)^T)

                          = Π_{i=1}^n (λ_i - λ) (1 + σ (T^T a)^T (Λ - λI)^{-1}(T^T a))

                          = Π_{i=1}^n (λ_i - λ) (1 + σ Σ_{i=1}^n ã_i^2/(λ_i - λ)) ,   ã = T^T a ,

and the desired result is an easy consequence of this expression. □
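Lemma 2.6 is easy to illustrate numerically; the matrix, vector, and σ below are arbitrary choices of ours (the σ > 0 case is shown):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2.0                  # symmetric test matrix
a = rng.standard_normal(n)
sigma = 0.7                          # sigma > 0 case of the lemma

lam = np.linalg.eigvalsh(A)          # eigenvalues in ascending order
lam_star = np.linalg.eigvalsh(A + sigma * np.outer(a, a))

tol = 1e-10
assert np.all(lam <= lam_star + tol)            # lambda_i <= lambda_i*
assert np.all(lam_star[:-1] <= lam[1:] + tol)   # lambda_i* <= lambda_{i+1}
```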

In the following theorem we consider specifically the minimization

of a positive definite quadratic form. We assume that the initial

estimate of the Hessian H_1 is positive definite, and we make use of

the following sequences of updates for the current Hessian estimates:

(a) H_{i+1} = D(ρ,H_i) , i = 1,2,... , and

(b) H̄_{i+1} = G(ρ,H̄_i^{-1})^{-1} , i = 1,2,... .

Further we do not assume that the descent condition is satisfied.

Theorem 2.2: (i) Let K_i = C^{1/2} H_i C^{1/2} , where the H_i are generated

by (a), and let the eigenvalues of K_i ordered in increasing magnitude

be λ_j^{(i)} , j = 1,2,...,n . Then if λ_j^{(1)} > ρ then

λ_j^{(1)} ≥ λ_j^{(2)} ≥ ... ≥ ρ , while if λ_j^{(1)} < ρ then

λ_j^{(1)} ≤ λ_j^{(2)} ≤ ... ≤ ρ , j = 1,2,...,n . (ii) Let K̄_i = C^{1/2} H̄_i C^{1/2} ,

where the H̄_i are generated by (b), and let the eigenvalues of K̄_i be

λ̄_j^{(i)} , j = 1,2,...,n . If λ̄_j^{(1)} > ρ then λ̄_j^{(1)} ≥ λ̄_j^{(2)} ≥ ... ≥ ρ ,

while if λ̄_j^{(1)} < ρ then λ̄_j^{(1)} ≤ λ̄_j^{(2)} ≤ ... ≤ ρ , j = 1,2,...,n .

62


Remark: This result is important because it shows that we have a

'weak' convergence result for these Hessian estimates when minimizing a

positive definite quadratic form even when the descent condition is

not satisfied at each step.

Proof: Noting that C^{1/2} d = C^{-1/2} y = a , we can write the formula for

updating K as

    K* = K + ρ (a a^T)/(a^T a) - (K a a^T K)/(a^T K a) .

We can break this into the two operations

    J = K - (K a a^T K)/(a^T K a) , and

    K* = J + ρ (a a^T)/(a^T a) .

Note that J has a zero eigenvalue, and that a is the corresponding

eigenvector. By Lemma 2.6 we have λ_1(J) = 0 , and λ_{j-1} ≤ λ_j(J) ≤ λ_j

for j = 2,3,...,n . The rank one modification which takes J into

K* changes the zero eigenvalue to ρ and leaves the other eigenvalues

of J unchanged. Assume that λ_j(J) ≤ ρ < λ_{j+1}(J) ; then, reordering the

eigenvalues in increasing order of magnitude, we have λ_k* = λ_{k+1}(J) ,

k = 1,2,...,j-1 , λ_j* = ρ , λ_k* = λ_k(J) , k = j+1,...,n . This

establishes the first part of the theorem. The second is demonstrated

in similar fashion by noting that K̄^{-1} satisfies a formally similar

update relation. This establishes the result for the eigenvalues of K̄^{-1}

and hence for their reciprocals. □

63


Remark: Note that both H_i and H̄_i are positive definite, i = 2,3,... ,

if H_1 is positive definite. In this case the result does not depend

on the descent condition being satisfied.

Theorem 2.3: Let H be positive definite, and consider a step d in

the direction -Hg . Let H* = D(ρ,H) , H̄ = G(ρ,H^{-1})^{-1} =

D(ρ,H) + (ν/(y^T d)) v v^T , and H_θ = θ H̄ + (1-θ) H* = D(ρ,H) + θ (ν/(y^T d)) v v^T . Let

K = C^{1/2} H C^{1/2} , and define K* , K̄ , K_θ similarly. Let the eigenvalues

of K , K* , K̄ , K_θ be λ_j , λ_j* , λ̄_j , and λ_j^θ respectively,

j = 1,2,...,n . Let 0 ≤ θ ≤ 1 . If λ_j > ρ then λ_j ≥ λ̄_j ≥ λ_j^θ ≥ λ_j* ≥ ρ ,

while if λ_j < ρ then λ_j ≤ λ_j* ≤ λ_j^θ ≤ λ̄_j ≤ ρ . If θ ∉ [0,1] then λ_j^θ

need not lie in the interval defined by λ_j and ρ .

Proof: It follows from the definition of H* , H̄ , and H_θ and

Lemma 2.6 that λ_j* ≤ λ_j^θ ≤ λ̄_j , j = 1,2,...,n , provided 0 ≤ θ ≤ 1 .

The first part of the result is now a consequence of Theorem 2.2. To

show that λ_j^θ need not lie in the interval defined by λ_j and ρ ,

consider the example

    C = ( 1+ε   √ε )
        (  √ε    ε ) ,    H = I ,  ρ = 1 .

We have λ_1 = η , λ_2 = 1 + 2ε - η where η = (1/2)(1 + 2ε - √(1+4ε)) . Thus

η is positive and O(ε^2) . In this case K = C , and the updated matrices

can be computed explicitly.

64

It is readily verified that, for suitable choices of θ outside [0,1] ,

K_θ has an eigenvalue 0 in the one case and an eigenvalue 1+2ε in the

other. In both cases eigenvalues lie outside the

prescribed interval. In the first case we have 0 < η , and in the

second 1+2ε > 1+2ε-η . □

Remark: This result shows that H̄ gives the best improvement in the

eigenvalues < ρ , while H* has a similar property for those > ρ .

This suggests an algorithm in which a choice is made between updating

H to H̄ or H* depending on some appropriate criterion. Fletcher

suggests that if ν > ρ (that is, y^T H y > ρ d^T y ) then H* should

be used, while if ν < ρ then H̄ is chosen. He has used this criterion

in an implementation of Goldstein's algorithm, and has reported satisfactory

results.

65


Notes

1. For Ostrowski's theorem see his book 'Solution of Equations'

(2nd edition) or Kowalik and Osborne. Goldstein's theorem is from

his paper 'On steepest descent' in SIAM Control, 1965. Theorem 1.5

is abstracted from Goldstein and Price, 'An Effective Algorithm

for Minimization', Num. Math. 1967.

2. For background material see Kowalik and Osborne. The form of the

update for the inverse Hessian is due to Powell, 'Recent Advances in

Unconstrained Optimization', to appear in Math. Prog. It is a

specialization of a form derived in Huang, 'Unified approach to

quadratically terminating algorithms for function minimization',

JOTA, 1970. The form (2.14) and the result of Lemma 2.2 are

probably due (in the case ρ = 1 ) to Fletcher, 'A new approach to

variable metric algorithms', Comp. J., 1970, and Broyden, 'Convergence

of a class of double rank minimization algorithms', JIMA, 1970.

Lemma 2.5 is due to Powell (to be published). The product update

form (2.22) is due to Greenstadt (to be published). Dixon's paper

containing Theorem 2.1 is to appear in Math. Prog. The significance

of (2.27) for the successful performance of the DFP algorithm was

noted in Powell's survey paper already cited. Attention was drawn

to the dual updating formulae by Fletcher. This material together

with Theorems 2.2 and 2.3 is included in his paper already cited.

References

C. G. Broyden (1970): The convergence of a class of double rank minimization

algorithms, J. Inst. Math. Applic., 6, pp. 76-90.

R. Fletcher (1970): A new approach to variable metric algorithms,

Computer J., 13, pp. 317-322.

A. A. Goldstein (1965): On steepest descent, SIAM J. Control, 3,

pp. 147-151.

A. A. Goldstein and J. Price (1967): An effective algorithm for

minimization, Num. Math., 10, pp. 184-189.

N. Y. Huang (1970): Unified approach to quadratically convergent

algorithms for function minimization, J.O.T.A., 5, pp. 405-425.

J. Kowalik and M. R. Osborne (1968): Methods for Unconstrained

Optimization Problems, Elsevier.

A. M. Ostrowski (1966): Solutions of Equations and Systems of Equations

(second edition), Academic Press.

M. J. D. Powell (1970): Recent advances in unconstrained optimization,

Report T.P. 400, AERE Harwell.

67


APPENDIX  Numerical Questions Relating to Fletcher's Algorithm

1. Implementation

In this section we consider two questions relating to the implementation

of Fletcher's algorithm. These are

(i) an appropriate strategy for determining λ to satisfy the

Goldstein condition, and

(ii) the use of the product updating formulae for the inverse Hessian

estimate.

In his program Fletcher uses a cubic line search to determine λ . Here

we use a somewhat simpler procedure which has the advantage of requiring

only additional function values. Also we work with the Choleski decomposition

of the inverse Hessian estimate. This has certain numerical

advantages which have been outlined by Gill and Murray 1/ . In particular,

it is possible to ensure the positive definiteness of H_i , and this can

be lost through the effect of accumulated rounding error when direct

evaluation of the updating formulae is used. Another possible advantage

of the Choleski decomposition is that we can work with an estimate of

the Hessian (that is H^{-1} ) rather than with H , as division by a triangular

matrix does not differ greatly in cost from multiplication. We felt this

could well be an advantage in problems with singular or near singular

Hessians, in which case H would be likely to contain large numbers.

To implement the line search we note that by Theorem 1.2 we should

test first if ψ(x_i,t_i,||s_i||) = ψ(x_i,s_i,1) satisfies the Goldstein condition.

This requires the evaluation of F(x_i + s_i) , and this, together with the

-------
1/ NPL Mathematics Division, Report 97, 1970.

68


known values F(x_i) and F'(x_i) = ∇F(x_i)s_i , gives sufficient information

to determine a quadratic interpolating polynomial to F . We write this

as

    P(λ) = F(x_i) + F'(x_i)λ + Aλ^2   (A.1)

where A is to be determined by setting P(1) = F(x_i + s_i) . This gives

    A = F(x_i + s_i) - F(x_i) - F'(x_i)

      = F'(x_i)(ψ(x_i,s_i,1) - 1) .   (A.2)

The minimum of P(λ) is given by

    λ̂ = -F'(x_i)/(2A) = 1/(2(1 - ψ(x_i,s_i,1))) .   (A.3)

To test if this is an appropriate value we compute ψ(x_i,s_i,λ̂) . This

gives

    ψ(x_i,s_i,λ̂) = 1/2 + ((Ā - A)/F'(x_i)) λ̂   (A.4)

where Ā = (1/2) s_i^T ∇^2 F(x_i + λ̄ s_i) s_i and λ̄ is a mean value. Thus, if F is quadratic and

ψ(x_i,s_i,1) < σ then λ̂ given by equation (A.3) satisfies the Goldstein

condition for any allowable σ (normally σ is chosen small).

For nonquadratic F the test is satisfied if the relative

error in estimating (1/2) s_i^T ∇^2 F(x_i) s_i by A is not too large.

This analysis provides the basis for our method which is given below.

Algorithm

(i) Calculate ||s_i|| , set w = min(1, ||s_i||) , λ = 1 .

(ii) Evaluate ψ = ψ(x_i,s_i,λ) .

69


(iii) If ψ < σ then begin pλ = λ , λ = λ/(2(1-ψ)) ,

go to (ii), end.

(iv) If ψ ≤ 1-σ go to EXIT.

If λ ≥ 1 then begin if λ > 1/w go to EXIT,

λ = 2λ , end,

else λ = .5(λ + pλ) ;

go to (ii).

Remark

(i) Numerical experience has shown that the value of λ predicted in

(iii) can be too small, and that an additional instruction

    If λ < s·pλ then λ = s·pλ

should be included. A value for s of about .1 has proved

satisfactory (1/8 was used in the numerical experiments reported

in the next section).

(ii) It is readily verified that lim_{λ→0} ψ(x_i,s_i,λ) = 1 . Thus the

algorithm can be expected to return a value of λ satisfying the

Goldstein condition unless ψ exhibits rather pathological

behavior.
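The steps (i)-(iv), together with the safeguard of Remark (i), can be sketched as follows. This is a transcription of ours, not the ALGOL W program of the next section: the reduction in step (iii) is written as the quadratic prediction λ/(2(1-ψ)), and the parameter defaults are illustrative only.

```python
import numpy as np

def goldstein_step(F, gradF, x, s, sigma=0.1, shrink=0.125, maxit=50):
    """Step-length strategy sketched in the text: adjust lam until the
    Goldstein test sigma <= psi(lam) <= 1 - sigma holds, where
    psi(lam) = (F(x + lam*s) - F(x)) / (lam * gradF(x).s)."""
    F0, slope = F(x), gradF(x) @ s       # slope < 0 for a descent direction
    w = min(1.0, np.linalg.norm(s))
    lam, plam = 1.0, 0.0
    for _ in range(maxit):
        psi = (F(x + lam * s) - F0) / (lam * slope)
        if psi < sigma:                  # step too long: quadratic prediction,
            plam = lam                   # safeguarded below by shrink * plam
            lam = max(lam / (2.0 * (1.0 - psi)), shrink * plam)
        elif psi <= 1.0 - sigma:         # Goldstein condition satisfied
            return lam
        elif lam >= 1.0:                 # step too short: extrapolate
            if lam > 1.0 / w:
                return lam
            lam = 2.0 * lam
        else:                            # bracket between lam and plam
            lam = 0.5 * (lam + plam)
    return lam
```

On a simple quadratic with the full Newton step the routine accepts λ = 1 immediately, since there ψ(1) = 1/2.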

We write the Choleski decomposition of H as

    H = R^T R   (A.5)

where R is an upper triangular matrix. Thus we require to find R*

such that

    R*^T R* = H*   (A.6)

where H* is given by either

70


.^«TT
(i) H = (l + uy^rRd+yu^ , or

(ii) H* = (I + uyT) {RTR + t T w1}(I + yux) .

The second case can be reduced to the first if we vrite

ftT
rR = RXR+ST w" (A.7)

To calculate R note that


R
RTR+5TvyT = [RTl/^v]
/fFvT

= [RT | /? v]QTCi (A.8)


/TT v1

where Q is orthogonal. Thus we seek an orthogonal matrix Q such

that
r R 1 r.1
R
(A.9)
Q
1 0
/Fv
Let W(i,j,{p,q}) be the plane rotation such that W(i, J, {p,q])A

combines the i-th and j-th rows of A , and reduces A^ to zero. It

is necessary that p be either i or j . Then Q is given explicitly

hy
(A.10)
Q ="n'w(i,n+l,[nfl,i})
i=n

It is readily verified that the zero introduced by each transformation


is preserved by the subsequent transformations provided they are carried

out in the order indicated.

71


Consider now the problem of constructing the Choleski decomposition

of S^T S where S = T + a b^T , and T is upper triangular. This corresponds

to our problem with the identifications T = R or R̂ ,

a = Ry or R̂y , and b = u . In this case the decomposition is done in

two stages. Our method uses ideas due independently to Stoer, Golub,

and Gill and Murray.

(i) We determine an orthogonal matrix Q_1 such that

    Q_1 a = ||a|| e_n .   (A.11)

If we set

    Q_1 = Π_{i=1}^{n-1} W(i, n, {i,*})   (A.12)

where the * indicates that the rotation is defined by being applied

to zero an element of a vector, then Q_1 S = Q_1 T + ||a|| e_n b^T differs from

an upper triangular matrix only in having possible nonzero elements in

the last row.

(ii) To complete the determination of R* we sweep out the elements

in the first (n-1) places in the last row of Q_1 S by plane rotations.

Thus R* is given by

    R* = Q_2 (Q_1 T + ||a|| e_n b^T)   (A.13)

where

    Q_2 = Π_{i=n-1}^{1} W(i, n, {n,i}) .   (A.14)

It will be seen that the updating of the Choleski factorization can

be carried out very cheaply. Depending on the update formula used, the

major cost is either 2n or 3n plane rotations. It should be noted that

72

    y^T H y = ||Ry||^2 = ||a||^2   (A.15)

is required in the update formula. Thus ||a|| can be already available.
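The sweep (A.8)-(A.10) can be sketched as follows for the positive case R^T R + σvv^T with σ > 0. This is an illustrative transcription of ours (names and ordering of the rotations are our choices), not the program used for the experiments:

```python
import numpy as np

def chol_rank1_update(R, v, sigma):
    """Return upper-triangular Rhat with Rhat^T Rhat = R^T R + sigma*v v^T
    (sigma > 0), by appending the row sqrt(sigma)*v^T and sweeping it out
    with plane rotations, in the spirit of (A.8)-(A.10)."""
    n = R.shape[0]
    M = np.vstack([R, np.sqrt(sigma) * v[None, :]])   # (n+1) x n
    for i in range(n):
        a, b = M[i, i], M[n, i]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        row_i, row_n = M[i].copy(), M[n].copy()
        M[i] = c * row_i + s * row_n      # combine rows i and n+1,
        M[n] = -s * row_i + c * row_n     # zeroing the (n+1, i) element
    return M[:n]                          # last row is now zero
```

The rotations leave M^T M unchanged, so the returned factor satisfies (A.7) exactly up to rounding.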

2. Numerical Results

In this section we report the results of numerical experiments

carried out to test some of our techniques. We consider four line search

strategies:

(i) a standard cubic interpolation procedure with λ = 1 as initial

search interval,

(ii) a standard cubic interpolation procedure with λ given by the

step to the minimum in the previous line search,

(iii) a strategy for satisfying the Goldstein condition in which λ

is reduced by the factor 1/8 if ψ < σ , and

(iv) the method for satisfying the Goldstein condition given in the

previous section.

Product form updating for the Choleski factorization has been implemented

for both the H and G update formulae, and the results obtained for each are

given.

The problems considered include:

(i) Hilbert: Minimization of a quadratic form with matrix given by

the Hilbert matrix of order 5 . Here

    F = Σ_{i,j=1}^{5} (x_i - 1)(x_j - 1)/(i + j - 1) ,

and the starting point is given by

    x_i = -1/i , i = 1,2,...,5 .

73

(ii) Banana(n): The Banana function in the cases n = 2 (the

Rosenbrock function) and n = 8 . Here

    F = Σ_{i=1}^{n-1} (100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2) ,

and the starting point is given by

    x_i = -1.2 if i odd, otherwise x_i = 1 .

(iii) Woods: Here

    F = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 + 90(x_4 - x_3^2)^2

        + (1 - x_3)^2 + 10.1((1 - x_2)^2 + (1 - x_4)^2)

        + 19.8(1 - x_2)(1 - x_4) ,

and the starting point is

    x^T = (-3, -1, -3, -1) .

(iv) Singular: Powell's singular function is designed to test the

performance of algorithms on a function with a singular Hessian

at the solution. Here

    F = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4 ,

and the starting point is

    x^T = (3, -1, 0, 1) .

(v) Helix: Here we define

    R = (x_1^2 + x_2^2)^{1/2} ,

    T = (1/2π) arctan(x_2/x_1)           if x_1 > 0 ,

      = (1/2π) arctan(x_2/x_1) + 1/2     if x_1 < 0 ,

74

and set

    F = 100((x_3 - 10T)^2 + (R - 1)^2) + x_3^2 .

The starting vector is

    x^T = (-1, 0, 0) .
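For reference, the Banana family and its standard starting point can be written down directly; the sketch below is ours (function names are illustrative), and the value 24.2 at the classical n = 2 start is the standard check.

```python
import numpy as np

def banana(x):
    """Banana (generalized Rosenbrock) function of the text."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2
                        + (1.0 - x[:-1]) ** 2))

def banana_start(n):
    """Starting point: x_i = -1.2 for i odd (1-based), x_i = 1 otherwise."""
    x = np.ones(n)
    x[::2] = -1.2          # 1-based odd positions are 0-based even indices
    return x

assert banana(np.ones(8)) == 0.0                     # minimum at x = (1,...,1)
assert abs(banana(banana_start(2)) - 24.2) < 1e-12   # classical Rosenbrock start
```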

Numerical results are given in Table A.1. For historical reasons,

the test for terminating the calculations was based on the size of ||s_i||

( ||s_i|| < EPS/n with EPS = 10^{-8} ). This proved reasonably satisfactory

for all cases except the singular function — in fact in all other cases

the ultimate convergence was clearly superlinear, and the results were

accordingly only marginally affected by the size of EPS. In the case

of Singular the convergence test proved difficult to satisfy in most

cases (indicated by * in Table A.1), and these computations were

terminated by the number of iterations exceeding the specified limit.

However, in all cases the answers were correct to at least six decimal

places. There is some variation in the H and G columns. This shows

the effect of rounding error, as these would be identical in exact

arithmetic. The most interesting case is the H column in both cases

of the Banana(8) when satisfying the Goldstein condition. In these

cases both the H and G formulae produce very similar results until the

10-th iteration, at which point the H formulae produce much larger

reductions in F than do the G . However, this progress is not maintained,

and at the 20-th iteration (in the case of the line search

algorithm of Section 1) the H matrix becomes singular and the iteration

is terminated. A restart procedure could have been used at this stage.

The numerical results indicate that the new algorithm is promising.

In general, although more iterations are required, we make significantly

75


fewer function evaluations in comparison with the routine using a

standard line search. As only one derivative evaluation is required in

each iteration, the real saving can be considerable. We note that on

the basis of the evidence presented it is not possible to draw conclusions

as to the relative values of the H and G algorithms. However, that

both manage to produce very comparable results provides some evidence of

their stability.

The program which gave the results presented here is coded in

ALGOL W for the IBM 360/67 at Stanford University. The calculations

were carried out using long precision (14 hexadecimal digits).

A FORTRAN version of the program has been developed at the Australian

National University.

76


Table A.1. Iteration and function evaluation counts for the test problems

(Hilbert, Banana(2), Banana(8), Woods, Singular, Helix) under the four line

search strategies, for both the H and G update formulae. [The body of the

table is illegible in this copy; entries marked * denote runs terminated by

the iteration limit.]

77

III. Barrier and Penalty Function Methods

78

1. Basic properties of barrier functions.

Consider the inequality constrained problem (ICP)

    min f(x)

subject to g_i(x) ≥ 0 , i = 1,2,...,m (i ∈ I_1) ,

where we assume (as before) that f , g_i , i ∈ I_1 , are in C^1 . We also

assume that S = {x ; g_i(x) ≥ 0 , i ∈ I_1} is compact, has a nonvoid interior

S_0 , and satisfies the regularity condition that every neighborhood of points

of S contains points of S_0 (this precludes S having 'whiskers'). If

x ∈ S and g_i(x) = 0 for some i then it is assumed that x ∉ S_0 .
i ~ ~r 0

Definition: φ(g(x)) is a barrier function for S if the following

conditions are satisfied.

(i) φ > 0 , x ∈ S_0 . If X is a closed set, X ⊂ S_0 , then φ ∈ C^1 on X .

(ii) φ → ∞ as g_i → 0 , i ∈ I_1 .

(iii) ∂φ/∂g_i < 0 if g_i < ρ_i , where the ρ_i , i ∈ I_1 , are fixed

positive constants.

(iv) |∂φ/∂g_i| bounded on N(x,δ) if g_i > 0 on N(x,δ) .

Example: (i) φ = Σ_{i=1}^m 1/g_i(x) (inverse barrier function).

(ii) φ = Σ_{i=1}^m (log(1 + g_i(x)) - log g_i(x)) .

Remark: In the second example the term with argument 1 + g_i(x) merely

ensures that the positivity condition is satisfied. It could be

replaced by a bound k_i for log(1 + g_i(x)) on S if this is known. In

79


practice it is of no consequence. The barrier function

    φ = Σ_{i=1}^m (k_i - log g_i(x))

is called the log barrier function.

Definition: T(x,r) is a barrier objective function if

    T(x,r) = f(x) + r φ(g(x))   (1.1)

where r > 0 .

Lemma 1.1: ∃ x = x(r) ∈ S_0 such that T(x(r),r) = min_{x∈S} T(x,r) .

Proof: T(x,r) is bounded below on S , and T(x,r) → +∞ as x → ∂S . □
Lemma 1.2: Let {r_j} ↓ 0 , and let x(r_j) = x_j . Then

(i) {T(x_j,r_j)} is strictly decreasing,

(ii) {f(x_j)} is nonincreasing, and

(iii) {φ(x_j)} is nondecreasing.
Proof: Let r_j < r_i ; then, using the minimizing property of x_j and x_i ,

    T(x_j,r_j) = f(x_j) + r_j φ(g(x_j)) ≤ f(x_i) + r_j φ(g(x_i))

                                       < f(x_i) + r_i φ(g(x_i)) = T(x_i,r_i) ,

and also T(x_i,r_i) ≤ f(x_j) + r_i φ(g(x_j)) . This demonstrates (i). Subtracting

the inside and outside inequalities gives

    (r_j - r_i) φ(g(x_j)) ≤ (r_j - r_i) φ(g(x_i)) ,

80

whence φ(g(x_j)) ≥ φ(g(x_i)) , which gives (iii). From the first inequality

we then have f(x_j) ≤ f(x_i) , which gives (ii). □

Remark: If T(x,r) is strictly convex, then all inequalities are

strict.

Theorem 1.1: The sequence {T(x_i,r_i)} converges, and

    lim_{i→∞} T(x_i,r_i) = min_{x∈S} f(x) .

Proof: By Lemma 1.2, {T(x_i,r_i)} is decreasing and bounded below and

hence convergent. Let f* = min_{x∈S} f(x) ; then

    T(x_i,r_i) ≥ f(x_i) ≥ f*

whence

    lim_{i→∞} T(x_i,r_i) ≥ f* .   (1.2)

Now let ε > 0 be given. Choose x̄ ∈ S_0 such that f(x̄) - f* < ε/2 (this

is possible because of the regularity condition on S ), and choose r_i

such that r_i φ(g(x̄)) < ε/2 . Then

    min_{x∈S} T(x,r_i) ≤ T(x̄,r_i) < f* + ε

whence

    lim_{i→∞} T(x_i,r_i) ≤ f* . □   (1.3)

Corollary 1.1: The limit points of {x_i} are local minima of the ICP.

81
I^I^^^^^^WWWWPWIHWWilWPPB^^ ,
!! WWWBWPIBHIIIWI

Remark: The generality of these results should be noted. For example,

we have not required S to be Lagrange regular at the limit points

of {x_i} .
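The behaviour described by Lemmas 1.1-1.2 and Theorem 1.1 can be followed on a one-dimensional example: min f(x) = x subject to g(x) = x - 1 ≥ 0 with the inverse barrier. The example and names below are our own illustrative choices; here x(r) is available in closed form.

```python
import numpy as np

def T(x, r):
    """Barrier objective T(x,r) = f(x) + r*phi(g(x)) for the example
    min f(x) = x subject to g(x) = x - 1 >= 0, inverse barrier phi = 1/g."""
    return x + r / (x - 1.0)

# dT/dx = 1 - r/(x-1)^2 = 0 gives the interior minimizer x(r) = 1 + sqrt(r),
# so T(x(r), r) = 1 + 2*sqrt(r): strictly decreasing to f* = 1 as r -> 0.
rs = [1.0, 0.1, 0.01, 1e-4, 1e-6]
Ts = [T(1.0 + np.sqrt(r), r) for r in rs]

assert all(a > b for a, b in zip(Ts, Ts[1:]))   # {T(x_i, r_i)} strictly decreasing
assert abs(Ts[-1] - 1.0) < 3e-3                 # converging to f* = 1
```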

Definition: Q(x,r) is a separable barrier objective function if

    Q(x,r) = f(x) + Σ_{i=1}^m r_i φ_i(g_i(x)) = f(x) + r^T φ(x)   (1.4)

where r > 0 , and φ_i is a barrier function for S_i = {x ; g_i(x) ≥ 0} ,

i = 1,2,...,m .

The previous results are readily extended to this case and are
summarized in the following theorem.

Theorem 1.2: Let r_k > r_{k+1} , k = 1,2,... , and lim r_k = 0 . Then

(i) min_{x∈S} Q(x,r_k) is attained for some x_k ∈ S_0 ,

(ii) {Q(x_k,r_k)} is strictly decreasing, {f(x_k)} is nonincreasing, and

(iii) lim_k Q(x_k,r_k) = f* , and the limit points of {x_k} are local

minima of the ICP.

Remark; Given a sequence of positive vectors tending to zero then it


is possible to select a subsequence which is strictly decreasing.
Conclusions (i) and (iii) remain valid in this more general case.

82

2. Multiplier relations (first order analysis)

In this section we assume sequences {r_k} ↓ 0 , {x_k} → x* . The

condition that T(x,r_k) is stationary at x_k gives

    ∇T(x_k,r_k) = ∇f(x_k) + r_k Σ_{i=1}^m (∂φ/∂g_i) ∇g_i(x_k)

               = ∇f(x_k) - Σ_{i=1}^m u_i^k ∇g_i(x_k) = 0   (2.1)

where u_i^k = -r_k (∂φ/∂g_i)(g(x_k)) . Note that u_i^k → 0 (r_k → 0)

if i ∉ B_0 (the set of constraints binding at x* ), and u_i^k > 0 for

i ∈ B_0 and k sufficiently large, by the conditions

defining barrier functions. Equation (2.1) is formally similar to the

multiplier relations given earlier (MP(3.2)), and it is comparatively

straightforward to deduce these relations from (2.1) in certain special

cases. We assume that B_0 = {1,2,...,t} , that the rank of the system

of vectors ∇g_i(x*) , i ∈ B_0 , is s ≤ t , and that ∇g_1(x*),...,∇g_s(x*)

are linearly independent. We define matrices C_1(x) , C_2(x) by taking

the i-th column of C_1(x) to be ∇g_i(x)^T , i = 1,2,...,s , and the i-th

column of C_2(x) to be ∇g_{s+i}(x)^T , i = 1,2,...,t-s , and vectors

u_1^T = {u_1,...,u_s} , u_2^T = {u_{s+1},...,u_t} .

Lemma 2.1: If {u^k} is bounded then the Kuhn-Tucker conditions hold at x* .

Proof: From (2.1) we have

    ∇f(x_k)^T = C_1(x_k) u_1^(k) + C_2(x_k) u_2^(k) + O(r_k) .          (2.2)

The linear dependence of the set of vectors ∇g_i(x*) , i ∈ B_0 , gives

    C_2(x*) = C_1(x*) R                                                  (2.3)

so that (2.2) can be written

    ∇f(x_k)^T = C_1(x_k){u_1^(k) + R u_2^(k)} + {C_2(x_k) − C_1(x_k)R} u_2^(k) + O(r_k) .   (2.4)

Provided k is large enough the rank of C_1(x_k) will be s (see MP Remark following Lemma 3.3). Thus

    u_1^(k) + R u_2^(k) = {C_1(x_k)^T C_1(x_k)}^{-1} C_1(x_k)^T {∇f(x_k)^T − (C_2(x_k) − C_1(x_k)R) u_2^(k) + O(r_k)} .   (2.5)

As u_2^(k) is bounded we conclude from (2.5)

(i) u_1^(k) is bounded, and

(ii) lim_{k→∞} u_1^(k) + R u_2^(k) = {C_1(x*)^T C_1(x*)}^{-1} C_1(x*)^T ∇f(x*)^T .

As {u_1^(k)} , {u_2^(k)} are bounded and nonnegative (at least for k large enough), this property is shared by the limit points of the sequences. Consider subsequences tending to u_1* , u_2* respectively. From (2.2) we have

    ∇f(x*)^T = C_1(x*) u_1* + C_2(x*) u_2*

or

    ∇f(x*) = Σ_{i∈B_0} u_i* ∇g_i(x*) .                                  (2.6)

Thus the Kuhn-Tucker conditions are satisfied. □

Corollary 2.1: If the Kuhn-Tucker conditions do not hold at x* , then {u_1^(k)} , {u_2^(k)} are unbounded.

Corollary 2.2: If restraint condition A holds then the ∇g_i(x_k) are linearly independent for i ∈ B_0 . In this case {u_2^(k)} is null, and {u^(k)} converges. If restraint condition A holds then the multipliers in the Kuhn-Tucker conditions are uniquely determined.

Lemma 2.2: If restraint condition B holds then {u^(k)} is bounded.

Remark: By MP Lemma 5.2, this implies that {u^(k)} is bounded for the convex programming problem provided S has an interior.

Proof: If restraint condition B holds then ∃ d such that ∇g_i(x*) d > 0 , i = 1,2,...,t . From (2.2) we have

    Σ_{i=1}^t u_i^k ∇g_i(x_k) d = ∇f(x_k) d + O(r_k) .                  (2.7)

As ∇g_i(x) d is a continuous function, we must have u_i^k > 0 and ∇g_i(x_k) d > 0 , i = 1,2,...,t , provided k is large enough. Thus each term on the left of (2.7) is nonnegative, and (2.7) gives

    Σ_{i=1}^t u_i^k min_{1≤i≤t} ∇g_i(x_k) d ≤ ∇f(x_k) d + O(r_k) .     (2.8)

This relation shows that the u_i^k are bounded as k → ∞ . □

Remark: The results of the first section showed that convergence of barrier function algorithms can be proved under very few assumptions. The results of this section show that valuable structural information on the problem is available as a by-product of the computation. Note that the condition that the u_i^k be bounded is a weaker restraint condition than either A or B.


3. Second order conditions.


Consider now a barrier function 0 and a sequence (r.) i 0 such
that {x. } -♦ x , {u. } -» u . It is convenient to assume the following
properties which are satisfied by all barrier functions of practical
interest.
o

1 og^

r
(11) k 2 W -, + BB
* k -• , leB0 . (But see Example U(ii) p. 100
6
i for qualification.)

Lemma 3.1: If the matrices ∇²T(x_k, r_k) are positive definite for k > K and the ∇g_i(x*) , i ∈ B_0 , are linearly independent, then

    v^T ∇²L(x*, u*) v ≥ 0 , ∀ v ≠ 0 such that ∇g_i(x*) v = 0 , ∀ i ∈ B_0 .

Proof: Differentiating T(x, r_k) twice gives

    ∇²T(x_k, r_k) = ∇²L(x_k, u^k) + C(x_k) D_k C(x_k)^T + Σ_{i∈I_1−B_0} r_k (d²φ/dg_i²) ∇g_i(x_k)^T ∇g_i(x_k)   (3.1)

where C(x) has columns ∇g_i(x)^T , i ∈ B_0 , and D_k is diagonal with (D_k)_{ii} = r_k (d²φ/dg_i²) . Let

    P_k = C(x_k){C(x_k)^T C(x_k)}^{-1} C(x_k)^T                          (3.2)

then, for arbitrary nonzero v such that (I − P_k) v ≠ 0 ,

    0 < v^T (I − P_k) ∇²T(x_k, r_k) (I − P_k) v
      = v^T (I − P_k) ∇²L(x_k, u^k) (I − P_k) v + o(1)

as k → ∞ . The desired result follows from this on letting k → ∞ . □

Corollary 3.1: If in addition to the conditions of Lemma 3.1 we have also strict complementarity, then the second order sufficiency conditions (the conditions of MP Lemma 4.3) hold at x* .

Remark: The problem of generalizing this result to the case where the active constraint gradients are not linearly independent is the following. In general, when k < ∞ , rank[C_1(x_k) | C_2(x_k)] > s . Thus

    V_k = {v ; ∇g_i(x_k) v = 0 , ∀ i ∈ B_0} ⊂ U_k = {v ; v = (I − P_k) u , u ∈ E^n} .

We have

    lim_{k→∞} U_k = V* = {v ; ∇g_i(x*) v = 0 , ∀ i ∈ B_0} .

It is not difficult to construct examples in which lim V_k is a proper subset of V* . Consider C_1(x_k) = e_1 , C_2(x_k) = e_1 + ε(x_k − x*) e_2 , where ε(x_k − x*) → 0 but ε ≠ 0 for x_k ≠ x* . Then

    V_k = {v ; e_1^T v = e_2^T v = 0} = lim V_k ⊂ V* = {v ; e_1^T v = 0} .

The argument of Lemma 3.1 shows only that v^T ∇²L(x*, u*) v ≥ 0 for v ∈ lim V_k .

Lemma 3.2: Let

    W = U + γ V V^T ,
    N = {t ; ||t|| = 1 , V^T t = 0} ,
    M = {u ; ||u|| = 1 , u ∈ N^⊥} ,
    ν = min_{t∈N} t^T U t > 0 ,  σ = min_{u∈M} u^T U u ,  μ = min_{u∈M} ||V^T u|| > 0 ,
    η = min_{t∈N, u∈M} t^T U u ,  ρ = min(0, η) ,

then W is positive definite provided

    γ > (ρ² + ν(ν − σ)) / (ν μ²) .                                       (3.3)

Proof: Any unit vector w can be written

    w = α u + β t  where  u ∈ M , t ∈ N , and α² + β² = 1 .

Thus

    w^T W w = α² u^T U u + 2αβ u^T U t + β² t^T U t + γ α² ||V^T u||²

            ≥ α²(σ + γμ² − ν) + 2|α|(1 − α²)^{1/2} ρ + ν

            ≥ α²(σ + γμ² − ν) + 2|α| ρ + ν                               (3.4)
88

as ρ ≤ 0 . Provided γ > (ν − σ)/μ² (a weaker condition than (3.3)), the right hand side has a minimum at |α| = −ρ/(σ + γμ² − ν) ≥ 0 . The value at this minimum is ν − ρ²/(σ + γμ² − ν) , which is positive if (3.3) holds. □

Corollary 3.2: If W = U + V D V^T where D is diagonal, and if the conditions of Lemma 3.2 hold, then W is positive definite provided

    min_i D_ii > (ρ² + ν(ν − σ)) / (ν μ²) .

If D is positive definite the result holds provided the smallest eigenvalue of D satisfies this inequality.

Proof: We have

    w^T V D V^T w = α² u^T V D V^T u ≥ α² (min_i D_ii) ||V^T u||² ≥ α² (min_i D_ii) μ² .

The result now follows as from equation (3.4) above. □

Lemma 3.3: If the second order sufficiency conditions hold at x* then ∇²T(x_k, r_k) is positive definite for k large enough.

Proof: We have

    V* ⊂ U* = {v ; ∇g_i(x*) v = 0 , ∀ i ∈ B_0 such that u_i* > 0} .

Thus the second order sufficiency conditions imply that v^T ∇²L(x*, u*) v > 0 , ∀ v ∈ V* such that v ≠ 0 . From Corollary 3.2 it follows that ∃ D_0 such that

    ∇²L(x*, u*) + [C_1(x*) | C_2(x*)] D [C_1(x*) | C_2(x*)]^T

is positive definite for D > D_0 . By continuity this implies that the corresponding matrix evaluated at x_k is positive definite for k large enough. The desired result follows from this as (D_k)_{ii} → ∞ , i = 1,2,...,t , as k → ∞ . □

Lemma 3.4: If U is nonsingular, D diagonal, and V of full rank, then the system of linear equations

    [U + V D V^T] x = V y                                                (3.5)

has the solution

    x = U^{-1} V (I + M)^{-1} M y                                        (3.6)

where M = (V^T U^{-1} V)^{-1} D^{-1} , provided I + M is nonsingular. A sufficient condition for I + M nonsingular is ||M|| < 1 , which is satisfied if min_i |D_ii| is large enough.

Proof: The result follows on substituting (3.6) into (3.5). □

Remark: From (3.6) it follows that

    x ≈ U^{-1} V (V^T U^{-1} V)^{-1} D^{-1} y                            (3.7)

as min_i |D_ii| → ∞ .

Corollary 3.3: If the right hand side of equation (3.5) is z , a general vector, then the solution is given by

    x = U^{-1} V (I + M)^{-1} M (V^T U^{-1} V)^{-1} V^T U^{-1} z
        + U^{-1} (I − V (V^T U^{-1} V)^{-1} V^T U^{-1}) z .              (3.8)
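The solution formula of Lemma 3.4, as reconstructed here, is easy to check numerically against a direct solve; the random data below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 6, 2
A = rng.standard_normal((n, n))
U = A @ A.T + n * np.eye(n)          # nonsingular (positive definite) U
V = rng.standard_normal((n, t))      # full column rank V
D = np.diag([50.0, 80.0])            # diagonal D with large entries
y = rng.standard_normal(t)

Uinv = np.linalg.inv(U)
M = np.linalg.inv(V.T @ Uinv @ V) @ np.linalg.inv(D)
x = Uinv @ V @ np.linalg.inv(np.eye(t) + M) @ M @ y   # formula (3.6)

x_direct = np.linalg.solve(U + V @ D @ V.T, V @ y)    # direct solution of (3.5)
print(np.allclose(x, x_direct))   # True
```

Since M is small when the entries of D are large, the same experiment also confirms the asymptotic form (3.7).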


4. Rate of convergence results.

In this section, rate of convergence estimates for barrier function algorithms are considered. Unless stated otherwise the conditions imposed in Section 3 are assumed, together with the condition that ||∇g_i(x*)|| ≠ 0 , i ∈ B_0 .

Lemma 4.1: Provided {u^k} is bounded, then

    f(x_k) − f(x*) = Σ_{i∈B_0} u_i^k g_i(x_k) + O(max{||x_k − x*||², r_k ||x_k − x*||}) .   (4.1)

Proof: The result follows by taking the scalar product of (2.2) with x_k − x* and identifying terms in the Taylor series expansion. □

Definition: We say that u_k is SO(v_k) (strict order v_k) provided

(i) u_k = O(v_k) , and

(ii) ∃ K < ∞ and ρ > 0 such that |u_k| ≥ ρ |v_k| for k > K .

Remark: (4.1) gives an error estimate provided the remainder term is small. A sufficient condition for this is

    f(x_k) − f(x*) = SO(||x_k − x*||) .                                  (4.2)

This implies that for at least one i , u_i^k g_i(x_k) = SO(||x_k − x*||) . If (4.2) does not hold then for i = 1,2,...,t either

(i) u_i^k → 0 , k → ∞ , or

(ii) g_i(x_k) = o(||x_k − x*||) .


If (ii) holds then the approach of x_k to x* is tangential to the surface g_i(x) = 0 at x = x* .

Lemma 4.2: If the ICP is convex, then

(i) x_k , u^k are dual feasible, and

(ii) f(x_k) − f(x*) ≤ Σ_{i=1}^m u_i^k g_i(x_k) .

Proof: The dual feasibility is a consequence of (2.1) and assumption (i) of Section 3. This follows directly from Wolfe's form of the duality theorem (MP Corollary 5.1). We have

    f(x*) ≥ L(x_k, u^k) = f(x_k) − Σ_{i=1}^m u_i^k g_i(x_k)             (4.3)

which demonstrates the second part of the desired result. □

Example: For i ∈ B_0 let g_i(x_k) = SO(||x_k − x*||) , u_i* > 0 .

(a) Inverse barrier function. We have

    u_i^k = r_k / g_i(x_k)²

whence

    g_i(x_k) = (r_k / u_i^k)^{1/2} .

This gives

    ||x_k − x*|| = O(r_k^{1/2}) .

(b) Log barrier function. In this case

    u_i^k = r_k / g_i(x_k)

which gives
92

mMmMji^mv^i^tMijAi^^^ ■.:..-. ■^M..^..r..^M.^^.... ^.y.^...,^:.^^^.^^..^^^^!^.,^ .ÄÜÜlteJÜti:


mammmmm^mmmmmmmmninmß i i m*im*^*mi^^*mm*^**'*v^^mmm

    g_i(x_k) = r_k / u_i^k

and

    ||x_k − x*|| = O(r_k) .
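The two orders are visible already in one dimension. In this sketch (the toy problem is an assumption for illustration) we take: minimize f(x) = x subject to x ≥ 0, for which the inverse barrier objective x + r/x is stationary at x = r^{1/2} while the log barrier objective x − r log x is stationary at x = r:

```python
import math

# minimize f(x) = x subject to x >= 0; x* = 0, u* = 1 (assumed toy problem)
for k in [2, 4, 6]:
    r = 10.0 ** (-k)
    x_inv = math.sqrt(r)   # inverse barrier: zero of 1 - r/x**2, error O(r**0.5)
    x_log = r              # log barrier:     zero of 1 - r/x,    error O(r)
    print(r, x_inv, x_log)
```

For each r the log-barrier iterate is much closer to x* = 0, matching the O(r_k^{1/2}) versus O(r_k) estimates above.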

Thus the strict order condition permits us to deduce a rate of convergence result. We now show that the SO condition is equivalent to the condition of strict complementarity for the inverse and log barrier functions. In these cases the remark following Lemma 4.1 gives us a geometric interpretation of strict complementarity.

To discuss this equivalence, consider the following system of equations, which defines x_k and u^k as functions of r_k :

    ∇f(x)^T − Σ_{i=1}^m u_i ∇g_i(x)^T = 0

and

    u_i / (−dφ/dg_i) = r_k ,  i = 1,2,...,m .                            (4.4)

If the Jacobian H(x,u) of this system with respect to x , u (or an appropriate transform of it) is nonsingular, then we can study the behavior of x(r) , u(r) as a function of r by integrating the system of differential equations

    H(x,u) [dx/dr ; du/dr] = [0 ; e]                                     (4.5)

where e^T = {1,1,...,1} . We have

    H(x,u) = [ ∇²L(x,u)                                −∇g_1(x)^T ... −∇g_m(x)^T ]
             [ u_i {(d²φ/dg_i²)/(dφ/dg_i)²} ∇g_i(x)    diag{1/(−dφ/dg_i)}       ]   (4.6)

where the i-th of the last m rows is displayed. Let D be the diagonal matrix

    D = diag{I_n ; w_1, ..., w_m}                                        (4.7)

where

    w_i = (dφ/dg_i)² / (d²φ/dg_i²) ,                                     (4.8)

then

    DH = [ ∇²L(x,u)       −∇g_1(x)^T ... −∇g_m(x)^T       ]
         [ u_i ∇g_i(x)    diag{−(dφ/dg_i)/(d²φ/dg_i²)}    ]              (4.9)


Provided −(dφ/dg_i)/(d²φ/dg_i²) → 0 , k → ∞ , i ∈ B_0 , then DH will be nonsingular for k large enough provided J(x*) is nonsingular (see MP Lemma 4.4). This implies

(i) the second order sufficiency conditions hold,

(ii) the active constraint gradients are linearly independent, and

(iii) strict complementarity holds.

In this case

    DH [dx/dr ; du/dr] = [0 ; w]                                         (4.10)

whence

    [x_k − x* ; u^k − u*] = ∫_0^{r_k} (DH)^{-1} [0 ; w] dr               (4.11)

provided the components of w are integrable.

Example: (i) Inverse barrier function. We have

    dφ/dg_i = −1/g_i² ,  d²φ/dg_i² = 2/g_i³ ,

so that −(dφ/dg_i)/(d²φ/dg_i²) = g_i/2 → 0 . In particular, it follows that DH(x_k) is nonsingular for k large enough, while

    w_i = 1/(2 g_i) = (1/2)(u_i/r)^{1/2} .


Thus (4.10) becomes

    [dx/dr ; du/dr] = (DH)^{-1} [0 ; u^{1/2}/(2 r^{1/2})]                (4.12)

whence, changing to r^{1/2} as independent variable,

    [dx/d(r^{1/2}) ; du/d(r^{1/2})] = (DH)^{-1} [0 ; u^{1/2}] .          (4.13)

Thus we have

    [x(r) − x* ; u(r) − u*] = r^{1/2} [J(x*)]^{-1} [0 ; (u*)^{1/2}] + o(r^{1/2}) .   (4.14)

(ii) Log barrier function. In this case

    w_i = 1

so that (4.10) becomes

    [dx/dr ; du/dr] = (DH)^{-1} [0 ; e] .                                (4.15)


In particular, x(r) , u(r) inherit the differentiability properties of f and g_i , i = 1,...,m , for r small enough (if f, g ∈ C^p then x, u ∈ C^{p−1} ). Thus

    [x(r) − x* ; u(r) − u*] = r J(x*)^{-1} [0 ; e] + O(r²) .             (4.16)

Remark: These results can also be derived by differentiating (2.1) with respect to r . We obtain

    ∇²T(x(r), r) (dx/dr)^T = − Σ_{i=1}^m (dφ/dg_i) ∇g_i(x(r))^T .       (4.17)

In this case Lemma 3.3 guarantees that ∇²T(x(r), r) is positive definite for r small enough, and Corollary 3.3 can be used to give the solution to (4.17). Note that (4.17) results if du/dr is eliminated from (4.10).

We can now proceed to the main result.

Theorem 4.1: Provided J(x*) is nonsingular, the strict complementarity and strict order conditions are equivalent for the inverse and log barrier functions.

Proof: The argument is essentially the same for both barrier functions, but is simplest for the log function. Thus only this case is considered here. For the log barrier function u_i^k = r_k/g_i(x_k) , so that


    f(x_k) − f(x*) = Σ_{i∈B_0} u_i^k g_i(x_k) + o(||x_k − x*||)
                   = t r_k + o(||x_k − x*||) .                           (4.18)

If strict complementarity does not hold, then for at least one i ∈ B_0

    lim_{k→∞} r_k / g_i(x_k) = 0 .

As g_i(x_k) is O(||x_k − x*||) at most, it follows that r_k = o(||x_k − x*||) and hence, from (4.18), that f(x_k) − f(x*) = o(||x_k − x*||) . Thus the SO condition does not hold.

If strict complementarity does hold, then asymptotically for small r

    r / g_i(x(r)) = u_i(r) → u_i* > 0 ,  i ∈ B_0 ,

while g_i(x(r)) ≤ ||∇g_i(x*)|| ||x(r) − x*|| + o(||x(r) − x*||) . Thus r = O(||x(r) − x*||) . As J(x*) is nonsingular, (4.16) holds, and this implies (as r = O(||x(r) − x*||)) that

    ||x(r) − x*|| ≤ K r + O(r²)

for some K > 0 . This shows that r = SO(||x(r) − x*||) so that, by (4.18), the strict order condition is satisfied. □

Remark: The above argument shows that if strict complementarity does not hold then the strict order condition cannot. The condition that J(x*) be nonsingular is required only for the second part of the theorem.


Example: (i) minimize x_1 + x_2

    subject to −x_1² + x_2 ≥ 0 , x_1 ≥ 0 .

[Figure 4.1]

From Figure 4.1 it is clear that the minimum is f* = 0 at x_1 = x_2 = 0 , and that strict complementarity holds.

(a) Inverse barrier function:

    T = x_1 + x_2 + r{1/(x_2 − x_1²) + 1/x_1}

    ∂T/∂x_1 = 0 = 1 + r{2x_1/(x_2 − x_1²)² − 1/x_1²}

    ∂T/∂x_2 = 0 = 1 − r/(x_2 − x_1²)²


This gives a pair of equations for x_1 and x_2 as functions of r . We have

    x_1 = r^{1/2} − r + O(r^{3/2}) ,
    x_2 = r^{1/2} + r + O(r^{3/2}) .

(b) Log barrier function:

    T = x_1 + x_2 − r{log(x_2 − x_1²) + log x_1}

    ∂T/∂x_1 = 0 = 1 − r{−2x_1/(x_2 − x_1²) + 1/x_1}

    ∂T/∂x_2 = 0 = 1 − r/(x_2 − x_1²)

Solving for x_1 and x_2 as functions of r gives

    x_1 = r − 2r² + O(r³) ,
    x_2 = r + r² + O(r³) .
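These expansions can be checked numerically. For the inverse barrier, eliminating x_2 via x_2 − x_1² = r^{1/2} leaves x_1²(1 + 2x_1) = r (an assumed reduction of the stationarity conditions above), which bisection solves easily:

```python
import math

def x1_of(r):
    # bisection on the increasing function x**2 * (1 + 2*x) - r on [0, 1]
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid**2 * (1.0 + 2.0 * mid) < r:
            lo = mid
        else:
            hi = mid
    return mid

for r in [1e-2, 1e-4, 1e-6]:
    print(r, x1_of(r), math.sqrt(r) - r)   # computed x1 vs expansion r**0.5 - r
```

The computed x_1 agrees with r^{1/2} − r to O(r^{3/2}), as claimed.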

(ii) minimize x_2

    subject to x_2 − x_1² ≥ 0 , x_1 ≥ 0 .

In this case the minimum is again f* = 0 at x_1 = x_2 = 0 . However, ∇f(0) = e_2^T is orthogonal to ∇g_2(0) = e_1^T . Thus, as both constraints are active at zero, strict complementarity does not hold. Note that the constraint g_2 = x_1 ≥ 0 is redundant, and that the barrier function trajectory is tangential to the constraint surface g_1 = 0 . Note also that the rate of convergence is reduced, and that r_k (d²φ/dg_2²) does not tend to ∞ for the constraint with the zero multiplier.


(a) Inverse barrier function:

    T = x_2 + r{1/(x_2 − x_1²) + 1/x_1}

    ∂T/∂x_1 = 0 = r{2x_1/(x_2 − x_1²)² − 1/x_1²}

    ∂T/∂x_2 = 0 = 1 − r/(x_2 − x_1²)²

whence

    x_1 = (r/2)^{1/3} ,  u_1 = 1 ,  r (d²φ/dg_1²) = 2/r^{1/2} ,

    x_2 = r^{1/2} + (r/2)^{2/3} ,  u_2 = (4r)^{1/3} ,  r (d²φ/dg_2²) = 4 .

(b) Log barrier function:

    T = x_2 − r{log(x_2 − x_1²) + log x_1}

    ∂T/∂x_1 = 0 = r{2x_1/(x_2 − x_1²) − 1/x_1}

    ∂T/∂x_2 = 0 = 1 − r/(x_2 − x_1²)

whence

    x_1 = (r/2)^{1/2} ,  u_1 = 1 ,  r (d²φ/dg_1²) = 1/r ,

    x_2 = 3r/2 ,  u_2 = (2r)^{1/2} ,  r (d²φ/dg_2²) = 2 .


(iii) minimize −x_1

    subject to (1 − x_1)³ − x_2 ≥ 0 , x_2 ≥ 0 .

This is the example used in MP Section 5 (see Figure 5.1). The optimum is f* = −1 at x_1 = 1 , x_2 = 0 . The Kuhn-Tucker conditions do not hold at this point.

(a) Inverse barrier function:

    T = −x_1 + r{1/((1 − x_1)³ − x_2) + 1/x_2}

    ∂T/∂x_1 = 0 = −1 + r{3(1 − x_1)²/((1 − x_1)³ − x_2)²}

    ∂T/∂x_2 = 0 = r{1/((1 − x_1)³ − x_2)² − 1/x_2²}

whence

    1 − x_1 = (12r)^{1/4} ,  x_2 = 2^{1/2} 3^{3/4} r^{3/4} .

In this case u_1(r) = u_2(r) = 1/(6 (3r)^{1/2}) .
""U1-^-^*"2)
vnence

x1 = 1 - 6r ,

x2 = lOSr5 .

In this case u, (r) =u„(r) =- ^


108r

The above examples confirm the predictions of our analysis, and for a given fixed sequence of r_k values effective convergence is attained more rapidly (i.e., for earlier members of the sequence) with the log barrier function.

Now let φ be a barrier function. Then

    φ_1 = log(a + φ) ,  a ≥ 1                                            (4.19)

is a barrier function. Let x_k minimize f + r φ , and let x̂_k minimize f + r φ_1 . The corresponding Lagrange multiplier estimates are

    u_i^k = −r_k (dφ/dg)(g_i(x_k))  and  û_i^k = −r_k (dφ/dg)(g_i(x̂_k)) / (a + φ(g_i(x̂_k))) ;

since both tend to u_i* > 0 while a + φ(g_i(x̂_k)) → ∞ , comparing them gives

    g_i(x̂_k) / g_i(x_k) → 0  as  r_k → 0 .

Essentially this says that g_i(x̂_k) → 0 more rapidly than g_i(x_k) , so that a faster rate of convergence is anticipated for the φ_1 barrier function.


Consider now the sequence of barrier functions defined recursively by

    φ^(1) = log(k − log(g(x))) ,

    φ^(i) = log(a + φ^(i−1)) ,  i = 2,3,... ,  a ≥ 1 .                   (4.20)

In this case the error estimate is

    f(x_k) − f(x*) ≈ Σ_{j∈B_0} r_k / {(k − log(g_j(x_k))) Π_{s=2}^{i} (a + φ^(s−1)(g_j(x_k)))} .   (4.21)

The right hand side of (4.21) tends to zero as i → ∞ , and this suggests that increasingly rapid rates of convergence can be obtained by using barrier functions associated with large values of i .

However, an even more interesting result is possible. This shows that in certain circumstances it is possible to choose a barrier function having the property that the solution to the ICP is approximated arbitrarily closely by the result of a single unconstrained minimization, without requiring r to be taken arbitrarily small. Let

    T^(i)(x,r) = f(x) + r Σ_{j=1}^m φ^(i)(g_j(x)) ,  and

    Q(x,λ) = f(x) + Σ_{j=1}^m λ_j (k − log(g_j(x))) .

Theorem 4.2: Let Q(x,λ) have a unique stationary value (necessarily a minimum) in S_0 for each λ > 0 , and let x^(i) minimize T^(i)(x,r) for i = 1,2,... and fixed r . Then the limit points of {x^(i)} are local minima of the ICP.

Remark: Note that r does not have to be small in this result.

Proof: If x^(i) minimizes T^(i)(x,r) then

    ∇f(x^(i)) − r Σ_{j=1}^m [1/{g_j (k − log g_j) Π_{s=2}^{i} (a + φ^(s−1)(g_j(x^(i))))}] ∇g_j(x^(i)) = 0 ,   (4.22)

and this expression has the form

    ∇Q(x^(i), λ^(i)) = 0                                                 (4.23)

where λ^(i) has the numerical value

    λ_j^(i) = r / {(k − log(g_j(x^(i)))) Π_{s=2}^{i} (a + φ^(s−1)(g_j(x^(i))))} ,  j = 1,2,...,m .   (4.24)

Thus the {x^(i)} also correspond to a sequence minimizing Q(x, λ^(i)) by the assumed uniqueness of these stationary values. Now, as a ≥ 1 , φ^(s) > 0 , s = 1,2,...,i−1 , λ_j^(i) can be made arbitrarily small for each j by choosing i large enough. The desired result is thus a consequence of the remark following Theorem 1.2. □
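A one-dimensional numerical sketch of this phenomenon (the problem, the constants k = 10 and a = 1, and the bisection solver are all assumptions for illustration): for minimize x subject to x ≥ 0, minimizing T^(i)(x,r) = x + r φ^(i)(x) with r held fixed at 1 drives the minimizer x^(i) toward x* = 0 as i grows:

```python
import math

K, A, R = 10.0, 1.0, 1.0   # barrier constants k, a and a FIXED value of r

def phi(i, g):
    # phi^(1) = log(k - log g), phi^(i) = log(a + phi^(i-1))
    v = math.log(K - math.log(g))
    for _ in range(i - 1):
        v = math.log(A + v)
    return v

def dT(i, x):
    # derivative of T^(i)(x) = x + R*phi^(i)(x)
    d = -1.0 / (x * (K - math.log(x)))     # d(phi^(1))/dx
    for s in range(2, i + 1):
        d /= A + phi(s - 1, x)             # chain rule through each extra log
    return 1.0 + R * d

def minimize(i, lo=1e-300, hi=1.0):
    # dT < 0 near 0 and dT > 0 at 1: bisect (geometrically) on dT = 0
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if dT(i, mid) < 0:
            lo = mid
        else:
            hi = mid
    return mid

for i in [1, 2, 4, 8]:
    print(i, minimize(i))
# x^(i) decreases toward x* = 0 even though r stays fixed at 1
```

This illustrates the point of Theorem 4.2: accuracy is bought by moving along the barrier sequence rather than by shrinking r.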


Remark: The conditions of the theorem are satisfied if f(x) is convex, g_i(x) , i = 1,2,...,m , are concave, and strict convexity / concavity holds for at least one of these functions.

In what follows it is convenient to use the superscript i to indicate the appropriate member of the log barrier function sequence (4.20).

Lemma 4.3:

    d²φ^(i)/dg_j² = (p_j^(i)/g_j)(−dφ^(i)/dg_j)                          (4.25)

where

    p_j^(i+1) = p_j^(i) + g_j (dφ^(i+1)/dg_j) ,                          (4.26)

and

    p_j^(i) → 1 ,  g_j → 0 .                                             (4.27)

Proof: Let φ^(0) = −log g ; then

    d²φ^(0)/dg² = (1/g)(−dφ^(0)/dg)                                      (4.28)

so that p^(0) = 1 . Now, differentiating the relation

    dφ^(i+1)/dg = (dφ^(i)/dg)/(a + φ^(i))                                (4.29)

gives

    d²φ^(i+1)/dg² = (d²φ^(i)/dg²)/(a + φ^(i)) − (dφ^(i)/dg)²/(a + φ^(i))²   (4.30)

so that

    d²φ^(i+1)/dg² = (1/g){p^(i) + g (dφ^(i+1)/dg)}(−dφ^(i+1)/dg) .

This demonstrates (4.25) and (4.26); (4.27) follows on noting that p_j^(0) = 1 , and that, from (4.29),

    g_j (dφ^(i)/dg_j) → 0 ,  g_j → 0 ,  i = 1,2,... .  □

A consequence of this lemma is that, provided J(x*) is nonsingular, D^(i) H^(i) is nonsingular for x(r) sufficiently close to x* .

Lemma 4.4: Let J(x*) be nonsingular, and I_1 = B_0 ; then the SO condition is satisfied.

Proof: We have from (4.29) and (4.25) that

    w_j^(i) = (dφ^(i)/dg_j)² / (d²φ^(i)/dg_j²) = (g_j/p_j^(i))(−dφ^(i)/dg_j)   (4.31)

so that w_j^(i) → 0 as r → 0 for j ∈ B_0 . Now D^(i) H^(i) agrees with J up to a diagonal correction P^(i) together with smaller terms, where P^(i) is diagonal with P_jj^(i) = 1/p_j^(i) − 1 → 0 , r_k → 0 , j = 1,2,...,m . This result implies that for k large enough

    ||x_k − x*|| ≤ K Σ_{j∈B_0} u_j^k g_j(x_k) .

The SO condition is an immediate consequence of this inequality. □

Remark: If B_0 is a proper subset of I_1 then w_j^(i) need not tend to zero for j ∉ B_0 . Thus eventually the largest components of w^(i) will be those associated with the inactive constraints. This implies that ||x_k − x*|| = O(r_k) . But u_j^k g_j(x_k) , j ∈ B_0 , is o(r_k) , which suggests that, in general, the SO condition does not apply. This case should be contrasted with the log and inverse cases, where the contributions of the inactive constraints do not dominate in w (in the inverse case the active constraints dominate). We note that the SO condition is only sufficient for (4.1) to provide an error estimate, and numerical experience indicates that it is applicable in calculations with the log sequence. However, the above discussion suggests that to attain the maximum rate of convergence with the members of the log sequence, the inactive constraints should be identified and discarded. A possible way to do this automatically is by the use of a separable barrier objective function

    Q(x, ρ) = f(x) + r_k Σ_{i=1}^m u_i^{k−1} φ(g_i(x))                   (4.32)

where ρ_i = r_k u_i^{k−1} , r_k is the usual barrier parameter, and u^{k−1} is the multiplier estimate obtained from the previous minimization. This objective function has the property of forcing the multiplier estimates

for the inactive constraints to zero at a very fast rate. We have

    u_i^{k+1} = −r_{k+1} u_i^k (dφ/dg_i)(x_{k+1}) ≤ c_i r_{k+1} u_i^k

where c_i is a bound for −(dφ/dg_i)(x_k) , k = 1,2,... .

This choice can also be favorable in the case of nonstrict complementarity. Consider the previous example

    min x_2  subject to  x_2 − x_1² ≥ 0 ,  x_1 ≥ 0 .

Set Q = x_2 − r_k{log(x_2 − x_1²) + u_{k−1} log x_1} . Then ∇Q = 0 gives

    x_2 − x_1² = r_k ,  2 x_1² = r_k u_{k−1} ,

so that

    u_k = r_k u_{k−1}/x_1 = 2^{1/2} r_k^{1/2} u_{k−1}^{1/2} .

Setting r_k = α^k , u_k = 2 α^{β_k} reduces this to

    β_k = β_{k−1}/2 + k/2 .

The solution to this difference equation satisfying the initial condition β_0 = 0 is

    β_k = k − (1 − (1/2)^k) .

From this it follows that u_k = O(r_k) , and hence that x_1 = O(r_k) . Thus, for this example, we are able to obtain results as favorable as those in which strict complementarity holds.

Example: Show that the error estimate (4.1) is valid in this case.
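The difference equation above is easy to iterate directly (α = 0.1 and the starting value u_0 = 1 are assumptions for illustration):

```python
import math

# u_k = sqrt(2 * r_k * u_{k-1}) with r_k = alpha**k; the analysis above
# predicts u_k / r_k -> 2/alpha, i.e. u_k = O(r_k)
alpha, u = 0.1, 1.0
for k in range(1, 20):
    r = alpha ** k
    u = math.sqrt(2.0 * r * u)
    print(k, u / r)
```

The ratio u_k/r_k settles at the constant 2/α, confirming that the weighted barrier drives the multiplier estimate of the redundant constraint to zero at the same rate as r_k.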


There is a penalty to pay for the generality of the barrier function algorithms, and this is a significant burden of calculation associated with each of the successive unconstrained minimizations. This can be explained (at least in part) by looking at the Hessian of the barrier objective function. Experience (in part supported by theoretical results) indicates that the condition number of the Hessian is a good indicator of the degree of difficulty of an unconstrained optimization problem when it is solved by descent methods.

On the assumptions that the second order sufficiency conditions hold at x* , and that the active constraint gradients are linearly independent, it is possible to deduce fairly complete information on the eigenvalues and eigenvectors of ∇²T(x_k, r_k) from (3.8).

(i) There are n − t eigenvectors associated with eigenvalues of ∇²T(x_k, r_k) that are O(1) as r_k → 0 . The smallest eigenvalue tends to

    m = min_v {v^T ∇²L(x*, u*) v / v^T v} ,  ∀ v such that ∇g_i(x*) v = 0 , ∀ i ∈ B_0 .

(ii) There are t eigenvectors associated with eigenvalues of ∇²T(x_k, r_k) which tend to ∞ as r_k → 0 . These eigenvectors are asymptotic to vectors of the form C_1(x*) y_i , where the y_i are eigenvectors of the problem

    [C_1(x*)^T C_1(x*) − κ_i Λ^{-1}] y_i = 0

where Λ is a diagonal matrix, Λ_ii = (d²φ/dg_i²) / max_{1≤j≤t} (d²φ/dg_j²) , i = 1,...,t .

110

tm jjgmjn ■ M.^.^:^..^,tikJ,i--^...UTAn;|[rWt|1||..|.[.|riini<|1|.[.|n..|i...|...[||<|.. ..
w~-
•V«-.-.- ST^HmHWIPIWWl fWHWHWWWIWW "'"' ■«^»■iiim.w,i;iii»iHi,.iiiw.i>i«ii,i»|).,ii>iMiiHui,iiiiNijiiiitii,iiti,.niii.i.iw. P1I!I«!"!','»."I1I,I'|].IW»1IH"I,1M1K3:

The corresponding eigenvalues tend to ∞ like κ_i r_k max_{1≤j≤t} (d²φ/dg_j²) , that is, like κ_i u_σ^k (d²φ/dg_σ²)/(−dφ/dg_σ) , where σ is the maximizing index.

This shows that the condition number of ∇²T(x_k, r_k) tends to ∞ like 1/g_σ(x_k) , or like 1/||x_k − x*|| if the SO condition is satisfied. In this latter case we have shown that our measure of the cost of a barrier function calculation depends in the main on the accuracy desired rather than on the choice of barrier function. However, our estimates for the log family indicate that these will be somewhat more expensive than the above estimate except when all constraints are active.

Note that the device introduced to force more effective elimination of the inactive constraints does not force the Hessian to be worse conditioned in the case that strict complementarity does not obtain, at least in the examples that have been worked out. The use of this device would appear to be an important improvement in barrier function algorithms.
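A minimal sketch of the conditioning estimate (the toy problem is an assumption for illustration): for minimize x_1² + x_2 subject to x_2 ≥ 0, the log-barrier Hessian at the minimizer (0, r) is diag(2, r/x_2²) = diag(2, 1/r), so the condition number grows like 1/(2r), i.e. like 1/g(x_k):

```python
import numpy as np

for r in [1e-1, 1e-2, 1e-3, 1e-4]:
    # Hessian of T = x1**2 + x2 - r*log(x2) at its minimizer (0, r)
    H = np.diag([2.0, 1.0 / r])
    print(r, np.linalg.cond(H))   # grows like 1/(2r)
```

This is the quantitative sense in which tighter accuracy (smaller r) makes each unconstrained minimization harder for descent methods.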

5. Analysis of penalty function methods.

Consider now the equality constrained problem (ECP)

    min_{x∈S} f(x) ,  S = {x ; h_i(x) = 0 , i = 1,2,...,q (i ∈ I_2)} .   (5.1)

It is assumed that S is nonempty, and that (5.1) has a bounded minimum (say f*).


Remark: The inequality constraint g(x) ≥ 0 can be written as the equality constraint

    h(x) = min(0, g(x)) = 0                                              (5.2)

so that formally the ICP is a special case of an ECP. However, h(x) given by (5.2) can have discontinuous first derivatives.

Definition: F(x,λ) is a penalty objective function if

    F(x,λ) = f(x) + λ Σ_{i=1}^q ψ(h_i(x))                                (5.3)

where ψ(h) is a monotonic increasing function of |h| , and ψ(0) = 0 .

Example: Let ψ(h) = |h|^{1+α} ; then ψ is a penalty function if α > −1 . If g(x) is concave then, from (5.2), so is h(x) , and ψ(h) is convex provided α ≥ 0 . If α < 0 then dψ/dh is unbounded as h → 0 .

Theorem 5.1: Let {λ_r} ↑ ∞ , and let x_r minimize F(x, λ_r) . Then {f(x_r)} is nondecreasing, {F(x_r, λ_r)} is strictly increasing unless x_r ∈ S , and {Σ_i ψ(h_i(x_r))} is nonincreasing. If {x_r} → x* then x* solves the ECP.

Proof: Let λ_r < λ_s . Then, provided x_r , x_s ∉ S ,

    F(x_r, λ_r) ≤ F(x_s, λ_r) < F(x_s, λ_s) ≤ F(x_r, λ_s) .

Thus (compare Lemma 1.2) the results for the sequences follow as before. We have

    min_x F(x,λ) ≤ min_{x∈S} F(x,λ) = min_{x∈S} f(x) = f* .              (5.4)

Thus the F(x_r, λ_r) are bounded, and hence x* ∈ S . Now

    x* ∈ S ⟹ f(x*) ≥ f* ,

but, by (5.4),

    f(x_r) ≤ F(x_r, λ_r) ≤ f*

so that

    lim f(x_r) ≤ f* .

Thus f(x*) = f* , and x* solves the ECP. □

Remark: In the more general case in which {x_r} is bounded it follows, by restricting attention to convergent subsequences, that all limit points of {x_r} solve the ECP.
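A worked one-dimensional instance of Theorem 5.1 (the specific problem and the quadratic penalty ψ(h) = h² are assumptions for illustration): for minimize x² subject to h(x) = x − 1 = 0, the penalty objective F(x,λ) = x² + λ(x−1)² is minimized at x_r = λ/(1+λ), which approaches x* = 1 as λ → ∞:

```python
# psi(h) = h**2, so the multiplier estimate below is u_r = -2*lam*h(x_r)
for lam in [1.0, 10.0, 100.0, 1000.0]:
    x = lam / (1.0 + lam)          # unconstrained minimizer of F(x, lam)
    u = -2.0 * lam * (x - 1.0)     # -> u* = 2, since grad f(x*) = u* grad h(x*)
    print(lam, x, u)
```

Note that x_r approaches S from one side (h(x_r) < 0 throughout), and the multiplier estimate converges to the exact Lagrange multiplier u* = 2, anticipating Theorem 5.2 below.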

Theorem 5.2: Let {x_r} → x* and assume ψ(h) is continuously differentiable, and ∇h_i(x*) , i ∈ I_2 , linearly independent. Define u^r by

    u_i^r = −λ_r (dψ/d|h_i|) sgn(h_i) ,  i = 1,2,...,q ;                 (5.5)

then {u^r} → u* , the vector of Lagrange multipliers for the ECP.

Proof: Define the matrix B(x) by col_i(B(x)) = ∇h_i(x)^T , i = 1,2,...,q . The condition that x_r minimize F(x, λ_r) gives

    0 = ∇F(x_r, λ_r) = ∇f(x_r) + λ_r Σ_{i=1}^q (dψ/d|h_i|) sgn(h_i) ∇h_i(x_r)   (5.6)

so that

    ∇f(x_r)^T = B(x_r) u^r .                                             (5.7)

Now B(x_r) has full rank for ||x_r − x*|| small enough. Thus

    u^r = (B(x_r)^T B(x_r))^{-1} B(x_r)^T ∇f(x_r)^T

        → (B(x*)^T B(x*))^{-1} B(x*)^T ∇f(x*)^T = u* .  □                (5.8)

Remark: (i) If strict complementarity holds so that |u_i*| > 0 , i = 1,2,...,q , then the convergence of the Lagrange multiplier estimates implies (from (5.5)) that sgn(h_i) is constant for r large enough, as dψ/d|h_i| > 0 for x ∉ S . Thus the minimizing sequence approaches S 'from one side'. In this sense S acts like a barrier.

(ii) Note that h_i(x) = min(0, g_i(x)) = 0 identically in a neighborhood of x* if g_i(x*) > 0 . Thus ∇h_i(x*) = 0 so that, in this trivial sense, the constraint gradients are not linearly independent. However, if strict complementarity holds, then a multiplier result can be proved for the active constraints (do this!). In fact, the strict complementarity restriction can be relaxed somewhat.

Theorem 5.3: If the conditions of Theorem 5.2 hold and, in addition, {u^r} → u* and λ_r (d²ψ/dh_i²) → ∞ , i = 1,2,...,q , as r → ∞ , then the second order sufficiency conditions hold at x* if and only if ∇²F(x_r, λ_r) is positive definite for r sufficiently large.

Proof: This is essentially the same as that of Lemmas 3.1 and 3.3. □

Example: Derive the analogues of Lemmas 3.1 and 3.3 which apply when the ECP is obtained by transforming an ICP by means of (5.2).


Remark: The condition that λ_r (d²ψ/dh_i²) → ∞ is related to strict complementarity. Consider ψ = |h|^{1+α} , α > 0 . Then

    dψ/dh = (1 + α)|h|^α sgn(h)                                          (5.9)

so that

    λ_r (d²ψ/dh_i²) = (1 + α) α |h_i|^{α−1} λ_r = α |u_i^r| / |h_i| .    (5.10)

Thus λ_r (d²ψ/dh_i²) → ∞ , h_i → 0 , if |u_i*| > 0 . Strict complementarity is of particular importance for equality constraints derived from inequality constraints by (5.2). In this case, the one sided convergence implied by the multiplier relations is needed if we are to be able to talk about second derivatives at all.
The parallel development of the treatment of the ECP by penalty function methods and the treatment of the ICP by barrier functions can be completed by discussing convergence rates of penalty function algorithms in much the same way as we treated the barrier function case.  For example, multiplying (5.6) by x* - x_r gives

    f(x*) - f(x_r) = - Σ_{i=1}^q u_i^r h_i(x_r) + O(||x_r - x*||²) .     (5.11)

The assumption that the SO condition is satisfied can now be used to provide estimates.  From (5.9)

    λ_r (1+α) |h_i|^α sgn(h_i) = - u_i^r

so that

                                   115

    |h_i(x_r)| = ( |u_i^r| / ((1+α) λ_r) )^{1/α} .                   (5.12)

This suggests a rate of convergence of O((1/λ_r)^{1/α}) , which contrasts favorably with the estimates obtained for the barrier function algorithms.

In particular, as α → 0 , (5.12) suggests that the convergence rate becomes arbitrarily great.  However, the results of the previous section also indicate that the condition number of the Hessian will become arbitrarily large as α → 0 .  The next result provides information on the limiting case α = 0 .
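The estimate (5.12) is easy to observe numerically.  The sketch below (a one-dimensional toy problem of ours, not from the notes) minimizes the penalty function for ψ = |h|^{1+α} by a crude ternary search and compares the constraint violation with the predicted value:

```python
# Toy ICP (ours): minimize f(x) = -x subject to g(x) = 1 - x >= 0, so that
# x* = 1 and u* = 1.  With h = min(0, g), the penalty minimizer violates the
# constraint by |h| = (|u*| / (lam * (1 + alpha)))**(1/alpha), cf. (5.12).

def penalty_violation(lam, alpha):
    """Minimize P(x) = -x + lam*|min(0, 1-x)|^(1+alpha) on [1, 2] by
       ternary search and return the violation |h| = x - 1."""
    def P(x):
        h = min(0.0, 1.0 - x)
        return -x + lam * abs(h) ** (1.0 + alpha)
    lo, hi = 1.0, 2.0
    for _ in range(200):                      # P is unimodal on [1, 2]
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if P(m1) < P(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi) - 1.0

alpha = 1.0
for lam in (1e1, 1e2, 1e3):
    observed = penalty_violation(lam, alpha)
    predicted = (1.0 / (lam * (1.0 + alpha))) ** (1.0 / alpha)
    print(lam, observed, predicted)
```

The observed and predicted violations agree, and both shrink like (1/λ)^{1/α} as λ grows.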

Theorem 5.4:  In the ICP let f(x) be convex, and g_i(x) , i ∈ I , concave.  Let w be an infeasible point, x_0 an interior point of S ,

    a = min_{i ∈ I} g_i(x_0) ,   b = f(x_0) - f(x*) ,   λ_0 = (b+1)/a .

Then x* minimizes

    F(x,λ) = f(x) - λ Σ_{i=1}^m min(0, g_i(x))                       (5.13)

provided λ ≥ λ_0 .

Remark:  It is necessary to demonstrate the result only for λ = λ_0 .  For all larger λ it is then a consequence of Theorem 5.1.

Proof:  Let v be the boundary point of S on the join of w and x_0 , and B be the index set of constraints active at v .  Define

    s(x) = f(x) - λ_0 Σ_{i ∈ B} g_i(x) ,                             (5.14)

then

                                   116

    s(x_0) = f(x_0) - λ_0 Σ_{i ∈ B} g_i(x_0)

           ≤ f(x_0) - (b+1)

           = f(x*) - 1

           < f(v) = s(v) = F(v,λ_0) .                                (5.15)

As s(x) is convex and v is on the join of x_0 and w , ∃ θ , 0 < θ < 1 , such that

    s(v) ≤ θ s(x_0) + (1-θ) s(w)

         < θ s(v) + (1-θ) s(w)

whence

    s(v) < s(w) .                                                    (5.16)

Now s(w) ≤ F(w,λ_0) so that, from (5.15) and (5.16),

    F(v,λ_0) < F(w,λ_0) .

Thus min_x F(x,λ_0) must be attained at a feasible point.  □
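A small numerical check of Theorem 5.4 (the one-dimensional example is ours, not from the notes): for f(x) = -x with the single constraint g(x) = 1 - x ≥ 0 we have x* = 1, and taking x_0 = 0 gives a = 1, b = 1 and λ_0 = 2, so the unconstrained minimum of F(x,λ) for λ ≥ λ_0 already sits at x*:

```python
import numpy as np

# Our toy ICP: f(x) = -x,  g(x) = 1 - x >= 0,  so x* = 1 and f(x*) = -1.
# With interior point x0 = 0: a = g(0) = 1, b = f(0) - f(x*) = 1, lambda0 = 2.
def F(x, lam):
    return -x - lam * np.minimum(0.0, 1.0 - x)    # the exact penalty function

grid = np.linspace(-2.0, 3.0, 5001)
for lam in (0.5, 2.0, 2.5):
    print(lam, grid[np.argmin(F(grid, lam))])
# for lam = 0.5 the minimum runs off to the right-hand grid edge (the penalty
# is too weak); for lam >= lambda0 = 2 it is attained at the solution x* = 1
```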

6. Accelerated penalty and barrier methods.

The problems of poor conditioning of the computational problem and (comparatively) slow convergence make it worthwhile to search for methods for accelerating the convergence of the penalty / barrier function algorithms.  Consider the (generalized) penalty objective function

    P(x,W,η) = f(x) + Σ_{i=1}^q W_ii ψ(h_i(x) + η_i)                 (6.1)
117


where W is the diagonal matrix of penalty parameters, and the η_i are further parameters to be used in the acceleration process.

At a minimum of P , x(W,η) satisfies

    ∇f(x) - Σ_{i=1}^q u_i(W,η) ∇h_i(x) = 0                           (6.2)

where u_i(W,η) = - W_ii ∂ψ/∂h_i .  Provided ||x(W,η) - x*|| and ||u(W,η) - u*|| are sufficiently small, the second order sufficiency conditions hold at x* , and ∇h_i(x*) , i ∈ I_q , are linearly independent, then (by Theorem 5.5) x(W,η) also solves the ECP

    min f(x) ,   S_1 = {x : h_i(x) = h_i(x(W,η)) , i ∈ I_q} .
One sequential strategy for making x(W,η) → x* is to force λ = min_i W_ii → ∞ .  However, the parameter vector η is also available, and we ask is it possible to adjust it to make

    h_i(x(W,η)) = 0 ,   i ∈ I_q .                                    (6.3)

Let ∂x/∂η be the matrix with components ∂x_i/∂η_j , i = 1,...,n , j = 1,2,...,q .  If ∇²P(x(W,η),W,η) is nonsingular, then, by the implicit function theorem, we can solve (6.2) for x = x(η) holding W fixed.  We have

    ∇²P ∂x/∂η_j = - W_jj (∂²ψ/∂h_j²) ∇h_j                            (6.4)

where all quantities are evaluated at x(W,η) .  Defining the diagonal

118

matrix V by V_ii = W_ii ∂²ψ/∂h_i² , i = 1,2,...,q , and the matrix B by taking column i of B to be ∇h_i , i = 1,2,...,q , then (6.4) can be written

    (∇²L + B V B^T) ∂x/∂η = - B V .                                  (6.5)

Choose V_0 to make U = ∇²L + B V_0 B^T positive definite, and set V_1 = V - V_0 .  Then, if min_i (V_1)_ii is sufficiently large,

    ∂x/∂η = - U^{-1} B (B^T U^{-1} B)^{-1} + O(V^{-1}) .             (6.6)

This relation can be justified if

(i)   the second order sufficiency conditions hold at x* and ∇h_i(x*) , i ∈ I_q , are linearly independent,

(ii)  ||x - x*|| and ||u - u*|| are sufficiently small,

(iii) min_i V_ii → ∞ as min_i W_ii → ∞ for η = 0 , and

(iv)  min_i W_ii is sufficiently large.
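The limit underlying (6.6) is the matrix identity (U + BVB^T)^{-1} BV → U^{-1} B (B^T U^{-1} B)^{-1} as the diagonal of V grows; a quick numerical check on random data (ours, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 6, 2
A = rng.standard_normal((n, n))
U = A @ A.T + n * np.eye(n)          # a positive definite U
B = rng.standard_normal((n, q))      # full column rank (almost surely)

# the claimed limit U^{-1} B (B^T U^{-1} B)^{-1}
UinvB = np.linalg.solve(U, B)
limit = UinvB @ np.linalg.inv(B.T @ UinvB)

errs = []
for v in (1e2, 1e4, 1e6):
    V = v * np.eye(q)                # diagonal V with growing entries
    approx = np.linalg.solve(U + B @ V @ B.T, B @ V)
    errs.append(np.abs(approx - limit).max())
    print(v, errs[-1])               # the error decays like O(1/v)
```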

Consider now the use of Newton's method for solving (6.3).  This suggests that a correction δη to η be found by solving

    B^T (∂x/∂η) δη = - h .                                           (6.7)

But, by (6.6),

    B^T ∂x/∂η = - I + O(V^{-1})                                      (6.8)

so that

    δη = h + O(V^{-1}) .                                             (6.9)

                                   119

Thus we expect the simple correction δη = h to approximate arbitrarily closely to a second order process provided V is sufficiently large.

Algorithm (ECP)

(i)   Initialize η^(1) , W^(1) .

(ii)  Minimize P(x,W^(k),η^(k)) to determine x_k , u_k .

(iii) IF Σ_{i=1}^q u_i^k h_i(x_k) < TOL THEN STOP.

(iv)  FOR I = 1 STEP 1 UNTIL Q DO

        IF ABS(h_i(x_k)) < DECR*ABS(h_i(x_{k-1}))

        THEN η_i^(k+1) := η_i^(k) + h_i(x_k)

        ELSE W_ii^(k+1) := EXP*W_ii^(k) ,

             η_i^(k+1) := (∂ψ/∂h)^{-1}( - u_i^k / W_ii^(k+1) ) .

(v)   K := K+1.

(vi)  GO TO (ii).

Remark:  The idea behind the algorithm is that the correction (6.9) is used whenever the convergence of h_i to zero is satisfactory.  Otherwise it is assumed that W_ii is too small and it is increased accordingly.  η_i is modified at the same time to ensure that u_i^k → u_i^* as h_i → 0 .  (∂ψ/∂h)^{-1} indicates the inverse function to ∂ψ/∂h .
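For ψ(h) = h² the outer iteration can be sketched as below (a toy problem of ours, with a crude gradient-descent inner minimizer standing in for a proper unconstrained method; W is held fixed and finite throughout, so all of the convergence comes from the shift η ← η + h of (6.9)):

```python
import numpy as np

# Our toy ECP: f(x) = x1^2 + x2^2,  h(x) = x1 + x2 - 1,  x* = (0.5, 0.5).
def minimize_P(W, eta, x0, steps=20000, lr=1e-3):
    """Crude gradient descent on P(x) = f(x) + W*(h(x) + eta)^2."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        h = x[0] + x[1] - 1.0
        grad = 2.0 * x + 2.0 * W * (h + eta) * np.array([1.0, 1.0])
        x = x - lr * grad
    return x

W, eta = 10.0, 0.0                   # W stays fixed and finite
x = np.array([0.0, 0.0])
for k in range(6):
    x = minimize_P(W, eta, x)
    h = x[0] + x[1] - 1.0
    eta += h                         # the correction delta_eta = h of (6.9)
    print(k, h)                      # residual h(x_k) -> 0 geometrically
```

In this example each outer iteration shrinks the residual by a fixed factor without any increase of the penalty parameter, which is exactly the behaviour the acceleration is designed to produce.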

For the ICP we consider the modified barrier function

    R(x,W^(k),η^(k)) = f(x) + Σ_{i=1}^m W_ii^(k) u_i^(k-1) φ(g_i(x) + η_i^(k))        (6.10)
120


where u^(k-1) is the vector of multiplier estimates from the previous minimization, and W^(k) is now the diagonal matrix of barrier parameters.

We note that, in the particular case in which all constraints are active, the previous analysis is applicable, at least formally, and suggests a correction

    η^(k+1) = η^(k) + g(x_k)                                         (6.11)

with order of magnitude departure from a second order iteration of O( (min_i W_ii^(k) u_i^(k-1) ∂²φ/∂g_i²)^{-1} ) .  However, we require automatic selection

of the active constraints if we are to make use of this result, and it is important to note that this is provided naturally in the algorithm by the options

(i)  if g_i → 0 at a satisfactory rate, then η_i^(k+1) = η_i^(k) + g_i , and

(ii) if the convergence rate is too slow, then decrease the barrier parameter.

This second option can be expected to apply to the inactive constraints, and will drive the contribution to (6.10) from this source rapidly to zero by (4.55).  Note that the boundedness of the barrier terms requires that g_i + η_i be positive.  If η^(1) is set to zero then (6.11) ensures that this condition will be met initially.  Provided strict complementarity holds, the convergence of the multiplier estimates will ensure that it must hold ultimately.  Of course, the calculation must be started from a feasible point.

121


Algorithm (ICP)

(i)   Initialize η^(1) , W^(1) , u_0 .

(ii)  Minimize R(x,W^(k),η^(k)) to determine x_k , u_k .

(iii) IF Σ_{i=1}^m u_i^k g_i(x_k) < TOL THEN STOP.

(iv)  FOR I = 1 STEP 1 UNTIL M DO

        IF ABS(g_i(x_k)) < DECR*ABS(g_i(x_{k-1}))

        THEN η_i^(k+1) := η_i^(k) + g_i(x_k)

        ELSE W_ii^(k+1) := DECR*W_ii^(k) ,

             η_i^(k+1) := (∂φ/∂g)^{-1}( - (W_ii^(k+1))^{-1} ) .

(v)   K := K+1.

(vi)  GO TO (ii).

Remark:  As in the previous algorithm (∂φ/∂g)^{-1} denotes the inverse function to ∂φ/∂g .  For example, if φ = - log g then η_i^(k+1) = W_ii^(k+1) .
Consider now another modified penalty function for the ECP

    S(x) = f(x) - u(x)^T h(x) + h(x)^T W h(x)                        (6.12)

where the matrix W is positive definite.

Lemma 6.1:  If the second order sufficiency conditions hold for x = x* , and the ∇h_i(x*) , i ∈ I_q , are linearly independent, then S(x) has a
122

-■i---:-;-"'----"--■■„- drnn-ir.. iaC^l-..^U-^L-JL-^^..,.U^..-^A.1 ^^rf-lr^ ^Jb^ . :■.■, .--^.r ■-■.. ' ^..-^ ,- ■■—^.. - , .„,, .t J.,... rJ.L'. i^
WW^^W^'W "MWMMIN llil '.' • ■liii.uiili^.lp. m.. ..,-, „..^ I'nm "in min) ji.i ii Mi . III.III|IWII.

local minimum at x = x* provided u(x) → u* as x → x* and the smallest eigenvalue of W is large enough.

Proof:  We have

    ∇S(x*) = ∇f(x*) - u(x*)^T ∇h(x*)

                    - h(x*)^T (∇u(x*) - 2W∇h(x*))

           = ∇f(x*) - u*^T ∇h(x*)

           = 0

as u* is the vector of Lagrange multipliers for the ECP.  Thus S has a stationary point for x = x* .  Now

    ∇²S(x*) = ∇²L(x*,u*) - ∇u(x*)^T ∇h(x*) - ∇h(x*)^T ∇u(x*)

                          + 2 ∇h(x*)^T W ∇h(x*)                      (6.13)

where terms which vanish at x* have been ignored.  Corollary 3.2 can now be applied to show ∇²S(x*) is positive definite.  We set V = ∇h(x*)^T , U = ∇²L(x*,u*) - ∇u(x*)^T ∇h(x*) - ∇h(x*)^T ∇u(x*) , and note that

    min_{Vt=0, ||t||=1} t^T U t  =  min_{Vt=0, ||t||=1} t^T ∇²L(x*,u*) t  =  m > 0

as the second order sufficiency conditions hold at x* .  □

                                   123

Theorem 6.1:  Let the conditions of Lemma 5.1 hold at x* and set

    u(x) = B(x)^+ ∇f(x)^T                                            (6.14)

where B(x) = ∇h(x)^T .  Then (6.12) has a local minimum at x* provided the smallest eigenvalue of W is large enough.

Proof:  This result is an immediate consequence of Lemma 6.1.  As the ∇h_i(x*) , i ∈ I_q , are linearly independent, B(x)^+ is a bounded operator for ||x - x*|| small enough.  Thus u(x) → u* as x → x* .  □

Remark:  (i) By using (6.14) we can construct a penalty function which is differentiable in a neighborhood of x* (contrast with (5.13)) and which has a local minimum at x = x* for sufficiently large but finite values of the penalty parameter.  However, (6.14) requires first derivatives of the problem functions, so that minimization of (6.12) with a method that requires first derivatives of S will require second derivatives of the problem functions.  Two cases have been considered (Fletcher):

(i)  S(x) = f(x) - h(x)^T B(x)^+ ∇f(x)^T + σ ||h(x)||² ,  and        (6.15)

(ii) S(x) = f(x) - h(x)^T B(x)^+ ∇f(x)^T + σ ||(B(x)^+)^T h(x)||² ,  (6.16)

where σ is a penalty parameter.
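A numerical sanity check on (6.15) (the toy problem is ours, not from the notes): with f(x) = x1² + x2², h(x) = x1 + x2 - 1 and the modest value σ = 1, S already has its minimum at x* = (1/2, 1/2), illustrating that a finite penalty parameter suffices:

```python
import numpy as np

def S(x, sigma=1.0):
    # Fletcher-type penalty (6.15) on our toy ECP:
    #   f(x) = x1^2 + x2^2,  h(x) = x1 + x2 - 1,  so x* = (0.5, 0.5).
    f = x[0] ** 2 + x[1] ** 2
    h = x[0] + x[1] - 1.0
    grad_f = np.array([2.0 * x[0], 2.0 * x[1]])
    B = np.array([[1.0], [1.0]])                 # column is grad h
    u = (np.linalg.pinv(B) @ grad_f)[0]          # (6.14): u(x) = B^+ grad f^T
    return f - u * h + sigma * h * h

xstar = np.array([0.5, 0.5])
rng = np.random.default_rng(2)
print(S(xstar))                                  # f(x*) = 0.5
print(all(S(xstar + 0.1 * rng.standard_normal(2)) > S(xstar)
          for _ in range(100)))                  # True: x* is a local minimum
```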

(ii) There is a close connection between the penalty function (6.15) and the algorithm based on (6.1) in the case ψ(h) = h² .  At a minimum of P we have (as ∇P = 0)

    2W(h + η) = - B^+ (∇f)^T .                                       (6.17)
                                   124

Thus the correction formula corresponds to updating the Lagrange multiplier estimate by (6.14) at the end of each unconstrained minimization rather than continuously, which the use of S requires.

(iii) Note that S(x) can be interpreted as a Lagrangian.  For example, in the case S(x) is given by (6.16),

    S(x) = L(x, w(x))                                                (6.18)

where

    w(x) = B(x)^+ ∇f(x)^T - σ B(x)^+ (B(x)^+)^T h(x) .               (6.19)

Lemma 6.2:  w(t) defined by (6.19) is the vector of Lagrange multipliers for the problem

    minimize_x  f(t) + ∇f(t)(x-t) + (σ/2) ||x-t||²                   (6.20)

subject to the linear constraints

    h(t) + ∇h(t)(x-t) = 0 ,                                          (6.21)

provided this minimum exists.

Proof:  Any point satisfying the constraints (6.21) has the form

    x = t - (B(t)^T)^+ h(t) + A(t)z                                  (6.22)

where B(t)^T A(t) = 0 .  The multiplier relation for (6.20), (6.21) is

    ∇f(t) + σ (x-t)^T = u^T B(t)^T                                   (6.23)

so that u can be taken as (substituting (6.22) into (6.23))

    u = B(t)^+ { ∇f(t)^T - σ (B(t)^T)^+ h(t) } .  □

125


If (6.22) is substituted into (6.20) the problem becomes one of minimizing w.r.t. z

    ∇f(t) A z + (σ/2) z^T A^T A z

whence

    z = - (1/σ) (A^T A)^{-1} A^T ∇f(t)^T .                           (6.24)

Thus σ plays a role in ensuring that x(t) , the minimum of (6.20), cannot deviate far from t (cf. remark (ii) following MP Corollary 5.1).
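Lemma 6.2 can be verified numerically by comparing (6.19) with the multipliers obtained from the KKT system of the linearized problem (6.20)-(6.21); the random data below are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, sigma = 5, 2, 3.0
grad_f = rng.standard_normal(n)      # grad f(t), stored as a 1-d array
B = rng.standard_normal((n, q))      # columns are grad h_i(t), full rank a.s.
h = rng.standard_normal(q)           # the residuals h(t)

# (6.19): w(t) = B^+ grad f(t)^T - sigma B^+ (B^+)^T h(t)
Bp = np.linalg.pinv(B)
w = Bp @ grad_f - sigma * Bp @ Bp.T @ h

# KKT system of (6.20)-(6.21):  sigma (x - t) - B u = -grad f(t)^T ,
#                               B^T (x - t) = -h(t)
K = np.block([[sigma * np.eye(n), -B], [B.T, np.zeros((q, q))]])
u = np.linalg.solve(K, np.concatenate([-grad_f, -h]))[n:]
print(np.allclose(u, w))             # True
```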

Example:  (i) The Lagrangian interpretation provides a method for generalizing the above discussion to inequality constraints.  Consider the problem min_x L(x,w(x)) where w(t) is the vector of multipliers for the problem

    min_x  f(t) + ∇f(t)(x-t) + (σ/2) ||x-t||²

subject to  g(t) + ∇g(t)(x-t) ≥ 0 .

Under what conditions does L have an unconstrained minimum at x* ?  What role does strict complementarity play in this problem?

(ii) (6.12) can be generalized to other penalty functions and to barrier functions (cf. Remark (ii) above).  How much of the above analysis goes through?  What modifications are required?  Evaluate the resulting algorithms.

126


Notes

1., 2.  See Fiacco and McCormick's book.  Also the paper 'Penalty function methods for mathematical programming problems', J. Math. Anal. and Applic. (1970), by Osborne and Ryan.

3.  Fiacco and McCormick were the first to draw attention to the importance of these (as they were to much of the material in this section).

4.  The log family is due to Osborne and Ryan.  The importance of the conditioning of the Hessian is due to Walter Murray.  Rate of convergence formulae have also been developed by F. A. Lootsma (Thesis, also survey paper at Dundee conference).

5.  Fiacco and McCormick.  The exact penalty function is due to Zangwill.

6.  The algorithm for the ECP is due to Powell in the case ψ = h² (Harwell report, also Proceedings of Keele Conference).  The exact penalty function S(x) is due to Fletcher, who has developed it together with his student Shirley Lill and described it in several Harwell reports.  The extension to inequality constraints (example (i)) is also due to Fletcher.

References

A. V. Fiacco and G. P. McCormick (1968):  Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley.

R. Fletcher and S. A. Lill (1970):  A class of methods for non-linear programming, in Proceedings of Nonlinear Programming Symposium, Madison.

                                   127

F. A. Lootsma (1970):  Boundary properties of penalty functions for constrained minimization, Philips Res. Repts. Suppl., No. 3.

W. Murray (1969):  Constrained Optimization, Nat. Phys. Lab. Rep. Ma 79.

M. R. Osborne and D. M. Ryan (1970):  On penalty functions for non-linear programming problems, J. Math. Anal. Appl., 31, pp. 559-578.

M. J. D. Powell (1969):  A method for non-linear constraints in minimization problems, in Optimization (editor: R. Fletcher), Academic Press.
128


Bibliography

Abadie, J. (editor), (1967) Nonlinear Programming, (1970) Integer and Nonlinear Programming, North Holland.

Ablow, C. M. and Brigham, G. (1955). An analog solution of programming problems. Op. Res., 3, pp 388-394.

Akaike, H. (1959). On a successive transformation of probability distribution and its application to the analysis of the optimal gradient method. Ann. Inst. Stat. Math., Tokyo, 11, p 1.

Allran, R. R. and Johnsen, S. E. J. (1970). An algorithm for solving nonlinear programming problems subject to nonlinear inequality constraints. Comp. J., 13 (2), pp 171-177.

Arrow, K. J. and Hurwicz, L. (1956). Reduction of constrained maxima to saddle point problems; in "Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability", ed. J. Neyman. University of California Press, pp 1-20.

Arrow, K. J., Hurwicz, L. and Uzawa, H. (1958). "Studies in Linear and Nonlinear Programming". Stanford University Press.

Arrow, K. J., Hurwicz, L. and Uzawa, H. (1961). Constraint qualifications in maximization problems. Nav. Res. Log. Q., 8, pp 175-191.

Bard, Y. (1968). On a numerical instability of Davidon-like methods. Math. Comp., 22 (103), pp 665-666.

Bard, Y. (1970). Comparison of gradient methods for the solution of nonlinear parameter estimation problems. SIAM J. Numer. Anal., 7 (1), pp 157-186.

Bartels, R. H., Golub, G. H., and Saunders, M. A. (1970). Numerical techniques in Mathematical Programming, in Nonlinear Programming, Academic Press.

Bazaraa, M. S., Goode, J. J., Nashed, M. Z., and Shetty, C. M. (1971). Nonlinear programming without differentiability in Banach spaces: Necessary and sufficient constraint qualification. Applicable Analysis, to appear.

                                   129


Beale, E. M. L. (1955). On minimizing a convex function subject to linear inequalities. J. Roy. Stat. Soc., Ser. B, 17 (2), pp 175-184.

Beale, E. M. L. (1959). Numerical methods; in "Nonlinear Programming", ed. J. Abadie. North-Holland Publishing Co., p 150.

Bellmore, M., Greenberg, H. J. and Jarvis, J. J. (1970). Generalized penalty function concepts in mathematical optimization. Op. Res., 18 (2), pp 229-252.

Beltrami, E. J. (1969). A constructive proof of the Kuhn-Tucker multiplier rule. J. Math. Anal. Appl., 26, pp 297-306.

Beltrami, E. J. (1970). "An Algorithmic Approach to Nonlinear Analysis and Optimization". Academic Press.

Beltrami, E. J. and McGill, R. (1966). A class of variational problems in search theory and the maximum principle. Op. Res., 14 (2), pp 267-278.

Box, M. J. (1966). A comparison of several current optimization methods and the use of transformations in constrained problems. Comp. J., 9, pp 67-77.

Box, M. J., Davies, D. and Swann, W. H. (1969). "Nonlinear Optimization Techniques". ICI Monograph, Oliver and Boyd.

Bracken, J. and McCormick, G. P. (1968). "Selected Applications of Nonlinear Programming". Wiley.

Broyden, C. G. (1965). A class of methods for solving nonlinear simultaneous equations. Math. Comp., 19 (92), pp 577-593.

Broyden, C. G. (1967). Quasi-Newton methods and their application to function minimization. Math. Comp., 21, pp 368-381.

Broyden, C. G. (1970a). The convergence of a class of double-rank minimization algorithms; 1. general considerations. J. Inst. Math. Appl., 6 (1), pp 76-90.

Broyden, C. G. (1970b). The convergence of single-rank quasi-Newton methods. Math. Comp., 24 (110), pp 365-382.

                                   130


Bui Trong Lieu and Huard, P. (1966). La méthode des centres dans un espace topologique. Numer. Math., 8 (1), pp 56-67.

Butler, T. and Martin, A. V. (1962). On a method of Courant for minimizing functionals. J. Math. Phys., 41, pp 291-299.

Camp, G. D. (1955). Inequality-constrained stationary value problems. J. Op. Res. Soc. Am., 3, pp 548-550.

Carroll, C. W. (1961). The created response surface technique for optimizing nonlinear restrained systems. Op. Res., 9 (2), pp 169-184.

Charnes, A. and Lemke, C. E. (1954). Minimization of nonlinear separable convex functionals. Nav. Res. Log. Q., 1, pp 301-302.

Cheney, E. W. and Goldstein, A. A. (1959). Newton's method for convex programming and Tchebycheff approximation. Numer. Math., 1, pp 254-268.

Colville, A. R. (1968). A comparative study on nonlinear programming codes. IBM New York Scientific Centre Tech. Rep. No. 320-2949.

Courant, R. (1943). Variational methods for the solution of problems of equilibrium and vibrations. Bull. Am. Math. Soc., 49, pp 1-23.

Crockett, J. B. and Chernoff, H. (1955). Gradient methods of maximization. Pac. J. Math., 5, pp 33-50.

Curry, H. B. (1944). The method of steepest descent for non-linear minimization problems. Quart. Appl. Math., 2 (3), pp 250-261.

Daniel, J. W. (1967a). The conjugate gradient method for linear and nonlinear operator equations. SIAM J. Numer. Anal., 4, pp 10-26.

Daniel, J. W. (1967b). Convergence of the conjugate gradient method with computationally convenient modifications. Numer. Math., 10, pp 125-131.

Dantzig, G. B. (1951). Maximization of a linear function of variables subject to linear inequalities: in "Activity Analysis of Production and Allocation", ed. T. C. Koopmans. Cowles Commission Monograph No. 13, Wiley.

                                   131

Davidon, W. C. (1959). Variable metric method for minimization. U.S.A.E.C. Research and Development Rep. ANL-5990, Argonne National Laboratory.

Davidon, W. C. (1968). Variance algorithm for minimization. Comp. J., 10, pp 406-410.

Dorn, W. S. (1960). Duality in quadratic programming. Quart. Appl. Math., 18, pp 155-162.

Edelbaum, T. N. (1962). Theory of maxima and minima: in "Optimization Techniques", ed. G. Leitmann. Academic Press.

El-Hodiri, Mohamed A. (1971). Constrained Extrema, Lecture Notes in Operations Research and Mathematical Systems 56, Springer-Verlag.

Evans, J. P. and Gould, F. J. (1970). Stability in non-linear programming. Op. Res., 18 (1), pp 107-118.

Faure, P. and Huard, P. (1966). Résultats nouveaux relatifs à la méthode des centres. Quatrième Conférence de Recherche Opérationelle, Cambridge, Mass.

Fiacco, A. V. (1967). Sequential unconstrained minimization methods for nonlinear programming. Ph.D. Dissertation, Northwestern University, Ill.

Fiacco, A. V. (1970). Penalty methods for mathematical programming in E^n with general constraint sets. J. Opt. Th. Appl., 6 (3), pp 252-268.

Fiacco, A. V. and McCormick, G. P. (1963). Programming under nonlinear constraints by unconstrained minimization: a primal-dual method. Tech. Paper RAC-TP-96, Research Analysis Corporation.

Fiacco, A. V. and McCormick, G. P. (1964a). The sequential unconstrained minimization technique for nonlinear programming: a primal-dual method. Man. Sci., 10 (2), pp 360-366.

Fiacco, A. V. and McCormick, G. P. (1964b). Computational algorithm for the sequential unconstrained minimization technique for nonlinear programming. Man. Sci., 10 (2), pp 601-617.

                                   132


Fiacco, A. V. and McCormick, G. P. (1966). Extensions of SUMT for nonlinear programming: equality constraints and extrapolation. Man. Sci., 12 (11), pp 816-829.

Fiacco, A. V. and McCormick, G. P. (1967). The slacked unconstrained minimization technique for convex programming. SIAM J. Appl. Math., 15 (3), pp 505-515.

Fiacco, A. V. and McCormick, G. P. (1968). "Nonlinear Programming: Sequential Unconstrained Minimization Techniques". Wiley.

Fletcher, R. (1965). Function minimization without evaluating derivatives - a review. Comp. J., 8, pp 33-41.

Fletcher, R. (1968). Programming under linear equality and inequality constraints. ICI Rep. No. MSDH/68/19.

Fletcher, R. (Ed.) (1969a). "Optimization". Academic Press.

Fletcher, R. (1969b). A review of methods for unconstrained optimization: in "Optimization", ed. R. Fletcher. Academic Press.

Fletcher, R. (1969c). A new approach to variable metric algorithms. AERE Tech. Rep. No. 383.

Fletcher, R. (1970). A general quadratic programming algorithm. AERE Tech. Rep. No. 401.

Fletcher, R. (1970). A new approach to variable metric algorithms. Computer J., 13, pp 317-322.

Fletcher, R. and McCann, A. P. (1969). Acceleration techniques for nonlinear programming: in "Optimization", ed. R. Fletcher. Academic Press.

Fletcher, R. and Powell, M. J. D. (1963). A rapidly convergent descent method for minimization. Comp. J., 6, pp 163-168.

Fletcher, R. and Reeves, C. M. (1964). Function minimization by conjugate gradients. Comp. J., 7, pp 149-154.

Forsythe, G. E. (1967). On the asymptotic directions of the s-dimensional optimum gradient method. Tech. Rep. No. CS61, Stanford University.

                                   133


Forsythe, G. E. and Motzkin, T. S. (1951). Asymptotic properties of the optimum gradient method (abstract). Bull. Am. Math. Soc., 57, p 183.

Frisch, K. R. (1955). The logarithmic potential method of convex programming. Memorandum May 13, 1955, University Institute of Economics, Oslo.

Geoffrion, A. M. (1966). Strictly concave parametric programming, part 1: basic theory. Man. Sci., 13, pp 244-253.

Geoffrion, A. M. (1967). Strictly concave parametric programming, part 2: additional theory and computational considerations. Man. Sci., 13 (5), pp 359-370.

Gill, P. E. and Murray, W. (1970). Stable implementation of unconstrained optimization algorithms. NPL Report Maths. 97.

Goldfarb, D. (1969a). Sufficient conditions for the convergence of a variable metric algorithm: in "Optimization", ed. R. Fletcher. Academic Press.

Goldfarb, D. (1969b). Extension of Davidon's variable metric method to maximization under linear inequality and equality constraints. SIAM J. Appl. Math., 17, pp 739-764.

Goldfarb, D. (1970). A family of variable-metric methods derived by variational means. Math. Comp., 24, pp 23-26.

Goldfarb, D. and Lapidus, L. (1968). Conjugate gradient method for nonlinear programming problems with linear constraints. I. and E. C. Fundamentals, 7 (1), pp 142-151.

Goldstein, A. A. (1962). Cauchy's method of minimization. Numer. Math., 4, pp 146-150.

Goldstein, A. A. (1965). On steepest descent. SIAM J. Control, 3, pp 147-151.

Goldstein, A. A. (1968). Constructive Real Analysis. Harper and Row.

Goldstein, A. A. and Price, J. (1967). An effective algorithm for minimization. Num. Math., 10, pp 184-189.

                                   134


Golub, G. H., and Saunders, M. A. (1969). Numerical methods for solving linear least squares problems. Computer Science Dept. Tech. Rep. CS134, Stanford University.

Gould, F. J. and Tolle, J. W. (1971). A necessary and sufficient constraint qualification. SIAM J. Appl. Maths., 20, pp 164-172.

Greenstadt, J. (1967). On the relative efficiencies of gradient methods. Math. Comp., 21, pp 360-367.

Greenstadt, J. (1970). Variations on variable-metric methods. Math. Comp., 24, pp 1-25.

Haarhoff, P. C. and Buys, J. D. (1970). A new method for the optimization of a nonlinear function subject to nonlinear constraints. Comp. J., 13 (2), pp 178-184.

Hadley, G. (1964). "Nonlinear and Dynamic Programming". Addison-Wesley.

Hartley, H. O. and Hocking, R. R. (1963). Convex programming by tangential approximation. Man. Sci., 9, pp 600-612.

Hestenes, M. R. (1966). "Calculus of Variations and Optimal Control Theory". Wiley.

Huang, H. Y. (1969). Unified approach to quadratically convergent algorithms for function minimization. J.O.T.A., 5, pp 405-425.

Huang, H. Y. and Levy, A. V. (1969). Numerical experiments on quadratically convergent algorithms for function minimization. Aero-Astronautics Rep. No. 66, Rice University.

Huard, P. (1964). Résolution de programmes mathématiques à contraintes non-linéaires par la méthode des centres. Note E.D.F., HR5690/317.

Huard, P. (1967). Resolution of mathematical programming with nonlinear constraints by the method of centres; in "Nonlinear Programming", ed. J. Abadie. North-Holland Pub. Co., Amsterdam.

Huard, P. (1968). Programmation mathématique convexe. R.I.R.O., 7, pp 43-59.

                                   135

John, F. (1948). Extremum problems with inequalities as subsidiary conditions: in "Studies and Essays". Courant Anniversary Volume, Interscience, pp 187-204.

Kelley, H. J. (1962). Methods of gradients: in "Optimization Techniques", ed. G. Leitmann. Academic Press.

Kelley, J. E., Jr. (1960). The cutting plane method for solving convex programs. J. Soc. Ind. Appl. Math., 8 (4), pp 703-712.

Kowalik, J. (1966). Nonlinear programming procedures and design optimization. Acta Polytechnica Scandinavica, 13, Trondheim.

Kowalik, J. and Osborne, M. R. (1968). "Methods for Unconstrained Optimization Problems". American Elsevier.

Kowalik, J., Osborne, M. R. and Ryan, D. M. (1969). A new method for constrained optimization problems. Op. Res., 17 (6), pp 973-983.

Kuhn, H. W. and Tucker, A. W. (1951). Non-linear Programming: in "Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability", ed. J. Neyman. University of California Press, pp 481-493.

Künzi, H. P. and Oettli, W. (1969). Nichtlineare Optimierung, Lecture Notes in Operations Research and Mathematical Systems 16, Springer-Verlag.

Lemke, C. E. (1962). A method for solution of quadratic programming problems. Man. Sci., 8, pp 442-453.

Lootsma, F. A. (1967). Logarithmic programming: a method of solving nonlinear-programming problems. Philips Res. Rep., 22 (3), pp 329-344.

Lootsma, F. A. (1968a). Extrapolation in logarithmic programming. Philips Res. Rep., 23, pp 108-116.

Lootsma, F. A. (1968b). Constrained optimization via penalty functions. Philips Res. Rep., 23, pp 408-423.

Lootsma, F. A. (1969). Hessian matrices of penalty functions for solving constrained-optimization problems. Philips Res. Rep., 24, pp 322-331.

                                   136


IIIIÜPUHHIII IHHlJIUlill.lilHHIIIIpiTWHWWIWiWi.l'Wll' I'll. Mi|illi.lHipiBpiUHHifllMI»»»lli|i|ll"li'lH.lni|lv.P^i.,»nwi ill i<iM>i|lwvi-»iii;i.iiii-«li ill» J l|ji^iiw;ill|l,,||l|iwil|imwwil|IWfl|lllll||l|ffWIW^«^tlil|l|l|llfl| IMIII PHHpm

—— m—ii ■ — mmimmmmmmm—mmmmm

Lootsma, F. A. (1970). Boundary properties of penalty functions


for constrained minimization. Philips Res. Resp. Supp. No. 3.

Lootsma, F. A. (editor), (1972). Numerical Methods for Nonlinear


Optimization, Academic Press.

Luenburger, D. G. (1969). "Optimization by Vector Space Methods".


Wiley.

Mangasarian, 0. L. (1969). Nonlinear Programming. McGraw-Hill.

Mangasarian, 0. L. and Fromowitz, S. (1967). The Fritz Jo'm necessary


optimality conditions in the presence of equality and inequality
constraints. J. Math. Anal. Appl., 17 (l), PP 37-V7.
McCormick, G. P. (1969). The rate of convergence of the reset
Davidon variable metric method. MRC Tech. Rep. No. 1012,
University of Wisconsin.

McConnick, G. P. and Pearson, J. D. (1969). Variable metric methods


and unconstrained optimization: in "Optimization", ed. R. Fletcher.
Academic Press.

Meyers, G. E. (1968). Properties of the conjugate-gradient and


Davidon methods. J. Opt. Th. Appl., 2 (k), pp 209-219.

Morrison, D. D. (1968). Optimization by least squares. SIAM J.


Numer. Anal., 5 (l), PP 83-88.. """"

Murray, W. (1969a). Indefinite quadratic programming. Nat. Phys.


Lab. Rep. Ma 76.

Murray, W. (1969b). Behaviour of Hessian matrices of barrier and penalty functions arising in optimization. Nat. Phys. Lab. Rep. Ma 77.

Murray, W. (1969c). Constrained optimization. Nat. Phys. Lab. Rep. Ma 79.
Murtagh, B. A. and Sargent, R. W. H. (1969). A constrained minimization method with quadratic convergence; in "Optimization", ed. R. Fletcher, Academic Press.

Murtagh, B. A. and Sargent, R. W. H. (1970). Computational experience with quadratically convergent minimization methods. Comp. J., 13 (2), pp 185-194.


Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. Comp. J., 7, pp 308-313.

Osborne, M. R. and Ryan, D. M. (1970a). On penalty function methods for nonlinear programming problems. J. Math. Anal. Appl., 31 (3), pp 559-578.

Osborne, M. R. and Ryan, D. M. (1970b). An algorithm for nonlinear programming. Tech. Rep. No. 35, Computer Centre, A.N.U.

Osborne, M. R. and Ryan, D. M. (1972). A hybrid method for nonlinear programming; in "Numerical Methods for Nonlinear Optimization" (ed. F. A. Lootsma), Academic Press.

Ostrowski, A. M. (1966). "Solution of Equations and Systems of Equations" (2nd Edition), Academic Press.

Parisot, G. R. (1961). Résolution numérique approchée du problème de programmation linéaire par application de la programmation logarithmique. Rev. Fr. Recherche Opérationnelle, 20, pp 227-259.

Pearson, J. D. (1969). On variable metric methods of minimization. Comp. J., 12, pp 171-178.

Pietrzykowski, T. (1962). Application of the steepest descent method to concave programming; in "Proceedings of IFIPS Congress", Munich, North-Holland Pub. Co., pp 185-189.

Polak, E. and Ribière, G. (1969). Note sur la convergence de méthodes de directions conjuguées. R.I.R.O., 3, pp 35-43.

Pomentale, T. (1965). A new method for solving conditioned maxima problems. J. Math. Anal. Appl., 10, pp 216-220.

Powell, M. J. D. (1967). A method for nonlinear constraints in minimization problems. AERE Tech. Rep. No. 310.

Powell, M. J. D. (1968). A survey of numerical methods for unconstrained optimization. AERE Tech. Rep. No. 340.

Powell, M. J. D. (1969a). Rank one methods for unconstrained optimization. AERE Tech. Rep. No. 372.


Powell, M. J. D. (1969b). On the convergence of the variable metric algorithm. AERE Tech. Rep. No. 382.

Powell, M. J. D. (1970). A new algorithm for unconstrained optimization. AERE Tech. Rep. No. 393.

Rosen, E. M. (1966). A review of quasi-Newton methods in nonlinear equation solving and unconstrained optimization; in "Proceedings of the 21st National Conference of the A.C.M.", Thompson Book Co., pp 37-41.

Rosen, J. B. (1960). The gradient projection method for nonlinear programming, part 1: linear constraints. J. Soc. Ind. Appl. Math., 8 (1), pp 181-217.

Rosen, J. B. and Suzuki, S. (1965). Construction of nonlinear programming test problems. Comm. A.C.M., 8, p 113.

Rosenbrock, H. H. (1960). An automatic method for finding the greatest or least value of a function. Comp. J., 3, pp 175-184.

Rudin, W. (1953). "Principles of Mathematical Analysis". McGraw-Hill, N. Y.

Schmit, L. A. and Fox, R. L. (1965). Integrated approach to structural synthesis. AIAA J., 3 (6), pp 1104-1112.

Shah, B. V., Buehler, R. J. and Kempthorne, O. (1964). Some algorithms for minimizing a function of several variables. J. Soc. Ind. Appl. Math., 12, pp 74-92.

Spang, H. A. (1962). A review of minimization techniques for nonlinear functions. SIAM Review, 4 (4), pp 343-365.

Stoer, J. (1971). On the numerical solution of constrained least squares problems. SIAM J. Numer. Anal., 8, pp 382-411.

Strong, R. E. (1965). A note on the sequential unconstrained minimization technique for non-linear programming. Man. Sci., 12 (1), pp 142-144.

Tremolières, R. (1968). La méthode des centres à troncature variable. Thèse, Paris.


Whittle, P. (1971). "Optimization under Constraints", Wiley.

Wolfe, P. (1959). The simplex method for quadratic programming. Econometrica, 27 (3), pp 382-398.

Wolfe, P. (1961). A duality theorem for nonlinear programming. Quart. Appl. Math., 19 (3), pp 239-244.
Zangwill, W. I. (1967). Nonlinear programming via penalty functions. Man. Sci., 13 (5), pp 344-358.
Zangwill, W. I. (1969). "Nonlinear Programming: A Unified Approach".
Prentice-Hall.

Zoutendijk, G. (1960). "Methods of Feasible Directions". Elsevier.
