TOPICS IN OPTIMIZATION

BY

MICHAEL OSBORNE

STAN-CS-72-279

APRIL 1972

Reproduced by
NATIONAL TECHNICAL INFORMATION SERVICE
U.S. Department of Commerce
Springfield, VA 22151

COMPUTER SCIENCE DEPARTMENT
School of Humanities and Sciences
STANFORD UNIVERSITY

Unclassified
Security Classification

DOCUMENT CONTROL DATA - R&D
(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

1. ORIGINATING ACTIVITY: Stanford University
2a. REPORT SECURITY CLASSIFICATION: Unclassified
3. REPORT TITLE: Topics in Optimization
4. DESCRIPTIVE NOTES (Type of report and inclusive dates): Technical Report
5. AUTHOR(S): Michael Osborne
6. REPORT DATE: April 1972
7a. TOTAL NO. OF PAGES: 141
7b. NO. OF REFS: 148
8a. CONTRACT OR GRANT NO.: ONR N00014-67-A-0112-0029, NR 044-211
8b. PROJECT NO.: STAN-CS-72-279
8c.: NSF GJ 29988X
10. DISTRIBUTION STATEMENT: Approved for public release; distribution unlimited.
12. SPONSORING MILITARY ACTIVITY: Office of Naval Research, Pasadena Branch

13. ABSTRACT

These notes are based on a course of lectures given at Stanford, and cover three major topics relevant to optimization theory. First, an introduction is given to those results in mathematical programming which appear to be most important for the development and analysis of practical algorithms. Next, unconstrained optimization problems are considered. The main emphasis is on that subclass of descent methods which (a) requires the evaluation of first derivatives of the objective function, and (b) has a family connection with the conjugate direction methods. Numerical results obtained using a program based on this material are discussed in an Appendix. In the third section, penalty and barrier function methods for mathematical programming problems are studied in some detail, and possible methods for accelerating their convergence are indicated.

14. KEY WORDS: mathematical programming, unconstrained optimization, descent methods, penalty functions, barrier functions, rate of convergence

DD FORM 1473    S/N 0101-807-6801    Unclassified

TOPICS IN OPTIMIZATION

Michael Osborne
Computer Science Department
Stanford University

and

Australian National University


Canberra, Australia

Abstract

These notes are based on a course of lectures given at Stanford, and cover three major topics relevant to optimization theory. First, an introduction is given to those results in mathematical programming which appear to be most important for the development and analysis of practical algorithms. Next, unconstrained optimization problems are considered. The main emphasis is on that subclass of descent methods which (a) requires the evaluation of first derivatives of the objective function, and (b) has a family connection with the conjugate direction methods. Numerical results obtained using a program based on this material are discussed in an Appendix. In the third section, penalty and barrier function methods for mathematical programming problems are studied in some detail, and possible methods for accelerating their convergence are indicated.

This research was supported in part by the National Science Foundation under grant number GJ 29988X, and the Office of Naval Research under contract number N00014-67-A-0112-0029, NR 044-211. Reproduction in whole or in part is permitted for any purpose of the United States Government.
Introduction

These notes were prepared for a course on optimization given in

the Computer Science Department at Stanford University during the fall

quarter of 1971. In part they are based on lectures given during the

year of study in numerical analysis funded by the United Kingdom Science

Research Council at the University of Dundee, and on courses given at the

Australian National University.

The choice of material has been regulated by limitations of time as

well as by personal preference. Also, much material appropriate to the


development of algorithms for linearly constrained optimization problems

was covered in the parallel course on numerical linear algebra given by

Professor Golub. Thus, despite some ambition to cover a larger range,

the course eventually consisted of three main sections. These notes

cover these sections and have been supplemented by brief additional

comments and a list of references. A more extensive bibliography is

also included. This is an amended version of a bibliography prepared

by my former student Dr. D. M. Ryan.

The first section is intended to provide a solid introduction to

the main results in mathematical programming (or at least to those results

which appear to be the most important for the development and analysis

of practical algorithms). The main aim has been to characterize local

extrema, so that convexity and duality theory are not treated in any

great detail. However, the material given is more than adequate for the

purposes of the remaining sections. Opportunity has been taken to


present the recent results of Gould and Tolle which provide an accessible

and rather complete description of the first order conditions for an

extremum. The second order conditions are also considered in detail.



The second section on unconstrained optimization is largely restricted to that subclass of descent methods which (a) requires the evaluation of first derivatives of the objective function, and (b) has some family connection with the so-called conjugate direction methods. This is an area in which there has been considerable recent activity, and here an attempt is made both to summarize significant recent developments and to indicate their algorithmic possibilities. An appendix (prepared with the help of M. A. Saunders) summarizes numerical results obtained with a program based on this material. One significant omission from this section is any detailed discussion of convergence. However, the convergence of certain algorithms (those that reset the Hessian estimate periodically or according to appropriate criteria) is an easy consequence of the material given.

In the third section, penalty and barrier function methods for nonlinear programming are considered. This turns out to be a very nice application, in particular, of the results of the first section. These methods have advantages of robustness and simplicity but carry a definite cost penalty. However, attempts to remedy this situation show some promise. The material presented in this section has important connections with other areas: for example, with the method of regularization for the approximate solution of improperly posed problems.

Acknowledgments
The material presented here has benefited greatly from discussions

with Roger Fletcher, Gene Golub, John Greenstadt, Shmuel Oren, Mike Powell,

and Mike Saunders. The presentation of the course was shaped more by the


determination of the lecturer to cover as large a field as possible

than by any consideration for the audience. In spite of this the level

of continued interest was most gratifying, and it is a pleasure to single

out the credit students Linda Kaufman, John Lewis and Margaret Wright

in this respect.

A special vote of thanks is required for George Forsythe who was

responsible for the invitation to Stanford, and for completing the

subsequent arrangements despite the efforts of the Australian and U.K.

Post Offices. In spite of this he was still prepared to combine with

John Herriot in keeping the lecturer in reasonable check, and to provide

the incentive to produce these notes as a CS report.


Contents

I. Introduction to Mathematical Programming 5

1. Minimum of a constrained function 6

2. Some properties of linear inequalities 12

3. Multiplier relations 14

4. Second order conditions 20

5. Convex programming problems 25

II. Descent Methods for Unconstrained Optimization 36

1. General properties of descent methods 37

2. Methods based on conjugate directions 44

Appendix. Numerical questions relating to Fletcher's algorithm 68

III. Barrier and Penalty Function Methods 78

1. Basic properties of barrier functions 79

2. Multiplier relations (first order analysis) 83

3. Second order conditions 86

4. Rate of convergence results 91

5. Analysis of penalty function methods 111

6. Accelerated penalty and barrier methods 117

Note. Equations, Theorems, etc. are numbered in sequence according to subsection. Cross references between sections are indicated explicitly. For example, equation 3 of subsection 4 in the Mathematical Programming section is referenced as MP equation 4.3 in other sections.



I. Introduction to Mathematical Programming


1. Minimum of a constrained function.

Consider a function f(x) defined for x on S ⊂ E_n , f : S → E_1 , where S is a given point set.

Definition: x* is the global minimum of f on S if

    f(x*) ≤ f(x) ∀x ∈ S .                                     (1.1)

Remark: x* exists, for example, if S is finite, or if S is compact and f(x) continuous on S .

Definition: x* is a local minimum of f on S if ∃ δ > 0 such that

    f(x*) ≤ f(x) ∀x ∈ N(x*,δ)                                 (1.2)

where

    N(x,δ) = S ∩ {t : ||t − x|| < δ} .†                       (1.3)

If strict inequality holds in either (1.1) or (1.2) whenever x ≠ x* then the minimum is said to be isolated.

Definition: S is convex if x_1, x_2 ∈ S ⇒ θx_1 + (1−θ)x_2 ∈ S for 0 ≤ θ ≤ 1 .

Example: If S is convex, every finite convex combination of points in S is again in S . That is, Σ_{i=1}^m λ_i x_i ∈ S where x_i ∈ S , Σ_{i=1}^m λ_i = 1 , λ_i ≥ 0 , 1 ≤ m < ∞ .

Definition: f(x) is a convex function on the convex set S if

    f(θx_1 + (1−θ)x_2) ≤ θf(x_1) + (1−θ)f(x_2) , 0 ≤ θ ≤ 1 .  (1.4)

† ||t|| = (Σ_{i=1}^n t_i²)^{1/2} , the euclidean vector norm of t .

If strict inequality holds when 0 < θ < 1 then f is strictly convex. Say g(x) is concave (strictly concave) if −g is convex (strictly convex).

Lemma 1.1: If f(x) is a convex function on the convex set S then a local minimum of f is the global minimum. If f is strictly convex then the minimum is unique.

Proof: It is necessary to consider only the case f bounded below. If x* is a local minimum but not the global minimum, ∃ x** such that f(x**) < f(x*) . Now, by assumption, ∃ δ > 0 such that f(x) ≥ f(x*) for x ∈ N(x*,δ) . Choose θ > 0 sufficiently small for

    θx** + (1−θ)x* ∈ N(x*,δ) ;

then

    (i) f(x*) ≤ f(θx** + (1−θ)x*) as x* is a local minimum, and

    (ii) f(θx** + (1−θ)x*) ≤ θf(x**) + (1−θ)f(x*) by convexity
                           < f(x*) unless f(x**) = f(x*) .

Now assume x* , x** both are global minima and that f is strictly convex. Then

    f(θx* + (1−θ)x**) < θf(x*) + (1−θ)f(x**) , 0 < θ < 1 ,

which gives a contradiction. □
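A minimal numerical sketch of Lemma 1.1 (the problem data and the projected-gradient iteration are mine, not from the notes): a strictly convex quadratic minimized over the convex box [0,1]² reaches the same point from every starting guess, since any local minimum is global.

```python
import numpy as np

# Strictly convex quadratic over the convex box [0,1]^2: every run of a
# projected gradient iteration lands on the same (hence global) minimum.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite, so f is strictly convex
c = np.array([1.5, -0.3])                # unconstrained minimizer lies outside the box

def grad(x):
    return 2.0 * Q @ (x - c)             # gradient of f(x) = (x-c)'Q(x-c)

def project(x):
    return np.clip(x, 0.0, 1.0)          # projection onto the convex set [0,1]^2

def minimize(x0, step=0.1, iters=2000):
    x = x0
    for _ in range(iters):
        x = project(x - step * grad(x))
    return x

starts = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([0.2, 0.9])]
sols = [minimize(s) for s in starts]
spread = max(np.linalg.norm(s - sols[0]) for s in sols)   # all runs agree
```

The agreement of the runs is the content of the lemma, not of the solver; any convergent method for this convex problem would exhibit it.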

Definition: A set C is a cone with vertex at the origin if x ∈ C ⇒ λx ∈ C , λ ≥ 0 . C is a cone with vertex at p if {x − p : x ∈ C} is a cone with vertex at the origin.


Definition: x is in the tangent cone T(S,x_0) to S at x_0 if ∃ sequences {λ_n} ≥ 0 , {x_n} → x_0 , {x_n} ⊂ S such that

    lim_{n→∞} ||λ_n(x_n − x_0) − x|| = 0 .                    (1.5)

Example: (i) S = {x : ||x − w|| = r} , T(S,x_0) = {x : xᵀ(x_0 − w) = 0} .
(ii) S = {x : ||x − w|| ≤ r} . T(S,x_0) = E_n if x_0 is in the interior of S ; otherwise T(S,x_0) = {x : xᵀ(x_0 − w) ≤ 0} .

Lemma 1.2: T(S,x_0) is closed.

Proof: Consider a sequence {t_i} ⊂ T(S,x_0) such that ||t_i − t|| → 0 , i → ∞ . It is required to show that t ∈ T(S,x_0) . Now t_i ∈ T(S,x_0) ⇒ ∃ {λ_n^i} ≥ 0 , {x_n^i} ⊂ S such that lim_n ||λ_n^i(x_n^i − x_0) − t_i|| = 0 . Prescribe {ε_i} ↓ 0 . Select t_i such that ||t_i − t|| < ε_i/2 , and j = j(i) such that ||λ_j^i(x_j^i − x_0) − t_i|| < ε_i/2 . Then ||λ_j^i(x_j^i − x_0) − t|| < ε_i ⇒ t ∈ T(S,x_0) . □

Lemma 1.3: (Necessary condition for a local minimum.) If f(x) ∈ C¹ at x_0 † and if x_0 is a local minimum of f on S then ∇f(x_0)x ≥ 0 , ∀x ∈ T(S,x_0) .

Proof: Let x be defined by sequences {λ_n} , {x_n} . As x_0 is a local minimum ∃ δ > 0 such that f(x) ≥ f(x_0) ∀x ∈ N(x_0,δ) . Consider now the restriction of the sequences {λ_n} , {x_n} such that x_n ∈ N(x_0,δ) . We have

    0 ≤ f(x_n) − f(x_0) ,

whence (note it is sufficient to consider x such that ||x|| = 1 )

    0 ≤ ∇f(x_0)λ_n(x_n − x_0) + o(λ_n||x_n − x_0||)
      ≤ ∇f(x_0)x + o(1) as n → ∞ . □

† f ∈ C¹ at x_0 if f(x) = f(x_0) + ∇f(x_0)(x − x_0) + o(||x − x_0||) . Higher order continuity classes are defined similarly. For example, f ∈ C² if the o( ) term can be estimated in the form ½(x − x_0)ᵀ∇²f(x_0)(x − x_0) + o(||x − x_0||²) .
Example: (i) If x_0 ∈ S° (the interior of S ) then T(S,x_0) = E_n . Thus x can be chosen arbitrarily so that ∇f(x_0) = 0 .

(ii) If S = {x : ||x − w|| = r} then T(S,x_0) = {x : xᵀ(x_0 − w) = 0} . In particular if x ∈ T(S,x_0) then −x ∈ T(S,x_0) . Thus we must have ∇f(x_0)x = 0 ∀x such that xᵀ(x_0 − w) = 0 . Thus ∇f(x_0) = α(x_0 − w)ᵀ for some α .

(iii) If S = {x : ||x − w|| ≤ r} and ||x_0 − w|| = r then T(S,x_0) = {x : xᵀ(x_0 − w) ≤ 0} . In this case we have ∇f(x_0)x ≥ 0 ∀x such that xᵀ(x_0 − w) ≤ 0 . Thus ∇f(x_0) = α(w − x_0)ᵀ for some nonnegative α .
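Example (iii) can be checked numerically. In this sketch (the data c, w, r are arbitrary choices of mine) the linear objective cᵀx on the ball ||x − w|| ≤ r attains its minimum at the boundary point x_0 = w − rc/||c||, and the gradient satisfies ∇f(x_0) = α(w − x_0) with α = ||c||/r ≥ 0 as the example predicts.

```python
import numpy as np

# Linear objective on a ball: check grad f = alpha (w - x0), alpha >= 0.
c = np.array([3.0, -4.0])
w = np.array([1.0, 2.0])
r = 0.5

x0 = np.asarray(w - r * c / np.linalg.norm(c))   # boundary minimizer
alpha = np.linalg.norm(c) / r                    # predicted multiplier (here 10 > 0)
residual = np.linalg.norm(c - alpha * (w - x0))  # grad f minus alpha (w - x0)

# sampled feasible points never beat x0
rng = np.random.default_rng(0)
pts = w + r * rng.uniform(-1.0, 1.0, size=(500, 2)) / np.sqrt(2)  # inside the ball
worst = (pts @ c).min() - x0 @ c                 # >= 0: x0 attains the minimum
```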

Let A be a set in E_n .

Definition: The polar cone to A is the set A* = {x : xᵀy ≤ 0 ∀y ∈ A} .

A* has the following properties:

(i) A* is a closed convex cone.

(ii) If A_1 ⊂ A_2 then A_2* ⊂ A_1* .

(iii) A** = A if and only if A is a closed convex cone.

(iv) A* equals the polar cone of the closure of the convex hull of A . The convex hull A_c of a set is the smallest convex set containing it. Thus A_c = ∩ X , A ⊂ X , X convex.

(v) If A is a subspace then A* = A^⊥ .
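The definition and property (iii) are easy to probe by sampling. A small sketch (the quadrant example is mine): for A the nonnegative quadrant of E_2, a closed convex cone, the polar A* is the nonpositive quadrant, and xᵀy ≤ 0 holds for every sampled pair, while a point outside the nonpositive quadrant fails the polar test.

```python
import numpy as np

# Polar cone of the nonnegative quadrant in E_2 is the nonpositive quadrant.
rng = np.random.default_rng(1)
A_pts = rng.uniform(0.0, 1.0, size=(200, 2))       # sample of A = {x : x >= 0}
Astar_pts = -rng.uniform(0.0, 1.0, size=(200, 2))  # sample of the claimed A*

max_ip = (Astar_pts @ A_pts.T).max()               # x'y <= 0 for all pairs

bad = np.array([0.5, -0.1])                        # not in A*
witness_ip = (bad @ A_pts.T).max()                 # some inner product is positive
```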

Remark: Lemma 1.3 can be restated: if x* is a local minimum of f on S then −∇f(x*) ∈ T(S,x*)* .

Lemma 1.4: If y ∈ T(S,x_0)* then −y is the gradient of a function having a local minimum on S at x_0 .

Remark: It is sufficient to consider the case ||y|| = 1 , x_0 = 0 .

Proof: Let C_e = {x : xᵀy ≤ ||x||/e} , e = 1,2,... . We first show that for each e , ∃ ε(e) > 0 such that N(0,ε(e)) ⊂ C_e . For assume this is not the case. Then ∃ {x_p} ⊂ E_n − C_e with x_p ∈ N(0,1/p) , p = 1,2,... , such that

    x_pᵀ y / ||x_p|| > 1/e , p = 1,2,... .                    (1.6)

The sequence {x_p/||x_p||} is bounded and therefore contains a subsequence converging to some z* . By definition z* ∈ T(S,0) , but, by (1.6),

    z*ᵀ y ≥ 1/e > 0

which contradicts y ∈ T(S,0)* .

Now let ε_k'' = sup{ε : N(0,ε) ⊂ C_k} . We define

    ε_1 = min(1, ε_1'') ,
    ε_k = min(ε_{k−1}/2 , ε_k'') , k > 1 ,

and

    P(z) = 2||z|| ,  ||z|| ≥ ε_2 ,
    P(z) = [(||z|| − ε_{k+1})/(ε_k − ε_{k+1})] (2||z||/(k−1)) + [(ε_k − ||z||)/(ε_k − ε_{k+1})] (2||z||/k) ,  ||z|| ∈ [ε_{k+1}, ε_k] ,
    P(z) = 0 ,  ||z|| = 0 .

It is clear that ε_k > 0 and ε_k is monotonically decreasing. Further P(z) ≥ 0 , P(z) is an increasing function of ||z|| , and

    ||z|| ≤ ε_k ⇒ P(z)/||z|| ≤ 2/(k−1) .

Thus P(z) = o(||z||) so that ∇P(0) = 0 .

Now let z = x − (xᵀy)y . We show that, under appropriate conditions, xᵀy ≤ P(z) . It is sufficient to consider xᵀy > 0 , and in this case

    ||x|| − xᵀy ≤ ||z|| ≤ ||x|| .                             (1.7)

If x ∈ C_k then xᵀy ≤ ||x||/k . Using (1.7) we have

    (1 − 1/k)||x|| ≤ ||z|| ≤ ||x|| .                          (1.8)

Now assume x ∈ N(0,ε) , ε ≤ ε_3 . Then ||x|| ∈ [ε_{k+1}, ε_k] for some k ≥ 3 , whence x ∈ C_k . This gives

    xᵀy ≤ ||x||/k .                                           (1.9)

However, ||z|| ≥ (1 − 1/k)ε_{k+1} ≥ ε_{k+2} , whence

    P(z) ≥ 2||z||/(k+1) ≥ (2/(k+1))(1 − 1/k)||x||             (1.10)

so that, combining (1.9) and (1.10),

    xᵀy ≤ [(k+1)/(2(k−1))] P(z) ≤ P(z) , k ≥ 3 .              (1.11)

Thus the function

    f(x) = P(x − (xᵀy)y) − xᵀy

has a local minimum on S at x = 0 . Further f ∈ C¹ at 0 , and ∇f(0) = −y . □

2. Some properties of linear inequalities.

Definition: The set H(u,v) = {x : uᵀx = v} is a hyperplane. Note that the hyperplane separates E_n into two disjoint half spaces

    R_+ = {x : uᵀx ≥ v} , R_− = {x : uᵀx < v} .

Lemma 2.1: (lemma of the separating hyperplane). Let S be a closed convex set in E_n , and let x_0 ∉ S . Then ∃ a hyperplane separating x_0 and S .

Proof: Let x_1 be any point in S . Then min_{x∈S} ||x − x_0|| ≤ ||x_1 − x_0|| = r . The function ||x − x_0|| is continuous on the closed bounded set S ∩ {x : ||x − x_0|| ≤ r} and hence the minimum is attained. Let this point be x* . From Figure 2.1 it is suggested that

    (x − x*)ᵀ(x* − x_0) = 0                                   (2.1)

[Figure 2.1: the hyperplane (x − x*)ᵀ(x* − x_0) = 0]

is an appropriate hyperplane. To verify this, note that x_0 ∈ R_− so that it remains to show that S ⊂ R_+ . Let x ∈ S ; then for 0 ≤ θ ≤ 1 ,

    ||θx + (1−θ)x* − x_0||² ≥ ||x* − x_0||²

so that

    θ²||x − x*||² + 2θ(x − x*)ᵀ(x* − x_0) ≥ 0

and, letting θ → 0 ,

    (x − x*)ᵀ(x* − x_0) ≥ 0

whence x ∈ R_+ . □
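The projection argument of Lemma 2.1 can be exercised numerically; for a box the nearest point is obtained by clipping each coordinate. In this sketch (set and exterior point are my test data) every sampled point of S lands in R_+ while x_0 lands in R_−.

```python
import numpy as np

# Separating hyperplane for the box S = [0,1]^2 and an exterior point x0.
x0 = np.array([2.0, -0.5])
xstar = np.clip(x0, 0.0, 1.0)             # closest point of S to x0

rng = np.random.default_rng(2)
S_pts = rng.uniform(0.0, 1.0, size=(400, 2))

side_S = (S_pts - xstar) @ (xstar - x0)   # >= 0 for every sampled x in S
side_x0 = (x0 - xstar) @ (xstar - x0)     # strictly negative: x0 is in R-
```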

Definition: C is finitely generated if

    C = {x : x = Σ_{i=1}^p λ_i c_i , λ_i ≥ 0 , i = 1,2,...,p} .

It is clear that C is a cone. It can be shown that C is closed.

Lemma 2.2: (Farkas' lemma). Let A be a p×n matrix. If for every solution y of the system of linear inequalities

    Ay ≥ 0                                                    (2.2)

it is true that

    aᵀy ≥ 0                                                   (2.3)

then ∃ x ≥ 0 such that Aᵀx = a .

Proof: Let C be the cone generated by the rows ρ_i(A) , i = 1,2,...,p . Then the result of Farkas' lemma is that if (2.2) ⇒ (2.3) then a ∈ C . We assume a ∉ C and seek a contradiction. By Lemma 2.1 there exists a separating hyperplane. To construct it let x* be the closest point in C to a . Then ||λx* − a||² has a minimum at λ = 1 . Differentiating and setting λ = 1 gives

    (x* − a)ᵀx* = 0 .                                         (2.4)

By (2.1) the equation of the separating hyperplane is

    (x − x*)ᵀ(x* − a) = xᵀ(x* − a) = 0                        (2.5)

which shows that it passes through the origin.

By Lemma 2.1, C ⊂ R_+ whence

    vᵀA(x* − a) ≥ 0

for arbitrary v ≥ 0 so that

    A(x* − a) ≥ 0 ,                                           (2.6)

but a ∈ R_− whence

    aᵀ(x* − a) < 0                                            (2.7)

which gives the desired contradiction. □


Remark: Another way of looking at this result is that at most one of the following pair of systems can have a solution:

    (i) Ax = b , x ≥ 0 ;
    (ii) Aᵀy ≥ 0 , bᵀy < 0 .

This is an example of a 'theorem of the alternative'.
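Farkas' lemma can be checked computationally: a lies in the cone of the rows of A exactly when min ||Aᵀu − a|| over u ≥ 0 has zero residual; otherwise system (ii) has a solution. In the sketch below the matrix, vectors, and the tiny projected-gradient nonnegative least squares solver are my own; a library NNLS routine would normally be used.

```python
import numpy as np

# Nonnegative least squares by projected gradient: u >= 0 minimizing ||A'u - a||.
def nnls_pg(AT, a, step=0.1, iters=20000):
    u = np.zeros(AT.shape[1])
    for _ in range(iters):
        u = np.maximum(u - step * AT.T @ (AT @ u - a), 0.0)
    return u

A = np.array([[1.0, 0.0], [1.0, 1.0]])    # rows generate a cone in E_2
AT = A.T

a_in = np.array([2.0, 1.0])               # = 1*(1,0) + 1*(1,1), inside the cone
res_in = np.linalg.norm(AT @ nnls_pg(AT, a_in) - a_in)    # essentially zero

a_out = np.array([-1.0, 0.0])             # outside the cone
res_out = np.linalg.norm(AT @ nnls_pg(AT, a_out) - a_out) # strictly positive

y = np.array([1.0, -1.0])                 # witness for system (ii) when a = a_out
check_ii = bool((A @ y >= 0).all() and a_out @ y < 0)
```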

3. Multiplier relations.

We consider now the mathematical programming problem (MPP):

    min f(x) subject to

    g_i(x) ≥ 0 , i ∈ I_1 ,
    h_i(x) = 0 , i ∈ I_2 .

We assume that f , g_i , i ∈ I_1 , and h_i , i ∈ I_2 , are in C¹ and that the constraints on the problem are not contradictory. This corresponds to the problem discussed in Section 1 with S given by

    S = {x : g_i(x) ≥ 0 , i ∈ I_1 , h_i(x) = 0 , i ∈ I_2} .   (3.1)

At any point x_0 ∈ S let B_0 be the index set for the constraints satisfying g_i(x_0) = 0 . If i ∈ B_0 we say that g_i is active at x_0 .

Definition: S is Lagrange regular at x_0 iff for every f such that (i) f has a minimum on S at x_0 , and (ii) f ∈ C¹ at x_0 (i.e., f ∈ F_0 ), ∃ u , v such that

    (i) ∇f(x_0) = Σ_{i∈B_0} u_i ∇g_i(x_0) + Σ_{i∈I_2} v_i ∇h_i(x_0) ,   (3.2)

    (ii) u_i ≥ 0 , i ∈ B_0 .

This can also be written

    (i) ∇f(x_0) = Σ_{i∈I_1} u_i ∇g_i(x_0) + Σ_{i∈I_2} v_i ∇h_i(x_0) ,

    (ii) uᵀg(x_0) = 0 , and

    (iii) u ≥ 0 ,

where zero multipliers are introduced corresponding to the inactive constraints.

Remark: If (3.2) holds for f ∈ F_0 , then f satisfies the Kuhn-Tucker conditions.

Example: It is important to realize that (3.2) need not hold. Consider the MPP

    min f = −x_1 ,
    subject to g_1 = x_1 ≥ 0 , g_2 = x_2 ≥ 0 , g_3 = (1 − x_1)³ − x_2 ≥ 0 .

From Figure 3.1 it is clear that the minimum is attained at x_1 = 1 , x_2 = 0 , and here g_2 and g_3 are active. We have

    ∇g_2 = −∇g_3 = e_2

while

    ∇f = −e_1

so that a relation of the form (3.2) is impossible.

[Figure 3.1]
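The failure of the multiplier relation in this example is easy to confirm numerically (the multiplier grid is my device): at the minimizer (1,0) both active gradients lie along e_2, so no nonnegative combination comes within distance 1 of ∇f = −e_1.

```python
import numpy as np

# Active gradients at (1,0) of g2 = x2 and g3 = (1-x1)^3 - x2 both lie on
# the x2 axis, so they cannot reproduce grad f = -e1.
grad_f = np.array([-1.0, 0.0])
grad_g2 = np.array([0.0, 1.0])            # grad g2 at (1,0)
grad_g3 = np.array([0.0, -1.0])           # grad g3 at (1,0)

u = np.linspace(0.0, 5.0, 101)            # grid of candidate multipliers
combos = u[:, None, None] * grad_g2 + u[None, :, None] * grad_g3
best = np.linalg.norm(combos - grad_f, axis=-1).min()   # never below 1
```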

Let

    H_0 = {x : ∇h_i(x_0)x = 0 , i ∈ I_2} ,
    G_0 = {x : ∇g_i(x_0)x ≥ 0 , i ∈ B_0} .

Lemma 3.1: S is Lagrange regular at x_0 iff −∇f(x_0) ∈ (G_0 ∩ H_0)* for all f ∈ F_0 .

Proof: If −∇f(x_0) ∈ (G_0 ∩ H_0)* then

    ∇f(x_0)y ≥ 0

∀y such that

    ∇h_i(x_0)y = 0 , i ∈ I_2 ,
    ∇g_i(x_0)y ≥ 0 , i ∈ B_0 .

Thus, by Farkas' lemma, ∇f(x_0) is a linear combination with nonnegative weights of ∇g_i(x_0) , i ∈ B_0 , and ∇h_i(x_0) , −∇h_i(x_0) , i ∈ I_2 . Thus (3.2) holds. On the other hand, if (3.2) holds then ∇f(x_0)y ≥ 0 for all y ∈ G_0 ∩ H_0 . □

Remark: Lemma 3.1 shows the difficulty with the above example. Here T = {x = −αe_1 , α ≥ 0} , G_0 ∩ H_0 = {x = αe_1 , α unconstrained} . We have T* = the right half plane, (G_0 ∩ H_0)* = the x_2 axis. By Lemma 1.4, for every x ∈ T* there is a function with a minimum at (1,0) and such that −∇f = x . Thus the conditions of Lemma 3.1 are not met in this case.



Lemma 3.2: (G_0 ∩ H_0)* ⊂ T(S,x_0)* .

Proof: This result follows if we show that T(S,x_0) ⊂ H_0 ∩ G_0 . If x ∈ T(S,x_0) , ∃ {x_n} → x_0 , {x_n} ⊂ S , {λ_n} ≥ 0 such that λ_n(x_n − x_0) → x . We have

    0 = h_i(x_n) = h_i(x_0) + ∇h_i(x_0)(x_n − x_0) + o(||x_n − x_0||) , i ∈ I_2 ,

and

    0 ≤ g_i(x_n) = g_i(x_0) + ∇g_i(x_0)(x_n − x_0) + o(||x_n − x_0||) , i ∈ B_0 .

Multiplying by λ_n and repeating the argument used in Lemma 1.3 we have

    ∇h_i(x_0)x = 0 , i ∈ I_2 , ∇g_i(x_0)x ≥ 0 , i ∈ B_0 ,

so that x ∈ G_0 ∩ H_0 . □

Theorem 3.1: The set S is Lagrange regular at x_0 iff T(S,x_0)* = (G_0 ∩ H_0)* .

Proof: If T(S,x_0)* = (G_0 ∩ H_0)* then −∇f(x_0) ∈ (G_0 ∩ H_0)* ∀f ∈ F_0 by Lemma 1.3. Thus (3.2) holds by Lemma 3.1. If S is Lagrange regular at x_0 then by Lemma 3.1, −∇f(x_0) ∈ (G_0 ∩ H_0)* ∀f ∈ F_0 . Therefore, by Lemma 1.4, T(S,x_0)* ⊂ (G_0 ∩ H_0)* . Thus T(S,x_0)* = (G_0 ∩ H_0)* by Lemma 3.2. □

Remark: Conditions which ensure that S is Lagrange regular at x_0 are called restraint conditions. Theorem 3.1 gives a necessary and sufficient restraint condition.

Corollary 3.1: (Kuhn-Tucker restraint condition). If ∇g_i(x_0)t ≥ 0 , i ∈ B_0 , and ∇h_i(x_0)t = 0 , i ∈ I_2 ⇒ t is tangent at x_0 to a once differentiable arc x = x(θ) , x(0) = x_0 , contained in N(x_0,δ) for some δ > 0 , then S is Lagrange regular at x_0 .

Proof: It is clear that t ∈ T(S,x_0) , for consider a sequence {θ_n} ↓ 0 and define {x_n} = {x(θ_n)} , {λ_n} = {1/θ_n} ; then

    λ_n(x_n − x_0) → dx(0)/dθ = t .

Thus the Kuhn-Tucker restraint condition implies (G_0 ∩ H_0) ⊂ T(S,x_0) . The result now follows from Lemma 3.2 and Theorem 3.1. □

Lemma 3.3: Let k_i(x) ∈ C¹ , k_i(x_0) = 0 , and ∇k_i(x_0)t = 0 , i = 1,2,...,s ≤ n . We assume ∃ ε > 0 such that the ∇k_i(x) , i = 1,2,...,s , are linearly independent for ||x − x_0|| ≤ ε . Then ∃ a smooth arc x = x(θ) , x(0) = x_0 , such that k_i(x(θ)) = 0 , i = 1,2,...,s , for ||x(θ) − x_0|| < ε , and dx(0)/dθ = t .

Proof: Let P(x) = Kᵀ(KKᵀ)⁻¹K where ρ_i(K) = ∇k_i(x) , i = 1,2,...,s . Then x(θ) can be found by integrating the differential equation

    dx/dθ = (I − P(x))t                                       (3.3)

subject to the initial condition x(0) = x_0 . □
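The differential equation in the proof of Lemma 3.3 can be integrated directly. In this sketch (the single constraint k(x) = ||x||² − 1 and plain Euler integration are my choices) the projected field keeps x essentially on the constraint surface and starts out in the prescribed tangent direction t.

```python
import numpy as np

# dx/dtheta = (I - P(x)) t with P = K'(K K')^{-1} K, K = grad k(x):
# the flow stays on the level set k = 0 and begins in the direction t.
def grad_k(x):
    return 2.0 * x                         # gradient of k(x) = ||x||^2 - 1

def field(x, t):
    K = grad_k(x)[None, :]                 # 1 x n constraint Jacobian
    P = K.T @ np.linalg.inv(K @ K.T) @ K   # projector onto span{grad k}
    return (np.eye(2) - P) @ t

x0 = np.array([1.0, 0.0])
t = np.array([0.0, 1.0])                   # tangent: grad_k(x0) . t = 0

start_dir = field(x0, t)                   # equals t at theta = 0

h, x = 1e-3, x0.copy()
for _ in range(1000):                      # Euler steps out to theta = 1
    x = x + h * field(x, t)
drift = abs(x @ x - 1.0)                   # constraint violation stays small
```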

Remark: Let the k_i be as given in the statement of Lemma 3.3. Then the linear independence of the ∇k_i in a region containing x_0 is a consequence of the linear independence at x_0 . For consider the matrix KKᵀ . At x = x_0 this matrix is positive definite as K has rank s . Thus the smallest eigenvalue is positive. Clearly it is a continuous function of x , so that it remains positive in a small enough neighborhood of x_0 , and in this neighborhood the ∇k_i(x) are linearly independent.

Lemma 3.4: (Restraint condition A). S is Lagrange regular at x_0 if the set of vectors ∇g_i(x_0) , i ∈ B_0 , ∇h_i(x_0) , i ∈ I_2 , are linearly independent.

Proof: This is a consequence of Corollary 3.1 and Lemma 3.3. For let t ∈ G_0 ∩ H_0 , and let B(t) be the index set such that ∇g_i(x_0)t = 0 , i ∈ B(t) . Then by Lemma 3.3 a smooth arc can be constructed such that x = x(θ) , g_i(x(θ)) = 0 , i ∈ B(t) , h_i(x(θ)) = 0 , i ∈ I_2 , g_i(x(θ)) > 0 , i ∈ I_1 − B(t) , x(θ) ∈ N(x_0,δ) for some δ > 0 , and dx(0)/dθ = t . □

Lemma 3.5: (Restraint condition B). If ∇h_i(x_0) , i ∈ I_2 , are linearly independent, and if ∃ t̂ such that ∇g_i(x_0)t̂ > 0 , i ∈ B_0 , ∇h_i(x_0)t̂ = 0 , i ∈ I_2 , then S is Lagrange regular at x_0 .

Proof: Let w ∈ G_0 ∩ H_0 . Prescribe {ε_k} ↓ 0 and set w_k = w + ε_k t̂ . Then ∇g_i(x_0)w_k > 0 , i ∈ B_0 , ∇h_i(x_0)w_k = 0 , i ∈ I_2 . Now construct x_k = x_k(θ) such that x_k(0) = x_0 , dx_k(0)/dθ = w_k , h_i(x_k(θ)) = 0 , i ∈ I_2 , for x_k(θ) in some neighborhood of x_0 . By continuity there will be a subneighborhood (say N(x_0,δ_k) for some δ_k > 0 ) such that (i) ∇g_i(x_k(θ)) dx_k(θ)/dθ > 0 , i ∈ B_0 , and (ii) g_i(x_k(θ)) > 0 , i ∉ B_0 , for x_k(θ) ∈ N(x_0,δ_k) . The argument of Corollary 3.1 now gives w_k ∈ T(S,x_0) . But, by construction, {w_k} → w . Thus w ∈ T(S,x_0) as T is closed. □

4. Second order conditions.

In certain cases it is possible to further characterize local minima of f on S by looking at second derivative information.

Lemma 4.1: Let w(x) ∈ C² , w have a local minimum on S at x_0 , and ∇w(x_0) = 0 . Then tᵀ∇²w(x_0)t ≥ 0 ∀t ∈ T(S,x_0) . If tᵀ∇²w(x_0)t > 0 ∀t ∈ T(S,x_0) , t ≠ 0 , then ∃ δ > 0 , m > 0 such that

    w(x) ≥ w(x_0) + m||x − x_0||² , x ∈ N(x_0,δ) .

Proof: Let {x_n} , {λ_n} be defining sequences for t ∈ T(S,x_0) . Then for n large enough we have, as ∇w(x_0) = 0 ,

    0 ≤ w(x_n) − w(x_0) = ½(x_n − x_0)ᵀ∇²w(x_0)(x_n − x_0) + o(||x_n − x_0||²)

    ∴ 0 ≤ λ_n²(w(x_n) − w(x_0)) = ½ tᵀ∇²w(x_0)t + o(1) as x_n → x_0 .

Now assume tᵀ∇²w(x_0)t > 0 ∀t ∈ T(S,x_0) , t ≠ 0 , and that there is no m > 0 such that w(x) ≥ w(x_0) + m||x − x_0||² for x in any neighborhood of x_0 . This implies that for any integer q , ∃ x_q ∈ S such that (i) x_q ∈ N(x_0,1/q) , (ii) w(x_q) − w(x_0) < (1/q)||x_q − x_0||² . Select a subsequence of the x_q such that (x_q − x_0)/||x_q − x_0|| → t ∈ T(S,x_0) . Then (ii) ⇒ tᵀ∇²w(x_0)t ≤ 0 which gives a contradiction. □

Definition: The Lagrangian function associated with the MPP is given by

    ℒ(x,u,v) = f(x) − Σ_{i∈I_1} u_i g_i(x) − Σ_{i∈I_2} v_i h_i(x) .   (4.1)

It will frequently be convenient to suppress the dependence of ℒ on u and v in the case where these are implied by the Kuhn-Tucker conditions. In this case (3.2) becomes

    ∇ℒ(x_0) = 0 .                                             (4.2)

Lemma 4.2: Let S be Lagrange regular at x_0 , f(x) have a local minimum on S at x_0 , and S_1 = {x : x ∈ S , g_i(x) = 0 , i ∈ B_0} ; then

    tᵀ∇²ℒ(x_0)t ≥ 0 , t ∈ T(S_1,x_0) .                        (4.3)

Proof: Note that ℒ = f on S_1 so that ℒ has a minimum on S_1 at x_0 . Also, as S is Lagrange regular at x_0 , ∇ℒ(x_0) = 0 . Thus the result follows from Lemma 4.1. □

Remark: If S_1 is Lagrange regular at x_0 then tᵀ∇²ℒ(x_0)t ≥ 0 ∀t such that ∇g_i(x_0)t = 0 , i ∈ B_0 , and ∇h_i(x_0)t = 0 , i ∈ I_2 .
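The remark suggests a practical test: build a basis Z for {t : ∇g_i(x_0)t = 0, ∇h_i(x_0)t = 0} and check that Zᵀ∇²ℒ(x_0)Z is positive (semi)definite. A sketch on a problem of my own (min xᵀx subject to eᵀx = 1, minimizer x_0 = e/3, multiplier v = 2/3):

```python
import numpy as np

# Reduced-Hessian test: restrict the Hessian of the Lagrangian to the
# null space of the active constraint gradients.
grad_h = np.array([1.0, 1.0, 1.0])
hess_L = 2.0 * np.eye(3)                  # Hessian of L = x'x - v(e'x - 1)

_, _, Vt = np.linalg.svd(grad_h[None, :]) # rows 2,3 of Vt span the null space
Z = Vt[1:].T                              # columns: basis of {t : grad_h . t = 0}

reduced = Z.T @ hess_L @ Z                # 2 x 2 reduced Hessian
eigs = np.linalg.eigvalsh(reduced)        # all positive here
```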

Example: Consider

    g_1 = x_1² + (x_2 + 1)² − 1 ≥ 0 , g_2 = 1 − x_1² − (x_2 − 1)² ≥ 0 .

S is illustrated diagrammatically in Figure 4.1. At x_1 = x_2 = 0 ,

    ∇g_1 = ∇g_2 = (0,2) .

However S is Lagrange regular at the origin; for example, e_2 satisfies ∇g_1 e_2 > 0 , ∇g_2 e_2 > 0 , so that restraint condition B applies. In this case S_1 is the single point x = 0 so that T(S_1,0) is null.

[Figure 4.1]

Lemma 4.3: If tᵀ∇²ℒ(x_0)t > 0 ∀t ∈ T(S,x_0) , t ≠ 0 , such that ∇g_i(x_0)t = 0 ∀i ∈ B_0 such that u_i > 0 , then ∃ m , δ > 0 such that

    f(x) ≥ f(x_0) + m||x − x_0||² , x ∈ N(x_0,δ) .            (4.4)

Proof: Assume there are no m , δ > 0 such that (4.4) holds. Then for each integer q , ∃ x_q such that (i) x_q ∈ N(x_0,1/q) , (ii) f(x_q) − f(x_0) < (1/q)||x_q − x_0||² . Select a subsequence of the x_q such that (x_q − x_0)/||x_q − x_0|| → t ∈ T(S,x_0) . Set G = Σ_{i∈B_0} u_i g_i(x) . Then G ≥ 0 on S , G(x_0) = 0 , and f = ℒ + G . For the subsequence defining t we have

    [ℒ(x_q) − ℒ(x_0)]/||x_q − x_0||² + G(x_q)/||x_q − x_0||² < 1/q .   (4.5)

Thus

    ½ tᵀ∇²ℒ(x_0)t + lim sup_{q→∞} G(x_q)/||x_q − x_0||² ≤ 0 .   (4.6)

As G(x_q) ≥ 0 , the second term is bounded and nonnegative. Therefore

    lim_{q→∞} G(x_q)/||x_q − x_0|| = Σ_{i∈B_0} u_i ∇g_i(x_0)t = 0 .   (4.7)

Thus

    ∇g_i(x_0)t = 0 , ∀i ∈ B_0 such that u_i > 0 ,             (4.8)

so that (4.6) states that ∃ t ∈ T(S,x_0) such that t satisfies (4.8) and tᵀ∇²ℒ(x_0)t ≤ 0 . This gives a contradiction. □


)l|.!i'.ii..niPMiviniiiii,iw,w>.tili,i.lwl,.wwwl|, ————, mmmmmmmmmmmmmmmmmmmmmmmmmmmmimimmmmmmmmHt&IBK^^

Consider now the system

    ∇ℒ(x_0)^T = 0 ,
    u_i g_i(x_0) = 0 ,  i = 1,2,...,m ,
    h_i(x_0) = 0 ,  i = 1,2,...,p ,    (4.9)

where explicit enumerations of I_1 and I_2 are assumed.

Definition: J(x_0) is the Jacobian of the system (4.9) with respect to (x, u, v):

              [ ∇²ℒ(x_0)       −∇g_1(x_0)^T ⋯ −∇g_m(x_0)^T    −∇h_1(x_0)^T ⋯ −∇h_p(x_0)^T ]
              [ u_1 ∇g_1(x_0)                                                             ]
              [      ⋮          diag(g_1(x_0), ..., g_m(x_0))              0              ]
    J(x_0) =  [ u_m ∇g_m(x_0)                                                             ]    (4.10)
              [ ∇h_1(x_0)                                                                 ]
              [      ⋮                        0                            0              ]
              [ ∇h_p(x_0)                                                                 ]

Lemma 4.4: If J(x_0) is nonsingular, then x_0 is an isolated local minimum of f on S.

Remark: Note that the condition J(x_0) nonsingular imposes strong conditions on the problem. For example,

(i) the active constraint gradients must be linearly independent, and

(ii) if g_i(x_0) = 0 then u_i > 0 (this condition is called strict complementarity).

In particular S is Lagrange regular at x_0.

Proof: If J is singular there is a vector (y, a, b) satisfying

    J(x_0) (y, a, b)^T = 0 .    (4.11)

This relation gives

(i) ∇h_i(x_0) y = 0 ,  i = 1,2,...,p ,

(ii) u_i ∇g_i(x_0) y + a_i g_i(x_0) = 0 ,  i = 1,2,...,m , and

(iii) ∇²ℒ(x_0) y − Σ_{i=1}^{m} a_i ∇g_i(x_0)^T − Σ_{i=1}^{p} b_i ∇h_i(x_0)^T = 0 .

From (ii) we see that u_i > 0 ⇒ ∇g_i(x_0) y = 0, while u_i = 0 ⇒ a_i = 0.

Now consider the problem

    min_y  y^T ∇²ℒ(x_0) y

subject to ∇g_i(x_0) y = 0, i ∈ B_0, ∇h_i(x_0) y = 0, i ∈ I_2, and ‖y‖² = 1.

Clearly the constraint gradients are linearly independent, as 2y = ∇(‖y‖²)^T is in the orthogonal complement of the set spanned by the other constraint gradients. Thus the set of feasible y is Lagrange regular at every point by restraint condition A. Let y_0 minimize the objective function (the minimum exists as the constraint set is compact); then the Lagrange regularity ensures that there exist multipliers λ, a_i, i ∈ B_0, and b_i, i ∈ I_2, such that

    ∇²ℒ(x_0) y_0 − λ y_0 − Σ_{i∈B_0} a_i ∇g_i(x_0)^T − Σ_{i∈I_2} b_i ∇h_i(x_0)^T = 0 ,    (4.12)

whence

    λ = y_0^T ∇²ℒ(x_0) y_0 = min_y y^T ∇²ℒ(x_0) y .

Now if λ = 0, (4.12) shows that conditions (i) - (iii) above are satisfied, and hence J(x_0) is singular. Thus if J(x_0) is nonsingular, then λ > 0. In this case Lemma 4.3 shows that the minimum of the MPP is isolated. □

5. Convex programming problems.

If the g_i(x) are concave, i ∈ I_1, then the set S = {x ; g_i(x) ≥ 0, i ∈ I_1} is convex. The problem of minimizing a convex function on S is called a convex programming problem. In this section certain properties of this problem are studied. We require the following characterization of convex functions.

Lemma 5.1: If f(x) ∈ C¹ then f(x) is convex on S iff

    f(x) + ∇f(x)(y − x) ≤ f(y) ,  x, y ∈ S .    (5.1)

Proof: If f is convex then, for 0 ≤ λ ≤ 1,

    f(x + (1−λ)(y−x)) ≤ f(x) + (1−λ)(f(y) − f(x)) ,

whence, if λ < 1,

    [f(x + (1−λ)(y−x)) − f(x)]/(1−λ) ≤ f(y) − f(x) .

The necessity follows on letting λ → 1. Now if (5.1) holds then

    f(λx + (1−λ)y) + λ ∇f(λx + (1−λ)y)(y − x) ≤ f(y) ,    (5.2)

    f(λx + (1−λ)y) − (1−λ) ∇f(λx + (1−λ)y)(y − x) ≤ f(x) .    (5.3)

Multiplying (5.2) by (1−λ), (5.3) by λ, and adding gives (1.4), which demonstrates sufficiency. □

Lemma 5.2: If S = {x ; g_i(x) ≥ 0, g_i concave, i ∈ I_1} has an interior point x*, then every point of S is Lagrange regular.

Proof: Consider x_0 ∈ S. Let i ∈ B_0; then Lemma 5.1 gives

    ∇g_i(x_0)(x* − x_0) ≥ g_i(x*) > 0    (5.4)

as g_i(x_0) = 0, i ∈ B_0. Thus restraint condition B is satisfied. □

Lemma 5.3: If f convex satisfies the Kuhn Tucker conditions at x_0 ∈ S, then f has a minimum on S at x_0.

Proof: In this case (3.2) gives

    ∇f(x_0) = Σ_{i∈I_1} u_i ∇g_i(x_0) ,  u_i g_i(x_0) = 0 ,  u_i ≥ 0 .

Let x be any other point of S; then

    f(x) ≥ f(x) − Σ_{i∈I_1} u_i g_i(x) = ℒ(x) ,    (5.5)

where ℒ(x) is convex on S as the g_i(x), i ∈ I_1, are concave. Thus

    f(x) ≥ ℒ(x_0) + ∇ℒ(x_0)(x − x_0)
         = f(x_0) . □

Remark: If S has an interior, the Kuhn Tucker conditions are both necessary and sufficient for a minimum of the convex programming problem.

Definition: The primal function for the convex programming problem is

    ω(z) = inf_{x∈S_z} f(x) ,  S_z = {x ; g(x) ≥ z} .    (5.6)

Note that if z_1 ≥ z_2 then S_{z_1} ⊆ S_{z_2}, so that ω(z_1) ≥ ω(z_2), and that if S has an interior then S_z is nonempty for z ≥ 0 and small enough.

Lemma 5.4: ω(z) is convex.

Proof: If x_1 ∈ S_{z_1} and x_2 ∈ S_{z_2} then, by concavity of the g_i, i ∈ I_1,

    g(λx_1 + (1−λ)x_2) ≥ λz_1 + (1−λ)z_2 ,  0 ≤ λ ≤ 1 .

Thus λx_1 + (1−λ)x_2 ∈ S_{λz_1+(1−λ)z_2}. We have

    ω(λz_1 + (1−λ)z_2) ≤ inf_{x_1∈S_{z_1}, x_2∈S_{z_2}} f(λx_1 + (1−λ)x_2)

    ≤ inf_{x_1∈S_{z_1}, x_2∈S_{z_2}} (λ f(x_1) + (1−λ) f(x_2))   by convexity

    = λ inf_{x_1∈S_{z_1}} f(x_1) + (1−λ) inf_{x_2∈S_{z_2}} f(x_2)

    = λ ω(z_1) + (1−λ) ω(z_2) . □

Definition: The dual function is

    φ(z*) = inf_{x∈Ω} {f(x) − g^T(x) z*} ,  z* ≥ 0 ,    (5.7)

where Ω is the region on which f and the −g_i, i ∈ I_1, are convex.

Lemma 5.5: φ(z*) is concave.

Proof: Let 0 ≤ λ ≤ 1 and z_1*, z_2* ≥ 0; then

    φ(λz_1* + (1−λ)z_2*) = inf_x {f(x) − g^T(x)(λz_1* + (1−λ)z_2*)}

    = inf_x {λ(f − g^T z_1*) + (1−λ)(f − g^T z_2*)}

    ≥ λ inf_x (f − g^T z_1*) + (1−λ) inf_x (f − g^T z_2*)

    = λ φ(z_1*) + (1−λ) φ(z_2*) . □

Lemma 5.6: Let Γ = {z ; ∃ x ∈ Ω such that g(x) ≥ z}. Then

    φ(z*) = inf_{z∈Γ} (ω(z) − z^T z*) .    (5.8)

Proof: For any z ∈ Γ,

    φ(z*) = inf_{x∈Ω} (f(x) − g(x)^T z*)
          ≤ inf_{x∈S_z} (f(x) − z^T z*)      (using g(x) ≥ z, z* ≥ 0)
          = ω(z) − z^T z* ,

whence

    φ(z*) ≤ inf_{z∈Γ} (ω(z) − z^T z*) .    (5.9)

Now let g(x_1) = z_1. Then

    f(x_1) − g^T(x_1) z* ≥ ω(z_1) − z_1^T z* ≥ inf_{z∈Γ} (ω(z) − z^T z*) ,

whence

    inf_{x_1∈Ω} (f(x_1) − g^T(x_1) z*) ≥ inf_{z∈Γ} (ω(z) − z^T z*) .    (5.10)

The result follows from the inequalities (5.9) and (5.10). □

Theorem 5.1: (Duality theorem). (i) sup_{z*≥0} φ(z*) ≤ inf_{x∈S} f(x).

(ii) If S has an interior, and there exists x_0 such that the Kuhn Tucker conditions are satisfied, then there exists z_0* maximizing φ(z*) and equality holds in (i).

Proof: From (5.8) we have that

    φ(z*) ≤ ω(0) = inf_{x∈S} f(x)

holds for each z* ≥ 0. Thus

    sup_{z*≥0} φ(z*) ≤ inf_{x∈S} f(x) .    (5.11)

If there exists x_0 such that the Kuhn Tucker conditions are satisfied then x_0 minimizes f on S. Defining z_0* = (u_1, ..., u_m)^T, where the u_i ≥ 0 are the multipliers in the Kuhn Tucker conditions, we see that φ(z_0*) = f(x_0). □

Corollary 5.1: (Wolfe's form of the duality theorem). Consider the primal problem: minimize the convex function f(x) subject to the concave constraints g_i(x) ≥ 0, i = 1,2,...,m; and the dual problem: maximize ℒ(x,u) subject to ∇_x ℒ = 0, u ≥ 0. If a solution to the primal exists then the dual problem has a solution and the objective function values are equal.

Remark: (i) The linear programming problem

    min a^T x subject to Ax − b ≥ 0    (5.12)

is a special case of a convex programming problem, as linear functions have the special property of being both convex and concave — this is an immediate consequence of Lemma 5.1. This property of linear constraints permits the previous discussion to be extended to allow linear equality constraints. Note that if the linear equality constraints are not to be contradictory, then their gradients must be linearly independent.

(ii) If restraint condition B is satisfied at x_0, and f(x) has a minimum on S at x_0, then x_0 also solves the linear programming problem

    min f(x_0) + ∇f(x_0)(x − x_0)

subject to

    (i) g_i(x_0) + ∇g_i(x_0)(x − x_0) ≥ 0 ,  i ∈ I_1 , and

    (ii) h_i(x_0) + ∇h_i(x_0)(x − x_0) ≥ 0 ,
         −h_i(x_0) − ∇h_i(x_0)(x − x_0) ≥ 0 ,  i ∈ I_2 ,

as the Kuhn Tucker conditions are both necessary and sufficient for a solution to the linear programming problem. That the converse need not be true is readily seen from the example min −x subject to 1 − x² − y² ≥ 0, which has a minimum at x = 1, y = 0. The associated linear programming problem is min −x subject to 1 − x ≥ 0, which has the solution (1, y) for any y. Thus additional conditions are required if the converse is to hold (for example, Lemmas 4.3 or 4.4 could be used).

Example: (i) (Duality in linear programming). Consider the primal problem:

    minimize a^T x subject to Ax − b ≥ 0 .

The corresponding dual is:

    maximize b^T u subject to A^T u − a = 0 , u ≥ 0 .

If the primal has a solution then so does the dual, and the objective function values are equal.
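This primal-dual pair can be checked numerically. A minimal sketch using scipy.optimize.linprog; the data (a, A, b) are illustrative, and since linprog minimizes, the dual is posed as min −b^T u:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data (not from the notes): min a^T x subject to Ax - b >= 0.
a = np.array([1.0, 1.0])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])

# Primal: linprog takes A_ub x <= b_ub, so Ax >= b becomes -Ax <= -b.
primal = linprog(a, A_ub=-A, b_ub=-b, bounds=[(None, None)] * 2)

# Dual: maximize b^T u subject to A^T u - a = 0, u >= 0
# (linprog minimizes, so the cost is -b).
dual = linprog(-b, A_eq=A.T, b_eq=a, bounds=[(0.0, None)] * 3)

assert primal.success and dual.success
assert abs(primal.fun + dual.fun) < 1e-8   # a^T x_0 = b^T u_0
```

Both solves report the same optimal value, as the duality theorem asserts.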

(ii) (The cutting plane algorithm).

(a) Consider the set S = {x ; g_i(x) ≥ 0, g_i concave, i ∈ I_1}. If x* ∉ S then g_i(x*) < 0 for at least one i. Let α satisfy g_α(x*) ≤ g_i(x*), i ∈ I_1. Consider the half space

    U = {x ; g_α(x*) + ∇g_α(x*)(x − x*) ≥ 0} .

Then x* ∉ U. Now if g_α(x) ≥ 0 then, as g_α is concave,

    g_α(x*) + ∇g_α(x*)(x − x*) ≥ g_α(x) ≥ 0 .

Thus g_α(x) ≥ 0 ⇒ x ∈ U, so that

    S_α = {x ; g_α(x) ≥ 0} ⊆ U .

We have S ⊆ S_α ⊆ U. Thus the hyperplane g_α(x*) + ∇g_α(x*)(x − x*) = 0 separates x* and S.

(b) The convex programming problem minimize f(x) subject to x ∈ S is equivalent to the problem minimize x_{n+1} subject to x ∈ S, x_{n+1} − f(x) ≥ 0, where x_{n+1} is a new independent variable (note that the new constraint is concave). This equivalence follows from the Kuhn Tucker conditions by noting that the new constraint must be active. Thus a convex programming problem can be replaced by the problem of minimizing a linear objective function subject to an enlarged constraint set.

(c) Consider the problem of minimizing c^T x subject to x ∈ S with S bounded. In particular we assume that S ⊆ R_0 = {x ; Ax − b ≥ 0}. We can now state the cutting plane algorithm:

(0) i = 0 .
(i) Let x_i minimize c^T x subject to x ∈ R_i .
(ii) Determine α such that g_α(x_i) ≤ g_j(x_i) , j ∈ I_1 .
(iii) If g_α(x_i) ≥ 0 go to (v).
(iv) Set R_{i+1} = R_i ∩ {x ; g_α(x_i) + ∇g_α(x_i)(x − x_i) ≥ 0} , i := i+1 , go to (i).
(v) Stop.

Note that step (i) requires the solution of a linear programming problem.
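Steps (0)-(v) can be sketched directly, with the LP of step (i) solved by scipy.optimize.linprog. The constraint set (a single concave constraint describing the unit disk) and the box R_0 are illustrative assumptions, not data from the notes:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: S is the unit disk, given by the single concave
# constraint g(x) = 1 - x^T x >= 0, and R_0 is the box |x_j| <= 2.
def g(x):
    return 1.0 - x @ x

def grad_g(x):
    return -2.0 * x

c = np.array([-1.0, -1.0])        # minimize c^T x over S
cuts_A, cuts_b = [], []           # accumulated cuts, stored as rows of A x <= b
x = None
for _ in range(100):
    res = linprog(c,
                  A_ub=np.array(cuts_A) if cuts_A else None,
                  b_ub=np.array(cuts_b) if cuts_b else None,
                  bounds=[(-2.0, 2.0)] * 2)      # step (i): LP over R_i
    x = res.x
    if g(x) >= -1e-8:             # step (iii): x_i (essentially) feasible
        break
    # step (iv): append the cut g(x_i) + grad g(x_i)(x - x_i) >= 0,
    # rewritten as  -grad g(x_i) . x <= g(x_i) - grad g(x_i) . x_i
    cuts_A.append(-grad_g(x))
    cuts_b.append(g(x) - grad_g(x) @ x)

# The true minimum of -x_1 - x_2 over the disk is -sqrt(2), at (1,1)/sqrt(2).
assert abs(c @ x + np.sqrt(2)) < 1e-3
```

With only one constraint, step (ii) is trivial; with several g_i the most violated one would be selected before cutting.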

(d) The cutting plane algorithm generates a sequence of points x_i with the property that

    c^T x_0 ≤ c^T x_1 ≤ ... ≤ c^T x_i ≤ ... ≤ min_{x∈S} c^T x ,

as R_0 ⊇ R_1 ⊇ ... ⊇ S. Thus the sequence {c^T x_i} is increasing and bounded above and therefore convergent. Let x* be a limit point of the {x_i}. Then x* ∈ S and therefore solves the convex programming problem. To prove this, assume x* ∉ S. Then

    min_i g_i(x*) = g_γ(x*) = −μ < 0 .

Let a subsequence {x_i} → x*; then, by continuity, there exists k in the subsequence such that

    (i) ‖x_k − x*‖ < μ/(4C) , and (ii) g_γ(x_k) < −μ/2 ,

where C ≥ ‖∇g_i(x)‖, x ∈ R_0, i ∈ I_1. Let β be defined by

    min_i g_i(x_k) = g_β(x_k) ,

so that g_β(x_k) ≤ g_γ(x_k) < −μ/2. Now x* is a limit point of the {x_i}, and x_i ∈ R_i ⊆ R_{k+1} for i ≥ k+1; in particular, x* ∈ R_{k+1}, whence

    g_β(x_k) + ∇g_β(x_k)(x* − x_k) ≥ 0 .

But

    |∇g_β(x_k)(x* − x_k)| ≤ C‖x* − x_k‖ < μ/4 ,

so that

    g_β(x_k) + ∇g_β(x_k)(x* − x_k) < −μ/2 + μ/4 < 0 ,

which gives a contradiction.

Notes

1. For properties of tangent cones, see Hestenes. Luenberger discusses polar cones (which he calls negative conjugate cones) on pp. 157-159. Lemma 1.4 is due to Gould and Tolle. The proof is due to Nashed et al.

2. Hestenes is a good general reference for this section and includes a proof that a finitely generated cone is closed. The proof given here of Farkas' Lemma is standard (see for example Vajda's paper). An extensive list of theorems of the alternative is given in Mangasarian.

3. The main result is due to Gould and Tolle. The treatment of the other restraint conditions follows Fiacco and McCormick.

4. The treatment of second order conditions is based on Hestenes. Similar material is given in Fiacco and McCormick.

5. The treatment of duality is based on Luenberger. A related treatment is given by Whittle, who is good value on applications. Vajda is a good reference for the mathematical programming application. Wolfe's papers in both the Abadie books discuss various aspects of the cutting plane method.

References

J. Abadie (Editor) (1967): Nonlinear Programming; (1970): Integer and Nonlinear Programming. North Holland.

M. S. Bazaraa, J. J. Goode, M. Z. Nashed, and C. M. Shetty (1971): Nonlinear programming without differentiability in Banach spaces: necessary and sufficient constraint qualification. To appear in Applicable Analysis.

A. V. Fiacco and G. P. McCormick (1968): Nonlinear Programming and Sequential Unconstrained Minimization Techniques. Wiley.

F. J. Gould and J. W. Tolle (1971): A necessary and sufficient constraint qualification for constrained optimization. SIAM J. Applied Math., 20, pp. 164-172.

M. R. Hestenes (1966): Calculus of Variations and Optimal Control Theory. Wiley.

O. L. Mangasarian (1969): Nonlinear Programming. McGraw-Hill.

Peter Whittle (1971): Optimization under Constraints. Wiley.

II. Descent Methods for Unconstrained Minimization


1. General properties of descent methods.

The class of descent methods for minimizing an unconstrained function F(x) solves the problem iteratively by means of a sequence of one dimensional minimizations. The main idea is illustrated in Figure 1.1. At the current point x_i a direction t_i is provided, and the closest minimum to x_i of the function

    G_i(λ) = F(x_i + λ t_i)

is sought. At x_{i+1} we have

    G_i'(λ_i) = ∇F(x_{i+1}) t_i = 0 ,    (1.1)

where x_{i+1} = x_i + λ_i t_i.

Figure 1.1

Definition: A step in which x_{i+1} is determined by satisfying the above conditions is said to satisfy the descent condition. We consider t_i a profitable search direction if F(x_i + λ t_i) decreases initially as λ increases from zero. This condition is formalized as follows.

Definition: (i) The vector t is downhill for minimizing F at x if ∇F(x) t < 0. (ii) The sequence of unit vectors {t_i} is downhill for minimizing F at the sequence of points {x_i} if there exists δ > 0, independent of i, such that ∇F(x_i) t_i ≤ −δ‖∇F(x_i)‖.

Example: The sequence of vectors t_i = −∇F(x_i)^T / ‖∇F(x_i)‖ satisfies the downhill condition with δ = 1. In this case we say that t_i is in the direction of steepest descent.

An estimate of the value of λ minimizing G_i is readily given. We have

    0 = ∇F(x_{i+1}) t_i = ∇F(x_i) t_i + λ_i t_i^T ∇²F(x̄_i) t_i ,

where x̄_i = x_i + θ λ_i t_i is an appropriate mean value. Thus, if {t_i} is downhill and t^T ∇²F(x) t ≤ K‖t‖²,

    λ_i = −∇F(x_i) t_i / (t_i^T ∇²F(x̄_i) t_i) ≥ δ‖∇F(x_i)‖ / K .    (1.2)

Theorem 1.1: (Ostrowski's descent theorem). Let R = {x ; F(x) ≤ F(x_1)}, and assume that F is bounded below and t^T ∇²F(x) t ≤ K‖t‖², x ∈ R. Define

    x_{i+1} = x_i + (δ‖∇F(x_i)‖/K) t_i ,  where ∇F(x_i) t_i ≤ −δ‖∇F(x_i)‖ ,  ‖t_i‖ = 1 ,

for i = 1,2,..., where δ > 0. Then {F(x_i)} converges, and the limit points of {x_i} are stationary values of F.

Proof: As {t_i} is downhill, {x_i} ⊆ R. Expanding by the mean value theorem we obtain

    F(x_{i+1}) = F(x_i) + (δ‖∇F(x_i)‖/K) ∇F(x_i) t_i + (1/2)(δ‖∇F(x_i)‖/K)² t_i^T ∇²F(x̄_i) t_i ,

where x̄_i is a mean value. We have

    F(x_{i+1}) ≤ F(x_i) − δ²‖∇F(x_i)‖²/K + δ²‖∇F(x_i)‖²/(2K)
              = F(x_i) − δ²‖∇F(x_i)‖²/(2K)
              ≤ F(x_i) .    (1.3)

Thus the sequence {F(x_i)} is decreasing and bounded below and therefore convergent. Further, from (1.3),

    ‖∇F(x_i)‖² ≤ (2K/δ²)(F(x_i) − F(x_{i+1}))    (1.4)
               → 0 ,  i → ∞ .

Thus ∇F(x*) = 0 if x* is a limit point of {x_i}. □

Remark: By (1.2) the step taken in the direction t_i underestimates the step to the minimum of G_i. Thus (1.3) holds if the descent condition is satisfied, so that the conclusions of the theorem are valid also in this case.
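The iteration of Theorem 1.1 can be sketched numerically. A minimal example on a two-variable quadratic; the matrix C and the starting point are illustrative assumptions:

```python
import numpy as np

# Fixed-step descent of Theorem 1.1, with steepest descent (delta = 1),
# on F(x) = 0.5 x^T C x; the data are illustrative.
C = np.array([[3.0, 1.0],
              [1.0, 2.0]])
K = np.linalg.eigvalsh(C).max()      # t^T grad^2 F t <= K ||t||^2 everywhere
F = lambda x: 0.5 * x @ C @ x
grad = lambda x: C @ x

x = np.array([4.0, -3.0])
values = [F(x)]
for _ in range(500):
    gnorm = np.linalg.norm(grad(x))
    if gnorm < 1e-10:
        break
    t = -grad(x) / gnorm             # unit steepest descent direction
    x = x + (gnorm / K) * t          # step length delta ||grad F|| / K
    values.append(F(x))

# F(x_i) decreases monotonically and the gradient tends to zero.
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))
assert np.linalg.norm(grad(x)) < 1e-6
```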

Theorem 1.2: (Goldstein's descent theorem). Let R = {x ; F(x) ≤ F(x_1)} be bounded, and assume F ∈ C¹ and bounded below on R. Define

    Δ(x_i, λ) = F(x_i) − F(x_i + λ t_i) ,

    ψ(x_i, λ) = Δ(x_i, λ) / (−λ ∇F(x_i) t_i) ,

where {t_i} is downhill, and the {x_i} are generated by the algorithm:

(i) x_{i+1} = x_i if Δ(x_i, λ) = 0 for all λ ≥ 0 .

(ii) If ψ(x_i, 1) < α, where 0 < α < 1/2, then choose λ_i such that α ≤ ψ(x_i, λ_i) ≤ 1 − α; else choose λ_i = 1 .

(iii) x_{i+1} = x_i + λ_i t_i .

Then the limit points of {x_i} are stationary points of F.

Proof: Δ(x_i, λ) = −λ ∇F(x_i) t_i + o(λ). Thus Δ(x_i, λ) ≡ 0 ⇒ ‖∇F(x_i)‖ = 0, as {t_i} is downhill, so that x_i is a stationary point. Otherwise ∇F(x_i) t_i < 0, so that ψ(x_i, λ) = 1 + o(1) as λ → 0, whence ψ(x_i, 0) = 1. Also the boundedness of R implies that Δ(x_i, λ) < 0 for some λ large enough, so that, as ψ(x_i, λ) is continuous, λ_i can be found to satisfy condition (ii) of the algorithm. Note that {x_i} ⊆ R. We have

    F(x_i) − F(x_{i+1}) = −λ_i ψ(x_i, λ_i) ∇F(x_i) t_i
                        ≥ λ_i α δ ‖∇F(x_i)‖ .    (1.5)

Thus {F(x_i)} is decreasing and bounded below and therefore convergent. To show that the limit points of {x_i} are stationary points of F, consider a subsequence {x_{μ_i}} → x* and assume ‖∇F(x*)‖ > ε > 0. Then ‖∇F(x_{μ_i})‖ > ε for i ≥ i_0. This implies that inf_i λ_{μ_i} = λ_0 > 0, as otherwise sup_i ψ(x_{μ_i}, λ_{μ_i}) = 1, contradicting ψ(x_i, λ_i) ≤ 1 − α. Thus, from (1.5),

    ‖∇F(x_{μ_i})‖ ≤ (F(x_{μ_i}) − F(x_{μ_i + 1})) / (λ_0 α δ) .    (1.6)

The right hand side → 0 as i → ∞, which establishes a contradiction. □

Remark: There are two aspects of this theorem which are of particular interest. (i) It is necessary to assume only that F ∈ C¹ in R. However, the boundedness of R is used explicitly. (ii) The algorithm for determining the step length λ_i is readily implemented. A value of λ satisfying condition (ii) of the algorithm will be said to satisfy the Goldstein condition.
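Condition (ii) asks only for a λ inside a band, which a simple bracketing/bisection finds, since ψ is near 1 for small λ and falls below α for overlong steps. A minimal sketch; the objective F and starting point are illustrative assumptions:

```python
import numpy as np

# A sketch of a step-length routine enforcing condition (ii):
# bisect until alpha <= psi(x, lam) <= 1 - alpha.
def F(x):
    return (x[0] - 1.0) ** 4 + (x[0] - 1.0) ** 2 + x[1] ** 2

def gradF(x):
    return np.array([4.0 * (x[0] - 1.0) ** 3 + 2.0 * (x[0] - 1.0),
                     2.0 * x[1]])

def goldstein_step(x, t, alpha=0.25):
    slope = gradF(x) @ t                 # < 0 since t is downhill
    psi = lambda lam: (F(x) - F(x + lam * t)) / (-lam * slope)
    lam, lo, hi = 1.0, 0.0, 1.0          # psi(0+) = 1 > 1 - alpha
    if psi(lam) >= alpha:
        return lam                       # condition (ii): take lam = 1
    while True:
        p = psi(lam)
        if alpha <= p <= 1.0 - alpha:
            return lam
        if p < alpha:
            hi = lam                     # step too long
        else:
            lo = lam                     # step too short
        lam = 0.5 * (lo + hi)

x = np.array([3.0, 2.0])
t = -gradF(x)                            # downhill, but overshoots at lam = 1
lam = goldstein_step(x, t)
psi_val = (F(x) - F(x + lam * t)) / (-lam * (gradF(x) @ t))
assert 0.25 <= psi_val <= 0.75 and F(x + lam * t) < F(x)
```

Continuity of ψ guarantees the bracket always contains an acceptable λ, which is the content of the existence argument in the proof above.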

Theorem 1.3: (i) Let the vector sequence in the Goldstein algorithm be defined by

    s_i = −A_i ∇F(x_i)^T ,  t_i = s_i/‖s_i‖ ,    (1.7)

where A_i is positive definite, bounded, and λ(A_i) = λ_min(A_i)/λ_max(A_i) ≥ ω > 0, i = 1,2,... . Then {t_i} is downhill with constant δ = ω.

(ii) Assume that {x_i} → x*, and that ‖A_i^{-1} − ∇²F(x_i)‖ = o(1); then λ_i = ‖s_i‖ satisfies the Goldstein condition for i large enough.

(iii) The ultimate rate of convergence of the algorithm is superlinear for this choice of λ_i.

Proof: (i)

    ∇F(x_i) t_i = −∇F(x_i) A_i ∇F(x_i)^T / ‖A_i ∇F(x_i)^T‖
                ≤ −λ_min(A_i) ‖∇F(x_i)‖² / (λ_max(A_i) ‖∇F(x_i)‖)
                ≤ −ω ‖∇F(x_i)‖ .    (1.8)

(ii) With λ_i = ‖s_i‖ (so that λ_i t_i = s_i),

    ψ(x_i, λ_i) = 1 − s_i^T ∇²F(x̃_i) s_i / (−2 ∇F(x_i) s_i) ,

where x̃_i is a mean value dependent on λ. Now, writing

    ∇²F(x̃_i) = A_i^{-1} + E_i

and noting that ‖E_i‖ → 0 as i → ∞, we have

    ψ(x_i, λ_i) = 1/2 − s_i^T E_i s_i / (2 ∇F(x_i) A_i ∇F(x_i)^T) ,

so that

    |ψ(x_i, λ_i) − 1/2| ≤ ‖E_i‖ ‖A_i‖ / (2ω) .    (1.9)

In particular

    |ψ(x_i, λ_i) − 1/2| → 0 ,  i → ∞ ,

so that λ_i = ‖s_i‖ eventually satisfies α ≤ ψ(x_i, λ_i) ≤ 1 − α.

(iii) Another application of the mean value theorem gives

    s_i = −A_i ∇F(x_i)^T = −A_i (∇F(x_i) − ∇F(x*))^T
        = −A_i (∇²F(x̄_i)(x_i − x*) + o(‖x_i − x*‖))
        = −(x_i − x*) + o(‖x_i − x*‖) .    (1.10)

Thus, writing γ_i = λ_i/‖s_i‖,

    x_{i+1} − x* = x_i + γ_i s_i − x*
                 = (1 − γ_i)(x_i − x*) + o(‖x_i − x*‖) .    (1.11)

From (1.11) the choice γ_i = 1 (λ_i = ‖s_i‖) gives superlinear convergence. □

Remark: Theorem 1.3 shows that if ∇²F is positive definite in the neighbourhood of an unconstrained minimum, then it is possible to have algorithms with superlinear convergence without the necessity of satisfying the descent condition. It is not generally considered economic to compute the second partial derivatives of F, and considerable emphasis has been placed on developing approximations to the inverse Hessian using only first derivative information. Although the steepest descent direction is initially the direction of most rapid decrease of the function, it gives in general only linear convergence.

2. Methods based on conjugate directions.

The problem of minimizing a positive definite quadratic form is an important special case of the general unconstrained optimization problem. In particular it is frequently used as a model problem for the development of new algorithms. It is argued that in a neighborhood of the minimum, a general function having a positive definite Hessian at the minimum will be well represented by a quadratic form, so that methods which work well in this particular case should work well in general.

Let F be given by

    F(x) = a + b^T x + (1/2) x^T C x ,    (2.1)

where C is a positive definite, necessarily symmetric matrix. We have

    ∇F(x) = b^T + x^T C .    (2.2)

Consider now a descent step from x_i in the direction t_i. The descent condition gives

    0 = ∇F(x_{i+1}) t_i = t_i^T (C(x_i + λ_i t_i) + b) ,

whence

    λ_i = −t_i^T g_i / (t_i^T C t_i) ,    (2.3)

where g_i = ∇F(x_i)^T. To calculate the change in the value of F in a descent step we have

    F(x_i + λ_i t_i) = F(x_i) + λ_i t_i^T g_i + (1/2) λ_i² t_i^T C t_i
                     = F(x_i) − (1/2) (g_i^T t_i)² / (t_i^T C t_i) .    (2.4)

Example: (Linear convergence of the method of steepest descent).

Let F = (1/2) x^T C x. Then, taking t_i along g_i = Cx_i, (2.4) gives

    F(x_{i+1}) − F(x_i) = −(1/2) (w_i^T w_i)² / (w_i^T C w_i) ,

where w_i = Cx_i. We have F(x_i) = (1/2) w_i^T C^{-1} w_i, so that

    F(x_{i+1})/F(x_i) = 1 − (w_i^T w_i)² / ((w_i^T C w_i)(w_i^T C^{-1} w_i)) .

The Kantorovich inequality gives

    (w^T w)² / ((w^T C^{-1} w)(w^T C w)) ≥ 4σ_1 σ_n / (σ_1 + σ_n)² ,

where σ_1 and σ_n are the smallest and largest eigenvalues of C respectively, whence

    F(x_{i+1}) ≤ ((σ_n − σ_1)/(σ_n + σ_1))² F(x_i) ,

which shows that the rate of convergence of steepest descent is at least linear.

To show that it is exactly linear, consider the particular case in which

    x_i = α_1^i v_1 + α_n^i v_n ,

where v_1 and v_n are the normalized eigenvectors associated with σ_1 and σ_n respectively. We have

    x_{i+1} = α_1^{i+1} v_1 + α_n^{i+1} v_n = (1 − λ_i σ_1) α_1^i v_1 + (1 − λ_i σ_n) α_n^i v_n ,

with (from (2.3))

    λ_i = (σ_1² (α_1^i)² + σ_n² (α_n^i)²) / (σ_1³ (α_1^i)² + σ_n³ (α_n^i)²) .

In particular

    α_1^{i+1}/α_n^{i+1} = [(1 − λ_i σ_1)/(1 − λ_i σ_n)] (α_1^i/α_n^i) ,

and, as λ_i depends on x_i only through the ratio α_1^i/α_n^i, the ratios α_1^i/α_n^i assume just two values for all i (depending on i even or odd). Now

    F(x_i) = (1/2)(σ_1 (α_1^i)² + σ_n (α_n^i)²) ,

so that the reduction factor F(x_{i+1})/F(x_i) also assumes just two values, and

    F(x_{i+1}) ≥ γ F(x_i) ,

where γ is the smaller of the two values, 0 < γ < 1, and γ is independent of i. This inequality shows that the rate of convergence of steepest descent is linear.
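The example can be checked numerically. A short sketch with an illustrative diagonal C, verifying the Kantorovich bound and that the reduction ratio settles to a constant:

```python
import numpy as np

# Steepest descent with exact line search on F = 0.5 x^T C x (illustrative
# data): the per-step reduction ratio obeys the Kantorovich bound and stays
# constant, i.e. convergence is linear, not superlinear.
C = np.diag([1.0, 10.0])                  # sigma_1 = 1, sigma_n = 10
bound = ((10.0 - 1.0) / (10.0 + 1.0)) ** 2
F = lambda x: 0.5 * x @ C @ x

x = np.array([10.0, 1.0])                 # weight on both extreme eigenvectors
ratios = []
for _ in range(25):
    g = C @ x
    lam = (g @ g) / (g @ C @ g)           # minimizing step, from (2.3)
    x_new = x - lam * g
    ratios.append(F(x_new) / F(x))
    x = x_new

assert all(r <= bound + 1e-12 for r in ratios)
assert abs(ratios[-1] - ratios[0]) < 1e-8   # ratio constant: linear rate
```

For this starting point the gradient has equal components along v_1 and v_n, which is the equality case of the Kantorovich inequality, so the bound is attained.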

Definition: Directions t_1, t_2 are conjugate with respect to C if

    t_1^T C t_2 = 0 .    (2.5)

In what follows it will frequently be convenient to speak about a 'direction of search' without intending to imply that its norm is unity. However, the null vector is excluded from any set of mutually conjugate directions. It is clear that any set of mutually conjugate directions is linearly independent.

Example: The eigenvectors of C are conjugate. The property of being both conjugate and orthogonal characterizes the eigenvectors.

Lemma 2.1: Let t_1, ..., t_n be a set of mutually conjugate directions (with respect to C). Starting from x_1, let x_2, x_3, ..., x_{i+1} be points produced by descent steps applied to (2.1). Then

    g_i^T t_j = 0 ,  j = 1,2,...,i−1 .    (2.6)

Proof: The descent condition gives g_i^T t_{i−1} = 0, so it is necessary only to verify the result for j < i−1. We have

    g_i^T t_j = g_{j+1}^T t_j + Σ_{s=j+1}^{i−1} λ_s t_s^T C t_j = 0 . □

Corollary 2.1: The minimum of a positive definite quadratic form can be found by making at most one descent step along each of n mutually conjugate directions.

Proof: From Lemma 2.1 we have

    g_{n+1}^T t_i = 0 ,  i = 1,2,...,n .

Thus g_{n+1} is orthogonal to n linearly independent directions and therefore vanishes identically. □

Remark: A method which minimizes a quadratic form in a finite number of steps is said to have a quadratic termination property.

Example: The sequence of vectors

    t_1 = −g_1 ,  t_i = −g_i + (‖g_i‖²/‖g_{i−1}‖²) t_{i−1} ,  i = 2,...,n ,    (2.7)

is conjugate. The algorithm based on this choice is called the method of conjugate gradients.
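The recurrence (2.7), combined with the descent step (2.3), can be sketched as follows; the quadratic data are randomly generated for illustration:

```python
import numpy as np

# The conjugate gradient recurrence (2.7) on a positive definite quadratic:
# the minimum of F = b^T x + 0.5 x^T C x is reached after at most n descent
# steps (quadratic termination). Data are illustrative.
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)          # positive definite
b = rng.standard_normal(n)

x = np.zeros(n)
g = b + C @ x                        # g_i = grad F(x_i)^T
t = -g                               # t_1 = -g_1
for _ in range(n):
    lam = -(t @ g) / (t @ C @ t)     # descent step (2.3)
    x = x + lam * t
    g_new = b + C @ x
    t = -g_new + ((g_new @ g_new) / (g @ g)) * t   # recurrence (2.7)
    g = g_new

assert np.linalg.norm(C @ x + b) < 1e-8   # g_{n+1} = 0
```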
We now consider the generation of sequences of conjugate directions to provide a basis for a descent calculation. To do this we note that the minimum of (2.1) is at x = −C^{-1} b, so that if we minimize in the direction t_1 = −C^{-1}(Cx_1 + b) = −C^{-1} ∇F(x_1)^T then the minimum is found in a single step. In general C^{-1} is not known in advance, so we are led to consider processes in which each step consists of two parts: (i) a descent calculation in the direction

    t_i = −H_i g_i ,    (2.8)

where H_i is the current estimate of C^{-1}, and (ii) the calculation of a correction to H_i which serves both the purpose of making the t_i conjugate and that of making H_i approach C^{-1}. It is convenient in what follows to assume that the H_i are symmetric. This seems a natural condition given the symmetry of C, but is in fact not necessary.

If we assume that the t_s, s < i, are mutually conjugate, then the condition that each be conjugate to t_i is

    t_i^T C t_s = −g_i^T H_i C t_s = 0 ,  s < i ,

and, by Lemma 2.1, this is certainly satisfied if

    H_i C t_s = ρ_s t_s ,  s < i .

We write this equation in the equivalent form (multiplying both sides by λ_s, and noting that y_s = C d_s for the quadratic (2.1))

    H_i y_s = ρ_s d_s ,  s < i ,    (2.9)

where

    d_i = x_{i+1} − x_i ,  y_i = g_{i+1} − g_i .    (2.10)

Consider the symmetric updating formula

    H_{i+1} = H_i + φ_i d_i d_i^T + η_i (d_i y_i^T H_i + H_i y_i d_i^T) + ω_i H_i y_i y_i^T H_i ,    (2.11)

where φ_i, η_i, ω_i are to be determined (or prescribed). We have H_{i+1} y_s = H_i y_s = ρ_s d_s, s < i, provided (2.9) holds, as

    d_i^T y_s = d_i^T C d_s = 0 ,  y_i^T H_i y_s = ρ_s y_i^T d_s = ρ_s (g_{i+1} − g_i)^T d_s = 0 .

Thus (2.9) is satisfied for i := i+1 if

    0 = 1 + η_i (d_i^T y_i) + ω_i (y_i^T H_i y_i) ,    (2.12)

and

    ρ_i = φ_i (d_i^T y_i) + η_i (y_i^T H_i y_i) .    (2.13)

If φ_i and ω_i are expressed in terms of ρ_i and the remaining free parameter ζ_i = −η_i (d_i^T y_i), from (2.12) and (2.13) we have

    φ_i = ρ_i/(d_i^T y_i) + ζ_i (y_i^T H_i y_i)/(d_i^T y_i)² ,  ω_i = (ζ_i − 1)/(y_i^T H_i y_i) ,

so that equation (2.11) becomes

    H_{i+1} = H_i + ρ_i d_i d_i^T/(d_i^T y_i) − H_i y_i y_i^T H_i/(y_i^T H_i y_i) + ζ_i τ_i v_i v_i^T
            = D(ρ_i, H_i) + ζ_i τ_i v_i v_i^T ,    (2.14)

where D(ρ_i, H_i) denotes the first three terms,

    v_i = d_i/(d_i^T y_i) − H_i y_i/(y_i^T H_i y_i) ,    (2.15)

and

    τ_i = y_i^T H_i y_i .    (2.16)

Example: The particular case ρ_i = 1, ζ_i = 0, i = 1,2,..., gives the variable metric or DFP formula, which is the most frequently used member of the family.

The class of formulae described by (2.14) generates recursively a set of conjugate directions, so that the first of our aims is satisfied. It still remains to show the relationship between the H_i and C^{-1}. To do this, note that (2.9) can be written (introducing the symmetric square root C^{1/2} of the positive definite matrix C)

    C^{1/2} H_i C^{1/2} (C^{1/2} t_s) = ρ_s (C^{1/2} t_s) ,  s = 1,2,...,i−1 ,

or, more briefly,

    Ĥ_i t̂_s = ρ_s t̂_s ,  s = 1,2,...,i−1 .    (2.9a)

Defining the matrix T̂ whose columns are

    t̂_s = C^{1/2} t_s / (t_s^T C t_s)^{1/2} ,  s = 1,2,...,n ,

and the diagonal matrix P by P_{ss} = ρ_s, s = 1,2,...,n, we can write (2.9a) in the case i = n+1 in the form

    Ĥ_{n+1} T̂ = T̂ P .    (2.9b)

Now T̂ is an orthogonal matrix (its columns are orthonormal by the conjugacy of the t_s), so that

    Ĥ_{n+1} = T̂ P T̂^T ,

whence

    H_{n+1} = C^{-1/2} T̂ P T̂^T C^{-1/2} .    (2.17)

In particular, if P = ρI,

    H_{n+1} = ρ C^{-1} .    (2.18)
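The identification (2.18) can be checked numerically for the DFP member of the family (ρ = 1, ζ = 0); the quadratic data are illustrative:

```python
import numpy as np

# The update (2.14) with rho = 1, zeta = 0 (the DFP formula), applied along
# exact descent steps on F = b^T x + 0.5 x^T C x; by (2.18), H_{n+1}
# should reproduce C^{-1}. Data are illustrative.
rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

x = np.zeros(n)
H = np.eye(n)
for _ in range(n):
    g = b + C @ x
    t = -H @ g                           # search direction (2.8)
    lam = -(t @ g) / (t @ C @ t)         # exact descent step (2.3)
    d = lam * t                          # d_i = x_{i+1} - x_i
    x = x + d
    y = C @ d                            # y_i = g_{i+1} - g_i
    H = (H + np.outer(d, d) / (d @ y)
           - H @ np.outer(y, y) @ H / (y @ H @ y))    # D(1, H)

assert np.allclose(H, np.linalg.inv(C), atol=1e-8)   # H_{n+1} = C^{-1}
assert np.linalg.norm(C @ x + b) < 1e-8              # and x_{n+1} is optimal
```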

Remark: Remember that the motivation for developing the recursion (2.14) is the search for efficient descent directions. Specifically, we are looking not only for conjugate directions but also for good estimates of the inverse Hessian. This indicates that ρ = 1 is the natural choice (or at least ρ = constant), and almost all published methods use ρ = 1. However, from (2.17), the choice of ρ variable may well have scaling advantages in the initial phases of a computation with a general objective function. Presumably the strategy for choosing ρ should make ρ → constant to ensure a fast rate of ultimate convergence.

Lemma 2.2: Provided the descent condition is satisfied, H_{i+1} g_{i+1} is a multiple of v_i (possibly null).

Remark: In what follows it is convenient to drop the subscript i; quantities subscripted i+1 will be starred. We also assume that ρ is constant.

Proof: We have (using the descent condition d^T g* = 0, the definition t = −Hg, and d = λt)

    D(ρ, H) g* = (H + ρ dd^T/(d^T y) − H y y^T H/(y^T H y)) g*
               = Hy + Hg − Hy y^T H(y + g)/(y^T H y)
               = Hg − Hy (y^T H g)/(y^T H y)
               = −(1/λ)(d − (y^T d/(y^T H y)) Hy) ,

whence

    H* g* = (γ + ζ τ v^T g*) v ,  γ = −d^T y/λ .    (2.19)  □

* *
Remark; (i) The condition that H g =0 when v ^ 0 gives a

condition which determines £ . We have

T* 1 *T „ 1*T*
vg =-rg Hy=--g Hg

so that (from (2.19))

1 (2.20)
C =
Xg Hg

53

.■„^■■^.,^j-..l,rfinj,.w,,-:,^......,....,.■„..-■^■■. ...■ ■- . .
MiaaüBcataitijaatiaMartBBiaiaMMflMaaMliBB ■-•••,: - ■ 1 •;.
^W.^M^^H'IIF ^^'w^wtimmmmmimmmmmm'QmimmmBm 1 i 11
1 . ^; ffw^pBBffmniwgpi Try" iF<rrjT.Twm TCy Wi'JiyTI TfunrnffBrnw^ f TOI- ■»7-T7f»?r j iTJT'^-^vT-^rT.f«- .^^^j—",»,^^-^.^^^^ ^

Provided this value of ζ is excluded from consideration then x* is

independent of ζ . Note that this result is true for a general

function as no properties specific to a quadratic form have been used

in its derivation.

(ii) We can only have v = 0 with d and g nonnull if H is

singular, and in this case H* is also singular and the null space of

H* is at least as large as that of H . This follows from (2.15), which

can vanish only if (a) Hg* = 0 and τ = (y^T d)/(y^T H y) , or (b) Hg and

Hg* are parallel. Now if H is singular ∃ w , w^T H = 0 . Thus

w^T d = 0 , and hence w^T H* = 0 .
Clearly it is important that H_i positive definite ⇒ H_{i+1} positive

definite, i = 1,2,... , in order that premature termination should be

avoided ( Hg = 0 and H positive definite ⇒ g = 0 , whence x is

a stationary point). Conditions which ensure this are given in the

following lemma (due to Powell).

Lemma 2.3: If 0 < ρ,τ < ∞ , H is positive semidefinite, and

H H^+ v = v (where H^+ is the generalized inverse of H ), then H* is

positive semidefinite, and the null space of H* is equal to that of H

provided

    ζ τ > - (y^T H y)/((d^T H^+ d)(y^T H y) - (d^T y)^2) .   (2.21)

Proof: We first note the identity

    D(ρ,H) = (I + u y^T) H (I + y u^T)   (2.22)

where

54


    u = (1/((d^T y)(y^T H y))^{1/2}) (ρ^{1/2} d - ((d^T y)/(y^T H y))^{1/2} H y) ,   (2.23)

and

    det(I + u y^T) = 1 + y^T u = (ρ (d^T y)/(y^T H y))^{1/2}   (2.24)

so that, by the assumptions, I + u y^T is nonsingular. Now y^T v = 0

so that H* can be written

    H* = (I + u y^T)(H + ζ τ v v^T)(I + y u^T) .   (2.25)

Thus the problem reduces to considering H + ζ τ v v^T . We have

    H + ζ τ v v^T = H(I + ζ τ H^+ v v^T) .

The null spaces of H and H* will agree provided I + ζ τ H^+ v v^T is

nonsingular. The condition for singularity is

    0 = det(I + ζ τ H^+ v v^T)

      = 1 + ζ τ v^T H^+ v .

Noting that H H^+ H = H , and H H^+ v = v ⇒ H H^+ d = d , we have

    1 + ζ τ v^T H^+ v = 1 + ζ τ (d^T H^+ d - 2 (d^T y)^2/(y^T H y) + (d^T y)^2/(y^T H y))

                      = 1 + ζ τ (d^T H^+ d - (d^T y)^2/(y^T H y)) ,

and this vanishes provided

    ζ τ = - (y^T H y)/((d^T H^+ d)(y^T H y) - (y^T d)^2) .

55


The stated result is a consequence of this and the observation that

decreasing ζ below this value will make H* indefinite. □
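The product form (2.22), with u given by (2.23), can be verified numerically; in the following sketch the matrix H and the vectors d , y are arbitrary test data of ours, and the determinant value (2.24) is checked as well.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 5, 1.0
A = rng.standard_normal((n, n))
H = A @ A.T + np.eye(n)              # positive definite H
d = rng.standard_normal(n)
y = rng.standard_normal(n)
if d @ y < 0:
    y = -y                           # keep d^T y > 0

delta, Y = d @ y, y @ H @ y
# D(rho, H) from the direct formula
D = H + rho * np.outer(d, d) / delta - np.outer(H @ y, H @ y) / Y

# u from (2.23)
u = (np.sqrt(rho) * d - np.sqrt(delta / Y) * (H @ y)) / np.sqrt(delta * Y)
E = np.eye(n) + np.outer(u, y)       # I + u y^T

assert np.allclose(E @ H @ E.T, D)                              # (2.22)
assert np.isclose(np.linalg.det(E), np.sqrt(rho * delta / Y))   # (2.24)
```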

Remark: (i) The condition on τ is automatically satisfied if

H is positive definite and the descent condition is satisfied, for then

d^T y = -g^T d = τ g^T H g > 0 . However the lemma does not require that the

descent condition be satisfied and remains valid even though the exact

minimum in the direction t is not found. In this case the condition

on τ is necessary.

Corollary 2.2: If H_1 is positive definite, and H_{i+1} = D(ρ,H_i) ,

i = 1,2,... , then provided the descent condition is satisfied for

i = 1,2,... , H_{i+1} is positive definite.

Proof: This is a consequence of (2.22) and the above remark, which shows

that if H is positive definite, and if the descent condition is

satisfied, then I + u y^T is nonsingular. □

Theorem 2.1: (Dixon's equivalence theorem). If (i) the formula (2.14)

is used to generate descent directions, (ii) ζ_i satisfies (2.21) for

i = 1,2,... and H_1 is positive definite, and (iii) the descent

condition is satisfied in each descent step, then the sequence of points

generated by the algorithm depends only on F , H_1 , ρ , and x_1 , and

is independent of ζ_i , i = 1,2,... .

Remark: It is important to note that F is not restricted to be a

quadratic form in this result.

56


Proof: Let D_1 = H_1 , D_i = D(ρ,D_{i-1}) , i = 2,3,... . We show that

if H_i = D_i + α d_i d_i^T , then H_{i+1} = D_{i+1} + β d_{i+1} d_{i+1}^T . By Lemma 2.2 we

have H* = D(ρ,H) + γ d* d*^T . Now, writing H = D + α d d^T ,

    D(ρ,H) = D + ρ (d d^T)/(d^T y) + α d d^T - ((D + α d d^T) y y^T (D + α d d^T))/(y^T (D + α d d^T) y)

           = D + ρ (d d^T)/(d^T y) + α d d^T

             - (D y y^T D + α(y^T d)(D y d^T + d y^T D) + α^2 (d^T y)^2 d d^T)/(y^T D y + α(y^T d)^2)

           = D(ρ,D) + (α(y^T D y)/(y^T D y + α(y^T d)^2))(d - ((y^T d)/(y^T D y)) D y)(d - ((y^T d)/(y^T D y)) D y)^T .   (2.26)

By Lemma 2.2, d* ∥ D(ρ,H)g* . By (2.26), D(ρ,H)g* ∥ D* g* . Thus

d_j ∥ D_j g_j , j = 1,2,...,i ⇒ d_{i+1} ∥ D_{i+1} g_{i+1} . But the case j = 1

is a consequence of Lemma 2.2, so the result follows by induction. □

Example: Equivalence results for a wide class of conjugate direction

algorithms applied to a given positive definite quadratic form can be

demonstrated by noting that at the i-th stage we find the minimum in the

translation to x_1 of the subspace spanned by t_1,...,t_i , and that this

subspace is also spanned by H_1 g_1,...,H_1 g_i . Thus x_{i+1} depends only on

57


x_1,...,x_i and not on the particular updating formula for the inverse

Hessian estimate. If H_1 = I this equivalence extends to the conjugate

gradient algorithm (2.7).

Lemma 2.4: If the descent condition is satisfied at each stage then

the sequence g_i^T D_i g_i , i = 1,2,... , is strictly decreasing provided D_1

is positive definite.

Proof: We have (as g*^T d = 0 and, since d is in the direction -Dg ,

also g*^T D g = 0 )

    g*^T D* g* = g*^T D g* - (g*^T D y)^2/(y^T D y)

               = g*^T D g* - (g*^T D g*)^2/(g*^T D g* + g^T D g)

               = (g*^T D g*)(g^T D g)/(g*^T D g* + g^T D g) .

Thus

    1/(g*^T D* g*) = 1/(g*^T D g*) + 1/(g^T D g) .   (2.27)

By Corollary 2.2, the D_i are positive definite so that the desired

result follows from (2.27). □
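The relation (2.27) uses only g*^T d = 0 and d ∥ Dg , so it can be observed (to rounding error) along a DFP run with exact line searches on a quadratic. A minimal check, with test data of our choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)          # positive definite quadratic Hessian
b = rng.standard_normal(n)

x = rng.standard_normal(n)
D = np.eye(n)                        # D_1 positive definite
g = C @ x - b
for _ in range(n - 1):
    s = -D @ g
    alpha = -(g @ s) / (s @ C @ s)   # exact line search, so g_new^T d = 0
    d = alpha * s
    x = x + d
    g_new = C @ x - b
    y = g_new - g
    D_new = D + np.outer(d, d) / (d @ y) \
              - np.outer(D @ y, D @ y) / (y @ D @ y)
    # the reciprocal relation (2.27)
    lhs = 1.0 / (g_new @ D_new @ g_new)
    rhs = 1.0 / (g_new @ D @ g_new) + 1.0 / (g @ D @ g)
    assert np.isclose(lhs, rhs)
    D, g = D_new, g_new
```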

Remark: This result indicates a potential defect of the DFP algorithm.

For if the choice of D_1 is poor in the sense that it leads to too

small a value of g_1^T D_1 g_1 then the algorithm has no mechanism to correct

this, and must initially generate a sequence of directions which are

nearly orthogonal to the gradient. This must also happen if, for any

58


reason, an abnormally small value of g^T D g is generated at some stage.

A possible cause of such behaviour is poor scaling of the problem.

Lemma 2.5: Corresponding to the formula (2.14) for updating H there

is a similar formula for updating H^{-1} . Specifically we have

    H*^{-1} = D(ρ,H)^{-1} + γ w w^T   (2.28)

where

    D(ρ,H)^{-1} = H^{-1} + (1/ρ + μ)(y y^T)/(d^T y) - (1/(d^T y))(y d^T H^{-1} + H^{-1} d y^T) ,   (2.29)

    μ = (d^T H^{-1} d)/(d^T y) ,   (2.30)

    w = y - (1/μ) H^{-1} d ,   (2.31)

and γ is related to ζ by

    γ = - ζ τ μ^2/(1 + ζ τ v^T H^{-1} v) .   (2.32)

Proof: From (2.22) we have

    D(ρ,H)^{-1} = (I - ((y^T H y)/(ρ d^T y))^{1/2} y u^T) H^{-1} (I - ((y^T H y)/(ρ d^T y))^{1/2} u y^T)

and (2.29) follows from this by an elementary calculation. From (2.25)

    H*^{-1} = (I - ((y^T H y)/(ρ d^T y))^{1/2} y u^T)(H + ζ τ v v^T)^{-1}(I - ((y^T H y)/(ρ d^T y))^{1/2} u y^T)

           = (I - ((y^T H y)/(ρ d^T y))^{1/2} y u^T)(H^{-1} - (ζ τ/(1 + ζ τ v^T H^{-1} v)) H^{-1} v v^T H^{-1})(I - ((y^T H y)/(ρ d^T y))^{1/2} u y^T) .

                                                                    (2.33)

59
iaiMiiiiiiiiilHiiit ''M'M'M«»^*1"*'**'*'"^^ 1 ...„.A ■■'*-'■"*---''-—■'-
ipillSPiPPIIII^^

Now

    (I - ((y^T H y)/(ρ d^T y))^{1/2} y u^T) H^{-1} v = - μ w ,   (2.34)

so that (2.28) is a direct consequence of (2.33) and (2.34). □


Remark: If we take γ = - μ/(y^T d) in (2.28) then we obtain

    G(ρ,H^{-1}) ≡ H^{-1} + (1/ρ)(y y^T)/(y^T d) - (H^{-1} d d^T H^{-1})/(d^T H^{-1} d)

               = (I + z d^T) H^{-1} (I + d z^T)   (2.35)

where

    z = (1/((y^T d)(d^T H^{-1} d))^{1/2}) ((1/ρ)^{1/2} y - ((y^T d)/(d^T H^{-1} d))^{1/2} H^{-1} d) .   (2.36)

We have

    D(ρ,H)^{-1} = G(ρ,H^{-1}) + (μ/(y^T d)) w w^T

               = (I + z d^T)(H^{-1} + (μ/(y^T d)) w w^T)(I + d z^T)   (2.37)

as d^T w = 0 .

60


To summarize these results we have the following:

(i)  D(ρ,H) = (I + u y^T) H (I + y u^T) ,

     G(ρ,H^{-1}) = (I + z d^T) H^{-1} (I + d z^T) ;

(ii) D(ρ,H)^{-1} = G(ρ,H^{-1}) + (μ/(y^T d)) w w^T ,

     G(ρ,H^{-1})^{-1} = D(ρ,H) + (ν/(y^T d)) v v^T ,

where μ = (d^T H^{-1} d)/(d^T y) and ν = (y^T H y)/(d^T y) .

                update formula                          update formula for inverse

D(ρ,H) :      H + ρ (d d^T)/(d^T y)                   H^{-1} + (1/ρ + μ)(y y^T)/(d^T y)
                - (H y y^T H)/(y^T H y)                  - (1/(d^T y))(y d^T H^{-1} + H^{-1} d y^T)

G(ρ,H^{-1}) :   H^{-1} + (1/ρ)(y y^T)/(y^T d)             H + (ρ + ν)(d d^T)/(d^T y)
                - (H^{-1} d d^T H^{-1})/(d^T H^{-1} d)          - (1/(d^T y))(d y^T H + H y d^T)

D(ρ,H) , G(ρ,H^{-1}) have been called dual formulae by Fletcher.
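The dual relations in (ii) can be checked numerically. In the sketch below (test data and seed are ours) both identities are verified for a random positive definite H :

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 4, 0.5
A = rng.standard_normal((n, n))
H = A @ A.T + np.eye(n)              # positive definite test matrix
Hinv = np.linalg.inv(H)
d = rng.standard_normal(n)
y = rng.standard_normal(n)
if d @ y < 0:
    y = -y                           # keep d^T y > 0

delta = d @ y
mu = (d @ Hinv @ d) / delta          # mu of (2.30)
nu = (y @ H @ y) / delta             # the dual quantity nu
w = y - (Hinv @ d) / mu              # w of (2.31)
v = d - (H @ y) / nu                 # v = d - (y^T d / y^T H y) H y

D = H + rho * np.outer(d, d) / delta - np.outer(H @ y, H @ y) / (y @ H @ y)
G = Hinv + (1.0 / rho) * np.outer(y, y) / delta \
         - np.outer(Hinv @ d, Hinv @ d) / (d @ Hinv @ d)

assert np.allclose(np.linalg.inv(D), G + (mu / delta) * np.outer(w, w))
assert np.allclose(np.linalg.inv(G), D + (nu / delta) * np.outer(v, v))
```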

Lemma 2.6: Let A be a symmetric matrix, A = T Λ T^T where Λ is diagonal

( Λ_ii = λ_i , i = 1,2,...,n , with λ_i ≤ λ_{i+1} ) and T is orthogonal. Let λ_i* , i = 1,2,...,n ,

be the eigenvalues of A + σ a a^T ; then either σ > 0 and

λ_i ≤ λ_i* ≤ λ_{i+1} , i = 1,2,...,n , or σ < 0 and λ_{i-1} ≤ λ_i* ≤ λ_i

(with the conventions λ_0 = -∞ , λ_{n+1} = +∞ ).

61


Proof: We have

    det(A + σ a a^T - λI) = Π_{i=1}^n (λ_i - λ) det(I + σ(Λ - λI)^{-1}(T^T a)(T^T a)^T)

                          = Π_{i=1}^n (λ_i - λ) (1 + σ (T^T a)^T (Λ - λI)^{-1}(T^T a))

                          = Π_{i=1}^n (λ_i - λ) (1 + σ Σ_{i=1}^n ã_i^2/(λ_i - λ)) ,   ã = T^T a ,

and the desired result is an easy consequence of this expression. □
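Lemma 2.6 is easy to illustrate numerically; the matrix, vector, and σ below are arbitrary choices of ours (the σ > 0 case is shown):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2.0                  # symmetric test matrix
a = rng.standard_normal(n)
sigma = 0.7                          # sigma > 0 case of the lemma

lam = np.linalg.eigvalsh(A)          # eigenvalues in ascending order
lam_star = np.linalg.eigvalsh(A + sigma * np.outer(a, a))

tol = 1e-10
assert np.all(lam <= lam_star + tol)            # lambda_i <= lambda_i*
assert np.all(lam_star[:-1] <= lam[1:] + tol)   # lambda_i* <= lambda_{i+1}
```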

In the following theorem we consider specifically the minimization

of a positive definite quadratic form. We assume that the initial

estimate of the Hessian H_1 is positive definite, and we make use of

the following sequences of updates for the current Hessian estimates:

(a) H_{i+1} = D(ρ,H_i) , i = 1,2,... , and

(b) H̄_{i+1} = G(ρ,H̄_i^{-1})^{-1} , i = 1,2,... .

Further we do not assume that the descent condition is satisfied.

Theorem 2.2: (i) Let K_i = C^{1/2} H_i C^{1/2} , where the H_i are generated

by (a), and let the eigenvalues of K_i ordered in increasing magnitude

be λ_j^{(i)} , j = 1,2,...,n . Then if λ_j^{(1)} > ρ then

λ_j^{(1)} ≥ λ_j^{(2)} ≥ ... ≥ ρ , while if λ_j^{(1)} < ρ then

λ_j^{(1)} ≤ λ_j^{(2)} ≤ ... ≤ ρ , j = 1,2,...,n . (ii) Let K̄_i = C^{1/2} H̄_i C^{1/2} ,

where the H̄_i are generated by (b), and let the eigenvalues of K̄_i be

λ̄_j^{(i)} , j = 1,2,...,n . If λ̄_j^{(1)} > ρ then λ̄_j^{(1)} ≥ λ̄_j^{(2)} ≥ ... ≥ ρ ,

while if λ̄_j^{(1)} < ρ then λ̄_j^{(1)} ≤ λ̄_j^{(2)} ≤ ... ≤ ρ , j = 1,2,...,n .

62


Remark: This result is important because it shows that we have a

'weak' convergence result for these Hessian estimates when minimizing a

positive definite quadratic form even when the descent condition is

not satisfied at each step.

Proof: Noting that C^{1/2} d = C^{-1/2} y = a , we can write the formula for

updating K as

    K* = K + ρ (a a^T)/(a^T a) - (K a a^T K)/(a^T K a) .

We can break this into the two operations

    J = K - (K a a^T K)/(a^T K a) , and

    K* = J + ρ (a a^T)/(a^T a) .

Note that J has a zero eigenvalue, and that a is the corresponding

eigenvector. By Lemma 2.6 we have λ_1(J) = 0 , and λ_{j-1} ≤ λ_j(J) ≤ λ_j

for j = 2,3,...,n . The rank one modification which takes J into

K* changes the zero eigenvalue to ρ and leaves the other eigenvalues

of J unchanged. Assume that λ_j(J) ≤ ρ < λ_{j+1}(J) ; then, reordering the

eigenvalues in increasing order of magnitude, we have λ_k* = λ_{k+1}(J) ,

k = 1,2,...,j-1 , λ_j* = ρ , λ_k* = λ_k(J) , k = j+1,...,n . This

establishes the first part of the theorem. The second is demonstrated

in similar fashion by noting that K̄^{-1} satisfies a formally similar

update relation. This establishes the result for the eigenvalues of K̄^{-1}

and hence for their reciprocals. □

63


Remark: Note that both H_i and H̄_i are positive definite, i = 2,3,... ,

if H_1 is positive definite. In this case the result does not depend

on the descent condition being satisfied.

Theorem 2.3: Let H be positive definite, and consider a step d in

the direction -Hg . Let H* = D(ρ,H) , H̄ = G(ρ,H^{-1})^{-1} =

D(ρ,H) + (ν/(y^T d)) v v^T , and H_θ = θ H̄ + (1-θ) H* = D(ρ,H) + θ (ν/(y^T d)) v v^T . Let

K = C^{1/2} H C^{1/2} , and define K* , K̄ , K_θ similarly. Let the eigenvalues

of K , K* , K̄ , K_θ be λ_j , λ_j* , λ̄_j , and λ_j^θ respectively,

j = 1,2,...,n . Let 0 ≤ θ ≤ 1 . If λ_j > ρ then λ_j ≥ λ̄_j ≥ λ_j^θ ≥ λ_j* ≥ ρ ,

while if λ_j < ρ then λ_j ≤ λ_j* ≤ λ_j^θ ≤ λ̄_j ≤ ρ . If θ ∉ [0,1] then λ_j^θ

need not lie in the interval defined by λ_j and ρ .

Proof: It follows from the definition of H* , H̄ , and H_θ and

Lemma 2.6 that λ_j* ≤ λ_j^θ ≤ λ̄_j , j = 1,2,...,n , provided 0 ≤ θ ≤ 1 .

The first part of the result is now a consequence of Theorem 2.2. To

show that λ_j^θ need not lie in the interval defined by λ_j and ρ ,

consider the example

    C = ( 1+ε   √ε )
        (  √ε    ε ) ,    H = I ,  ρ = 1 .

We have λ_1 = η , λ_2 = 1 + 2ε - η where η = (1/2)(1 + 2ε - √(1+4ε)) . Thus

η is positive and O(ε^2) . In this case K = C , and the updated matrices

can be computed explicitly.

64

It is readily verified that, for suitable choices of θ outside [0,1] ,

K_θ has an eigenvalue 0 in the one case and an eigenvalue 1+2ε in the

other. In both cases eigenvalues lie outside the

prescribed interval. In the first case we have 0 < η , and in the

second 1+2ε > 1+2ε-η . □

Remark: This result shows that H̄ gives the best improvement in the

eigenvalues < ρ , while H* has a similar property for those > ρ .

This suggests an algorithm in which a choice is made between updating

H to H̄ or H* depending on some appropriate criterion. Fletcher

suggests that if ν > ρ (that is, y^T H y > ρ d^T y ) then H* should

be used, while if ν < ρ then H̄ is chosen. He has used this criterion

in an implementation of Goldstein's algorithm, and has reported satisfactory

results.

65


Notes

1. For Ostrowski's theorem see his book 'Solution of Equations'

(2nd edition) or Kowalik and Osborne. Goldstein's theorem is from

his paper 'On steepest descent' in SIAM Control, 1965. Theorem 1.5

is abstracted from Goldstein and Price, 'An Effective Algorithm

for Minimization', Num. Math. 1967.

2. For background material see Kowalik and Osborne. The form of the

update for the inverse Hessian is due to Powell, 'Recent Advances in

Unconstrained Optimization', to appear in Math. Prog. It is a

specialization of a form derived in Huang, 'Unified approach to

quadratically terminating algorithms for function minimization',

JOTA, 1970. The form (2.14) and the result of Lemma 2.2 are

probably due (in the case ρ = 1 ) to Fletcher, 'A new approach to

variable metric algorithms', Comp. J., 1970, and Broyden, 'Convergence

of a class of double rank minimization algorithms', JIMA, 1970.

Lemma 2.5 is due to Powell (to be published). The product update

form (2.22) is due to Greenstadt (to be published). Dixon's paper

containing Theorem 2.1 is to appear in Math. Prog. The significance

of (2.27) for the successful performance of the DFP algorithm was

noted in Powell's survey paper already cited. Attention was drawn

to the dual updating formulae by Fletcher. This material together

with Theorems 2.2 and 2.3 is included in his paper already cited.

References

C. G. Broyden (1970): The convergence of a class of double rank minimization

algorithms, J. Inst. Math. Applic., 6, pp. 76-90.

R. Fletcher (1970): A new approach to variable metric algorithms,

Computer J., 13, pp. 317-322.

A. A. Goldstein (1965): On steepest descent, SIAM J. Control, 3,

pp. 147-151.

A. A. Goldstein and J. Price (1967): An effective algorithm for

minimization, Num. Math., 10, pp. 184-189.

N. Y. Huang (1970): Unified approach to quadratically convergent

algorithms for function minimization, J.O.T.A., 5, pp. 405-425.

J. Kowalik and M. R. Osborne (1968): Methods for Unconstrained

Optimization Problems, Elsevier.

A. M. Ostrowski (1966): Solutions of Equations and Systems of Equations

(second edition), Academic Press.

M. J. D. Powell (1970): Recent advances in unconstrained optimization,

Report T.P. 400, AERE Harwell.

67


APPENDIX  Numerical Questions Relating to Fletcher's Algorithm

1. Implementation

In this section we consider two questions relating to the implementation

of Fletcher's algorithm. These are

(i) an appropriate strategy for determining λ to satisfy the

Goldstein condition, and

(ii) the use of the product updating formulae for the inverse Hessian

estimate.

In his program Fletcher uses a cubic line search to determine λ . Here

we use a somewhat simpler procedure which has the advantage of requiring

only additional function values. Also we work with the Choleski decomposition

of the inverse Hessian estimate. This has certain numerical

advantages which have been outlined by Gill and Murray 1/ . In particular,

it is possible to ensure the positive definiteness of H_i , and this can

be lost through the effect of accumulated rounding error when direct

evaluation of the updating formulae is used. Another possible advantage

of the Choleski decomposition is that we can work with an estimate of

the Hessian (that is H^{-1} ) rather than with H , as division by a triangular

matrix does not differ greatly in cost from multiplication. We felt this

could well be an advantage in problems with singular or near singular

Hessians, in which case H would be likely to contain large numbers.

To implement the line search we note that by Theorem 1.2 we should

test first if ψ(x_i,t_i,||s_i||) = ψ(x_i,s_i,1) satisfies the Goldstein condition.

This requires the evaluation of F(x_i + s_i) , and this, together with the

-------
1/ NPL Mathematics Division, Report 97, 1970.

68


known values F(x_i) and F'(x_i) = ∇F(x_i)s_i , gives sufficient information

to determine a quadratic interpolating polynomial to F . We write this

as

    P(λ) = F(x_i) + F'(x_i)λ + Aλ^2   (A.1)

where A is to be determined by setting P(1) = F(x_i + s_i) . This gives

    A = F(x_i + s_i) - F(x_i) - F'(x_i)

      = F'(x_i)(ψ(x_i,s_i,1) - 1) .   (A.2)

The minimum of P(λ) is given by

    λ̂ = -F'(x_i)/(2A) = 1/(2(1 - ψ(x_i,s_i,1))) .   (A.3)

To test if this is an appropriate value we compute ψ(x_i,s_i,λ̂) . This

gives

    ψ(x_i,s_i,λ̂) = 1/2 + ((Ā - A)/F'(x_i)) λ̂   (A.4)

where Ā = (1/2) s_i^T ∇^2 F(x_i + λ̄ s_i) s_i and λ̄ is a mean value. Thus, if F is quadratic and

ψ(x_i,s_i,1) < σ then λ̂ given by equation (A.3) satisfies the Goldstein

condition for any allowable σ (normally σ is chosen small).

For nonquadratic F the test is satisfied if the relative

error in estimating (1/2) s_i^T ∇^2 F(x_i) s_i by A is not too large.

This analysis provides the basis for our method which is given below.

Algorithm

(i) Calculate ||s_i|| , set w = min(1, ||s_i||) , λ = 1 .

(ii) Evaluate ψ = ψ(x_i,s_i,λ) .

69


(iii) If ψ < σ then begin pλ = λ , λ = λ/(2(1-ψ)) ,

go to (ii), end.

(iv) If ψ ≤ 1-σ go to EXIT.

If λ ≥ 1 then begin if λ > 1/w go to EXIT,

λ = 2λ , end,

else λ = .5(λ + pλ) ;

go to (ii).

Remark

(i) Numerical experience has shown that the value of λ predicted in

(iii) can be too small, and that an additional instruction

    If λ < s·pλ then λ = s·pλ

should be included. A value for s of about .1 has proved

satisfactory (1/8 was used in the numerical experiments reported

in the next section).

(ii) It is readily verified that lim_{λ→0} ψ(x_i,s_i,λ) = 1 . Thus the

algorithm can be expected to return a value of λ satisfying the

Goldstein condition unless ψ exhibits rather pathological

behavior.
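The steps (i)-(iv), together with the safeguard of Remark (i), can be sketched as follows. This is a transcription of ours, not the ALGOL W program of the next section: the reduction in step (iii) is written as the quadratic prediction λ/(2(1-ψ)), and the parameter defaults are illustrative only.

```python
import numpy as np

def goldstein_step(F, gradF, x, s, sigma=0.1, shrink=0.125, maxit=50):
    """Step-length strategy sketched in the text: adjust lam until the
    Goldstein test sigma <= psi(lam) <= 1 - sigma holds, where
    psi(lam) = (F(x + lam*s) - F(x)) / (lam * gradF(x).s)."""
    F0, slope = F(x), gradF(x) @ s       # slope < 0 for a descent direction
    w = min(1.0, np.linalg.norm(s))
    lam, plam = 1.0, 0.0
    for _ in range(maxit):
        psi = (F(x + lam * s) - F0) / (lam * slope)
        if psi < sigma:                  # step too long: quadratic prediction,
            plam = lam                   # safeguarded below by shrink * plam
            lam = max(lam / (2.0 * (1.0 - psi)), shrink * plam)
        elif psi <= 1.0 - sigma:         # Goldstein condition satisfied
            return lam
        elif lam >= 1.0:                 # step too short: extrapolate
            if lam > 1.0 / w:
                return lam
            lam = 2.0 * lam
        else:                            # bracket between lam and plam
            lam = 0.5 * (lam + plam)
    return lam
```

On a simple quadratic with the full Newton step the routine accepts λ = 1 immediately, since there ψ(1) = 1/2.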

We write the Choleski decomposition of H as

    H = R^T R   (A.5)

where R is an upper triangular matrix. Thus we require to find R*

such that

    R*^T R* = H*   (A.6)

where H* is given by either

70


.^«TT
(i) H = (l + uy^rRd+yu^ , or

(ii) H* = (I + uyT) {RTR + t T w1}(I + yux) .

The second case can be reduced to the first if we vrite

ftT
rR = RXR+ST w" (A.7)

To calculate R note that


R
RTR+5TvyT = [RTl/^v]
/fFvT

= [RT | /? v]QTCi (A.8)


/TT v1

where Q is orthogonal. Thus we seek an orthogonal matrix Q such

that
r R 1 r.1
R
(A.9)
Q
1 0
/Fv
Let W(i,j,{p,q}) be the plane rotation such that W(i, J, {p,q])A

combines the i-th and j-th rows of A , and reduces A^ to zero. It

is necessary that p be either i or j . Then Q is given explicitly

hy
(A.10)
Q ="n'w(i,n+l,[nfl,i})
i=n

It is readily verified that the zero introduced by each transformation


is preserved by the subsequent transformations provided they are carried

out in the order indicated.

71


Consider now the problem of constructing the Choleski decomposition

of S^T S where S = T + a b^T , and T is upper triangular. This corresponds

to our problem with the identifications T = R or R̂ ,

a = Ry or R̂y , and b = u . In this case the decomposition is done in

two stages. Our method uses ideas due independently to Stoer, Golub,

and Gill and Murray.

(i) We determine an orthogonal matrix Q_1 such that

    Q_1 a = ||a|| e_n .   (A.11)

If we set

    Q_1 = Π_{i=1}^{n-1} W(i, n, {i,*})   (A.12)

where the * indicates that the rotation is defined by being applied

to zero an element of a vector, then Q_1 S = Q_1 T + ||a|| e_n b^T differs from

an upper triangular matrix only in having possible nonzero elements in

the last row.

(ii) To complete the determination of R* we sweep out the elements

in the first (n-1) places in the last row of Q_1 S by plane rotations.

Thus R* is given by

    R* = Q_2 (Q_1 T + ||a|| e_n b^T)   (A.13)

where

    Q_2 = Π_{i=n-1}^{1} W(i, n, {n,i}) .   (A.14)

It will be seen that the updating of the Choleski factorization can

be carried out very cheaply. Depending on the update formula used, the

major cost is either 2n or 3n plane rotations. It should be noted that

72

    y^T H y = ||Ry||^2 = ||a||^2   (A.15)

is required in the update formula. Thus ||a|| can be already available.
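The sweep (A.8)-(A.10) can be sketched as follows for the positive case R^T R + σvv^T with σ > 0. This is an illustrative transcription of ours (names and ordering of the rotations are our choices), not the program used for the experiments:

```python
import numpy as np

def chol_rank1_update(R, v, sigma):
    """Return upper-triangular Rhat with Rhat^T Rhat = R^T R + sigma*v v^T
    (sigma > 0), by appending the row sqrt(sigma)*v^T and sweeping it out
    with plane rotations, in the spirit of (A.8)-(A.10)."""
    n = R.shape[0]
    M = np.vstack([R, np.sqrt(sigma) * v[None, :]])   # (n+1) x n
    for i in range(n):
        a, b = M[i, i], M[n, i]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        row_i, row_n = M[i].copy(), M[n].copy()
        M[i] = c * row_i + s * row_n      # combine rows i and n+1,
        M[n] = -s * row_i + c * row_n     # zeroing the (n+1, i) element
    return M[:n]                          # last row is now zero
```

The rotations leave M^T M unchanged, so the returned factor satisfies (A.7) exactly up to rounding.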

2. Numerical Results

In this section we report the results of numerical experiments

carried out to test some of our techniques. We consider four line search

strategies:

(i) a standard cubic interpolation procedure with λ = 1 as initial

search interval,

(ii) a standard cubic interpolation procedure with λ given by the

step to the minimum in the previous line search,

(iii) a strategy for satisfying the Goldstein condition in which λ

is reduced by the factor 1/8 if ψ < σ , and

(iv) the method for satisfying the Goldstein condition given in the

previous section.

Product form updating for the Choleski factorization has been implemented

for both the H and G update formulae, and the results obtained for each are

given.

The problems considered include:

(i) Hilbert: Minimization of a quadratic form with matrix given by

the Hilbert matrix of order 5 . Here

    F = Σ_{i,j=1}^{5} (x_i - 1)(x_j - 1)/(i + j - 1) ,

and the starting point is given by

    x_i = -1/i , i = 1,2,...,5 .

73

(ii) Banana(n): The Banana function in the cases n = 2 (the

Rosenbrock function) and n = 8 . Here

    F = Σ_{i=1}^{n-1} (100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2) ,

and the starting point is given by

    x_i = -1.2 if i odd, otherwise x_i = 1 .

(iii) Woods: Here

    F = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 + 90(x_4 - x_3^2)^2

        + (1 - x_3)^2 + 10.1((1 - x_2)^2 + (1 - x_4)^2)

        + 19.8(1 - x_2)(1 - x_4) ,

and the starting point is

    x^T = (-3, -1, -3, -1) .

(iv) Singular: Powell's singular function is designed to test the

performance of algorithms on a function with a singular Hessian

at the solution. Here

    F = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4 ,

and the starting point is

    x^T = (3, -1, 0, 1) .

(v) Helix: Here we define

    R = (x_1^2 + x_2^2)^{1/2} ,

    T = (1/2π) arctan(x_2/x_1)           if x_1 > 0 ,

      = (1/2π) arctan(x_2/x_1) + 1/2     if x_1 < 0 ,

74

and set

    F = 100((x_3 - 10T)^2 + (R - 1)^2) + x_3^2 .

The starting vector is

    x^T = (-1, 0, 0) .
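For reference, the Banana family and its standard starting point can be written down directly; the sketch below is ours (function names are illustrative), and the value 24.2 at the classical n = 2 start is the standard check.

```python
import numpy as np

def banana(x):
    """Banana (generalized Rosenbrock) function of the text."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2
                        + (1.0 - x[:-1]) ** 2))

def banana_start(n):
    """Starting point: x_i = -1.2 for i odd (1-based), x_i = 1 otherwise."""
    x = np.ones(n)
    x[::2] = -1.2          # 1-based odd positions are 0-based even indices
    return x

assert banana(np.ones(8)) == 0.0                     # minimum at x = (1,...,1)
assert abs(banana(banana_start(2)) - 24.2) < 1e-12   # classical Rosenbrock start
```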

Numerical results are given in Table A.1. For historical reasons,

the test for terminating the calculations was based on the size of ||s_i||

( ||s_i|| < EPS/n with EPS = 10^{-8} ). This proved reasonably satisfactory

for all cases except the singular function — in fact in all other cases

the ultimate convergence was clearly superlinear, and the results were

accordingly only marginally affected by the size of EPS. In the case

of Singular the convergence test proved difficult to satisfy in most

cases (indicated by * in Table A.1), and these computations were

terminated by the number of iterations exceeding the specified limit.

However, in all cases the answers were correct to at least six decimal

places. There is some variation in the H and G columns. This shows

the effect of rounding error, as these would be identical in exact

arithmetic. The most interesting case is the H column in both cases

of the Banana(8) when satisfying the Goldstein condition. In these

cases both the H and G formulae produce very similar results until the

10-th iteration, at which point the H formulae produce much larger

reductions in F than do the G . However, this progress is not maintained,

and at the 20-th iteration (in the case of the line search

algorithm of Section 1) the H matrix becomes singular and the iteration

is terminated. A restart procedure could have been used at this stage.

The numerical results indicate that the new algorithm is promising.

In general, although more iterations are required, we make significantly

75


fewer function evaluations in comparison with the routine using a

standard line search. As only one derivative evaluation is required in

each iteration, the real saving can be considerable. We note that on

the basis of the evidence presented it is not possible to draw conclusions

as to the relative values of the H and G algorithms. However, that

both manage to produce very comparable results provides some evidence of

their stability.

The program which gave the results presented here is coded in

ALGOL W for the IBM 360/67 at Stanford University. The calculations

were carried out using long precision (14 hexadecimal digits).

A FORTRAN version of the program has been developed at the Australian

National University.

76


Table A.1. Iteration and function evaluation counts for the test problems

(Hilbert, Banana(2), Banana(8), Woods, Singular, Helix) under the four line

search strategies, for both the H and G update formulae. [The body of the

table is illegible in this copy; entries marked * denote runs terminated by

the iteration limit.]

77

III. Barrier and Penalty Function Methods

78

1. Basic properties of barrier functions.

Consider the inequality constrained problem (ICP)

    min f(x)

subject to g_i(x) ≥ 0 , i = 1,2,...,m (i ∈ I_1) ,

where we assume (as before) that f , g_i , i ∈ I_1 , are in C^1 . We also

assume that S = {x ; g_i(x) ≥ 0 , i ∈ I_1} is compact, has a nonvoid interior

S_0 , and satisfies the regularity condition that every neighborhood of points

of S contains points of S_0 (this precludes S having 'whiskers'). If

x ∈ S and g_i(x) = 0 for some i then it is assumed that x ∉ S_0 .
i ~ ~r 0

Definition: φ(g(x)) is a barrier function for S if the following

conditions are satisfied.

(i) φ > 0 , x ∈ S_0 . If X is a closed set, X ⊂ S_0 , then φ ∈ C^1 on X .

(ii) φ → ∞ as g_i → 0 , i ∈ I_1 .

(iii) ∂φ/∂g_i < 0 if g_i < ρ_i , where the ρ_i , i ∈ I_1 , are fixed

positive constants.

(iv) |∂φ/∂g_i| bounded on N(x,δ) if g_i > 0 on N(x,δ) .

Example: (i) φ = Σ_{i=1}^m 1/g_i(x) (inverse barrier function).

(ii) φ = Σ_{i=1}^m (log(1 + g_i(x)) - log g_i(x)) .

Remark: In the second example the term with argument 1 + g_i(x) merely

ensures that the positivity condition is satisfied. It could be

replaced by a bound k_i for log(1 + g_i(x)) on S if this is known. In

79


practice it is of no consequence. The barrier function

    φ = Σ_{i=1}^m (k_i - log g_i(x))

is called the log barrier function.

Definition: T(x,r) is a barrier objective function if

    T(x,r) = f(x) + r φ(g(x))   (1.1)

where r > 0 .

Lemma 1.1: ∃ x = x(r) ∈ S_0 such that T(x(r),r) = min_{x∈S} T(x,r) .

Proof: T(x,r) is bounded below on S , and T(x,r) → +∞ as x → ∂S . □
Lemma 1.2: Let {r_j} ↓ 0 , and let x(r_j) = x_j . Then

(i) {T(x_j,r_j)} is strictly decreasing,

(ii) {f(x_j)} is nonincreasing, and

(iii) {φ(x_j)} is nondecreasing.
Proof: Let r_j < r_i ; then, using the minimizing property of x_j and x_i ,

    T(x_j,r_j) = f(x_j) + r_j φ(g(x_j)) ≤ f(x_i) + r_j φ(g(x_i))

                                       < f(x_i) + r_i φ(g(x_i)) = T(x_i,r_i) ,

and also T(x_i,r_i) ≤ f(x_j) + r_i φ(g(x_j)) . This demonstrates (i). Subtracting

the inside and outside inequalities gives

    (r_j - r_i) φ(g(x_j)) ≤ (r_j - r_i) φ(g(x_i)) ,

80

whence φ(g(x_j)) ≥ φ(g(x_i)) , which gives (iii). From the first inequality

we then have f(x_j) ≤ f(x_i) , which gives (ii). □

Remark: If T(x,r) is strictly convex, then all inequalities are

strict.

Theorem 1.1: The sequence {T(x_i,r_i)} converges, and

    lim_{i→∞} T(x_i,r_i) = min_{x∈S} f(x) .

Proof: By Lemma 1.2, {T(x_i,r_i)} is decreasing and bounded below and

hence convergent. Let f* = min_{x∈S} f(x) ; then

    T(x_i,r_i) ≥ f(x_i) ≥ f*

whence

    lim_{i→∞} T(x_i,r_i) ≥ f* .   (1.2)

Now let ε > 0 be given. Choose x̄ ∈ S_0 such that f(x̄) - f* < ε/2 (this

is possible because of the regularity condition on S ), and choose r_i

such that r_i φ(g(x̄)) < ε/2 . Then

    min_{x∈S} T(x,r_i) ≤ T(x̄,r_i) < f* + ε

whence

    lim_{i→∞} T(x_i,r_i) ≤ f* . □   (1.3)

Corollary 1.1: The limit points of {x_i} are local minima of the ICP.

81
I^I^^^^^^WWWWPWIHWWilWPPB^^ ,
!! WWWBWPIBHIIIWI

Remark: The generality of these results should be noted. For example,

we have not required S to be Lagrange regular at the limit points

of {x_i} .
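The behaviour described by Lemmas 1.1-1.2 and Theorem 1.1 can be followed on a one-dimensional example: min f(x) = x subject to g(x) = x - 1 ≥ 0 with the inverse barrier. The example and names below are our own illustrative choices; here x(r) is available in closed form.

```python
import numpy as np

def T(x, r):
    """Barrier objective T(x,r) = f(x) + r*phi(g(x)) for the example
    min f(x) = x subject to g(x) = x - 1 >= 0, inverse barrier phi = 1/g."""
    return x + r / (x - 1.0)

# dT/dx = 1 - r/(x-1)^2 = 0 gives the interior minimizer x(r) = 1 + sqrt(r),
# so T(x(r), r) = 1 + 2*sqrt(r): strictly decreasing to f* = 1 as r -> 0.
rs = [1.0, 0.1, 0.01, 1e-4, 1e-6]
Ts = [T(1.0 + np.sqrt(r), r) for r in rs]

assert all(a > b for a, b in zip(Ts, Ts[1:]))   # {T(x_i, r_i)} strictly decreasing
assert abs(Ts[-1] - 1.0) < 3e-3                 # converging to f* = 1
```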

Definition: Q(x,r) is a separable barrier objective function if

    Q(x,r) = f(x) + Σ_{i=1}^m r_i φ_i(g_i(x)) = f(x) + r^T φ(x)   (1.4)

where r > 0 , and φ_i is a barrier function for S_i = {x ; g_i(x) ≥ 0} ,

i = 1,2,...,m .

The previous results are readily extended to this case and are
summarized in the following theorem.

Theorem 1.2: Let r_k > r_{k+1} , k = 1,2,... , and lim r_k = 0 . Then

(i) min_{x∈S} Q(x,r_k) is attained for some x_k ∈ S_0 ,

(ii) {Q(x_k,r_k)} is strictly decreasing, {f(x_k)} is nonincreasing, and

(iii) lim_k Q(x_k,r_k) = f* , and the limit points of {x_k} are local

minima of the ICP.

Remark; Given a sequence of positive vectors tending to zero then it


is possible to select a subsequence which is strictly decreasing.
Conclusions (i) and (iii) remain valid in this more general case.

82

2. Multiplier relations (first order analysis)

In this section we assume sequences {r_k} ↓ 0 , {x_k} → x* . The

condition that T(x,r_k) is stationary at x_k gives

    ∇T(x_k,r_k) = ∇f(x_k) + r_k Σ_{i=1}^m (∂φ/∂g_i) ∇g_i(x_k)

               = ∇f(x_k) - Σ_{i=1}^m u_i^k ∇g_i(x_k) = 0   (2.1)

where u_i^k = -r_k (∂φ/∂g_i)(g(x_k)) . Note that u_i^k → 0 (r_k → 0)

if i ∉ B_0 (the set of constraints binding at x* ), and u_i^k > 0 for

i ∈ B_0 and k sufficiently large, by the conditions

defining barrier functions. Equation (2.1) is formally similar to the

multiplier relations given earlier (MP(3.2)), and it is comparatively

straightforward to deduce these relations from (2.1) in certain special

cases. We assume that B_0 = {1,2,...,t} , that the rank of the system

of vectors ∇g_i(x*) , i ∈ B_0 , is s ≤ t , and that ∇g_1(x*),...,∇g_s(x*)

are linearly independent. We define matrices C_1(x) , C_2(x) by taking

the i-th column of C_1(x) to be ∇g_i(x)^T , i = 1,2,...,s , and the i-th

column of C_2(x) to be ∇g_{s+i}(x)^T , i = 1,2,...,t-s , and vectors

u_1^T = {u_1,...,u_s} , u_2^T = {u_{s+1},...,u_t} .

Lemma 2.1: If {u^k} is bounded then the Kuhn-Tucker conditions hold at x* .

Proof: From (2.1) we have

    ∇f(x_k)^T = C_1(x_k) u_1^(k) + C_2(x_k) u_2^(k) + O(r_k) .          (2.2)

The linear dependence of the set of vectors ∇g_i(x*) , i ∈ B_0 , gives

    C_2(x*) = C_1(x*) R                                                  (2.3)

so that (2.2) can be written

    ∇f(x_k)^T = C_1(x_k){u_1^(k) + R u_2^(k)} + {C_2(x_k) − C_1(x_k)R} u_2^(k) + O(r_k) .   (2.4)

Provided k is large enough the rank of C_1(x_k) will be s (see MP Remark following Lemma 3.3). Thus

    u_1^(k) + R u_2^(k) = {C_1(x_k)^T C_1(x_k)}^{-1} C_1(x_k)^T {∇f(x_k)^T − (C_2(x_k) − C_1(x_k)R) u_2^(k) + O(r_k)} .   (2.5)

As u_2^(k) is bounded we conclude from (2.5)

(i) u_1^(k) is bounded, and

(ii) lim_{k→∞} u_1^(k) + R u_2^(k) = {C_1(x*)^T C_1(x*)}^{-1} C_1(x*)^T ∇f(x*)^T .

As {u_1^(k)} , {u_2^(k)} are bounded and nonnegative (at least for k large enough), this property is shared by the limit points of the sequences. Consider subsequences tending to u_1* , u_2* respectively. From (2.2) we have

    ∇f(x*)^T = C_1(x*) u_1* + C_2(x*) u_2*

or

    ∇f(x*) = Σ_{i∈B_0} u_i* ∇g_i(x*) .                                  (2.6)

Thus the Kuhn-Tucker conditions are satisfied. □

Corollary 2.1: If the Kuhn-Tucker conditions do not hold at x* , then {u_1^(k)} , {u_2^(k)} are unbounded.

Corollary 2.2: If restraint condition A holds then the ∇g_i(x_k) are linearly independent for i ∈ B_0 . In this case {u_2^(k)} is null, and {u^(k)} converges. If restraint condition A holds then the multipliers in the Kuhn-Tucker conditions are uniquely determined.

Lemma 2.2: If restraint condition B holds then {u^(k)} is bounded.

Remark: By MP Lemma 5.2, this implies that {u^(k)} is bounded for the convex programming problem provided S has an interior.

Proof: If restraint condition B holds then ∃ d such that ∇g_i(x*) d > 0 , i = 1,2,...,t . From (2.2) we have

    Σ_{i=1}^t u_i^k ∇g_i(x_k) d = ∇f(x_k) d + O(r_k) .                  (2.7)

As ∇g_i(x) d is a continuous function, we must have u_i^k > 0 and ∇g_i(x_k) d > 0 , i = 1,2,...,t , provided k is large enough. Thus each term on the left of (2.7) is nonnegative, and (2.7) gives

    Σ_{i=1}^t u_i^k min_{1≤i≤t} ∇g_i(x_k) d ≤ ∇f(x_k) d + O(r_k) .     (2.8)

This relation shows that the u_i^k are bounded as k → ∞ . □

Remark: The results of the first section showed that convergence of barrier function algorithms can be proved under very few assumptions. The results of this section show that valuable structural information on the problem is available as a by-product of the computation. Note that the condition that the u_i^k be bounded is a weaker restraint condition than either A or B.


3. Second order conditions.


Consider now a barrier function 0 and a sequence (r.) i 0 such
that {x. } -♦ x , {u. } -» u . It is convenient to assume the following
properties which are satisfied by all barrier functions of practical
interest.
o

1 og^

r
(11) k 2 W -, + BB
* k -• , leB0 . (But see Example U(ii) p. 100
6
i for qualification.)

Lemma 3.1: If the matrices ∇²T(x_k, r_k) are positive definite for k > K and the ∇g_i(x*) , i ∈ B_0 , are linearly independent, then

    v^T ∇²L(x*, u*) v ≥ 0 , ∀ v ≠ 0 such that ∇g_i(x*) v = 0 , ∀ i ∈ B_0 .

Proof: Differentiating T(x, r_k) twice gives

    ∇²T(x_k, r_k) = ∇²L(x_k, u^k) + C(x_k) D_k C(x_k)^T + Σ_{i∈I_1−B_0} r_k (d²φ/dg_i²) ∇g_i(x_k)^T ∇g_i(x_k)   (3.1)

where C(x) has columns ∇g_i(x)^T , i ∈ B_0 , and D_k is diagonal with (D_k)_{ii} = r_k (d²φ/dg_i²) . Let

    P_k = C(x_k){C(x_k)^T C(x_k)}^{-1} C(x_k)^T                          (3.2)

then, for arbitrary nonzero v such that (I − P_k) v ≠ 0 ,

    0 < v^T (I − P_k) ∇²T(x_k, r_k) (I − P_k) v
      = v^T (I − P_k) ∇²L(x_k, u^k) (I − P_k) v + o(1)

as k → ∞ . The desired result follows from this on letting k → ∞ . □

Corollary 3.1: If in addition to the conditions of Lemma 3.1 we have also strict complementarity, then the second order sufficiency conditions (the conditions of MP Lemma 4.3) hold at x* .

Remark: The problem of generalizing this result to the case where the active constraint gradients are not linearly independent is the following. In general, when k < ∞ , rank[C_1(x_k) | C_2(x_k)] > s . Thus

    V_k = {v ; ∇g_i(x_k) v = 0 , ∀ i ∈ B_0} ⊂ U_k = {v ; v = (I − P_k) u , u ∈ E^n} .

We have

    lim_{k→∞} U_k = V* = {v ; ∇g_i(x*) v = 0 , ∀ i ∈ B_0} .

It is not difficult to construct examples in which lim V_k is a proper subset of V* . Consider C_1(x_k) = e_1 , C_2(x_k) = e_1 + ε(x_k − x*) e_2 , where ε(x_k − x*) → 0 but ε ≠ 0 for x_k ≠ x* . Then

    V_k = {v ; e_1^T v = e_2^T v = 0} = lim V_k ⊂ V* = {v ; e_1^T v = 0} .

The argument of Lemma 3.1 shows only that v^T ∇²L(x*, u*) v ≥ 0 for v ∈ lim V_k .

Lemma 3.2: Let

    W = U + γ V V^T ,
    N = {t ; ||t|| = 1 , V^T t = 0} ,
    M = {u ; ||u|| = 1 , u ∈ N^⊥} ,
    ν = min_{t∈N} t^T U t > 0 ,  σ = min_{u∈M} u^T U u ,  μ = min_{u∈M} ||V^T u|| > 0 ,
    η = min_{t∈N, u∈M} t^T U u ,  ρ = min(0, η) ,

then W is positive definite provided

    γ > (ρ² + ν(ν − σ)) / (ν μ²) .                                       (3.3)

Proof: Any unit vector w can be written

    w = α u + β t  where  u ∈ M , t ∈ N , and α² + β² = 1 .

Thus

    w^T W w = α² u^T U u + 2αβ u^T U t + β² t^T U t + γ α² ||V^T u||²

            ≥ α²(σ + γμ² − ν) + 2|α|(1 − α²)^{1/2} ρ + ν

            ≥ α²(σ + γμ² − ν) + 2|α| ρ + ν                               (3.4)
88

as ρ ≤ 0 . Provided γ > (ν − σ)/μ² (a weaker condition than (3.3)), the right hand side has a minimum at |α| = −ρ/(σ + γμ² − ν) ≥ 0 . The value at this minimum is ν − ρ²/(σ + γμ² − ν) , which is positive if (3.3) holds. □

Corollary 3.2: If W = U + V D V^T where D is diagonal, and if the conditions of Lemma 3.2 hold, then W is positive definite provided

    min_i D_ii > (ρ² + ν(ν − σ)) / (ν μ²) .

If D is positive definite the result holds provided the smallest eigenvalue of D satisfies this inequality.

Proof: We have

    w^T V D V^T w = α² u^T V D V^T u ≥ α² (min_i D_ii) ||V^T u||² ≥ α² (min_i D_ii) μ² .

The result now follows as from equation (3.4) above. □

Lemma 3.3: If the second order sufficiency conditions hold at x* then ∇²T(x_k, r_k) is positive definite for k large enough.

Proof: We have

    V* ⊂ U* = {v ; ∇g_i(x*) v = 0 , ∀ i ∈ B_0 such that u_i* > 0} .

Thus the second order sufficiency conditions imply that v^T ∇²L(x*, u*) v > 0 , ∀ v ∈ V* such that v ≠ 0 . From Corollary 3.2 it follows that ∃ D_0 such that

    ∇²L(x*, u*) + [C_1(x*) | C_2(x*)] D [C_1(x*) | C_2(x*)]^T

is positive definite for D > D_0 . By continuity this implies that the corresponding matrix evaluated at x_k is positive definite for k large enough. The desired result follows from this as (D_k)_{ii} → ∞ , i = 1,2,...,t , as k → ∞ . □

Lemma 3.4: If U is nonsingular, D diagonal, and V of full rank, then the system of linear equations

    [U + V D V^T] x = V y                                                (3.5)

has the solution

    x = U^{-1} V (I + M)^{-1} M y                                        (3.6)

where M = (V^T U^{-1} V)^{-1} D^{-1} , provided I + M is nonsingular. A sufficient condition for I + M nonsingular is ||M|| < 1 , which is satisfied if min_i |D_ii| is large enough.

Proof: The result follows on substituting (3.6) into (3.5). □

Remark: From (3.6) it follows that

    x ≈ U^{-1} V (V^T U^{-1} V)^{-1} D^{-1} y                            (3.7)

as min_i |D_ii| → ∞ .

Corollary 3.3: If the right hand side of equation (3.5) is z , a general vector, then the solution is given by

    x = U^{-1} V (I + M)^{-1} M (V^T U^{-1} V)^{-1} V^T U^{-1} z
        + U^{-1} (I − V (V^T U^{-1} V)^{-1} V^T U^{-1}) z .              (3.8)
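The solution formula of Lemma 3.4, as reconstructed here, is easy to check numerically against a direct solve; the random data below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 6, 2
A = rng.standard_normal((n, n))
U = A @ A.T + n * np.eye(n)          # nonsingular (positive definite) U
V = rng.standard_normal((n, t))      # full column rank V
D = np.diag([50.0, 80.0])            # diagonal D with large entries
y = rng.standard_normal(t)

Uinv = np.linalg.inv(U)
M = np.linalg.inv(V.T @ Uinv @ V) @ np.linalg.inv(D)
x = Uinv @ V @ np.linalg.inv(np.eye(t) + M) @ M @ y   # formula (3.6)

x_direct = np.linalg.solve(U + V @ D @ V.T, V @ y)    # direct solution of (3.5)
print(np.allclose(x, x_direct))   # True
```

Since M is small when the entries of D are large, the same experiment also confirms the asymptotic form (3.7).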


4. Rate of convergence results.

In this section, rate of convergence estimates for barrier function algorithms are considered. Unless stated otherwise the conditions imposed in Section 3 are assumed, together with the condition that ||∇g_i(x*)|| ≠ 0 , i ∈ B_0 .

Lemma 4.1: Provided {u^k} is bounded, then

    f(x_k) − f(x*) = Σ_{i∈B_0} u_i^k g_i(x_k) + O(max{||x_k − x*||², r_k ||x_k − x*||}) .   (4.1)

Proof: The result follows by taking the scalar product of (2.2) with x_k − x* and identifying terms in the Taylor series expansion. □

Definition: We say that u_k is SO(v_k) (strict order v_k) provided

(i) u_k = O(v_k) , and

(ii) ∃ K < ∞ and ρ > 0 such that |u_k| ≥ ρ |v_k| for k > K .

Remark: (4.1) gives an error estimate provided the remainder term is small. A sufficient condition for this is

    f(x_k) − f(x*) = SO(||x_k − x*||) .                                  (4.2)

This implies that for at least one i , u_i^k g_i(x_k) = SO(||x_k − x*||) . If (4.2) does not hold then for i = 1,2,...,t either

(i) u_i^k → 0 , k → ∞ , or

(ii) g_i(x_k) = o(||x_k − x*||) .


If (ii) holds then the approach of x_k to x* is tangential to the surface g_i(x) = 0 at x = x* .

Lemma 4.2: If the ICP is convex, then

(i) x_k , u^k are dual feasible, and

(ii) f(x_k) − f(x*) ≤ Σ_{i=1}^m u_i^k g_i(x_k) .

Proof: The dual feasibility is a consequence of (2.1) and assumption (i) of Section 3. This follows directly from Wolfe's form of the duality theorem (MP Corollary 5.1). We have

    f(x*) ≥ L(x_k, u^k) = f(x_k) − Σ_{i=1}^m u_i^k g_i(x_k)             (4.3)

which demonstrates the second part of the desired result. □

Example: For i ∈ B_0 let g_i(x_k) = SO(||x_k − x*||) , u_i* > 0 .

(a) Inverse barrier function. We have

    u_i^k = r_k / g_i(x_k)²

whence

    g_i(x_k) = (r_k / u_i^k)^{1/2} .

This gives

    ||x_k − x*|| = O(r_k^{1/2}) .

(b) Log barrier function. In this case

    u_i^k = r_k / g_i(x_k)

which gives
92

mMmMji^mv^i^tMijAi^^^ ■.:..-. ■^M..^..r..^M.^^.... ^.y.^...,^:.^^^.^^..^^^^!^.,^ .ÄÜÜlteJÜti:


mammmmm^mmmmmmmmninmß i i m*im*^*mi^^*mm*^**'*v^^mmm

    g_i(x_k) = r_k / u_i^k

and

    ||x_k − x*|| = O(r_k) .
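The two orders are visible already in one dimension. In this sketch (the toy problem is an assumption for illustration) we take: minimize f(x) = x subject to x ≥ 0, for which the inverse barrier objective x + r/x is stationary at x = r^{1/2} while the log barrier objective x − r log x is stationary at x = r:

```python
import math

# minimize f(x) = x subject to x >= 0; x* = 0, u* = 1 (assumed toy problem)
for k in [2, 4, 6]:
    r = 10.0 ** (-k)
    x_inv = math.sqrt(r)   # inverse barrier: zero of 1 - r/x**2, error O(r**0.5)
    x_log = r              # log barrier:     zero of 1 - r/x,    error O(r)
    print(r, x_inv, x_log)
```

For each r the log-barrier iterate is much closer to x* = 0, matching the O(r_k^{1/2}) versus O(r_k) estimates above.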

Thus the strict order condition permits us to deduce a rate of convergence result. We now show that the SO condition is equivalent to the condition of strict complementarity for the inverse and log barrier functions. In these cases the remark following Lemma 4.1 gives us a geometric interpretation of strict complementarity.

To discuss this equivalence, consider the following system of equations, which defines x_k and u^k as functions of r_k :

    ∇f(x)^T − Σ_{i=1}^m u_i ∇g_i(x)^T = 0

and

    u_i / (−dφ/dg_i) = r_k ,  i = 1,2,...,m .                            (4.4)

If the Jacobian H(x,u) of this system with respect to x , u (or an appropriate transform of it) is nonsingular, then we can study the behavior of x(r) , u(r) as a function of r by integrating the system of differential equations

    H(x,u) [dx/dr ; du/dr] = [0 ; e]                                     (4.5)

where e^T = {1,1,...,1} . We have

    H(x,u) = [ ∇²L(x,u)                                −∇g_1(x)^T ... −∇g_m(x)^T ]
             [ u_i {(d²φ/dg_i²)/(dφ/dg_i)²} ∇g_i(x)    diag{1/(−dφ/dg_i)}       ]   (4.6)

where the i-th of the last m rows is displayed. Let D be the diagonal matrix

    D = diag{I_n ; w_1, ..., w_m}                                        (4.7)

where

    w_i = (dφ/dg_i)² / (d²φ/dg_i²) ,                                     (4.8)

then

    DH = [ ∇²L(x,u)       −∇g_1(x)^T ... −∇g_m(x)^T       ]
         [ u_i ∇g_i(x)    diag{−(dφ/dg_i)/(d²φ/dg_i²)}    ]              (4.9)


Provided −(dφ/dg_i)/(d²φ/dg_i²) → 0 , k → ∞ , i ∈ B_0 , then DH will be nonsingular for k large enough provided J(x*) is nonsingular (see MP Lemma 4.4). This implies

(i) the second order sufficiency conditions hold,

(ii) the active constraint gradients are linearly independent, and

(iii) strict complementarity holds.

In this case

    DH [dx/dr ; du/dr] = [0 ; w]                                         (4.10)

whence

    [x_k − x* ; u^k − u*] = ∫_0^{r_k} (DH)^{-1} [0 ; w] dr               (4.11)

provided the components of w are integrable.

Example: (i) Inverse barrier function. We have

    dφ/dg_i = −1/g_i² ,  d²φ/dg_i² = 2/g_i³ ,

so that −(dφ/dg_i)/(d²φ/dg_i²) = g_i/2 → 0 . In particular, it follows that DH(x_k) is nonsingular for k large enough, while

    w_i = 1/(2 g_i) = (1/2)(u_i/r)^{1/2} .


Thus (4.10) becomes

    [dx/dr ; du/dr] = (DH)^{-1} [0 ; u^{1/2}/(2 r^{1/2})]                (4.12)

whence, changing to r^{1/2} as independent variable,

    [dx/d(r^{1/2}) ; du/d(r^{1/2})] = (DH)^{-1} [0 ; u^{1/2}] .          (4.13)

Thus we have

    [x(r) − x* ; u(r) − u*] = r^{1/2} [J(x*)]^{-1} [0 ; (u*)^{1/2}] + o(r^{1/2}) .   (4.14)

(ii) Log barrier function. In this case

    w_i = 1

so that (4.10) becomes

    [dx/dr ; du/dr] = (DH)^{-1} [0 ; e] .                                (4.15)


In particular, x(r) , u(r) inherit the differentiability properties of f and g_i , i = 1,...,m , for r small enough (if f, g ∈ C^p then x, u ∈ C^{p−1} ). Thus

    [x(r) − x* ; u(r) − u*] = r J(x*)^{-1} [0 ; e] + O(r²) .             (4.16)

Remark: These results can also be derived by differentiating (2.1) with respect to r . We obtain

    ∇²T(x(r), r) (dx/dr)^T = − Σ_{i=1}^m (dφ/dg_i) ∇g_i(x(r))^T .       (4.17)

In this case Lemma 3.3 guarantees that ∇²T(x(r), r) is positive definite for r small enough, and Corollary 3.3 can be used to give the solution to (4.17). Note that (4.17) results if du/dr is eliminated from (4.10).

We can now proceed to the main result.

Theorem 4.1: Provided J(x*) is nonsingular, the strict complementarity and strict order conditions are equivalent for the inverse and log barrier functions.

Proof: The argument is essentially the same for both barrier functions, but is simplest for the log function. Thus only this case is considered here. For the log barrier function u_i^k = r_k/g_i(x_k) , so that


    f(x_k) − f(x*) = Σ_{i∈B_0} u_i^k g_i(x_k) + o(||x_k − x*||)
                   = t r_k + o(||x_k − x*||) .                           (4.18)

If strict complementarity does not hold, then for at least one i ∈ B_0

    lim_{k→∞} r_k / g_i(x_k) = 0 .

As g_i(x_k) is O(||x_k − x*||) at most, it follows that r_k = o(||x_k − x*||) and hence, from (4.18), that f(x_k) − f(x*) = o(||x_k − x*||) . Thus the SO condition does not hold.

If strict complementarity does hold, then asymptotically for small r

    r / g_i(x(r)) = u_i(r) → u_i* > 0 ,  i ∈ B_0 ,

while g_i(x(r)) ≤ ||∇g_i(x*)|| ||x(r) − x*|| + o(||x(r) − x*||) . Thus r = O(||x(r) − x*||) . As J(x*) is nonsingular, (4.16) holds, and this implies (as r = O(||x(r) − x*||)) that

    ||x(r) − x*|| ≤ K r + O(r²)

for some K > 0 . This shows that r = SO(||x(r) − x*||) so that, by (4.18), the strict order condition is satisfied. □

Remark: The above argument shows that if strict complementarity does not hold then the strict order condition cannot. The condition that J(x*) be nonsingular is required only for the second part of the theorem.


Example: (i) minimize x_1 + x_2

    subject to −x_1² + x_2 ≥ 0 , x_1 ≥ 0 .

[Figure 4.1]

From Figure 4.1 it is clear that the minimum is f* = 0 at x_1 = x_2 = 0 , and that strict complementarity holds.

(a) Inverse barrier function:

    T = x_1 + x_2 + r{1/(x_2 − x_1²) + 1/x_1}

    ∂T/∂x_1 = 0 = 1 + r{2x_1/(x_2 − x_1²)² − 1/x_1²}

    ∂T/∂x_2 = 0 = 1 − r/(x_2 − x_1²)²


This gives a pair of equations for x_1 and x_2 as functions of r . We have

    x_1 = r^{1/2} − r + O(r^{3/2}) ,
    x_2 = r^{1/2} + r + O(r^{3/2}) .

(b) Log barrier function:

    T = x_1 + x_2 − r{log(x_2 − x_1²) + log x_1}

    ∂T/∂x_1 = 0 = 1 − r{−2x_1/(x_2 − x_1²) + 1/x_1}

    ∂T/∂x_2 = 0 = 1 − r/(x_2 − x_1²)

Solving for x_1 and x_2 as functions of r gives

    x_1 = r − 2r² + O(r³) ,
    x_2 = r + r² + O(r³) .
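These expansions can be checked numerically. For the inverse barrier, eliminating x_2 via x_2 − x_1² = r^{1/2} leaves x_1²(1 + 2x_1) = r (an assumed reduction of the stationarity conditions above), which bisection solves easily:

```python
import math

def x1_of(r):
    # bisection on the increasing function x**2 * (1 + 2*x) - r on [0, 1]
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid**2 * (1.0 + 2.0 * mid) < r:
            lo = mid
        else:
            hi = mid
    return mid

for r in [1e-2, 1e-4, 1e-6]:
    print(r, x1_of(r), math.sqrt(r) - r)   # computed x1 vs expansion r**0.5 - r
```

The computed x_1 agrees with r^{1/2} − r to O(r^{3/2}), as claimed.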

(ii) minimize x_2

    subject to x_2 − x_1² ≥ 0 , x_1 ≥ 0 .

In this case the minimum is again f* = 0 at x_1 = x_2 = 0 . However, ∇f(0) = e_2^T is orthogonal to ∇g_2(0) = e_1^T . Thus, as both constraints are active at zero, strict complementarity does not hold. Note that the constraint g_2 = x_1 ≥ 0 is redundant, and that the barrier function trajectory is tangential to the constraint surface g_1 = 0 . Note also that the rate of convergence is reduced, and that r_k (d²φ/dg_2²) does not tend to ∞ for the constraint with the zero multiplier.


(a) Inverse barrier function:

    T = x_2 + r{1/(x_2 − x_1²) + 1/x_1}

    ∂T/∂x_1 = 0 = r{2x_1/(x_2 − x_1²)² − 1/x_1²}

    ∂T/∂x_2 = 0 = 1 − r/(x_2 − x_1²)²

whence

    x_1 = (r/2)^{1/3} ,  u_1 = 1 ,  r (d²φ/dg_1²) = 2/r^{1/2} ,

    x_2 = r^{1/2} + (r/2)^{2/3} ,  u_2 = (4r)^{1/3} ,  r (d²φ/dg_2²) = 4 .

(b) Log barrier function:

    T = x_2 − r{log(x_2 − x_1²) + log x_1}

    ∂T/∂x_1 = 0 = r{2x_1/(x_2 − x_1²) − 1/x_1}

    ∂T/∂x_2 = 0 = 1 − r/(x_2 − x_1²)

whence

    x_1 = (r/2)^{1/2} ,  u_1 = 1 ,  r (d²φ/dg_1²) = 1/r ,

    x_2 = 3r/2 ,  u_2 = (2r)^{1/2} ,  r (d²φ/dg_2²) = 2 .


(iii) minimize −x_1

    subject to (1 − x_1)³ − x_2 ≥ 0 , x_2 ≥ 0 .

This is the example used in MP Section 5 (see Figure 5.1). The optimum is f* = −1 at x_1 = 1 , x_2 = 0 . The Kuhn-Tucker conditions do not hold at this point.

(a) Inverse barrier function:

    T = −x_1 + r{1/((1 − x_1)³ − x_2) + 1/x_2}

    ∂T/∂x_1 = 0 = −1 + r{3(1 − x_1)²/((1 − x_1)³ − x_2)²}

    ∂T/∂x_2 = 0 = r{1/((1 − x_1)³ − x_2)² − 1/x_2²}

whence

    1 − x_1 = (12r)^{1/4} ,  x_2 = 2^{1/2} 3^{3/4} r^{3/4} .

In this case u_1(r) = u_2(r) = 1/(6 (3r)^{1/2}) .
""U1-^-^*"2)
vnence

x1 = 1 - 6r ,

x2 = lOSr5 .

In this case u, (r) =u„(r) =- ^


108r

The above examples confirm the predictions of our analysis, and for a given fixed sequence of r_k values effective convergence is attained more rapidly (i.e., for earlier members of the sequence) with the log barrier function.

Now let φ be a barrier function. Then

    φ_1 = log(a + φ) ,  a ≥ 1                                            (4.19)

is a barrier function. Let x_k minimize f + r φ , and let x̂_k minimize f + r φ_1 . The corresponding Lagrange multiplier estimates are

    u_i^k = −r_k (dφ/dg)(g_i(x_k))  and  û_i^k = −r_k (dφ/dg)(g_i(x̂_k)) / (a + φ(g_i(x̂_k))) ;

since both tend to u_i* > 0 while a + φ(g_i(x̂_k)) → ∞ , comparing them gives

    g_i(x̂_k) / g_i(x_k) → 0  as  r_k → 0 .

Essentially this says that g_i(x̂_k) → 0 more rapidly than g_i(x_k) , so that a faster rate of convergence is anticipated for the φ_1 barrier function.


Consider now the sequence of barrier functions defined recursively by

    φ^(1) = log(k − log(g(x))) ,

    φ^(i) = log(a + φ^(i−1)) ,  i = 2,3,... ,  a ≥ 1 .                   (4.20)

In this case the error estimate is

    f(x_k) − f(x*) ≈ Σ_{j∈B_0} r_k / {(k − log(g_j(x_k))) Π_{s=2}^{i} (a + φ^(s−1)(g_j(x_k)))} .   (4.21)

The right hand side of (4.21) tends to zero as i → ∞ , and this suggests that increasingly rapid rates of convergence can be obtained by using barrier functions associated with large values of i .

However, an even more interesting result is possible. This shows that in certain circumstances it is possible to choose a barrier function having the property that the solution to the ICP is approximated arbitrarily closely by the result of a single unconstrained minimization, without requiring r to be taken arbitrarily small. Let

    T^(i)(x,r) = f(x) + r Σ_{j=1}^m φ^(i)(g_j(x)) ,  and

    Q(x,λ) = f(x) + Σ_{j=1}^m λ_j (k − log(g_j(x))) .

Theorem 4.2: Let Q(x,λ) have a unique stationary value (necessarily a minimum) in S_0 for each λ > 0 , and let x^(i) minimize T^(i)(x,r) for i = 1,2,... and fixed r . Then the limit points of {x^(i)} are local minima of the ICP.

Remark: Note that r does not have to be small in this result.

Proof: If x^(i) minimizes T^(i)(x,r) then

    ∇f(x^(i)) − r Σ_{j=1}^m [1/{g_j (k − log g_j) Π_{s=2}^{i} (a + φ^(s−1)(g_j(x^(i))))}] ∇g_j(x^(i)) = 0 ,   (4.22)

and this expression has the form

    ∇Q(x^(i), λ^(i)) = 0                                                 (4.23)

where λ^(i) has the numerical value

    λ_j^(i) = r / {(k − log(g_j(x^(i)))) Π_{s=2}^{i} (a + φ^(s−1)(g_j(x^(i))))} ,  j = 1,2,...,m .   (4.24)

Thus the {x^(i)} also correspond to a sequence minimizing Q(x, λ^(i)) by the assumed uniqueness of these stationary values. Now, as a ≥ 1 , φ^(s) > 0 , s = 1,2,...,i−1 , λ_j^(i) can be made arbitrarily small for each j by choosing i large enough. The desired result is thus a consequence of the remark following Theorem 1.2. □
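A one-dimensional numerical sketch of this phenomenon (the problem, the constants k = 10 and a = 1, and the bisection solver are all assumptions for illustration): for minimize x subject to x ≥ 0, minimizing T^(i)(x,r) = x + r φ^(i)(x) with r held fixed at 1 drives the minimizer x^(i) toward x* = 0 as i grows:

```python
import math

K, A, R = 10.0, 1.0, 1.0   # barrier constants k, a and a FIXED value of r

def phi(i, g):
    # phi^(1) = log(k - log g), phi^(i) = log(a + phi^(i-1))
    v = math.log(K - math.log(g))
    for _ in range(i - 1):
        v = math.log(A + v)
    return v

def dT(i, x):
    # derivative of T^(i)(x) = x + R*phi^(i)(x)
    d = -1.0 / (x * (K - math.log(x)))     # d(phi^(1))/dx
    for s in range(2, i + 1):
        d /= A + phi(s - 1, x)             # chain rule through each extra log
    return 1.0 + R * d

def minimize(i, lo=1e-300, hi=1.0):
    # dT < 0 near 0 and dT > 0 at 1: bisect (geometrically) on dT = 0
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if dT(i, mid) < 0:
            lo = mid
        else:
            hi = mid
    return mid

for i in [1, 2, 4, 8]:
    print(i, minimize(i))
# x^(i) decreases toward x* = 0 even though r stays fixed at 1
```

This illustrates the point of Theorem 4.2: accuracy is bought by moving along the barrier sequence rather than by shrinking r.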


Remark: The conditions of the theorem are satisfied if f(x) is convex, g_i(x) , i = 1,2,...,m , are concave, and strict convexity / concavity holds for at least one of these functions.

In what follows it is convenient to use the superscript i to indicate the appropriate member of the log barrier function sequence (4.20).

Lemma 4.3:

    d²φ^(i)/dg_j² = (p_j^(i)/g_j)(−dφ^(i)/dg_j)                          (4.25)

where

    p_j^(i+1) = p_j^(i) + g_j (dφ^(i+1)/dg_j) ,                          (4.26)

and

    p_j^(i) → 1 ,  g_j → 0 .                                             (4.27)

Proof: Let φ^(0) = −log g ; then

    d²φ^(0)/dg² = (1/g)(−dφ^(0)/dg)                                      (4.28)

so that p^(0) = 1 . Now, differentiating the relation

    dφ^(i+1)/dg = (dφ^(i)/dg)/(a + φ^(i))                                (4.29)

gives

    d²φ^(i+1)/dg² = (d²φ^(i)/dg²)/(a + φ^(i)) − (dφ^(i)/dg)²/(a + φ^(i))²   (4.30)

so that

    d²φ^(i+1)/dg² = (1/g){p^(i) + g (dφ^(i+1)/dg)}(−dφ^(i+1)/dg) .

This demonstrates (4.25) and (4.26); (4.27) follows on noting that p_j^(0) = 1 , and that, from (4.29),

    g_j (dφ^(i)/dg_j) → 0 ,  g_j → 0 ,  i = 1,2,... .  □

A consequence of this lemma is that, provided J(x*) is nonsingular, D^(i) H^(i) is nonsingular for x(r) sufficiently close to x* .

Lemma 4.4: Let J(x*) be nonsingular, and I_1 = B_0 ; then the SO condition is satisfied.

Proof: We have from (4.29) and (4.25) that

    w_j^(i) = (dφ^(i)/dg_j)² / (d²φ^(i)/dg_j²) = (g_j/p_j^(i))(−dφ^(i)/dg_j)   (4.31)

so that w_j^(i) → 0 as r → 0 for j ∈ B_0 . Now D^(i) H^(i) agrees with J up to a diagonal correction P^(i) together with smaller terms, where P^(i) is diagonal with P_jj^(i) = 1/p_j^(i) − 1 → 0 , r_k → 0 , j = 1,2,...,m . This result implies that for k large enough

    ||x_k − x*|| ≤ K Σ_{j∈B_0} u_j^k g_j(x_k) .

The SO condition is an immediate consequence of this inequality. □

Remark: If B_0 is a proper subset of I_1 then w_j^(i) need not tend to zero for j ∉ B_0 . Thus eventually the largest components of w^(i) will be those associated with the inactive constraints. This implies that ||x_k − x*|| = O(r_k) . But u_j^k g_j(x_k) , j ∈ B_0 , is o(r_k) , which suggests that, in general, the SO condition does not apply. This case should be contrasted with the log and inverse cases, where the contributions of the inactive constraints do not dominate in w (in the inverse case the active constraints dominate). We note that the SO condition is only sufficient for (4.1) to provide an error estimate, and numerical experience indicates that it is applicable in calculations with the log sequence. However, the above discussion suggests that to attain the maximum rate of convergence with the members of the log sequence, the inactive constraints should be identified and discarded. A possible way to do this automatically is by the use of a separable barrier objective function

    Q(x, ρ) = f(x) + r_k Σ_{i=1}^m u_i^{k−1} φ(g_i(x))                   (4.32)

where ρ_i = r_k u_i^{k−1} , r_k is the usual barrier parameter, and u^{k−1} is the multiplier estimate obtained from the previous minimization. This objective function has the property of forcing the multiplier estimates

for the inactive constraints to zero at a very fast rate. We have

    u_i^{k+1} = −r_{k+1} u_i^k (dφ/dg_i)(x_{k+1}) ≤ c_i r_{k+1} u_i^k

where c_i is a bound for −(dφ/dg_i)(x_k) , k = 1,2,... .

This choice can also be favorable in the case of nonstrict complementarity. Consider the previous example

    min x_2  subject to  x_2 − x_1² ≥ 0 ,  x_1 ≥ 0 .

Set Q = x_2 − r_k{log(x_2 − x_1²) + u_{k−1} log x_1} . Then ∇Q = 0 gives

    x_2 − x_1² = r_k ,  2 x_1² = r_k u_{k−1} ,

so that

    u_k = r_k u_{k−1}/x_1 = 2^{1/2} r_k^{1/2} u_{k−1}^{1/2} .

Setting r_k = α^k , u_k = 2 α^{β_k} reduces this to

    β_k = β_{k−1}/2 + k/2 .

The solution to this difference equation satisfying the initial condition β_0 = 0 is

    β_k = k − (1 − (1/2)^k) .

From this it follows that u_k = O(r_k) , and hence that x_1 = O(r_k) . Thus, for this example, we are able to obtain results as favorable as those in which strict complementarity holds.

Example: Show that the error estimate (4.1) is valid in this case.
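The difference equation above is easy to iterate directly (α = 0.1 and the starting value u_0 = 1 are assumptions for illustration):

```python
import math

# u_k = sqrt(2 * r_k * u_{k-1}) with r_k = alpha**k; the analysis above
# predicts u_k / r_k -> 2/alpha, i.e. u_k = O(r_k)
alpha, u = 0.1, 1.0
for k in range(1, 20):
    r = alpha ** k
    u = math.sqrt(2.0 * r * u)
    print(k, u / r)
```

The ratio u_k/r_k settles at the constant 2/α, confirming that the weighted barrier drives the multiplier estimate of the redundant constraint to zero at the same rate as r_k.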


There is a penalty to pay for the generality of the barrier function algorithms, and this is a significant burden of calculation associated with each of the successive unconstrained minimizations. This can be explained (at least in part) by looking at the Hessian of the barrier objective function. Experience (in part supported by theoretical results) indicates that the condition number of the Hessian is a good indicator of the degree of difficulty of an unconstrained optimization problem when it is solved by descent methods.

On the assumptions that the second order sufficiency conditions hold at x* , and that the active constraint gradients are linearly independent, it is possible to deduce fairly complete information on the eigenvalues and eigenvectors of ∇²T(x_k, r_k) from (3.8).

(i) There are n − t eigenvectors associated with eigenvalues of ∇²T(x_k, r_k) that are O(1) as r_k → 0 . The smallest eigenvalue tends to

    m = min_v {v^T ∇²L(x*, u*) v / v^T v} ,  ∀ v such that ∇g_i(x*) v = 0 , ∀ i ∈ B_0 .

(ii) There are t eigenvectors associated with eigenvalues of ∇²T(x_k, r_k) which tend to ∞ as r_k → 0 . These eigenvectors are asymptotic to vectors of the form C_1(x*) y_i , where the y_i are eigenvectors of the problem

    [C_1(x*)^T C_1(x*) − κ_i Λ^{-1}] y_i = 0

where Λ is a diagonal matrix, Λ_ii = (d²φ/dg_i²) / max_{1≤j≤t} (d²φ/dg_j²) , i = 1,...,t .

110

tm jjgmjn ■ M.^.^:^..^,tikJ,i--^...UTAn;|[rWt|1||..|.[.|riini<|1|.[.|n..|i...|...[||<|.. ..
w~-
•V«-.-.- ST^HmHWIPIWWl fWHWHWWWIWW "'"' ■«^»■iiim.w,i;iii»iHi,.iiiw.i>i«ii,i»|).,ii>iMiiHui,iiiiNijiiiitii,iiti,.niii.i.iw. P1I!I«!"!','»."I1I,I'|].IW»1IH"I,1M1K3:

The corresponding eigenvalues tend to ∞ like κ_i r_k max_{1≤j≤t} (d²φ/dg_j²) , that is, like κ_i u_σ^k (d²φ/dg_σ²)/(−dφ/dg_σ) , where σ is the maximizing index.

This shows that the condition number of ∇²T(x_k, r_k) tends to ∞ like 1/g_σ(x_k) , or like 1/||x_k − x*|| if the SO condition is satisfied. In this latter case we have shown that our measure of the cost of a barrier function calculation depends in the main on the accuracy desired rather than on the choice of barrier function. However, our estimates for the log family indicate that these will be somewhat more expensive than the above estimate except when all constraints are active.

Note that the device introduced to force more effective elimination of the inactive constraints does not force the Hessian to be worse conditioned in the case that strict complementarity does not obtain, at least in the examples that have been worked out. The use of this device would appear to be an important improvement in barrier function algorithms.
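A minimal sketch of the conditioning estimate (the toy problem is an assumption for illustration): for minimize x_1² + x_2 subject to x_2 ≥ 0, the log-barrier Hessian at the minimizer (0, r) is diag(2, r/x_2²) = diag(2, 1/r), so the condition number grows like 1/(2r), i.e. like 1/g(x_k):

```python
import numpy as np

for r in [1e-1, 1e-2, 1e-3, 1e-4]:
    # Hessian of T = x1**2 + x2 - r*log(x2) at its minimizer (0, r)
    H = np.diag([2.0, 1.0 / r])
    print(r, np.linalg.cond(H))   # grows like 1/(2r)
```

This is the quantitative sense in which tighter accuracy (smaller r) makes each unconstrained minimization harder for descent methods.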

5. Analysis of penalty function methods.

Consider now the equality constrained problem (ECP)

    min_{x∈S} f(x) ,  S = {x ; h_i(x) = 0 , i = 1,2,...,q (i ∈ I_2)} .   (5.1)

It is assumed that S is nonempty, and that (5.1) has a bounded minimum (say f*).


Remark: The inequality constraint g(x) ≥ 0 can be written as the equality constraint

    h(x) = min(0, g(x)) = 0                                              (5.2)

so that formally the ICP is a special case of an ECP. However, h(x) given by (5.2) can have discontinuous first derivatives.

Definition: F(x,λ) is a penalty objective function if

    F(x,λ) = f(x) + λ Σ_{i=1}^q ψ(h_i(x))                                (5.3)

where ψ(h) is a monotonic increasing function of |h| , and ψ(0) = 0 .

Example: Let ψ(h) = |h|^{1+α} ; then ψ is a penalty function if α > −1 . If g(x) is concave then, from (5.2), so is h(x) , and ψ(h) is convex provided α ≥ 0 . If α < 0 then dψ/dh is unbounded as h → 0 .

Theorem 5.1: Let {λ_r} ↑ ∞ , and let x_r minimize F(x, λ_r) . Then {f(x_r)} is nondecreasing, {F(x_r, λ_r)} is strictly increasing unless x_r ∈ S , and {Σ_i ψ(h_i(x_r))} is nonincreasing. If {x_r} → x* then x* solves the ECP.

Proof: Let λ_r < λ_s . Then, provided x_r , x_s ∉ S ,

    F(x_r, λ_r) ≤ F(x_s, λ_r) < F(x_s, λ_s) ≤ F(x_r, λ_s) .

Thus (compare Lemma 1.2) the results for the sequences follow as before. We have

    min_x F(x,λ) ≤ min_{x∈S} F(x,λ) = min_{x∈S} f(x) = f* .              (5.4)

Thus the F(x_r, λ_r) are bounded, and hence x* ∈ S . Now

    x* ∈ S ⟹ f(x*) ≥ f* ,

but, by (5.4),

    f(x_r) ≤ F(x_r, λ_r) ≤ f*

so that

    lim f(x_r) ≤ f* .

Thus f(x*) = f* , and x* solves the ECP. □

Remark: In the more general case in which {x_r} is bounded it follows, by restricting attention to convergent subsequences, that all limit points of {x_r} solve the ECP.
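A worked one-dimensional instance of Theorem 5.1 (the specific problem and the quadratic penalty ψ(h) = h² are assumptions for illustration): for minimize x² subject to h(x) = x − 1 = 0, the penalty objective F(x,λ) = x² + λ(x−1)² is minimized at x_r = λ/(1+λ), which approaches x* = 1 as λ → ∞:

```python
# psi(h) = h**2, so the multiplier estimate below is u_r = -2*lam*h(x_r)
for lam in [1.0, 10.0, 100.0, 1000.0]:
    x = lam / (1.0 + lam)          # unconstrained minimizer of F(x, lam)
    u = -2.0 * lam * (x - 1.0)     # -> u* = 2, since grad f(x*) = u* grad h(x*)
    print(lam, x, u)
```

Note that x_r approaches S from one side (h(x_r) < 0 throughout), and the multiplier estimate converges to the exact Lagrange multiplier u* = 2, anticipating Theorem 5.2 below.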

Theorem 5.2: Let {x_r} → x* and assume ψ(h) is continuously differentiable, and ∇h_i(x*) , i ∈ I_2 , linearly independent. Define u^r by

    u_i^r = −λ_r (dψ/d|h_i|) sgn(h_i) ,  i = 1,2,...,q ;                 (5.5)

then {u^r} → u* , the vector of Lagrange multipliers for the ECP.

Proof: Define the matrix B(x) by col_i(B(x)) = ∇h_i(x)^T , i = 1,2,...,q . The condition that x_r minimize F(x, λ_r) gives

    0 = ∇F(x_r, λ_r) = ∇f(x_r) + λ_r Σ_{i=1}^q (dψ/d|h_i|) sgn(h_i) ∇h_i(x_r)   (5.6)

so that

    ∇f(x_r)^T = B(x_r) u^r .                                             (5.7)

Now B(x_r) has full rank for ||x_r − x*|| small enough. Thus

    u^r = (B(x_r)^T B(x_r))^{-1} B(x_r)^T ∇f(x_r)^T

        → (B(x*)^T B(x*))^{-1} B(x*)^T ∇f(x*)^T = u* .  □                (5.8)

Remark: (i) If strict complementarity holds so that |u_i*| > 0 , i = 1,2,...,q , then the convergence of the Lagrange multiplier estimates implies (from (5.5)) that sgn(h_i) is constant for r large enough, as dψ/d|h_i| > 0 for x ∉ S . Thus the minimizing sequence approaches S 'from one side'. In this sense S acts like a barrier.

(ii) Note that h_i(x) = min(0, g_i(x)) = 0 identically in a neighborhood of x* if g_i(x*) > 0 . Thus ∇h_i(x*) = 0 so that, in this trivial sense, the constraint gradients are not linearly independent. However, if strict complementarity holds, then a multiplier result can be proved for the active constraints (do this!). In fact, the strict complementarity restriction can be relaxed somewhat.

Theorem 5.3: If the conditions of Theorem 5.2 hold and, in addition, {u^r} → u* and λ_r (d²ψ/dh_i²) → ∞ , i = 1,2,...,q , as r → ∞ , then the second order sufficiency conditions hold at x* if and only if ∇²F(x_r, λ_r) is positive definite for r sufficiently large.

Proof: This is essentially the same as that of Lemmas 3.1 and 3.3. □

Example: Derive the analogues of Lemmas 3.1 and 3.3 which apply when the ECP is obtained by transforming an ICP by means of (5.2).


Remark: The condition that λ_r (d²ψ/dh_i²) → ∞ is related to strict complementarity. Consider ψ = |h|^{1+α} , α > 0 . Then

    dψ/dh = (1 + α)|h|^α sgn(h)                                          (5.9)

so that

    λ_r (d²ψ/dh_i²) = (1 + α) α |h_i|^{α−1} λ_r = α |u_i^r| / |h_i| .    (5.10)

Thus λ_r (d²ψ/dh_i²) → ∞ , h_i → 0 , if |u_i*| > 0 . Strict complementarity is of particular importance for equality constraints derived from inequality constraints by (5.2). In this case, the one sided convergence implied by the multiplier relations is needed if we are to be able to talk about second derivatives at all.
The parallel development of the treatment of the ECP by penalty function methods and the treatment of the ICP by barrier functions can be completed by discussing convergence rates of penalty function algorithms in much the same way as we treated the barrier function case.  For example, multiplying (5.6) by x* - x_r gives

    f(x*) - f(x_r) = - Σ_{i=1}^q u_i^r h_i(x_r) + O(||x_r - x*||²) .     (5.11)

The assumption that the SO condition is satisfied can now be used to provide estimates.  From (5.9)

    λ_r (1+α) |h_i|^α sgn(h_i) = - u_i^r

so that

                                   115

    |h_i(x_r)| = ( |u_i^r| / ((1+α) λ_r) )^{1/α} .                   (5.12)

This suggests a rate of convergence of O((1/λ_r)^{1/α}) , which contrasts favorably with the estimates obtained for the barrier function algorithms.

In particular, as α → 0 , (5.12) suggests that the convergence rate becomes arbitrarily great.  However, the results of the previous section also indicate that the condition number of the Hessian will become arbitrarily large as α → 0 .  The next result provides information on the limiting case α = 0 .
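The estimate (5.12) is easy to observe numerically.  The sketch below (a one-dimensional toy problem of ours, not from the notes) minimizes the penalty function for ψ = |h|^{1+α} by a crude ternary search and compares the constraint violation with the predicted value:

```python
# Toy ICP (ours): minimize f(x) = -x subject to g(x) = 1 - x >= 0, so that
# x* = 1 and u* = 1.  With h = min(0, g), the penalty minimizer violates the
# constraint by |h| = (|u*| / (lam * (1 + alpha)))**(1/alpha), cf. (5.12).

def penalty_violation(lam, alpha):
    """Minimize P(x) = -x + lam*|min(0, 1-x)|^(1+alpha) on [1, 2] by
       ternary search and return the violation |h| = x - 1."""
    def P(x):
        h = min(0.0, 1.0 - x)
        return -x + lam * abs(h) ** (1.0 + alpha)
    lo, hi = 1.0, 2.0
    for _ in range(200):                      # P is unimodal on [1, 2]
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if P(m1) < P(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi) - 1.0

alpha = 1.0
for lam in (1e1, 1e2, 1e3):
    observed = penalty_violation(lam, alpha)
    predicted = (1.0 / (lam * (1.0 + alpha))) ** (1.0 / alpha)
    print(lam, observed, predicted)
```

The observed and predicted violations agree, and both shrink like (1/λ)^{1/α} as λ grows.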

Theorem 5.4:  In the ICP let f(x) be convex, and g_i(x) , i ∈ I , concave.  Let w be an infeasible point, x_0 an interior point of S ,

    a = min_{i ∈ I} g_i(x_0) ,   b = f(x_0) - f(x*) ,   λ_0 = (b+1)/a .

Then x* minimizes

    F(x,λ) = f(x) - λ Σ_{i=1}^m min(0, g_i(x))                       (5.13)

provided λ ≥ λ_0 .

Remark:  It is necessary to demonstrate the result only for λ = λ_0 .  For all larger λ it is then a consequence of Theorem 5.1.

Proof:  Let v be the boundary point of S on the join of w and x_0 , and B be the index set of constraints active at v .  Define

    s(x) = f(x) - λ_0 Σ_{i ∈ B} g_i(x) ,                             (5.14)

then

                                   116

    s(x_0) = f(x_0) - λ_0 Σ_{i ∈ B} g_i(x_0)

           ≤ f(x_0) - (b+1)

           = f(x*) - 1

           < f(v) = s(v) = F(v,λ_0) .                                (5.15)

As s(x) is convex and v is on the join of x_0 and w , ∃ θ , 0 < θ < 1 , such that

    s(v) ≤ θ s(x_0) + (1-θ) s(w)

         < θ s(v) + (1-θ) s(w)

whence

    s(v) < s(w) .                                                    (5.16)

Now s(w) ≤ F(w,λ_0) so that, from (5.15) and (5.16),

    F(v,λ_0) < F(w,λ_0) .

Thus min_x F(x,λ_0) must be attained at a feasible point.  □
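A small numerical check of Theorem 5.4 (the one-dimensional example is ours, not from the notes): for f(x) = -x with the single constraint g(x) = 1 - x ≥ 0 we have x* = 1, and taking x_0 = 0 gives a = 1, b = 1 and λ_0 = 2, so the unconstrained minimum of F(x,λ) for λ ≥ λ_0 already sits at x*:

```python
import numpy as np

# Our toy ICP: f(x) = -x,  g(x) = 1 - x >= 0,  so x* = 1 and f(x*) = -1.
# With interior point x0 = 0: a = g(0) = 1, b = f(0) - f(x*) = 1, lambda0 = 2.
def F(x, lam):
    return -x - lam * np.minimum(0.0, 1.0 - x)    # the exact penalty function

grid = np.linspace(-2.0, 3.0, 5001)
for lam in (0.5, 2.0, 2.5):
    print(lam, grid[np.argmin(F(grid, lam))])
# for lam = 0.5 the minimum runs off to the right-hand grid edge (the penalty
# is too weak); for lam >= lambda0 = 2 it is attained at the solution x* = 1
```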

6. Accelerated penalty and barrier methods.

The problems of poor conditioning of the computational problem and (comparatively) slow convergence make it worthwhile to search for methods for accelerating the convergence of the penalty / barrier function algorithms.  Consider the (generalized) penalty objective function

    P(x,W,η) = f(x) + Σ_{i=1}^q W_ii ψ(h_i(x) + η_i)                 (6.1)
117


where W is the diagonal matrix of penalty parameters, and the η_i are further parameters to be used in the acceleration process.

At a minimum of P , x(W,η) satisfies

    ∇f(x) - Σ_{i=1}^q u_i(W,η) ∇h_i(x) = 0                           (6.2)

where u_i(W,η) = - W_ii ∂ψ/∂h_i .  Provided ||x(W,η) - x*|| and ||u(W,η) - u*|| are sufficiently small, the second order sufficiency conditions hold at x* , and ∇h_i(x*) , i ∈ I_q , are linearly independent, then (by Theorem 5.5) x(W,η) also solves the ECP

    min f(x) ,   S_1 = {x : h_i(x) = h_i(x(W,η)) , i ∈ I_q} .
One sequential strategy for making x(W,η) → x* is to force λ = min_i W_ii → ∞ .  However, the parameter vector η is also available, and we ask is it possible to adjust it to make

    h_i(x(W,η)) = 0 ,   i ∈ I_q .                                    (6.3)

Let ∂x/∂η be the matrix with components ∂x_i/∂η_j , i = 1,...,n , j = 1,2,...,q .  If ∇²P(x(W,η),W,η) is nonsingular, then, by the implicit function theorem, we can solve (6.2) for x = x(η) holding W fixed.  We have

    ∇²P ∂x/∂η_j = - W_jj (∂²ψ/∂h_j²) ∇h_j                            (6.4)

where all quantities are evaluated at x(W,η) .  Defining the diagonal

118

matrix V by V_ii = W_ii ∂²ψ/∂h_i² , i = 1,2,...,q , and the matrix B by taking column i of B to be ∇h_i , i = 1,2,...,q , then (6.4) can be written

    (∇²L + B V B^T) ∂x/∂η = - B V .                                  (6.5)

Choose V_0 to make U = ∇²L + B V_0 B^T positive definite, and set V_1 = V - V_0 .  Then, if min_i (V_1)_ii is sufficiently large,

    ∂x/∂η = - U^{-1} B (B^T U^{-1} B)^{-1} + O(V^{-1}) .             (6.6)

This relation can be justified if

(i)   the second order sufficiency conditions hold at x* and ∇h_i(x*) , i ∈ I_q , are linearly independent,

(ii)  ||x - x*|| and ||u - u*|| are sufficiently small,

(iii) min_i V_ii → ∞ as min_i W_ii → ∞ for η = 0 , and

(iv)  min_i W_ii is sufficiently large.
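The limit underlying (6.6) is the matrix identity (U + BVB^T)^{-1} BV → U^{-1} B (B^T U^{-1} B)^{-1} as the diagonal of V grows; a quick numerical check on random data (ours, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 6, 2
A = rng.standard_normal((n, n))
U = A @ A.T + n * np.eye(n)          # a positive definite U
B = rng.standard_normal((n, q))      # full column rank (almost surely)

# the claimed limit U^{-1} B (B^T U^{-1} B)^{-1}
UinvB = np.linalg.solve(U, B)
limit = UinvB @ np.linalg.inv(B.T @ UinvB)

errs = []
for v in (1e2, 1e4, 1e6):
    V = v * np.eye(q)                # diagonal V with growing entries
    approx = np.linalg.solve(U + B @ V @ B.T, B @ V)
    errs.append(np.abs(approx - limit).max())
    print(v, errs[-1])               # the error decays like O(1/v)
```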

Consider now the use of Newton's method for solving (6.3).  This suggests that a correction δη to η be found by solving

    B^T (∂x/∂η) δη = - h .                                           (6.7)

But, by (6.6),

    B^T ∂x/∂η = - I + O(V^{-1})                                      (6.8)

so that

    δη = h + O(V^{-1}) .                                             (6.9)

                                   119

Thus we expect the simple correction δη = h to approximate arbitrarily closely to a second order process provided V is sufficiently large.

Algorithm (ECP)

(i)   Initialize η^(1) , W^(1) .

(ii)  Minimize P(x,W^(k),η^(k)) to determine x_k , u_k .

(iii) IF Σ_{i=1}^q u_i^k h_i(x_k) < TOL THEN STOP.

(iv)  FOR I = 1 STEP 1 UNTIL Q DO

        IF ABS(h_i(x_k)) < DECR*ABS(h_i(x_{k-1}))

        THEN η_i^(k+1) := η_i^(k) + h_i(x_k)

        ELSE W_ii^(k+1) := EXP*W_ii^(k) ,

             η_i^(k+1) := (∂ψ/∂h)^{-1}( - u_i^k / W_ii^(k+1) ) .

(v)   K := K+1.

(vi)  GO TO (ii).

Remark:  The idea behind the algorithm is that the correction (6.9) is used whenever the convergence of h_i to zero is satisfactory.  Otherwise it is assumed that W_ii is too small and it is increased accordingly.  η_i is modified at the same time to ensure that u_i^k → u_i^* as h_i → 0 .  (∂ψ/∂h)^{-1} indicates the inverse function to ∂ψ/∂h .
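For ψ(h) = h² the outer iteration can be sketched as below (a toy problem of ours, with a crude gradient-descent inner minimizer standing in for a proper unconstrained method; W is held fixed and finite throughout, so all of the convergence comes from the shift η ← η + h of (6.9)):

```python
import numpy as np

# Our toy ECP: f(x) = x1^2 + x2^2,  h(x) = x1 + x2 - 1,  x* = (0.5, 0.5).
def minimize_P(W, eta, x0, steps=20000, lr=1e-3):
    """Crude gradient descent on P(x) = f(x) + W*(h(x) + eta)^2."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        h = x[0] + x[1] - 1.0
        grad = 2.0 * x + 2.0 * W * (h + eta) * np.array([1.0, 1.0])
        x = x - lr * grad
    return x

W, eta = 10.0, 0.0                   # W stays fixed and finite
x = np.array([0.0, 0.0])
for k in range(6):
    x = minimize_P(W, eta, x)
    h = x[0] + x[1] - 1.0
    eta += h                         # the correction delta_eta = h of (6.9)
    print(k, h)                      # residual h(x_k) -> 0 geometrically
```

In this example each outer iteration shrinks the residual by a fixed factor without any increase of the penalty parameter, which is exactly the behaviour the acceleration is designed to produce.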

For the ICP we consider the modified barrier function

    R(x,W^(k),η^(k)) = f(x) + Σ_{i=1}^m W_ii^(k) u_i^(k-1) φ(g_i(x) + η_i^(k))        (6.10)
120


where u^(k-1) is the vector of multiplier estimates from the previous minimization, and W^(k) is now the diagonal matrix of barrier parameters.

We note that, in the particular case in which all constraints are active, the previous analysis is applicable, at least formally, and suggests a correction

    η^(k+1) = η^(k) + g(x_k)                                         (6.11)

with order of magnitude departure from a second order iteration of O( (min_i W_ii^(k) u_i^(k-1) ∂²φ/∂g_i²)^{-1} ) .  However, we require automatic selection

of the active constraints if we are to make use of this result, and it is important to note that this is provided naturally in the algorithm by the options

(i)  if g_i → 0 at a satisfactory rate, then η_i^(k+1) = η_i^(k) + g_i , and

(ii) if the convergence rate is too slow, then decrease the barrier parameter.

This second option can be expected to apply to the inactive constraints, and will drive the contribution to (6.10) from this source rapidly to zero by (4.55).  Note that the boundedness of the barrier terms requires that g_i + η_i be positive.  If η^(1) is set to zero then (6.11) ensures that this condition will be met initially.  Provided strict complementarity holds, the convergence of the multiplier estimates will ensure that it must hold ultimately.  Of course, the calculation must be started from a feasible point.

121


Algorithm (ICP)

(i)   Initialize η^(1) , W^(1) , u_0 .

(ii)  Minimize R(x,W^(k),η^(k)) to determine x_k , u_k .

(iii) IF Σ_{i=1}^m u_i^k g_i(x_k) < TOL THEN STOP.

(iv)  FOR I = 1 STEP 1 UNTIL M DO

        IF ABS(g_i(x_k)) < DECR*ABS(g_i(x_{k-1}))

        THEN η_i^(k+1) := η_i^(k) + g_i(x_k)

        ELSE W_ii^(k+1) := DECR*W_ii^(k) ,

             η_i^(k+1) := (∂φ/∂g)^{-1}( - (W_ii^(k+1))^{-1} ) .

(v)   K := K+1.

(vi)  GO TO (ii).

Remark:  As in the previous algorithm (∂φ/∂g)^{-1} denotes the inverse function to ∂φ/∂g .  For example, if φ = - log g then η_i^(k+1) = W_ii^(k+1) .
Consider now another modified penalty function for the ECP

    S(x) = f(x) - u(x)^T h(x) + h(x)^T W h(x)                        (6.12)

where the matrix W is positive definite.

Lemma 6.1:  If the second order sufficiency conditions hold for x = x* , and the ∇h_i(x*) , i ∈ I_q , are linearly independent, then S(x) has a
122

-■i---:-;-"'----"--■■„- drnn-ir.. iaC^l-..^U-^L-JL-^^..,.U^..-^A.1 ^^rf-lr^ ^Jb^ . :■.■, .--^.r ■-■.. ' ^..-^ ,- ■■—^.. - , .„,, .t J.,... rJ.L'. i^
WW^^W^'W "MWMMIN llil '.' • ■liii.uiili^.lp. m.. ..,-, „..^ I'nm "in min) ji.i ii Mi . III.III|IWII.

local minimum at x = x* provided u(x) → u* as x → x* and the smallest eigenvalue of W is large enough.

Proof:  We have

    ∇S(x*) = ∇f(x*) - u(x*)^T ∇h(x*)

                    - h(x*)^T (∇u(x*) - 2W∇h(x*))

           = ∇f(x*) - u*^T ∇h(x*)

           = 0

as u* is the vector of Lagrange multipliers for the ECP.  Thus S has a stationary point for x = x* .  Now

    ∇²S(x*) = ∇²L(x*,u*) - ∇u(x*)^T ∇h(x*) - ∇h(x*)^T ∇u(x*)

                          + 2 ∇h(x*)^T W ∇h(x*)                      (6.13)

where terms which vanish at x* have been ignored.  Corollary 3.2 can now be applied to show ∇²S(x*) is positive definite.  We set V = ∇h(x*)^T , U = ∇²L(x*,u*) - ∇u(x*)^T ∇h(x*) - ∇h(x*)^T ∇u(x*) , and note that

    min_{Vt=0, ||t||=1} t^T U t  =  min_{Vt=0, ||t||=1} t^T ∇²L(x*,u*) t  =  m > 0

as the second order sufficiency conditions hold at x* .  □

                                   123

Theorem 6.1:  Let the conditions of Lemma 5.1 hold at x* and set

    u(x) = B(x)^+ ∇f(x)^T                                            (6.14)

where B(x) = ∇h(x)^T .  Then (6.12) has a local minimum at x* provided the smallest eigenvalue of W is large enough.

Proof:  This result is an immediate consequence of Lemma 6.1.  As the ∇h_i(x*) , i ∈ I_q , are linearly independent, B(x)^+ is a bounded operator for ||x - x*|| small enough.  Thus u(x) → u* as x → x* .  □

Remark:  (i) By using (6.14) we can construct a penalty function which is differentiable in a neighborhood of x* (contrast with (5.13)) and which has a local minimum at x = x* for sufficiently large but finite values of the penalty parameter.  However, (6.14) requires first derivatives of the problem functions, so that minimization of (6.12) with a method that requires first derivatives of S will require second derivatives of the problem functions.  Two cases have been considered (Fletcher):

(i)  S(x) = f(x) - h(x)^T B(x)^+ ∇f(x)^T + σ ||h(x)||² ,  and        (6.15)

(ii) S(x) = f(x) - h(x)^T B(x)^+ ∇f(x)^T + σ ||(B(x)^+)^T h(x)||² ,  (6.16)

where σ is a penalty parameter.
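A numerical sanity check on (6.15) (the toy problem is ours, not from the notes): with f(x) = x1² + x2², h(x) = x1 + x2 - 1 and the modest value σ = 1, S already has its minimum at x* = (1/2, 1/2), illustrating that a finite penalty parameter suffices:

```python
import numpy as np

def S(x, sigma=1.0):
    # Fletcher-type penalty (6.15) on our toy ECP:
    #   f(x) = x1^2 + x2^2,  h(x) = x1 + x2 - 1,  so x* = (0.5, 0.5).
    f = x[0] ** 2 + x[1] ** 2
    h = x[0] + x[1] - 1.0
    grad_f = np.array([2.0 * x[0], 2.0 * x[1]])
    B = np.array([[1.0], [1.0]])                 # column is grad h
    u = (np.linalg.pinv(B) @ grad_f)[0]          # (6.14): u(x) = B^+ grad f^T
    return f - u * h + sigma * h * h

xstar = np.array([0.5, 0.5])
rng = np.random.default_rng(2)
print(S(xstar))                                  # f(x*) = 0.5
print(all(S(xstar + 0.1 * rng.standard_normal(2)) > S(xstar)
          for _ in range(100)))                  # True: x* is a local minimum
```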

(ii) There is a close connection between the penalty function (6.15) and the algorithm based on (6.1) in the case ψ(h) = h² .  At a minimum of P we have (as ∇P = 0)

    2W(h + η) = - B^+ (∇f)^T .                                       (6.17)
                                   124

Thus the correction formula corresponds to updating the Lagrange multiplier estimate by (6.14) at the end of each unconstrained minimization rather than continuously, which the use of S requires.

(iii) Note that S(x) can be interpreted as a Lagrangian.  For example, in the case S(x) is given by (6.16),

    S(x) = L(x, w(x))                                                (6.18)

where

    w(x) = B(x)^+ ∇f(x)^T - σ B(x)^+ (B(x)^+)^T h(x) .               (6.19)

Lemma 6.2:  w(t) defined by (6.19) is the vector of Lagrange multipliers for the problem

    minimize_x  f(t) + ∇f(t)(x-t) + (σ/2) ||x-t||²                   (6.20)

subject to the linear constraints

    h(t) + ∇h(t)(x-t) = 0 ,                                          (6.21)

provided this minimum exists.

Proof:  Any point satisfying the constraints (6.21) has the form

    x = t - (B(t)^T)^+ h(t) + A(t)z                                  (6.22)

where B(t)^T A(t) = 0 .  The multiplier relation for (6.20), (6.21) is

    ∇f(t) + σ (x-t)^T = u^T B(t)^T                                   (6.23)

so that u can be taken as (substituting (6.22) into (6.23))

    u = B(t)^+ { ∇f(t)^T - σ (B(t)^T)^+ h(t) } .  □

125


If (6.22) is substituted into (6.20) the problem becomes one of minimizing w.r.t. z

    ∇f(t) A z + (σ/2) z^T A^T A z

whence

    z = - (1/σ) (A^T A)^{-1} A^T ∇f(t)^T .                           (6.24)

Thus σ plays a role in ensuring that x(t) , the minimum of (6.20), cannot deviate far from t (cf. remark (ii) following MP Corollary 5.1).
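Lemma 6.2 can be verified numerically by comparing (6.19) with the multipliers obtained from the KKT system of the linearized problem (6.20)-(6.21); the random data below are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, sigma = 5, 2, 3.0
grad_f = rng.standard_normal(n)      # grad f(t), stored as a 1-d array
B = rng.standard_normal((n, q))      # columns are grad h_i(t), full rank a.s.
h = rng.standard_normal(q)           # the residuals h(t)

# (6.19): w(t) = B^+ grad f(t)^T - sigma B^+ (B^+)^T h(t)
Bp = np.linalg.pinv(B)
w = Bp @ grad_f - sigma * Bp @ Bp.T @ h

# KKT system of (6.20)-(6.21):  sigma (x - t) - B u = -grad f(t)^T ,
#                               B^T (x - t) = -h(t)
K = np.block([[sigma * np.eye(n), -B], [B.T, np.zeros((q, q))]])
u = np.linalg.solve(K, np.concatenate([-grad_f, -h]))[n:]
print(np.allclose(u, w))             # True
```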

Example:  (i) The Lagrangian interpretation provides a method for generalizing the above discussion to inequality constraints.  Consider the problem min_x L(x,w(x)) where w(t) is the vector of multipliers for the problem

    min_x  f(t) + ∇f(t)(x-t) + (σ/2) ||x-t||²

subject to  g(t) + ∇g(t)(x-t) ≥ 0 .

Under what conditions does L have an unconstrained minimum at x* ?  What role does strict complementarity play in this problem?

(ii) (6.12) can be generalized to other penalty functions and to barrier functions (cf. Remark (ii) above).  How much of the above analysis goes through?  What modifications are required?  Evaluate the resulting algorithms.

126


Notes

1., 2.  See Fiacco and McCormick's book.  Also the paper 'Penalty function methods for mathematical programming problems', J. Math. Anal. and Applic. (1970), by Osborne and Ryan.

3.  Fiacco and McCormick were the first to draw attention to the importance of these (as they were to much of the material in this section).

4.  The log family is due to Osborne and Ryan.  The importance of the conditioning of the Hessian is due to Walter Murray.  Rate of convergence formulae have also been developed by F. A. Lootsma (Thesis, also survey paper at Dundee conference).

5.  Fiacco and McCormick.  The exact penalty function is due to Zangwill.

6.  The algorithm for the ECP is due to Powell in the case ψ = h² (Harwell report, also Proceedings of Keele Conference).  The exact penalty function S(x) is due to Fletcher, who has developed it together with his student Shirley Lill and described it in several Harwell reports.  The extension to inequality constraints (example (i)) is also due to Fletcher.

References

A. V. Fiacco and G. P. McCormick (1968):  Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley.

R. Fletcher and S. A. Lill (1970):  A class of methods for non-linear programming, in Proceedings of Nonlinear Programming Symposium, Madison.

                                   127

F. A. Lootsma (1970):  Boundary properties of penalty functions for constrained minimization, Philips Res. Repts. Suppl., No. 3.

W. Murray (1969):  Constrained Optimization, Nat. Phys. Lab. Rep. Ma 79.

M. R. Osborne and D. M. Ryan (1970):  On penalty functions for non-linear programming problems, J. Math. Anal. Appl., 31, pp. 559-578.

M. J. D. Powell (1969):  A method for non-linear constraints in minimization problems, in Optimization (editor: R. Fletcher), Academic Press.
128


Bibliography

Abadie, J. (editor), (1967) Nonlinear Programming, (1970) Integer and Nonlinear Programming, North Holland.

Ablow, C. M. and Brigham, G. (1955). An analog solution of programming problems. Op. Res., 3, pp 388-394.

Akaike, H. (1959). On a successive transformation of probability distribution and its application to the analysis of the optimal gradient method. Ann. Inst. Stat. Math., Tokyo, 11, p 1.

Allran, R. R. and Johnsen, S. E. J. (1970). An algorithm for solving nonlinear programming problems subject to nonlinear inequality constraints. Comp. J., 13 (2), pp 171-177.

Arrow, K. J. and Hurwicz, L. (1956). Reduction of constrained maxima to saddle point problems; in "Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability", ed. J. Neyman. University of California Press, pp 1-20.

Arrow, K. J., Hurwicz, L. and Uzawa, H. (1958). "Studies in Linear and Nonlinear Programming". Stanford University Press.

Arrow, K. J., Hurwicz, L. and Uzawa, H. (1961). Constraint qualifications in maximization problems. Nav. Res. Log. Q., 8, pp 175-191.

Bard, Y. (1968). On a numerical instability of Davidon-like methods. Math. Comp., 22 (103), pp 665-666.

Bard, Y. (1970). Comparison of gradient methods for the solution of nonlinear parameter estimation problems. SIAM J. Numer. Anal., 7 (1), pp 157-186.

Bartels, R. H., Golub, G. H., and Saunders, M. A. (1970). Numerical techniques in Mathematical Programming, in Nonlinear Programming, Academic Press.

Bazaraa, M. S., Goode, J. J., Nashed, M. Z., and Shetty, C. M. (1971). Nonlinear programming without differentiability in Banach spaces: Necessary and sufficient constraint qualification. Applicable Analysis, to appear.

                                   129


Beale, E. M. L. (1955). On minimizing a convex function subject to linear inequalities. J. Roy. Stat. Soc., Ser. B, 17 (2), pp 175-184.

Beale, E. M. L. (1959). Numerical methods; in "Nonlinear Programming", ed. J. Abadie. North-Holland Publishing Co., p 150.

Bellmore, M., Greenberg, H. J. and Jarvis, J. J. (1970). Generalized penalty function concepts in mathematical optimization. Op. Res., 18 (2), pp 229-252.

Beltrami, E. J. (1969). A constructive proof of the Kuhn-Tucker multiplier rule. J. Math. Anal. Appl., 26, pp 297-306.

Beltrami, E. J. (1970). "An Algorithmic Approach to Nonlinear Analysis and Optimization". Academic Press.

Beltrami, E. J. and McGill, R. (1966). A class of variational problems in search theory and the maximum principle. Op. Res., 14 (2), pp 267-278.

Box, M. J. (1966). A comparison of several current optimization methods and the use of transformations in constrained problems. Comp. J., 9, pp 67-77.

Box, M. J., Davies, D. and Swann, W. H. (1969). "Nonlinear Optimization Techniques". ICI Monograph, Oliver and Boyd.

Bracken, J. and McCormick, G. P. (1968). "Selected Applications of Nonlinear Programming". Wiley.

Broyden, C. G. (1965). A class of methods for solving nonlinear simultaneous equations. Math. Comp., 19 (92), pp 577-593.

Broyden, C. G. (1967). Quasi-Newton methods and their application to function minimization. Math. Comp., 21, pp 368-381.

Broyden, C. G. (1970a). The convergence of a class of double-rank minimization algorithms; 1. general considerations. J. Inst. Math. Appl., 6 (1), pp 76-90.

Broyden, C. G. (1970b). The convergence of single-rank quasi-Newton methods. Math. Comp., 24 (110), pp 365-382.

                                   130


Bui Trong Lieu and Huard, P. (1966). La méthode des centres dans un espace topologique. Numer. Math., 8 (1), pp 56-67.

Butler, T. and Martin, A. V. (1962). On a method of Courant for minimizing functionals. J. Math. Phys., 41, pp 291-299.

Camp, G. D. (1955). Inequality-constrained stationary value problems. J. Op. Res. Soc. Am., 3, pp 548-550.

Carroll, C. W. (1961). The created response surface technique for optimizing nonlinear restrained systems. Op. Res., 9 (2), pp 169-184.

Charnes, A. and Lemke, C. E. (1954). Minimization of nonlinear separable convex functionals. Nav. Res. Log. Q., 1, pp 301-302.

Cheney, E. W. and Goldstein, A. A. (1959). Newton's method for convex programming and Tchebycheff approximation. Numer. Math., 1, pp 254-268.

Colville, A. R. (1968). A comparative study on nonlinear programming codes. IBM New York Scientific Centre Tech. Rep. No. 320-2949.

Courant, R. (1943). Variational methods for the solution of problems of equilibrium and vibrations. Bull. Am. Math. Soc., 49, pp 1-23.

Crockett, J. B. and Chernoff, H. (1955). Gradient methods of maximization. Pac. J. Math., 5, pp 33-50.

Curry, H. B. (1944). The method of steepest descent for non-linear minimization problems. Quart. Appl. Math., 2 (3), pp 250-261.

Daniel, J. W. (1967a). The conjugate gradient method for linear and nonlinear operator equations. SIAM J. Numer. Anal., 4, pp 10-26.

Daniel, J. W. (1967b). Convergence of the conjugate gradient method with computationally convenient modifications. Numer. Math., 10, pp 125-131.

Dantzig, G. B. (1951). Maximization of a linear function of variables subject to linear inequalities: in "Activity Analysis of Production and Allocation", ed. T. C. Koopmans. Cowles Commission Monograph No. 13, Wiley.

                                   131

Davidon, W. C. (1959). Variable metric method for minimization. U.S.A.E.C. Research and Development Rep. ANL-5990, Argonne National Laboratory.

Davidon, W. C. (1968). Variance algorithm for minimization. Comp. J., 10, pp 406-410.

Dorn, W. S. (1960). Duality in quadratic programming. Quart. Appl. Math., 18, pp 155-162.

Edelbaum, T. N. (1962). Theory of maxima and minima: in "Optimization Techniques", ed. G. Leitmann. Academic Press.

El-Hodiri, Mohamed A. (1971). Constrained Extrema, Lecture Notes in Operations Research and Mathematical Systems 56, Springer-Verlag.

Evans, J. P. and Gould, F. J. (1970). Stability in non-linear programming. Op. Res., 18 (1), pp 107-118.

Faure, P. and Huard, P. (1966). Résultats nouveaux relatifs à la méthode des centres. Quatrième Conférence de Recherche Opérationelle, Cambridge, Mass.

Fiacco, A. V. (1967). Sequential unconstrained minimization methods for nonlinear programming. Ph.D. Dissertation, Northwestern University, Ill.

Fiacco, A. V. (1970). Penalty methods for mathematical programming in E^n with general constraint sets. J. Opt. Th. Appl., 6 (3), pp 252-268.

Fiacco, A. V. and McCormick, G. P. (1963). Programming under nonlinear constraints by unconstrained minimization: a primal-dual method. Tech. Paper RAC-TP-96, Research Analysis Corporation.

Fiacco, A. V. and McCormick, G. P. (1964a). The sequential unconstrained minimization technique for nonlinear programming: a primal-dual method. Man. Sci., 10 (2), pp 360-366.

Fiacco, A. V. and McCormick, G. P. (1964b). Computational algorithm for the sequential unconstrained minimization technique for nonlinear programming. Man. Sci., 10 (2), pp 601-617.

                                   132


Fiacco, A. V. and McCormick, G. P. (1966). Extensions of SUMT for nonlinear programming: equality constraints and extrapolation. Man. Sci., 12 (11), pp 816-829.

Fiacco, A. V. and McCormick, G. P. (1967). The slacked unconstrained minimization technique for convex programming. SIAM J. Appl. Math., 15 (3), pp 505-515.

Fiacco, A. V. and McCormick, G. P. (1968). "Nonlinear Programming: Sequential Unconstrained Minimization Techniques". Wiley.

Fletcher, R. (1965). Function minimization without evaluating derivatives - a review. Comp. J., 8, pp 33-41.

Fletcher, R. (1968). Programming under linear equality and inequality constraints. ICI Rep. No. MSDH/68/19.

Fletcher, R. (Ed.) (1969a). "Optimization". Academic Press.

Fletcher, R. (1969b). A review of methods for unconstrained optimization: in "Optimization", ed. R. Fletcher. Academic Press.

Fletcher, R. (1969c). A new approach to variable metric algorithms. AERE Tech. Rep. No. 383.

Fletcher, R. (1970). A general quadratic programming algorithm. AERE Tech. Rep. No. 401.

Fletcher, R. (1970). A new approach to variable metric algorithms. Computer J., 13, pp 317-322.

Fletcher, R. and McCann, A. P. (1969). Acceleration techniques for nonlinear programming: in "Optimization", ed. R. Fletcher. Academic Press.

Fletcher, R. and Powell, M. J. D. (1963). A rapidly convergent descent method for minimization. Comp. J., 6, pp 163-168.

Fletcher, R. and Reeves, C. M. (1964). Function minimization by conjugate gradients. Comp. J., 7, pp 149-154.

Forsythe, G. E. (1967). On the asymptotic directions of the s-dimensional optimum gradient method. Tech. Rep. No. CS61, Stanford University.

                                   133


Forsythe, G. E. and Motzkin, T. S. (1951). Asymptotic properties of the optimum gradient method (abstract). Bull. Am. Math. Soc., 57, p 183.

Frisch, K. R. (1955). The logarithmic potential method of convex programming. Memorandum May 13, 1955, University Institute of Economics, Oslo.

Geoffrion, A. M. (1966). Strictly concave parametric programming, part 1: basic theory. Man. Sci., 13, pp 244-253.

Geoffrion, A. M. (1967). Strictly concave parametric programming, part 2: additional theory and computational considerations. Man. Sci., 13 (5), pp 359-370.

Gill, P. E. and Murray, W. (1970). Stable implementation of unconstrained optimization algorithms. NPL Report Maths. 97.

Goldfarb, D. (1969a). Sufficient conditions for the convergence of a variable metric algorithm: in "Optimization", ed. R. Fletcher. Academic Press.

Goldfarb, D. (1969b). Extension of Davidon's variable metric method to maximization under linear inequality and equality constraints. SIAM J. Appl. Math., 17, pp 739-764.

Goldfarb, D. (1970). A family of variable-metric methods derived by variational means. Math. Comp., 24, pp 23-26.

Goldfarb, D. and Lapidus, L. (1968). Conjugate gradient method for nonlinear programming problems with linear constraints. I. and E. C. Fundamentals, 7 (1), pp 142-151.

Goldstein, A. A. (1962). Cauchy's method of minimization. Numer. Math., 4, pp 146-150.

Goldstein, A. A. (1965). On steepest descent. SIAM J. Control, 3, pp 147-151.

Goldstein, A. A. (1968). Constructive Real Analysis. Harper and Row.

Goldstein, A. A. and Price, J. (1967). An effective algorithm for minimization. Num. Math., 10, pp 184-189.

                                   134


Golub, G. H., and Saunders, M. A. (1969). Numerical methods for solving linear least squares problems. Computer Science Dept. Tech. Rep. CS134, Stanford University.

Gould, F. J. and Tolle, J. W. (1971). A necessary and sufficient constraint qualification. SIAM J. Appl. Maths., 20, pp 164-172.

Greenstadt, J. (1967). On the relative efficiencies of gradient methods. Math. Comp., 21, pp 360-367.

Greenstadt, J. (1970). Variations on variable-metric methods. Math. Comp., 24, pp 1-25.

Haarhoff, P. C. and Buys, J. D. (1970). A new method for the optimization of a nonlinear function subject to nonlinear constraints. Comp. J., 13 (2), pp 178-184.

Hadley, G. (1964). "Nonlinear and Dynamic Programming". Addison-Wesley.

Hartley, H. O. and Hocking, R. R. (1963). Convex programming by tangential approximation. Man. Sci., 9, pp 600-612.

Hestenes, M. R. (1966). "Calculus of Variations and Optimal Control Theory". Wiley.

Huang, H. Y. (1969). Unified approach to quadratically convergent algorithms for function minimization. J.O.T.A., 5, pp 405-425.

Huang, H. Y. and Levy, A. V. (1969). Numerical experiments on quadratically convergent algorithms for function minimization. Aero-Astronautics Rep. No. 66, Rice University.

Huard, P. (1964). Résolution de programmes mathématiques à contraintes non-linéaires par la méthode des centres. Note E.D.F., HR5690/317.

Huard, P. (1967). Resolution of mathematical programming with nonlinear constraints by the method of centres; in "Nonlinear Programming", ed. J. Abadie. North-Holland Pub. Co., Amsterdam.

Huard, P. (1968). Programmation mathématique convexe. R.I.R.O., 7, pp 43-59.

                                   135

John, F. (1948). Extremum problems with inequalities as subsidiary conditions: in "Studies and Essays". Courant Anniversary Volume, Interscience, pp 187-204.

Kelley, H. J. (1962). Methods of gradients: in "Optimization Techniques", ed. G. Leitmann. Academic Press.

Kelley, J. E., Jr. (1960). The cutting plane method for solving convex programs. J. Soc. Ind. Appl. Math., 8 (4), pp 703-712.

Kowalik, J. (1966). Nonlinear programming procedures and design optimization. Acta Polytechnica Scandinavica, 13, Trondheim.

Kowalik, J. and Osborne, M. R. (1968). "Methods for Unconstrained Optimization Problems". American Elsevier.

Kowalik, J., Osborne, M. R. and Ryan, D. M. (1969). A new method for constrained optimization problems. Op. Res., 17 (6), pp 973-983.

Kuhn, H. W. and Tucker, A. W. (1951). Non-linear Programming: in "Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability", ed. J. Neyman. University of California Press, pp 481-493.

Künzi, H. P. and Oettli, W. (1969). Nichtlineare Optimierung, Lecture Notes in Operations Research and Mathematical Systems 16, Springer-Verlag.

Lemke, C. E. (1962). A method for solution of quadratic programming problems. Man. Sci., 8, pp 442-453.

Lootsma, F. A. (1967). Logarithmic programming: a method of solving nonlinear-programming problems. Philips Res. Rep., 22 (3), pp 329-344.

Lootsma, F. A. (1968a). Extrapolation in logarithmic programming. Philips Res. Rep., 23, pp 108-116.

Lootsma, F. A. (1968b). Constrained optimization via penalty functions. Philips Res. Rep., 23, pp 408-423.

Lootsma, F. A. (1969). Hessian matrices of penalty functions for solving constrained-optimization problems. Philips Res. Rep., 24, pp 322-331.

                                   136


IIIIÜPUHHIII IHHlJIUlill.lilHHIIIIpiTWHWWIWiWi.l'Wll' I'll. Mi|illi.lHipiBpiUHHifllMI»»»lli|i|ll"li'lH.lni|lv.P^i.,»nwi ill i<iM>i|lwvi-»iii;i.iiii-«li ill» J l|ji^iiw;ill|l,,||l|iwil|imwwil|IWfl|lllll||l|ffWIW^«^tlil|l|l|llfl| IMIII PHHpm

—— m—ii ■ — mmimmmmmmm—mmmmm

Lootsma, F. A. (1970). Boundary properties of penalty functions


for constrained minimization. Philips Res. Resp. Supp. No. 3.

Lootsma, F. A. (editor), (1972). Numerical Methods for Nonlinear


Optimization, Academic Press.

Luenburger, D. G. (1969). "Optimization by Vector Space Methods".


Wiley.

Mangasarian, 0. L. (1969). Nonlinear Programming. McGraw-Hill.

Mangasarian, 0. L. and Fromowitz, S. (1967). The Fritz Jo'm necessary


optimality conditions in the presence of equality and inequality
constraints. J. Math. Anal. Appl., 17 (l), PP 37-V7.
McCormick, G. P. (1969). The rate of convergence of the reset
Davidon variable metric method. MRC Tech. Rep. No. 1012,
University of Wisconsin.

McConnick, G. P. and Pearson, J. D. (1969). Variable metric methods


and unconstrained optimization: in "Optimization", ed. R. Fletcher.
Academic Press.

Meyers, G. E. (1968). Properties of the conjugate-gradient and


Davidon methods. J. Opt. Th. Appl., 2 (k), pp 209-219.

Morrison, D. D. (1968). Optimization by least squares. SIAM J.


Numer. Anal., 5 (l), PP 83-88.. """"

Murray, W. (1969a). Indefinite quadratic programming. Nat. Phys.


Lab. Rep. Ma 76.

Murray, W. (1969b). Behaviour of Hessian matrices of barrier and penalty functions arising in optimization. Nat. Phys. Lab. Rep. Ma 77.

Murray, W. (1969c). Constrained optimization. Nat. Phys. Lab. Rep. Ma 79.
Murtagh, B. A. and Sargent, R. W. H. (1969). A constrained minimization method with quadratic convergence; in "Optimization", ed. R. Fletcher, Academic Press.

Murtagh, B. A. and Sargent, R. W. H. (1970). Computational experience with quadratically convergent minimization methods. Comp. J., 13 (2), pp 185-194.


Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. Comp. J., 7, pp 308-313.

Osborne, M. R. and Ryan, D. M. (1970a). On penalty function methods for nonlinear programming problems. J. Math. Anal. Appl., 31 (3), pp 559-578.

Osborne, M. R. and Ryan, D. M. (1970b). An algorithm for nonlinear programming. Tech. Rep. No. 35, Computer Centre, A.N.U.

Osborne, M. R. and Ryan, D. M. (1972). A hybrid method for nonlinear programming; in "Numerical Methods for Nonlinear Optimization" (ed. F. A. Lootsma), Academic Press.

Ostrowski, A. M. (1966). "Solution of Equations and Systems of Equations" (2nd Edition), Academic Press.

Parisot, G. R. (1961). Résolution numérique approchée du problème de programmation linéaire par application de la programmation logarithmique. Rev. Fr. Recherche Opérationnelle, 20, pp 227-259.

Pearson, J. D. (1969). On variable metric methods of minimization. Comp. J., 12, pp 171-178.

Pietrzykowski, T. (1962). Application of the steepest descent method to concave programming; in "Proceedings of IFIPS Congress", Munich, North-Holland Pub. Co., pp 185-189.

Polak, E. and Ribière, G. (1969). Note sur la convergence de méthodes de directions conjuguées. R.I.R.O., 3, pp 35-43.

Pomentale, T. (1965). A new method for solving conditioned maxima problems. J. Math. Anal. Appl., 10, pp 216-220.

Powell, M. J. D. (1967). A method for nonlinear constraints in minimization problems. AERE Tech. Rep. No. 310.

Powell, M. J. D. (1968). A survey of numerical methods for unconstrained optimization. AERE Tech. Rep. No. 340.

Powell, M. J. D. (1969a). Rank one methods for unconstrained optimization. AERE Tech. Rep. No. 372.


Powell, M. J. D. (1969b). On the convergence of the variable metric algorithm. AERE Tech. Rep. No. 382.

Powell, M. J. D. (1970). A new algorithm for unconstrained optimization. AERE Tech. Rep. No. 393.

Rosen, E. M. (1966). A review of quasi-Newton methods in nonlinear equation solving and unconstrained optimization; in "Proceedings of the 21st National Conference of the A.C.M.", Thompson Book Co., pp 37-41.

Rosen, J. B. (1960). The gradient projection method for nonlinear programming, part 1: linear constraints. J. Soc. Ind. Appl. Math., 8 (1), pp 181-217.

Rosen, J. B. and Suzuki, S. (1965). Construction of nonlinear programming test problems. Comm. A.C.M., 8, p 113.

Rosenbrock, H. H. (1960). An automatic method for finding the greatest or least value of a function. Comp. J., 3, pp 175-184.

Rudin, W. (1953). "Principles of Mathematical Analysis". McGraw-Hill, N. Y.

Schmit, L. A. and Fox, R. L. (1965). Integrated approach to structural synthesis. AIAA J., 3 (6), pp 1104-1112.

Shah, B. V., Buehler, R. J. and Kempthorne, O. (1964). Some algorithms for minimizing a function of several variables. J. Soc. Ind. Appl. Math., 12, pp 74-92.

Spang, H. A. (1962). A review of minimization techniques for nonlinear functions. SIAM Review, 4 (4), pp 343-365.

Stoer, J. (1971). On the numerical solution of constrained least squares problems. SIAM J. Numer. Anal., 8, pp 382-411.

Strong, R. E. (1965). A note on the sequential unconstrained minimization technique for non-linear programming. Man. Sci., 12 (1), pp 142-144.

Tremolières, R. (1968). La méthode des centres à troncature variable. Thèse, Paris.


Whittle, P. (1971). "Optimization under Constraints", Wiley.

Wolfe, P. (1959). The simplex method for quadratic programming. Econometrica, 27 (3), pp 382-398.

Wolfe, P. (1961). A duality theorem for nonlinear programming. Quart. Appl. Math., 19 (3), pp 239-244.
Zangwill, W. I. (1967). Nonlinear programming via penalty functions. Man. Sci., 13 (5), pp 344-358.
Zangwill, W. I. (1969). "Nonlinear Programming: A Unified Approach".
Prentice-Hall.

Zoutendijk, G. (1960). "Methods of Feasible Directions". Elsevier.
