Nonlinear Optimisation Techniques Notes

Prof David Bogle, University College London

Complete Nomenclature List

Scalar (S), Vector (V), Matrix (M)

f        objective function (S)
c        constraints (V)
m        number of constraints (S)
me       number of equality constraints (S)
x        independent variables (V)
n        number of independent variables (S)
x*       solution (optimum) (V)
x0       initial point for x (V)
xi       ith independent variable (S) (i = 1,...,n)
g        gradients (V)
H        Hessian (M, n x n)
B        Hessian approximation (M, n x n)
I        unit matrix (M)
k        iteration number (S)
d        step vector = x_{k+1} - x_k (V)
y        difference between gradients = g_{k+1} - g_k (V)
alpha    step length control parameter (S)
p        restricted step = alpha d (V)
Delta    trust region size (S)
r_k      measure of accuracy of the quadratic model q of f
         (r_k = Delta f_k / Delta q_k, in the trust region method of
         step control) (S)
L        Lagrangian function (S)
W        Hessian of the Lagrangian wrt x (M, n x n)
lambda   Lagrange multipliers (V)
A        constraint gradients (M, n x m); the ith column is the gradient
         of constraint i. For LP (linear constraints) A is constant,
         i.e. A^T x = b for all x
ai       gradient of constraint i (V)
b        right hand side of the linear constraints (V)
E        set of equality constraints
I        set of inequality constraints
c        cost coefficients in LP (f(x) = c^T x) (V)
Z        matrix whose columns are orthogonal to the constraint
         gradients (the columns of A), A^T Z = 0 (M)
pz       step projected into the constraints (p = Z pz) (V)
gz       projected gradients (gz = Z^T g) (V)
yz       projected gradient difference (yz = Z^T y) (V)
Bz       projected Hessian approximation (Bz = Z^T B Z) (M)
q        quadratic approximation to the Lagrangian (S)
psi      Lagrangian for the line search sub-problem (S)
mu       Lagrange multipliers for the line search problem (V)

PRELIMINARY DEFINITIONS

x          independent variables
f(x)       objective function
c(x)       constraints (equality or inequality)
g          gradient vector (df/dx)
H          Hessian matrix (d2f/dxi dxj) (n x n)
u^T H u    quadratic form

Positive Definite Matrices

If u^T H u > 0 for all u != 0, H is positive definite and so is its inverse.
(Positive semi-definite: u^T H u >= 0; negative definite: u^T H u < 0.)
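The positive-definiteness test above can be carried out numerically: for a symmetric matrix, u^T H u > 0 for all u != 0 is equivalent to all eigenvalues being strictly positive. A minimal sketch (the matrix H here is an illustrative example, not taken from the notes):

```python
import numpy as np

# Illustrative symmetric matrix (not from the notes).
H = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# H is positive definite iff u^T H u > 0 for all u != 0, which for a
# symmetric matrix is equivalent to all eigenvalues being strictly positive.
is_pd = bool(np.all(np.linalg.eigvalsh(H) > 0))

# The inverse of a positive definite matrix is also positive definite.
H_inv = np.linalg.inv(H)
inv_is_pd = bool(np.all(np.linalg.eigvalsh(H_inv) > 0))
```

The eigenvalue test is used here because it also distinguishes the semi-definite case (some eigenvalues exactly zero) from the definite one.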
Contours

If the objective function is plotted on axes of the independent variables x,
contours are lines joining points of constant value of the objective
function f(x). The gradient vector of a function at any point is always
normal to the function contour at that point.

Finite Difference Approximations

We can generate approximate formulae for the derivatives of functions by
using a Taylor series:

    f(x+h) = f(x) + h df/dx + (h^2/2) d2f/dx2 + ...

so

    df/dx = (f(x+h) - f(x))/h - (h/2) d2f/dx2 - ...
          = (f(x+h) - f(x))/h + O(h)

This is a forward difference approximation formula (cf. central and
backward difference formulae). The approximation is first order since the
truncation error, given by the second term on the right hand side, is O(h).

[Fig. 1: Contours for the function f(x1,x2,x3) = x1^2 + x2^2 + x3^2
(the contours are concentric spheres)]
[Fig. 2: Contours for a positive definite quadratic function]
[Fig. 3: Contours for Rosenbrock's function
f(x1,x2) = 100(x2 - x1^2)^2 + (1 - x1)^2]

Stationary Points

[Figures: strong local minimum, weak local minima, global minimum;
minimum, maximum and saddle point in two dimensions]

Classes of Optimisation Problems

Properties of f(x)                        Properties of ci(x)
Function of a single variable             No constraints
Linear function                           Simple bounds
Sum of squares of linear functions        Linear functions
Quadratic function                        Sparse linear functions
Sum of squares of nonlinear functions     Smooth nonlinear functions
Smooth nonlinear functions                Sparse nonlinear functions
Sparse nonlinear functions                Non-smooth nonlinear functions
Non-smooth nonlinear functions

UNCONSTRAINED OPTIMISATION

An unconstrained minimisation problem may be expressed as

    min f(x),   x in R^n

Expanding f in a Taylor series about a candidate optimum x*:

    f(x) = f(x*) + (x - x*)^T g(x*) + (1/2)(x - x*)^T H(x*)(x - x*) + ...

For x* to be a minimum we require f(x) >= f(x*) for all x near x*. Since
(x - x*) can be either positive or negative, if g(x*) is non-zero then
consider the first order term

    (x - x*)^T g(x*) = sum_{i=1}^{n} (df/dxi)|x* (xi - xi*)

Consider the situation that all the xi are at the optimum except xj, i.e.
(df/dxj)|x* is non-zero.
Since (xj - xj*) could be either positive or negative, then, ignoring
second and higher order terms, the requirement f(x) >= f(x*) would not hold
and x* would not be a minimum. This leads to the so-called first order
condition for a stationary point (minimum or maximum) of f(x).

FIRST ORDER CONDITION

    (df/dxj)|x* = 0  for all j = 1,...,n,   i.e.  g(x*) = 0

If g(x*) = 0, then for x* to be a minimum, (x - x*)^T H(x*)(x - x*) must be
greater than zero, which leads to the second order condition.

SECOND ORDER CONDITION

    H(x*) must be positive definite.

Only if the Hessian is positive definite can we guarantee that we have a
minimum. If it is positive semi-definite then we may have a weak minimum or
a saddle point.

An optimal point x* is completely specified by satisfying what are called
the necessary and sufficient conditions for optimality. A condition N is
necessary for a result R if R can be true only if N is true (R => N).
However, the reverse does not hold, i.e. if N is true, R is not necessarily
true. A condition S is sufficient for a result R if R is true whenever S is
true (S => R). A condition T is necessary and sufficient for a result R if
R is true if and only if T is true (T <=> R).

The first order condition is a necessary condition for the optimum. The
second order condition is the sufficiency condition.

e.g.  f(x) = x^3,   f'(x) = 3x^2,   f''(x) = 6x

At x = 0 the necessary condition is satisfied but the sufficiency condition
is not, since the second derivative is only positive semi-definite
(f''(0) = 0).

These conditions have a practical use. By showing that the necessary
condition is not satisfied we can demonstrate that a point is definitely
not optimal. The sufficiency condition need only be checked to show that a
point definitely is the optimum. These conditions are based on small
deviations about x*, hence x* is a local minimum.

MINIMISATION OF A SINGLE VARIABLE FUNCTION

Bisection

Algorithm
1. Select an interval [a,b] which brackets the minimum, i.e.
   g(a)g(b) < 0 with g(a) < 0.
2. Evaluate g(x1) where x1 = (a+b)/2.
3. If |g(x1)| < eps, stop.
4. If g(x1) < 0 set a = x1, else set b = x1.
5. Set i = i + 1.
6. Go to 2.

Others: Fibonacci search, Golden Section search.

Newton's Method

Solve the equation g(x) = 0 for min f(x):

    x_{k+1} = x_k - g(x_k)/h(x_k)

where h = g' = f''.

Algorithm
1. Select x0 (k = 0).
2. Evaluate g(x_k).
3. If |g(x_k)| < eps and h(x_k) > 0, stop.
4. Evaluate h(x_k).
5. x_{k+1} = x_k - g(x_k)/h(x_k)
6. Set k = k + 1.
7. Go to 2.

The method uses the gradient and a local linear model of it (equivalently,
a quadratic model of f), i.e.

    g = ax + b,    f = (1/2)ax^2 + bx + c

MINIMISATION OF FUNCTIONS OF MANY VARIABLES

PATTERN SEARCH METHODS

Simplex method, fixed step method, continuous steepest descent.

GRADIENT METHODS

A Model Algorithm for Gradient Methods
1. Choose initial x0 and determine f(x0), g(x0), H(x0). Set k = 0.
2. Determine the step direction d_k.
3. Determine the step length - select alpha such that
   f(x_k + alpha d_k) < f(x_k), and set p_k = alpha d_k.
4. Update x_{k+1} = x_k + p_k and evaluate f_{k+1}, g_{k+1}, H_{k+1}.
5. Stop if convergence is attained. If not, set k = k+1 and return to 2.

Methods for choice of search direction

i) Steepest descent

If g^T p < 0 the step is a descent step, guaranteeing a reduction in f.
Choose p = -g.
Advantages: simple.
Disadvantages: inefficient, oscillatory, not scale invariant.

ii) Newton's method

This method instead solves the equation g(x) = 0 iteratively. Steps are
generated in a similar manner to Newton's method in one dimension, i.e. by
setting up a linear approximation to the gradient using the second
derivatives. The step is generated by solving the equation

    H(x_k) p_k = -g(x_k)

This is done by solving this equation as a linear system by matrix
decomposition techniques (not by forming p_k = -H(x_k)^{-1} g(x_k)
explicitly). A descent direction can only be guaranteed if

    -g^T p = g^T H^{-1} g > 0

which is only true if H^{-1} is positive definite.
Advantages: in general more efficient.
Disadvantages: requires the Hessian matrix (or a finite difference
approximation); cannot guarantee a descent direction.

Two possible improvements:
a) exact line search
b) ensure H is positive definite, e.g.
    H' = H + vI    (v > 0 chosen so that H' is positive definite)

iii) Quasi-Newton methods

Approximate H using quasi-Newton update formulae, e.g. the DFP
(Davidon-Fletcher-Powell) or BFGS (Broyden-Fletcher-Goldfarb-Shanno)
formulae.

iv) Conjugate gradient methods

Step Length Control

i) Exact line search
Search to find the minimum of the objective function along the step
direction, using univariate methods e.g. bisection, Golden Section.

ii) Approximate (inexact) line search
Search to ensure that the objective function is reduced.

Trust Region Method

Assume that the quadratic approximation is valid in some region around the
current point and restrict the step to within that region (a hypersphere).
A measure of the accuracy of the quadratic model is given by

    r_k = Delta f_k / Delta q_k

where Delta q_k is the change in the local quadratic approximation between
steps k and k+1 and Delta f_k is the change in f.

The following is a commonly used algorithm:
1. Given the trust region size Delta_k, and x_k, g_k and H_k.
2. Obtain p_k by a gradient (or other) method.
3. If ||p_k|| <= Delta_k retain p_k, else scale p_k so that
   ||p_k|| = Delta_k.
4. Evaluate f_{k+1} and hence r_k.
5. If r_k < 0.25, set Delta_{k+1} = ||p_k||/4;
   if r_k > 0.75 and ||p_k|| = Delta_k, set Delta_{k+1} = 2 Delta_k;
   else set Delta_{k+1} = Delta_k.
6. If r_k <= 0, set x_{k+1} = x_k, else x_{k+1} = x_k + p_k.

[Figures: simplex method; fixed step; sequential minimisation; search
direction methods; continuous steepest descent; descent direction
(g^T p < 0); exact line search; steepest descent step; steepest descent (S)
and Newton (N) steps for a quadratic function]

CONSTRAINED OPTIMISATION

A constrained optimisation problem may be expressed mathematically as

    minimise f(x),   x in R^n

    subject to   ci(x) = 0,   i in E   (equality constraints)
                 ci(x) >= 0,  i in I   (inequality constraints)

If a point x satisfies all the constraints it is said to be a feasible
point, and the set of all such points is referred to as the feasible
region. The conditions for a local constrained minimum may be set up in the
same way as in unconstrained optimisation.
However, the conditions are posed in terms of a composite function called
the Lagrangian function. The Lagrangian function L(x,lambda) is defined as
follows:

    L(x,lambda) = f(x) - sum_{k=1}^{m} lambda_k ck(x)
                = f(x) - lambda^T c(x)                               (1)

where lambda is the set of Lagrange multipliers, with one for each
constraint.

CONDITIONS FOR A LOCAL CONSTRAINED MINIMUM

Consider the case where all constraints are equality constraints. At a
constrained local minimum the Lagrangian must satisfy
grad_x L = grad_lambda L = 0, i.e.

    dL/dxi = 0,        i = 1,...,n   (no. of variables)              (2)
    dL/dlambda_k = 0,  k = 1,...,m   (no. of constraints)            (3)

Condition (3) can be seen to signify that all constraints are satisfied,
since

    dL/dlambda_k = -ck(x) = 0.

Conditions (1) and (2) give

    dL/dxi = df/dxi - sum_{k=1}^{m} lambda_k dck/dxi = 0

i.e.

    g(x) = sum_{k=1}^{m} lambda_k ak(x)

With inequality constraints

When inequality constraints are involved we need to determine whether or
not each constraint is 'active', i.e. whether the current point satisfies
the inequality constraint exactly, in which case it may be treated as an
equality constraint. In order to determine this we may use the sign of the
Lagrange multiplier. If the Lagrange multiplier is non-negative the
constraint is active; if constraint k is not active then lambda_k = 0. This
is intuitively sensible, since when the point is away from a constraint
that constraint does not affect the problem and should not contribute to
the Lagrangian function.

Example

    f(x) = x1^2 + x2^2,   subject to   x2 - 1 >= 0

Form the Lagrangian function

    L(x,lambda) = x1^2 + x2^2 - lambda(x2 - 1)

    dL/dx1 = 2 x1 = 0             =>  x1 = 0
    dL/dx2 = 2 x2 - lambda = 0    =>  lambda = 2 x2
    dL/dlambda = -(x2 - 1) = 0    =>  x2 = 1

At the point (0,1), lambda = 2 and f cannot be improved (f = 1, x2 >= 1).
Hence this is a minimum of the function subject to the constraint.

If we consider instead the problem

    f(x) = x1^2 - x2^2,   subject to   x2 - 1 >= 0

Contours for f(x) = x1^2 + x2^2: the constraint x2 >= 1 is active at the
minimum.

Contours for f(x) = x1^2 - x2^2:
The constraint x2 >= 1 is not active at the minimum.

If we follow the same argument, the point (0,1) satisfies the conditions on
the derivatives of the Lagrangian function, but lambda = -2 and the
objective function can indeed be improved by moving away from the
constraint. This leads to the analogous conditions for a constrained
minimum.

FIRST ORDER CONDITIONS (Kuhn-Tucker Conditions)

    grad_x L(x,lambda) = 0
    ci(x) = 0,            i in E
    ci(x) >= 0,           i in I
    lambda_i >= 0,        i in I
    lambda_i ci(x) = 0,   for all i

The final condition is called the complementarity condition. If ci = 0 at
the solution the constraint is active and the Lagrange multiplier may have
any positive value. If the constraint is not active at the solution the
multiplier should have a value of zero. If the multiplier is negative the
solution does not lie on this constraint.

SECOND ORDER CONDITIONS

Necessary conditions
If x* is a local minimum of f then the lambda* exist and

    s^T W* s >= 0   for all s in G*

Sufficient conditions
If the lambda* exist and

    s^T W* s > 0    for all s in G*

then x* is a local minimum of f. Here G* is the set of all feasible
directions and W* is the Hessian of the Lagrangian function at the
solution, i.e. W* = grad_x^2 L(x*,lambda*).

METHODS FOR LINEARLY CONSTRAINED PROBLEMS

    min f(x)   subject to   Ax = b  (i.e. ai^T x = bi),   x >= 0

Methods for this problem are based on the gradient methods for
unconstrained optimisation, modified to handle the constraints.

Active Set Methods

These methods determine which constraints belong to the active set and move
along one constraint until another constraint is violated. When this occurs
the new constraint is added to the 'active set' and the constraint with the
most negative multiplier is dropped from the active set. The following is a
commonly used algorithm.

Algorithm
1. k = 1. Choose x(1) and determine the set of active constraints.
2. If x(k) is not a solution to the equality constrained problem, go to
   step 4.
3. Compute lambda(k) and choose the most negative lambda_q(k); if
   lambda_q(k) >= 0 then x* = x(k) and stop, else remove q from the
   active set.
4. Solve the equality constrained problem for the step p(k).
5. Find the first constraint violated along the step direction and set

       x(k+1) = x(k) + alpha(k) p(k)

   where alpha(k) = min(1, min_i (bi - ai^T x(k)) / (ai^T p(k))), the
   inner minimum being taken over the constraints not in the active set
   that would be violated along p(k).

METHODS FOR NONLINEARLY CONSTRAINED PROBLEMS

The step is obtained by minimising a quadratic approximation q to the
Lagrangian subject to linear approximations to the constraints:

    ci(x(k)) + ai(x(k))^T p = 0,    i = 1,2,...,me
    ci(x(k)) + ai(x(k))^T p >= 0,   i = me+1,...,m

The resulting QP is solved and the approximations are updated at each
iteration.

Algorithm
1. Choose an initial feasible point x0 and evaluate H0, g0, a0, c0. Set
   k = 0.
2. Solve the quadratic programming problem (QP) for the step direction pk.
3. Use a line search algorithm to choose x(k+1), while satisfying the
   constraints, by minimising

       psi(x,mu) = f(x) + sum_{i=1}^{me} mu_i |ci(x)|
                        + sum_{i=me+1}^{m} mu_i |min(0, ci(x))|

   To ensure that we also have a descent direction we must also satisfy
   mu_i > |lambda_i|, i = 1,2,...,m. The line search finds a value for
   alpha which satisfies these conditions.
4. Set dk = x(k+1) - x(k) = alpha pk. Solve for the multipliers. If a
   constraint becomes active add it to the active constraint set and drop
   the constraint with the most negative multiplier.
5. Evaluate g(k+1) and yk = g(k+1) - g(k). If x(k+1) is a Kuhn-Tucker
   point, then STOP.
6. Update Bk (an approximation to W) with the BFGS quasi-Newton update:

       B(k+1) = Bk - (Bk dk dk^T Bk)/(dk^T Bk dk) + (yk yk^T)/(dk^T yk)

   If dk^T yk <= 0, set yk = eta Bk dk + (1 - eta) yk, which will make
   B(k+1) positive definite and therefore guarantee that d(k+1) is a
   descent direction.
7. Set k = k+1 and return to step 2.

References

[1] Murtagh B.A. and Saunders M.A. (1980) 'MINOS/AUGMENTED user's manual',
    Report SOL 77-9, Department of Operations Research, Stanford
    University, CA.
[2] Murtagh B.A. and Saunders M.A. (1982) 'A projected Lagrangian
    algorithm and its implementation for sparse nonlinear constraints',
    Math. Prog. Study 16, pp. 84-117.
[3] Powell M.J.D. (1978) 'A fast algorithm for nonlinearly constrained
    optimisation calculations', Proc. Biennial Numerical Analysis
    Conference, Dundee (ed. G.A. Watson), Springer-Verlag.
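As a closing illustration, the quasi-Newton update in step 6 above can be sketched numerically. This is a minimal sketch, not the full algorithm: the vectors d and y and the fixed mixing factor eta = 0.5 are illustrative choices, whereas Powell's method chooses the mixing adaptively.

```python
import numpy as np

# Sketch of the BFGS update from step 6, with a simple modification of y
# when d^T y <= 0. The fixed eta = 0.5 is an illustrative choice only.
def bfgs_update(B, d, y, eta=0.5):
    if d @ y <= 0:
        y = eta * (B @ d) + (1 - eta) * y  # modified y
    Bd = B @ d
    return B - np.outer(Bd, Bd) / (d @ Bd) + np.outer(y, y) / (d @ y)

B = np.eye(2)                # initial Hessian approximation
d = np.array([1.0, 0.0])     # step, d = x_{k+1} - x_k  (illustrative)
y = np.array([0.5, 0.1])     # gradient difference, y = g_{k+1} - g_k
B_next = bfgs_update(B, d, y)

# With d^T y > 0 the updated matrix remains positive definite.
stays_pd = bool(np.all(np.linalg.eigvalsh(B_next) > 0))
```

Keeping B positive definite is what guarantees that the next QP step is a descent direction, which is why the update guards against d^T y <= 0.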
