constraints, but in this section the more general description in (23) can be
handled. The idea of a penalty function method is to replace problem (23)
by an unconstrained approximation of the form
Minimize{f(x) + cP(x)} (24)
Example 16
[Figure: the penalty term cP(x) plotted against x for c = 1, 10, and 100 over an interval a ≤ x ≤ b; the penalty grows steeper as c increases.]
Example 17
To clarify these ideas and to gain some understanding of how to select the
penalty parameter c, consider the following problem.
Minimize f(x) = (x1 − 6)^2 + (x2 − 7)^2

subject to g1(x) = −3x1 − 2x2 + 6 ≤ 0
           g2(x) = −x1 + x2 − 3 ≤ 0
           g3(x) = x1 + x2 − 7 ≤ 0
           g4(x) = (2/3)x1 − x2 − 4/3 ≤ 0
[Figure: the feasible region defined by g1 through g4 in the (x1, x2) plane, with 0 ≤ x1 ≤ 6 and 0 ≤ x2 ≤ 5 shown on the axes.]
Setting the elements of ∇xθ(c, x) to zero and solving yields the stationary
point

x1*(c) = 6(1 + c)/(1 + 2c)  and  x2*(c) = 7 − 6c/(1 + 2c)
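As a numerical check (a sketch, not from the text: it assumes only g3 is active near the solution, so the penalized objective reduces to f(x) + c(x1 + x2 − 7)^2):

```python
# Closed-form stationary point of the penalized problem
# theta(c, x) = (x1 - 6)**2 + (x2 - 7)**2 + c*(x1 + x2 - 7)**2.

def x_star(c):
    """Stationary point of theta(c, .) given in the text."""
    x1 = 6 * (1 + c) / (1 + 2 * c)
    x2 = 7 - 6 * c / (1 + 2 * c)
    return x1, x2

def grad_theta(c, x1, x2):
    """Gradient of theta(c, x); it should vanish at x_star(c)."""
    v = x1 + x2 - 7                       # g3(x)
    return (2 * (x1 - 6) + 2 * c * v,
            2 * (x2 - 7) + 2 * c * v)

for c in (1, 10, 100, 1000):
    x1, x2 = x_star(c)
    g1, g2 = grad_theta(c, x1, x2)
    print(f"c={c:5d}  x=({x1:.4f}, {x2:.4f})  |grad|~{abs(g1) + abs(g2):.1e}")
# As c grows, x*(c) approaches (3, 4), which lies on the boundary g3 = 0.
```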
Algorithm
At iteration k, the unconstrained problem (26) is solved to obtain xk, the
optimum of θ(ck, x). It is assumed that problem (26) has a solution for all
positive values of ck. This will be true, for example, if θ(c, x) increases
without bound as ||x|| → ∞.
A simple implementation, known as the sequential unconstrained
minimization technique (SUMT), is given below.
[Table: SUMT iterations for Example 17, with columns k, c, x1, x2 and g3.]
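A minimal SUMT loop for Example 17 can be sketched as follows (gradient descent is used for the inner unconstrained minimization purely for illustration; the step-size rule, the multiplier schedule c ← 10c, the iteration counts, and the form of g4 are assumptions):

```python
# SUMT sketch for Example 17: minimize f(x) = (x1-6)^2 + (x2-7)^2
# subject to g1..g4 <= 0, using theta(c, x) = f(x) + c * sum(max(0, gi)^2).

def grads(c, x1, x2):
    # constraint values and their (constant) gradients
    gs = [(-3 * x1 - 2 * x2 + 6,   (-3.0, -2.0)),
          (-x1 + x2 - 3,           (-1.0,  1.0)),
          (x1 + x2 - 7,            ( 1.0,  1.0)),
          (2 * x1 / 3 - x2 - 4/3,  ( 2/3, -1.0))]
    d1 = 2 * (x1 - 6)
    d2 = 2 * (x2 - 7)
    for g, (a, b) in gs:
        if g > 0:                  # only violated constraints contribute
            d1 += 2 * c * g * a
            d2 += 2 * c * g * b
    return d1, d2

x1, x2 = 0.0, 0.0                  # an infeasible start is fine (exterior method)
c = 1.0
for _ in range(5):                 # outer SUMT iterations
    step = 1.0 / (2 + 40 * c)      # conservative step for this quadratic penalty
    for _ in range(20000):         # inner minimization of theta(c, .)
        d1, d2 = grads(c, x1, x2)
        x1 -= step * d1
        x2 -= step * d2
    print(f"c={c:7.0f}  x=({x1:.3f}, {x2:.3f})  g3={x1 + x2 - 7:.4f}")
    c *= 10.0                      # strengthen the penalty and repeat
# The successive minimizers violate g3 slightly and approach (3, 4).
```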
Implementation Issues
Much of the success of SUMT depends on the approach used to solve the
intermediate problems, which in turn depends on their complexity. One
thing that should be done prior to attempting to solve a nonlinear program
using a penalty function method is to scale the constraints so that the
penalty generated by each is about the same magnitude. This scaling
operation is intended to ensure that no subset of the constraints has an
undue influence on the search process. If some constraints are dominant,
the algorithm will steer towards a solution that satisfies those constraints
at the expense of searching for the minimum.
In a like manner, the initial value of the penalty parameter should
be chosen so that the magnitude of the penalty term is not much smaller than
the magnitude of the objective function. If an imbalance exists, the influence
of the objective function could direct the algorithm to head towards an
unbounded minimum even in the presence of unsatisfied constraints. In
either case, convergence may be exceedingly slow.
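One simple way to follow the scaling advice above (a sketch; normalizing each constraint by the norm of its gradient at the starting point is an assumption, not a prescription from the text):

```python
import math

def scale_constraints(g_list, grad_list, x0):
    """Return scaled constraints g_i / ||grad g_i(x0)|| so that each
    violated constraint contributes a penalty of comparable magnitude."""
    scaled = []
    for g, grad in zip(g_list, grad_list):
        norm = math.sqrt(sum(v * v for v in grad(x0))) or 1.0
        scaled.append(lambda x, g=g, n=norm: g(x) / n)
    return scaled

# Example: two constraints with very different natural scales.
g_list = [lambda x: 1000 * (x[0] + x[1] - 1),   # badly scaled
          lambda x: x[0] - x[1]]
grad_list = [lambda x: (1000.0, 1000.0),
             lambda x: (1.0, -1.0)]
scaled = scale_constraints(g_list, grad_list, (0.0, 0.0))
print([round(g((2.0, 0.0)), 4) for g in scaled])   # -> [0.7071, 1.4142]
```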
Convergence
Although SUMT has an intuitive appeal, it is still necessary to prove that it
will converge to the optimum of the original problem (23). The following
lemma is the first component of the proof; it gives a set of inequalities
that follow directly from the definition of xk and the inequality ck+1 > ck.
Lemma 1:

θ(ck, xk) ≤ θ(ck+1, xk+1)
P(xk) ≥ P(xk+1)
f(xk) ≤ f(xk+1)
Proof:

θ(ck+1, xk+1) = f(xk+1) + ck+1P(xk+1) ≥ f(xk+1) + ckP(xk+1) ≥ f(xk) + ckP(xk) = θ(ck, xk)

The first inequality holds because ck+1 > ck and P(x) ≥ 0; the second holds
because xk minimizes θ(ck, x). This establishes the first statement of the
lemma.
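For Example 17 the quantities in Lemma 1 are available in closed form (a derivation sketch, assuming only g3 is active at x*(c): θ(c, x*(c)) = 36c/(1+2c), P(x*(c)) = 36/(1+2c)^2, and f(x*(c)) = 72c^2/(1+2c)^2), so the three inequalities can be checked numerically:

```python
# Lemma 1 checked on Example 17 using the closed-form optimal values.
def theta(c): return 36 * c / (1 + 2 * c)           # theta(c, x*(c))
def P(c):     return 36 / (1 + 2 * c) ** 2          # penalty at x*(c)
def f(c):     return 72 * c * c / (1 + 2 * c) ** 2  # objective at x*(c)

cs = [1, 2, 5, 10, 100, 1000]
for ck, ck1 in zip(cs, cs[1:]):
    assert theta(ck) <= theta(ck1)   # theta is nondecreasing in c
    assert P(ck) >= P(ck1)           # the penalty decreases
    assert f(ck) <= f(ck1)           # the objective increases
print(theta(1000), f(1000))          # both approach f* = 18 as c grows
```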
Barrier Function
It should be apparent that the quadratic penalty function in (25) is only one
of an endless set of possibilities. While the "sum of the squared
violations" is probably the best type of penalty function for an exterior
method, it is not at all suitable if one desires to conduct a search through a
sequence of interior points. For that purpose one can use a barrier function
such as

B(r, x) = f(x) − r ∑i=1,…,m 1/gi(x)    (27)

where r > 0 is the barrier parameter. This function is valid only for
interior points such that all constraints are strictly satisfied: gi(x) < 0 for
all i.
Equation (27) indicates that the closer one gets to a constraint
boundary, the larger B(r,x) becomes; indeed, B(r,x) is undefined at points on
the boundary itself. Hence B(r,x) is often called a barrier function
and, in a sense, is opposite to the kind of exterior penalty function
introduced in (25).
As discussed in Chapter 4, the basic idea of interior point methods
is to start with a feasible point and a relatively large value of the
parameter r. This will prevent the algorithm from approaching the
boundary of the feasible region. At each subsequent iteration, the value of
r is monotonically decreased in such a way that the resultant problem is
relatively easy to solve if the optimal solution of its immediate
predecessor is used as the starting point. Mathematically, the sequence of
solutions {xk} can be shown to converge to a local minimum in much the
same way that the exterior penalty method was shown to converge earlier.
Example 18
The problem is to minimize f(x) = 2x1^2 + 9x2 subject to
g(x) = −x1 − x2 + 4 ≤ 0. The corresponding barrier function is

B(r, x) = 2x1^2 + 9x2 − r(−x1 − x2 + 4)^−1

with gradient

∇xB(r, x) = [ 4x1 − r(−x1 − x2 + 4)^−2,  9 − r(−x1 − x2 + 4)^−2 ]T.
Setting the gradient vector to zero and solving for the stationary point
yields
x1(r) = 9/4 and x2(r) = 7/4 + √r/3.

As r → 0, the stationary points converge to (9/4, 7/4), at which the
constraint holds as an equality.
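The computations in Example 18 can be checked numerically (a sketch; it assumes the problem is to minimize f(x) = 2x1^2 + 9x2 subject to x1 + x2 ≥ 4, which is what the gradient form implies):

```python
import math

def grad_B(r, x1, x2):
    """Gradient of B(r, x) = 2*x1**2 + 9*x2 + r/(x1 + x2 - 4)."""
    u = x1 + x2 - 4                        # > 0 at interior points
    return (4 * x1 - r / u**2, 9 - r / u**2)

def x_of_r(r):
    """Stationary point: 4*x1 = 9 and (x1 + x2 - 4)**2 = r/9."""
    return 9 / 4, 7 / 4 + math.sqrt(r) / 3

for r in (1.0, 0.1, 0.01, 0.0001):
    x1, x2 = x_of_r(r)
    gx, gy = grad_B(r, x1, x2)
    print(f"r={r:<8} x=({x1}, {x2:.4f})  |grad|~{abs(gx) + abs(gy):.1e}")
# As r -> 0 the minimizers approach (2.25, 1.75) on the boundary x1 + x2 = 4.
```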
LQ(rk, x) = f(x) − rk ∑i=1,…,m ln(−gi(x)) + (1/rk) ∑i=1,…,p hi(x)^2.
The algorithm finds the unconstrained minimizer of LQ(rk,x) over the set
{x : gi(x) < 0, i = 1,…, m} for a sequence of scalar parameters {rk} strictly
decreasing to zero.
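As a toy illustration of LQ (my own example, not from the text): minimize x^2 subject to the single inequality g(x) = 1 − x ≤ 0 and no equalities. Then LQ(r, x) = x^2 − r ln(x − 1) on x > 1, and its minimizer has a closed form:

```python
import math

def x_min(r):
    """Minimizer of LQ(r, x) = x**2 - r*ln(x - 1) on x > 1:
    setting 2*x - r/(x - 1) = 0 gives 2*x**2 - 2*x - r = 0."""
    return (1 + math.sqrt(1 + 2 * r)) / 2   # root with x > 1

for r in (1.0, 0.1, 0.001):
    print(f"r={r:<6} x(r)={x_min(r):.6f}")
# x(r) decreases toward the constrained minimizer x* = 1 as r -> 0.
```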
Summary
Penalty and barrier methods are among the most powerful classes of
algorithms available for attacking general nonlinear optimization
problems. This statement is supported by the fact that these techniques
will converge to at least a local minimum in most cases, regardless of the
convexity characteristics of the objective function and constraints. They
work well even in the presence of cusps and similar anomalies that can
stymie other approaches.
Of the two classes, the exterior methods must be considered
preferable. The primary reasons are as follows.
1. Interior methods cannot deal with equality constraints without
cumbersome modifications to the basic approach.
2. Interior methods demand a feasible starting point. Finding such a
point often presents formidable difficulties in and of itself.
3. Interior methods require that the search never leave the feasible
region. This significantly increases the computational effort
associated with the line search segment of the algorithm.
Zoutendijk's Method

For d to be a feasible direction at x0, it must satisfy

(d/dt) gi(x0 + td)|t=0 = ∇gi(x0)Td ≤ 0,  i ∈ I

and for it to be a descent direction,

(d/dt) f(x0 + td)|t=0 = ∇f(x0)Td < 0.

This leads to the direction-finding problem

Minimize θ (28a)
subject to ∇gi(x0)Td − θ ≤ 0,  i ∈ I (28b)
∇f(x0)Td − θ ≤ 0 (28c)
dTd = 1 (28d)
Because the vectors ∇f and ∇gi are evaluated at a fixed point x0,
the above direction-finding problem is almost linear, the only nonlinearity
being (28d). Zoutendijk showed that this constraint can be handled by a
modified version of the simplex method so problem (28) may be solved
with reasonable efficiency. Note that if some of the constraints in the
original nonlinear program (1) were given as equalities, the algorithm
would have to be modified slightly.
Of course, once a direction has been determined, the step size must
still be found. This problem may be dealt with in almost the same manner
as in the unconstrained case. It is still desirable to minimize the objective
function along the vector d, but now no constraint may be violated. Thus t
is determined to minimize f(xk + tdk) subject to the constraint xk + tdk ∈ S.
Any of the techniques discussed in Section 10.6 can be used. A new point
is thus determined and the direction-finding problem is re-solved. If at
some point the minimum objective value of the direction-finding problem is
nonnegative, then there is no feasible direction satisfying ∇f(x0)Td < 0 and
the procedure terminates. The final point will
generally be a local minimum of the problem. Zoutendijk showed that for
convex programs the procedure converges to the global minimum.
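To make the direction-finding step concrete, here is a minimal sketch (not from the text): the quadratic normalization is replaced by the box bounds −1 ≤ dj ≤ 1, as in Exercise 35, and a coarse grid search stands in for the LP solver; the gradient values are made up for illustration.

```python
# Direction finding at x0: minimize grad_f . d subject to
# grad_gi . d <= 0 for each active constraint i, with -1 <= dj <= 1.
# An LP solver would normally be used; a grid search suffices here.

grad_f = (2.0, -1.0)              # made-up objective gradient at x0
active_grads = [(1.0, 1.0)]       # made-up gradient of the one active constraint

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

best, best_val = None, float("inf")
steps = [i / 50 for i in range(-50, 51)]   # grid over the box [-1, 1]^2
for d1 in steps:
    for d2 in steps:
        d = (d1, d2)
        if all(dot(g, d) <= 0 for g in active_grads):   # feasible direction
            v = dot(grad_f, d)
            if v < best_val:
                best, best_val = d, v

print("direction:", best, "descent rate:", best_val)
# best_val < 0, so 'best' is an improving feasible direction.
```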
Sequential Quadratic Programming
For any smooth function g, the first-order approximation about the current point xc is
g(x) ≈ g(xc) + ∇g(xc)T∆x (29), where ∆x is the direction of movement. Given initial
values for the variables, in SLP all nonlinear functions are replaced by their linear
approximations as in Eq. (29). The
variables in the resulting LP are the ∆xj's representing changes from the current values. It
is common to place upper and lower bounds on each ∆xj, given that the linear
approximation is reasonably accurate only in some neighborhood of the initial point.
The resulting linear program is solved and if the new point provides an
improvement it becomes the incumbent and the process is repeated. If the new point does
not yield an improvement, the step bounds may need to be reduced or we may be close
enough to an optimum to stop. Successive points generated by this procedure need not be
feasible even if the initial point is. However, the amount of infeasibility generally is
reduced as the iterations proceed.
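The need for step bounds can be seen directly from the linearization error; a small sketch (my own example, not from the text):

```python
# Why SLP bounds each step: a linearization is accurate only near the
# current point. Illustration with g(x) = x**2 - 4 linearized at x0 = 1,
# i.e. g(x) ~ g(x0) + 2*x0*(x - x0).

x0 = 1.0
g   = lambda x: x**2 - 4
lin = lambda x: g(x0) + 2 * x0 * (x - x0)

for dx in (0.1, 0.5, 2.0):
    err = abs(g(x0 + dx) - lin(x0 + dx))
    print(f"step {dx:4}: linearization error {err}")
# Here the error equals dx**2, so small step bounds keep the LP model honest.
```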
Successive quadratic programming (SQP) methods solve a sequence of quadratic
programming approximations to the original nonlinear program (Fan et al. [1988]). By
definition, QPs have a quadratic objective function, linear constraints, and bounds on the
variables. A number of efficient procedures are available for solving them. As in SLP,
the linear constraints are first order approximations of the actual constraints about the
current point. The quadratic objective function used, however, is not just the second
order Taylor series approximation to the original objective function but a variation based
on the Lagrangian.
The Lagrangian for this problem is L(x, λ) = f(x) – λTh(x). Recall that the
first-order necessary conditions for the point xc to be a local minimum of
Problem (30) are that there exist Lagrange multipliers λc such that

∇xL(xc, λc) = ∇f(xc) – (λc)T∇h(xc) = 0 (31)

and h(xc) = 0.
On the other hand, if ∆x* = 0 is the solution to this problem, we can show
that xc satisfies the necessary conditions (31) for a local minimum of the
original problem. First, since ∆x* = 0, h(xc) = 0 and xc is feasible to
Problem (30). Now, because ∆x* solves Problem (33), there exists a λ*
such that the gradient of the Lagrangian function for (33) evaluated at ∆x*
= 0 is also equal to zero; i.e., ∇f(xc) – (λ*)T∇h(xc) = 0. The Lagrange
multipliers λ* can serve as the Lagrange multipliers for the original
problem, and hence the necessary conditions (31) are satisfied by (xc, λ*).
The extension to inequality constraints is straightforward; they are
linearized and included in the Lagrangian when computing the Hessian
matrix, L, of the Lagrangian. Linear constraints and variable bounds
contained in the original problem are included directly in the constraint
region of Eq. (33b). Of course, the matrix L need not be positive definite,
even at the optimal solution of the NLP, so the QP may not have a
minimum. Fortunately, positive definite approximations of L can be used
so the QP will have an optimal solution if it is feasible. Such
approximations can be obtained by a slight modification of the popular
BFGS updating formula used in unconstrained minimization. This
formula requires only the gradient of the Lagrangian function so second
derivatives of the problem functions need not be computed.
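A minimal sketch of a single SQP step for an equality-constrained toy problem (my own example, not from the text; the exact Hessian ∇²L = 2I is used in place of a BFGS approximation):

```python
def sqp_step(x):
    """One SQP step for: min x1**2 + x2**2  s.t.  h(x) = x1 + x2 - 2 = 0.
    QP subproblem: min 0.5*d'Bd + grad_f'd  s.t.  h + grad_h'd = 0, B = 2I.
    KKT system: 2*d_i + 2*x_i - lam = 0 for i = 1, 2, and d1 + d2 = -h."""
    h = x[0] + x[1] - 2
    lam = x[0] + x[1] - h                    # from eliminating d1 and d2
    d = [(lam - 2 * xi) / 2 for xi in x]     # back-substitute for the step
    return (x[0] + d[0], x[1] + d[1]), lam

x, lam = sqp_step((3.0, 0.0))
print(x, lam)
# With a quadratic objective and linear constraint one step is exact:
# x = (1.0, 1.0), lam = 2.0.
```

Note that (1, 1) with multiplier 2 satisfies the stationarity condition ∇f = λ∇h, i.e. (2, 2) = 2·(1, 1), so a second step makes no further change.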
Because the QP can be derived from Newton’s method applied to
the necessary conditions for the optimum of the NLP, if one simply
accepts the solution of the QP as defining the next point, the algorithm
behaves like Newton’s method; i.e., it converges rapidly near an optimum
but may not converge from a poor initial point. If ∆x is viewed as a search
direction, the convergence properties can be improved. However, since
both objective function improvement and reduction of the constraint
infeasibilities need to be taken into account, the function to be
minimized in the line search must incorporate both. Two possibilities
that have been suggested are the exact penalty function and the
augmented Lagrangian.
Table A2, taken from Lasdon et al. [1996], summarizes the relative merits
of SLP, SQP, and GRG algorithms, focusing on their application to
problems with many nonlinear equality constraints. One feature appears
as both an advantage and a disadvantage –– whether or not the algorithm
can violate the nonlinear constraints of the problem by relatively large
amounts during the solution process.
SLP and SQP usually generate points yielding large violations of
the constraints. This can cause difficulties, especially in models with log
or fractional power expressions, since negative arguments for these
functions are possible. Such problems have been documented in reference
to complex chemical process examples in which SLP and some exterior
penalty-type algorithms failed, whereas an implementation of the GRG
method succeeded and was quite efficient. On the other hand, algorithms
that do not attempt to satisfy the equalities at each step can be faster than
those that do. The fact that SLP and SQP satisfy all linear constraints at
each iteration should ease the aforementioned difficulties but does not
eliminate them.
There are situations in which the optimization process must be
interrupted before the algorithm has reached optimality and the current
point must be used or discarded. Such cases are common in on-line
process control where temporal constraints force immediate decisions. In
these situations, maintaining feasibility during the optimization process
may be a requirement for the optimizer inasmuch as constraint violations
make a solution unusable. Clearly, all three algorithms have advantages
that will dictate their use in certain situations. For large problems, SLP
software is used most widely because it is relatively easy to implement
given a good LP system. Nevertheless, large-scale versions of GRG and
SQP have become increasingly popular.
A.3 Exercises
33. Solve the problem given below with an exterior penalty function method, and then
repeat the calculations using a barrier function method.
Minimize x1^2 + 4x2^2 – 8x1 – 16x2
subject to x1 + x2 ≤ 5
0 ≤ x1 ≤ 3, x2 ≥ 0
Minimize x1^2 + 2x2^2
subject to 4x1 + x2 ≤ 6
x1 + x2 = 3
x1 ≥ 0, x2 ≥ 0
35. Repeat the preceding exercise using Zoutendijk’s procedure. Use the normalization
–1 ≤ dj ≤ 1, j = 1, 2, to permit solution by linear programming.
Minimize 5x1^2 – 10x1 – 10x2 log10 x2

subject to x1^2 + 2x2^2 ≤ 4, x1 ≥ 0, x2 ≥ 0
a. Approximate the separable functions with piecewise linear functions and solve
the resulting model using linear programming. If necessary, assume 0 log10 0 = 0.
37. Solve the following problem using Zoutendijk’s procedure. Start with x0 = (0, 3/4).
Minimize 2x1^2 + 2x2^2 – 2x1x2 – 4x1 – 6x2

subject to x1 + 5x2 ≤ 5
2x1^2 – x2 ≤ 0
x1 ≥ 0, x2 ≥ 0
38. Solve the relaxation of the redundancy problem when the budget for components is
$500 (the value of C). Use the data in the table below.
Maximize ∏j=1,…,n [1 – (1 – rj)^(1+xj)]

subject to ∑j=1,…,n cj xj ≤ C, xj ≥ 0, j = 1,…, n
Item, j 1 2 3 4
Reliability, rj 0.9 0.8 0.95 0.75
Cost per item, cj 100 50 40 200
Minimize f(x) = 2x1^2 + 20x2^2 + 43x3^2 + 12x1x2 – 16x1x3

subject to x1 + 2x2 + 3x3 ≥ 5
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0
a. Write out the KKT conditions then set up the appropriate linear programming
model and solve with a restricted basis entry rule. Is the solution a global
optimum? Explain.
40. Use an NLP code to solve the problem in the preceding exercise but this time
maximize rather than minimize. Because the maximization problem may have local
solutions, try different starting points. Confirm that you have found the global
maximum by solving the problem by hand.
Bibliography
Bazaraa, M.S., H.D. Sherali and C. M. Shetty, Nonlinear Programming: Theory and
Algorithms, Second Edition, John Wiley & Sons, New York, 1993.
Fletcher, R., Practical Methods of Optimization, Second Edition, John Wiley & Sons,
New York, 1987.
Nash, S.G. and A. Sofer, “A Barrier Method for Large-Scale Constrained Optimization,”
ORSA Journal on Computing, Vol. 5, No. 1, pp. 40-53, 1993.
Nash, S.G. and A. Sofer, Linear and Nonlinear Programming, McGraw Hill, New York,
1996.