Appendix A

Algorithms for Constrained Optimization


Methods for solving a constrained optimization problem in n variables and m constraints
can be divided roughly into four categories that depend on the dimension of the space in
which the accompanying algorithm works. Primal methods work in n – m space, penalty
methods work in n space, dual and cutting plane methods work in m space, and
Lagrangian methods work in n + m space. Each of these approaches is founded on
different aspects of NLP theory. Nevertheless, there are strong interconnections between
them, both in the final form of implementation and in performance. The rates of
convergence of most practical algorithms are determined by the structure of the Hessian
of the Lagrangian, much like the structure of the Hessian of the objective function
determines the rates of convergence for most unconstrained methods.
In this appendix, we present several procedures for solving problem (1). The first
is the now classical penalty approach developed by Fiacco and McCormick [1968], which is
perhaps the simplest to implement. The second is Zoutendijk's feasible direction method.
Other primal approaches discussed in the literature include the gradient projection
method and the generalized reduced gradient (GRG) method, which is a simplex-like
algorithm. There are many commercial codes that implement these and related
techniques. Sequential linear programming and sequential quadratic programming (SQP),
for example, are two Lagrangian approaches that have proven to be quite effective. SQP
is highlighted at the end of this appendix.

A.1 Penalty and Barrier Methods


The methods described here attempt to approximate a constrained
optimization problem with an unconstrained one and then apply standard search
techniques to obtain solutions. The approximation is accomplished in the case of penalty
methods by adding a term to the objective function that prescribes a high cost for
violation of the constraints. In the case of barrier methods, a term is added that favors
points in the interior of the feasible region over those near the boundary. For a problem
with n variables and m constraints, both approaches work directly in the n-dimensional
space of the variables. The discussion that follows emphasizes penalty methods
recognizing that barrier methods embody the same principles.

Consider the problem


Minimize{f(x) : x ∈ S} (23)

where f is a continuous function on ℜn and S is a constraint set in ℜn. In
most applications S is defined explicitly by a number of functional
constraints, but in this section the more general description in (23) can be
handled. The idea of a penalty function method is to replace problem (23)
by an unconstrained approximation of the form
Minimize{f(x) + cP(x)} (24)

where c is a positive constant and P is a function on ℜn satisfying (i) P(x)
is continuous, (ii) P(x) ≥ 0 for all x ∈ ℜn, and (iii) P(x) = 0 if and only if
x ∈ S.

Example 16

Suppose S is defined by a number of inequality constraints: S = {x : gi(x)
≤ 0, i = 1,…, m}. A very useful penalty function in this case is

P(x) = ½ ∑i=1,…,m (max{0, gi(x)})²	(25)

which gives a quadratic augmented objective function denoted by

θ(c,x) ≡ f(x) + cP(x).
Here, each unsatisfied constraint influences x by assessing a penalty equal
to the square of the violation. These influences are summed and
multiplied by c, the penalty parameter. Of course, this influence is
counterbalanced by f(x). Therefore, if the magnitude of the penalty term is
small relative to the magnitude of f(x), minimization of θ(c,x) will almost
certainly not result in an x that would be feasible to the original problem.
However, if the value of c is made suitably large, the penalty term will
exact such a heavy cost for any constraint violation that the minimization
of the augmented objective function will yield a feasible solution.
The function cP(x) is illustrated in Fig. 19 for the one-dimensional
case with g1(x) = b – x and g2(x) = x – a. For large c it is clear that the
minimum point of problem (24) will be in a region where P is small. Thus
for increasing c it is expected that the corresponding solution points will
approach the feasible region S and, subject to being close, will minimize f.
Ideally then, as c → ∞ the solution point of the penalty problem will
converge to a solution of the constrained problem.

[Figure 19. Illustration of the penalty function cP(x) for g1(x) = b – x and g2(x) = x – a, with curves shown for c = 1, 10, and 100.]

Example 17

To clarify these ideas and to get some understanding on how to select the
penalty parameter c, let us consider the following problem.
Minimize f(x) = (x1 – 6)² + (x2 – 7)²
subject to g1(x) = –3x1 – 2x2 + 6 ≤ 0
 g2(x) = –x1 + x2 – 3 ≤ 0
 g3(x) = x1 + x2 – 7 ≤ 0
 g4(x) = (2/3)x1 – x2 – 4/3 ≤ 0

The feasible region is shown graphically in Fig. 20 along with several


isovalue contours for the objective function. The problem is a quadratic
program and the isovalue contours are concentric circles centered at (6,7),
the unconstrained minimum of f(x).
[Figure 20. Feasible region for Example 17, with isovalue contours of f(x).]

Using the quadratic penalty function (25), the augmented objective
function is

θ(c,x) = (x1 – 6)² + (x2 – 7)² + c((max{0, –3x1 – 2x2 + 6})²
 + (max{0, –x1 + x2 – 3})² + (max{0, x1 + x2 – 7})²
 + (max{0, (2/3)x1 – x2 – 4/3})²)

where, as in the computations below, the factor ½ in (25) has been absorbed into the penalty parameter c.

The first step in the solution process is to select a starting point. A


good rule of thumb is to start at an infeasible point. By design then, we
will see that every trial point, except the last one, will be infeasible
(exterior to the feasible region).
A reasonable place to start is at the unconstrained minimum so we
set x0 = (6,7). Since only constraint 3 is violated at this point, we have

θ(c,x) = (x1 – 6)² + (x2 – 7)² + c(max{0, x1 + x2 – 7})².

Assuming that in the neighborhood of x0 the “max” operator returns the
constraint, the gradient with respect to x is

∇xθ(c,x) = ( 2x1 – 12 + 2c(x1 + x2 – 7),  2x2 – 14 + 2c(x1 + x2 – 7) )ᵀ.

Setting the elements of ∇xθ(c,x) to zero and solving yields the stationary
point

x1*(c) = 6(1 + c)/(1 + 2c) and x2*(c) = 7 – 6c/(1 + 2c)

as a function of c. For any positive value of c, θ(c,x) is a strictly convex
function (the Hessian of θ(c,x) is positive definite for all c > 0), so x1*(c)
and x2*(c) determine a global minimum. It turns out for this example that
the minima will continue to satisfy all but the third constraint for all
positive values of c. If we take the limit of x1*(c) and x2*(c) as c → ∞, we
obtain x1* = 3 and x2* = 4, the constrained global minimum for the original
problem.

Selecting the Penalty Parameter

Because the above approach seems to work so well, it is natural to


conjecture that all we have to do is set c to a very large number and then
optimize the resulting augmented objective function θ(c,x) to obtain the
solution to the original problem. Unfortunately, this conjecture is not
correct. First, “large” depends on the particular model. It is almost
always impossible to tell how large c must be to provide a solution to the
problem without creating numerical difficulties in the computations.
Second, in a very real sense, the problem is dynamically changing with the
relative position of the current value of x and the subset of the constraints
that are violated.
The third reason why the conjecture is not correct is associated
with the fact that large values of c create enormously steep valleys at the
constraint boundaries. Steep valleys will often present formidable, if not
insurmountable, convergence difficulties for most search methods unless
the algorithm starts at a point extremely close to the minimum being
sought.
Fortunately, there is a direct and sound strategy that will overcome
each of the difficulties mentioned above. All that needs to be done is to
start with a relatively small value of c and an infeasible (exterior) point.
This will assure that no steep valleys are present in the initial optimization
of θ(c,x). Subsequently, we will solve a sequence of unconstrained
problems with monotonically increasing values of c chosen so that the
solution to each new problem is “close” to the previous one. This will
preclude any major difficulties in finding the minimum of θ(c,x) from one
iteration to the next.

Algorithm

To implement this strategy, let {ck}, k = 1,2,... be a sequence tending to


infinity such that ck > 0 and ck+1 > ck. Now for each k we solve the
problem

Minimize{θ(ck,x) : x ∈ ℜn}	(26)

to obtain xk, the optimum. It is assumed that problem (26) has a solution
for all positive values of ck. This will be true, for example, if θ(c,x)
increases without bound as ||x|| → ∞.
A simple implementation known as the sequential unconstrained
minimization technique (SUMT), is given below.

Initialization Step: Select a growth parameter β > 1, a stopping
parameter ε > 0, and an initial value of the penalty parameter c0.
Choose a starting point x0 that violates at least one constraint and
formulate the augmented objective function θ(c0,x). Let k = 1.

Iterative Step: Starting from xk–1, use an unconstrained search
technique to find the point that minimizes θ(ck–1,x). Call it xk and
determine which constraints are violated at this point.

Stopping Rule: If the distance between xk–1 and xk is smaller than ε (i.e.,
||xk–1 – xk|| < ε) or the difference between two successive objective
function values is smaller than ε (i.e., |f(xk–1) – f(xk)| < ε), stop with
xk an estimate of the optimal solution. Otherwise, put ck ← βck–1,
formulate the new θ(ck,x) based on which constraints are violated at
xk, put k ← k + 1 and return to the iterative step.

Applying the algorithm to Example 17 with β = 2 and c0 = 0.5 for
eight iterations yields the sequence of solutions given in Table A1. The
iterates xk are seen to approach the true minimum point x* = (3,4).

Table A1. Sequence of Solutions Using the Penalty Method

 k      c       x1      x2      g3
 0      ––     6.00    7.00    6.00
 1      0.5    4.50    5.50    3.00
 2      1      4.00    5.00    2.00
 3      2      3.60    4.60    1.20
 4      4      3.33    4.33    0.66
 5      8      3.18    4.18    0.35
 6     16      3.09    4.09    0.18
 7     32      3.05    4.05    0.09
 8     64      3.02    4.02    0.04
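
The computation in Table A1 is easy to reproduce. The sketch below, which assumes Python with NumPy and SciPy (none of this code appears in the text), applies SUMT with the quadratic penalty to Example 17; as in the example, the factor ½ of (25) is absorbed into c.

```python
import numpy as np
from scipy.optimize import minimize

# SUMT sketch for Example 17: quadratic penalty, growth parameter beta = 2,
# starting penalty c0 = 0.5, infeasible starting point x0 = (6, 7).
f = lambda x: (x[0] - 6)**2 + (x[1] - 7)**2

def P(x):
    # Sum of squared constraint violations, Eq. (25) with the 1/2 absorbed in c.
    g = [-3*x[0] - 2*x[1] + 6,
         -x[0] + x[1] - 3,
         x[0] + x[1] - 7,
         (2/3)*x[0] - x[1] - 4/3]
    return sum(max(0.0, gi)**2 for gi in g)

x, c, beta, eps = np.array([6.0, 7.0]), 0.5, 2.0, 1e-4
for k in range(30):
    xk = minimize(lambda y: f(y) + c * P(y), x).x   # subproblem (26)
    if np.linalg.norm(xk - x) < eps:                # stopping rule
        x = xk
        break
    x, c = xk, beta * c                             # grow the penalty parameter
print(x)   # approaches the constrained minimum (3, 4)
```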

Implementation Issues

Much of the success of SUMT depends on the approach used to solve the
intermediate problems, which in turn depends on their complexity. One
thing that should be done prior to attempting to solve a nonlinear program
using a penalty function method is to scale the constraints so that the
penalty generated by each is about the same magnitude. This scaling
operation is intended to ensure that no subset of the constraints has an
undue influence on the search process. If some constraints are dominant,
the algorithm will steer towards a solution that satisfies those constraints
at the expense of searching for the minimum.
In a like manner, the initial value of the penalty parameter should
be fixed so that the magnitude of the penalty term is not much smaller than
the magnitude of the objective function. If an imbalance exists, the influence
of the objective function could direct the algorithm to head towards an
unbounded minimum even in the presence of unsatisfied constraints. In
either case, convergence may be exceedingly slow.

Convergence
Although SUMT has an intuitive appeal, it is still necessary to prove that it
will converge to the optimum of the original problem (23). The following
lemma is the first component of the proof, and gives a set of inequalities
that follows directly from the definition of xk and the inequality ck+1 > ck.

Lemma 1:

θ(ck, xk) ≤ θ(ck+1, xk+1)
P(xk) ≥ P(xk+1)
f(xk) ≤ f(xk+1)

Proof:

θ(ck+1, xk+1) = f(xk+1) + ck+1P(xk+1) ≥ f(xk+1) + ckP(xk+1)
 ≥ f(xk) + ckP(xk) = θ(ck, xk),

which proves the first inequality. From this we also have

f(xk) + ckP(xk) ≤ f(xk+1) + ckP(xk+1)
f(xk+1) + ck+1P(xk+1) ≤ f(xk) + ck+1P(xk)

Adding these two expressions and rearranging terms yields (ck+1 –
ck)P(xk+1) ≤ (ck+1 – ck)P(xk), which proves the second inequality. Finally,
f(xk+1) + ckP(xk+1) ≥ f(xk) + ckP(xk) which, in conjunction with the second
inequality, leads to the third. ❚

Lemma 2: Let x* be a solution to problem (23). Then for each k

f(x*) ≥ θ(ck, xk) ≥ f(xk)

Proof: f(x*) = f(x*) + ckP(x*) ≥ f(xk) + ckP(xk) ≥ f(xk). ❚

Global convergence of the penalty method, or more precisely


verification that any limit point of the sequence is a solution, follows
easily from the two lemmas above.

Theorem 10: Let {xk} be a sequence generated by the penalty method,
where xk is the global minimum of θ(ck–1,x) at the iterative step of the
algorithm. Then any limit point of the sequence is a solution to (23).

Proof: See Luenberger [1984].

Barrier Function

It should be apparent that the quadratic penalty function in (25) is only one
of an endless set of possibilities. While the "sum of the squared
violations" is probably the best type of penalty function for an exterior
method, it is not at all suitable if one desires to conduct a search through a

sequence of feasible or interior points. Let us consider the following


augmented objective function

B(r,x) = f(x) – r ∑i=1,…,m 1/gi(x)	(27)

where r > 0 is the barrier parameter. This function is valid only for
interior points such that all constraints are strictly satisfied: gi(x) < 0 for
all i.
Equation (27) indicates that the closer one gets to a constraint
boundary, the larger B(r,x) becomes. Indeed, points precisely on any
boundary are not defined. Hence B(r,x) is often called a barrier function
and, in a sense, is opposite to the kind of exterior penalty function
introduced in (25).
As discussed in Chapter 4, the basic idea of interior point methods
is to start with a feasible point and a relatively large value of the
parameter r. This will prevent the algorithm from approaching the
boundary of the feasible region. At each subsequent iteration, the value of
r is monotonically decreased in such a way that the resultant problem is
relatively easy to solve if the optimal solution of its immediate
predecessor is used as the starting point. Mathematically, the sequence of
solutions {xk} can be shown to converge to a local minimum in much the
same way that the exterior penalty method was shown to converge earlier.

Example 18

To clarify these ideas consider the problem

Minimize{f(x) = 2x1² + 9x2 subject to x1 + x2 ≥ 4}.

Using (27) the augmented objective function is

B(r,x) = 2x1² + 9x2 + r/(x1 + x2 – 4)

with gradient

∇xB(r,x) = ( 4x1 – r(x1 + x2 – 4)⁻²,  9 – r(x1 + x2 – 4)⁻² )ᵀ.

Setting the gradient vector to zero and solving for the stationary point
yields

x1(r) = 2.25 and x2(r) = 1.75 + 0.333√r for all r > 0.

In the limit as r approaches 0, these values become x1 = 2.25 and x2 = 1.75
with f(x) = 25.875, which is the optimal solution to the original problem.
Figure 21 depicts the feasible region and several isovalue contours of f(x).
Also shown is the locus of optimal solutions for B(r,x) starting with r0 =
20 and decreasing to 0.

[Figure 21. Graphical illustration of the barrier search procedure: the locus of minimizers of B(r,x) approaches the constrained optimum as r → 0.]
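
A numerical version of this barrier search is easy to sketch. The code below is illustrative only and assumes NumPy and SciPy; Nelder-Mead is used because it tolerates the +∞ returned outside the interior.

```python
import numpy as np
from scipy.optimize import minimize

# Barrier-method sketch for Example 18. The constraint x1 + x2 >= 4 is kept
# strictly satisfied by returning +inf whenever the slack is nonpositive.
def B(x, r):
    s = x[0] + x[1] - 4.0              # slack; must stay strictly positive
    if s <= 0.0:
        return np.inf
    return 2*x[0]**2 + 9*x[1] + r/s    # Eq. (27) with g(x) = 4 - x1 - x2

x, r = np.array([3.0, 3.0]), 20.0      # strictly interior start, r0 = 20
while r > 1e-8:
    x = minimize(lambda y: B(y, r), x, method="Nelder-Mead").x
    r /= 10.0                          # monotonically decrease the barrier
print(x)   # tends to (2.25, 1.75), where f(x) = 25.875
```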

A Mixed Barrier-Penalty Function

Equality constraints, though not discussed up until now, can be handled


efficiently with a penalty function. Let us consider the following problem.
Minimize f(x)
subject to hi(x) = 0, i = 1,…, p
gi(x) ≤ 0, i = 1,…, m
The most common approach to implementing the sequential unconstrained
minimization technique for this model is to form the logarithmic-
quadratic loss function
LQ(rk,x) = f(x) – rk ∑i=1,…,m ln(–gi(x)) + (1/rk) ∑i=1,…,p hi(x)²

The algorithm finds the unconstrained minimizer of LQ(rk,x) over the set
{x : gi(x) < 0, i = 1,…, m} for a sequence of scalar parameters {rk} strictly
decreasing to zero. All convergence, duality, and convexity results for
LQ(rk,x) are similar to those for the pure penalty and barrier functions.
Note that for a given r, the stationarity condition is

∇f(x(r)) + ∑i=1,…,m [r/(–gi(x(r)))] ∇gi(x(r)) + ∑i=1,…,p [2hi(x(r))/r] ∇hi(x(r)) = 0

Fiacco and McCormick were able to show that as r → 0, r/(–gi(x(r))) → μi*,
the optimal Lagrange multiplier for the ith inequality constraint, and
2hi(x(r))/r → λi*, the optimal Lagrange multiplier for the ith equality
constraint. This result is suggested by the fractions in the above
summations.
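
As a sketch of how the LQ function behaves, the following code (illustrative only; the toy problem and all names are my own constructions, and SciPy is assumed) minimizes LQ for a decreasing sequence {rk} and then reads off the Fiacco-McCormick multiplier estimates.

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize (x1-1)^2 + (x2-2)^2
# subject to h(x) = x1 + x2 - 2 = 0 and g(x) = -x1 <= 0.
f = lambda x: (x[0] - 1)**2 + (x[1] - 2)**2
g = lambda x: -x[0]
h = lambda x: x[0] + x[1] - 2.0

def LQ(x, r):
    if g(x) >= 0.0:
        return np.inf                  # log term defined only where g(x) < 0
    return f(x) - r*np.log(-g(x)) + h(x)**2 / r

x = np.array([1.0, 1.5])               # satisfies g(x) < 0 strictly
for r in [0.5**k for k in range(13)]:  # r: 1, 1/2, ..., ~2.4e-4
    x = minimize(lambda y, r=r: LQ(y, r), x, method="Nelder-Mead",
                 options={"xatol": 1e-10, "fatol": 1e-12}).x

mu_est  = r / (-g(x))     # -> mu* (here 0: the inequality is inactive)
lam_est = 2*h(x) / r      # -> lambda* (here 1) for the equality constraint
print(x, mu_est, lam_est) # x tends to (0.5, 1.5)
```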

Summary

Penalty and barrier methods are among the most powerful classes of
algorithms available for attacking general nonlinear optimization
problems. This statement is supported by the fact that these techniques
will converge to at least a local minimum in most cases, regardless of the
convexity characteristics of the objective function and constraints. They
work well even in the presence of cusps and similar anomalies that can
stymie other approaches.
Of the two classes, the exterior methods must be considered
preferable. The primary reasons are as follows.
1. Interior methods cannot deal with equality constraints without
cumbersome modifications to the basic approach.
2. Interior methods demand a feasible starting point. Finding such a
point often presents formidable difficulties in and of itself.
3. Interior methods require that the search never leave the feasible
region. This significantly increases the computational effort
associated with the line search segment of the algorithm.

Although penalty and barrier methods met with great initial


success, their slow rates of convergence due to ill-conditioning of the
associated Hessian led researchers to pursue other approaches. With the
advent of interior point methods for linear programming, algorithm
designers have taken a fresh look at penalty methods and have been able
to achieve much greater efficiency than previously thought possible (e.g.,
see Nash and Sofer [1993]).

A.2 Primal Methods


In solving a nonlinear program, primal methods work on the original problem directly by
searching the feasible region for an optimal solution. Each point generated in the process
is feasible and the value of the objective function decreases monotonically. These methods
have three significant advantages: (1) if they terminate before confirming optimality
(which is very often the case with all procedures), the current point is feasible; (2) if they
generate a convergent sequence, it can usually be shown that the limit point of that
sequence must be at least a local minimum; (3) they do not rely on special structure, such
as convexity, so they are quite general. Notable disadvantages are that they require a
phase 1 procedure to obtain an initial feasible point and that they are all plagued,
particularly when the problem constraints are nonlinear, with computational difficulties
arising from the need to remain within the feasible region from one iteration to the next.
The convergence rates of primal methods are competitive with those of other procedures,
and for problems with linear constraints, they are often among the most efficient.
Primal methods, often called feasible direction methods, embody the same
philosophy as the techniques of unconstrained minimization but are designed to deal with
inequality constraints. Briefly, the idea is to pick a starting point satisfying the
constraints and to find a direction such that (i) a small move in that direction remains
feasible, and (ii) the objective function improves. One then moves a finite distance in the
determined direction, obtaining a new and better point. The process is repeated until no
direction satisfying both (i) and (ii) can be found. In general, the terminal point is a
constrained local (but not necessarily global) minimum of the problem. A direction
satisfying both (i) and (ii) is called a usable feasible direction. There are many ways of
choosing such directions, hence many different primal methods. We now present a
popular one based on linear programming.

Zoutendijk's Method

Once again, we consider problem (23) with constraint set S = {x : gi(x)
≤ 0, i = 1,…, m}. Assume that a starting point x0 ∈ S is available. The
problem is to choose a vector d whose direction is both usable and
feasible. Let gi(x0) = 0, i ∈ I, where the indices in I correspond to the
binding constraints at x0. For a feasible direction d, a small move along this
vector beginning at the point x0 makes no binding constraint positive,
i.e.,

(d/dt) gi(x0 + td)|t=0 = ∇gi(x0)ᵀd ≤ 0, i ∈ I

For a minimization objective, a usable feasible vector has the additional


property that

(d/dt) f(x0 + td)|t=0 = ∇f(x0)ᵀd < 0

Therefore, the function initially decreases along the vector. In searching


for a "best" vector d along which to move, one could choose that feasible
vector minimizing ∇f(x0)Td. If some of the binding constraints were
nonlinear, however, this could lead to certain difficulties. In particular,
starting at x0 the feasible direction d0 that minimizes ∇f(x0)Td is the
projection of –∇f(x0) onto the tangent plane generated by the binding
constraints at x0. Because the constraint surface is curved, movement
along d0 for any finite distance violates the constraint. Thus a recovery
move must be made to return to the feasible region. Repetitions of the
procedure lead to inefficient zigzagging. As a consequence, when looking
for a locally best direction it is wise to choose one that, in addition to
decreasing f, also moves away from the boundaries of the nonlinear
constraints. The expectation is that this will avoid zigzagging. Such a
direction is the solution of the following problem.
Minimize θ	(28a)

subject to ∇gi(x0)ᵀd – θiθ ≤ 0, i ∈ I	(28b)
 ∇f(x0)ᵀd – θ ≤ 0	(28c)
 dᵀd = 1	(28d)

where 0 ≤ θi ≤ 1 is selected by the user. If all θi = 1, then any vector (d, θ)
satisfying (28b) and (28c) with θ < 0 is a usable feasible direction. That with
minimum θ is a best direction which simultaneously makes ∇f(x0)ᵀd
and ∇gi(x0)ᵀd as negative as possible; i.e., it steers away from the nonlinear
constraint boundaries. Other values of θi enable one to emphasize certain
constraint boundaries relative to others. Equation (28d) is a normalization
requirement ensuring that θ is finite. If it were not included and a vector
(d, θ) existed satisfying (28b) and (28c) with θ negative, then θ could be
made to approach –∞ because (28b) and (28c) are homogeneous in (d, θ). Other
normalizations, such as |dj| ≤ 1 for all j, are also possible; the LP sketch
below uses one of them.
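
To make the direction-finding problem concrete, here is a sketch (my own construction, assuming SciPy) that solves (28) at the point x0 = (2, 5) of Example 17, where g2 and g3 are binding, using the box normalization |dj| ≤ 1 so that the problem becomes a pure LP; all θi are set to 1.

```python
import numpy as np
from scipy.optimize import linprog

# Direction-finding LP at x0 = (2, 5) for Example 17; active set I = {2, 3}.
grad_f = np.array([-8.0, -4.0])      # gradient of f at x0
grad_g = [np.array([-1.0, 1.0]),     # gradient of g2 at x0
          np.array([ 1.0, 1.0])]     # gradient of g3 at x0

# Decision vector z = (d1, d2, theta); objective: minimize theta.
c = np.array([0.0, 0.0, 1.0])
A_ub = np.vstack([np.append(grad_f, -1.0)] +              # (28c)
                 [np.append(gg, -1.0) for gg in grad_g])  # (28b), theta_i = 1
b_ub = np.zeros(A_ub.shape[0])
bounds = [(-1, 1), (-1, 1), (None, None)]                 # |dj| <= 1, theta free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
d, theta = res.x[:2], res.x[2]
print(d, theta)   # theta = -4/9 < 0, so d is a usable feasible direction
```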

Because the vectors ∇f and ∇gi are evaluated at a fixed point x0,
the above direction-finding problem is almost linear, the only nonlinearity
being (28d). Zoutendijk showed that this constraint can be handled by a
modified version of the simplex method so problem (28) may be solved
with reasonable efficiency. Note that if some of the constraints in the
original nonlinear program (1) were given as equalities, the algorithm
would have to be modified slightly.

Of course, once a direction has been determined, the step size must
still be found. This problem may be dealt with in almost the same manner
as in the unconstrained case. It is still desirable to minimize the objective
function along the vector d, but now no constraint may be violated. Thus t
is determined to minimize f(xk + tdk) subject to the constraint xk + tdk ∈ S.
Any of the techniques discussed in Section 10.6 can be used. A new point
is thus determined and the direction-finding problem is re-solved. If at
some point the minimum θ ≥ 0, then there is no feasible direction
satisfying ∇f(x0)ᵀd < 0 and the procedure terminates. The final point will
generally be a local minimum of the problem. Zoutendijk showed that for
convex programs the procedure converges to the global minimum.

A.3 Sequential Quadratic Programming


Successive linear programming (SLP) methods solve a sequence of linear approximations
to the original nonlinear program. In this respect they are similar to Zoutendijk's method
but they do not require that feasibility be maintained at each iteration. Recall that if f(x)
is a nonlinear function and xc is the current value for x, then the first order Taylor series
expansion of f(x) around xc is
f(x) = f(xc + ∆x) ≅ f(xc) + ∇f(xc)(∆x) (29)

where ∆x is the direction of movement. Given initial values for the variables, in SLP all
nonlinear functions are replaced by their linear approximations as in Eq. (29). The
variables in the resulting LP are the ∆xj's representing changes from the current values. It
is common to place upper and lower bounds on each ∆xj, given that the linear
approximation is reasonably accurate only in some neighborhood of the initial point.
The resulting linear program is solved and if the new point provides an
improvement it becomes the incumbent and the process is repeated. If the new point does
not yield an improvement, the step bounds may need to be reduced or we may be close
enough to an optimum to stop. Successive points generated by this procedure need not be
feasible even if the initial point is. However, the amount of infeasibility generally is
reduced as the iterations proceed.
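
A single SLP step is easy to illustrate. The sketch below (my own, assuming SciPy) takes one step for Example 17 from the feasible point xc = (1, 2); the constraints there are already linear, so only f is replaced by its expansion (29), and step bounds of 1 keep the move inside the region where the linearization is trusted.

```python
import numpy as np
from scipy.optimize import linprog

# One SLP step for Example 17 at xc = (1, 2).
xc = np.array([1.0, 2.0])
grad_f = np.array([2*(xc[0] - 6), 2*(xc[1] - 7)])   # gradient of f at xc

# Linear constraints g(x) <= 0 written as A x <= b; in the LP the variables
# are the changes dx, so the right-hand side becomes b - A xc.
A = np.array([[-3.0, -2.0], [-1.0, 1.0], [1.0, 1.0], [2/3, -1.0]])
b = np.array([-6.0, 3.0, 7.0, 4/3])

step = 1.0                                           # trust-region-like bound
res = linprog(grad_f, A_ub=A, b_ub=b - A @ xc,
              bounds=[(-step, step)] * 2)
x_new = xc + res.x
print(x_new)    # (2, 3): f drops from 50 to 32; re-linearize here and repeat
```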
Successive quadratic programming (SQP) methods solve a sequence of quadratic
programming approximations to the original nonlinear program (Fan et al. [1988]). By
definition, QPs have a quadratic objective function, linear constraints, and bounds on the
variables. A number of efficient procedures are available for solving them. As in SLP,
the linear constraints are first order approximations of the actual constraints about the
current point. The quadratic objective function used, however, is not just the second
order Taylor series approximation to the original objective function but a variation based
on the Lagrangian.

We will derive the procedure for the equality constrained version of a
nonlinear program.

Minimize f(x)	(30a)
subject to hi(x) = 0, i = 1,…, m	(30b)

The Lagrangian for this problem is L(x, λ) = f(x) – λᵀh(x). Recall that the
first order necessary conditions for the point xc to be a local minimum of
Problem (30) are that there exist Lagrange multipliers λc such that

∇Lx(xc, λc) = ∇f(xc) – (λc)ᵀ∇h(xc) = 0	(31)

and h(xc) = 0

Applying Newton's method to solve the system of equations (31) requires
their linearization at the current point, yielding the linear system

⎡ ∇²Lx(xc, λc)   –∇h(xc)ᵀ ⎤ ⎡ ∆x ⎤   ⎡ –∇Lx(xc, λc) ⎤
⎢                         ⎥ ⎢    ⎥ = ⎢              ⎥	(32)
⎣ –∇h(xc)         0       ⎦ ⎣ ∆λ ⎦   ⎣  h(xc)       ⎦

It is easy to show that if (∆x, ∆λ) satisfies Eq. (32) then (∆x, λc + ∆λ)
will satisfy the necessary conditions for the optimality of the following
QP.

Minimize ∇f(xc)∆x + ½ ∆xᵀ∇²Lx(xc, λc)∆x	(33a)
subject to h(xc) + ∇h(xc)∆x = 0	(33b)

On the other hand, if ∆x* = 0 is the solution to this problem, we can show
that xc satisfies the necessary conditions (31) for a local minimum of the
original problem. First, since ∆x* = 0, h(xc) = 0 and xc is feasible to
Problem (30). Now, because ∆x* solves Problem (33), there exists a λ*
such that the gradient of the Lagrangian function for (33) evaluated at ∆x*
= 0 is also equal to zero; i.e., ∇f(xc) – (λ*)ᵀ∇h(xc) = 0. The Lagrange
multipliers λ* can serve as the Lagrange multipliers for the original
problem and hence the necessary conditions (31) are satisfied by (xc, λ*).
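
The Newton step (32) can be sketched in a few lines. The example below (my own toy problem, assuming NumPy) applies it to minimize x1² + x2² subject to x1 + x2 – 2 = 0; because f is quadratic and h linear, a single step lands on the optimum (1, 1) with multiplier λ = 2.

```python
import numpy as np

# One Newton/SQP step on the KKT system (32) for a toy problem.
grad_f = lambda x: 2.0 * x
h      = lambda x: np.array([x[0] + x[1] - 2.0])
grad_h = np.array([[1.0], [1.0]])      # columns are the gradients of the h_i
hess_L = 2.0 * np.eye(2)               # Hessian of the Lagrangian (constant)

x, lam = np.array([3.0, 0.0]), np.array([0.0])
for _ in range(3):
    grad_L = grad_f(x) - grad_h @ lam            # stationarity residual (31)
    K = np.block([[hess_L,    -grad_h        ],
                  [-grad_h.T, np.zeros((1, 1))]])
    rhs = np.concatenate([-grad_L, h(x)])
    step = np.linalg.solve(K, rhs)               # (dx, dlam) from Eq. (32)
    x, lam = x + step[:2], lam + step[2:]
print(x, lam)   # (1, 1) and multiplier 2 after the first step
```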
The extension to inequality constraints is straightforward; they are
linearized and included in the Lagrangian when computing the Hessian
matrix, L, of the Lagrangian. Linear constraints and variable bounds
contained in the original problem are included directly in the constraint
region of Eq. (33b). Of course, the matrix L need not be positive definite,
even at the optimal solution of the NLP, so the QP may not have a
minimum. Fortunately, positive definite approximations of L can be used
so the QP will have an optimal solution if it is feasible. Such
approximations can be obtained by a slight modification of the popular
BFGS updating formula used in unconstrained minimization. This
formula requires only the gradient of the Lagrangian function so second
derivatives of the problem functions need not be computed.
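
The remark about modifying BFGS is usually realized with Powell's damped update, which forces positive definiteness of the approximation. A minimal sketch (assuming NumPy; not code from the text):

```python
import numpy as np

# Powell's damped BFGS update for the Lagrangian Hessian approximation B.
# s = x_new - x_old;  y = grad_L(x_new, lam) - grad_L(x_old, lam).
def damped_bfgs(B, s, y):
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    # Damping: blend y with Bs whenever the curvature s'y is too small.
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * Bs           # guarantees s @ r > 0
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (s @ r)
```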
Because the QP can be derived from Newton’s method applied to
the necessary conditions for the optimum of the NLP, if one simply
accepts the solution of the QP as defining the next point, the algorithm
behaves like Newton’s method; i.e., it converges rapidly near an optimum
but may not converge from a poor initial point. If ∆x is viewed as a search
direction, the convergence properties can be improved. However, since
both objective function improvement and reduction of the constraint
infeasibilities need to be taken into account, the function to be
minimized in the line search process must incorporate both. Two
possibilities that have been suggested are the exact penalty function and
the Lagrangian function. The Lagrangian is suitable for the following
reasons:
1. On the tangent plane to the active constraints, it has a minimum at
the optimal solution to the NLP.
2. It initially decreases along the direction ∆x.
If the penalty weight is large enough, the exact penalty function also has
property (2) and is minimized at the optimal solution of the NLP.

Relative advantages and disadvantages:

Table A2, taken from Lasdon et al. [1996], summarizes the relative merits
of SLP, SQP, and GRG algorithms, focusing on their application to
problems with many nonlinear equality constraints. One feature appears
as both an advantage and a disadvantage –– whether or not the algorithm
can violate the nonlinear constraints of the problem by relatively large
amounts during the solution process.
SLP and SQP usually generate points yielding large violations of
the constraints. This can cause difficulties, especially in models with log
or fractional power expressions, since negative arguments for these
functions are possible. Such problems have been documented in reference
to complex chemical process examples in which SLP and some exterior
penalty-type algorithms failed, whereas an implementation of the GRG
method succeeded and was quite efficient. On the other hand, algorithms
that do not attempt to satisfy the equalities at each step can be faster than
those that do. The fact that SLP and SQP satisfy all linear constraints at
each iteration should ease the aforementioned difficulties but does not
eliminate them.
There are situations in which the optimization process must be
interrupted before the algorithm has reached optimality and the current
point must be used or discarded. Such cases are common in on-line
process control where temporal constraints force immediate decisions. In
these situations, maintaining feasibility during the optimization process
may be a requirement for the optimizer inasmuch as constraint violations
make a solution unusable. Clearly, all three algorithms have advantages
that will dictate their use in certain situations. For large problems, SLP
software is used most widely because it is relatively easy to implement
given a good LP system. Nevertheless, large-scale versions of GRG and
SQP have become increasingly popular.

Table A2. Relative Merits of SLP, SQP, and GRG Algorithms

SLP
 Relative advantages: easy to implement; widely used in practice; rapid convergence when the optimum is at a vertex; can handle very large problems; does not attempt to satisfy equalities at each iteration; can benefit from improvements in LP solvers.
 Relative disadvantages: may converge slowly on problems with nonvertex optima; will usually violate nonlinear constraints until convergence, often by large amounts.

SQP
 Relative advantages: usually requires the fewest function and gradient evaluations of all three algorithms (by far); does not attempt to satisfy equalities at each iteration.
 Relative disadvantages: will usually violate nonlinear constraints until convergence, often by large amounts; harder than SLP to implement; requires a good QP solver.

GRG
 Relative advantages: probably the most robust of the three methods; versatile--especially good for unconstrained or linearly constrained problems but also works well for nonlinear constraints; can utilize existing process simulators employing Newton's method; once it reaches a feasible solution it remains feasible and can be stopped at any stage with an improved solution.
 Relative disadvantages: hardest to implement; needs to satisfy equalities at each step of the algorithm.

A.4 Exercises
33. Solve the problem given below with an exterior penalty function method, and then
repeat the calculations using a barrier function method.

Minimize x1² + 4x2² – 8x1 – 16x2
subject to x1 + x2 ≤ 5
 0 ≤ x1 ≤ 3, x2 ≥ 0

34. Perform 5 iterations of the sequential unconstrained minimization technique using


the logarithmic-quadratic loss function on the problem below. Let x0 = (0, 0), r0 = 2
and put rk+1 ← rk/2 after each iteration.

Minimize x1² + 2x2²
subject to 4x1 + x2 ≤ 6
 x1 + x2 = 3
 x1 ≥ 0, x2 ≥ 0

35. Repeat the preceding exercise using Zoutendijk’s procedure. Use the normalization
–1 ≤ dj ≤ 1, j = 1, 2, to permit solution by linear programming.

36. Consider the following separable nonlinear program.

Minimize 5x1² – 10x1 – 10x2 log10 x2
subject to x1² + 2x2² ≤ 4, x1 ≥ 0, x2 ≥ 0

a. Approximate the separable functions with piecewise linear functions and solve
the resultant model using linear programming. If necessary, assume 0 log10 0 = 0.

b. Solve the original problem using a penalty function approach.

c. Perform at least 4 iterations of Zoutendijk’s procedure.

37. Solve the following problem using Zoutendijk’s procedure. Start with x0 = (0, 3/4).

Minimize 2x1² + 2x2² – 2x1x2 – 4x1 – 6x2
subject to x1 + 5x2 ≤ 5
 2x1² – x2 ≤ 0
 x1 ≥ 0, x2 ≥ 0

38. Solve the relaxation of the redundancy problem when the budget for components is
$500 (the value of C). Use the data in the table below.

Maximize ∏j=1,…,n [1 – (1 – rj)^(1+xj)]
subject to ∑j=1,…,n cjxj ≤ C, xj ≥ 0, j = 1,…, n

Item, j 1 2 3 4
Reliability, rj 0.9 0.8 0.95 0.75
Cost per item, cj 100 50 40 200

39. Consider the following quadratic program.

Minimize f(x) = 2x1² + 20x2² + 43x3² + 12x1x2 – 16x1x3 – 56x2x3 + 8x1 + 20x2 + 6x3

subject to 3x1 + 2x2 + 5x3 ≤ 35
 x1 + 2x2 + 3x3 ≥ 5
 –x1 + 2x2 – 5x3 ≤ 3
 5x1 – 3x2 + 2x3 ≤ 30
 x1 ≥ 0, x2 ≥ 0, x3 ≥ 0

a. Write out the KKT conditions then set up the appropriate linear programming
model and solve with a restricted basis entry rule. Is the solution a global
optimum? Explain.

b. Use an NLP code to find the optimum.



40. Use an NLP code to solve the problem in the preceding exercise but this time
maximize rather than minimize. Because the maximization problem may have local
solutions, try different starting points. Confirm that you have found the global
maximum by solving the problem by hand.

Bibliography
Bazaraa, M.S., H.D. Sherali and C.M. Shetty, Nonlinear Programming: Theory and Algorithms, Second Edition, John Wiley & Sons, New York, 1993.

Fan, Y., S. Sarkar and L. Lasdon, "Experiments with Successive Quadratic Programming Algorithms," Journal of Optimization Theory and Applications, Vol. 56, No. 3, pp. 359-383, 1988.

Fiacco, A.V. and G.P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, John Wiley & Sons, New York, 1968.

Fletcher, R., Practical Methods of Optimization, Second Edition, John Wiley & Sons, New York, 1987.

Horst, R. and H. Tuy, Global Optimization: Deterministic Approaches, Third Edition, Springer-Verlag, Berlin, 1995.

Lasdon, L., J. Plummer and A. Warren, "Nonlinear Programming," in M. Avriel and B. Golany (eds.), Mathematical Programming for Industrial Engineers, Chapter 6, pp. 385-485, Marcel Dekker, New York, 1996.

Luenberger, D.G., Linear and Nonlinear Programming, Second Edition, Addison-Wesley, Reading, MA, 1984.

Nash, S.G. and A. Sofer, "A Barrier Method for Large-Scale Constrained Optimization," ORSA Journal on Computing, Vol. 5, No. 1, pp. 40-53, 1993.

Nash, S.G. and A. Sofer, Linear and Nonlinear Programming, McGraw-Hill, New York, 1996.

Zoutendijk, G., Methods of Feasible Directions, Elsevier, Amsterdam, 1960.
