AIAA-2000-4891

A SURROGATE-MODEL-BASED METHOD FOR CONSTRAINED OPTIMIZATION

Charles Audet, J. E. Dennis, Jr., Douglas W. Moore

Department of Computational & Applied Mathematics,


Rice University, 6100 Main St., Houston, TX 77005.

Andrew Booker, Paul D. Frank

Mathematics & Engineering Analysis,


Boeing Phantom Works,
Mathematics and Computing Technology,
Box 3707, M/S 7L-68, Seattle, WA 98124.

Abstract

This paper describes an algorithm and provides test results for surrogate-model-based optimization. In this type of optimization, the objective and constraint functions are represented by global “surrogates”, i.e. response models, of the “true” problem responses. In general, guarantees of global optimality are not possible. However, a robust surrogate-model-based optimization method is presented here that has good global search properties and proven local convergence results. This paper describes methods for handling three key issues in surrogate-model-based optimization. These issues are maintaining a balance of effort between global design space exploration and local optimizer region refinement, maintaining good surrogate model conditioning as points “pile up” in local regions, and providing a provably convergent method for ensuring local optimality.

Acknowledgments: Work of the first author was supported by NSERC (Natural Sciences and Engineering Research Council) fellowship PDF-207432-1998, and work of the first three authors was supported by DOE DE-FG03-95ER25257, AFOSR F49620-98-1-0267, The Boeing Company, Sandia LG-4253, ExxonMobil and CRPC CCR-9120008.

Copyright © 2000 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.

1 Introduction

This paper describes an optimization method for the case where the objective and constraint functions are represented by inexpensive-to-evaluate “surrogate” models of a “true” simulation code. The use of response models is particularly attractive when the true responses come from compute-intensive computer simulations. Examples include 3D fluid dynamics analysis and structural analysis via the finite element method. The use of inexpensive-to-evaluate surrogate models has obvious potential for speeding the optimization process. The surrogate models also have the benefits of allowing optimization of problems with nonsmooth or noisy responses, and providing insight into the nature of the design space [4][5][10].

Because response models are inexpensive-to-evaluate surrogates for expensive analysis codes, they are often used to represent disciplines in multidisciplinary design optimization (MDO) problems (see e.g. [7][12][2][26]) and competing objectives in multiobjective optimization.

There are many types of surrogate models to choose from. Among these are neural nets, well-conditioned low-degree polynomials [11], and Kriging models [9][8][23][24]. As will be discussed later, the method used here for selecting points for model refinement requires estimates of uncertainty in model value predictions at arbitrary points in design space. These estimates can readily be obtained for Kriging models. This is one reason that Kriging models are used in the implementation described in this paper. However, it should be noted that except for the details of model updating and selection of points for model refinement, the overall optimization method is independent of the type of surrogate model used.
A mathematical statement of the optimization problem is:

$$\min_{x \in R^n} f(x) \quad \text{s.t.} \quad C(x) \le 0, \qquad x_{lb} \le x \le x_{ub} \tag{1}$$

where f : R^n → R and C : R^n → R^m are continuously differentiable functions with C = (c_1, ..., c_m)^T. The vectors x_{lb} and x_{ub} in R^n are the lower and upper bounds on the variables. The inequalities in (1) are completely general because a bound other than zero can be handled by redefining the appropriate component function c_i(x). Lower-bounded inequalities can be handled by negating the corresponding c_i(x). Equality constraints can be specified as a pair consisting of an upper-bounded and a lower-bounded inequality with the same constraint functions and bounds. However, since this paper deals with modeling of expensive simulations, it is assumed that there will be a fairly loose feasibility tolerance for equality constraints. Thus, for ease of exposition and for discussion of convergence, equalities will be considered to be inequality constraints with upper and lower bounds that happen to be “somewhat” close to each other.

For ease of exposition, it is assumed in (1) that f(x) and the constraint functions c_i(x) are responses from a “true” simulation code, each of which is represented by a single surrogate model. In actuality, the optimization method presented here is valid if f(x) or component functions of C(x) are represented by arbitrary functions of surrogate model values. For example, the surrogate objective can be a weighted sum of several response models. In addition, the method can handle the case where f(x) or some components of C(x) are “truth” values, i.e. are not modeled. For example, this is the case when some constraints are simple analytic functions of the variables, such as constraints on geometry.
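As a concrete illustration (ours, not from the paper; `to_standard_form` and its argument format are hypothetical) of these transformations, the sketch below negates lower-bounded constraints and splits a loosely-toleranced equality into a pair of inequalities, producing the canonical C(x) ≤ 0 form of (1):

```python
import numpy as np

# Hypothetical illustration: wrapping raw simulation responses so that every
# constraint takes the canonical form c_i(x) <= 0 used in problem (1).
def to_standard_form(raw_constraints):
    """raw_constraints: list of (func, kind, bound[, tol]) tuples, where kind
    is 'upper' (func <= bound), 'lower' (func >= bound), or 'equality'
    (func == bound, enforced only to a loose tolerance tol)."""
    cs = []
    for spec in raw_constraints:
        func, kind, bound = spec[0], spec[1], spec[2]
        if kind == "upper":                       # func(x) - bound <= 0
            cs.append(lambda x, f=func, b=bound: f(x) - b)
        elif kind == "lower":                     # negate: bound - func(x) <= 0
            cs.append(lambda x, f=func, b=bound: b - f(x))
        elif kind == "equality":                  # pair of loose inequalities
            tol = spec[3]
            cs.append(lambda x, f=func, b=bound, t=tol: f(x) - (b + t))
            cs.append(lambda x, f=func, b=bound, t=tol: (b - t) - f(x))
    return lambda x: np.array([c(x) for c in cs])  # C(x) <= 0 componentwise
```

With this wrapper, an equality c(x) = b with tolerance t contributes the pair c(x) − (b + t) ≤ 0 and (b − t) − c(x) ≤ 0, matching the loose-tolerance treatment of equalities described above.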
Figure 1 is a high-level view of the optimization algorithm. The method shown in Figure 1 is a particular example of the very general filter-based, derivative-free optimization method described in Section 2. Much of this optimization method is encompassed in the Figure 1 box labeled “controller”.

[Figure 1: The optimization algorithm. A flow diagram: from an initial iterate X0, a filter-based, derivative-free optimization controller decides whether to SEARCH (“global” optimization of the response-model problem) or POLL (determine model-improvement points); the “true” simulation code is evaluated at the new search or poll points; the models are updated with the new points and responses; and the controller updates the iterate Xk, the iteration count k, and the size of the poll grid.]

The basic optimization method is an iterative process. At each iteration, the method identifies points at which more “true” simulation data is needed. The data are obtained and surrogate models are updated using the new data. The optimization controller uses the new data to decide how the optimization process will proceed. One such decision is whether to be in the local, lattice-based, poll mode or the more global search mode. The modes are discussed next.

The search mode selects new points for running the simulation code that are either “global optimizers” for the surrogate model problem or are points for model improvement. The search for “global optimizers” may consist of simply running a local optimization code from several (say 10) randomly selected start points in the design space. The model-improvement points are not global optimizers of the response-model problem, but lie in regions of possibly large model error, where a global optimum is not unlikely. Thus, it is worth obtaining new “true” data at these points. Since these points tend to be far from existing data, they represent the most global aspect of the optimization method. The constrained, balanced, local-global search (CBLGS) method presented in Section 4 describes selection of these model-improvement points.
The poll mode is a local, lattice-based search. The simplest poll pattern is the collection of unit steps in the 2n positive and negative coordinate directions on an n-dimensional grid. However, it has been shown [20] that polling can be done along any set of vectors that form a positive spanning set for n-dimensional space. A positive spanning set is any set of vectors that spans n-dimensional space when summed with nonnegative coefficients.
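As a small illustration (ours) in R^2, both of the following are positive spanning sets:

$$D_1 = \{e_1,\ e_2,\ -e_1,\ -e_2\}, \qquad D_2 = \{e_1,\ e_2,\ -(e_1 + e_2)\}.$$

D_1 is the 2n-direction coordinate poll pattern mentioned above; D_2 shows that as few as n + 1 directions suffice, since any v = (a, b) can be written v = (a + t)e_1 + (b + t)e_2 + t(-(e_1 + e_2)) with t = max(0, -a, -b), so all coefficients are nonnegative.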
A key item in Figure 1 is the “Update Models with new data” process. A common problem in surrogate-model-based optimization is the degradation in conditioning as the models are updated to incorporate new data. This is because the optimization process tends to “cluster” points in design space. Section 3 discusses Kriging models and presents a new method for maintaining well-conditioned models during the optimization process.

A summary of the content of this paper is as follows. The “filter-based” optimization method, and convergence results for this method, are given in Section 2. Section 3 provides a description of Kriging models and presents a new method for maintaining model conditioning. The CBLGS algorithm for selecting model-improvement points is given in Section 4. Testing of the optimization method is summarized in Section 5. Conclusions and future work are discussed in Section 6.

2 Filter-Based Derivative-Free Optimization Algorithms

2.1 Introduction

Torczon [27] presents a flexible class of generalized pattern search (GPS) algorithms for derivative-free unconstrained optimization. Here, our method (thoroughly presented in [1]) extends the GPS algorithm to general nonlinear programming problems by combining it with filter algorithms (described below) in a way that reduces transparently to GPS for unconstrained problems. We do this without penalty constants or Lagrange multiplier estimates (as required in [21]) by formulating a step acceptance rule based on filter methods.

Filter algorithms were introduced by Fletcher and Leyffer [13] as a way to globalize SQP and SLP without using a merit function that would require a troublesome penalty or barrier parameter to be estimated. A step is accepted if it either reduces the objective function or the constraint violation. Fletcher, Leyffer and Toint [14] show convergence of the method that uses sequential linear programming to suggest steps. Fletcher, Gould, Leyffer and Toint [15] show convergence of the method that uses sequential quadratic programming to suggest steps. Previous filter algorithms thus require explicit use of the derivatives of both the objective and the constraints. Our method does not.

We present a pattern search version of the filter approach that converges using any user-defined finite search procedure, here based on surrogates, to suggest steps, and which falls back on a special poll step when the user-defined procedure stalls, i.e., when the search is unsuccessful. Elsewhere [1] we show that our filter-pattern search class of algorithms guarantees subsequence convergence to points that satisfy appropriate optimality conditions depending on the local smoothness of the problem functions. Convergence is assured without assuming even finite values or continuity.

The nonnegative constraint violation function satisfies h(x) = 0 if and only if x is feasible (and thus h(x) > 0 if and only if x is infeasible). We assume that h satisfies a weak monotonicity property: if C(x)^+ is the vector whose j-th component is max(0, c_j(x)), then the component-wise inequality C(x)^+ ≤ C(y)^+ implies h(x) ≤ h(y). For example, we could set h(x) = ||C(x)^+|| where ||·|| is a vector norm. Our best theoretical results are for h(x) = ||C(x)^+||_2^2.

A filter F is a finite set containing the feasible point with best known objective, and infeasible points in R^n such that for no pair of infeasible points (x, x') in the filter does one point have both a better objective and a better constraint violation function value than the other. An infeasible point x' is said to be filtered if either h_max ≤ h(x') for some user-supplied upper bound h_max, or if for some infeasible x (x' ≠ x) belonging to the filter, h(x') ≥ h(x) and f(x') ≥ f(x). A feasible point is filtered if its objective function value is greater than or equal to the feasible incumbent value f^F (i.e., the least function value found so far at a feasible solution). It is unfiltered otherwise. Of course, the first feasible point encountered (if any) is automatically the initial feasible incumbent solution.

The set of filtered points F̄ is denoted in standard notation as:

$$\bar{F} = \bigcup_{x \in F} \left\{ x' \in R^n \setminus F : h(x') \ge h(x),\ f(x') \ge f(x) \right\} \;\cup\; \left\{ x' \in R^n : h(x') \ge h_{max} \right\} \;\cup\; \left\{ x' \in R^n : h(x') = 0,\ f(x') > f^F \right\}.$$
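A minimal sketch (ours; the paper gives no code) of this membership test, using the h(x) = ||C(x)^+||_2^2 choice named above:

```python
import numpy as np

def h(C_x):
    """Constraint violation h(x) = ||C(x)^+||_2^2 for a response vector C(x)."""
    return float(np.sum(np.maximum(C_x, 0.0) ** 2))

def is_filtered(f_x, h_x, filter_pts, h_max, f_feas_incumbent):
    """True if the point with values (f_x, h_x) is filtered. filter_pts is a
    list of (f, h) pairs for the infeasible filter points; f_feas_incumbent
    is f^F (use +inf if no feasible point has been found yet)."""
    if h_x == 0.0:                 # feasible: compare against the incumbent f^F
        return f_x >= f_feas_incumbent
    if h_x >= h_max:               # violation beyond the user-supplied cap
        return True
    # dominated (in both f and h) by some infeasible filter point
    return any(h_x >= h_i and f_x >= f_i for (f_i, h_i) in filter_pts)
```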
[Figure 2: The three possible types of iterations. A sketch in the (h, f) plane showing the feasible region Ω = {x ∈ R^n : h(x) = 0} on the f-axis, the incumbents f_k^F and (h_k^I, f_k^I), the trial set T_k, the filtered points F̄_k, and the bound h_max; an iteration is successful when T_k \ F̄_k ≠ ∅ and unsuccessful when T_k ⊂ F̄_k.]

2.2 Algorithm

Unsuccessful iterations are those where all trial points are filtered. Successful iterations are those where an unfiltered trial point is found. The search or poll step may be terminated without any more function or constraint evaluations if such a point is found. The mesh size parameter is either increased or kept constant in successful iterations, and it is decreased in unsuccessful ones.

We define two types of incumbent solutions: the feasible ones, and the most nearly feasible ones (see Figure 2). Let f_k^F represent the feasible incumbent value, i.e., the smallest objective function value (for feasible points) found by the algorithm up to iteration k. Let h_k^I > 0 be the least positive constraint violation value found up to iteration k, and let f_k^I denote the smallest objective function value of the points found whose constraint violation function value is equal to h_k^I. The superscript F stands for feasible and I for infeasible.

The poll set for any iteration is the set of mesh neighbors of p_k. The poll center p_k is either chosen in the set of feasible incumbent solutions, or it must be one of the most nearly feasible incumbent solutions. Generally, there will be a single value in each of the sets of incumbents.

The poll center p_k either satisfies (h(p_k), f(p_k)) = (0, f_k^F), or (h(p_k), f(p_k)) = (h_k^I, f_k^I). Our class of algorithms and their analyses are completely flexible about the choice between these two alternative poll centers p_k at a given iteration. It is up to the user to select the strategy defining the poll center. For example, it might be a good strategy to alternate choosing the poll centers in the set of feasible and infeasible incumbents until an unsuccessful iteration occurs.

Even if we already have a feasible incumbent solution, we may wish to poll around one of the nearly feasible points, which might have a lower objective function value and thus lead us to explore a different part of the feasible region Ω. This option allows the algorithm to avoid stalling (when infinitely many consecutive unsuccessful iterations are generated) in the Lewis and Torczon [19] example.

The most useful successful iterations are those that produce a feasible iterate x_{k+1} which improves the feasible incumbent value to f_{k+1}^F = f(x_{k+1}). Next are the successful iterations that do not produce a feasible iterate, but improve the most nearly feasible incumbent solution: h_{k+1}^I = h(x_{k+1}) and f_{k+1}^I = f(x_{k+1}). Finally, there are the other successful iterations. They leave the incumbents unchanged (f_{k+1}^F = f_k^F, h_{k+1}^I = h_k^I and f_{k+1}^I = f_k^I), but add some elements to the filter.

The unsuccessful iterations are such that all points in the trial set are filtered. Regardless of the feasibility of x_k, if the iteration is unsuccessful, the next iterate x_{k+1} is set to x_k, and the mesh size parameter is decreased: Δ_{k+1} < Δ_k.

Our algorithm for constrained optimization is stated formally below. We allow for the fact that in some applications, a set of initial solutions may be used to seed the filter. Without any loss of generality we assume that any such points are on the initial mesh and that they have been “filtered” to be consistent with our initialization step, in the sense that x_0 will not be filtered by the other seed points. An easy way to assure this would be to take x_0 to be the most feasible seed point, breaking ties by taking one with the smallest objective function value.

A standard trick in engineering design, when seed designs are available, is to go further and make linear combinations of the seed points be the new design space. This certainly makes sense, can lead to a big reduction in the dimension of the design space, and makes the initial mesh be defined by the seed points in a simple way.

It is implicit in all we have said that our initial solution must have finite values for the objective and constraint violations, and this will be true of all the iterates generated by the algorithm. In [1], we show how to incorporate a finite number of linear constraints by setting the objective value to infinity at points that violate them, while using the filter for more general constraints.
Algorithm

• Initialization:
Let x_0 be one of a set of initial points in the filter F_0. Fix h_max > h(x_0) and Δ_0 > 0 and set the iteration counter k to 0.

• Definition of incumbent solutions:
Define (if possible)
f_k^F: the smallest objective function value for all feasible solutions found so far;
h_k^I > 0: the least positive constraint violation function value found so far;
f_k^I: the smallest objective function value of the points found so far whose constraint violation function value is equal to h_k^I.

• Search and poll steps:
Perform the search and possibly the poll steps (or only part of the steps) until an unfiltered trial point x_{k+1} is found, or until it is shown that all trial points are filtered.

– Search step:
Evaluate the functions h and f on a set of trial points on the current mesh M_k (the strategy that gives the set of points is usually provided by the user).

– Poll step:
Evaluate the functions h and f on the poll set around p_k, where p_k satisfies either (h(p_k), f(p_k)) = (0, f_k^F) or (h(p_k), f(p_k)) = (h_k^I, f_k^I).

• Parameter update:
If the search or the poll step produced an unfiltered iterate x_{k+1} ∉ F̄_{k+1}, then declare the iteration successful and update Δ_{k+1} ≥ Δ_k. Otherwise, set x_{k+1} = x_k, declare the iteration unsuccessful and update Δ_{k+1} < Δ_k. Increase k ← k + 1 and go back to the definition of the incumbents.
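The following skeleton (ours; `search_step`, `poll_step`, `update_filter`, and `is_filtered` are hypothetical helpers, with `is_filtered` understood as a closure over the evolving filter; the paper specifies only the structure above) shows one way the loop fits together:

```python
def filter_gps(x0, f, h, search_step, poll_step, update_filter, is_filtered,
               delta0=1.0, max_iter=100):
    """Sketch of the filter-based pattern search loop stated above.
    search_step/poll_step return lists of trial points on the current mesh."""
    xk, delta = x0, delta0
    for k in range(max_iter):
        success = None
        # Try the (surrogate-driven) search first; fall back on the poll.
        for step in (search_step, poll_step):
            trials = step(xk, delta)
            update_filter(trials, f, h)          # record evaluated points
            unfiltered = [t for t in trials if not is_filtered(f(t), h(t))]
            if unfiltered:
                success = unfiltered[0]
                break                            # may stop early on success
        if success is not None:                  # successful iteration
            xk, delta = success, 2.0 * delta     # update with Δ_{k+1} >= Δ_k
        else:                                    # unsuccessful iteration
            delta = 0.5 * delta                  # Δ_{k+1} < Δ_k, x_{k+1} = x_k
    return xk
```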
2.3 Convergence behavior

We give the flavor of our results here; details of our rather technical analysis can be found in [1]. We say that a subsequence of the sequence of convergent unsuccessful poll centers is a refining sequence if the corresponding subsequence of mesh size parameters {Δ_k} goes to zero.

Theorem 2.1 Let x̂ be an accumulation point of a refining sequence. If h is strictly differentiable at x̂, then ∇h(x̂) = 0.

This means that using h(x) = ||C(x)^+||_2^2 gives nice results, but the scaling of the constraint violations is then more sensitive. Thus, the question arises as to the differentiability of the more common choice h_1(x) = ||C(x)^+||_1. The answer is that this choice is rarely strictly differentiable at x̂, and a proof requires a very technical lemma.

We now summarize some results for the objective function at a limit point.

Theorem 2.2 Let x̂ be an accumulation point of any refining subsequence. If x̂ is strictly feasible and if f is strictly differentiable at x̂, then ∇f(x̂) = 0.

This says that if we converge to a point in the interior of the feasible region, then the KKT first order necessary conditions hold. Wherever it is located, x̂ is a KKT point for a related problem. We spare the reader the technical result (rigorously shown in [1]) that for general constraints, we can construct a cone that contains N_Ω(x̂) as well as −∇f(x̂). In the case that the constraints are sufficiently simple (for example, a finite number of linear constraints), one can arrange the polling directions so that this related problem is the actual problem. Of course, in theory it is always possible to include the right polling directions so that this happens.

By using a filter-based step acceptance criterion, we have overcome a difficulty in applying pattern search algorithms to constrained optimization; specifically, the problem that the objective function descent directions in the positive spanning directions may be infeasible. Lewis and Torczon [19] give an example where a nonfilter-based version of the pattern search algorithm stalls (all subsequent iterations are unsuccessful) when the positive spanning directions are poorly chosen at a feasible point where the gradient of f is nonzero.

The following result shows that our algorithm will not stall at any point x for which ∇f(x) ≠ 0, regardless of the choice of positive spanning set. The implication is that there eventually will be a successful iteration that will move the iterate away from the current point. Thus, we can hope for a more global solution even if we begin at a constrained local solution.

Proposition 2.3 Suppose that f is strictly differentiable at the poll center p_k, where k is the iteration number. If ∇f(p_k) ≠ 0, then there exists an index ℓ ≥ k such that x_{ℓ+1} ≠ x_k.

The remainder of the paper can be viewed as providing sophisticated tools and procedures for the search phase of the algorithm. Our earlier work [3] showed that kriging surrogates can be extraordinarily effective search tools.
3 Well-Conditioned Kriging Models

This section describes the interpolating models used as approximate models and an approach to well-conditioned model updating. See, e.g., [24] for a more detailed description of Kriging models of computer output. The model update approach and tests on standard (unconstrained) optimization test problems are described in [6].

Kriging models are models of the output Y of a deterministic simulation code as a function of d-dimensional inputs x via a stationary Gaussian process Z plus a constant:

$$Y(x) = \beta + Z(x)$$

where for each x, Z(x) has mean 0 and variance σ^2, and correlations between Y(x) and Y(w) are given by the positive definite correlation function

$$Cor(Y(x), Y(w)) \equiv R(x, w).$$

We note that the correlation function usually depends on d “scale” parameters θ_1, ..., θ_d [24].

When Y is observed at N points (called sites in the statistical literature)

$$S = (s_1, ..., s_N),$$

denoted by

$$Y_S \equiv (Y(s_1), ..., Y(s_N))^T,$$

then the interpolating approximate model is given by

$$\hat{Y}(x) = E(Y(x) \mid Y_S),$$

which can be shown to be

$$\hat{Y}(x) = \hat{\beta} + r(x, S)^T R^{-1} (Y_S - \vec{1}\,\hat{\beta}) \tag{2}$$

where R_{i,j} is the correlation between observations Y(s_i) and Y(s_j), r(x, S) is the vector of correlations between Y(x) and Y(s_i), i = 1, ..., N, and

$$\hat{\beta} \equiv \frac{\sum_j Y(s_j) \sum_i R^{-1}_{i,j}}{\sum_{i,j} R^{-1}_{i,j}}.$$
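To make equation (2) concrete, here is a minimal sketch (ours, not the authors' code) of the predictor using a Gaussian product correlation, one standard choice and the one adopted below for the first process; `theta` holds the d scale parameters:

```python
import numpy as np

def gauss_corr(x, w, theta):
    """Gaussian product correlation: prod_j exp(-theta_j * (x_j - w_j)^2)."""
    return np.exp(-np.sum(theta * (np.asarray(x) - np.asarray(w)) ** 2))

def kriging_predict(x, S, yS, theta):
    """Evaluate the interpolating predictor of equation (2).
    S: (N, d) array of sites; yS: (N,) array of observed outputs."""
    N = len(S)
    R = np.array([[gauss_corr(S[i], S[j], theta) for j in range(N)]
                  for i in range(N)])
    r = np.array([gauss_corr(x, S[i], theta) for i in range(N)])
    Rinv = np.linalg.inv(R)        # becomes ill-conditioned as sites cluster
    ones = np.ones(N)
    beta = (ones @ Rinv @ yS) / (ones @ Rinv @ ones)  # the formula for beta-hat
    return beta + r @ Rinv @ (yS - ones * beta)
```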

In optimization with approximate models, points are continually added to an initial model to update the model. These points tend to “pile up” as we (hopefully) approach an optimum. The result is that R becomes ill-conditioned, so that one can neither solve for

$$R^{-1} (Y_S - \vec{1}\,\hat{\beta})$$

nor compute β̂.

We propose the following solution to the problem of ill-conditioning: model the output as the sum of two stochastically independent Gaussian processes, one with the original correlation parameters estimated from the first set of points, and one with a “finer” correlation structure. Thus

$$Y(x) = \beta + Z_1(x) + Z_2(x),$$

where Z_1 and Z_2 are independent of each other. If, for Z_i, we denote the correlation function by R_i and the variance by σ_i^2, then the resulting correlation function for Y is

$$R(x, w) = \lambda R_1(x, w) + (1 - \lambda) R_2(x, w), \qquad \text{where} \quad \lambda = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}.$$

Second Process Parameter Estimation

Updating a Kriging model with a second Gaussian process requires estimation of correlation parameters and of the weight λ. For correlations in the first process Z_1 we use the Gaussian product correlation function [24]. The initial set of sites and model fit provide the correlation parameters (computed via Maximum Likelihood Estimation [24]) for Z_1. These parameters are retained at each model update. For the second process Z_2 we choose the positive cubic spline correlation structure. The scale correlation parameters for Z_2 are assumed to be all unknown, but equal, i.e., θ_j = θ for all j. The fit to the model is then obtained by estimating the parameter θ for Z_2 and the weight λ, via maximum likelihood. The likelihood is then maximized over λ in the range [0, 1] and θ in the range [0, 1/dist], where dist is the minimum over pairs of distinct sites of the maximum difference in coordinates between pairs. For the positive cubic spline correlation, θ greater than 1/dist implies all sites are uncorrelated with respect to R_2. Note that finding the maximum likelihood over (λ, θ) is a two-variable optimization problem. An approximate maximum likelihood is found by evaluating the likelihood on a weighted “λ by θ” grid.
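Continuing the sketch above (reusing `gauss_corr`), the following is a hedged illustration of the two-process correlation and the grid-based approximate maximum likelihood. The specific cubic spline formula and the uniform grid are our assumptions: the paper uses a weighted grid and does not spell out the spline form.

```python
import numpy as np

def cubic_spline_corr(x, w, theta):
    """One common positive cubic spline correlation (assumed form):
    prod_j s(theta*|x_j - w_j|) with s(t) = 1 - 3t^2 + 2t^3 for t <= 1, else 0."""
    t = np.minimum(theta * np.abs(np.asarray(x) - np.asarray(w)), 1.0)
    return np.prod(1.0 - 3.0 * t**2 + 2.0 * t**3)

def combined_R(S, theta1, theta, lam):
    """R = lam*R1 + (1-lam)*R2 over all pairs of sites."""
    N = len(S)
    R = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            R[i, j] = (lam * gauss_corr(S[i], S[j], theta1)
                       + (1.0 - lam) * cubic_spline_corr(S[i], S[j], theta))
    return R

def grid_mle(S, yS, theta1, dist, n_lam=11, n_theta=11):
    """Approximate MLE of (lam, theta): evaluate the Gaussian-process
    log-likelihood on a grid with lam in [0, 1] and theta in [0, 1/dist]
    (uniform here; the paper weights the grid)."""
    best, best_ll = None, -np.inf
    for lam in np.linspace(0.0, 1.0, n_lam):
        for theta in np.linspace(0.0, 1.0 / dist, n_theta):
            R = combined_R(S, theta1, theta, lam)
            sign, logdet = np.linalg.slogdet(R)
            if sign <= 0:
                continue                       # skip non-positive-definite R
            Rinv = np.linalg.inv(R)
            ones = np.ones(len(S))
            beta = (ones @ Rinv @ yS) / (ones @ Rinv @ ones)
            res = yS - beta * ones
            sigma2 = (res @ Rinv @ res) / len(S)
            if sigma2 <= 0:
                continue                       # degenerate (perfect) fit
            ll = -0.5 * (len(S) * np.log(sigma2) + logdet)
            if ll > best_ll:
                best, best_ll = (lam, theta), ll
    return best
```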
4 Determining Points for Model Improvement

This section introduces the constrained, balanced, local-global search (CBLGS) method for determining points for model improvement. The goal is to locate promising points in design space that are far from points where data from the “true” analysis code have already been obtained. Points are considered “promising” if the combination of model-predicted values and estimated model errors indicates a “reasonable” chance for feasibility and good objective values. Since the method tends to select points far from existing data points, it truly emphasizes “global” search. This global emphasis exceeds that of finding global optimizers for the model problem, which is based on existing information. The most local of all the processes in Figure 1 is the poll. As the name implies, the CBLGS algorithm attempts to balance the considerations of local and global search of the design space.

For best utility, CBLGS must run quickly. Thus, it is undesirable to have to solve a difficult optimization problem to find model-improvement points. Instead, CBLGS relies on model evaluations and error bound estimates over a large set (dense cloud) of points, numbering say 10,000 to 100,000, that are well-spread-out in design space. In doing this, CBLGS takes advantage of the facts that large sets of well-spread-out points can be obtained quickly, and model evaluations and model error estimates can be computed very cheaply. The dense clouds of points used in CBLGS are orthogonal array-based Latin hypercubes (see e.g., Owen [22]). For “reasonably” shaped feasible regions, the feasible subset of the dense cloud will retain the property of being well-spread-out in design space. Before presenting the CBLGS algorithm, it is necessary to define the concepts of expected improvement and expected violation.

For simplicity, expected improvement will be defined in terms of one Gaussian process. Suppose the Gaussian process with variance σ^2 has been evaluated using equation (2) with ŷ = Ŷ(x), and f_min is the best objective function value obtained thus far. The key concepts here come from Schonlau [25], who defines the expected improvement E(I) at x as the integral of the distribution of Y(x) over all values less than or equal to f_min. This yields

$$E(I) = \begin{cases} (f_{min} - \hat{y})\,\Phi\!\left(\dfrac{f_{min} - \hat{y}}{\sigma}\right) + \sigma\,\phi\!\left(\dfrac{f_{min} - \hat{y}}{\sigma}\right), & \text{if } \sigma > 0 \\ 0, & \text{if } \sigma = 0 \end{cases} \tag{3}$$

where φ and Φ denote the probability density function and the cumulative distribution function of the standard normal distribution. Roughly speaking, the expected improvement is small unless either the model-predicted value ŷ is less than f_min, or the amount by which ŷ exceeds f_min is not large compared to the variance.

Expected improvement provides a globalization tool by accounting for model error bounds, which are functions of variance. However, this only helps in terms of the objective function. The concept of expected improvement can easily be extended to the constraints by introducing the notion of expected violation. For example, the expected violation at x of a lower bound for c_i(x) can be obtained from (3) by replacing f_min by the lower bound value and setting ŷ to the model-predicted value for c_i(x). The expected violation of an upper bound can be computed in a similar way, by interchanging the roles of f_min and ŷ in (3).
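A direct transcription (ours) of equation (3) and of the expected-violation substitutions just described, using SciPy's standard normal:

```python
from scipy.stats import norm

def expected_improvement(y_hat, sigma, f_min):
    """Equation (3): expected improvement of the prediction (y_hat, sigma)
    over the incumbent objective value f_min."""
    if sigma <= 0.0:
        return 0.0
    u = (f_min - y_hat) / sigma
    return (f_min - y_hat) * norm.cdf(u) + sigma * norm.pdf(u)

def expected_violation(c_hat, sigma, bound, lower=False):
    """Expected violation of an upper bound c(x) <= bound (or a lower bound
    when lower=True), via the substitutions into (3) described above."""
    if lower:
        # the lower bound plays the role of f_min; c_hat the role of y_hat
        return expected_improvement(c_hat, sigma, bound)
    if sigma <= 0.0:
        return 0.0
    # upper bound: interchange the roles of f_min and y_hat in (3)
    u = (c_hat - bound) / sigma
    return (c_hat - bound) * norm.cdf(u) + sigma * norm.pdf(u)
```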
Having defined expected improvement and expected violation, the CBLGS algorithm can now be stated.

Constrained, Balanced, Local-Global Search (CBLGS) Algorithm

The parameters provided to CBLGS are: the number of model refinement points requested, ngoal; the size of the initial dense cloud of candidate points, ncloud; the maximum allowable size for the candidate set, maxcloud; the tolerance on expected feasibility, featol; and the best feasible objective value found thus far (or an estimate for it), f_min.

1. Compute an orthogonal array-based Latin hypercube that has more than ncloud points, but does not exceed ncloud by a “large” number.

2. Compute the L-infinity norm of the expected constraint violations at all the candidate points. Let nfeas be the number of candidates with expected violations less than featol. If nfeas < ngoal and ncloud < maxcloud, increase the size of ncloud (without exceeding maxcloud) and go to 1.

3. Evaluate expected improvement at the nfeas points that passed the expected feasibility test. Return as the set of model improvement points the set of min(ngoal, nfeas) points yielding the highest values for expected improvement.

As indicated above, CBLGS selects the best candidates from a “bucket” of points with small enough expected violations. Some values that have been used for the CBLGS parameters are 5,000 for ncloud, 100,000 for maxcloud, 5 for ngoal, and 0.01 for featol on a well-scaled problem.
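A sketch (ours) of steps 2 and 3, reusing `expected_improvement` and `expected_violation` from the previous sketch; `predict_obj` and `predict_cons` are hypothetical handles to the surrogate models, and the cloud generator is assumed to be supplied:

```python
def cblgs_select(candidates, predict_obj, predict_cons, ngoal, featol, f_min):
    """Steps 2-3 of CBLGS. candidates: iterable of design points.
    predict_obj(x) returns (y_hat, sigma) for the objective surrogate;
    predict_cons(x) returns a list of (c_hat, sigma), one per c_i(x) <= 0."""
    scored = []
    for x in candidates:
        # L-infinity norm of the expected constraint violations at x
        ev = max(expected_violation(c, s, 0.0) for (c, s) in predict_cons(x))
        if ev < featol:                          # expected-feasibility test
            y_hat, sigma = predict_obj(x)
            scored.append((expected_improvement(y_hat, sigma, f_min), x))
    # return the min(ngoal, nfeas) candidates with the highest EI
    scored.sort(key=lambda t: -t[0])
    return [x for (_, x) in scored[:ngoal]]
```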
A modification of CBLGS can be used to obtain sample points for initial modeling that are well-spread-out and are feasible with respect to a set of simulation-code-independent constraints. For example, it may be desirable to run an expensive simulation code only at a set of points that is feasible with respect to geometric constraints that can be computed without running the simulation code. In this situation, the main changes required for CBLGS are redefining expected violation to be the computed violation, and eliminating step 3 of the algorithm.

5 Test Results

SEQOPT is an optimization code that incorporates all three major features of this paper: filter-based optimization, well-conditioned Kriging models, and CBLGS. Testing is in the preliminary phase, and the initial test results are reported here.

The first test results presented are for problems number 59 (denoted HS59) and 100 (denoted HS100) from the widely-used Hock and Schittkowski [17] local optimization test set. Since these problems are differentiable and have known answers, they provide a means for validating SEQOPT, and allow us to obtain comparative data using the highly-regarded local optimization code NPSOL [16]. So that the tests reflect conditions where the problem functions come from expensive simulations, loose convergence tolerances were used. That is, a 0.001 solution tolerance was set for relative constraint violation and for the relative optimal objective value. Also, upper and lower bounds were placed on all the variables. For HS59, the vector of lower bounds is (0.0, 0.0) and the vector of upper bounds is (65.0, 75.0). For HS100, the vector of lower bounds is (-10.0, -5.0, -5.0, -10.0, -3.0, -10.0, -5.0) and the vector of upper bounds is (10.0, 5.0, 5.0, 10.0, 3.0, 10.0, 5.0).

To obtain comparative data, NPSOL was started from 20 randomly-generated points in the design space. Table 1 summarizes the comparative results for SEQOPT and NPSOL. The function value counts for SEQOPT, given in the third column of Table 1, are the sum of those used in obtaining the initial response models and those computed during the SEQOPT run. For HS59, 16 function evaluations were used to obtain the initial models. For HS100, 64 function evaluations were used to obtain the initial models. The function value counts for NPSOL, given in the fourth column of Table 1, are the sum of the function evaluations plus n times the number of gradients required, based on the cost of one-sided finite difference derivatives. The NPSOL results are averaged over those runs for which NPSOL obtained the globally optimal function values, to within 0.001 relative tolerances. The final column of Table 1 indicates the number of runs, out of 20, in which NPSOL converged to a local optimizer that is not the global optimizer. (NPSOL identified four different local optimizers for each of the two problems.)

The message to be taken from Table 1 is not that SEQOPT is a better optimizer than NPSOL; clearly, the two codes are designed for different purposes. Instead, the message is that while SEQOPT is a global search code, it is reasonably efficient in obtaining its solutions. We next discuss results for a nondifferentiable set of problems, for which we have no competing solver to SEQOPT.

Problem   n   # fevals SEQOPT   # fevals NPSOL   # non-global local opts NPSOL
HS59      2   56                41               9/20
HS100     7   162               178              4/20

Table 1: Comparative results on two “standard” test problems

An earlier version of SEQOPT was tested on Boeing wing planform design problems. The earlier version differs from the current version only in that it did not allow the option of a poll from the least-infeasible point once a feasible point had been found.

Figure 3 illustrates the wing planform design problem. The wing planform is the two-dimensional, downward projection of the wing. The design variables are the line segment end points for the wing leading edges, trailing edges, and spars. These are the dots in Figure 3. In addition to the above, there are variables related to wing thickness and aerodynamic loading. A typical design problem is to minimize direct operating cost subject to several constraints. The constraints include required range, maximum approach velocity, maximum required runway length, and several others. The analysis code is a sophisticated combination of preliminary design tools from many disciplines. The disciplines include structures, aerodynamics, weights, costing, and configuration management. SEQOPT results for two wing planform design problems are summarized in Table 2. Table 2 indicates the size of the problems and the number of function evaluations required by SEQOPT to solve the problem to approximately 0.001 relative accuracy. (This assumes that the solution found was, in fact, the true global solution. We believe this to be true, but it is impractical to verify this result.)
The number of function evaluations given is the sum of those used in obtaining the initial response models and those computed during the SEQOPT run. The initial models are based on 123 runs for Airplane A, and 126 runs for Airplane B.

[Figure 3: Wing planform design problem. A planform sketch in the X-Y plane; dots mark the design variables: the locations of the spars, leading edge, and trailing edge, with additional variables for thickness and aero loading distribution.]

The only comparative results available are for the optimal SEQOPT results versus the “baseline” design. For both airplanes, the baseline designs violated three of the constraints by considerable margins. SEQOPT found feasible airplane designs that had lower operating costs than the baseline designs.

Problem      n    # of design constrs   # fevals SEQOPT
Airplane A   15   11                    304
Airplane B   15   11                    292

Table 2: Summary for SEQOPT on planform design problems

6 Conclusions

A provably convergent method for surrogate-model-based optimization has been presented. The method features filter-based optimization, robust surrogate models, and a balance of local and global search. The method not only shows promise on standard test problems, but has been validated on problems of industrial significance. There is considerable opportunity for further work in many directions. There is an obvious need for more testing and refinement. This is particularly true due to the great algorithmic flexibility allowed within the filter-based framework. We are currently studying the relative merits of several different types of surrogate models. In the longer term, there is the opportunity to expand into methods and codes for discrete problems.

References

[1] Audet C. and Dennis J.E. Jr. (2000), “A Pattern Search Filter Method for Nonlinear Programming without Derivatives,” TR00-09, Department of Computational & Applied Mathematics, Rice University, Houston, TX.

[2] Balabanov V., Giunta A.A., Grossman B., Haim D., Mason W.H., and Watson L.T. (1996), “Wing design for a high-speed civil transport using a design of experiments methodology”, 6th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Bellevue, WA, September 4-6, 1996, AIAA-96-4001-CP, pp. 168-183.

[3] Booker A.J., Dennis J.E. Jr., Frank P.D., Serafini D.B., Torczon V. and Trosset M.W. (1999), “A rigorous framework for optimization of expensive functions by surrogates,” Structural Optimization, Vol. 17, No. 1, 1-13.

[4] Booker A.J. (1998), “Design and Analysis of Computer Experiments”, Seventh AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, AIAA-98-4757.

[5] Booker A.J. (1998), “Examples of Surrogate Modeling of Computer Simulations”, presented at the ISSMO/NASA/AIAA First Internet Conference on Approximations and Fast Reanalysis in Engineering Optimization.

[6] Booker A.J. (2000), “Well-Conditioned Kriging Models for Optimization of Computer Models”, Boeing Phantom Works, Mathematics and Computing Technology, Report M&CT-TECH-002, February.
[7] Burgee S.L., Giunta A.A., Balabanov V., Grossman B., Mason W.H., Narducci R., Haftka R.T., and Watson L.T. (1996), “A coarse grained parallel variable-complexity multidisciplinary optimization paradigm”, Intl. J. Supercomputing Applications and High Performance Computing, 10, 269-299.

[8] Currin C., Mitchell T., Morris M., and Ylvisaker D. (1988), “A Bayesian approach to the design and analysis of computer experiments”, Technical Report ORNL-6498, Oak Ridge National Laboratory.

[9] Currin C., Mitchell T., Morris M., and Ylvisaker D. (1991), “Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments”, Journal of the American Statistical Association, 86(416), 953-963.

[10] Booker A.J., Dennis J.E. Jr., Frank P.D., Moore D.W. and Serafini D.B. (1998), “Managing Surrogate Objectives to Optimize a Helicopter Rotor Design Example”, Seventh AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, AIAA-98-4717.

[11] de Boor C. and Ron A. (1992), “Computational aspects of polynomial interpolation in several variables”, Mathematics of Computation, 58(198), 705-727.

[12] Giunta A.A. (1997), Aircraft Multidisciplinary Optimization using Design of Experiments Theory and Response Surface Modeling Methods, PhD thesis, Virginia Tech, 1997. Available as MAD 97-05-01, May 1997, Department of Aerospace and Ocean Engineering, Virginia Tech, 215 Randolph Hall, Blacksburg, VA 24061.

[13] Fletcher R. and Leyffer S. (1997), “Nonlinear Programming without a penalty function,” Dundee University, Dept. of Mathematics, Report NA/171.

[14] Fletcher R., Leyffer S. and Toint Ph.L. (1998), “On the global convergence of an SLP-Filter algorithm,” Dundee University, Dept. of Mathematics, Report NA/183.

[15] Fletcher R., Gould N.I.M., Leyffer S. and Toint Ph.L. (1999), “On the global convergence of trust-region SQP-Filter algorithms for general nonlinear programming,” Department of Mathematics, FUNDP, Namur (B), Report 99/03.

[16] Gill P.E., Murray W., Saunders M.A. and Wright M.H. (1986), “User's Guide for NPSOL (Version 4.0): a Fortran Package for Nonlinear Programming,” Department of Operations Research, Stanford University, Report SOL 86-2.

[17] Hock W. and Schittkowski K. (1981), Test Examples for Nonlinear Programming Codes, Springer-Verlag, New York.

[18] Lewis R.M. and Torczon V. (1998), “Pattern search methods for linearly constrained minimization,” ICASE, NASA Langley Research Center, TR 98-3. To appear in SIAM Journal on Optimization.

[19] Lewis R.M. and Torczon V. (1996), “Pattern search algorithms for bound constrained minimization,” SIAM Journal on Optimization, Vol. 9, No. 4, 1082-1099.

[20] Lewis R.M. and Torczon V. (1996), “Rank ordering and positive basis in pattern search algorithms,” ICASE, NASA Langley Research Center, TR 96-71.

[21] Lewis R.M. and Torczon V. (1998), “A globally convergent augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds,” ICASE, NASA Langley Research Center, TR 98-31.

[22] Owen A.B. (1992), “Orthogonal arrays for computer experiments, integration and visualization”, Statistica Sinica, 2, 439-452.

[23] Sacks J., Schiller S.B., and Welch W.J. (1989), “Designs for computer experiments”, Technometrics, 31(1), 41-47.

[24] Sacks J., Welch W.J., Mitchell T.J., and Wynn H.P. (1989), “Design and analysis of computer experiments”, Statistical Science, 4(4), 409-435.

[25] Schonlau M. (1997), Computer experiments and global optimization, PhD thesis, Statistics Department, University of Waterloo, Ontario, Canada.

[26] Simpson T.W. (1998), “Comparison of response surface and Kriging models in the multidisciplinary design of an aerospace nozzle”, ICASE Report No. 98-16, February.

[27] Torczon V. (1997), “On the Convergence of Pattern Search Algorithms,” SIAM Journal on Optimization, Vol. 7, No. 1, 1-25.