Aiaa 00 4891
Abstract

This paper describes an algorithm and provides test results for surrogate-model-based optimization. In this type of optimization, the objective and constraint functions are represented by global “surrogates”, i.e. response models, of the “true” problem responses. In general, guarantees of global optimality are not possible. However, a robust surrogate-model-based optimization method is presented here that has good global search properties and proven local convergence results. This paper describes methods for handling three key issues in surrogate-model-based optimization. These issues are maintaining a balance of effort between global design space exploration and local optimizer region refinement, maintaining good surrogate model conditioning as points “pile up” in local regions, and providing a provably convergent method for ensuring local optimality.

Acknowledgments: Work of the first author was supported by NSERC (Natural Sciences and Engineering Research Council) fellowship PDF-207432-1998, and work of the first three authors was supported by DOE DE-FG03-95ER25257, AFOSR F49620-98-1-0267, The Boeing Company, Sandia LG-4253, Exxon-Mobil and CRPC CCR-9120008.

Copyright © 2000 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.

American Institute of Aeronautics and Astronautics

1 Introduction

This paper describes an optimization method for the case where the objective and constraint functions are represented by inexpensive-to-evaluate “surrogate” models of a “true” simulation code. The use of response models is particularly attractive when the true responses come from compute-intensive computer simulations. Examples include 3D fluid dynamics analysis and structural analysis via the finite element method. The use of inexpensive-to-evaluate surrogate models has obvious potential for speeding the optimization process. The surrogate models also have the benefits of allowing optimization of problems with nonsmooth or noisy responses, and providing insight into the nature of the design space [4][5][10].

Because response models are inexpensive-to-evaluate surrogates for expensive analysis codes, they are often used to represent disciplines in multidisciplinary design optimization (MDO) problems (see e.g. [7][12][2][26]) and competing objectives in multiobjective optimization.

There are many types of surrogate models to choose from. Among these are neural nets, well-conditioned low-degree polynomials [11], and Kriging models [9][8][23][24]. As will be discussed later, the method used here for selecting points for model refinement requires estimates of uncertainty in model value
predictions at arbitrary points in design space. These estimates can readily be obtained for Kriging models. This is one reason that Kriging models are used in the work presented here.

The optimization problem considered is

$$\min_{x}\ f(x) \quad \text{subject to} \quad C(x) \le 0, \quad x_{\ell b} \le x \le x_{ub}, \qquad (1)$$

where $f : \mathbb{R}^n \to \mathbb{R}$ and $C : \mathbb{R}^n \to \mathbb{R}^m$ are continuously differentiable functions with $C = (c_1, \ldots, c_m)^T$. The vectors $x_{\ell b}$ and $x_{ub}$ in $\mathbb{R}^n$ are the lower and upper bounds on the variables. The inequalities in (1) are completely general because a bound other than zero can be handled by redefining the appropriate component function $c_i(x)$. Lower-bounded inequalities can be handled by negating the corresponding $c_i(x)$. Equality constraints can be specified as a pair consisting of an upper-bounded and lower-bounded inequality with the same constraint functions and bounds. However, since this paper deals with modeling of expensive simulations, it is assumed that there will be a fairly loose feasibility tolerance for equality constraints. Thus, for ease of exposition and for discussion of convergence, equalities will be considered to be inequality constraints with upper and lower bounds that happen to be “somewhat” close to each other.

For ease of exposition, it is assumed in (1) that $f(x)$ and the constraint functions $c_i(x)$ are responses from a “true” simulation code, each of which is represented by a single surrogate model. In actuality, the optimization method presented here is valid if $f(x)$ or component functions of $C(x)$ are represented by arbitrary functions of surrogate model values. For example, the surrogate objective can be a weighted sum of several response models. In addition, the method can handle the case where $f(x)$ or some components of $C(x)$ are “truth” values, i.e. are not modeled. For example, this is the case when some constraints are simple analytic functions of the variables, such as constraints on geometry.

Figure 1 is a high-level view of the optimization algorithm. The method shown in Figure 1 is a particular example of the very general filter-based, derivative-free optimization method described in Section 2. Much of this optimization method is encompassed in the Figure 1 box labeled “controller”.

[Figure 1: The optimization algorithm. Diagram labels: Initial Iterate X0; Search or Poll?; CONTROLLER: Filter-Based, Derivative-Free Optimization Algorithm; Update Models with new data.]

The basic optimization method is an iterative process. At each iteration, the method identifies points at which more “true” simulation data are needed. The data are obtained and surrogate models are updated using the new data. The optimization controller uses the new data to decide how the optimization process will proceed. One such decision is whether to be in the local, lattice-based, poll mode or the more global search mode. The modes are discussed next.

The search mode selects new points for running the simulation code that are either “global optimizers” for the surrogate model problem or are points for model improvement. The search for “global optimizers” may consist of simply running a local optimization code from several (say 10) randomly selected start points in the design space. The model-improvement points are not global optimizers of the response-model problem, but lie in regions of possibly large model error, where a global optimum is not unlikely. Thus, it is worth obtaining new “true” data at these points. Since these points tend to be far from existing data, they represent the most global aspect of the optimization method. The constrained, balanced, local-global search (CBLGS) method presented in Section 4 describes selection of these model-improvement points.

The poll mode is a local, lattice-based search. The simplest poll pattern is the collection of unit steps in the 2n positive and negative coordinate directions on
an n-dimensional grid. However, it has been shown [20] that polling can be done along any set of vectors that form a positive spanning set for n-dimensional space. A positive spanning set is any set of vectors that spans n-dimensional space when summed with nonnegative coefficients.

A key item in Figure 1 is the “Update Models with new data” process. A common problem in surrogate-model-based optimization is the degradation in conditioning as the models are updated to incorporate new data. This is because the optimization process tends to “cluster” points in design space. Section 3 discusses Kriging models and presents a new method for maintaining well-conditioned models during the optimization process.

A summary of the content of this paper is as follows. The “filter-based” optimization method, and convergence results for this method, are given in Section 2. Section 3 provides a description of Kriging models and presents a new method for maintaining model conditioning. The CBLGS algorithm for selecting model-improvement points is given in Section 4. Testing of the optimization method is summarized in Section 5. Conclusions and future work are discussed in Section 6.

2 Filter-Based Derivative-Free Optimization Algorithms

2.1 Introduction

Torczon [27] presents a flexible class of generalized pattern search (GPS) algorithms for derivative-free unconstrained optimization. Here, our method (thoroughly presented in [1]) extends the GPS algorithm to general nonlinear programming problems by combining it with filter algorithms (described below) in a way that reduces transparently to GPS for unconstrained problems. We do this without penalty constants or Lagrange multiplier estimates (as required in [21]) by formulating a step acceptance rule based on filter methods.

Filter algorithms were introduced by Fletcher and Leyffer [13] as a way to globalize SQP and SLP without using a merit function that would require a troublesome penalty or barrier parameter to be estimated. A step is accepted if it either reduces the objective function or the constraint violation. Fletcher, Leyffer and Toint [14] show convergence of the method that uses sequential linear programming to suggest steps. Fletcher, Gould, Leyffer and Toint [15] show convergence of the method that uses sequential quadratic programming to suggest steps. Previous filter algorithms thus require explicit use of the derivatives of both the objective and the constraints. Our method does not.

We present a pattern search version of the filter approach that converges using any user-defined finite search procedure, here based on surrogates, to suggest steps, and which falls back on a special poll step when the user-defined procedure stalls, i.e., when the search is unsuccessful. Elsewhere [1] we show that our filter-pattern search class of algorithms guarantees subsequence convergence to points that satisfy appropriate optimality conditions depending on the local smoothness of the problem functions. Convergence is assured without assuming even finite values or continuity.

The nonnegative constraint violation function satisfies $h(x) = 0$ if and only if $x$ is feasible (and thus $h(x) > 0$ if and only if $x$ is infeasible). We assume that $h$ satisfies a weak monotonicity property: if $C(x)^+$ is the vector whose $j$th component is $\max(0, c_j(x))$, then the component-wise inequality $C(x)^+ \le C(y)^+$ implies $h(x) \le h(y)$. For example, we could set $h(x) = \|C(x)^+\|$ where $\|\cdot\|$ is a vector norm. Our best theoretical results are for $h(x) = \|C(x)^+\|_2^2$.

A filter $\mathcal{F}$ is a finite set containing the feasible point with best known objective, and infeasible points in $\mathbb{R}^n$ such that for no pair of infeasible points $(x, x')$ in the filter does one point have both a better objective and constraint violation function value than the other. An infeasible point $x'$ is said to be filtered if either $h_{\max} \le h(x')$ for some user-supplied upper bound $h_{\max}$, or if for some infeasible $x$ ($x' \ne x$) belonging to the filter, $h(x') \ge h(x)$ and $f(x') \ge f(x)$. A feasible point is filtered if its objective function value is greater than or equal to the feasible incumbent value $f^F$ (i.e., the least function value found so far at a feasible solution). It is unfiltered otherwise. Of course, the first feasible point encountered (if any) is automatically the initial feasible incumbent solution.

The set of filtered points $\overline{\mathcal{F}}$ is denoted in standard notation as:

$$\overline{\mathcal{F}} = \bigcup_{x \in \mathcal{F}} \left\{ x' \in \mathbb{R}^n \setminus \mathcal{F} : h(x') \ge h(x),\ f(x') \ge f(x) \right\} \;\cup\; \left\{ x' \in \mathbb{R}^n : h(x') \ge h_{\max} \right\} \;\cup\; \left\{ x' \in \mathbb{R}^n : h(x') = 0,\ f(x') > f^F \right\}.$$
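The filtering rules defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `Filter`, its fields, and the helper `violation` are hypothetical, the constraint violation is taken to be $h(x) = \|C(x)^+\|_2^2$, and a feasible point is treated as filtered when its objective is greater than or equal to the incumbent value, as in the prose definition. Points are represented only by their $(h, f)$ pairs.

```python
import math
from dataclasses import dataclass, field


def violation(c_values):
    """h(x) = ||C(x)^+||_2^2: squared 2-norm of the positive parts of c_i(x)."""
    return sum(max(0.0, c) ** 2 for c in c_values)


@dataclass
class Filter:
    """Hypothetical container for the filter; stores (h, f) pairs only."""
    h_max: float                      # user-supplied bound on constraint violation
    f_feasible: float = math.inf      # feasible incumbent value f^F (inf if none yet)
    infeasible: list = field(default_factory=list)  # (h, f) of infeasible filter points

    def is_filtered(self, h, f):
        if h == 0.0:                  # feasible point: compare with the incumbent
            return f >= self.f_feasible
        if h >= self.h_max:           # too infeasible to be considered
            return True
        # dominated by some infeasible filter point in both h and f
        return any(h >= hx and f >= fx for hx, fx in self.infeasible)

    def add(self, h, f):
        """Insert an unfiltered point; return False if the point is filtered."""
        if self.is_filtered(h, f):
            return False
        if h == 0.0:
            self.f_feasible = f
        else:
            # keep the filter free of pairs in which one point dominates the other
            self.infeasible = [(hx, fx) for hx, fx in self.infeasible
                               if not (hx >= h and fx >= f)]
            self.infeasible.append((h, f))
        return True
```

For example, with filter entries $(h, f) = (4, 1)$ and $(1, 3)$, the trial pair $(5, 2)$ is filtered because $(4, 1)$ is at least as good in both measures, while $(2, 2)$ is unfiltered because it improves on each entry in at least one measure.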
[Figure 2: The three possible types of iterations. Axes: constraint violation $h$ (horizontal, with $h_{\max}$ marked) and objective $f$ (vertical). Legend: Feasible region: $\Omega = \{x \in \mathbb{R}^n : h(x) = 0\}$; Trial set: $T_k$; Filtered points: $\overline{\mathcal{F}}_k$; Successful iterations: $T_k \setminus \overline{\mathcal{F}}_k \ne \emptyset$; Unsuccessful iterations: $T_k \subset \overline{\mathcal{F}}_k$. Marked points: $f_k^F$ and $(h_k^I, f_k^I)$.]

2.2 Algorithm

Unsuccessful iterations are those where all trial points are filtered. Successful iterations are those where an unfiltered trial point is found. The search or poll step may be terminated without any more function or constraint evaluations if such a point is found. The mesh size parameter is either increased or kept constant in successful iterations, and it is decreased in unsuccessful ones.

We define two types of incumbent solutions: the feasible ones, and the most nearly feasible ones (see Figure 2). Let $f_k^F$ represent the feasible incumbent value, i.e., the smallest objective function value (for feasible points) found by the algorithm up to iteration $k$. Let $h_k^I > 0$ be the least positive constraint violation value found up to iteration $k$, and let $f_k^I$ denote the smallest objective function value of the points found whose constraint violation function values are equal to $h_k^I$. The superscript F stands for feasible and I for infeasible.

The poll set for any iteration is the set of mesh neighbors of $p_k$. The poll center $p_k$ is either chosen in the set of feasible incumbent solutions, or it must be one of the most nearly feasible incumbent solutions. Generally, there will be a single value in each of the sets of incumbents.

The poll center $p_k$ satisfies either $(h(p_k), f(p_k)) = (0, f_k^F)$ or $(h(p_k), f(p_k)) = (h_k^I, f_k^I)$. Our class of algorithms and their analyses are completely flexible about the choice between these two alternative poll centers $p_k$ at a given iteration. It is up to the user to select the strategy defining the poll center. For example, it might be a good strategy to alternate choosing the poll centers in the set of feasible and infeasible incumbents until an unsuccessful iteration occurs.

Even if we already have a feasible incumbent solution, we may wish to poll around one of the nearly feasible points, which might have a lower objective function value and thus lead us to explore a different part of the feasible region $\Omega$. This option allows the algorithm to avoid stalling (when infinitely many consecutive unsuccessful iterations are generated) in the Lewis and Torczon [19] example.

The most useful successful iterations are those that produce a feasible iterate $x_{k+1}$ which improves the feasible incumbent value to $f_{k+1}^F = f(x_{k+1})$. Next are the successful iterations that do not produce a feasible iterate, but improve the most nearly feasible incumbent solution: $h_{k+1}^I = h(x_{k+1})$ and $f_{k+1}^I = f(x_{k+1})$. Finally, there are the other successful iterations. They leave the incumbents unchanged, $f_{k+1}^F = f_k^F$, $h_{k+1}^I = h_k^I$ and $f_{k+1}^I = f_k^I$, but add some elements to the filter.

The unsuccessful iterations are such that all points in the trial set are filtered. Regardless of the feasibility of $x_k$, if the iteration is unsuccessful, the next iterate $x_{k+1}$ is set to $x_k$, and the mesh size parameter is decreased: $\Delta_{k+1} < \Delta_k$.

Our algorithm for constrained optimization is stated formally below. We allow for the fact that in some applications, a set of initial solutions may be used to seed the filter. Without any loss of generality we assume that any such points are on the initial mesh and that they have been “filtered” to be consistent with our initialization step in the sense that $x_0$ will not be filtered by the other seed points. An easy way to assure this would be to take $x_0$ to be the most feasible seed point, breaking ties by taking one with the smallest objective function value.

A standard trick in engineering design, when seed designs are available, is to go further and make linear combinations of the seed points be the new design space. This certainly makes sense, can lead to a big reduction in the dimension of the design space, and makes the initial mesh be defined by the seed points in a simple way.

It is implicit in all we have said that our initial solution must have finite values for the objective and constraint violations, and this will be true of all the iterates generated by the algorithm. In [1], we show how to incorporate a finite number of linear constraints by setting the objective value to infinity at points that violate them while using the filter for more general constraints.

Algorithm

• Initialization:
Let $x_0$ be one of a set of initial points in the filter
$\mathcal{F}_0$. Fix $h_{\max} > h(x_0)$ and $\Delta_0 > 0$ and set the iteration counter $k$ to 0.

• Definition of incumbent solutions:
Define (if possible)
$f_k^F$: the smallest objective function value for all feasible solutions found so far;
$h_k^I > 0$: the least positive constraint violation function value found so far;
$f_k^I$: the smallest objective function value of the points found so far whose constraint violation function values are equal to $h_k^I$.

• Search and poll steps:
Perform the search and possibly the poll steps (or only part of the steps) until an unfiltered trial point $x_{k+1}$ is found, or until it is shown that all trial points are filtered.

– Search step:
Evaluate the functions $h$ and $f$ on a set of trial points on the current mesh $M_k$ (the strategy that gives the set of points is usually provided by the user).

– Poll step:
Evaluate the functions $h$ and $f$ on the poll set around $p_k$, where $p_k$ satisfies either $(h(p_k), f(p_k)) = (0, f_k^F)$ or $(h(p_k), f(p_k)) = (h_k^I, f_k^I)$.

• Parameter update:
If the search or the poll step produced an unfiltered iterate $x_{k+1} \in \mathcal{F}_{k+1}$, then declare the iteration successful and update $\Delta_{k+1} \ge \Delta_k$.
Otherwise, set $x_{k+1} = x_k$, declare the iteration unsuccessful and update $\Delta_{k+1} < \Delta_k$.
Increase $k \leftarrow k + 1$ and go back to the definition of the incumbents.

2.3 Convergence behavior

We give the flavor of our results here; details of our rather technical analysis can be found in [1]. We say that a subsequence of the sequence of convergent unsuccessful poll centers is a refining sequence, if the corresponding subsequence of mesh size parameters $\{\Delta_k\}$ goes to zero.

Theorem 2.1 Let $\hat{x}$ be an accumulation point of a refining sequence. If $h$ is strictly differentiable at $\hat{x}$, then $\nabla h(\hat{x}) = 0$.

This means that using $h(x) = \|C(x)^+\|_2^2$ gives nice results, but the scaling of the constraint violations is then more sensitive. Thus, the question arises as to the differentiability of the more common choice $h_1(x) = \|C(x)^+\|_1$. The answer is that this choice is rarely strictly differentiable at $\hat{x}$, and a proof requires a very technical lemma.

We now summarize some results for the objective function at a limit point.

Theorem 2.2 Let $\hat{x}$ be an accumulation point of any refining subsequence. If $\hat{x}$ is strictly feasible and if $f$ is strictly differentiable at $\hat{x}$, then $\nabla f(\hat{x}) = 0$.

This says that if we converge to a point in the interior of the feasible region, then the KKT first order necessary conditions hold. Wherever it is located, $\hat{x}$ is a KKT point for a related problem. We spare the reader the technical result (rigorously shown in [1]) that for general constraints, we can construct a cone that contains $N_\Omega(\hat{x})$ as well as $-\nabla f(\hat{x})$. In the case that the constraints are sufficiently simple (for example, a finite number of linear constraints), then one can arrange the polling directions so that this related problem is the actual problem. Of course, in theory it is always possible to include the right polling directions so that this happens.

By using a filter-based step acceptance criterion, we have overcome a difficulty in applying pattern search algorithms to constrained optimization; specifically, the problem that the objective function descent directions in the positive spanning directions may be infeasible. Lewis and Torczon [19] give an example where a nonfilter-based version of the pattern search algorithm stalls (all subsequent iterations are unsuccessful) when the positive spanning directions are poorly chosen at a feasible point where the gradient of $f$ is nonzero. The following result shows that our algorithm will not stall at any point $x$ for which $\nabla f(x) \ne 0$, regardless of the choice of positive spanning set. The implication is that there eventually will be a successful iteration that will move the iterate away from the current point. Thus, we can hope for a more global solution even if we begin at a constrained local solution.

Proposition 2.3 Suppose that $f$ is strictly differentiable at the poll center $p_k$, where $k$ is the iteration number. If $\nabla f(p_k) \ne 0$, then there exists an index $\ell \ge k$ such that $x_{\ell+1} \ne x_k$.

The remainder of the paper can be viewed as providing sophisticated tools and procedures for the search phase of the algorithm. Our earlier work [3] showed that kriging surrogates can be extraordinarily effective search tools.
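To make the iteration of Section 2.2 concrete, the following Python sketch runs a bare-bones filter pattern search on a toy problem. It is a minimal illustration under several assumptions that are not taken from the paper: the search step is empty, the poll set is the 2n coordinate directions, the poll center is the feasible incumbent whenever one exists (otherwise the least infeasible point), the mesh size is kept constant on success and halved on failure, and $h(x) = \|C(x)^+\|_2^2$. The function name `filter_pattern_search` and all of its parameters are hypothetical.

```python
import math


def violation(c_values):
    """h(x) = ||C(x)^+||_2^2 (assumed choice of constraint violation function)."""
    return sum(max(0.0, c) ** 2 for c in c_values)


def filter_pattern_search(f, c, x0, delta0=1.0, h_max=100.0, tol=1e-3, max_iter=500):
    """Poll-only filter pattern search; returns ((f_best, x_best) or None, final delta)."""
    n = len(x0)
    # 2n coordinate directions: the simplest positive spanning set
    directions = [tuple(s if j == i else 0.0 for j in range(n))
                  for i in range(n) for s in (1.0, -1.0)]
    best_feas = None      # feasible incumbent as (f, x)
    infeasible = []       # infeasible filter entries as (h, f, x)

    def try_point(x):
        """Evaluate x; add it to the filter and return True if it is unfiltered."""
        nonlocal best_feas, infeasible
        h, fx = violation(c(x)), f(x)
        if h == 0.0:
            if best_feas is not None and fx >= best_feas[0]:
                return False
            best_feas = (fx, x)
            return True
        if h >= h_max or any(h >= e[0] and fx >= e[1] for e in infeasible):
            return False
        infeasible = [e for e in infeasible if not (e[0] >= h and e[1] >= fx)]
        infeasible.append((h, fx, x))
        return True

    if not try_point(tuple(float(v) for v in x0)):
        raise ValueError("initial point must satisfy h(x0) < h_max")

    delta = delta0
    for _ in range(max_iter):
        if delta < tol:
            break
        # poll center: feasible incumbent if any, else the most nearly feasible point
        center = best_feas[1] if best_feas is not None else min(infeasible)[2]
        success = False
        for d in directions:
            trial = tuple(ci + delta * di for ci, di in zip(center, d))
            if try_point(trial):      # stop polling at the first unfiltered point
                success = True
                break
        if not success:
            delta *= 0.5              # unsuccessful iteration: refine the mesh
    return best_feas, delta


# Toy problem (an assumption for illustration): minimize (x0-2)^2 + (x1-1)^2
# subject to x0 - 1 <= 0, whose solution is x = (1, 1).
objective = lambda x: (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2
constraints = lambda x: [x[0] - 1.0]
best, delta = filter_pattern_search(objective, constraints, (0.0, 0.0))
print(best, delta)
```

In this run, infeasible trial points such as (2, 1) enter the filter because nothing dominates them, while the feasible incumbent moves to (1, 1). Alternating the poll center between the two kinds of incumbent, as suggested in Section 2.2, would be a straightforward variation.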
3 Well-Conditioned Kriging Models

This section describes the interpolating models used as approximate models and an approach to well-conditioned model updating. See, e.g., [24] for a more detailed description of Kriging models in computer output. The model update approach and tests on standard (unconstrained) optimization test problems are described in [6].

Kriging models are models of the output $Y$ of a deterministic simulation code as a function of $d$-dimensional inputs $x$ via a stationary Gaussian process $Z$ plus a constant

$$Y(x) = \beta + Z(x)$$

where for each $x$, $Z(x)$ has mean 0, variance $\sigma^2$ and correlations between $Y(x)$ and $Y(w)$ are given by the positive definite correlation function

$$\mathrm{Cor}(Y(x), Y(w)) \equiv R(x, w).$$

We note that the correlation function usually depends on $d$ “scale” parameters $\theta_1, \ldots, \theta_d$ [24].

When $Y$ is observed at $N$ points (called sites in statistical literature)

$$S = (s_1, \ldots, s_N),$$

denoted by

$$Y_S \equiv (Y(s_1), \ldots, Y(s_N))^T,$$

then the interpolating approximate model is given by

$$\hat{Y}(x) = E(Y(x) \mid Y_S),$$

which can be shown to be

$$\hat{Y}(x) = \hat{\beta} + r(x, S)^T R^{-1} (Y_S - \vec{1} \cdot \hat{\beta}) \qquad (2)$$

where $(R_{i,j})$ = correlation between observations $Y(s_i)$ and $Y(s_j)$, $r(x, S)$ = vector of correlations between $Y(x)$ and $Y(s_i)$, $i = 1, \ldots, N$, and

$$\hat{\beta} \equiv \frac{\sum_j Y(s_j) \sum_i R^{-1}_{i,j}}{\sum_{i,j} R^{-1}_{i,j}}.$$

[...] nor compute $\hat{\beta}$.

We propose the following solution to the problem of ill-conditioning: Model the output as the sum of two stochastically independent Gaussian processes, one with the original correlation parameters estimated from the first set of points, and one with a “finer” correlation structure. Thus

$$Y(x) = \beta + Z_1(x) + Z_2(x),$$

where $Z_1$ and $Z_2$ are independent of each other. If, for $Z_i$, we denote the correlation function by $R_i$ and the variance by $\sigma_i^2$, then the resulting correlation function for $Y$ is

$$R(x, w) = \lambda R_1(x, w) + (1 - \lambda) R_2(x, w)$$

where

$$\lambda = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}.$$

Second Process Parameter Estimation

Updating a Kriging model with a second Gaussian process requires estimation of correlation parameters and of the weight $\lambda$. For correlations in the first process $Z_1$ we use the Gaussian product correlation function [24]. The initial set of sites and model fit provide the correlation parameters (computed via Maximum Likelihood Estimation [24]) for $Z_1$. These parameters are retained at each model update. For the second process $Z_2$ we choose the positive cubic spline correlation structure. The scale correlation parameters for $Z_2$ are assumed to be all unknown, but equal, i.e., $\theta_j = \theta$ for all $j$. The fit to the model is then obtained by estimating the parameter $\theta$ for $Z_2$ and the weight $\lambda$ via maximum likelihood. The likelihood is maximized over $\lambda$ in the range $[0, 1]$ and $\theta$ in the range $[0, 1/\mathrm{dist}]$, where dist is the minimum over pairs of distinct sites of the maximum difference in coordinates between pairs. For the positive cubic spline correlation, $\theta$ greater than $1/\mathrm{dist}$ implies all sites are uncorrelated with respect to $R_2$. Note that finding the maximum likelihood over $(\lambda, \theta)$ is a two-variable optimization problem. An approximate maximum likelihood is found by evaluating the likelihood on a weighted “$\lambda$ by $\theta$” grid.
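As an illustration of the interpolating predictor (2) and the estimate $\hat{\beta}$ in this section, the following sketch fits a single-process Kriging model in pure Python. It is only a sketch under stated assumptions: the scale parameters $\theta$ are supplied by the caller rather than estimated by maximum likelihood, the Gaussian product correlation is taken in the common form $\exp(-\sum_j \theta_j (x_j - w_j)^2)$, and the names `gauss_corr`, `solve` and `fit_kriging` are hypothetical. The small dense solver has no safeguard against the ill-conditioning that this section is concerned with.

```python
import math


def gauss_corr(x, w, theta):
    """Gaussian product correlation (assumed form): exp(-sum_j theta_j (x_j - w_j)^2)."""
    return math.exp(-sum(t * (a - b) ** 2 for t, a, b in zip(theta, x, w)))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z


def fit_kriging(sites, y, theta):
    """Return (predict, beta_hat) for the interpolating model of equation (2)."""
    R = [[gauss_corr(si, sj, theta) for sj in sites] for si in sites]
    ones = [1.0] * len(sites)
    # beta_hat = (1^T R^{-1} Y_S) / (1^T R^{-1} 1)
    beta = sum(solve(R, y)) / sum(solve(R, ones))
    # weights = R^{-1} (Y_S - 1 * beta_hat), computed once and reused
    weights = solve(R, [yi - beta for yi in y])

    def predict(x):
        r = [gauss_corr(x, s, theta) for s in sites]   # r(x, S)
        return beta + sum(ri * wi for ri, wi in zip(r, weights))

    return predict, beta


# Usage with three one-dimensional sites (values chosen only for illustration).
sites = [(0.0,), (0.5,), (1.0,)]
predict, beta_hat = fit_kriging(sites, [1.0, 2.0, 0.5], theta=[4.0])
print(beta_hat, predict((0.25,)))
```

The predictor reproduces the observed values at the sites, which is the interpolation property the section relies on. When sites cluster, the rows of R become nearly identical and the linear solves above become unreliable, which is the conditioning problem the two-process model is meant to address.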
References

[8] Currin C., Mitchell T., Morris M., and Ylvisaker D. (1988), “A Bayesian approach to the design and analysis of computer experiments”, Technical Report ORNL-6498, Oak Ridge National Laboratory.

[9] Currin C., Mitchell T., Morris M., and Ylvisaker D. (1991), “Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments”, Journal of the American Statistical Association, 86(416), 953–963.

[10] Booker A.J., Dennis J.E. Jr., Frank P.D., Moore D.W. and Serafini D.B. (1998), “Managing Surrogate Objectives to Optimize a Helicopter Rotor Design Example”, Seventh AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, AIAA-98-4717.

[11] deBoor C. and Ron A. (1992), “Computational aspects of polynomial interpolation in several variables”, Mathematics of Computation, 58(198), 705–727.

[12] Giunta A.A. (1997), Aircraft Multidisciplinary Optimization using Design of Experiments Theory and Response Surface Modeling Methods, PhD thesis, Virginia Tech, 1997. Available as MAD 97-05-01, May 1997, Department of Aerospace and Ocean Engineering, Virginia Tech, 215 Randolph Hall, Blacksburg, Va 24061.

[13] Fletcher R. and Leyffer S. (1997), “Nonlinear Programming without a penalty function,” Dundee University, Dept. of Mathematics, Report NA/171.

[14] Fletcher R., Leyffer S. and Toint Ph.L. (1998), “On the global convergence of an SLP-Filter algorithm,” Dundee University, Dept. of Mathematics, Report NA/183.

[15] Fletcher R., Gould N.I.M., Leyffer S. and Toint Ph.L. (1999), “On the global convergence of trust-region SQP-Filter algorithms for general nonlinear programming,” Department of Mathematics, FUNDP, Namur (B), Report 99/03.

[16] Gill P.E., Murray W., Saunders M.A. and Wright M.H. (1986), “User’s Guide for NPSOL (Version 4.0): a Fortran Package for Nonlinear

[17] Hock W. and Schittkowski K. (1981), “Test Examples for Nonlinear Programming Codes,” Springer-Verlag, New York.

[18] Lewis R.M. and Torczon V. (1998), “Pattern search methods for linearly constrained minimization,” ICASE NASA Langley Research Center TR 98-3. To appear in SIAM Journal on Optimization.

[19] Lewis R.M. and Torczon V. (1996), “Pattern search algorithms for bound constrained minimization,” SIAM Journal on Optimization, Vol.9 No.4, 1082–1099.

[20] Lewis R.M. and Torczon V. (1996), “Rank ordering and positive basis in pattern search algorithms,” ICASE NASA Langley Research Center TR 96-71.

[21] Lewis R.M. and Torczon V. (1998), “A globally convergent augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds,” ICASE NASA Langley Research Center TR 98-31.

[22] Owen A.B. (1992), “Orthogonal arrays for computer experiments, integration and visualization”, Statistica Sinica, 2, 439–452.

[23] Sacks J., Schiller S.B., and Welch W.J. (1989), “Designs for computer experiments”, Technometrics, 31(1), 41–47.

[24] Sacks J., Welch W.J., Mitchell T.J., and Wynn H.P. (1989), “Design and analysis of computer experiments”, Statistical Science, 4(4), 409–435.

[25] Schonlau M. (1997), Computer experiments and global optimization, PhD thesis, Statistics Department, University of Waterloo, Ontario, Canada.

[26] Simpson T.W. (1998), “Comparison of response surface and Kriging models in the multidisciplinary design of an aerospace nozzle”, ICASE Report No. 98-16, February.

[27] Torczon V. (1997), “On the Convergence of Pattern Search Algorithms,” SIAM Journal on Optimization, Vol.7 No.1, 1–25.