
Optimization and Engineering, 5, 145-156, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Convergence of Hybrid Space Mapping Algorithms

KAJ MADSEN, JACOB SØNDERGAARD
Informatics and Mathematical Modelling, Technical University of Denmark
email: km@imm.dtu.dk
email: js@imm.dtu.dk

Received January 13, 2003; Revised August 30, 2003
Abstract. The space mapping technique is intended for optimization of engineering models which involve very expensive function evaluations. It may be considered a preprocessing method which often provides a very efficient initial phase of an optimization procedure. However, the ultimate rate of convergence may be poor, or the method may even fail to converge to a stationary point.
We consider a convex combination of the space mapping technique with a classical optimization technique. The function to be optimized has the form H ∘ f where H: R^m → R is convex and f: R^n → R^m is smooth. Experience indicates that the combined method maintains the initial efficiency of the space mapping technique. We prove that the global convergence property of the classical technique is also maintained: the combined method provides convergence to the set of stationary points of H ∘ f.

Keywords: space mapping, global convergence
1. Introduction

The subject of this paper is to prove global convergence of an optimization method which is a convex combination of two strategies: one which is efficient initially in an iteration, and another which has guaranteed global convergence. The first algorithm is the so-called space mapping technique, described and motivated in Bakr et al. (2001); the other one is a classical first-order Taylor based trust region algorithm.
The problem to be solved is the following:

    min_{x ∈ R^n} H(f(x))     (1)

where f: R^n → R^m is a smooth function, often m ≥ n, and H: R^m → R is a convex function. It may be a norm in R^m, typically the L_1, L_2 or the L_∞ norm. The following minimax function,

    H(y) ≡ max_{1≤i≤m} {y_i}

where y = (y_1, y_2, . . . , y_m)^T, is also often used, e.g., in electromagnetic design, which has been a major application area for the space mapping technique. Thus it is important to cover the case where H is non-differentiable.
In the smooth least squares case, H = ||·||_2^2, the first-order Taylor based method is the Gauss-Newton method. In this case convergence of the combined algorithm is proved in Vicente (2003). For the non-differentiable choices of H mentioned above, special versions of the Taylor based method have been published in Madsen (1975) and Hald and Madsen (1985). For general convex H, convergence of the Taylor based trust region algorithm was proved in Madsen (1985). Related results may be found in Fletcher (1982) and Womersley and Fletcher (1986).
The space mapping technique, which was introduced in Bandler et al. (1994), is intended for problems where f is computationally expensive. It is an optimization technique for engineering design in the following situation: f represents an accurate model of some physical system. Besides this model of primary interest (denoted the fine model), access to a cheaper (coarse) model of the same physical system is assumed. The latter may be less accurate. The main idea of the space mapping technique is to use the coarse model to gain information about the fine model, and to apply this in the search for an optimal solution of the latter. Thus the technique iteratively establishes a mapping between the parameters of the two models which relates similar model responses. Having this mapping, most of the model evaluations can be directed to the fast coarse model. A review of the space mapping approach is given in Bakr et al. (2000).
We give a description of the combined method in Section 2. The convergence is proved in Section 3.
2. Description of the algorithms

The two algorithms which are combined are both iterative. In the descriptions below the current iterate is x_k ∈ R^n, and H ∘ f is denoted by F.
The Space Mapping Algorithm (SMA) assumes two functions available: the function f to be minimized, and a function c: R^n → R^m which represents the coarse model that is related to the same physical model as f. The space mapping p: R^n → R^n is intended to connect similar values of f and c. In the present description it satisfies the following for x ∈ R^n:

    p(x) ∈ arg min_{x̄ ∈ R^n} ||c(x̄) − f(x)||     (2)

where ||·|| is a norm in R^m, usually the L_2 norm. The tentative step h_k from x_k is based on the following approximation to p(x_k + h):

    p_k(h) = B_k h + p(x_k),     (3)

where the matrix B_k is intended to approximate the gradient p′(x_k)^T of p at x_k. This approximation may be found using the gradients of f and c if they are available; otherwise a Broyden update (Broyden, 1965) or a difference approximation to p′(x_k)^T has been used. It is not important in the present paper how B_k is found.
The SMA finds h_k as a solution to

    h_k ∈ arg min_h H(c(p_k(h))),

where the minimization is usually subject to a trust region.
In the Taylor-based method for minimizing F the tentative step from x_k is based on the following first-order approximation to f(x_k + h),

    f_k(h) = D_k h + f(x_k).

In the present paper D_k = f′(x_k)^T. Otherwise a Broyden update or a difference approximation to f′(x_k)^T has been used.
The tentative step h_k is found as a solution to

    h_k ∈ arg min_h H(f_k(h)),     (4)

where the minimization is usually subject to a trust region.
In the combined algorithm (SMTA) for minimizing F the tentative step h_k from x_k is based on a convex combination of c ∘ p_k and f_k:

    s_k^w(h) = w c(p_k(h)) + (1 − w) f_k(h)

where 0 ≤ w ≤ 1 is a transition parameter.
The method finds the tentative step h_k as a solution to

    h_k ∈ arg min_h H(s_k^{w_k}(h)),

where the minimization is usually subject to a trust region.
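The combined model can be sketched as follows; the concrete H, mapped-coarse response and Taylor model below are 1-D stand-ins of our own choosing, and the trust-region minimisation is done by a crude grid search:

```python
# Hypothetical sketch of the combined model s_k^w and the tentative
# step: minimise H(s_k^w(h)) over [-radius, radius] by grid search.
# H is the absolute value; c(p_k(.)) and f_k are illustrative.

def H(y):                      # convex merit (1-D absolute value)
    return abs(y)

def c_of_p(h):                 # c(p_k(h)): mapped coarse response
    return 2.0 * h + 0.5

def taylor(h):                 # f_k(h) = D_k h + f(x_k)
    return 2.0 * h + 0.4

def s(h, w):                   # convex combination s_k^w(h)
    return w * c_of_p(h) + (1.0 - w) * taylor(h)

def step(w, radius, steps=2001):
    """h in [-radius, radius] minimising H(s_k^w(h)) on a grid."""
    best_h, best_val = -radius, H(s(-radius, w))
    for i in range(1, steps):
        h = -radius + i * (2.0 * radius) / (steps - 1)
        val = H(s(h, w))
        if val < best_val:
            best_h, best_val = h, val
    return best_h

print(step(1.0, 1.0))  # -0.25: w = 1 uses the mapped coarse model only
```

As w slides from 1 to 0 the minimiser moves continuously from that of the mapped coarse model (h = -0.25 here) to that of the Taylor model (h = -0.2).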
In the algorithm w_0 = 1 and w_k is a non-increasing function of k, the principle being that if c ∘ p_k is doing badly in approximating f then w_k is decreased, and thereby the algorithm gradually switches to using the Taylor model f_k of f.
2.1. Details of the SMTA

The trust region at x_k is the set

    T_k = {x ∈ R^n : ||x − x_k|| ≤ Λ_k}     (5)

where ||·|| is a suitable norm in R^n and Λ_k > 0.
At the current iterate x_k, the tentative step h_k ∈ R^n is a solution to

    h_k ∈ arg min_h H(s_k^{w_k}(h))  s.t.  x_k + h ∈ T_k     (6)

The quality of a given tentative step h_k is measured by

    S_k(h_k) ≡ H(s_k^{w_k}(h_k)) − H(f(x_k)).
S_k(h_k) may be interpreted as a measure of the ability of the model s_k^{w_k} to predict a decrease in F. Notice, however, that S_k(0) is not necessarily 0, i.e., the model does not necessarily interpolate at x_k.
If h_k is acceptable then we use x_k + h_k as the next iterate, otherwise we maintain x_k as the best iterate found. The acceptance of the step is based on the following criteria:
If the predicted decrease is non-positive,

    S_k(h_k) ≥ 0,     (7)

then the step is rejected. Otherwise the step is accepted if F decreases sufficiently:

    F(x_k) − F(x_k + h_k) ≥ δ_1 [−S_k(h_k)]     (8)

where 0 < δ_1 < 1.
In each iteration the local bound Λ_k and the transition parameter w_k are adjusted as follows:
If (7) is true then Λ_{k+1} = Λ_k; otherwise Λ_{k+1} depends on the ratio between the actual and the predicted decrease. If

    F(x_k) − F(x_k + h_k) ≤ δ_2 [−S_k(h_k)],     (9)

δ_1 < δ_2 < 1, is satisfied then Λ_{k+1} = K_1 Λ_k with 0 < K_1 < 1.
If

    F(x_k) − F(x_k + h_k) ≥ δ_3 [−S_k(h_k)],     (10)

δ_2 < δ_3 < 1, is satisfied then Λ_{k+1} = K_2 Λ_k with K_2 > 1.
If none of the conditions (9) or (10) is satisfied then we let Λ_{k+1} = Λ_k.
The transition parameter w_k is chosen as follows: initially w_0 = 1. If x_k + h_k is not accepted then we wish to move weight towards the Taylor model, and therefore we let

    w_{k+1} = K_3 w_k min{Λ_{k+1}, 1},     (11)

where 0 ≤ K_3 < 1. Otherwise w_{k+1} = w_k.
In order to ensure convergence, however, we need w_k → 0 for k → ∞. Therefore we also apply (11) if it has not been used for the previous n iterations.
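Rule (11) can be sketched likewise (the value of K_3 is our choice, and the forced application of (11) after n quiet iterations is omitted for brevity):

```python
# Hypothetical sketch of the transition-parameter rule (11): on a
# rejected step, w is damped towards the Taylor model. K_3 is an
# illustrative value of our own choosing.

K3 = 0.5                         # 0 <= K_3 < 1

def update_w(w, next_bound, accepted):
    """Return w_{k+1} per rule (11); w is unchanged on acceptance."""
    if accepted:
        return w
    return K3 * w * min(next_bound, 1.0)

print(update_w(1.0, 0.5, False))  # 0.25: small bound damps w harder
```

Coupling w to min{Lambda_{k+1}, 1} means that a shrinking trust region, a sign that the current model is unreliable, accelerates the hand-over to the Taylor model.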
2.2. Summary of the SMTA

Given Λ_0, B_0 = I, w_0 = 1, k = 0

0. Find x_0 as a solution to min_x H(c(x))
1. Evaluate f(x_0)
2. Find p(x_0) by solving (2)
3. Find h_k by solving (6)
4. Evaluate f(x_k + h_k)
5. Find p(x_k + h_k) by solving (2)
6. If (7) is false and (8) is true then let x_{k+1} = x_k + h_k, otherwise x_{k+1} = x_k
7. Update Λ, w, B and D (only if w < 1)
8. Let k = k + 1
9. If not converged then go to 3
Steps 0, 1 and 2 are an initial phase where a (local) optimizer of H(c(x)) is found and the initial translation p(x_0) in the linear approximation (3) to the space mapping is found.
Note that if (7) is true after Step 3, then the algorithm can skip to Step 8, letting x_{k+1}, Λ_{k+1}, B_{k+1} and D_{k+1} take the values from iteration k and updating w_k using (11). Hence we avoid the evaluation of f(x_k + h_k) and f′(x_k + h_k) in this case.
3. Proof of convergence

We show that the SMTA satisfies the usual convergence condition for trust region methods. In the proof we do not need the actual updating scheme (11) of the weights {w_k}; we only need property (12) below. Similarly, we do not need any properties of the SMA; we only need that c is bounded (Assumption A2 below). Thus, the proof covers a class of algorithms, including those presented in Bakr et al. (2001) and Pedersen (2001). Probably other pre-processing techniques will also fit into this framework.
Throughout this section we use the notations {x_k}, {h_k}, {Λ_k} and {w_k} as they are defined in Section 2.1. It follows from the definition of the weights that they satisfy

    w_k = min{Λ_{k−1}, 1} · o(1)     (12)

where o(1) → 0 for k → ∞.
We make the following assumptions:

A1: For each point z ∈ R^n there exists a neighbourhood N(z) ⊆ R^n such that

    f(x + h) = f(x) + f′(x)h + o(||h||),  x ∈ N(z), h ∈ R^n,

where o is uniform for x ∈ N(z). f′ is continuous on R^n.
A2: c is bounded in the region of interest.
A3: H is a convex function on R^m.
A4: {x_k} stays in a bounded region of R^n.
3.1. Prerequisites

The convexity of H implies that it satisfies a local Lipschitz condition. H is not necessarily differentiable. However, we can define a generalized gradient, which is a set of points (rather than always a single point). Since a function which is Lipschitz on a finite dimensional space is differentiable almost everywhere, the generalized gradient can be defined as follows, see Clarke (1975) or Clarke (1983), Theorem 2.5.1:

Definition 1. The generalized gradient of H at x, denoted ∂H(x), is the convex hull of the set of all limits of the form lim H′(x + e_i), where H is differentiable at x + e_i and e_i → 0 as i → ∞,

    ∂H(x) ≡ conv { lim_{e_i → 0} H′(x + e_i) }.

It is easily seen, Clarke (1983), Proposition 2.1.1 and Madsen (1985), Proposition 2.1, that ∂H(x) is non-empty and compact.
Since H is convex the generalized gradient coincides with what is called the subdifferential in convex analysis. The convexity of H also implies the existence of a directional derivative H′_e(y) for any y, e ∈ R^m, e ≠ 0,

    H′_e(y) ≡ lim_{t→0+} [H(y + t e) − H(y)] / t.
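As a concrete illustration (our example, not taken from the paper), both objects can be written out explicitly for the minimax function H(y) = max_i y_i from the introduction:

```latex
% Generalized gradient: the convex hull of the unit vectors of the
% active indices A(y) = \{\, i : y_i = H(y) \,\}:
\partial H(y) \;=\; \operatorname{conv}\{\, e_i : i \in A(y) \,\}
% Directional derivative: the largest growth rate over active indices:
H'_e(y) \;=\; \max_{i \in A(y)} e_i
% E.g. for m = 2 and y = (1,1)^T,
% \partial H(y) = \{ (t, 1-t)^T : 0 \le t \le 1 \}.
```

At a point where the maximum is attained by a single index the set collapses to one unit vector and H is differentiable there; kinks occur exactly where two or more components tie.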
Now consider the composite function F = H ∘ f. Since f is smooth, F is well defined. A stationary point of F is defined as follows.

Definition 2. x is a stationary point of F if

    0 ∈ ∂F(x).

Using the convexity of H and the smoothness of f we can obtain the following characterization of a stationary point (Madsen, 1985, Proposition 2.15).

Proposition 1. Let x ∈ R^n. Then

    0 ∈ ∂F(x)  ⟺  F′_h(x) ≥ 0 for every direction h ∈ R^n, h ≠ 0.

Below we shall use the following first-order approximation ΔF to a change in F:

Definition 3. Let x, h ∈ R^n. Define

    ΔF(x; h) ≡ H(f(x) + f′(x)h) − H(f(x)).

We shall use the following two properties of ΔF:

Proposition 2. For x, h ∈ R^n we have

    ΔF(x; t h) = t F′_h(x) + o(t), for t > 0.

Proof: Madsen (1985), Proposition 2.9.

Proposition 3. For x, h ∈ R^n and 0 ≤ t ≤ 1 we have

    ΔF(x; t h) ≤ t ΔF(x; h).

Proof: The result is a simple consequence of Assumptions A1 and A3: by the convexity of H,

    H(f(x) + t f′(x)h) = H((1 − t) f(x) + t (f(x) + f′(x)h)) ≤ (1 − t) H(f(x)) + t H(f(x) + f′(x)h),

and subtracting H(f(x)) on both sides gives the result.
3.2. Proof of convergence

The technicalities of the convergence proof for SMTA are contained in the following three lemmas.

Lemma 1. Let x̄ ∈ R^n be a non-stationary point. Then there exist positive numbers ε_1, ε_2 and c_1 such that for x_k ∈ R^n,

    ||x_k − x̄|| ≤ ε_1 and Λ_k ≤ ε_2  ⟹  S_k(h_k) ≤ −c_1 Λ_k  if k ≥ k̄.

Proof: Since x̄ is a non-stationary point there exists, by Proposition 1, a direction h such that F′_h(x̄) < 0. Then it follows from Proposition 2 that there exists a point x̄ + d, d = t h, t > 0, such that ΔF(x̄; d) < 0.
Define d_k by

    x_k + d_k = x̄ + d.

If x_k → x̄ then d_k → d. Therefore it follows from the uniform continuity of f, f′ and H that there exist a neighbourhood N(x̄) of x̄ and a number β > 0 such that

    ΔF(x_k; d_k) ≤ −β < 0     (13)

when x_k ∈ N(x̄). Define ε_1 > 0 and ε_2 > 0 such that if ||x_k − x̄|| ≤ ε_1 then x_k ∈ N(x̄) and ||d_k|| ≥ ε_2.
Let h_k^t ∈ R^n be a solution to (4) subject to the trust region:

    h_k^t ∈ arg min_h H(f_k(h))  s.t.  x_k + h ∈ T_k     (14)

Suppose Λ_k ≤ ε_2. Let t_k = Λ_k / ||d_k|| and q_k = t_k d_k. Then x_k + q_k ∈ T_k, and since H(f_k(h)) = ΔF(x_k; h) + F(x_k) we can use the definition of h_k^t and Proposition 3 to obtain

    ΔF(x_k; h_k^t) ≤ ΔF(x_k; q_k) ≤ t_k ΔF(x_k; d_k).     (15)

Since {||d_k||} is bounded, this implies the existence of σ > 0, independent of k, such that

    ΔF(x_k; h_k^t) ≤ −σ Λ_k     (16)

(using (13)). Since H is locally Lipschitz, there exists a constant K such that

    |ΔF(x_k; h_k^t)| ≤ K ||h_k^t||,

and thus (16) implies

    ||h_k^t|| ≥ (σ/K) Λ_k.     (17)

Now let u_k = ||h_k^t|| / ||d_k||. The arguments showing (15) imply

    ΔF(x_k; h_k^t) ≤ u_k ΔF(x_k; d_k).

Using (13) and the definition of ΔF, this implies

    H(f_k(h_k^t)) ≤ −β u_k + H(f(x_k)).     (18)

Because of (17), property (12), A2, and the boundedness away from 0 of {||d_k||}, we have the following inequalities for all sufficiently large values of k, say k ≥ k̄:

    |w_k H(f(x_k))| ≤ (β/4) u_k,
    w_k |H(c(p_k(h_k^t)))| ≤ (β/4) u_k,
    w_k ≤ 1/4.

Therefore, using the convexity of H and (18),

    H(s_k^{w_k}(h_k^t)) ≤ w_k H(c(p_k(h_k^t))) + (1 − w_k) H(f_k(h_k^t))
                        ≤ (β/4) u_k + (1 − w_k)(−β u_k + H(f(x_k)))
                        = (β/4) u_k − β u_k + H(f(x_k)) + w_k β u_k − w_k H(f(x_k))
                        ≤ −(β/4) u_k + H(f(x_k)).

Since h_k minimizes H ∘ s_k^{w_k} subject to the trust region, it follows that

    S_k(h_k) = H(s_k^{w_k}(h_k)) − H(f(x_k))
             ≤ H(s_k^{w_k}(h_k^t)) − H(f(x_k))
             ≤ −(β/4) ||h_k^t|| / ||d_k||
             ≤ −(βσ/(4K)) Λ_k / ||d_k||,

and since {||d_k||} is bounded above, this proves Lemma 1.
Lemma 2. Let x̄ ∈ R^n be a non-stationary point. Let ε_1 be defined as in Lemma 1. For every ε_3 > 0 there exists c_2 > 0 such that

    ||x_k − x̄|| ≤ ε_1 and Λ_k ≥ ε_3  ⟹  S_k(h_k) ≤ −c_2  if k ≥ k̄.

Proof: Let ε_4 = min{ε_2, ε_3}, ε_2 being defined as in Lemma 1. Suppose h̃_k is generated by (6) from x_k with the trust region bound Λ̃_k = ε_4. Then it follows from Lemma 1 that, for k ≥ k̄,

    S_k(h̃_k) ≤ −c_1 ε_4.

Suppose h_k is generated from x_k by (6) with the local bound Λ_k ≥ ε_3. Since the trust region for h_k contains the one for h̃_k,

    S_k(h_k) ≤ S_k(h̃_k) ≤ −c_1 ε_4,

which proves Lemma 2 with c_2 = c_1 ε_4.
Lemma 3. If {x_k} is convergent then the limit point is stationary.

Proof: Suppose x_k → x̄ for k → ∞ and that x̄ is non-stationary.
From Assumptions A1-A3 and (12) we obtain

    F(x_k + h_k) = H(f_k(h_k) + o(||h_k||))
                 = H(s_k^{w_k}(h_k) − w_k (c(p_k(h_k)) − f_k(h_k)) + o(||h_k||))
                 = H(s_k^{w_k}(h_k)) + Λ_k · o(1) + o(||h_k||)
                 = H(s_k^{w_k}(h_k)) + Λ_k · o(1).

Therefore

    F(x_k) − F(x_k + h_k) = −S_k(h_k) + Λ_k · o(1)     (19)

Let ε_2 be defined as in Lemma 1. Using Lemma 1 if Λ_k ≤ ε_2 and Lemma 2 if Λ_k > ε_2, we find from (19)

    F(x_k) − F(x_k + h_k) = −S_k(h_k)(1 + o(1)).

Therefore Lemma 1 and the rules (7), (8) and (10) for accepting a new point and for adjusting Λ_k imply the existence of μ > 0, with μ ≤ ε_2, such that for sufficiently large k,

    Λ_k ≤ μ  ⟹  x_{k+1} = x_k + h_k and Λ_{k+1} = K_2 Λ_k     (20)

where K_2 > 1.
Equation (20) implies that if Λ_k ≤ μ, then Λ_{k+1} > Λ_k. Hence the sequence of local bounds must be bounded away from zero,

    Λ_k ≥ ε_5 > 0 for k = 0, 1, 2, 3, . . .     (21)

Therefore an infinite number of proposals (x_k + h_k) are accepted by the algorithm, because otherwise we would have Λ_k → 0 for k → ∞ (using the fact that the bound is decreased linearly when a point is not accepted (see (9))). Furthermore, when a proposal (x_k + h_k) is accepted and k is sufficiently large, then (8), Lemma 2 and (21) imply

    F(x_k) − F(x_k + h_k) ≥ δ_1 [−S_k(h_k)] ≥ δ_1 c_2 > 0.

Since the sequence of function values F(x_k) is non-increasing we obtain F(x_k) → −∞ for k → ∞. This is a contradiction since x_k → x̄ and F is continuous at x̄. Thus the assumption is wrong: x̄ is a stationary point.
The following theorem extends the result of Lemma 3.

Theorem 1. Let S be the set of stationary points of (1), S = {x ∈ R^n : 0 ∈ ∂F(x)}. Let d(x_k, S) be the distance between x_k and S. Then

    d(x_k, S) → 0 for k → ∞.

Proof: Suppose d(x_k, S) does not tend to 0. Then infinitely many points x_k must be bounded away from S, and hence Assumption A4 implies that the sequence {x_k} must have a cluster point x̄ which is not stationary. According to Lemma 3, {x_k} cannot converge to x̄. Thus, for all small ε > 0, infinitely many iterates x_k must have a distance less than ε from x̄ and infinitely many must have a distance larger than 2ε from x̄. Let ε > 0 be chosen smaller than ε_1/2, ε_1 being the bound used in Lemmas 1 and 2. Then we shall prove that if ||x_k − x̄|| < ε and ||x_{k+p} − x̄|| > 2ε we have

    F(x_k) − F(x_{k+p}) > η > 0,     (22)

η being independent of k and p. Since (22) holds for infinitely many values of k, and since the sequence {F(x_k)} is non-increasing, we obtain that F(x_k) → −∞ for k → ∞, which contradicts that F is continuous at x̄ and a subsequence of {x_k} converges to x̄. Therefore the result follows as a consequence of (22).
Equation (22) is proved by the following argument: consider

    F(x_k) − F(x_{k+p}) = Σ_{j=k}^{k+p−1} [F(x_j) − F(x_{j+1})],     (23)

for k ≥ k̄, with k̄ as in Lemma 1. The terms of the sum are non-negative. Suppose that x_{j+1} ≠ x_j, i.e. the increment h_j is accepted, and suppose further that ||x_j − x̄|| ≤ 2ε, so that we can use Lemmas 1 and 2. We obtain from (8) and these lemmas,

    F(x_j) − F(x_{j+1}) ≥ δ_1 [−S_j(h_j)] ≥ { δ_1 c_1 Λ_j  if Λ_j ≤ ε_2
                                            { δ_1 c_2      otherwise     (24)

Equation (22) now follows from (23) using (24): let A_k be the index set corresponding to the terms in (23) with x_j ≠ x_{j+1}. If, for all of these terms, we have Λ_j ≤ ε_2 then

    Σ_{j ∈ A_k} [F(x_j) − F(x_{j+1})] ≥ δ_1 c_1 Σ_{j ∈ A_k} Λ_j ≥ δ_1 c_1 Σ_{j ∈ A_k} ||h_j||     (25)

The last sum exceeds ε since h_j = x_{j+1} − x_j for j ∈ A_k, x_j = x_{j+1} for j ∉ A_k, and since ||x_{k+p} − x_k|| > ε. Thus the sum in (23) exceeds δ_1 c_1 ε when Λ_j ≤ ε_2 for all j ∈ A_k. If the last condition is not fulfilled then there exists j ∈ A_k with

    F(x_j) − F(x_{j+1}) ≥ δ_1 c_2,

so in that case the sum exceeds a positive number which is independent of k and p. Thus we have proved (22) with

    η = min{δ_1 c_1 ε, δ_1 c_2}.

This completes the proof of Theorem 1.
4. Conclusion

We have considered the problem of minimizing functions of the type H ∘ f, where f: R^n → R^m is smooth and H: R^m → R is convex. It is proved that the hybrid space mapping algorithm described in Bakr et al. (2001) has guaranteed global convergence to the set of stationary points of H ∘ f. The proof covers a class of hybrid algorithms, including those presented in Bakr et al. (2001) and Pedersen (2001).
References

M. H. Bakr, J. W. Bandler, K. Madsen, and J. Søndergaard, "Review of the space mapping approach to engineering optimization and modelling," Optimization and Engineering, vol. 1, pp. 241-276, 2000.
M. H. Bakr, J. W. Bandler, K. Madsen, and J. Søndergaard, "An introduction to the space mapping technique," Optimization and Engineering, vol. 2, pp. 369-384, 2001.
J. W. Bandler, R. M. Biernacki, S. H. Chen, P. A. Grobelny, and R. H. Hemmers, "Space mapping technique for electromagnetic optimization," IEEE Trans. Microwave Theory Tech., vol. 42, pp. 2536-2544, 1994.
C. G. Broyden, "A class of methods for solving non-linear simultaneous equations," Math. Comp., vol. 19, pp. 577-593, 1965.
F. H. Clarke, "Generalized gradients and applications," Trans. Am. Math. Society, vol. 205, pp. 247-262, 1975.
F. H. Clarke, Optimization and Nonsmooth Analysis, Canadian Mathematical Society Series of Monographs and Advanced Texts, John Wiley & Sons, 1983.
R. Fletcher, "A model algorithm for composite nondifferentiable optimization problems," in D. C. Sorensen and R. J.-B. Wets, eds., Nondifferentiable and Variational Techniques in Optimization, Mathematical Programming Study, vol. 17, North-Holland: Amsterdam, 1982.
J. Hald and K. Madsen, "Combined LP and quasi-Newton methods for non-linear L1 optimization," SIAM J. Num. Anal., vol. 22, pp. 369-384, 1985.
K. Madsen, "An algorithm for minimax solution of overdetermined systems of non-linear equations," J. Inst. Math. Appl., vol. 16, pp. 321-328, 1975.
K. Madsen, "Minimization of non-linear approximation functions," Dr. Techn. Thesis, Institute for Numerical Analysis, Technical University of Denmark, 1985.
F. Ø. Pedersen, "Advances on the space mapping optimization method," Master Thesis, IMM-THESIS-2001-35, Informatics and Mathematical Modelling, Technical University of Denmark, 2001.
L. N. Vicente, "Space mapping: Models, sensitivities, and trust-regions methods," to appear in Optimization and Engineering, 2003.
