
Optimization methods (MFE)

Lecture 03

Elena Perazzi

EPFL

Fall 2018

Elena Perazzi (EPFL) Optimization methods (MFE) Lecture 03 Fall 2018 1 / 28


Today’s topics

Constrained optimization with inequality and equality constraints.

Numerical methods: the penalty function method (applies to equality and
inequality constraints) and the barrier method (applies to inequality
constraints only).

Kuhn-Tucker theory.



Penalty function methods

A numerical method to find the max/min subject to equality or
inequality constraints.
Approximate a constrained optimization problem with an
unconstrained one, then apply standard techniques (Newton, search
methods, etc.) to find the solution.
Main idea: add a term to the objective function that prescribes a high
cost for violating the constraint.
Consider the problem

Minimize{f (x1 , x2 , ...., xn ) s.t. (x1 , ..., xn ) ∈ S} (1)

where f is a continuous function Rn → R and S is a set in Rn .



Penalty function methods

The idea of the penalty function method is to replace (1) by

Minimize θ(~x , c) ≡ f (x1 , x2 , ...., xn ) + cP(x1 , x2 , ...., xn ) (2)

where c is a positive constant (the penalty parameter) and


P : Rn → R is such that
P(x1 , x2 , ...., xn ) is continuous
P(x1 , x2 , ...., xn ) ≥ 0 for every (x1 , x2 , ...., xn ) ∈ Rn
P(x1 , x2 , ...., xn ) = 0 ⇔ (x1 , ..., xn ) ∈ S
For large enough c it is clear that the min will be in S.



Penalty function methods

In the case of inequality constraints,


S = {(x1 , ..., xn ) s.t. gi (x1 , ..., xn ) ≤ 0, i = 1, ..., k,
hj (x1 , ..., xn ) = 0, j = 1, ..., m}
a useful penalty function is

P(x1 , ..., xn ) = Σ_{i=1}^k max{0, gi (x1 , ..., xn )}^2 + Σ_{j=1}^m hj (x1 , ..., xn )^2
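As a sketch (the function and variable names are mine, not from the slides), this penalty can be coded directly, for scalar or vector x:

```python
# Quadratic penalty from this slide:
#   P(x) = sum_i max(0, g_i(x))^2 + sum_j h_j(x)^2
# for inequality constraints g_i(x) <= 0 and equality constraints h_j(x) = 0.
def quadratic_penalty(x, ineq=(), eq=()):
    p = sum(max(0.0, g(x)) ** 2 for g in ineq)
    p += sum(h(x) ** 2 for h in eq)
    return p

# The next slide's example: g1(x) = x - b, g2(x) = a - x, here with a=0, b=1,
# so the feasible region is the interval [0, 1].
a, b = 0.0, 1.0
g1 = lambda x: x - b
g2 = lambda x: a - x
print(quadratic_penalty(0.5, ineq=(g1, g2)))   # inside [a, b]: penalty is 0.0
print(quadratic_penalty(1.5, ineq=(g1, g2)))   # outside: (1.5 - 1)^2 = 0.25
```

Note that P is zero exactly on the feasible set and positive elsewhere, as required by the three properties above.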



Penalty function methods

[Figure: the penalty term cP(x) for c = 1, 10, 100, with
g1 (x) = x − b, g2 (x) = a − x and
P(x) = max{0, (a − x)}^2 + max{0, (x − b)}^2.
The penalty is zero on the feasible interval [a, b] and grows ever more
steeply outside it as c increases.]



Example

Consider the problem:

min f(x, y) = (x − 2)^2 + (y − 3)^2 s.t. x + y = 1 (3)

We build

θ(x, y, c) = (x − 2)^2 + (y − 3)^2 + c(1 − x − y)^2 (4)

The minimum is found by imposing

∇θ = (2x − 4 + 2cx + 2cy − 2c, 2y − 6 + 2cx + 2cy − 2c) = (0, 0) (5)

For c = 0 the min is x = 2, y = 3, as expected. For c → ∞ the min
tends to x = 0, y = 1, which satisfies the constraint.
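Since the first-order conditions (5) are linear in (x, y), they can be solved in closed form for any c; a short script (names mine) shows the solution drifting toward the constraint as c grows:

```python
# Solve the linear first-order conditions of theta(x, y, c) from Eq. (5):
#   (1+c) x + c y = 2 + c
#    c x + (1+c) y = 3 + c
# by Cramer's rule, for increasing penalty parameters c.
def minimize_theta(c):
    det = (1 + c) ** 2 - c ** 2                      # = 1 + 2c
    x = ((2 + c) * (1 + c) - c * (3 + c)) / det      # = 2 / (1 + 2c)
    y = ((1 + c) * (3 + c) - c * (2 + c)) / det      # = (3 + 2c) / (1 + 2c)
    return x, y

for c in [0, 1, 10, 100, 1000]:
    x, y = minimize_theta(c)
    print(f"c={c:5}: x={x:.4f}  y={y:.4f}  x+y={x + y:.4f}")
# As c grows, (x, y) -> (0, 1), so x + y -> 1: the constraint holds in the limit.
```

For any finite c the minimizer violates the constraint slightly (x + y = (5 + 2c)/(1 + 2c) > 1), which is exactly why c must be driven upward.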



Selecting the penalty parameter

When solving the problem numerically, should we set c to a very large
number, to be sure that the min of θ(~x , c) belongs to the feasible
region? No! Why not?
Large values of c result in very non-smooth functions and steep
gradients close to the constraint boundaries. This causes severe
convergence difficulties for all standard min-search methods unless
the algorithm starts at a point extremely close to the minimum
being sought.



Penalty function methods

Algorithm:
Start with a relatively small value of the penalty parameter c at an
infeasible point not too close to the constraint boundary. This will
ensure that no steep gradients are present in the initial optimization
of θ(~x , c). The min of θ(~x , c) will probably not be a feasible point for
the original problem.
Gradually increase c (e.g. c_{k+1} = η c_k with η > 1), each time starting the
optimization from the solution of the problem with the previous value
of c. If c increases gradually the solution of the new problem will
never be far from the solution of the previous one. This will make it
easier to find the min of θ(~x , c) from one iteration to the next.
Stop when you find a solution that is in or close enough to the
feasible region.
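A minimal sketch of this loop (the problem is the earlier example min (x − 2)^2 + (y − 3)^2 s.t. x + y = 1; the inner solver, step sizes, iteration counts, and the growth factor eta are my illustrative choices, not prescribed by the slides):

```python
# Penalty-method loop: minimize theta(x, c) = f(x) + c * h(x)^2 for growing c,
# warm-starting each stage from the previous solution.
def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def grad_descent(f, x0, lr, iters=2000):
    """Fixed-step gradient descent, warm-started at x0."""
    x = list(x0)
    for _ in range(iters):
        x = [xi - lr * gi for xi, gi in zip(x, num_grad(f, x))]
    return x

f = lambda v: (v[0] - 2) ** 2 + (v[1] - 3) ** 2
h_con = lambda v: v[0] + v[1] - 1        # equality constraint, h = 0

x = [0.0, 0.0]                           # infeasible starting point
c, eta = 1.0, 10.0                       # initial penalty parameter, growth factor
while c <= 1e4:
    theta = lambda v, c=c: f(v) + c * h_con(v) ** 2
    # step size shrinks with c to keep descent stable on the stiff objective
    x = grad_descent(theta, x, lr=0.5 / (1 + 2 * c))
    c *= eta                             # warm-start the next, stiffer stage
print(x)   # approaches the constrained minimum (0, 1) as c grows
```

The shrinking step size illustrates the point of the previous slide: as c grows the objective becomes stiff, and only the warm start keeps each stage cheap.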



Barrier function methods
Only for problems with inequality constraints
The penalty function method is an exterior method: start from a
point outside the feasible region, stop if we find a minimum of θ(~x , c)
inside or close to the feasible region.
The barrier function method is an interior method: start from a point
inside the feasible region, set a barrier at the border of the feasible
region to prevent the solution from becoming infeasible.
With inequality constraints gi ≤ 0, i = 1, ..., k, define

B(~x, r) = f(~x) − r Σ_{i=1}^k 1/gi(~x) (6)

In the feasible region, the extra term on the RHS is positive, and
becomes infinite at the borders of the region, where at least one
constraint is satisfied with equality.



Barrier function methods

Algorithm:
Start with a relatively high value of the barrier parameter r, at a point
inside the feasible region not too close to the constraint boundary.
The solution of this problem will stay inside the feasible region, and
will not approach the region where the barrier term is high.
Gradually decrease r, each time starting the optimization from the
solution of the problem with the previous value of r.
Stop when x_{k+1} is close enough to x_k.
The solution will converge to a (local) constrained min of the original
problem from the inside of the feasible region, in a similar way as the
solution of the penalty function problem was converging from the
outside.
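The interior approach can be sketched on a 1-D instance (the problem, the bisection inner solver, and all numeric choices are my illustration): min (x − 2)^2 s.t. g(x) = x − 1 ≤ 0, with barrier objective B(x, r) = (x − 2)^2 − r/(x − 1) as in Eq. (6):

```python
# Barrier (interior-point) sketch: for each r, find the stationary point of
# B(x, r) = (x-2)^2 - r/(x-1) inside the feasible region x < 1, then shrink r.
def B_prime(x, r):
    # d/dx [ (x-2)^2 - r/(x-1) ] = 2(x-2) + r/(x-1)^2
    return 2 * (x - 2) + r / (x - 1) ** 2

def barrier_min(r, lo=-5.0, hi=1.0 - 1e-12):
    # B'' = 2 - 2r/(x-1)^3 > 0 for x < 1, so B' is increasing there;
    # B'(lo) < 0 and B'(x) -> +inf as x -> 1-, so bisection brackets the
    # unique stationary point.
    for _ in range(200):
        mid = (lo + hi) / 2
        if B_prime(mid, r) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

r = 1.0
while r > 1e-10:
    x = barrier_min(r)      # iterates stay strictly inside the feasible region
    r /= 10
print(x)   # approaches the constrained minimum x = 1 from the inside (x < 1)
```

Every iterate is strictly feasible (x < 1), mirroring how the barrier solution converges from the inside while the penalty solution converges from the outside.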



Kuhn-Tucker theory



Optimization with inequality constraints: intuition
[Figure: three concave parabolas illustrating max f(x) s.t. x ≥ 0:
f1(x) = 5 − (x − 1)^2, with x* = 1, f′(x*) = 0 (interior maximum);
f2(x) = 5 − x^2, with x* = 0, f′(x*) = 0 (maximum at the boundary);
f3(x) = 5 − (x + 1)^2, with x* = 0, f′(x*) < 0 (the constraint binds).]

Consider the problem: max f(x) s.t. x ≥ 0.

In all three cases f′(x*) x* = 0.



Optimization with inequality constraints: intuition

Consider the problem: min f(~x) s.t. g(~x) ≤ 0

[Figure: two cases — an interior minimum (left) and a minimum on the
constraint boundary (right).]

Left case: ∇f(x*) = 0, g(x*) < 0
Right case: ∇f(x*) ≠ 0, g(x*) = 0
In both cases g(x*) ∇f(x*) = 0



Remember from previous lectures...

The gradient of a function points in the direction of fastest increase


of the function.
The gradient of a function at any point is perpendicular to the level
curve passing through that point.
At a constrained optimum, the tangent of the level curve passing
through that point coincides with the tangent of the constraint.
(Remember for example the consumer demand problem!)
It follows from the last point that at a constrained optimum, the
gradient of the constraint must be parallel to the gradient of the
function!



Optimization with equality constraints: reminder
Remember the case of optimization with only equality constraints,
e.g. min f (x) s.t. h(x) = 0.
For a minimization or maximization problem: at the optimum

∇L = ∇f(x*) + λ ∇h(x*) = 0 (7)

In other words, at the optimum the gradient of the function must be


parallel to that of the constraint, but the sign does not matter. (Sign
of λ not important).
This implies that the directional derivative along the constraint is
zero. This is the important point.
To find a better point, we would like to move in the direction of the
gradient (for a maximization) or in the opposite direction (for a
minimization), but we can’t because this would imply violating the
constraint.
Optimization with inequality constraints: intuition

Consider the problem: min f (~x ) s.t. g (~x ) ≤ 0


If x* is on the boundary of the feasible region (the constraint is
“tight”), ∇f(x*) must point in the opposite direction to ∇g(x*).



Optimization with inequality constraints: intuition
Drawing pictures similar to the one in the last slide, it is easy to convince
oneself that at an optimum x* where the constraint binds:

For a maximization problem with a constraint of the form g(x) ≤ 0,
∇f(x*) must point in the same direction as ∇g(x*).
For a minimization problem with a constraint of the form g(x) ≤ 0,
∇f(x*) must point in the opposite direction to ∇g(x*).
For a minimization problem with a constraint of the form g(x) ≥ 0,
∇f(x*) must point in the same direction as ∇g(x*).
For a maximization problem with a constraint of the form g(x) ≥ 0,
∇f(x*) must point in the opposite direction to ∇g(x*).



Optimization with inequality constraints

Again the general form of the problem is

min/max f (x1 , x2 , ...., xn )


s.t. hi (x1 , x2 , ...., xn ) = 0 i = 1, ..., k
gj (x1 , x2 , ...., xn ) ≤ 0 j = 1, ..., m
or gj (x1 , x2 , ...., xn ) ≥ 0 j = 1, ..., m

Now we require that f, the hi (i = 1, ..., k), and the gj (j = 1, ..., m) are C^1.
As in the case of optimization with only equality constraints, we
require k < n, but there is no requirement on m!



[Figure: feasible region in the (x1, x2) plane defined by the constraints
x2 ≤ 1 − x1, x2 ≥ x1, x2 ≥ 0.5.]
Kuhn-Tucker theory – tools

Main tool: the Lagrangean L!


With all constraints in the form gj ≤ 0,

L = f(~x) + Σ_{i=1}^k λ_i h_i(~x) + Σ_{j=1}^m μ_j g_j(~x) (8)

for a minimization problem; and

L = f(~x) + Σ_{i=1}^k λ_i h_i(~x) − Σ_{j=1}^m μ_j g_j(~x) (9)

for a maximization problem.


If some of the constraints are in the form g_j(x) ≥ 0, define
g̃_j(x) ≡ −g_j(x), so that g̃_j(x) ≤ 0.
For each point x define J(x) as the Jacobian of the constraints
satisfied with equality at that point – all the hi and a subset of the gj .



Kuhn-Tucker theorem
If x ∗ is a local minimum conditional on gj (x) ≤ 0, j = 1, ..., m and
hi (x) = 0, i = 1, ..., k, and J(x ∗ ) has maximal rank (i.e. rank=number of
active constraints) then there exists a vector of Lagrange multipliers
(λ*_1, λ*_2, ..., λ*_k, μ*_1, ..., μ*_m) such that

∇_x L(x*, λ*, μ*) = ∇f(x*) + Σ_{i=1}^k λ_i ∇h_i(x*) + Σ_{j=1}^m μ_j ∇g_j(x*) = 0
∂L(x*, λ*, μ*)/∂λ_i = h_i(x*) = 0
∂L(x*, λ*, μ*)/∂μ_j = g_j(x*) ≤ 0
μ_j ≥ 0
Σ_{j=1}^m μ_j ∂L(x*, λ*, μ*)/∂μ_j = Σ_{j=1}^m μ_j g_j(x*) = 0
These are the necessary conditions for a min, valid when the constraint
qualification holds.
Kuhn-Tucker theorem
If x ∗ is a local maximum conditional on gj (x) ≤ 0, j = 1, ..., m and
hi (x) = 0, i = 1, ..., k, and J(x ∗ ) has maximal rank (i.e. rank=number of
active constraints) then there exists a vector of Lagrange multipliers
(λ*_1, λ*_2, ..., λ*_k, μ*_1, ..., μ*_m) such that

∇_x L(x*, λ*, μ*) = ∇f(x*) + Σ_{i=1}^k λ_i ∇h_i(x*) − Σ_{j=1}^m μ_j ∇g_j(x*) = 0
∂L(x*, λ*, μ*)/∂λ_i = h_i(x*) = 0
∂L(x*, λ*, μ*)/∂μ_j = −g_j(x*) ≥ 0, i.e. g_j(x*) ≤ 0
μ_j ≥ 0
Σ_{j=1}^m μ_j g_j(x*) = 0



Complementary slackness conditions

The equations

g_j(x*) ≤ 0
μ_j ≥ 0
Σ_{j=1}^m μ_j g_j(x*) = 0

are called the complementary slackness conditions.


If the j-th constraint is slack, i.e. g_j(x*) < 0, then the corresponding
multiplier must be zero: μ_j = 0.
Meaning of the multiplier: the gain in f from relaxing the constraint. If
the constraint is slack, the gain is 0; if the constraint is tight, the gain
is positive.
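As an illustration (the problem instance and the candidate point and multiplier are mine, not from the slides), the Kuhn-Tucker conditions and complementary slackness can be checked mechanically:

```python
# Check the Kuhn-Tucker conditions for  min (x-2)^2 + (y-3)^2
# s.t.  g(x, y) = x + y - 1 <= 0  at the candidate x* = (0, 1), mu* = 4.
x, y, mu = 0.0, 1.0, 4.0

grad_f = (2 * (x - 2), 2 * (y - 3))    # = (-4, -4) at (0, 1)
grad_g = (1.0, 1.0)
g = x + y - 1                          # active here: g(x*) = 0

stationarity = all(abs(df + mu * dg) < 1e-12 for df, dg in zip(grad_f, grad_g))
primal_feasible = g <= 0
dual_feasible = mu >= 0
comp_slackness = abs(mu * g) < 1e-12   # mu_j * g_j(x*) = 0

print(stationarity, primal_feasible, dual_feasible, comp_slackness)
# All four conditions hold: True True True True
```

Here the constraint is tight, so the multiplier is strictly positive (mu* = 4); had the constraint been slack, complementary slackness would have forced mu* = 0.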



Example

max 3x + 4y s.t. x^2 + y^2 ≥ 4, x ≥ 1



Kuhn-Tucker sufficient conditions

Consider the problem max f (x) s.t. gj (x) ≤ 0, j = 1..., m.


If
at the point x* the Kuhn-Tucker necessary conditions are satisfied,
at this point the constraint qualifications are satisfied,
the function f is concave and each of the constraints is convex,
then x* is a local maximum.



Kuhn-Tucker sufficient conditions
Consider the problem max f(x) s.t. g_j(x) ≤ 0, j = 1, ..., m. Let's write
down the Lagrangean as a function of x, given the value of μ = μ* ≥ 0
that satisfies the Kuhn-Tucker equations:

L(x|μ = μ*) = f(x) − Σ_{j=1}^m μ*_j g_j(x) (10)

f concave, g convex ⇒ L(x) concave. Hence, if x* satisfies the
Kuhn-Tucker first-order conditions,

L(x*|μ = μ*) ≥ L(x|μ = μ*) (11)

so

f(x*) − Σ_{j=1}^m μ*_j g_j(x*) ≥ f(x) − Σ_{j=1}^m μ*_j g_j(x) (12)

But the term −Σ_{j=1}^m μ*_j g_j(x*) on the LHS is zero (complementary
slackness), and −Σ_{j=1}^m μ*_j g_j(x) ≥ 0 for any feasible point, so we
can conclude that in the feasible region

f(x*) ≥ f(x) (13)
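The chain (10)-(13) can be checked numerically on a concrete instance (the problem, candidate point, and multiplier below are my illustration): with the concave f(x, y) = −((x−2)^2 + (y−3)^2), the convex constraint g(x, y) = x + y − 1 ≤ 0, and μ* = 4, the point x* = (0, 1) satisfies the first-order conditions, and a grid check confirms (11) and (13):

```python
# Verify the sufficiency argument on a grid: L(x*|mu*) >= L(x|mu*) everywhere
# (Eq. (11)), hence f(x*) >= f(x) on the feasible region (Eq. (13)).
f = lambda x, y: -((x - 2) ** 2 + (y - 3) ** 2)   # concave (max problem)
g = lambda x, y: x + y - 1                         # convex, constraint g <= 0
L = lambda x, y, mu: f(x, y) - mu * g(x, y)        # Lagrangean as in Eq. (10)

xs, ys, mu_star = 0.0, 1.0, 4.0                    # candidate satisfying the FOCs

# Coarse grid over [-3, 5] x [-3, 5] (contains the candidate point).
grid = [(-3 + 0.25 * i, -3 + 0.25 * j) for i in range(33) for j in range(33)]

eq11 = all(L(xs, ys, mu_star) >= L(x, y, mu_star) - 1e-9 for x, y in grid)
eq13 = all(f(xs, ys) >= f(x, y) - 1e-9 for x, y in grid if g(x, y) <= 0)
print(eq11, eq13)   # True True
```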
Kuhn-Tucker sufficient conditions and saddle points
It turns out that a point (x*, μ*) that satisfies the sufficient
conditions is a saddle point of the Lagrangean L: a max with respect to
x and a min with respect to μ:

L(x, μ*) ≤ L(x*, μ*) ≤ L(x*, μ) (14)

The Kuhn-Tucker algorithm can be interpreted as a maximization of
the Lagrangean with respect to x and a minimization with respect to
μ (subject to μ ≥ 0).

