
Numerical Optimization

Unconstrained Optimization

Shirish Shevade
Computer Science and Automation
Indian Institute of Science
Bangalore 560 012, India.

NPTEL Course on Numerical Optimization

Shirish Shevade Numerical Optimization


Unconstrained Minimization

Let f : R^n → R. Consider the optimization problem:

min f(x)
s.t. x ∈ R^n

Assumption: f is bounded below.

Definition
x∗ ∈ R^n is said to be a local minimum of f if there is a δ > 0
such that f(x∗) ≤ f(x) ∀ x ∈ B(x∗, δ).

Surface Plot: f(x) = x1^2 + x2^2 [figure omitted; axes x1, x2]


Surface Plot: f(x) = x1 exp(−x1^2 − x2^2) [figure omitted; axes x1, x2]


Contour Plot: f(x) = x1 exp(−x1^2 − x2^2) [figure omitted; axes x1, x2]

The function value does not decrease in the local
neighbourhood of a local minimum.
Definition
Let x̄ ∈ R^n. If there exist a direction d ∈ R^n and δ > 0 such
that f(x̄ + αd) < f(x̄) for all α ∈ (0, δ), then d is said to be a
descent direction of f at x̄.
Result
Let f ∈ C^1 and x̄ ∈ R^n. Let g(x̄) = ∇f(x̄). If g(x̄)^T d < 0, then
d is a descent direction of f at x̄.
Proof.
Given g(x̄)^T d < 0. Now, f ∈ C^1 ⇒ g ∈ C^0.
∴ ∃ δ > 0 such that g(x)^T d < 0 ∀ x ∈ LS(x̄, x̄ + δd).
Choose any α ∈ (0, δ). Using the first order truncated Taylor series,

f(x̄ + αd) = f(x̄) + α g(x)^T d, where x ∈ LS(x̄, x̄ + αd).

Since α > 0 and g(x)^T d < 0,

f(x̄ + αd) < f(x̄) ∀ α ∈ (0, δ)
⇒ d is a descent direction of f at x̄.
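As a numerical sanity check (not from the slides), the sketch below verifies this result for the quadratic f(x) = x1^2 + x2^2: with d = −g(x̄), we have g(x̄)^T d = −‖g(x̄)‖^2 < 0, and f indeed decreases along d for small step sizes α.

```python
import math

# f(x) = x1^2 + x2^2, with gradient g(x) = (2*x1, 2*x2).
def f(x):
    return x[0] ** 2 + x[1] ** 2

def g(x):
    return (2 * x[0], 2 * x[1])

x_bar = (1.0, 2.0)
d = tuple(-gi for gi in g(x_bar))   # d = -g(x_bar), so g^T d = -||g||^2 < 0
gTd = sum(gi * di for gi, di in zip(g(x_bar), d))
assert gTd < 0

# f decreases along d for small alpha in (0, delta)
for alpha in (1e-3, 1e-2, 1e-1):
    x_new = tuple(xi + alpha * di for xi, di in zip(x_bar, d))
    assert f(x_new) < f(x_bar)
```

The choice of test point and step sizes is arbitrary; any d with g(x̄)^T d < 0 would behave the same way for small enough α.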

First Order Necessary Conditions (Unconstrained Minimization)
Let f : R^n → R, f ∈ C^1. If x∗ is a local minimum of f, then
g(x∗) = 0.

Proof.
Let x∗ be a local minimum of f and suppose g(x∗) ≠ 0.
Choose d = −g(x∗).

∴ g(x∗)^T d = −g(x∗)^T g(x∗) < 0

g(x∗)^T d < 0 ⇒ d is a descent direction of f at x∗
⇒ x∗ is not a local minimum, a contradiction.

Therefore, g(x∗) = 0.

Provides a stopping condition for an optimization algorithm


Example:
Consider the problem

min f(x) = x1 exp(−x1^2 − x2^2)

g(x) = ( exp(−x1^2 − x2^2)(1 − 2x1^2),
         exp(−x1^2 − x2^2)(−2x1x2) )^T.

g(x) = 0 at (1/√2, 0)^T and (−1/√2, 0)^T.
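The stationary points can be checked numerically; this short sketch (mine, built directly from the gradient formula above) confirms that g vanishes at (±1/√2, 0)^T up to floating-point error.

```python
import math

# Gradient of f(x) = x1 * exp(-x1^2 - x2^2), as derived above.
def grad(x1, x2):
    e = math.exp(-x1 ** 2 - x2 ** 2)
    return (e * (1 - 2 * x1 ** 2), e * (-2 * x1 * x2))

# Both candidate stationary points make the gradient (numerically) zero.
for x_star in ((1 / math.sqrt(2), 0.0), (-1 / math.sqrt(2), 0.0)):
    g1, g2 = grad(*x_star)
    assert abs(g1) < 1e-12 and abs(g2) < 1e-12
```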



Surface Plot: f(x) = x1 exp(−x1^2 − x2^2) [figure omitted; axes x1, x2]

The function has a local minimum at (−1/√2, 0)^T and a local
maximum at (1/√2, 0)^T.



For the same problem,

min f(x) = x1 exp(−x1^2 − x2^2),

g(x) = 0 at (1/√2, 0)^T and (−1/√2, 0)^T.
If g(x∗) = 0, then x∗ is called a stationary point.
Higher order derivatives are needed to confirm that a stationary
point is a local minimum.



Second Order Necessary Conditions
Let f : R^n → R, f ∈ C^2. If x∗ is a local minimum of f, then
g(x∗) = 0 and H(x∗) is positive semi-definite.
Proof.
Let x∗ be a local minimum of f. From the first order necessary
condition, g(x∗) = 0.
Assume H(x∗) is not positive semi-definite. Then ∃ d such that
d^T H(x∗)d < 0. Since H is continuous near x∗, ∃ δ > 0 such
that d^T H(x∗ + αd)d < 0 ∀ α ∈ (0, δ).
Using the second order truncated Taylor series around x∗, we have,
for all α ∈ (0, δ),

f(x∗ + αd) = f(x∗) + α g(x∗)^T d + (1/2) α^2 d^T H(x̄)d,
where x̄ ∈ LS(x∗, x∗ + αd).

Since g(x∗) = 0 and d^T H(x̄)d < 0,
f(x∗ + αd) < f(x∗) ∀ α ∈ (0, δ).
∴ x∗ is not a local minimum, a contradiction.
Second Order Sufficient Conditions
Let f : R^n → R, f ∈ C^2. If g(x∗) = 0 and H(x∗) is positive
definite, then x∗ is a strict local minimum of f.

Proof.
Since H is continuous and H(x∗) is positive definite, ∃ δ > 0
such that H(x) is positive definite for all x ∈ B(x∗, δ).
Choose any x ∈ B(x∗, δ), x ≠ x∗. Using the second order truncated
Taylor series,

f(x) = f(x∗) + g(x∗)^T (x − x∗) + (1/2)(x − x∗)^T H(x̄)(x − x∗),
where x̄ ∈ LS(x, x∗).

Since g(x∗) = 0 and (x − x∗)^T H(x̄)(x − x∗) > 0 ∀ x ∈ B(x∗, δ), x ≠ x∗,

f(x) > f(x∗) ∀ x ∈ B(x∗, δ), x ≠ x∗.

This implies that x∗ is a strict local minimum.


Example:
Consider the problem

min f(x) = x1 exp(−x1^2 − x2^2)

g(x) = ( exp(−x1^2 − x2^2)(1 − 2x1^2),
         exp(−x1^2 − x2^2)(−2x1x2) )^T.

g(x) = 0 at x∗1 = (1/√2, 0)^T and x∗2 = (−1/√2, 0)^T.

H(x∗2) = [ 2√2 exp(−1/2)        0
           0                    √2 exp(−1/2) ]
is positive definite ⇒ x∗2 is a strict local minimum.

H(x∗1) = [ −2√2 exp(−1/2)       0
           0                    −√2 exp(−1/2) ]
is negative definite ⇒ x∗1 is a strict local maximum.
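A quick numerical confirmation (mine, using the analytic second derivatives of f): at points with x2 = 0 the Hessian is diagonal, so its eigenvalues are just the diagonal entries, and their signs decide definiteness.

```python
import math

# Analytic Hessian of f(x) = x1 * exp(-x1^2 - x2^2) at points with x2 = 0;
# the off-diagonal term vanishes there, so the matrix is diagonal.
def hessian_diag(x1):
    e = math.exp(-x1 ** 2)
    h11 = e * (4 * x1 ** 3 - 6 * x1)   # d^2 f / dx1^2 at (x1, 0)
    h22 = e * (-2 * x1)                # d^2 f / dx2^2 at (x1, 0)
    return h11, h22

s = 1 / math.sqrt(2)
# Both eigenvalues positive at x*2 = (-1/sqrt(2), 0): strict local minimum.
assert all(v > 0 for v in hessian_diag(-s))
# Both eigenvalues negative at x*1 = (1/sqrt(2), 0): strict local maximum.
assert all(v < 0 for v in hessian_diag(s))
```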



Example:
Consider the problem

min f(x) = (x2 − x1^2)^2 + x1^5

g(x) = ( 5x1^4 − 4x1(x2 − x1^2),
         2(x2 − x1^2) )^T.

Stationary Point: (0, 0)^T
Hessian matrix at (0, 0)^T:

[ 0   0
  0   2 ]

The Hessian is positive semi-definite at (0, 0)^T, yet (0, 0)^T is
neither a local minimum nor a local maximum of f(x): along the
curve x2 = x1^2, f reduces to x1^5, which changes sign at x1 = 0.
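This can be verified directly: along the curve x2 = x1^2 the function value is x1^5, which drops below f(0, 0) = 0 arbitrarily close to the origin. A small sketch (not from the slides):

```python
# f(x) = (x2 - x1^2)^2 + x1^5: the Hessian at (0,0) is positive
# semi-definite, yet along x2 = x1^2 the function reduces to x1^5,
# which is negative for x1 < 0, so (0,0) is not a local minimum.
def f(x1, x2):
    return (x2 - x1 ** 2) ** 2 + x1 ** 5

assert f(0.0, 0.0) == 0.0
for t in (1e-1, 1e-2, 1e-3):
    # f(-t, t^2) = -t^5 < 0, however small t is
    assert f(-t, t ** 2) < f(0.0, 0.0)
```

This shows why the second order *necessary* condition is not sufficient: a positive semi-definite Hessian leaves room for higher-order terms to decide the behaviour.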



Example:
Consider the problem

min f(x) = x1^2 + exp(x1 + x2)

g(x) = ( 2x1 + exp(x1 + x2),
         exp(x1 + x2) )^T.

An iterative method is needed to solve g(x) = 0.



An iterative optimization algorithm generates a sequence
{x_k}_{k≥0}, which converges to a local minimum.

Unconstrained Minimization Algorithm

(1) Initialize x_0, k := 0.
(2) while stopping condition is not satisfied at x_k
    (a) Find x_{k+1} such that f(x_{k+1}) < f(x_k).
    (b) k := k + 1
    endwhile
Output: x∗ = x_k, a local minimum of f(x).
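As an illustration only (the slides deliberately leave Step 2(a) abstract), here is a minimal instance of the scheme: steepest descent with a fixed step size, applied to f(x) = x1^2 + x2^2, with stopping condition ‖g(x_k)‖ ≤ ε. The step size and tolerance are my choices, not part of the generic algorithm.

```python
import math

# Gradient of the test function f(x) = x1^2 + x2^2.
def grad(x):
    return [2 * x[0], 2 * x[1]]

def minimize(x0, step=0.1, eps=1e-8, max_iter=10_000):
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.hypot(*g) <= eps:       # stopping condition: ||g(x_k)|| <= eps
            break
        # Step 2(a): x_{k+1} = x_k - step * g(x_k) gives f(x_{k+1}) < f(x_k)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

x_star = minimize([3.0, -2.0])
assert all(abs(v) < 1e-6 for v in x_star)   # converges to the minimum (0, 0)
```

For this quadratic, each step multiplies the error by 0.8, so the iterates converge linearly to the unique minimizer.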




How to find x_{k+1} in Step 2(a) of the algorithm?
Which stopping condition can be used?
Does the algorithm converge? If yes, how fast does it
converge?
Do the convergence and its speed depend on x_0?



Stopping Conditions for a minimization problem:
‖g(x_k)‖ = 0 and H(x_k) is positive semi-definite

Practical Stopping conditions (ε > 0 a small tolerance)

Assumption: There are no stationary points

‖g(x_k)‖ ≤ ε

‖g(x_k)‖ ≤ ε(1 + |f(x_k)|)

(f(x_k) − f(x_{k+1})) / |f(x_k)| ≤ ε
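The three practical tests can be coded directly as predicates; the function and parameter names below are mine, not standard:

```python
# grad_norm is ||g(x_k)||; f_k and f_next are f(x_k) and f(x_{k+1}).
def small_gradient(grad_norm, eps):
    return grad_norm <= eps

def small_relative_gradient(grad_norm, f_k, eps):
    # Scales the tolerance by 1 + |f(x_k)| so the test is less
    # sensitive to the magnitude of the objective.
    return grad_norm <= eps * (1 + abs(f_k))

def small_relative_decrease(f_k, f_next, eps):
    # Stops when the relative progress per iteration becomes negligible.
    return (f_k - f_next) / abs(f_k) <= eps

assert small_gradient(1e-9, 1e-6)
assert small_relative_gradient(1e-5, 100.0, 1e-6)    # passes due to scaling
assert not small_relative_decrease(1.0, 0.5, 1e-6)   # still making progress
```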



Speed of Convergence

Assume that an optimization algorithm generates a
sequence {x_k}, which converges to x∗.
How fast does the sequence converge to x∗?
Definition
The sequence {x_k} converges to x∗ with order p if

lim_{k→∞} ‖x_{k+1} − x∗‖ / ‖x_k − x∗‖^p = β, β < ∞.

Asymptotically, ‖x_{k+1} − x∗‖ ≈ β‖x_k − x∗‖^p.

The higher the value of p, the faster the convergence.



(1) p = 1, 0 < β < 1 (Linear Convergence)
Some examples:
β = 0.1, ‖x_0 − x∗‖ = 0.1:
‖x_k − x∗‖ : 10^{−1}, 10^{−2}, 10^{−3}, 10^{−4}, . . .
β = 0.9, ‖x_0 − x∗‖ = 0.1:
‖x_k − x∗‖ : 10^{−1}, 0.09, 0.081, 0.0729, . . .
(2) p = 2, β > 0 (Quadratic Convergence)
Example:
β = 1, ‖x_0 − x∗‖ = 0.1:
‖x_k − x∗‖ : 10^{−1}, 10^{−2}, 10^{−4}, 10^{−8}, . . .
(3) Suppose an algorithm generates a convergent sequence
{x_k} such that

lim_{k→∞} ‖x_{k+1} − x∗‖ / ‖x_k − x∗‖ = 0 and
lim_{k→∞} ‖x_{k+1} − x∗‖ / ‖x_k − x∗‖^2 = ∞;

then this convergence is called superlinear convergence.
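The error sequences above follow the recurrence e_{k+1} = β e_k^p; a small sketch (mine, not from the slides) reproduces the linear and quadratic behaviours and shows how quickly they separate:

```python
# Model error sequences e_{k+1} = beta * e_k^p.
def linear(e0, beta, n):
    errs = [e0]
    for _ in range(n):
        errs.append(beta * errs[-1])    # p = 1: error shrinks by factor beta
    return errs

def quadratic(e0, n):
    errs = [e0]
    for _ in range(n):
        errs.append(errs[-1] ** 2)      # p = 2, beta = 1: error is squared
    return errs

# After four steps, quadratic convergence is far ahead of both linear rates.
assert quadratic(0.1, 4)[-1] < linear(0.1, 0.1, 4)[-1] < linear(0.1, 0.9, 4)[-1]
```

With e_0 = 0.1 the quadratic sequence reaches roughly 10^{−16} in four steps, while linear convergence with β = 0.9 is still near 0.066.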

