Introduction Methods Line Search Trust Region Least Squares Final Remarks

Lecture 2: Unconstrained Optimization

Prof. Marcelo Escobar

Prof. Jorge Otávio Trierweiler

September 11, 2011




1 Introduction
Basic Concepts
Indirect Solution

2 Methods

3 Line Search

4 Trust Region

5 Least Squares

6 Final Remarks



Unconstrained Optimization Problem

minimize f (x)

f (x): objective function

x: decision variables




Calculus: Smooth Functions

Consider a given function f (x) of n variables.

f (x) is continuous if:

$\lim_{x \to x_k} f(x) = f(x_k)$


f (x) is differentiable if the limit exists:

$f'(x) = \dfrac{df}{dx} = \lim_{d \to 0} \dfrac{f(x + d) - f(x)}{d}$

f (x) is continuously differentiable if its derivative is also continuous.

Smooth Functions: the function is continuous and twice differentiable!



Calculus: Gradient Vector and Hessian Matrix

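Gradient Vector:

$\nabla f(x) = \left[ \dfrac{\partial f(x)}{\partial x_1}, \; \ldots, \; \dfrac{\partial f(x)}{\partial x_n} \right]^T$

Hessian Matrix:

$H(x) = \nabla^2 f(x) = \begin{bmatrix} \dfrac{\partial^2 f(x)}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n^2} \end{bmatrix}$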

∇f (x) points in the direction of greatest increase of f (x) ; H(x) = H(x)T .



Calculus: Taylor’s Series

Taylor’s series expansion of f (x) about xk :

$f(x) \approx f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2} d^T H(x_k) d, \quad d = x - x_k$

Examine the vicinity of xk :

The term ∇f (xk )T d is called the directional derivative;
If ∇f (xk )T d > 0 the function locally increases in the direction d;
If ∇f (xk )T d < 0 the function locally decreases in the direction d;
d^T H(xk) d ≥ 0 for d ≠ 0 if the matrix H(x) is positive semidefinite;
d^T H(xk) d > 0 for d ≠ 0 if the matrix H(x) is positive definite.


Basic Concepts

What is a solution?

Global and Local Minimum:

A point x* is a global minimizer if f(x*) ≤ f(x) ∀ x

x* is a local minimizer if f(x*) ≤ f(x) ∀ x ∈ N = {x | ‖x − x*‖ ≤ δ} for some δ > 0



Basic Concepts


A given f (x) is called convex on the interval [x1 , x2 ] if:

f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2), α ∈ [0, 1]

The function is convex if the Hessian is positive semidefinite everywhere (and strictly convex if it is positive definite).

Why is it so important?
For a convex function any local minimizer is a global minimizer



Basic Concepts

What characterizes a solution?

For a function f (x) of a single variable x:

First order Necessary condition: f'(x*) = 0

Second Order Sufficient Condition: f''(x*) > 0

For a multivariable function f (x):

First order Necessary condition: ∇f(x*) = 0

Second Order Necessary Condition: d^T ∇²f(x*) d ≥ 0, ∀ d ≠ 0;
Second Order Sufficient Condition: d^T ∇²f(x*) d > 0, ∀ d ≠ 0;


Basic Concepts

What characterizes a solution?

f(x) = x²: minimum, f'(0) = 0, f''(0) = 2
f(x) = x³: saddle, f'(0) = 0, f''(0) = 0
f(x) = −x²: maximum, f'(0) = 0, f''(0) = −2


Basic Concepts

What characterizes a solution?

f(x) = −x1² − x2²: maximum, λ1 = −2; λ2 = −2
f(x) = x1² − x2²: saddle, λ1 = 2; λ2 = −2
(λi are the eigenvalues of the Hessian)
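
The eigenvalue test used in the two examples above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function name `classify_stationary_point` and the tolerance `tol` are hypothetical choices, not part of the lecture material.

```python
import numpy as np

def classify_stationary_point(hessian, tol=1e-12):
    """Classify a stationary point from the eigenvalues of its symmetric Hessian."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    if np.all(eigenvalues > tol):
        return "minimum"
    if np.all(eigenvalues < -tol):
        return "maximum"
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle"
    return "inconclusive"  # at least one eigenvalue is numerically zero

# Hessians of the two examples, evaluated at the stationary point (0, 0)
H_max = [[-2.0, 0.0], [0.0, -2.0]]    # f(x) = -x1^2 - x2^2
H_saddle = [[2.0, 0.0], [0.0, -2.0]]  # f(x) = x1^2 - x2^2

print(classify_stationary_point(H_max))
print(classify_stationary_point(H_saddle))
```

A zero eigenvalue (as for f(x) = x³ in one dimension) leaves the second order test inconclusive, which is why the sketch returns a separate label for that case.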


Indirect Solution


As an example consider the following function:

minimize x² − 2x − 1

Analytical Solution:

$\dfrac{df}{dx} = 2x - 2 = 0 \Rightarrow x^* = 1$

Sufficient conditions:

$\dfrac{d^2 f}{dx^2} = 2 > 0$

this point is a local minimum!


Indirect Solution

Indirect Solution

Suppose that the nonlinear function f (x) = f (x1 , x2 , . . . , xn ) is to be

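minimized. The first order necessary conditions are:

$\nabla f(x) = \left[ \dfrac{\partial f(x)}{\partial x_1}, \; \dfrac{\partial f(x)}{\partial x_2}, \; \ldots, \; \dfrac{\partial f(x)}{\partial x_n} \right]^T = 0$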

It is a (non)linear system c(x) = 0 with n variables and n equations!!!



Indirect Solution

Newton’s Method Derivation

Suppose that we want to solve the following system c(x) = 0 with n
equations and n variables. Consider a linear approximation about xk :

$c(x) \approx c(x_k) + \nabla c(x_k)(x - x_k) = 0$

Solving for x:

$x_{k+1} = x_k - \nabla c(x_k)^{-1} c(x_k)$

Recall that we want to solve c(x) = ∇f (x) = 0:

$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$

Newton Step (d = xk+1 − xk ): $\nabla^2 f(x_k)\, d = -\nabla f(x_k)$


Indirect Solution

Newton’s Method Example 01

As an example consider the following function:

minimize x 2 − x

Here f'(x) = 2x − 1 and f''(x) = 2. Newton’s method:

$x_{k+1} = x_k - f''(x_k)^{-1} f'(x_k) = x_k - (2)^{-1}(2x_k - 1)$

For x0 = 3, x1 = 0.5 (optimal solution since f'(x1) = 0)

Note: Because the function is quadratic (f'(x) is linear), the minimum is
obtained in one step.


Indirect Solution

Newton’s Method Example 02

As an example consider the following function:

minimize x⁴ − x + 1

Newton’s method: $x_{k+1} = x_k - \dfrac{4x_k^3 - 1}{12x_k^2}$. For x0 = 3:

k    xk       f(xk)
1    3.0000   79.0000
2    2.0093   15.2904
3    1.3601   3.0619
...  ...      ...
8    0.6300   0.5275
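
The iteration in the table can be reproduced with a short script. This is only a sketch in plain Python (the course files are Matlab); the helper name `newton_1d` and its stopping rule on the step size are assumptions made for illustration.

```python
def newton_1d(df, d2f, x0, tol=1e-8, max_iter=50):
    """Newton iteration x_{k+1} = x_k - f'(x_k) / f''(x_k) for a scalar function."""
    x = x0
    history = [x]
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x = x - step
        history.append(x)
        if abs(step) < tol:  # assumed stopping rule: very small Newton step
            break
    return x, history

# f(x) = x^4 - x + 1
df = lambda x: 4 * x**3 - 1
d2f = lambda x: 12 * x**2

x_star, history = newton_1d(df, d2f, x0=3.0)
for k, xk in enumerate(history, start=1):
    print(k, round(xk, 4), round(xk**4 - xk + 1, 4))
```

The stationary point satisfies 4x³ = 1, so the iterates should approach (1/4)^(1/3) ≈ 0.6300, in agreement with the last row of the table.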



Methods for Unconstrained Optimization

Direct Methods:
Scanning and Bracketing
Grid search
Stochastic Algorithms

Indirect Methods:
Steepest Descent Method;
Newton Method;
Quasi-Newton Method;
Conjugate Gradient.



Direct Methods

General Algorithm:
1. Select initial set of point(s);
2. Evaluate objective function at each point;
3. Compare the values and keep the best solution (smallest value);
Outline: Do it forever and then you will surely find the optimal solution.
they are easy to apply and suitable for nonsmooth problems;
they require many objective function evaluations;
there is no guarantee of convergence, or any proof the point is a minimum;
methods of last resort - use them when nothing else works;



Indirect Methods

Iterative procedure for generating a convergent sequence of points xk

such that f(x0) > f(x1) > . . . > f(x*):

Line Search:
Choose a promising direction
Find a step size to minimize along this direction
Trust Region:
Choose the maximum step size (the trust region radius)
Find a direction which minimizes a model of f (x) within that radius




Rates of Convergence

Different types of convergence rates for iterative procedures:

Linear: If there exists a c ∈ (0, 1) such that:

$\|x_{k+1} - x^*\| \le c \, \|x_k - x^*\|$

p-Order: (often p = 2) If there exists M > 0 such that:

$\|x_{k+1} - x^*\| \le M \, \|x_k - x^*\|^p$

Superlinear: If there exists a sequence ck converging to zero,

$\lim_{k \to \infty} c_k = 0$, such that

$\|x_{k+1} - x^*\| \le c_k \, \|x_k - x^*\|$


Line Search Strategy

Line Search

0. Initial point xk ;
1. Choose a search direction dk
2. Minimize along that direction to find a new point:

xk+1 = xk + α dk
where α is a positive scalar called the step size

Note: This is an iterative procedure, repeated until convergence is achieved.



Line Search Strategy

Search Direction
The search direction must be a descent direction:

$\nabla f(x_k)^T d < 0$

See the contour curves of the function (consider d = pk ):

[Figure 2.6: contour curves of f with a descent direction pk]

Line Search Strategy

Exact Line Search

The step size α is determined by minimizing a merit function φ(α):

$\min_{\alpha} \; \phi(\alpha) = f(x_k + \alpha d)$

A quadratic approximation of f (x) can be used as merit function:

$\phi(\alpha) = f(x_k) + \alpha \nabla f(x_k)^T d + \tfrac{1}{2} \alpha^2 d^T \nabla^2 f(x_k) d$

for which an analytical solution is possible:

$\dfrac{\partial \phi(\alpha)}{\partial \alpha} = \nabla f(x_k)^T d + \alpha \, d^T \nabla^2 f(x_k) d = 0 \;\Rightarrow\; \alpha = -\dfrac{d^T \nabla f(x_k)}{d^T \nabla^2 f(x_k) d}$

Note: Exact line search is computationally expensive. There is a trade-off
between a substantial reduction of f and computational effort.
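
As a small illustration of the analytical step size for the quadratic merit function, the sketch below evaluates α = −∇f(xk)^T d / (d^T ∇²f(xk) d) along the steepest descent direction. The test function, the starting point and the helper name `exact_step_quadratic` are assumptions chosen for the example; NumPy is assumed to be available.

```python
import numpy as np

def exact_step_quadratic(grad, hess, d):
    """Minimizer of phi(alpha) = f + alpha*g^T d + 0.5*alpha^2*d^T H d (needs d^T H d > 0)."""
    curvature = d @ hess @ d
    if curvature <= 0:
        raise ValueError("the quadratic model has no minimizer along d")
    return -(grad @ d) / curvature

# Illustrative quadratic: f(x) = (x1 - 3)^2 + 9 (x2 - 5)^2, evaluated at x = (1, 1)
x = np.array([1.0, 1.0])
grad = np.array([2 * (x[0] - 3), 18 * (x[1] - 5)])
hess = np.diag([2.0, 18.0])
d = -grad  # steepest descent direction

alpha = exact_step_quadratic(grad, hess, d)
grad_new = grad + alpha * hess @ d  # gradient at x + alpha*d (exact for a quadratic)
print(alpha, grad_new @ d)
```

At the exact minimizer along d, the new gradient is orthogonal to d, which is what the second printed value checks.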


Line Search Strategy

Line Search

In order to ensure convergence, only sufficient decrease is necessary:

$f(x_k + \alpha d) < f(x_k)$

[Figure 3.1: The ideal step length is the global minimizer of φ(α).]

Basic Idea: Try out a sequence of candidate values for α, stopping to accept
one of these values when certain conditions are satisfied.


Line Search Strategy

Inexact Line Search: Armijo’s Condition

[Figure 3.3: Sufficient decrease condition. The curve φ(α) = f(xk + αpk) and the line l(α), with the acceptable intervals marked.]

Sufficient decrease condition (Armijo’s Condition):

$f(x_k + \alpha d) \le f(x_k) + c_1 \alpha \nabla f(x_k)^T d$

for some c1 ∈ (0, 1). The right-hand side is a linear function l(α) with negative slope, and α is acceptable only if φ(α) ≤ l(α). In practice, c1 is chosen to be quite small, say c1 = 1e−4!

Line Search Strategy

Inexact Line Search: Curvature Condition

The sufficient decrease condition is not enough by itself to ensure that the
algorithm makes reasonable progress.

[Figure 3.4: The curvature condition. The curve φ(α) = f(xk + αpk), with the acceptable intervals marked.]

In order to avoid short steps, enforce the curvature condition:

$\nabla f(x_k + \alpha d)^T d \ge c_2 \nabla f(x_k)^T d$

Typical values of c2 are 0.9 (when the search direction is chosen by a Newton or quasi-Newton method) and 0.1 (when it is obtained from a nonlinear conjugate gradient method).

Line Search Strategy

Inexact Line Search: Wolfe Conditions

The sufficient decrease and curvature conditions are known collectively as the
Wolfe conditions:

$f(x_k + \alpha d) \le f(x_k) + c_1 \alpha \nabla f(x_k)^T d$

$\nabla f(x_k + \alpha d)^T d \ge c_2 \nabla f(x_k)^T d$

with 0 < c1 < c2 < 1.
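
A candidate step can be tested against both conditions with a few lines of code. The sketch below is illustrative only: the function name `satisfies_wolfe`, the default constants and the test function f(x) = x^T x are assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def satisfies_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.9):
    """Return True if alpha satisfies sufficient decrease and the curvature condition."""
    slope0 = grad(x) @ d  # directional derivative at alpha = 0 (negative for descent)
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope0
    curvature = grad(x_new) @ d >= c2 * slope0
    return bool(sufficient_decrease and curvature)

# Illustrative test function f(x) = x^T x, steepest descent direction from x = (1, 0)
f = lambda x: float(x @ x)
grad = lambda x: 2 * x
x = np.array([1.0, 0.0])
d = -grad(x)

print(satisfies_wolfe(f, grad, x, d, alpha=0.5))   # lands on the minimizer
print(satisfies_wolfe(f, grad, x, d, alpha=1e-3))  # too short: curvature condition fails
print(satisfies_wolfe(f, grad, x, d, alpha=1.0))   # too long: no decrease in f
```

The three calls illustrate the roles of the two conditions: the first bounds the step from above, the second rules out steps that are too short.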



Line Search Strategy

Sufficient Decrease and Backtracking

Backtracking Line Search Algorithm

Choose ᾱ > 0, ρ, c ∈ (0, 1); Set α ← ᾱ
repeat until f (xk + αd) ≤ f (xk ) + c α ∇f (xk )T d
α ← ρα
end (repeat)
Terminate with αk = α

The initial step length ᾱ is chosen to be 1 in Newton and quasi-Newton
methods, but can have different values in other algorithms such as
steepest descent or conjugate gradient;
An acceptable step length will be found after a finite number of trials!!!
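
The backtracking loop above translates almost line by line into code. The sketch below is a minimal version under stated assumptions: the function name `backtracking`, the cap `max_trials` and the quadratic test function are illustrative choices, and NumPy is assumed to be available.

```python
import numpy as np

def backtracking(f, grad_fx, x, d, alpha_bar=1.0, rho=0.5, c=1e-4, max_trials=60):
    """Shrink alpha by the factor rho until the sufficient decrease condition holds."""
    fx = f(x)
    slope = grad_fx @ d  # must be negative for a descent direction
    alpha = alpha_bar
    for _ in range(max_trials):
        if f(x + alpha * d) <= fx + c * alpha * slope:
            return alpha
        alpha *= rho
    raise RuntimeError("no acceptable step found; is d a descent direction?")

# Illustrative function f(x) = (x1 - 3)^2 + 9 (x2 - 5)^2 at x = (1, 1)
f = lambda x: (x[0] - 3) ** 2 + 9 * (x[1] - 5) ** 2
x = np.array([1.0, 1.0])
g = np.array([2 * (x[0] - 3), 18 * (x[1] - 5)])

alpha = backtracking(f, g, x, d=-g)
print(alpha)  # four halvings of alpha_bar = 1 are needed here
```

The cap on the number of trials is a safeguard added in this sketch; the algorithm as stated simply repeats until the condition is met.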



Steepest Descent

Steepest Descent Direction:

d = −∇f (xk )

The gradient is the vector that gives the (local) direction of the
greatest increase in f (x).

If α is sufficiently small, it always converges;

It has a linear rate of convergence;
It can be very slow for highly nonlinear problems;
It might zigzag close to the optimal solution.



Steepest Descent
As an example consider the following function:

minimize f (x) = x 3 − 100x



This function has a local maximum at x = −5.774 and a local minimum at x = 5.774.



Steepest Descent

The Gradient:
$f'(x) = 3x^2 - 100$
Steepest Descent:

$x_{k+1} = x_k - \alpha_k (3x_k^2 - 100)$

For x0 = 10 and a tolerance of 1e−3 (Matlab File - Ex01a.m):

If αk = 0.1 the procedure diverges;
If αk = 0.01 the procedure converges in 27 iterations;
If exact line search is used the procedure converges in 1 iteration;

Note: The choice of α is essential for convergence. However, exact line
search is computationally expensive!
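
The effect of the fixed step size can be checked with a short script. This is a plain Python sketch, not a port of Ex01a.m: the stopping rule (|f'(x)| below the tolerance), the divergence guard `blowup` and the function name are assumptions, so the iteration count need not match the Matlab file exactly.

```python
def steepest_descent_1d(df, x0, alpha, tol=1e-3, max_iter=1000, blowup=1e6):
    """Fixed-step steepest descent; returns (x, iterations, converged)."""
    x = x0
    for k in range(max_iter):
        g = df(x)
        if abs(g) < tol:  # assumed stopping rule on the gradient
            return x, k, True
        x = x - alpha * g
        if abs(x) > blowup:  # iterates are running away: report divergence
            return x, k + 1, False
    return x, max_iter, False

# f(x) = x^3 - 100 x, so f'(x) = 3 x^2 - 100
df = lambda x: 3 * x**2 - 100

x_small, it_small, ok_small = steepest_descent_1d(df, x0=10.0, alpha=0.01)
x_large, it_large, ok_large = steepest_descent_1d(df, x0=10.0, alpha=0.1)
print(x_small, it_small, ok_small)
print(x_large, it_large, ok_large)
```

With α = 0.1 the first step already jumps from 10 to −10, past the local maximum, and the iterates then run off towards −∞, where f is unbounded below.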



Newton’s Method Derivation

Consider a second order approximation of f (x) about xk :

$f(x) \approx f(x_k) + d^T \nabla f(x_k) + \tfrac{1}{2} d^T \nabla^2 f(x_k) d$

What is the optimal direction to minimize f (x)?

$\dfrac{\partial f(x)}{\partial d} = \nabla f(x_k) + \nabla^2 f(x_k) d = 0$

It results in:

$d = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$



Newton’s Method Interpretations

Newton Direction:
$d = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$

d minimizes the second order approximation of f (x):

$f(x) \approx f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2} d^T \nabla^2 f(x_k) d$

d solves linearized optimality conditions for min f (x):

$\nabla f(x) \approx \nabla f(x_k) + \nabla^2 f(x_k) d = 0$

Note: It is important to note that Newton’s Method has an implicit α = 1.



Newton Example

The Gradient and the Hessian are:

$f'(x) = 3x^2 - 100 \qquad f''(x) = 6x$

Newton Step:

$x_{k+1} = x_k - \dfrac{3x_k^2 - 100}{6x_k}$

For x0 = 10 and a tolerance of 1e−3 (Matlab File - Ex01a.m):
If αk = 1 the procedure converges in 4 iterations;
If exact line search is used the procedure converges in 1 iteration;

Note: Newton’s Method may diverge if x0 is far from the optimum!




Newton Direction

Newton Direction:
$d = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$

Close to the solution it converges fast - quadratic convergence;
In order to converge from poor starting points - use line search;
The Newton direction may not be a descent direction, in which case f does not
decrease even for sufficiently small α!




Numerical Example 02
As an example consider the following function:

minimize f(x) = (x1 − 3)² + 9(x2 − 5)²

[Figure: contour plot of f(x) in the (x1, x2) plane]

This function has a minimum at (x1*, x2*) = (3, 5).



Newton Example

The Gradient and the Hessian are:

$\nabla f(x) = \begin{bmatrix} 2(x_1 - 3) \\ 18(x_2 - 5) \end{bmatrix} \qquad H(x) = \begin{bmatrix} 2 & 0 \\ 0 & 18 \end{bmatrix}$

Line Search:
$x_{k+1} = x_k - H(x_k)^{-1} \nabla f(x_k)$

Consider the initial point x0 = (1, 1)T (Matlab File - Ex02a.m):

Steepest descent with α = 0.1 converges in 51 iterations;
Steepest descent with exact α converges in 6 iterations;
Newton’s Method with exact α converges in 1 iteration;
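
The comparison between fixed-step steepest descent and Newton's method on f(x) = (x1 − 3)² + 9(x2 − 5)² can be sketched as follows. This is not the course file Ex02a.m: the helper `iterate`, the stopping rule (gradient norm below 1e−3) and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

H = np.diag([2.0, 18.0])  # constant Hessian of f(x) = (x1 - 3)^2 + 9 (x2 - 5)^2

def grad(x):
    return np.array([2 * (x[0] - 3), 18 * (x[1] - 5)])

def iterate(direction, x0, alpha, tol=1e-3, max_iter=500):
    """Generic line search loop x_{k+1} = x_k + alpha * d_k with a fixed alpha."""
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # assumed stopping rule
            return x, k
        x = x + alpha * direction(g)
    return x, max_iter

x_sd, it_sd = iterate(lambda g: -g, [1, 1], alpha=0.1)                       # steepest descent
x_nt, it_nt = iterate(lambda g: -np.linalg.solve(H, g), [1, 1], alpha=1.0)  # Newton

print("steepest descent:", x_sd, it_sd)
print("Newton:", x_nt, it_nt)
```

Because f is quadratic, the Newton step with α = 1 lands on (3, 5) in a single iteration, while the fixed-step steepest descent iterates shrink the error by a constant factor per step.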


Practical Methods


If the Hessian is not positive definite, the Newton direction may not be a
descent direction:

Levenberg-Marquardt Method (Hessian Modification):

H̃(xk ) = H(xk ) + βk I , βk > −min(λi )

where λi are the eigenvalues of H(xk ).

Enjoy quadratic convergence!!!
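
A minimal sketch of the Hessian modification is given below, using the indefinite Hessian of f(x) = x1² − x2² as a test case. The function name `modified_hessian` and the safety margin `margin` (which makes βk strictly larger than −min(λi)) are assumptions for illustration; NumPy is assumed to be available.

```python
import numpy as np

def modified_hessian(H, margin=1e-3):
    """Return H + beta*I with beta > -min(eigenvalue) whenever H is not positive definite."""
    lam_min = np.linalg.eigvalsh(H).min()
    beta = 0.0 if lam_min > 0 else -lam_min + margin
    return H + beta * np.eye(H.shape[0]), beta

# Indefinite Hessian of f(x) = x1^2 - x2^2 and its gradient at x = (1, 1)
H = np.array([[2.0, 0.0], [0.0, -2.0]])
g = np.array([2.0, -2.0])

H_mod, beta = modified_hessian(H)
d_newton = -np.linalg.solve(H, g)    # pure Newton direction
d_mod = -np.linalg.solve(H_mod, g)   # direction from the modified Hessian

print("beta:", beta)
print("g^T d (Newton):", g @ d_newton)   # not negative: not a descent direction
print("g^T d (modified):", g @ d_mod)    # negative: descent direction
```

In practice the shift is often chosen without a full eigenvalue decomposition (for example by trial Cholesky factorizations); the eigenvalue route is used here only because it mirrors the formula on the slide.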



Practical Methods

Alternatively the Hessian can be approximated:

$\nabla f(x_{k+1}) \approx \nabla f(x_k) + B_{k+1} d_k$

This formula is referred to as the secant equation:

$B_{k+1} d_k = y_k$

where dk = xk+1 − xk and yk = ∇f (xk+1 ) − ∇f (xk )

Besides satisfying the secant equation, the Hessian approximation Bk+1:

must be positive definite
must be symmetric
must be as close to Bk as possible (no wild changes)

Note: This matrix is not uniquely defined (use update formulas).


Practical Methods

The most popular update formulas are:

Broyden-Fletcher-Goldfarb-Shanno (BFGS):

$B_{k+1} = B_k + \dfrac{y_k y_k^T}{y_k^T d_k} - \dfrac{B_k d_k (B_k d_k)^T}{d_k^T B_k d_k}$

Davidon-Fletcher-Powell (DFP):

$B_{k+1} = \left(I - \dfrac{y_k d_k^T}{y_k^T d_k}\right) B_k \left(I - \dfrac{d_k y_k^T}{y_k^T d_k}\right) + \dfrac{y_k y_k^T}{y_k^T d_k}$

Note: For the first iteration B0 = I . Numerical experiments have shown that
the BFGS formula performs better than the DFP formula.
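
Both update formulas can be written compactly with outer products. The sketch below checks numerically that each updated matrix satisfies the secant equation for one step; the function names, the test step `d` and the quadratic used to generate `y` are assumptions for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def bfgs_update(B, d, y):
    """BFGS update of the Hessian approximation B from step d and gradient change y."""
    Bd = B @ d
    return B + np.outer(y, y) / (y @ d) - np.outer(Bd, Bd) / (d @ Bd)

def dfp_update(B, d, y):
    """DFP update of the Hessian approximation B from step d and gradient change y."""
    rho = 1.0 / (y @ d)
    V = np.eye(len(d)) - rho * np.outer(y, d)
    return V @ B @ V.T + rho * np.outer(y, y)

# Illustrative data: for a quadratic with Hessian H, the gradient change is y = H d
H = np.diag([2.0, 18.0])
d = np.array([1.0, 0.5])
y = H @ d
B0 = np.eye(2)  # first iteration: B0 = I

B_bfgs = bfgs_update(B0, d, y)
B_dfp = dfp_update(B0, d, y)
print(B_bfgs @ d, B_dfp @ d, y)  # both products should reproduce y
```

Both updates keep the matrix symmetric, and they preserve positive definiteness as long as y^T d > 0, which holds in this example.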


Practical Methods


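Alternatively, the inverse of the Hessian can be updated:

BFGS:

$B_{k+1}^{-1} = B_k^{-1} + \dfrac{(d_k^T y_k + y_k^T B_k^{-1} y_k)\, d_k d_k^T}{(y_k^T d_k)^2} - \dfrac{B_k^{-1} y_k d_k^T + d_k y_k^T B_k^{-1}}{d_k^T y_k}$

DFP:

$B_{k+1}^{-1} = B_k^{-1} - \dfrac{B_k^{-1} y_k y_k^T B_k^{-1}}{y_k^T B_k^{-1} y_k} + \dfrac{d_k d_k^T}{y_k^T d_k}$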



Practical Methods

The methods presented in this section differ by the search direction:

$d = -B_k^{-1} \nabla f(x_k)$

Steepest Descent: Bk = I
Newton Method: Bk = H (Hessian)
Quasi-Newton Method: Bk (Hessian approximation)

See the file Ex02a.m (using the Quasi-Newton Method, convergence in 2 iterations).


Practical Methods

Quasi-Newton: Example
As an example consider the function (toy02.m):

$\min_{x_1, x_2} \; f(x_1, x_2) = \alpha \exp(-\beta)$

[Figure: contour plot of f(x1, x2)]

This function has a minimum at (x1*, x2*) = (0.7395, 0.3144).


Practical Methods

Quasi-Newton Example

Consider the initial point x0 = (0.8, 0.2)T :

Ex03b.m: Quasi-Newton with Inexact Line Search - 8 iterations;
Ex03b.m: Steepest Descent with Inexact Line Search - 46 iterations;

See also different problems:

Ex03a.m: Quasi-Newton with Exact Line Search - 14 iterations;
Ex03c.m: Quasi-Newton with Inexact Line Search - 5 iterations;



Practical Methods

Conjugate Gradient (CG)

A set of nonzero vectors (d0 , . . . , dn−1 ) is said to be conjugate with
respect to the symmetric positive definite matrix H if:

$d_i^T H d_j = 0 \quad \forall \, i \ne j$

Orthogonality is a special case of conjugacy (H = I).

A quadratic problem is minimized in at most n steps using exact line search with
conjugate directions. Since the directions are linearly independent we
can write:

$x^* = x_0 + \alpha_0 d_0 + \ldots + \alpha_{n-1} d_{n-1}$
Note: For non-quadratic functions more iterations may be necessary!


Practical Methods

Conjugate Gradient: Fletcher-Reeves Method

Fletcher and Reeves showed an extension for nonlinear functions:

$d_{k+1} = -\nabla f(x_{k+1}) + \beta_k d_k$

$\beta_k = \dfrac{\nabla f(x_{k+1})^T \nabla f(x_{k+1})}{\nabla f(x_k)^T \nabla f(x_k)}$

For the first iteration d0 = −∇f (x0 ).

There are many CG methods that differ by the parameter βk .

See Ex03cc.m, convergence in 7 iterations.
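
The Fletcher-Reeves recursion is easiest to check on a quadratic, where the exact step size is available in closed form and the method should finish in at most n iterations. The sketch below is not Ex03cc.m: the function name, the quadratic test problem f(x) = (x1 − 3)² + 9(x2 − 5)² and the tolerance are assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def fletcher_reeves_quadratic(H, x_min, x0, tol=1e-10, max_iter=50):
    """Fletcher-Reeves CG on f(x) = 0.5 (x - x_min)^T H (x - x_min) with exact line search."""
    x = np.array(x0, dtype=float)
    g = H @ (x - x_min)
    d = -g  # first direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = -(g @ d) / (d @ H @ d)    # exact step size for a quadratic
        x = x + alpha * d
        g_new = H @ (x - x_min)
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves parameter
        d = -g_new + beta * d
        g = g_new
    return x, max_iter

H = np.diag([2.0, 18.0])  # Hessian of (x1 - 3)^2 + 9 (x2 - 5)^2
x_cg, n_iter = fletcher_reeves_quadratic(H, x_min=np.array([3.0, 5.0]), x0=[1.0, 1.0])
print(x_cg, n_iter)
```

For a general nonlinear function the exact step would be replaced by a line search, and the finite termination property no longer holds.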


Practical Methods

Summarizing: Line Search Strategy

General Algorithm:
0. Guess an initial point x0 and d0 = −∇f (x0 );
1. At xk for the given direction dk select αk (exact, inexact);
2. Set xk+1 = xk + αk dk ;
3. Estimate dk+1
(Steepest Descent, Newton, Quasi-Newton, Conjugate Gradient);

To ensure progress toward the minimum, make sure that:

dk is a descent direction;
αk is such that f (xk + αk dk ) < f (xk )

Have fun!!!


Trust Region

Trust Region

Consider a second order approximation:

$\phi(d) = f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2} d^T H(x_k) d \approx f(x_k + d)$

To obtain each step, we seek the solution of the subproblem:

$\min_{\|d\| \le \Delta_k} \; \phi(d) = f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2} d^T H(x_k) d$

where ∆k is the trust region radius. It can be shown that the solution satisfies:

$(H(x_k) + \lambda I)\, d^* = -\nabla f(x_k)$

for some λ ≥ 0 such that (H(xk ) + λI ) is positive semidefinite.



Trust Region

Trust Region: Solution of the Subproblem

If H(xk ) is positive definite and ‖dN‖ ≤ ∆k , the solution of the
subproblem is the Newton step:

$d^* = d_N = -H(x_k)^{-1} \nabla f(x_k)$

Otherwise, for some λ > 0:

$d^* = -(H(x_k) + \lambda I)^{-1} \nabla f(x_k), \quad \|d^*\| = \Delta_k$

When λ varies between 0 and ∞, the search direction d(λ) will vary
between the Newton direction dN and a multiple of −∇f (xk ).


[Figure (© 2007 Niclas Börlin, CS, UmU, Nonlinear Optimization; Trust-region methods): the path p(λ) inside the trust region, where pN is the Newton direction and −g = −∇f (x). Furthermore, λ(∆k − ‖p*‖) = 0.]


Trust Region

Trust Region: Alternatives

Steepest Descent: first order approximation,

$\phi(d) = f(x_k) + \nabla f(x_k)^T d \approx f(x_k + d)$

analytical solution: $d_k = -\dfrac{\Delta_k}{\|\nabla f(x_k)\|} \nabla f(x_k)$

Quasi Newton: second order approximation,

$\phi(d) = f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2} d^T B_k d \approx f(x_k + d)$

where Bk is a Hessian approximation (DFP, BFGS).


Trust Region

Trust Region Radius

Given a direction dk define a ratio:

$\rho_k = \dfrac{f(x_k) - f(x_k + d_k)}{\phi(0) - \phi(d_k)}$

the numerator is called actual reduction and the denominator predicted
reduction.

the predicted reduction is always non-negative, so ρk < 0 means that f increased;
if ρk is close to one there is good agreement with the quadratic
approximation and then the trust region is expanded;
if ρk is close to zero, the trust region radius is shrunk;


Trust Region

Trust Region: Algorithm

Choose x0 , ∆̄ > 0, ∆0 ∈ (0, ∆̄), tol and η ∈ (0, 0.25)
For k = 0, 1, 2, . . .
    If ‖∇f (xk )‖ < tol, STOP with solution xk ;
    Obtain dk by solving the subproblem (exact, or any approximation);
    Evaluate ρk ;
    If ρk < 0.25, then ∆k+1 = 0.25∆k
    else if ρk > 0.75 and ‖dk‖ = ∆k , then ∆k+1 = min(2∆k , ∆̄)
    else, ∆k+1 = ∆k
    If ρk > η, then xk+1 = xk + dk
    else, xk+1 = xk
end (for)
Terminate with the trust region radius for the next iteration (∆k+1 ).
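
The loop above can be sketched in code with the simplest approximate subproblem solver, a steepest descent step limited by the trust region radius (the Cauchy point). Everything below is an illustrative assumption rather than the course implementation: the function names, the default parameters, the 0.75 expansion threshold and the quadratic test function. NumPy is assumed to be available.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimizer of the quadratic model along -g, restricted to ||d|| <= delta."""
    gBg = g @ B @ g
    gnorm = np.linalg.norm(g)
    tau = 1.0 if gBg <= 0 else min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g

def trust_region(f, grad, hess, x0, delta_max=10.0, delta0=1.0, eta=0.1,
                 tol=1e-6, max_iter=1000):
    x = np.array(x0, dtype=float)
    delta = delta0
    for k in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            return x, k
        d = cauchy_point(g, B, delta)
        predicted = -(g @ d + 0.5 * d @ B @ d)   # phi(0) - phi(d)
        rho = (f(x) - f(x + d)) / predicted      # actual / predicted reduction
        if rho < 0.25:
            delta = 0.25 * delta                 # poor agreement: shrink
        elif rho > 0.75 and np.isclose(np.linalg.norm(d), delta):
            delta = min(2 * delta, delta_max)    # good agreement at the boundary: expand
        if rho > eta:
            x = x + d                            # accept the step
    return x, max_iter

# Illustrative test function f(x) = (x1 - 3)^2 + 9 (x2 - 5)^2
f = lambda x: (x[0] - 3) ** 2 + 9 * (x[1] - 5) ** 2
grad = lambda x: np.array([2 * (x[0] - 3), 18 * (x[1] - 5)])
hess = lambda x: np.diag([2.0, 18.0])

x_tr, n_iter = trust_region(f, grad, hess, [1.0, 1.0])
print(x_tr, n_iter)
```

Since the test function is quadratic, the model is exact and ρk stays close to one, so every step is accepted; on a general function the ratio test is what protects against poor model predictions.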


Trust Region

Subproblem Solution
In order to converge, the optimal solution of the subproblem is not
necessary. Only a crude approximation with sufficient reduction is
required.

Cauchy Point (slow): Solve the linear approximation. Note that it
is a steepest descent step of length ∆k :

$d = -\dfrac{\nabla f(x_k)}{\|\nabla f(x_k)\|} \Delta_k$

The Dogleg method (fast - superlinear): d = dC + τ (dN − dC )

$d_C = -\dfrac{\nabla f(x_k)^T \nabla f(x_k)}{\nabla f(x_k)^T B_k \nabla f(x_k)} \nabla f(x_k) \qquad d_N = -B_k^{-1} \nabla f(x_k)$

where Bk is the Hessian H(xk ) or an approximation of it.
The factor τ depends on dC , dN and ∆k .
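
One common way to pick τ is sketched below: take the Newton step if it fits in the region, a truncated steepest descent step if even dC is too long, and otherwise the point on the segment from dC to dN whose length is exactly ∆k. The function name `dogleg_step`, the test data and the assumption of a positive definite matrix are illustrative; NumPy is assumed to be available.

```python
import numpy as np

def dogleg_step(g, H, delta):
    """Dogleg step for positive definite H: d = dC + tau (dN - dC) with ||d|| <= delta."""
    d_newton = -np.linalg.solve(H, g)
    if np.linalg.norm(d_newton) <= delta:
        return d_newton                      # full Newton step fits in the region
    d_cauchy = -(g @ g) / (g @ H @ g) * g    # minimizer of the model along -g
    norm_c = np.linalg.norm(d_cauchy)
    if norm_c >= delta:
        return (delta / norm_c) * d_cauchy   # truncated steepest descent step
    # Solve ||dC + tau (dN - dC)|| = delta for tau in (0, 1): positive root of a quadratic
    diff = d_newton - d_cauchy
    a = diff @ diff
    b = 2 * (d_cauchy @ diff)
    c = d_cauchy @ d_cauchy - delta**2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return d_cauchy + tau * diff

# Illustrative data: gradient and Hessian of (x1 - 3)^2 + 9 (x2 - 5)^2 at x = (1, 1)
g = np.array([-4.0, -72.0])
H = np.diag([2.0, 18.0])

for delta in (10.0, 4.2, 1.0):
    d = dogleg_step(g, H, delta)
    print(delta, d, np.linalg.norm(d))
```

The three radii exercise the three branches: a large region returns the Newton step (2, 4), an intermediate one returns a point on the dogleg segment, and a small one returns a scaled steepest descent step.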


Trust Region

Trust Region Example

Consider the example min f (x) = α exp(−β) (toy02.m):

Consider the initial point x0 = (0.8, 0.2)T :
Ex04a.m: Cauchy Point (steepest descent) converges in 44 iterations;
Ex04c.m: Newton with exact solution converges in 6 iterations;
Ex04c2.m: Quasi Newton with exact solution converges in 17 iterations;
Ex04d.m: Newton with dogleg converges in 6 iterations;
Ex04e.m: Quasi Newton with dogleg converges in 15 iterations;



Least Squares

Least Square Problems

Approximate solution of overdetermined systems minimizing the sum of
the squares of the errors.

minimize f (β) = r (β)T r (β)


where r is the error function and β are the unknown parameters.

Curve Fitting:
$r(\beta) = y - f(x, \beta)$

$\dfrac{\partial f}{\partial \beta} = 2 r^T \dfrac{\partial r}{\partial \beta} = 0$

Note: For m equations we can determine at most m parameters.


Least Squares

Least Square Problems

For linear error function: r = y − A(x)β

Analytical Solution: β ∗ = (AT A)−1 AT y

Fitting of Linear functions:

ri = yi − (β1 xi + β2 )

Fitting of Linear functions in terms of parameters (e.g.


$r_i = y_i - (\beta_1 f_1(x_i) + \ldots + \beta_n f_n(x_i))$


Note: Transformation is useful, e.g. $y = \beta_1 e^{\beta_2 x}$; $\ln(y) = \ln(\beta_1) + \beta_2 x$
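
The analytical solution and the logarithmic transformation can be tried on synthetic data. The sketch below uses noise-free data generated from known parameters, so the fit should recover them; the function name `linear_least_squares` and the data are assumptions for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def linear_least_squares(A, y):
    """Solve the normal equations (A^T A) beta = A^T y."""
    return np.linalg.solve(A.T @ A, A.T @ y)

# Straight line y = beta1 * x + beta2, noise-free synthetic data
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
A = np.column_stack([x, np.ones_like(x)])  # columns multiply beta1 and beta2
beta = linear_least_squares(A, y)
print(beta)

# Exponential model y = b1 * exp(b2 * x): fit ln(y) = ln(b1) + b2 * x instead
y_exp = 3.0 * np.exp(0.5 * x)
coef = linear_least_squares(A, np.log(y_exp))  # returns (b2, ln b1)
b2, b1 = coef[0], np.exp(coef[1])
print(b1, b2)
```

Forming A^T A explicitly is fine for small, well-conditioned problems like this one; for ill-conditioned systems a QR-based solver such as `numpy.linalg.lstsq` is usually preferred. Note also that the transformed problem minimizes errors in ln(y), not in y, so with noisy data the two fits generally differ.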


Least Squares

Least Square Problems

It is possible to enforce different weights for each equation.

$\min_{\beta} \; f(\beta) = (W r(\beta))^T W r(\beta)$

where W = diag(w1 , . . . , wm ) and wi is the weight of equation i .

For linear r , there is an analytical solution:

$\beta^* = (A^T W^T W A)^{-1} A^T W^T W y$

Nonlinear Least Squares: Gauss-Newton step $\Delta\beta = -(J^T J)^{-1} J^T r$, where

J = ∇β r .
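
A Gauss-Newton iteration for a small nonlinear fit can be sketched as follows, using the step ∆β = −(J^T J)^−1 J^T r with J the Jacobian of the residual. The model y = β1 exp(β2 x), the synthetic noise-free data, the starting guess and the function name are assumptions for illustration; NumPy is assumed to be available.

```python
import numpy as np

def gauss_newton(residual, jacobian, beta0, tol=1e-10, max_iter=50):
    """Gauss-Newton iteration: beta <- beta - (J^T J)^{-1} J^T r."""
    beta = np.array(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual(beta)
        J = jacobian(beta)  # Jacobian of the residual with respect to beta
        step = -np.linalg.solve(J.T @ J, J.T @ r)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Synthetic noise-free data from y = 2 * exp(0.5 * x)
x = np.linspace(0.0, 2.0, 9)
y = 2.0 * np.exp(0.5 * x)

# Residual r(beta) = y - beta1 * exp(beta2 * x) and its Jacobian
residual = lambda b: y - b[0] * np.exp(b[1] * x)
jacobian = lambda b: np.column_stack([-np.exp(b[1] * x), -b[0] * x * np.exp(b[1] * x)])

beta = gauss_newton(residual, jacobian, beta0=[2.2, 0.55])
print(beta)
```

Gauss-Newton takes full steps and can fail from a poor starting guess; combining it with a line search or with the Levenberg-Marquardt modification is the usual safeguard.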


Least Squares

Applications in Chemical Engineering

Kinetic models, e.g.

$-r_A = \dfrac{k_1 C_A - k_2 C_B}{1 - k_3 C_A C_B}$
Thermodynamic models, e.g.
$\ln(P^{SAT}) = A - \dfrac{B}{T + C}$
Empirical models,e.g.
$h = k \, Re^{\alpha} Pr^{3}$



Least Squares


Example 1: For given data (Exa1.m) estimate A and B:

log(P SAT ) = A −
Example 2: For given experimental data (Exa2.m) estimate α and β:

$Nu = \alpha Re^{\beta}$



Final Remarks

Final Remarks

Stationary points can be found by solving a non-linear system;

The solution can be found by direct or indirect methods;
Direct methods use only objective function evaluations;
Indirect solution strategies: Line Search and Trust Region;
An important application of unconstrained optimization is curve
fitting and parameter estimation for stationary models.


Final Remarks

Further Readings

Numerical Optimization. Nocedal and Wright (1999): Chapters 1, 2, 3, 4 and 5.
Optimization of Chemical Processes. Edgar, Himmelblau and Lasdon (1995): Chapters 5 and 6.
Linear and Nonlinear Programming. Luenberger (2008): Chapters 7, 8, 9 and 10.
Nonlinear Programming. Biegler (2010): Chapters 2 and 3.


Final Remarks

Help, Comments, Suggestions


Just in Case Contact

Marcelo Escobar Aragão

(Teaching Assistant)

Department of Chemical Engineering

Federal University of Rio Grande do Sul
Porto Alegre ‐ RS
Phone: 55 51 3308 4163
Mobile: 55 51 9684 4213

Presented by: Course 22/08/2011 11:39
Marcelo Escobar Slide 23/38

