

Lecture 2: Unconstrained Optimization

Prof. Marcelo Escobar

PPGEQ – UFRGS
Prof. Jorge Otávio Trierweiler

September 11, 2011


Outline

1 Introduction
Basic Concepts
Indirect Solution

2 Methods

3 Line Search

4 Trust Region

5 Least Squares

6 Final Remarks

Introduction

Unconstrained Optimization Problem

minimize_x   f(x)

f (x): objective function


x: decision variables


Introduction

Calculus: Smooth Functions

Consider a given function f (x) of n variables.

f (x) is continuous if:

lim_{x→xk} f(x) = f(xk)

f (x) is differentiable if the limit exists:

f'(x) = df/dx = lim_{d→0} [f(x + d) − f(x)] / d

In other words, f(x) is continuously differentiable if its derivative is continuous.
Smooth functions: functions that are continuous and twice (continuously) differentiable!


Introduction

Calculus: Gradient Vector and Hessian Matrix

Gradient Vector:

∇f(x) = [ ∂f(x)/∂x1, ..., ∂f(x)/∂xn ]^T

Hessian Matrix:

H(x) = ∇²f(x) =
  [ ∂²f(x)/∂x1²     ...   ∂²f(x)/∂x1∂xn ]
  [      ...        ...        ...      ]
  [ ∂²f(x)/∂xn∂x1   ...   ∂²f(x)/∂xn²   ]

∇f (x) points in the direction of greatest increase of f (x) ; H(x) = H(x)T .


Introduction

Calculus: Taylor’s Series

Taylor’s series expansion of f (x) about xk :

f(x) ≈ f(xk) + ∇f(xk)^T d + (1/2) d^T H(xk) d,    d = x − xk

Examine the vicinity of xk :


The term ∇f(xk)^T d is called the directional derivative;
If ∇f(xk)^T d > 0 the function locally increases in the direction d;
If ∇f(xk)^T d < 0 the function locally decreases in the direction d;
d^T H(xk) d ≥ 0 for all d ≠ 0 if the matrix H(x) is positive semidefinite;
d^T H(xk) d > 0 for all d ≠ 0 if the matrix H(x) is positive definite.

Basic Concepts

What is a solution?

Global and Local Minimum:


A point x* is a global minimizer if f(x*) ≤ f(x) for all x;
x* is a local minimizer if f(x*) ≤ f(x) for all x ∈ N = {x : ‖x − x*‖ ≤ δ}, for some δ > 0.




Basic Concepts

Convexity

A given f (x) is called convex on the interval [x1 , x2 ] if:

f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2),    α ∈ [0, 1]

The function is convex if the Hessian is positive semidefinite (and strictly convex if the Hessian is positive definite).

Why is it so important?
For a convex function any local minimizer is a global minimizer


Basic Concepts

What characterizes a solution?

For a function f (x) of a single variable x:

First-order necessary condition: f'(x*) = 0
Second-order sufficient condition: f''(x*) > 0

For a multivariable function f (x):

First-order necessary condition: ∇f(x*) = 0
Second-order necessary condition: d^T ∇²f(x*) d ≥ 0, ∀ d ≠ 0;
Second-order sufficient condition: d^T ∇²f(x*) d > 0, ∀ d ≠ 0.

Basic Concepts

What characterizes a solution?

f(x) = x²:   minimum at x = 0   (f'(0) = 0, f''(0) = 2)
f(x) = x³:   saddle at x = 0    (f'(0) = 0, f''(0) = 0)
f(x) = −x²:  maximum at x = 0   (f'(0) = 0, f''(0) = −2)


Basic Concepts

What characterizes a solution?

f(x) = −x1² − x2²:  maximum,       eigenvalues λ1 = −2, λ2 = −2
f(x) = x1² − x2²:   saddle point,  eigenvalues λ1 = 2, λ2 = −2


Indirect Solution

Example

As an example consider the following function:

minimize_x   x² + 2x − 1

Analytical Solution:

df/dx = 2x + 2 = 0  ⇒  x* = −1

Sufficient conditions:

d²f/dx² = 2 > 0

this point is a local minimum!


Indirect Solution

Indirect Solution

Suppose that the nonlinear function f(x) = f(x1, x2, . . . , xn) is to be minimized. The first-order necessary conditions are:

∇f(x) = [ ∂f(x)/∂x1, ∂f(x)/∂x2, ..., ∂f(x)/∂xn ]^T = 0

It is a (non)linear system c(x) = 0 with n variables and n equations!!!



Indirect Solution

Newton’s Method Derivation

Suppose that we want to solve a system c(x) = 0 with n equations and n variables. Consider a first-order approximation about xk:

c(x) ≈ c(xk ) + ∇c(xk )(x − xk ) = 0


Solving for x:
xk+1 = xk − ∇c(xk )−1 c(xk )

Recall that we want to solve c(x) = ∇f (x) = 0:

xk+1 = xk − ∇2 f (xk )−1 ∇f (xk )

Newton Step (d = xk+1 − xk):  ∇²f(xk) d = −∇f(xk)


Indirect Solution

Newton’s Method Example 01

As an example consider the following function:

minimize_x   x² − x

We have f'(x) = 2x − 1 and f''(x) = 2. Newton's method:

xk+1 = xk − f''(xk)^{-1} f'(xk) = xk − (1/2)(2xk − 1)

For x0 = 3, x1 = 0.5 (the optimal solution, since f'(x1) = 0).

Note: Because the function is quadratic (f'(x) is linear), the minimum is obtained in one step.


Indirect Solution

Newton’s Method Example 02

As an example consider the following function:

minimize_x   x⁴ − x + 1

Newton's method: xk+1 = xk − (4xk³ − 1)/(12xk²). For x0 = 3:

k    xk        f(xk)
1    3.0000    79.0000
2    2.0093    15.2904
3    1.3601     3.0619
...
8    0.6300     0.5275
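A minimal Matlab sketch of this iteration (the course file Ex01a.m is not reproduced here, so this is only an illustration of the update above):

    % Newton's method for min f(x) = x^4 - x + 1 (illustrative sketch)
    f = @(x) x.^4 - x + 1;        % objective
    g = @(x) 4*x.^3 - 1;          % first derivative
    h = @(x) 12*x.^2;             % second derivative
    x = 3;                        % starting point x0
    for k = 1:50
        x = x - g(x)/h(x);        % Newton update x_{k+1} = x_k - f'(x_k)/f''(x_k)
        if abs(g(x)) < 1e-3       % stop when the first-order condition nearly holds
            break
        end
    end
    fprintf('x = %.4f, f(x) = %.4f after %d iterations\n', x, f(x), k);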

Methods

Methods for Unconstrained Optimization

Direct Methods:
Scanning and Bracketing
Grid search
Interpolation
Stochastic Algorithms

Indirect Methods:
Steepest Descent Method;
Newton Method;
Quasi-Newton Method;
Conjugate Gradient.

Methods

Direct Methods

General Algorithm:
1. Select initial set of point(s);
2. Evaluate objective function at each point;
3. Compare the values and keep the best solution (smallest value);
In short: repeat this long enough and you will eventually find the optimal solution.
Remarks:
they are easy to apply and suitable for nonsmooth problems;
they require many objective function evaluations;
there is no guarantee of convergence, nor any proof that the final point is an optimum;
methods of last resort: use them when nothing else works.


Methods

Indirect Methods

Iterative procedure for generating a convergent sequence of points xk such that f(x0) > f(x1) > . . . > f(x*):

Line Search:
Choose a promising direction
Find a step size to minimize along this direction
Trust Region:
Choose a maximum step size (the trust-region radius)
Find a direction (and step) that minimizes a model of f within this region


Methods

Rates of Convergence

Different types of convergence rates for iterative procedures:

Linear: If there exists a c ∈ (0, 1) such that:

‖xk+1 − x*‖ ≤ c ‖xk − x*‖

p-Order: (often p = 2) If there exists M > 0 such that:

‖xk+1 − x*‖ ≤ M ‖xk − x*‖^p

Superlinear: If there exists a sequence ck with lim_{k→∞} ck = 0 such that:

‖xk+1 − x*‖ ≤ ck ‖xk − x*‖


Line Search Strategy

Line Search

Outline:
0. Initial point xk ;
1. Choose a search direction dk
2. Minimize along that direction to find a new point:

xk+1 = xk + α dk
where α is a positive scalar called the step size

Note: This is an iterative procedure, repeated until convergence is achieved!


Line Search Strategy

Search Direction
The search direction must be a descent direction:

∇f(xk)^T d < 0

See the contour curves of the function (consider d = pk ):

(Figure: contour lines of f with a downhill direction pk; cf. Nocedal & Wright, Figure 2.6.)

Line Search Strategy

Exact Line Search

The step size α is determined by minimizing a merit function φ(α):

min_{α ≥ 0}   φ(α) = f(xk + αd)

A quadratic approximation of f(x) can be used as merit function:

φ(α) = f(xk) + α ∇f(xk)^T d + (1/2) α² d^T ∇²f(xk) d

for which an analytical solution is possible:

∂φ(α)/∂α = ∇f(xk)^T d + α d^T ∇²f(xk) d = 0   ⇒   α = − (d^T ∇f(xk)) / (d^T ∇²f(xk) d)

Note: Exact line search is computationally expensive. There is a trade-off between a substantial reduction in f and the computational effort.
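In Matlab, the exact step for the quadratic merit function reduces to one line; a sketch (the gradient g, Hessian H and descent direction d are assumed available):

    exact_alpha = @(g, H, d) -(g'*d)/(d'*H*d);   % minimizer of phi(alpha)

    % Example on a quadratic (hypothetical data, not from the slides):
    g = [-4; -72];                 % gradient of (x1-3)^2 + 9*(x2-5)^2 at (1,1)
    H = [2 0; 0 18];               % Hessian
    d = -g;                        % steepest-descent direction
    alpha = exact_alpha(g, H, d);  % exact step along d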


Line Search Strategy

Line Search

In order to ensure convergence, only sufficient decrease is necessary:

f(xk + αd) < f(xk)

(Figure: φ(α) along the search direction, showing the first local minimizer, the first stationary point, and the global minimizer; cf. Nocedal & Wright, Figure 3.1: the ideal step length is the global minimizer.)

Basic Idea: Try out a sequence of candidate values for α, stopping to accept one of these values when certain conditions are satisfied.

Line Search Strategy

Inexact Line Search: Armijo’s Condition


(Figure: φ(α) = f(xk + α pk) and the linear function l(α); the intervals where φ(α) ≤ l(α) are acceptable; cf. Nocedal & Wright, Figure 3.3: sufficient decrease condition.)

Sufficient decrease condition (Armijo's Condition):

f(xk + αd) < f(xk) + c1 α ∇f(xk)^T d

for some constant c1 ∈ (0, 1). In other words, the reduction in f should be proportional to both the step length α and the directional derivative ∇f(xk)^T d. In practice, c1 is chosen to be quite small, say c1 = 1e−4.

Line Search Strategy

Inexact Line Search: Curvature Condition


The sufficient decrease condition is not enough by itself to ensure that the algorithm makes reasonable progress.

(Figure: φ(α) = f(xk + α pk) with the desired slope and the acceptable intervals; cf. Nocedal & Wright, Figure 3.4: the curvature condition.)

In order to avoid excessively short steps, enforce the curvature condition:

∇f(xk + αd)^T d ≥ c2 ∇f(xk)^T d

Typical values of c2 are 0.9 when the search direction is chosen by a Newton or quasi-Newton method, and 0.1 when it is obtained from a nonlinear conjugate gradient method.

Line Search Strategy

Inexact Line Search: Wolfe Conditions

The sufficient decrease and curvature conditions are known collectively as the
Wolfe conditions:

f(xk + αd) < f(xk) + c1 α ∇f(xk)^T d

∇f(xk + αd)^T d ≥ c2 ∇f(xk)^T d


with 0 < c1 < c2 < 1.


Line Search Strategy

Sufficient Decrease and Backtracking

Backtracking Line Search Algorithm


Choose ᾱ > 0, ρ, c ∈ (0, 1); Set α ← ᾱ
repeat until f(xk + αd) ≤ f(xk) + c α ∇f(xk)^T d
α ← ρα
end (repeat)
Terminate with αk = α

Remarks:
The initial step length ᾱ is chosen to be 1 in Newton and quasi-Newton
methods, but can have different values in other algorithms such as
steepest descent or conjugate gradient;
An acceptable step length will be found after a finite number of trials!!!
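A Matlab sketch of this backtracking loop (an illustration, not the course's own implementation): f is the objective handle, g the gradient vector at xk, and d a descent direction.

    function alpha = backtracking(f, g, xk, d)
        alpha = 1;                 % initial step (typical for Newton / quasi-Newton)
        rho   = 0.5;               % contraction factor
        c     = 1e-4;              % Armijo constant
        while f(xk + alpha*d) > f(xk) + c*alpha*(g'*d)
            alpha = rho*alpha;     % shrink until sufficient decrease holds
        end
    end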

Methods

Steepest Descent

Steepest Descent Direction:

d = −∇f (xk )

The gradient is the vector that gives the (local) direction of the
greatest increase in f (x).

If α is sufficiently small, it always converges;
It has a linear rate of convergence;
It can be very slow for highly nonlinear problems;
It may zigzag close to the optimal solution.

Methods

Steepest Descent
As an example consider the following function:

minimize_x   f(x) = x³ − 100x


This function has a maximum at -5.774 and a minimum at 5.774.



Methods

Steepest Descent

The Gradient:

f'(x) = 3x² − 100
Steepest Descent:

xk+1 = xk − αk (3xk² − 100)

For x0 = 10 and a tolerance of 1e−3 (Matlab File - Ex01a.m):


If αk = 0.1 the procedure diverges;
If αk = 0.01 the procedure converges in 27 iterations;
If exact line search is used, the procedure converges in 1 iteration;

Note: The choice of α is essential for convergence. However, exact line search is computationally expensive!
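A sketch of the fixed-step variant of this experiment (Ex01a.m itself is not shown, so the details here are assumptions):

    g = @(x) 3*x.^2 - 100;         % gradient of f(x) = x^3 - 100x
    x = 10;                        % starting point
    alpha = 0.01;                  % fixed step size
    for k = 1:1000
        x = x - alpha*g(x);        % steepest-descent update
        if abs(g(x)) < 1e-3        % gradient tolerance
            break
        end
    end
    fprintf('x = %.3f after %d iterations\n', x, k);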


Methods

Newton’s Method Derivation

Consider a second order approximation of f (x) about xk :

f(x) ≈ f(xk) + d^T ∇f(xk) + (1/2) d^T ∇²f(xk) d
What is the optimal direction to minimize f (x)?

∂f(x)/∂d = ∇f(xk) + ∇²f(xk) d = 0

It results in:
d = −∇²f(xk)^{-1} ∇f(xk)


Methods

Newton’s Method Interpretations


Newton Direction:
d = −∇²f(xk)^{-1} ∇f(xk)

Interpretations:
d minimizes the second order approximation of f (x):

f(x) ≈ f(xk) + ∇f(xk)^T d + (1/2) d^T ∇²f(xk) d
d solves linearized optimality conditions for min f (x):

∇f (x) = ∇f (xk ) + ∇2 f (xk )d = 0

Note: It is important to note that Newton's Method has an implicit α = 1.


Methods

Newton Example

The Gradient and the Hessian are:

f'(x) = 3x² − 100       f''(x) = 6x

Newton Step:

xk+1 = xk − (3xk² − 100)/(6xk)
For x0 = 10 and a tolerance of 1e−3 (Matlab File - Ex01a.m):
If αk = 1 the procedure converges in 4 iterations;
If exact line search is used, the procedure converges in 1 iteration;

Note: Newton's Method may diverge if x0 is far from the optimum!



Methods

Newton Direction

Newton Direction:
d = −∇²f(xk)^{-1} ∇f(xk)

Remarks:
Close to the solution it converges fast - quadratic convergence;
In order to converge from poor starting points - use line search;
The Newton direction may not be a descent direction, even for a sufficiently small α!


Methods

Numerical Example 02
As an example consider the following function:

minimize_x   f(x) = (x1 − 3)² + 9(x2 − 5)²
(Figure: contour plot of f(x) in the (x1, x2) plane.)

This function has a minimum at (x1∗ , x2∗ ) = (3, 5).



Methods

Newton Example

The Gradient and the Hessian are:


   
∇f(x) = [ 2(x1 − 3) ;  18(x2 − 5) ]        H(x) = [ 2  0 ;  0  18 ]

Newton Step:
xk+1 = xk − H(xk)^{-1} ∇f(xk)

Consider the initial point x0 = (1, 1)T (Matlab File - Ex02a.m):


Steepest descent with α = 0.1 converges in 51 iterations;
Steepest descent with exact α converges in 6 iterations;
Newton's Method with exact α converges in 1 iteration.
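A Matlab sketch of the Newton iteration for this quadratic (in the spirit of Ex02a.m, which is not reproduced here):

    grad = @(x) [2*(x(1)-3); 18*(x(2)-5)];   % gradient
    H    = [2 0; 0 18];                      % constant Hessian of the quadratic
    x    = [1; 1];                           % initial point
    for k = 1:20
        d = -(H\grad(x));                    % Newton direction (alpha = 1)
        x = x + d;
        if norm(grad(x)) < 1e-6              % one step is enough for a quadratic
            break
        end
    end
    disp(x')                                 % prints 3 5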

Practical Methods

Levenberg-Marquardt

If the Hessian is not positive definite, the Newton direction may not be a descent direction:

Levenberg-Marquardt Method (Hessian Modification):

H̃(xk ) = H(xk ) + βk I , βk > −min(λi )

where λi are the eigenvalues of H(xk ).

Enjoy quadratic convergence!!!


Practical Methods

Quasi-Newton
Alternatively the Hessian can be approximated:

∇f (xk+1 ) ≈ ∇f (xk ) + Bk dk

This formula is referred to as the secant equation:

Bk dk ≈ yk

where dk = xk+1 − xk and yk = ∇f (xk+1 ) − ∇f (xk )

The Hessian approximation must:
satisfy the secant equation;
be positive definite;
be symmetric;
be as close to the previous approximation as possible (no wild changes).

Note: This matrix is not uniquely defined (use update formulas).



Practical Methods

Quasi-Newton
The most popular update formulas are:

Broyden-Fletcher-Goldfarb-Shanno (BFGS):

Bk+1 = Bk + (yk yk^T)/(yk^T dk) − (Bk dk (Bk dk)^T)/(dk^T Bk dk)

Davidon-Fletcher-Powell (DFP):

Bk+1 = (I − (yk dk^T)/(yk^T dk)) Bk (I − (dk yk^T)/(yk^T dk)) + (yk yk^T)/(yk^T dk)

Note: For the first iteration, B0 = I. Numerical experiments have shown that the BFGS formula's performance is superior to that of the DFP formula.
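A sketch of one BFGS update in Matlab (an illustration of the formula above):

    function B = bfgs_update(B, d, y)
        % d = x_{k+1} - x_k,  y = grad f(x_{k+1}) - grad f(x_k)
        if y'*d > 1e-12                      % curvature condition keeps B positive definite
            Bd = B*d;
            B  = B + (y*y')/(y'*d) - (Bd*Bd')/(d'*Bd);
        end                                  % otherwise the update is skipped
    end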



Practical Methods

Quasi-Newton

Alternatively, the inverse of the Hessian can be updated:


BFGS:

Bk+1^{-1} = Bk^{-1} + ((dk^T yk + yk^T Bk^{-1} yk)(dk dk^T))/((yk^T dk)²) − (Bk^{-1} yk dk^T + dk yk^T Bk^{-1})/(yk^T dk)

DFP:

Bk+1^{-1} = Bk^{-1} − (Bk^{-1} yk yk^T Bk^{-1})/(yk^T Bk^{-1} yk) + (dk dk^T)/(yk^T dk)


Practical Methods

Quasi-Newton
The methods presented in this section differ by the search direction:

d = −Bk−1 ∇f (xk )
where,

Steepest Descent Bk = I
Newton Method Bk = H (Hessian)
Quasi Newton Method Bk (Hessian approximation)

See the file Ex02a.m (using the Quasi-Newton method, convergence in 2 iterations).

Practical Methods

Quasi-Newton: Example
As an example consider the function (toy02.m):

minimize_{x1, x2}   f(x1, x2) = α exp(−β)
(Figure: contour plot of f in the (x1, x2) plane.)

This function has a minimum at (x1∗ , x2∗ ) = (0.7395, 0.3144).



Practical Methods

Quasi-Newton Example

Consider the initial point x0 = (0.8, 0.2)T :


Ex03b.m: Quasi Newton with Inexact Line Search 8 iterations;
Ex03b.m: Steepest Descent with Inexact Line Search 46 iterations;

See also different problems:


Ex03a.m: Quasi Newton with Exact Line Search - 14 iterations;
Ex03c.m: Quasi Newton with Inexact Line Search - 05 iterations;


Practical Methods

Conjugate Gradient (CG)

A set of nonzero vectors (d1, . . . , dn) is said to be conjugate with respect to the symmetric positive definite matrix H if:

di^T H dj = 0   ∀ i ≠ j

Orthogonality is a special case of conjugacy.


A quadratic problem is minimized in n steps using exact line search with
conjugate directions. Since the directions are linearly independent we
can write:

x ∗ = α0 d0 + . . . + αn−1 dn−1
Note: For non-quadratic functions more iterations may be necessary!

Practical Methods

Conjugate Gradient: Fletcher-Reeves Method

Fletcher and Reeves proposed an extension for nonlinear functions:

dk+1 = −∇f (xk+1 ) + βk dk

where

βk = (∇f(xk+1)^T ∇f(xk+1)) / (∇f(xk)^T ∇f(xk))

For the first iteration d0 = −∇f (x0 ).

There are many CG methods that differ by the parameter βk .


See Ex03cc.m, convergence in 7 iterations.
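A Matlab sketch of the Fletcher-Reeves iteration (not the course file Ex03cc.m; fgrad and steplen are assumed handles for the gradient and for any line search satisfying the conditions above):

    function x = fletcher_reeves(fgrad, steplen, x)
        g = fgrad(x);  d = -g;               % first direction: steepest descent
        for k = 1:200
            alpha = steplen(x, d);           % exact or inexact step along d
            x     = x + alpha*d;
            gnew  = fgrad(x);
            if norm(gnew) < 1e-6, break, end
            beta  = (gnew'*gnew)/(g'*g);     % Fletcher-Reeves parameter
            d     = -gnew + beta*d;          % new search direction
            g     = gnew;
        end
    end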


Practical Methods

Summarizing: Line Search Strategy

General Algorithm:
0. Guess an initial point x0 and d0 = −∇f (x0 );
1. At xk for the given direction dk select αk (exact, inexact);
2. Set xk+1 = xk + αk dk ;
3. Estimate dk+1
(Steepest Descent, Newton, Quasi-Newton, Conjugate Gradient);

To ensure progress toward the minimum, make sure that:


dk is a descent direction;
αk is such that f (xk + αk dk ) < f (xk )

Have fun!!!
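Putting the pieces together, a generic line-search driver might look as follows (a sketch; it reuses the backtracking routine sketched earlier, and B = I gives steepest descent while B = Hessian gives Newton):

    f     = @(x) (x(1)-3)^2 + 9*(x(2)-5)^2;  % example objective
    fgrad = @(x) [2*(x(1)-3); 18*(x(2)-5)];  % its gradient
    x     = [1; 1];
    B     = eye(2);                          % identity: steepest-descent direction
    for k = 1:200
        g = fgrad(x);
        if norm(g) < 1e-6, break, end
        d     = -(B\g);                      % search direction d = -B^{-1} grad f
        alpha = backtracking(f, g, x, d);    % step with sufficient decrease
        x     = x + alpha*d;
    end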

Trust Region

Trust Region

Consider a second order approximation:


φ(d) = f(xk) + ∇f(xk)^T d + (1/2) d^T H(xk) d ≈ f(xk + d)
To obtain each step, we seek the solution of the subproblem:
minimize_{‖d‖ ≤ ∆k}   φ(d) = f(xk) + ∇f(xk)^T d + (1/2) d^T H(xk) d

where ∆k is the trust region radius. It can be shown that the solution is:

(H(xk) + λI) d* = −∇f(xk)

For some λ ≥ 0 such that (H(xk) + λI) is positive definite.



Trust Region

Trust Region: Solution of the Subproblem

If H(xk) is positive definite and ‖dN‖ ≤ ∆k, the solution of the subproblem is:

d* = dN = −H(xk)^{-1} ∇f(xk)
Otherwise,

d* = −(H(xk) + λI)^{-1} ∇f(xk),     ‖d*‖ ≤ ∆k

When λ varies between 0 and ∞, the search direction d(λ) varies between the Newton direction dN and a multiple of −∇f(xk).

Trust Region

Trust Region: Solution of the Subproblem

(Figure: the trust-region step p(λ) varying between the Newton direction pN and the steepest-descent direction −g as the radius shrinks; from Niclas Börlin's trust-region lecture notes, 2007.)

where pN is the Newton direction and −g = −∇f(x).


Trust Region

Trust Region: Alternatives

Steepest Descent: first order approximation,

φ(d) = f(xk) + ∇f(xk)^T d

analytical solution: dk = −(∆k / ‖∇f(xk)‖) ∇f(xk)

Quasi Newton: second order approximation,

φ(d) = f(xk) + ∇f(xk)^T d + (1/2) d^T Bk d
where Bk is a Hessian approximation (DFP, BFGS).

Trust Region

Trust Region Radius

Given a direction dk define a ratio:

ρk = ( f(xk) − f(xk + dk) ) / ( φ(0) − φ(dk) )

The numerator is called the actual reduction and the denominator the predicted reduction.

the predicted reduction φ(0) − φ(dk) is always non-negative (ρk itself may be negative if the step increases f);
if ρk is close to one there is good agreement with the quadratic approximation, and the trust region is expanded;
if ρk is close to zero, the trust region radius is shrunk.


Trust Region

Trust Region: Algorithm


Choose x0, ∆̄ > 0, ∆0 ∈ (0, ∆̄), tol, and η ∈ [0, 0.25)
For k = 0, 1, 2, . . .
  If ‖∇f(xk)‖ < tol, STOP with solution xk;
  Obtain dk by solving the subproblem (exactly, or by any approximation);
  Evaluate ρk;
  If ρk < 0.25, then ∆k+1 = 0.25 ∆k
  else
    If ρk > 0.75 and ‖dk‖ = ∆k, then ∆k+1 = min(2∆k, ∆̄)
    else ∆k+1 = ∆k
    end
  end
  If ρk > η, then xk+1 = xk + dk
  else xk+1 = xk
  end
end (for)
Each iteration also defines the trust region radius ∆k+1 for the next iteration.


Trust Region

Subproblem Solution
In order to converge, the exact solution of the subproblem is not necessary. A crude approximation with sufficient reduction is enough.

Cauchy Point (slow): Solve the linear approximation. Note that this is the steepest descent method with a step of length ∆k:

d = −( ∆k / ‖∇f(xk)‖ ) ∇f(xk)

The Dogleg method (fast - superlinear): d = dC + τ(dN − dC)

dC = −( ∇f(xk)^T ∇f(xk) / (∇f(xk)^T Bk ∇f(xk)) ) ∇f(xk)        dN = −H(xk)^{-1} ∇f(xk)

The factor τ depends on dC, dN and ∆k.
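A Matlab sketch of the dogleg step (an illustration; g, B and Delta are the gradient, a positive definite Hessian approximation and the trust-region radius):

    function d = dogleg(g, B, Delta)
        dN = -(B\g);                          % full Newton step
        if norm(dN) <= Delta
            d = dN; return                    % Newton step already inside the region
        end
        dC = -((g'*g)/(g'*B*g))*g;            % unconstrained Cauchy step
        if norm(dC) >= Delta
            d = -(Delta/norm(g))*g; return    % steepest descent up to the boundary
        end
        v = dN - dC;                          % second leg, from dC towards dN
        a = v'*v;  b = 2*(dC'*v);  c = dC'*dC - Delta^2;
        tau = (-b + sqrt(b^2 - 4*a*c))/(2*a); % ||dC + tau*v|| = Delta, tau in [0,1]
        d = dC + tau*v;
    end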


Trust Region

Trust Region Example

Consider the example min f(x) = α exp(−β) (toy02.m), with the initial point x0 = (0.8, 0.2)^T:
Ex04a.m: For Cauchy Point (steepest descent) it converges in 44
iterations;
Ex04c.m: Newton with exact solution it converges in 6 iterations;
Ex04c2.m: Quasi Newton with exact solution it converges in 17
iterations;
Ex04d.m: Newton with dogleg it converges in 6 iterations;
Ex04e.m: Quasi Newton with dogleg it converges in 15 iterations;


Least Squares

Least Square Problems


Approximate solution of overdetermined systems minimizing the sum of
the squares of the errors.

minimize_β   f(β) = r(β)^T r(β)

where r is the error function and β are the unknown parameters.

Curve Fitting:
r (β) = y − f (x, β)
Solution:

∂f/∂β = 2 r^T (∂r/∂β) = 0

Note: For m equations we can determine at most m parameters.


Least Squares

Least Square Problems


For linear error function: r = y − A(x)β

Analytical Solution: β* = (A^T A)^{-1} A^T y

Applications:
Fitting of Linear functions:

ri = yi − (β1 xi + β2 )

Fitting of functions that are linear in the parameters (e.g. polynomials):

ri = yi − (β1 f1(xi) + . . . + βn fn(xi))

Note: Transformation is useful, e.g. y = β1 e^{β2 x}  ⇒  ln(y) = ln(β1) + β2 x
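A Matlab sketch of a linear fit through the normal equations (the data below is hypothetical, only to show the mechanics):

    x = (1:6)';                              % hypothetical independent variable
    y = [2.1; 3.9; 6.2; 7.8; 10.1; 11.9];    % hypothetical measurements
    A = [x, ones(size(x))];                  % design matrix for y = beta1*x + beta2
    beta = (A'*A)\(A'*y);                    % beta* = (A'A)^{-1} A'y
    % Numerically, beta = A\y (QR factorization) is usually preferred.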



Least Squares

Least Square Problems

It is possible to enforce different weights for each equation.

minimize_β   f(β) = (W r(β))^T W r(β)

where W = diag(w1, . . . , wm) and wi is the weight of equation i.

For linear r, there is an analytical solution:

β* = (A^T W^T W A)^{-1} A^T W^T W y

Nonlinear Least Squares: the Gauss-Newton step is ∆β* = −(J^T J)^{-1} J^T r, where J = ∇β r is the Jacobian of the residuals.
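A sketch of the Gauss-Newton iteration in Matlab (rfun and Jfun are assumed handles returning the residual vector and its Jacobian):

    function beta = gauss_newton(rfun, Jfun, beta)
        for k = 1:50
            r = rfun(beta);  J = Jfun(beta);
            dbeta = -((J'*J)\(J'*r));        % Gauss-Newton step
            beta  = beta + dbeta;
            if norm(dbeta) < 1e-8, break, end
        end
    end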

Least Squares

Applications in Chemical Engineering

Kinetic models, e.g.

−rA = (k1 CA − k2 CB) / (1 − k3 CA CB)

Thermodynamic models, e.g.

ln(P^SAT) = A − B/(T + C)

Empirical models, e.g.

h = k Re^α Pr^{1/3}


Least Squares

Example

Example 1: For given data (Exa1.m) estimate A and B:

log(P^SAT) = A − B/T

Example 2: For given experimental data (Exa2.m) estimate α and β:

Nu = α Re^β
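Example 2 can be linearized with the log transformation, ln(Nu) = ln(α) + β ln(Re); a Matlab sketch with hypothetical data (the actual data in Exa2.m is not shown here):

    Re = [1e3; 5e3; 1e4; 5e4; 1e5];          % hypothetical Reynolds numbers
    Nu = [12; 35; 55; 160; 250];             % hypothetical Nusselt numbers
    A  = [ones(size(Re)), log(Re)];          % ln(Nu) = ln(alpha) + beta*ln(Re)
    p  = A\log(Nu);                          % linear least squares
    alpha = exp(p(1));  beta = p(2);
    fprintf('Nu = %.3f * Re^%.3f\n', alpha, beta);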


Final Remarks

Final Remarks

Stationary points can be found by solving a non-linear system;


The solution can be found by direct or indirect methods;
Direct methods use only objective function evaluations;
Indirect solution strategies: Line Search and Trust Region;
An important application of unconstrained optimization is curve
fitting and parameter estimation for stationary models.

Final Remarks

Further Readings

Numerical Optimization. Nocedal and Wright (1999): Chapters 1, 2, 3, 4 and 5.
Optimization of Chemical Processes. Edgar, Himmelblau and Lasdon (1995): Chapters 5 and 6.
Linear and Nonlinear Programming. Luenberger (2008): Chapters 7, 8, 9 and 10.
Nonlinear Programming. Biegler (2010): Chapters 2 and 3.

Final Remarks

Help, Comments, Suggestions


Personal Information

Just in case, contact:

Marcelo Escobar Aragão


(Teaching Assistant)

Department of Chemical Engineering


Federal University of Rio Grande do Sul
Porto Alegre ‐ RS
Phone: 55 51 3308 4163
Mobile: 55 51 9684 4213
email: escobar029@hotmail.com



This presentation is over

Thank You for your attention!!!!

