
Structural and Multidisciplinary Optimization

Homework 1 – Unconstrained Optimization

Student: Dumitrașcu Celsia – Alexandra


S202355
Contents

1. Optimization Methods & Line Search Methods
2. The minimized functions
3. Initial Points
4. Convergence Criteria
5. Explanation of the Optimization Algorithms Used
6. Strengths and Weaknesses of the Analysed Methods
7. Annexes


1. Optimization Methods & Line Search Methods

Optimization Methods
1. Steepest Descent
2. Conjugate Gradients
3. Newton Basic
4. Newton-like
5. Quasi-Newton method – Broyden-Fletcher-Goldfarb-Shanno (BFGS)

Line Search Methods


1. Dichotomy – used in this report for the second function, in the Newton-like method
2. Quadratic Interpolation – used in this report for the second function, in all the methods
3. Newton-Raphson method
4. Secant method

The maximum number of iterations used is 70 and the tolerance is 1e-3.


2. The minimized functions

2.1 Function 1

The first function discussed in this report is a strictly convex quadratic function, as can be seen in Fig. 1.

Fig.1 Function 1. The Global Minimizer.

The gradient for this function is:

g = [ 4x + 3y − 2 ;
      4y + 3x + 10 ]
The Hessian matrix has been calculated and it is symmetric positive definite:

Hessian = [ 4  3
            3  4 ]

For this function, no line search method is needed, but one can still be used.
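For reference, a minimal code sketch of this function is given below. It assumes the quadratic form f1(x, y) = 2x² + 3xy + 2y² − 2x + 10y − 1, which is inferred here from the stated gradient and the x = y reduction in Section 3; the report itself does not give f1 explicitly.

import numpy as np

# f1 is assumed here as a quadratic form inferred from the stated gradient
# g = (4x + 3y - 2, 4y + 3x + 10); the report does not state f1 explicitly.
def f1(p):
    x, y = p
    return 2 * x**2 + 3 * x * y + 2 * y**2 - 2 * x + 10 * y - 1

def grad_f1(p):
    x, y = p
    return np.array([4 * x + 3 * y - 2, 4 * y + 3 * x + 10])

# Constant, symmetric positive definite Hessian
H1 = np.array([[4.0, 3.0],
               [3.0, 4.0]])

# Stationary point: solve H1 @ p = (2, -10)
p_star = np.linalg.solve(H1, np.array([2.0, -10.0]))
print(p_star, f1(p_star))   # approximately [5.4286, -6.5714] and -39.2857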

2.2 Function 2
Unlike the first function, the second one is not a strictly convex quadratic function, and its Hessian matrix is not symmetric positive definite. The gradient and the Hessian matrix (Hm) are given below.

g = [ 0.8x − 6(π/2)·sin((π/2)x + 1) ;
      0.8y + 3(π/2)·sin((π/2)y − 1) ]

Hm = [ 0.8 − 6(π²/4)·cos((π/2)x) ,  0 ;
       0 ,  0.8 + 3(π²/4)·sin((π/2)y) ]

For this function, the optimization algorithms need a line search method. The main line search method used in this report is Quadratic Interpolation. However, for the Newton-like optimization method, I used both Dichotomy and Quadratic Interpolation.

Fig. 2 Function 2. Global and Local Minimizers


3. Initial Points

In order to define the initial point which will give us the minimum value
for the objective function, I made the following assumption: x=y.
Therefore, the studied functions become:
f1(x, x) = 7x² + 8x − 1        f2(x, x) = 0.8x² + 3cos((π/2)x)

Next, we calculate the second derivatives of these functions, which leads to the following results:

∂²f1/∂x² = 14 > 0

∂²f2/∂x² = 1.6 − 3(π²/4)cos((π/2)x) > 0, for x ≤ −1

Now, in order to find the x coordinate of the initial point, we impose that the first derivatives of f1 and f2 must be equal to 0:

∂f1/∂x = 14x + 8 = 0  =>  x = −0.5714, y = −0.5714

∂f2/∂x = 1.6x − 3(π/2)sin((π/2)x) = 0  =>  x = −1.75265, y = −1.75265

The solutions should be close to these initial points; that is, the global minimum values of the objective functions are expected around these points. For a better understanding of how the methods work, several initial points have been chosen (as will be seen in the Annexes). In this report, the results for the following initial points will be analysed:
• (0,0) for both functions – for the first function it is a feasible point, but for the second one the Hessian matrix is not positive definite. Some methods do not give good results because of this, but it was a good way to check whether the code is correct and to see how the program behaves.
• (10,10) for both functions – to see the difference in the number of iterations and to show the convergence when approaching from the right.
• (-0.5714, -0.5714) for the first function – the point for which the first derivative of f1 is 0.
• (-1.75265, -1.75265) for the second function – the point for which the first derivative of f2 is 0.
4. Convergence Criteria

Convergence Criteria:
• for the first function

max_{i=1,…,n} | ∂f(xk+1)/∂xi | < ε
It is a strictly convex quadratic function and this criterion gives the best computational time.

• for the second function

‖∇f(xk+1)‖² = Σ_{i=1}^{n} [ ∂f(xk+1)/∂xi ]² < ε

The second function is not strictly convex quadratic, so I chose to check the convergence through the magnitude of the gradient at the next point. These criteria are used for all the methods in the present report.
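As a minimal sketch (the names here are illustrative, not the report's own code), the two stopping tests could be written as:

import numpy as np

def converged_f1(grad, eps=1e-3):
    # First function: largest gradient component in absolute value
    return np.max(np.abs(grad)) < eps

def converged_f2(grad, eps=1e-3):
    # Second function: squared Euclidean norm of the gradient
    return float(np.dot(grad, grad)) < eps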

5. Explanation of the Optimization Algorithms Used

5.1 The Steepest Descent Method

For this method, we first take an initial guess X0, which in my case is (0,0). Using the iterative scheme, the program moves along the descent direction until it reaches the optimum value of the objective function.

Sk = −∇fk = −∇f(Xk)

Where:
• Sk = search direction
• ∇fk = gradient of the function at point Xk
• Xk = the analyzed point

The search direction is defined as minus the gradient because I am searching for the minimum of the functions, so I need to go downwards. To calculate the next point in the search for the optimum, the formula below is used:

Xk+1 = Xk + αSk = Xk – α∇ f k

Where:
• α = the optimal step length in the defined direction

In this method, two successive search directions are orthogonal. If the initial point is not chosen close to the optimal point, the convergence is slow. For the first function (because it is strictly convex quadratic), the method converges to the stationary point of the function (the global minimum). However, for the second function, if the initial point is not chosen well, the method will converge to a local minimum.

Line Search:
• f1 – since it is a SCQF, no line search was needed; the exact step length is

α = ‖g‖² / (s' Hm s)

Where:
s' = the direction vector transposed
Hm = the Hessian matrix
s = the direction vector

• f2 – the line search used is Quadratic Interpolation (a generic sketch is given below)
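The following is a minimal three-point quadratic interpolation sketch; it is an assumption about the implementation, since the report's own code is not shown. It samples φ(α) = f(X + α·s) at α = 0, h and 2h, and returns the minimizer of the fitted parabola.

def quad_interp_step(phi, h=1.0):
    # Sample phi(alpha) = f(X + alpha*s) at alpha = 0, h and 2h
    y1, y2, y3 = phi(0.0), phi(h), phi(2.0 * h)
    curvature = y1 - 2.0 * y2 + y3          # > 0 for a convex parabolic fit
    if curvature <= 1e-12:
        return h                            # degenerate fit: fall back to the middle sample
    # Vertex of the parabola through the three equally spaced samples
    return h * (4.0 * y2 - 3.0 * y1 - y3) / (2.0 * (2.0 * y2 - y1 - y3))

# Example usage along a descent direction s at the current point X (hypothetical f2, X, s):
# alpha = quad_interp_step(lambda a: f2(X + a * s))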

Algorithm steps
1. Initialization
a) Choose initial point, x0, which needs to have real coordinates
b) Set k=0

2. Find the direction


a) Compute sk such that sk' ∇f(Xk) < 0

3. Line Search (for function 2)


a) Compute αk such that f(Xk + αk sk) = min_{α ≥ 0} f(Xk + α sk)

4. Update the point


Xk+1 = Xk + αk sk
5. Convergence Check
a) Satisfied: Stop => X*≈ Xk+1
b) Unsatisfied: Set k=k+1 and reiterate
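Below is a minimal steepest descent sketch for the first function, using the exact step length for a quadratic. The gradient is the one stated in Section 2.1; the function and variable names are illustrative.

import numpy as np

H1 = np.array([[4.0, 3.0], [3.0, 4.0]])            # Hessian of f1
def grad_f1(p):                                     # gradient of f1 (as stated in Section 2.1)
    return H1 @ p + np.array([-2.0, 10.0])

def steepest_descent(x0, tol=1e-3, max_iter=70):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f1(x)
        if np.max(np.abs(g)) < tol:                 # convergence criterion for f1
            break
        s = -g                                      # steepest descent direction
        alpha = (g @ g) / (s @ H1 @ s)              # exact step length for a quadratic
        x = x + alpha * s
    return x, k

print(steepest_descent([0.0, 0.0]))                 # approaches [5.4286, -6.5714]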

The table above shows the results for the three initial points in the case of the first function. It is easy to see (and just as expected) that for the second point, which is the closest to X*, the program needed only 2 iterations to reach the global minimum. The large number of iterations needed for the first initial point ([0,0]) is caused by the asymptotic behaviour of the function at that point. The higher the value of the derivative at a point, the faster the convergence.

Since the second function is not a strictly convex quadratic function, the method does not converge to a single stationary point: the function also has local minimizers, which can be noticed in the table above as well. As can be seen, for X0 = [0,0] the method converges to a local minimizer, while for X0 = [10,10] it reaches the global minimum. However, for the initial point closest to X*, it gets remarkably close to the global minimizer, yet not exactly there.
The cosine creates waves in the x direction, which makes it difficult to predict how fast the convergence will be. Therefore, for X0 = [0,0] the program needed 21 iterations, while for the other two initial points the global minimum was reached in only 6 iterations.
The zig-zagging behaviour is noticed again.

5.2 The Conjugate Gradients Method


The previous method presents a zig-zagging behaviour, which the Conjugate Gradients method is meant to solve. The first direction of this method is the steepest descent direction (d0 = −g0), while the following directions are calculated as shown below:

dk+1 = −gk+1 + βk dk

where:
• dk+1 = the direction for the new point
• dk = the direction of the current point
• βk = ‖∇f(xk+1)‖² / ‖∇f(xk)‖²
The directions of this method are mutually conjugate with respect to the Hessian matrix. This method uses a reinitialization process after n iterations, where "n" is the number of the function's variables, which in my case is 2.

Algorithm Steps

1. Initialization
a) Choose initial point, X0, which needs to have real coordinates
b) Set k=0

2. Find the direction


a) Set d0 = -g0

3. Iteration k
a) Xk+1 = Xk + αk dk

b) Compute αk = −(dk' gk) / (dk' Hm dk)

c) dk+1 = −gk+1 + βk dk, with

βk = (gk+1' Hm dk) / (dk' Hm dk)
d) Set k= k+1 and start the iteration again
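A minimal sketch of this loop for the quadratic first function, using the Hessian-based αk and βk above with a periodic restart every n steps (illustrative names; the gradient is the one stated in Section 2.1):

import numpy as np

H1 = np.array([[4.0, 3.0], [3.0, 4.0]])                 # Hessian of f1
grad_f1 = lambda p: H1 @ p + np.array([-2.0, 10.0])     # gradient of f1

def conjugate_gradients(x0, tol=1e-3, max_iter=70, n=2):
    x = np.asarray(x0, dtype=float)
    g = grad_f1(x)
    d = -g                                              # first direction: steepest descent
    for k in range(max_iter):
        if np.max(np.abs(g)) < tol:
            break
        alpha = -(d @ g) / (d @ H1 @ d)                 # exact step along d
        x = x + alpha * d
        g_new = grad_f1(x)
        if (k + 1) % n == 0:
            d = -g_new                                  # periodic re-initialization after n steps
        else:
            beta = (g_new @ H1 @ d) / (d @ H1 @ d)      # Hessian-based beta
            d = -g_new + beta * d
        g = g_new
    return x, k

print(conjugate_gradients([0.0, 0.0]))                  # converges in 2 iterations for f1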
For the first function, the method converges to the global minimizer because the function is strictly convex quadratic. The asymptotic behaviour observed at the point X0 = [0,0] disappears, which leads to a faster convergence. Moreover, using this method, the global minimum of the objective function is reached no matter what the initial point is. The zig-zagging problem disappears, which fits the theory.

Since the second function is not a SCQF, I used the Quadratic Interpolation line search method. The fact that the function is not strictly convex quadratic also leads to a higher number of iterations until the global minimizer is reached.
In order to reach global convergence, the process needs to be periodically re-initialized. If "n" is not chosen properly, the convergence will be slow.
The behaviour of the path at X0 = [0,0], as well as the large number of iterations, can only be explained by errors in the code; this case is still to be studied and repaired. In the path for X0 = [10,10] it can be observed that there is no zigzag, just as expected.
Even with such a large number of iterations, it can be seen that this method actually leads to the global minimizer, while the Steepest Descent method stops at a local minimizer.
STEEPEST DESCENT VS CONJUGATE GRADIENTS

Steepest Descent | Conjugate Gradients
Two successive search directions are orthogonal | Search directions are mutually conjugate with respect to the Hessian matrix
Zig-zag path | No "zig-zag"
Converges well | Takes fewer iterations
Does not globally converge for non-strictly convex quadratic functions | Requires periodic re-initialization for global convergence
Converged for f1 in 2 iterations | Converged for f1 in 2 iterations
Converged for f2 in 6 iterations | Converged for f2 in 11 iterations

5.3 Basic Newton Method

This method takes a more direct path than the Conjugate Gradients method. However, it requires the Hessian matrix at each iteration to find the extrema. It is perfect for SCQF and for functions with a positive definite Hessian matrix; otherwise it will not reach the global minimizer. As will be seen further in this report, for the second function it found only local minimizers. This is because this method does not use a line search.
In this method, α is not computed for the search direction. Instead, the
inverse of the Hessian matrix is multiplied by the gradient. The descent
direction for the Basic Newton method is given by the formula shown below:

Xk+1 = Xk − [∇²f(Xk)]⁻¹ ∇f(Xk)
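A minimal sketch of this iteration (illustrative names; the linear system is solved instead of explicitly inverting the Hessian):

import numpy as np

def newton_basic(grad, hess, x0, tol=1e-3, max_iter=70):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.max(np.abs(g)) < tol:
            break
        # Full Newton step: solve the linear system instead of inverting the Hessian
        x = x - np.linalg.solve(hess(x), g)
    return x, k

# Example on f1 (gradient as stated in Section 2.1): one Newton step suffices
H1 = np.array([[4.0, 3.0], [3.0, 4.0]])
x_star, iters = newton_basic(lambda p: H1 @ p + np.array([-2.0, 10.0]),
                             lambda p: H1, [0.0, 0.0])
print(x_star, iters)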

It is easy to notice that, for the first function, this method reaches the global minimizer in just 1 iteration. The convergence to the global minimizer from both sides (left and right) of X* was verified, even if X0 is not close to the said global minimizer (as shown in the Annexes, X0 = [100, 100] is convergent).

Even from the path of the method, it can be seen that the method goes straight for X*.
For the second function, the Newton Basic method reaches only local minimizers, and which one is found depends on the initial point chosen. By running the method for only one initial point, it cannot be said whether it reached the global minimum or not.

5.4 Newton-like
The Newton-like method is the first adaptation of the Newton Basic method. Compared to the basic one, this method also converges for non-SCQF. In order for a non-SCQF to converge, the following adaptation is used in this method:

Xk+1 = Xk − αk [∇²f(Xk)]⁻¹ ∇f(Xk)

Algorithm Steps

1. Choose αk so as to obtain the minimum of f(Xk + α sk) along the descent direction.
2. If f(Xk + sk) < f(Xk), set αk = 1.
3. If f(Xk + sk) ≥ f(Xk), use the line search method to obtain an updated αk such that it satisfies

f(Xk + αk sk) < f(Xk)
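A minimal sketch of these steps (illustrative names; a simple step-halving loop stands in here for the Dichotomy / Quadratic Interpolation line searches used in the report):

import numpy as np

def newton_like(f, grad, hess, x0, tol=1e-3, max_iter=70):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.max(np.abs(g)) < tol:
            break
        s = -np.linalg.solve(hess(x), g)            # Newton direction
        alpha = 1.0
        # If the full step does not decrease f, shrink alpha until it does
        while f(x + alpha * s) >= f(x) and alpha > 1e-8:
            alpha *= 0.5
        x = x + alpha * s
    return x, k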

In the case of the first function, the results are similar to the Newton Basic method: the global minimizer is reached in a single iteration, and the convergence is verified no matter how close or how far the chosen initial point is from X*.
However, for the second function, the method reaches the minimum only if the chosen X0 is close to X*. In the table above it can be seen that, just as Newton Basic, the method stops at the first local minimizer met on the path. The Dichotomy does not give a good result for the initial point [10, 10] because of the Hessian matrix, so I used Quadratic Interpolation for that point.
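For reference, a generic dichotomy (interval-halving) line search sketch of the kind mentioned above might look as follows; this is an assumption about the implementation, not the report's own code:

def dichotomy_step(phi, a=0.0, b=1.0, delta=1e-4, tol=1e-3, max_iter=70):
    # Shrink the bracket [a, b] around the minimizer of phi(alpha) = f(X + alpha*s)
    for _ in range(max_iter):
        if (b - a) < tol:
            break
        mid = 0.5 * (a + b)
        if phi(mid - delta) < phi(mid + delta):
            b = mid + delta        # the minimizer lies in the left half
        else:
            a = mid - delta        # the minimizer lies in the right half
    return 0.5 * (a + b)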

5.5 Quasi-Newton Method

In this method, the approximation of the inverse Hessian is updated at each iteration with the BFGS formula, where δk = Xk+1 − Xk and γk = ∇fk+1 − ∇fk:

Hk+1 = Hk + (1 + (γk' Hk γk)/(δk' γk)) · (δk δk')/(δk' γk) − (δk γk' Hk + Hk γk δk')/(δk' γk)

When the initial approximation H0 is unknown, it is initialized as the identity matrix. Unlike the Newton Basic method, this method needs more iterations to find the minimum, and the convergence path has a lower precision.
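A minimal sketch of this scheme (illustrative names; a fixed unit step replaces the report's line search, and the update is skipped when δk'γk is close to zero):

import numpy as np

def bfgs(grad, x0, tol=1e-3, max_iter=70):
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                       # H0: identity when the inverse Hessian is unknown
    g = grad(x)
    for k in range(max_iter):
        if np.max(np.abs(g)) < tol:
            break
        s = -H @ g                           # quasi-Newton direction
        x_new = x + s                        # alpha = 1 here; a line search would refine this
        g_new = grad(x_new)
        delta, gamma = x_new - x, g_new - g
        dg = float(delta @ gamma)
        if abs(dg) > 1e-12:                  # skip the update if the curvature term is degenerate
            Hg = H @ gamma
            H = (H
                 + (1.0 + (gamma @ Hg) / dg) * np.outer(delta, delta) / dg
                 - (np.outer(delta, Hg) + np.outer(Hg, delta)) / dg)
        x, g = x_new, g_new
    return x, k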

For the first function, it is noticeable that the method reaches X*, but it takes 2 iterations to get there.
Unlike the Newton-like method, this one reached the global minimum when the chosen initial point is close to X*. However, the results suggest that the minimizer found by the method depends on the initial point.

NEWTON: BASIC VS LIKE VS QUASI

Newton Basic | Newton-like | Quasi-Newton
The Hessian matrix must be calculated | The Hessian matrix must be calculated | Needs just a positive definite matrix
If the function doesn't have a positive definite Hessian, there will be problems in the computation | If the function doesn't have a positive definite Hessian, there will be problems in the computation | -
- | Requires Line Search | Requires Line Search

For the non-SCQF, the methods don't reach the global minimizer.

6. Strengths and Weaknesses of the Analysed Methods

Method | Strengths | Weaknesses
Steepest Descent | The simplest gradient-based method | Large number of iterations; zig-zag behaviour of the direction
Conjugate Gradients | Straight path to X* | Requires periodic reinitialization
Newton Basic | Smallest number of iterations | Not globally convergent for f2
Newton-like | Reaches the global minimizer | Works only if the Hessian is positive definite
Quasi-Newton | Fewer iterations than Newton-like | Needs a large amount of storage
7. Annexes

1. Steepest Descent, function 1, X0 = [0,0], X* = [5.4282, -6.571], Obj. F = -39.2857.

2. Conjugate Gradients, function 1, X0 = [0,0], X* = [5.4286, -6.5714],


Obj. F = -39.2857.
3. Newton Basic, function 1, X0 = [0,0], X* = [5.4286, -6.5714], Obj. F = -39.2857.

4. Newton Like, function 1, X0 = [0,0], X* = [5.4286, -6.5714], Obj. F = -39.2857.


5.
6. Quasi-Newton, function 1, X0 = [0,0], X* = [5.4286, -6.5714], Obj. F = -39.2857.

7. Steepest Descent, function 2, X0 = [0,0], X* = [-1.962, 3.7213], Obj. F = - 7.3107.


8. Conjugate Gradients, function 2, X0 = [0,0], X* = [-1.9611, 0.12376], Obj. F =
-9.4727.

9. Newton Basic, function 2, X0 = [0,0], X* = [0.071406, 0.12192], Obj. F = 2.9746.


10. Newton Like, function 2, Dichotomy, X0 = [0,0], X* = [0.071406, 0.12129], Obj
F = 2.9746.

11. Quasi Newton, function 2, X0 = [0,0], X* = [-5.7497, 0.12328], Obj F = -1.1294.


12. Steepest Descent, function 1, X0 = [-0.5714, -0.5714], X* = [5.4286, -6.5714], Obj.
F = -39.2857

13. Conjugate Gradients, function 1, X0 = [-0.5714, -0.5714], X* = [5.4286, -6.5714],


Obj. F = -39.2857
14. Newton Basic, function 1, X0 = [-0.5714, -0.5714], X* = [5.4286, -6.5714],
Obj. F = -39.2857

15. Newton Like, function 1, X0 = [-0.5714, -0.5714], X* = [5.4286, -6.5714], Obj. F = -39.2857
16. Quasi-Newton, function 1, X0 = [-0.5714, -0.5714], X* = [5.4286, -6.5714], Obj. F =
-39.2857

17. Steepest Descent, function 1, X0 = [100, 100], X* = [5.4284, -6.5713],


Obj. F =-39.2857
18. Conjugate Gradients, function 1, X0 = [100, 100], X* = [5.4284, -6.5713],
Obj. F =-39.2857

19. Newton Basic, function 1, X0 = [100, 100], X* = [5.4284, -6.5713], Obj.


F =-39.2857
20. Newton Like, function 1, X0 = [100, 100], X* = [5.4284, -6.5713], Obj. F =-39.2857

21. Quasi-Newton, function 1, X0 = [100, 100], X* = [5.4284, -6.5713],


Obj. F =-39.2857
22. Steepest Descent, function 1, X0 = [10, 10], X* = [5.4284, -6.5713],
Obj. F =-39.2857

23. Conjugate Gradients, function 1, X0 = [10, 10], X* = [5.4284, -6.5713],


Obj. F =-39.2857
24. Newton Basic, function 1, X0 = [10, 10], X* = [5.4286, -6.5714], Obj. F =-39.2857

25. Newton Like, function 1, X0 = [10, 10], X* = [5.4286, -6.5714], Obj. F =-39.2857
26. Quasi-Newton, function 1, X0 = [10, 10], X* = [5.4286, -6.5714], Obj. F =-39.2857

27. Steepest Descent, function 2, X0 = [10, 10], X* = [-1.9622, 0.1241],


Obj. F = -9.4727
28. Conjugate Gradients, function 2, X0 = [10, 10], X* = [-1.961, 0.12396],
Obj. F = -9.4727

29. Newton Basic, function 2, X0 = [10, 10], X* = [9.2943, 0.12234], Obj. F = 38.1108
30. Newton Like, Quadratic Interpolation, function 2, X0 = [10, 10], X* = [1.8315,
3.7241], Obj. F = -3.517

31. Newton Like, Dichotomy, function 2, X0 = [10, 10], X* = [0.071406, 0.12129],


Obj. F = 2.9746
32. Quasi-Newton, function 2, X0 = [10, 10], X* = [-1.9598, 0.12387], Obj. F = -9.4726
33. Centralization of the results
