The equations that the matrices $H_k$ are required to satisfy do not determine
those matrices uniquely. Thus, we have some freedom in the way we compute
the $H_k$. In the following methods, we compute $H_{k+1}$ by adding a correction to
$H_k$:
The rank one correction formula
The rank two correction formula
The DFP algorithm, originally developed by Davidon in 1959 and then modified
by Fletcher and Powell in 1963
The BFGS algorithm (Broyden, Fletcher, Goldfarb, and Shanno, 1970)
In the rank one correction formula, the correction term is symmetric, and
has the form
$$a_k z^{(k)} z^{(k)T},$$
where $a_k \in \mathbb{R}$ and $z^{(k)} \in \mathbb{R}^n$. Therefore, the update equation is
$$H_{k+1} = H_k + a_k z^{(k)} z^{(k)T}.$$
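Note that
$$\operatorname{rank}\left(a_k z^{(k)} z^{(k)T}\right) = \operatorname{rank}\left(\begin{bmatrix} z_1^{(k)} \\ \vdots \\ z_n^{(k)} \end{bmatrix}\begin{bmatrix} z_1^{(k)} & \cdots & z_n^{(k)} \end{bmatrix}\right) = 1,$$
and hence the name "rank one" correction (it is also called the single-rank
symmetric (SRS) algorithm).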
If $a$ and $b$ are two nonzero column vectors, then any matrix of the form $ab^T$ is a
matrix of rank one (reducing it to row echelon form gives only one nonzero
row).
The product
$$z^{(k)} z^{(k)T}$$
is sometimes referred to as the dyadic product or outer product.
Observe that if $H_k$ is symmetric, then so is $H_{k+1}$.
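The two observations above (the correction term has rank one, and symmetry is preserved) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the values of `z`, `a`, and `H` are illustrative and are not taken from the text.

```python
import numpy as np

# Illustrative values only; z, a and H are assumptions, not data from the notes.
z = np.array([1.0, -2.0, 3.0])
a = 0.5
H = np.diag([2.0, 1.0, 4.0])          # a symmetric H_k

correction = a * np.outer(z, z)       # the term a_k z z^T
H_next = H + correction               # H_{k+1} = H_k + a_k z z^T

print(np.linalg.matrix_rank(correction))   # rank of the correction term
print(np.allclose(H_next, H_next.T))       # symmetry of H_{k+1}
```

Here `np.outer` forms the dyadic product directly, so no explicit reshaping of `z` into a column is needed.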
Our goal now is to determine $a_k$ and $z^{(k)}$, given
$H_k$, $\Delta g^{(k)}$, and $\Delta x^{(k)}$,
so that the required relationship
$$H_{k+1}\Delta g^{(i)} = \Delta x^{(i)}, \quad 0 \le i \le k,$$
is satisfied.
To begin, let us first consider the condition
$$H_{k+1}\Delta g^{(k)} = \Delta x^{(k)},$$
that is, the case $i = k$.
In other words, given $H_k$, $\Delta g^{(k)}$, and $\Delta x^{(k)}$, we wish to find $a_k$ and $z^{(k)}$ to
ensure that
$$H_{k+1}\Delta g^{(k)} = (H_k + a_k z^{(k)} z^{(k)T})\Delta g^{(k)} = \Delta x^{(k)}.$$
First note that $z^{(k)T}\Delta g^{(k)}$ is a scalar, that is, a $(1 \times n)(n \times 1)$ product (the inner product of
two vectors).
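Thus,
$$\Delta x^{(k)} - H_k\Delta g^{(k)} = \left(a_k z^{(k)T}\Delta g^{(k)}\right) z^{(k)},$$
and hence
$$z^{(k)} = \frac{\Delta x^{(k)} - H_k\Delta g^{(k)}}{a_k\left(z^{(k)T}\Delta g^{(k)}\right)}.$$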
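Using this expression for $z^{(k)}$, we find $a_k z^{(k)} z^{(k)T}$ as follows:
$$a_k z^{(k)} z^{(k)T} = \frac{(\Delta x^{(k)} - H_k\Delta g^{(k)})(\Delta x^{(k)} - H_k\Delta g^{(k)})^T}{a_k\left(z^{(k)T}\Delta g^{(k)}\right)^2}.$$
Hence,
$$H_{k+1} = H_k + \frac{(\Delta x^{(k)} - H_k\Delta g^{(k)})(\Delta x^{(k)} - H_k\Delta g^{(k)})^T}{a_k\left(z^{(k)T}\Delta g^{(k)}\right)^2}.$$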
The next step is to express the denominator of the second term on the right-hand
side of the above equation as a function of the given quantities $H_k$, $\Delta g^{(k)}$, and
$\Delta x^{(k)}$; that is, to eliminate $a_k$ and $z^{(k)}$.
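For this, premultiply the equation
$$\Delta x^{(k)} - H_k\Delta g^{(k)} = \left(a_k z^{(k)T}\Delta g^{(k)}\right) z^{(k)}$$
by $\Delta g^{(k)T}$ to obtain
$$\Delta g^{(k)T}(\Delta x^{(k)} - H_k\Delta g^{(k)}) = a_k\left(z^{(k)T}\Delta g^{(k)}\right)^2.$$
Substituting this into the expression for $H_{k+1}$ gives
$$H_{k+1} = H_k + \frac{(\Delta x^{(k)} - H_k\Delta g^{(k)})(\Delta x^{(k)} - H_k\Delta g^{(k)})^T}{\Delta g^{(k)T}(\Delta x^{(k)} - H_k\Delta g^{(k)})}.$$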
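The resulting update, $H_{k+1} = H_k + vv^T/(\Delta g^{(k)T} v)$ with $v = \Delta x^{(k)} - H_k\Delta g^{(k)}$, can be sketched in a few lines. This is an illustration under stated assumptions: the function name `rank_one_update` and the sample vectors are hypothetical, and the code assumes the denominator is nonzero.

```python
import numpy as np

def rank_one_update(H, dx, dg):
    """Return H + v v^T / (dg^T v), where v = dx - H dg.

    Hypothetical helper; assumes dg^T v is not zero.
    """
    v = dx - H @ dg
    denom = dg @ v
    return H + np.outer(v, v) / denom

# Hypothetical data, chosen only so that the denominator is nonzero.
H = np.eye(2)
dx = np.array([1.0, 0.5])
dg = np.array([3.0, 1.0])

H_next = rank_one_update(H, dx, dg)
print(np.allclose(H_next @ dg, dx))   # checks H_{k+1} dg = dx
```

By construction the update reproduces the condition for $i = k$ only; whether it also holds for earlier indices is a separate question.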
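We summarize the above development in the following algorithm.
Rank One Algorithm
1. Set $k := 0$; select $x^{(0)}$ and a real symmetric positive definite $H_0$.
2. If $g^{(k)} = 0$, stop; else
$$d^{(k)} = -H_k g^{(k)}.$$
3. Compute
$$\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)}), \qquad x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}.$$
4. Compute
$$\Delta x^{(k)} = \alpha_k d^{(k)}, \qquad \Delta g^{(k)} = g^{(k+1)} - g^{(k)},$$
$$H_{k+1} = H_k + \frac{(\Delta x^{(k)} - H_k\Delta g^{(k)})(\Delta x^{(k)} - H_k\Delta g^{(k)})^T}{\Delta g^{(k)T}(\Delta x^{(k)} - H_k\Delta g^{(k)})}.$$
Then set $k := k + 1$ and go to step 2.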
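The steps of the rank one algorithm can be sketched for a quadratic $f(x) = \frac{1}{2}x^TQx + c$, where the gradient is $Qx$ and the exact line search has the closed form $\alpha_k = -g^{(k)T}d^{(k)}/(d^{(k)T}Qd^{(k)})$. The function name `rank_one_quadratic`, the tolerances, and the safeguard that skips the update when the denominator is tiny are assumptions made for this sketch, not part of the algorithm as stated.

```python
import numpy as np

def rank_one_quadratic(Q, x0, H0, tol=1e-10, max_iter=50):
    """Rank one algorithm for f(x) = 0.5 x^T Q x + const (hypothetical helper)."""
    x = np.asarray(x0, dtype=float)
    H = np.asarray(H0, dtype=float)
    g = Q @ x
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:       # step 2: stop when the gradient vanishes
            break
        d = -H @ g                         # step 2: search direction
        alpha = -(g @ d) / (d @ Q @ d)     # step 3: exact line search for a quadratic
        dx = alpha * d                     # step 4: change in x
        x = x + dx
        g_new = Q @ x
        dg = g_new - g                     # step 4: change in the gradient
        v = dx - H @ dg
        denom = dg @ v
        if abs(denom) > 1e-12:             # safeguard (an assumption of this sketch)
            H = H + np.outer(v, v) / denom
        g = g_new
    return x, H

# Usage on a small quadratic with Q = diag(2, 1), x0 = [1, 2], H0 = I.
Q = np.array([[2.0, 0.0], [0.0, 1.0]])
x_min, H_final = rank_one_quadratic(Q, [1.0, 2.0], np.eye(2))
print(x_min, H_final)
```

The safeguard matters in practice: once $H_k$ already equals $Q^{-1}$, the vector $\Delta x^{(k)} - H_k\Delta g^{(k)}$ is zero and the update would otherwise divide zero by zero.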
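The rank one algorithm was constructed to satisfy $H_{k+1}\Delta g^{(k)} = \Delta x^{(k)}$, whereas what we require is $H_{k+1}\Delta g^{(i)} = \Delta x^{(i)}$ for all $0 \le i \le k$.
It turns out that, for a quadratic, the latter is in fact automatically true, as stated in the
following theorem.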
Theorem: For the rank one algorithm applied to the quadratic with
Hessian $Q = Q^T$, we have
$$H_{k+1}\Delta g^{(i)} = \Delta x^{(i)}, \quad 0 \le i \le k.$$
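Proof. We prove the result by induction. From the discussion before the
theorem it is clear that the claim is true for $k = 0$. Suppose now that the
theorem is true for $k - 1 \ge 0$; that is,
$$H_k\Delta g^{(i)} = \Delta x^{(i)}, \quad i < k.$$
We now show that the theorem is true for $k$. The construction of the correction term ensures that $H_{k+1}\Delta g^{(k)} = \Delta x^{(k)}$, so it remains to show that $H_{k+1}\Delta g^{(i)} = \Delta x^{(i)}$ for $i < k$.
To this end, fix $i < k$. Since $H_k\Delta g^{(i)} = \Delta x^{(i)}$ by the induction hypothesis, we have
$$H_{k+1}\Delta g^{(i)} = \Delta x^{(i)} + \frac{(\Delta x^{(k)} - H_k\Delta g^{(k)})(\Delta x^{(k)} - H_k\Delta g^{(k)})^T}{\Delta g^{(k)T}(\Delta x^{(k)} - H_k\Delta g^{(k)})}\Delta g^{(i)}.$$
To complete the proof it is enough to show that the second term on the right-hand
side of the above equation is equal to zero. For this to be true it is enough
that
$$(\Delta x^{(k)} - H_k\Delta g^{(k)})^T\Delta g^{(i)} = \Delta x^{(k)T}\Delta g^{(i)} - \Delta g^{(k)T}H_k\Delta g^{(i)} = 0.$$
Indeed, since
$$\Delta g^{(k)T}H_k\Delta g^{(i)} = \Delta g^{(k)T}\Delta x^{(i)}$$
by the induction hypothesis, and $\Delta g^{(k)} = Q\Delta x^{(k)}$ for a quadratic,
we have
$$\Delta g^{(k)T}H_k\Delta g^{(i)} = \Delta x^{(k)T}Q\Delta x^{(i)} = \Delta x^{(k)T}\Delta g^{(i)}.$$
Hence,
$$(\Delta x^{(k)} - H_k\Delta g^{(k)})^T\Delta g^{(i)} = \Delta x^{(k)T}\Delta g^{(i)} - \Delta x^{(k)T}\Delta g^{(i)} = 0,$$
which completes the proof.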
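The statement of the theorem, $H_{k+1}\Delta g^{(i)} = \Delta x^{(i)}$ for all $0 \le i \le k$, can be checked numerically on a small quadratic. The sketch below is an illustration only: the matrix `Q` and starting point are hypothetical, the exact line search uses the closed form valid for quadratics, and no safeguard against a zero denominator is included.

```python
import numpy as np

# Hypothetical symmetric positive definite Hessian and starting point.
Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
x = np.array([1.0, 2.0, 3.0])
H = np.eye(3)
dxs, dgs, checks = [], [], []

for k in range(3):
    g = Q @ x
    d = -H @ g
    alpha = -(g @ d) / (d @ Q @ d)     # exact line search for a quadratic
    dx = alpha * d
    dg = Q @ dx                        # for a quadratic, dg = Q dx
    v = dx - H @ dg
    H = H + np.outer(v, v) / (dg @ v)  # rank one update
    x = x + dx
    dxs.append(dx)
    dgs.append(dg)
    # Check H_{k+1} dg_i = dx_i for every i <= k, not just i = k.
    checks.append(all(np.allclose(H @ dgs[i], dxs[i]) for i in range(k + 1)))

print(checks)
print(np.allclose(H, np.linalg.inv(Q)))   # after n steps, H_n matches Q^{-1}
```

The final comparison with `np.linalg.inv(Q)` reflects the consequence that, after $n$ steps on an $n$-dimensional quadratic, the matrices $\Delta g^{(0)}, \ldots, \Delta g^{(n-1)}$ determine $H_n$ completely.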
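Use $x^{(0)} = [1, 2]^T$ and $H_0 = I_2$ (the $2 \times 2$ identity matrix). We can represent $f$
as
$$f(x) = \frac{1}{2}\begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 3 = \frac{1}{2}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 3.$$
Thus,
$$g^{(k)} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} x^{(k)}.$$
Because $H_0 = I_2$,
$$d^{(0)} = -g^{(0)} = [-2, -2]^T.$$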
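The objective function is quadratic, and hence
$$\alpha_0 = \arg\min_{\alpha \ge 0} f(x^{(0)} + \alpha d^{(0)}) = -\frac{g^{(0)T}d^{(0)}}{d^{(0)T}Qd^{(0)}} = \frac{\begin{bmatrix} 2 & 2 \end{bmatrix}\begin{bmatrix} 2 \\ 2 \end{bmatrix}}{\begin{bmatrix} 2 & 2 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 2 \end{bmatrix}} = \frac{2}{3},$$
and thus
$$x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = \left[-\frac{1}{3}, \frac{2}{3}\right]^T.$$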
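We then compute
$$\Delta x^{(0)} = \alpha_0 d^{(0)} = \left[-\frac{4}{3}, -\frac{4}{3}\right]^T,$$
$$g^{(1)} = Qx^{(1)} = \left[-\frac{2}{3}, \frac{2}{3}\right]^T,$$
$$\Delta g^{(0)} = g^{(1)} - g^{(0)} = \left[-\frac{8}{3}, -\frac{4}{3}\right]^T.$$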
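Because
$$\Delta g^{(0)T}(\Delta x^{(0)} - H_0\Delta g^{(0)}) = \left[-\frac{8}{3}, -\frac{4}{3}\right]\begin{bmatrix} \frac{4}{3} \\ 0 \end{bmatrix} = -\frac{32}{9},$$
we obtain
$$H_1 = H_0 + \frac{(\Delta x^{(0)} - H_0\Delta g^{(0)})(\Delta x^{(0)} - H_0\Delta g^{(0)})^T}{\Delta g^{(0)T}(\Delta x^{(0)} - H_0\Delta g^{(0)})} = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \end{bmatrix}.$$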
Therefore,
$$d^{(1)} = -H_1 g^{(1)} = \left[\frac{1}{3}, -\frac{2}{3}\right]^T,$$
and
$$\alpha_1 = -\frac{g^{(1)T}d^{(1)}}{d^{(1)T}Qd^{(1)}} = 1.$$
We now compute
$$x^{(2)} = x^{(1)} + \alpha_1 d^{(1)} = [0, 0]^T.$$
Note that $g^{(2)} = 0$, and therefore $x^{(2)} = x^*$.
As expected, the algorithm solves the problem in two steps. Note that the
directions $d^{(0)}$ and $d^{(1)}$ are $Q$-conjugate, which is in accordance with the theorem
we have proved.
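The $Q$-conjugacy of the two directions in this example, $d^{(0)} = [-2, -2]^T$ and $d^{(1)} = [1/3, -2/3]^T$ with $Q = \operatorname{diag}(2, 1)$, can be confirmed in exact arithmetic. The following is a small sketch using Python's standard `fractions` module; the helper names `matvec` and `dot` are introduced here for illustration.

```python
from fractions import Fraction as F

Q = [[F(2), F(0)],
     [F(0), F(1)]]

def matvec(A, v):
    """Multiply a 2x2 matrix by a 2-vector (hypothetical helper)."""
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

def dot(u, v):
    """Inner product of two vectors (hypothetical helper)."""
    return sum(a * b for a, b in zip(u, v))

d0 = [F(-2), F(-2)]
d1 = [F(1, 3), F(-2, 3)]

conjugacy = dot(d0, matvec(Q, d1))   # d0^T Q d1
print(conjugacy)                     # prints 0
```

Using exact fractions avoids any question of rounding: the product is exactly zero, not merely small.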
The rank one correction algorithm works well for the case of constant Hessian
matrix; that is, the quadratic case. Our analysis was, in fact, done for this case.
However, ultimately we wish to apply the algorithm to general functions, not
just quadratics.
Unfortunately, for the nonquadratic case, the rank one correction algorithm is
not very satisfactory, for several reasons.
For a nonquadratic objective function, $H_{k+1}$ may not be positive definite (see
Example 11.2 below) and thus $d^{(k+1)}$ may not be a descent direction. Furthermore, if
$$\Delta g^{(k)T}(\Delta x^{(k)} - H_k\Delta g^{(k)})$$
is close to zero, then there may be numerical problems in evaluating $H_{k+1}$.
Example. Assume that $H_k > 0$. It turns out that if
$$\Delta g^{(k)T}(\Delta x^{(k)} - H_k\Delta g^{(k)}) > 0,$$
then $H_{k+1} > 0$ (see Exercise 11.3). However, if
$$\Delta g^{(k)T}(\Delta x^{(k)} - H_k\Delta g^{(k)}) < 0,$$
then $H_{k+1}$ may not be positive definite.
As an example of what might happen if $\Delta g^{(k)T}(\Delta x^{(k)} - H_k\Delta g^{(k)}) < 0$, consider
applying the rank one algorithm to the function
$$f(x) = \frac{x_1^4}{4} + \frac{x_2^2}{2} - x_1 x_2 + x_1 - x_2$$
with an initial point
$$x^{(0)} = [0.5960, 0.59607]^T$$
and initial matrix
$$H_0 = \begin{bmatrix} 0.94913 & 0.14318 \\ 0.14318 & 0.59702 \end{bmatrix}.$$
Note that $H_0 > 0$. We have
$$\Delta g^{(0)T}(\Delta x^{(0)} - H_0\Delta g^{(0)}) = -0.03276$$
and
$$H_1 = \begin{bmatrix} 0.94481 & 0.23324 \\ 0.23324 & -1.2788 \end{bmatrix}.$$
It is easy to check that $H_1$ is not positive definite (it is indefinite, with eigenvalues
$0.96901$ and $-1.3030$).
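The definiteness claims for this example can be checked by computing eigenvalues. The sketch below assumes NumPy and uses the matrices as read here, with $H_1$ having diagonal entries $0.94481$ and $-1.2788$ and off-diagonal entries $0.23324$; these signs are the ones consistent with the eigenvalues quoted in the text, so they are an inference rather than a quotation.

```python
import numpy as np

H0 = np.array([[0.94913, 0.14318],
               [0.14318, 0.59702]])
H1 = np.array([[0.94481, 0.23324],
               [0.23324, -1.2788]])   # signs inferred from the stated eigenvalues

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eig0 = np.linalg.eigvalsh(H0)
eig1 = np.linalg.eigvalsh(H1)

print(eig0)   # both positive: H0 is positive definite
print(eig1)   # one negative, one positive: H1 is indefinite
```

A symmetric matrix is positive definite exactly when all of its eigenvalues are positive, so one negative eigenvalue is enough to show that $-H_1 g^{(1)}$ need not be a descent direction.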
Fortunately, alternative algorithms have been developed for updating Hk .
In particular, if we use a "rank two" update, then $H_k$ is guaranteed to be
positive definite for all $k$, provided the line search is exact.
Quasi Newton method
April 2023
Outlines of a lecture:
QN method:
Review:
Recall that:
1. The idea behind the Newton method, $x^{(k+1)} = x^{(k)} - F(x^{(k)})^{-1} g^{(k)}$, was
to locally approximate the function being minimized, at every step, by a
quadratic function.
2. If the initial point is not sufficiently close to the solution, the method may not
possess the descent property.
3. In the Newton method, we determined an appropriate value of $\alpha_k$ by
performing a line search in the direction of the vector
$d^{(k)} = -F(x^{(k)})^{-1} g^{(k)}$. This is a modification that gives the Newton method
the descent property.
4. Line search is simply the minimization of the real-variable function
$\phi_k(\alpha) = f(x^{(k)} - \alpha F(x^{(k)})^{-1} g^{(k)})$, which is not a trivial problem to
solve.
5. If the method is convergent, it has a quadratic order of convergence.
M. Abbas (abbas.mujahid@gmail.com) (UP), QN method, April 2023
Review:
$$\begin{aligned} f(x^{(k+1)}) &= f(x^{(k)}) + (g^{(k)})^T (x^{(k+1)} - x^{(k)}) + o(\|x^{(k+1)} - x^{(k)}\|) \\ &= f(x^{(k)}) - \alpha (g^{(k)})^T H_k g^{(k)} + o(\|H_k g^{(k)}\| \alpha). \end{aligned}$$
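The expansion above says that, for small $\alpha$, the change in $f$ along $-H_k g^{(k)}$ is dominated by the first-order term $-\alpha (g^{(k)})^T H_k g^{(k)}$, which is negative when $H_k$ is positive definite and $g^{(k)} \ne 0$. A minimal numerical sketch follows; the test function $f(x) = x_1^2 + \frac{1}{2}x_2^2 + 3$, the point `x`, and the matrix `H` are hypothetical choices for illustration.

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 0.5 * x[1] ** 2 + 3.0

def grad(x):
    return np.array([2.0 * x[0], x[1]])

x = np.array([1.0, 2.0])
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # hypothetical positive definite H_k
g = grad(x)

rows = []
for alpha in (1e-1, 1e-2, 1e-3):
    actual = f(x - alpha * (H @ g)) - f(x)   # true change in f
    first_order = -alpha * (g @ H @ g)       # predicted change, first-order term
    rows.append((alpha, actual, first_order))
    print(alpha, actual, first_order)
```

As `alpha` shrinks, the gap between the actual and predicted change shrinks faster than `alpha` itself, which is the content of the little-o term.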
Conclusion:
Proposition:
$$x^{(k+1)} = x^{(k)} - \alpha_k H_k g^{(k)},$$
where
$$\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} - \alpha H_k g^{(k)}),$$
$$H_{k+1}\Delta g^{(i)} = \Delta x^{(i)}, \quad \text{where } 0 \le i \le k.$$
Continued:
$$Q^{-1} = [\Delta x^{(0)}, \Delta x^{(1)}, \ldots, \Delta x^{(n-1)}][\Delta g^{(0)}, \Delta g^{(1)}, \ldots, \Delta g^{(n-1)}]^{-1}.$$
3. Thus, $H_n = Q^{-1}$.
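The identity $Q^{-1} = [\Delta x^{(0)}, \ldots, \Delta x^{(n-1)}][\Delta g^{(0)}, \ldots, \Delta g^{(n-1)}]^{-1}$ rests only on $\Delta g^{(i)} = Q\Delta x^{(i)}$ and on the steps being linearly independent. The sketch below illustrates this with NumPy; the matrix `Q` and the columns of `dX` are hypothetical and are not produced by any particular algorithm.

```python
import numpy as np

# Hypothetical symmetric positive definite Hessian.
Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Columns are hypothetical, linearly independent steps dx_0, dx_1, dx_2.
dX = np.array([[1.0, 0.0, 1.0],
               [0.0, 2.0, 1.0],
               [1.0, 1.0, 0.0]])
dG = Q @ dX                            # columns are dg_i = Q dx_i

H_n = dX @ np.linalg.inv(dG)           # the matrix H with H dg_i = dx_i for all i
print(np.allclose(H_n, np.linalg.inv(Q)))
```

Any matrix $H_n$ satisfying $H_n\Delta g^{(i)} = \Delta x^{(i)}$ for $n$ independent steps must therefore coincide with $Q^{-1}$.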
Continued:
$$x^{(k+1)} = x^{(k)} - \alpha_k H_k g^{(k)},$$
$$\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} - \alpha H_k g^{(k)}).$$
$$d^{(k)} = -H_k g^{(k)},$$
$$\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)}),$$
$$x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}.$$
Theorem:
$$H_{k+1}\Delta g^{(i)} = \Delta x^{(i)}, \quad 0 \le i \le k,$$
Proof:
1. Assume the result is true for $k - 1$ (note that $k < n - 1$), that is,
$d^{(0)}, \ldots, d^{(k)}$ are $Q$-conjugate.
2. We now prove the result for $k$; that is, $d^{(0)}, \ldots, d^{(k+1)}$ are
$Q$-conjugate.
3. It suffices to show that $(d^{(k+1)})^T Q d^{(i)} = 0$, $0 \le i \le k$.
4. As $\alpha_i \ne 0$, we can write $d^{(i)} = \Delta x^{(i)} / \alpha_i$.
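Continued:
1. So, given $i$, $0 \le i \le k$, we have
$$\begin{aligned} d^{(k+1)T} Q d^{(i)} &= -(g^{(k+1)})^T H_{k+1} Q d^{(i)} \\ &= -(g^{(k+1)})^T H_{k+1} \frac{Q\Delta x^{(i)}}{\alpha_i} \\ &= -(g^{(k+1)})^T H_{k+1} \frac{\Delta g^{(i)}}{\alpha_i} \\ &= -(g^{(k+1)})^T \frac{\Delta x^{(i)}}{\alpha_i} \\ &= -(g^{(k+1)})^T d^{(i)}. \end{aligned}$$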