THE RANK ONE CORRECTION FORMULA

The equations that the matrices H_k are required to satisfy do not determine those matrices uniquely, so we have some freedom in how we compute them. In the following methods, we compute H_{k+1} by adding a correction term to H_k:
The rank one correction formula
The rank two correction formula
The DFP algorithm, originally developed by Davidon in 1959 and then modified by Fletcher and Powell in 1963
The BFGS algorithm (Broyden, Fletcher, Goldfarb, and Shanno, 1970)
In the rank one correction formula, the correction term is symmetric and has the form
a_k z^(k) z^(k)T,
where a_k ∈ R and z^(k) ∈ R^n. Therefore, the update equation is
H_{k+1} = H_k + a_k z^(k) z^(k)T.
Note that
rank(a_k z^(k) z^(k)T) = rank([z_1^(k), ..., z_n^(k)]^T [z_1^(k), ..., z_n^(k)]) = 1
(for a_k ≠ 0 and z^(k) ≠ 0), and hence the name "rank one" correction (it is also called the single-rank symmetric (SRS) algorithm).
If a and b are two nonzero column vectors, then any matrix of the form ab^T has rank one (reducing it to row echelon form gives only one nonzero row). The product z^(k) z^(k)T is sometimes referred to as the dyadic product or outer product.
Observe that if Hk is symmetric, then so is Hk+1 .
Our goal now is to determine a_k and z^(k), given H_k, Δg^(k), and Δx^(k), so that the required relationship
H_{k+1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k,
is satisfied. To begin, let us first consider the condition for the case i = k:
H_{k+1} Δg^(k) = Δx^(k).
In other words, given H_k, Δg^(k), and Δx^(k), we wish to find a_k and z^(k) that ensure
H_{k+1} Δg^(k) = (H_k + a_k z^(k) z^(k)T) Δg^(k) = Δx^(k).

First note that z^(k)T Δg^(k) is a scalar ((1×n)(n×1) matrix; the inner product of two vectors). Thus,
H_k Δg^(k) + a_k z^(k) (z^(k)T Δg^(k)) = Δx^(k)  ⟹
Δx^(k) − H_k Δg^(k) = (a_k z^(k)T Δg^(k)) z^(k)   (call this equation one),
and hence
z^(k) = (Δx^(k) − H_k Δg^(k)) / (a_k (z^(k)T Δg^(k))).
Using z^(k), we find z^(k) z^(k)T as follows:
z^(k) z^(k)T = (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (a_k^2 (z^(k)T Δg^(k))^2)
⟹ a_k z^(k) z^(k)T = (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (a_k (z^(k)T Δg^(k))^2).
Hence,
H_{k+1} = H_k + (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (a_k (z^(k)T Δg^(k))^2).
The next step is to express the denominator of the second term on the right-hand side of the above equation as a function of the given quantities H_k, Δg^(k), and Δx^(k); that is, to eliminate a_k and z^(k). For this, premultiply equation one,
Δx^(k) − H_k Δg^(k) = (a_k z^(k)T Δg^(k)) z^(k),
by Δg^(k)T to obtain
Δg^(k)T Δx^(k) − Δg^(k)T H_k Δg^(k) = Δg^(k)T (a_k z^(k) z^(k)T) Δg^(k)
= a_k (Δg^(k)T z^(k)) [z^(k)T Δg^(k)]
= a_k (Δg^(k)T z^(k)) [(Δg^(k))^T z^(k)]^T.
Observe that a_k is a scalar, and so is Δg^(k)T z^(k) = z^(k)T Δg^(k). Thus,
Δg^(k)T Δx^(k) − Δg^(k)T H_k Δg^(k) = a_k (z^(k)T Δg^(k))^2,
that is,
Δg^(k)T (Δx^(k) − H_k Δg^(k)) = a_k (z^(k)T Δg^(k))^2.
Taking the above relation into account yields
H_{k+1} = H_k + (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (Δg^(k)T (Δx^(k) − H_k Δg^(k))).
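To make the final formula concrete, here is a minimal NumPy sketch of the update (the function name and the near-zero-denominator guard are our own additions, not part of the derivation):

```python
import numpy as np

def sr1_update(H, dx, dg, tol=1e-8):
    """Rank one (SR1) update of the inverse-Hessian approximation.

    H  : current symmetric approximation H_k
    dx : step difference Δx^(k) = x^(k+1) - x^(k)
    dg : gradient difference Δg^(k) = g^(k+1) - g^(k)
    """
    u = dx - H @ dg                        # Δx^(k) - H_k Δg^(k)
    denom = dg @ u                         # Δg^(k)T (Δx^(k) - H_k Δg^(k))
    if abs(denom) < tol * np.linalg.norm(dg) * np.linalg.norm(u):
        return H                           # guard against a near-zero denominator
    return H + np.outer(u, u) / denom

# The update is built exactly so that H_{k+1} Δg^(k) = Δx^(k):
Q = np.array([[2.0, 0.0], [0.0, 1.0]])
dx = np.array([-4/3, -4/3])
dg = Q @ dx
H1 = sr1_update(np.eye(2), dx, dg)
print(np.allclose(H1 @ dg, dx))            # True
```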

We summarize the above development in the following algorithm.
Rank One Algorithm
1. Set k := 0; select x^(0) and a real symmetric positive definite H_0.
2. If g^(k) = 0, stop; else set
d^(k) = −H_k g^(k).
3. Compute
α_k = arg min_{α≥0} f(x^(k) + α d^(k)),
x^(k+1) = x^(k) + α_k d^(k).
4. Compute
Δx^(k) = x^(k+1) − x^(k) = α_k d^(k),
Δg^(k) = g^(k+1) − g^(k),
H_{k+1} = H_k + (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (Δg^(k)T (Δx^(k) − H_k Δg^(k))).
5. Set k := k + 1; go to step 2.
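For a quadratic objective f(x) = ½ xᵀQx, the exact line search in step 3 has the closed form α_k = −g^(k)T d^(k) / (d^(k)T Q d^(k)), so the whole algorithm can be sketched as follows (function and variable names are ours; the denominator guard in step 4 is a practical addition):

```python
import numpy as np

def rank_one_minimize(Q, x0, max_iter=20, tol=1e-10):
    """Rank one correction algorithm for the quadratic f(x) = 0.5 x^T Q x."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                       # step 1: H_0 = I (symmetric PD)
    for _ in range(max_iter):
        g = Q @ x                            # gradient of the quadratic
        if np.linalg.norm(g) < tol:          # step 2: stop when g^(k) = 0
            break
        d = -H @ g                           # step 2: d^(k) = -H_k g^(k)
        alpha = -(g @ d) / (d @ Q @ d)       # step 3: exact line search
        x_new = x + alpha * d
        dx = x_new - x                       # step 4: Δx^(k)
        dg = Q @ x_new - g                   # step 4: Δg^(k)
        u = dx - H @ dg
        denom = dg @ u
        if abs(denom) > 1e-12:               # step 4: rank one update of H
            H = H + np.outer(u, u) / denom
        x = x_new                            # step 5: next iteration
    return x

x_star = rank_one_minimize(np.array([[2.0, 0.0], [0.0, 1.0]]), [1.0, 2.0])
print(x_star)    # converges to [0, 0]
```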
The rank one algorithm is based on satisfying the equation
H_{k+1} Δg^(k) = Δx^(k).
However, what we want is that
H_{k+1} Δg^(i) = Δx^(i) holds for each i = 0, 1, ..., k.
It turns out that this is, in fact, automatically true, as stated in the following theorem.
Theorem: For the rank one algorithm applied to the quadratic with Hessian Q = Q^T, we have
H_{k+1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k.

Proof. We prove the result by induction. From the discussion before the theorem it is clear that the claim is true for k = 0. Suppose now that the theorem is true for k − 1 ≥ 0; that is,
H_k Δg^(i) = Δx^(i), i < k.
We now show that the theorem is true for k. Our construction of the correction term ensures that
H_{k+1} Δg^(k) = Δx^(k).
So we only have to show that
H_{k+1} Δg^(i) = Δx^(i), i < k.
To this end, fix i < k. Since
H_{k+1} = H_k + (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (Δg^(k)T (Δx^(k) − H_k Δg^(k))),
we have
H_{k+1} Δg^(i) = H_k Δg^(i) + [(Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (Δg^(k)T (Δx^(k) − H_k Δg^(k)))] Δg^(i).
By the induction hypothesis,
H_k Δg^(i) = Δx^(i).
To complete the proof it is therefore enough to show that the second term on the right-hand side equals zero; for this it suffices that
(Δx^(k) − H_k Δg^(k))^T Δg^(i) = [Δx^(k)T − Δg^(k)T H_k] Δg^(i) = Δx^(k)T Δg^(i) − Δg^(k)T H_k Δg^(i) = 0.
Indeed, by the induction hypothesis,
Δg^(k)T H_k Δg^(i) = Δg^(k)T (H_k Δg^(i)) = Δg^(k)T Δx^(i).
Moreover, since
Δg^(k) = Q Δx^(k) and (Δg^(k))^T = (Q Δx^(k))^T = Δx^(k)T Q^T = Δx^(k)T Q,
we have
Δg^(k)T H_k Δg^(i) = Δg^(k)T Δx^(i)   (induction hypothesis)
= Δx^(k)T Q Δx^(i)   (use Δg^(k)T = Δx^(k)T Q from the above)
= Δx^(k)T Δg^(i)   (use Q Δx^(i) = Δg^(i)).
Hence,
(Δx^(k) − H_k Δg^(k))^T Δg^(i) = Δx^(k)T Δg^(i) − Δx^(k)T Δg^(i) = 0,
which completes the proof.


Example
Let
f(x_1, x_2) = x_1^2 + x_2^2/2 + 3 = (1/2)(2x_1^2 + x_2^2) + 3.
Apply the rank one correction algorithm to minimize f. Use x^(0) = [1, 2]^T and H_0 = I_2 (the 2×2 identity matrix). We can represent f as
f(x) = (1/2) x^T Q x + 3, where Q = [2 0; 0 1].
Thus,
g^(k) = Q x^(k) = [2 0; 0 1] x^(k).
Because H_0 = I_2,
d^(0) = −g^(0) = [−2, −2]^T.
The objective function is quadratic, and hence
α_0 = arg min_{α≥0} f(x^(0) + α d^(0)) = −g^(0)T d^(0) / (d^(0)T Q d^(0)) = 8/12 = 2/3,
and thus
x^(1) = x^(0) + α_0 d^(0) = [−1/3, 2/3]^T.
We then compute
Δx^(0) = α_0 d^(0) = [−4/3, −4/3]^T,
g^(1) = Q x^(1) = [−2/3, 2/3]^T,
Δg^(0) = g^(1) − g^(0) = [−8/3, −4/3]^T.
Because
Δg^(0)T (Δx^(0) − H_0 Δg^(0)) = [−8/3, −4/3] [4/3, 0]^T = −32/9,
we obtain
H_1 = H_0 + (Δx^(0) − H_0 Δg^(0))(Δx^(0) − H_0 Δg^(0))^T / (Δg^(0)T (Δx^(0) − H_0 Δg^(0))) = [1/2 0; 0 1].
Therefore,
d^(1) = −H_1 g^(1) = [1/3, −2/3]^T,
and
α_1 = −g^(1)T d^(1) / (d^(1)T Q d^(1)) = 1.
We now compute
x^(2) = x^(1) + α_1 d^(1) = [0, 0]^T.
Note that g^(2) = 0, and therefore x^(2) = x*. As expected, the algorithm solves the problem in two steps. Note also that the directions d^(0) and d^(1) are Q-conjugate, in accordance with the theorem we have proved.
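The two claims at the end of the example, the secant condition H_1 Δg^(0) = Δx^(0) and the Q-conjugacy of d^(0) and d^(1), are easy to confirm numerically with the values computed above:

```python
import numpy as np

Q = np.array([[2.0, 0.0], [0.0, 1.0]])
d0 = np.array([-2.0, -2.0])              # d^(0)
d1 = np.array([1/3, -2/3])               # d^(1)
dx0 = np.array([-4/3, -4/3])             # Δx^(0)
dg0 = np.array([-8/3, -4/3])             # Δg^(0)
H1 = np.array([[0.5, 0.0], [0.0, 1.0]])

print(np.allclose(H1 @ dg0, dx0))        # True: H_1 Δg^(0) = Δx^(0)
print(np.isclose(d0 @ Q @ d1, 0.0))      # True: d^(0) and d^(1) are Q-conjugate
```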
The rank one correction algorithm works well in the case of a constant Hessian matrix, that is, the quadratic case; our analysis was, in fact, done for this case. However, ultimately we wish to apply the algorithm to general functions, not just quadratics. Unfortunately, for the nonquadratic case, the rank one correction algorithm is unsatisfactory for several reasons. For a nonquadratic objective function, H_{k+1} may not be positive definite (see the example below), and thus d^(k+1) may not be a descent direction. Furthermore, if
Δg^(k)T (Δx^(k) − H_k Δg^(k))
is close to zero, there may be numerical problems in evaluating H_{k+1}.
Example. Assume that H_k > 0. It turns out that if
Δg^(k)T (Δx^(k) − H_k Δg^(k)) > 0,
then H_{k+1} > 0 (see Exercise 11.3). However, if
Δg^(k)T (Δx^(k) − H_k Δg^(k)) < 0,
then H_{k+1} may not be positive definite. As an example of what might happen in this case, consider applying the rank one algorithm to the function
f(x) = x_1^4/4 + x_2^2/2 − x_1 x_2 + x_1 − x_2
with initial point
x^(0) = [0.5960, 0.59607]^T
and initial matrix
H_0 = [0.94913 0.14318; 0.14318 0.59702].
Note that H_0 > 0. We have
Δg^(0)T (Δx^(0) − H_0 Δg^(0)) = −0.03276
and
H_1 = [−0.94481 0.23324; 0.23324 1.2788].
It is easy to check that H_1 is not positive definite (it is indefinite, with eigenvalues −0.96901 and 1.3030).
Fortunately, alternative algorithms have been developed for updating H_k. In particular, if we use a "rank two" update, then H_k is guaranteed to be positive definite for all k, provided the line search is exact.
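The indefiniteness of H_1 can be checked directly from its eigenvalues (a sketch; the entries are those quoted in the example, with the sign of the (1,1) entry reconstructed to match the stated eigenvalues):

```python
import numpy as np

# H_1 from the nonquadratic example (symmetric, so eigvalsh applies)
H1 = np.array([[-0.94481, 0.23324],
               [ 0.23324, 1.2788 ]])
eigenvalues = np.linalg.eigvalsh(H1)
print(eigenvalues)                     # one negative, one positive eigenvalue
print(bool(np.all(eigenvalues > 0)))   # False: H_1 is not positive definite
```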

Quasi Newton method

Mujahid Abbas (abbas.mujahid@gmail.com)

Department of Mathematics, GCU, Lahore Pakistan

April 2023

M. Abbas (abbas.mujahid@gmail.com) (UP) QN method April 2023 1 / 17


Outlines of a lecture:
1. Quasi-Newton method
2. Results and Discussion


QN method:
Consider a quadratic objective function f as follows:
f(x) = (1/2) x^T Q x + b^T x + c,
where Q is a positive definite symmetric matrix.


Review:
Recall that:
1. The idea behind the Newton method, x^(k+1) = x^(k) − F(x^(k))^{-1} g^(k), was to locally approximate the function being minimized, at every step, by a quadratic function.
2. If the initial point is not sufficiently close to the solution, the method may not possess the descent property.
3. In the modified Newton method, we determined an appropriate value of α_k by performing a line search in the direction of the vector d^(k) = −F(x^(k))^{-1} g^(k); this modification gives the Newton method the descent property.
4. The line search is simply the minimization of the function of a real variable φ_k(α) = f(x^(k) − α F(x^(k))^{-1} g^(k)), which is not a trivial problem to solve.
5. If the method is convergent, it has a quadratic order of convergence.
Review:
1. To avoid the computation of F(x^(k))^{-1}, in this lecture we use an approximation H(x^(k)) to F(x^(k))^{-1} in place of the true inverse, and set the direction −H(x^(k)) g^(k), where we require H(x^(k)) to be a positive definite symmetric matrix (why?).
2. The method employs search directions that are conjugate to the previous search directions.


Why positive definite:
We want to preserve some properties of F(x^(k))^{-1} in the process of approximation. We demand an important property that an approximation to F(x^(k))^{-1} should satisfy:
1. Let x^(k+1) = x^(k) − α H_k g^(k), where H_k is an n×n real matrix and α is a positive search parameter.
2. Expanding f about x^(k), we obtain
f(x^(k+1)) = f(x^(k)) + (g^(k))^T (x^(k+1) − x^(k)) + o(‖x^(k+1) − x^(k)‖)
= f(x^(k)) − α (g^(k))^T H_k g^(k) + o(‖H_k g^(k)‖ α).
3. As α tends to zero, the second term on the right-hand side of the above equation dominates the third. Thus, to guarantee a decrease in f for small α, that is, f(x^(k+1)) < f(x^(k)), we must have g^(k)T H_k g^(k) > 0.
4. A simple way to ensure this is to require that H_k be positive definite.
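The condition g^(k)T H_k g^(k) > 0 is exactly what positive definiteness buys; a small numerical illustration (the matrices below are our own toy examples):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.standard_normal(3)                # a nonzero "gradient" vector

# A symmetric positive definite H always gives g^T H g > 0 ...
A = rng.standard_normal((3, 3))
H_pd = A @ A.T + 3 * np.eye(3)            # SPD by construction
print(bool(g @ H_pd @ g > 0))             # True: -H g is a descent direction

# ... while an indefinite H can fail for some gradients.
H_indef = np.diag([1.0, -2.0, 1.0])
g_bad = np.array([0.0, 1.0, 0.0])
print(bool(g_bad @ H_indef @ g_bad > 0))  # False: descent not guaranteed
```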
Conclusion:
In constructing an approximation to the inverse of the Hessian matrix, we should use only the objective function and gradient values. Thus, if we can find a suitable method of choosing H_k, the iteration may be carried out without any evaluation of the Hessian and without the solution of any set of linear equations.
1. Let d^(k) = −H_k g^(k), where H_k is an n×n real positive definite symmetric matrix, and let α_k be positive and minimize f in a line search along the line x^(k) + α d^(k), that is, in the direction of d^(k).
2. This direction is a downhill direction.


Proposition:
Let f ∈ C^1, x^(k) ∈ R^n, g^(k) = ∇f(x^(k)) ≠ 0, and let H_k be an n×n real symmetric positive definite matrix. If we set
x^(k+1) = x^(k) − α_k H_k g^(k),
where
α_k = arg min_{α≥0} f(x^(k) − α H_k g^(k)),
then α_k > 0 and f(x^(k+1)) < f(x^(k)).


Another important property:
1. Let H_0, H_1, H_2, ... be successive approximations of the inverse Hessian F(x^(k))^{-1}.
2. Suppose that the Hessian matrix F(x) of the objective function f is constant and independent of x; that is, the objective function is quadratic with Hessian F(x) = Q for all x, where Q = Q^T.
3. Then g^(k+1) − g^(k) = Q (x^(k+1) − x^(k)), or Δg^(k) = Q Δx^(k), where Δg^(k) = g^(k+1) − g^(k) and Δx^(k) = x^(k+1) − x^(k).
4. Note that for any given k, the matrix Q^{-1} satisfies Q^{-1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k.
5. If we start with a real symmetric positive definite matrix H_0, then our second demand is that the approximation H_{k+1} satisfy
H_{k+1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k.


Continued:
1. If n steps are involved, then moving in the n directions Δx^(0), Δx^(1), ..., Δx^(n−1) gives
2. H_n Δg^(0) = Δx^(0), H_n Δg^(1) = Δx^(1), ..., H_n Δg^(n−1) = Δx^(n−1).
3. The above set of equations can be written as
H_n [Δg^(0), Δg^(1), ..., Δg^(n−1)] = [Δx^(0), Δx^(1), ..., Δx^(n−1)].
4. Therefore, if [Δg^(0), Δg^(1), ..., Δg^(n−1)] is nonsingular, then
5. H_n = [Δx^(0), Δx^(1), ..., Δx^(n−1)] [Δg^(0), Δg^(1), ..., Δg^(n−1)]^{-1}.


Continued:
1. Note that Q satisfies
Q [Δx^(0), Δx^(1), ..., Δx^(n−1)] = [Δg^(0), Δg^(1), ..., Δg^(n−1)] and
Q^{-1} [Δg^(0), Δg^(1), ..., Δg^(n−1)] = [Δx^(0), Δx^(1), ..., Δx^(n−1)].
2. Therefore, if [Δg^(0), Δg^(1), ..., Δg^(n−1)] is nonsingular, then Q^{-1} is determined uniquely after n steps, via
Q^{-1} = [Δx^(0), Δx^(1), ..., Δx^(n−1)] [Δg^(0), Δg^(1), ..., Δg^(n−1)]^{-1}.
3. Thus, H_n = Q^{-1}.
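The identity H_n = Q^{-1} follows from stacking the difference vectors as columns; a quick numerical check (the SPD matrix Q and the steps Δx^(i) below are arbitrary choices of ours, with the Δx^(i) linearly independent):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # an SPD "Hessian" Q

DX = rng.standard_normal((n, n))     # columns: Δx^(0), ..., Δx^(n-1)
DG = Q @ DX                          # columns: Δg^(i) = Q Δx^(i)

# H_n = [Δx^(0), ..., Δx^(n-1)] [Δg^(0), ..., Δg^(n-1)]^(-1)
H_n = DX @ np.linalg.inv(DG)
print(np.allclose(H_n, np.linalg.inv(Q)))   # True: H_n recovers Q^(-1)
```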


Continued:
1. As a consequence, we conclude that if H_n satisfies the equations H_n Δg^(i) = Δx^(i), 0 ≤ i ≤ n − 1, then the algorithm
x^(k+1) = x^(k) − α_k H_k g^(k),
α_k = arg min_{α≥0} f(x^(k) − α H_k g^(k)),
is guaranteed to solve problems with quadratic objective functions in n + 1 steps.
2. The update x^(n+1) = x^(n) − α_n H_n g^(n) is equivalent to Newton's algorithm. If this method is applied to a quadratic with Q symmetric positive definite, then H_n = Q^{-1} and the process terminates after n stages.
3. We shall prove that such algorithms solve quadratic problems of n variables in at most n steps.


Quasi Newton Algorithm:
Quasi-Newton algorithms have the following form:
1. d^(k) = −H_k g^(k),
α_k = arg min_{α≥0} f(x^(k) + α d^(k)),
x^(k+1) = x^(k) + α_k d^(k).
2. The matrices H_0, H_1, ... are symmetric.
3. In the quadratic case, these matrices are required to satisfy
H_{k+1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k, where Δx^(i) = x^(i+1) − x^(i) = α_i d^(i) and Δg^(i) = g^(i+1) − g^(i) = Q Δx^(i).
4. It turns out that quasi-Newton methods are also conjugate direction methods.


Theorem:
Consider a quasi-Newton algorithm applied to a quadratic function with Hessian Q = Q^T such that for 0 ≤ k < n − 1,
H_{k+1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k,
where H_{k+1} = H_{k+1}^T. If α_i ≠ 0, 0 ≤ i ≤ k, then d^(0), ..., d^(k+1) are Q-conjugate. Note that k + 1 < n.


Proof:
1. For k = 0, we must show that d^(0) and d^(1) are Q-conjugate.
2. As α_0 ≠ 0, we can write d^(0) = Δx^(0)/α_0.
3. d^(1)T Q d^(0) = −(g^(1))^T H_1 Q d^(0) = −(g^(1))^T H_1 Q Δx^(0)/α_0 (use d^(k) = −H_k g^(k) and H_1 = H_1^T).
4. = −(g^(1))^T H_1 Δg^(0)/α_0 (use Δg^(i) = Q Δx^(i)) = −(g^(1))^T Δx^(0)/α_0 (use H_1 Δg^(0) = Δx^(0)).
5. = −(g^(1))^T d^(0) = 0 (Exercise 11.1), as a consequence of α_0 > 0 being the minimizer of f(x^(0) + α d^(0)).


Proof (continued):
1. Assume the result is true for k − 1 (note that k < n − 1); that is, d^(0), ..., d^(k) are Q-conjugate.
2. We now prove the result for k; that is, that d^(0), ..., d^(k+1) are Q-conjugate.
3. It suffices to show that (d^(k+1))^T Q d^(i) = 0, 0 ≤ i ≤ k.
4. As α_i ≠ 0, we can write d^(i) = Δx^(i)/α_i.


Continued:
1. So, given i, 0 ≤ i ≤ k, we have
d^(k+1)T Q d^(i) = −(g^(k+1))^T H_{k+1} Q d^(i)
= −(g^(k+1))^T H_{k+1} Q Δx^(i)/α_i
= −(g^(k+1))^T H_{k+1} Δg^(i)/α_i
= −(g^(k+1))^T Δx^(i)/α_i
= −(g^(k+1))^T d^(i).
Because d^(0), ..., d^(k) are Q-conjugate by assumption, we conclude from the result proved earlier that (g^(k+1))^T d^(i) = 0. Hence, (d^(k+1))^T Q d^(i) = 0, which completes the proof.
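The theorem can be spot-checked numerically: run the rank one update with exact line search on a random quadratic and measure d^(i)T Q d^(j) for i ≠ j (the test problem below is our own construction; the theorem predicts these products vanish up to round-off):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)                    # SPD Hessian
x = rng.standard_normal(n)
H = np.eye(n)

directions = []
for _ in range(n):
    g = Q @ x
    if np.linalg.norm(g) < 1e-12:
        break
    d = -H @ g
    directions.append(d / np.linalg.norm(d))   # store normalized direction
    alpha = -(g @ d) / (d @ Q @ d)             # exact line search
    x_new = x + alpha * d
    dx, dg = x_new - x, Q @ (x_new - x)
    u = dx - H @ dg
    if abs(dg @ u) > 1e-12:
        H += np.outer(u, u) / (dg @ u)         # rank one update
    x = x_new

off_diag = max(abs(directions[i] @ Q @ directions[j])
               for i in range(len(directions)) for j in range(i))
print(off_diag)                                # tiny: zero up to round-off
```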
