THE RANK ONE CORRECTION FORMULA

The equations that the matrices H_k are required to satisfy do not determine those matrices uniquely, so we have some freedom in how we compute them. In the following methods, we compute H_{k+1} by adding a correction term to H_k:
The rank one correction formula
The rank two correction formula
The DFP algorithm, originally developed by Davidon in 1959 and then modified by Fletcher and Powell in 1963
The BFGS algorithm (Broyden, Fletcher, Goldfarb, and Shanno, 1970)
In the rank one correction formula, the correction term is symmetric and has the form
a_k z^(k) z^(k)T,
where a_k ∈ R and z^(k) ∈ R^n. Therefore, the update equation is
H_{k+1} = H_k + a_k z^(k) z^(k)T.
Note that
rank(a_k z^(k) z^(k)T) = rank([z_1^(k), ..., z_n^(k)]^T [z_1^(k), ..., z_n^(k)]) = 1
(for a_k ≠ 0 and z^(k) ≠ 0), and hence the name "rank one" correction (it is also called the single-rank symmetric (SRS) algorithm).
If a and b are two nonzero column vectors, then any matrix of the form ab^T has rank one (reducing it to row echelon form gives only one nonzero row). The product z^(k) z^(k)T is sometimes referred to as the dyadic product or outer product.
Observe that if Hk is symmetric, then so is Hk+1 .
Our goal now is to determine a_k and z^(k), given H_k, Δg^(k), and Δx^(k), so that the required relationship
H_{k+1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k,
is satisfied. To begin, let us first consider the condition for the case i = k:
H_{k+1} Δg^(k) = Δx^(k).
In other words, given H_k, Δg^(k), and Δx^(k), we wish to find a_k and z^(k) that ensure
H_{k+1} Δg^(k) = (H_k + a_k z^(k) z^(k)T) Δg^(k) = Δx^(k).

First note that z^(k)T Δg^(k) is a scalar ((1×n)(n×1) matrix; the inner product of two vectors). Thus,
H_k Δg^(k) + a_k z^(k) (z^(k)T Δg^(k)) = Δx^(k)  ⟹
Δx^(k) − H_k Δg^(k) = (a_k z^(k)T Δg^(k)) z^(k)   (call this equation one),
and hence
z^(k) = (Δx^(k) − H_k Δg^(k)) / (a_k (z^(k)T Δg^(k))).
Using z^(k), we find z^(k) z^(k)T as follows:
z^(k) z^(k)T = (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (a_k^2 (z^(k)T Δg^(k))^2)
⟹ a_k z^(k) z^(k)T = (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (a_k (z^(k)T Δg^(k))^2).
Hence,
H_{k+1} = H_k + (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (a_k (z^(k)T Δg^(k))^2).
The next step is to express the denominator of the second term on the right-hand side of the above equation as a function of the given quantities H_k, Δg^(k), and Δx^(k); that is, to eliminate a_k and z^(k). For this, premultiply equation one,
Δx^(k) − H_k Δg^(k) = (a_k z^(k)T Δg^(k)) z^(k),
by Δg^(k)T to obtain
Δg^(k)T Δx^(k) − Δg^(k)T H_k Δg^(k) = Δg^(k)T (a_k z^(k) z^(k)T) Δg^(k)
= a_k (Δg^(k)T z^(k)) [z^(k)T Δg^(k)]
= a_k (Δg^(k)T z^(k)) [(Δg^(k))^T z^(k)]^T.
Observe that a_k is a scalar, and so is Δg^(k)T z^(k) = z^(k)T Δg^(k). Thus,
Δg^(k)T Δx^(k) − Δg^(k)T H_k Δg^(k) = a_k (z^(k)T Δg^(k))^2,
that is,
Δg^(k)T (Δx^(k) − H_k Δg^(k)) = a_k (z^(k)T Δg^(k))^2.
Taking the above relation into account yields
H_{k+1} = H_k + (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (Δg^(k)T (Δx^(k) − H_k Δg^(k))).
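To make the final formula concrete, here is a minimal NumPy sketch of the update (the function name and the near-zero-denominator guard are our own additions, not part of the derivation):

```python
import numpy as np

def sr1_update(H, dx, dg, tol=1e-8):
    """Rank one (SR1) update of the inverse-Hessian approximation.

    H  : current symmetric approximation H_k
    dx : step difference Δx^(k) = x^(k+1) - x^(k)
    dg : gradient difference Δg^(k) = g^(k+1) - g^(k)
    """
    u = dx - H @ dg                        # Δx^(k) - H_k Δg^(k)
    denom = dg @ u                         # Δg^(k)T (Δx^(k) - H_k Δg^(k))
    if abs(denom) < tol * np.linalg.norm(dg) * np.linalg.norm(u):
        return H                           # guard against a near-zero denominator
    return H + np.outer(u, u) / denom

# The update is built exactly so that H_{k+1} Δg^(k) = Δx^(k):
Q = np.array([[2.0, 0.0], [0.0, 1.0]])
dx = np.array([-4/3, -4/3])
dg = Q @ dx
H1 = sr1_update(np.eye(2), dx, dg)
print(np.allclose(H1 @ dg, dx))            # True
```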

We summarize the above development in the following algorithm.
Rank One Algorithm
1. Set k := 0; select x^(0) and a real symmetric positive definite H_0.
2. If g^(k) = 0, stop; else set
d^(k) = −H_k g^(k).
3. Compute
α_k = arg min_{α≥0} f(x^(k) + α d^(k)),
x^(k+1) = x^(k) + α_k d^(k).
4. Compute
Δx^(k) = x^(k+1) − x^(k) = α_k d^(k),
Δg^(k) = g^(k+1) − g^(k),
H_{k+1} = H_k + (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (Δg^(k)T (Δx^(k) − H_k Δg^(k))).
5. Set k := k + 1; go to step 2.
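For a quadratic objective f(x) = ½ xᵀQx, the exact line search in step 3 has the closed form α_k = −g^(k)T d^(k) / (d^(k)T Q d^(k)), so the whole algorithm can be sketched as follows (function and variable names are ours; the denominator guard in step 4 is a practical addition):

```python
import numpy as np

def rank_one_minimize(Q, x0, max_iter=20, tol=1e-10):
    """Rank one correction algorithm for the quadratic f(x) = 0.5 x^T Q x."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                       # step 1: H_0 = I (symmetric PD)
    for _ in range(max_iter):
        g = Q @ x                            # gradient of the quadratic
        if np.linalg.norm(g) < tol:          # step 2: stop when g^(k) = 0
            break
        d = -H @ g                           # step 2: d^(k) = -H_k g^(k)
        alpha = -(g @ d) / (d @ Q @ d)       # step 3: exact line search
        x_new = x + alpha * d
        dx = x_new - x                       # step 4: Δx^(k)
        dg = Q @ x_new - g                   # step 4: Δg^(k)
        u = dx - H @ dg
        denom = dg @ u
        if abs(denom) > 1e-12:               # step 4: rank one update of H
            H = H + np.outer(u, u) / denom
        x = x_new                            # step 5: next iteration
    return x

x_star = rank_one_minimize(np.array([[2.0, 0.0], [0.0, 1.0]]), [1.0, 2.0])
print(x_star)    # converges to [0, 0]
```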
The rank one algorithm is based on satisfying the equation
H_{k+1} Δg^(k) = Δx^(k).
However, what we want is that
H_{k+1} Δg^(i) = Δx^(i) holds for each i = 0, 1, ..., k.
It turns out that this is, in fact, automatically true, as stated in the following theorem.
Theorem: For the rank one algorithm applied to the quadratic with Hessian Q = Q^T, we have
H_{k+1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k.

Proof. We prove the result by induction. From the discussion before the theorem it is clear that the claim is true for k = 0. Suppose now that the theorem is true for k − 1 ≥ 0; that is,
H_k Δg^(i) = Δx^(i), i < k.
We now show that the theorem is true for k. Our construction of the correction term ensures that
H_{k+1} Δg^(k) = Δx^(k).
So we only have to show that
H_{k+1} Δg^(i) = Δx^(i), i < k.
To this end, fix i < k. Since
H_{k+1} = H_k + (Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (Δg^(k)T (Δx^(k) − H_k Δg^(k))),
we have
H_{k+1} Δg^(i) = H_k Δg^(i) + [(Δx^(k) − H_k Δg^(k))(Δx^(k) − H_k Δg^(k))^T / (Δg^(k)T (Δx^(k) − H_k Δg^(k)))] Δg^(i).
By the induction hypothesis,
H_k Δg^(i) = Δx^(i).
To complete the proof it is therefore enough to show that the second term on the right-hand side equals zero; for this it suffices that
(Δx^(k) − H_k Δg^(k))^T Δg^(i) = [Δx^(k)T − Δg^(k)T H_k] Δg^(i) = Δx^(k)T Δg^(i) − Δg^(k)T H_k Δg^(i) = 0.
Indeed, by the induction hypothesis,
Δg^(k)T H_k Δg^(i) = Δg^(k)T (H_k Δg^(i)) = Δg^(k)T Δx^(i).
Moreover, since
Δg^(k) = Q Δx^(k) and (Δg^(k))^T = (Q Δx^(k))^T = Δx^(k)T Q^T = Δx^(k)T Q,
we have
Δg^(k)T H_k Δg^(i) = Δg^(k)T Δx^(i)   (induction hypothesis)
= Δx^(k)T Q Δx^(i)   (use Δg^(k)T = Δx^(k)T Q from the above)
= Δx^(k)T Δg^(i)   (use Q Δx^(i) = Δg^(i)).
Hence,
(Δx^(k) − H_k Δg^(k))^T Δg^(i) = Δx^(k)T Δg^(i) − Δx^(k)T Δg^(i) = 0,
which completes the proof.


Example
Let
f(x_1, x_2) = x_1^2 + x_2^2/2 + 3 = (1/2)(2x_1^2 + x_2^2) + 3.
Apply the rank one correction algorithm to minimize f. Use x^(0) = [1, 2]^T and H_0 = I_2 (the 2×2 identity matrix). We can represent f as
f(x) = (1/2) x^T Q x + 3, where Q = [2 0; 0 1].
Thus,
g^(k) = Q x^(k) = [2 0; 0 1] x^(k).
Because H_0 = I_2,
d^(0) = −g^(0) = [−2, −2]^T.
The objective function is quadratic, and hence
α_0 = arg min_{α≥0} f(x^(0) + α d^(0)) = −g^(0)T d^(0) / (d^(0)T Q d^(0)) = 8/12 = 2/3,
and thus
x^(1) = x^(0) + α_0 d^(0) = [−1/3, 2/3]^T.
We then compute
Δx^(0) = α_0 d^(0) = [−4/3, −4/3]^T,
g^(1) = Q x^(1) = [−2/3, 2/3]^T,
Δg^(0) = g^(1) − g^(0) = [−8/3, −4/3]^T.
Because
Δg^(0)T (Δx^(0) − H_0 Δg^(0)) = [−8/3, −4/3] [4/3, 0]^T = −32/9,
we obtain
H_1 = H_0 + (Δx^(0) − H_0 Δg^(0))(Δx^(0) − H_0 Δg^(0))^T / (Δg^(0)T (Δx^(0) − H_0 Δg^(0))) = [1/2 0; 0 1].
Therefore,
d^(1) = −H_1 g^(1) = [1/3, −2/3]^T,
and
α_1 = −g^(1)T d^(1) / (d^(1)T Q d^(1)) = 1.
We now compute
x^(2) = x^(1) + α_1 d^(1) = [0, 0]^T.
Note that g^(2) = 0, and therefore x^(2) = x*. As expected, the algorithm solves the problem in two steps. Note also that the directions d^(0) and d^(1) are Q-conjugate, in accordance with the theorem we have proved.
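The two claims at the end of the example, the secant condition H_1 Δg^(0) = Δx^(0) and the Q-conjugacy of d^(0) and d^(1), are easy to confirm numerically with the values computed above:

```python
import numpy as np

Q = np.array([[2.0, 0.0], [0.0, 1.0]])
d0 = np.array([-2.0, -2.0])              # d^(0)
d1 = np.array([1/3, -2/3])               # d^(1)
dx0 = np.array([-4/3, -4/3])             # Δx^(0)
dg0 = np.array([-8/3, -4/3])             # Δg^(0)
H1 = np.array([[0.5, 0.0], [0.0, 1.0]])

print(np.allclose(H1 @ dg0, dx0))        # True: H_1 Δg^(0) = Δx^(0)
print(np.isclose(d0 @ Q @ d1, 0.0))      # True: d^(0) and d^(1) are Q-conjugate
```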
The rank one correction algorithm works well in the case of a constant Hessian matrix, that is, the quadratic case; our analysis was, in fact, done for this case. However, ultimately we wish to apply the algorithm to general functions, not just quadratics. Unfortunately, for the nonquadratic case, the rank one correction algorithm is unsatisfactory for several reasons. For a nonquadratic objective function, H_{k+1} may not be positive definite (see the example below), and thus d^(k+1) may not be a descent direction. Furthermore, if
Δg^(k)T (Δx^(k) − H_k Δg^(k))
is close to zero, there may be numerical problems in evaluating H_{k+1}.
Example. Assume that H_k > 0. It turns out that if
Δg^(k)T (Δx^(k) − H_k Δg^(k)) > 0,
then H_{k+1} > 0 (see Exercise 11.3). However, if
Δg^(k)T (Δx^(k) − H_k Δg^(k)) < 0,
then H_{k+1} may not be positive definite. As an example of what might happen in this case, consider applying the rank one algorithm to the function
f(x) = x_1^4/4 + x_2^2/2 − x_1 x_2 + x_1 − x_2
with initial point
x^(0) = [0.5960, 0.59607]^T
and initial matrix
H_0 = [0.94913 0.14318; 0.14318 0.59702].
Note that H_0 > 0. We have
Δg^(0)T (Δx^(0) − H_0 Δg^(0)) = −0.03276
and
H_1 = [−0.94481 0.23324; 0.23324 1.2788].
It is easy to check that H_1 is not positive definite (it is indefinite, with eigenvalues −0.96901 and 1.3030).
Fortunately, alternative algorithms have been developed for updating H_k. In particular, if we use a "rank two" update, then H_k is guaranteed to be positive definite for all k, provided the line search is exact.
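The indefiniteness of H_1 can be checked directly from its eigenvalues (a sketch; the entries are those quoted in the example, with the sign of the (1,1) entry reconstructed to match the stated eigenvalues):

```python
import numpy as np

# H_1 from the nonquadratic example (symmetric, so eigvalsh applies)
H1 = np.array([[-0.94481, 0.23324],
               [ 0.23324, 1.2788 ]])
eigenvalues = np.linalg.eigvalsh(H1)
print(eigenvalues)                     # one negative, one positive eigenvalue
print(bool(np.all(eigenvalues > 0)))   # False: H_1 is not positive definite
```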

Quasi Newton method

Mujahid Abbas (abbas.mujahid@gmail.com)

Department of Mathematics, GCU, Lahore Pakistan

April 2023

M. Abbas (abbas.mujahid@gmail.com) (UP) QN method April 2023 1 / 17


Outlines of a lecture:
1. Quasi-Newton method
2. Results and Discussion


QN method:
Consider a quadratic objective function f as follows:
f(x) = (1/2) x^T Q x + b^T x + c,
where Q is a positive definite symmetric matrix.


Review:
Recall that:
1. The idea behind the Newton method, x^(k+1) = x^(k) − F(x^(k))^{-1} g^(k), was to locally approximate the function being minimized, at every step, by a quadratic function.
2. If the initial point is not sufficiently close to the solution, the method may not possess the descent property.
3. In the modified Newton method, we determined an appropriate value of α_k by performing a line search in the direction of the vector d^(k) = −F(x^(k))^{-1} g^(k); this modification gives the Newton method the descent property.
4. The line search is simply the minimization of the function of a real variable φ_k(α) = f(x^(k) − α F(x^(k))^{-1} g^(k)), which is not a trivial problem to solve.
5. If the method is convergent, it has a quadratic order of convergence.
Review:
1. To avoid the computation of F(x^(k))^{-1}, in this lecture we use an approximation H(x^(k)) to F(x^(k))^{-1} in place of the true inverse, and set the direction −H(x^(k)) g^(k), where we require H(x^(k)) to be a positive definite symmetric matrix (why?).
2. The method employs search directions that are conjugate to the previous search directions.


Why positive definite:
We want to preserve some properties of F(x^(k))^{-1} in the process of approximation. We demand an important property that an approximation to F(x^(k))^{-1} should satisfy:
1. Let x^(k+1) = x^(k) − α H_k g^(k), where H_k is an n×n real matrix and α is a positive search parameter.
2. Expanding f about x^(k), we obtain
f(x^(k+1)) = f(x^(k)) + (g^(k))^T (x^(k+1) − x^(k)) + o(‖x^(k+1) − x^(k)‖)
= f(x^(k)) − α (g^(k))^T H_k g^(k) + o(‖H_k g^(k)‖ α).
3. As α tends to zero, the second term on the right-hand side of the above equation dominates the third. Thus, to guarantee a decrease in f for small α, that is, f(x^(k+1)) < f(x^(k)), we must have g^(k)T H_k g^(k) > 0.
4. A simple way to ensure this is to require that H_k be positive definite.
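The condition g^(k)T H_k g^(k) > 0 is exactly what positive definiteness buys; a small numerical illustration (the matrices below are our own toy examples):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.standard_normal(3)                # a nonzero "gradient" vector

# A symmetric positive definite H always gives g^T H g > 0 ...
A = rng.standard_normal((3, 3))
H_pd = A @ A.T + 3 * np.eye(3)            # SPD by construction
print(bool(g @ H_pd @ g > 0))             # True: -H g is a descent direction

# ... while an indefinite H can fail for some gradients.
H_indef = np.diag([1.0, -2.0, 1.0])
g_bad = np.array([0.0, 1.0, 0.0])
print(bool(g_bad @ H_indef @ g_bad > 0))  # False: descent not guaranteed
```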
Conclusion:
In constructing an approximation to the inverse of the Hessian matrix, we should use only the objective function and gradient values. Thus, if we can find a suitable method of choosing H_k, the iteration may be carried out without any evaluation of the Hessian and without the solution of any set of linear equations.
1. Let d^(k) = −H_k g^(k), where H_k is an n×n real positive definite symmetric matrix, and let α_k be positive and minimize f in a line search along the line x^(k) + α d^(k), that is, in the direction of d^(k).
2. This direction is a downhill direction.


Proposition:
Let f ∈ C^1, x^(k) ∈ R^n, g^(k) = ∇f(x^(k)) ≠ 0, and let H_k be an n×n real symmetric positive definite matrix. If we set
x^(k+1) = x^(k) − α_k H_k g^(k),
where
α_k = arg min_{α≥0} f(x^(k) − α H_k g^(k)),
then α_k > 0 and f(x^(k+1)) < f(x^(k)).


Another important property:
1. Let H_0, H_1, H_2, ... be successive approximations of the inverse Hessian F(x^(k))^{-1}.
2. Suppose that the Hessian matrix F(x) of the objective function f is constant and independent of x; that is, the objective function is quadratic with Hessian F(x) = Q for all x, where Q = Q^T.
3. Then g^(k+1) − g^(k) = Q (x^(k+1) − x^(k)), or Δg^(k) = Q Δx^(k), where Δg^(k) = g^(k+1) − g^(k) and Δx^(k) = x^(k+1) − x^(k).
4. Note that for any given k, the matrix Q^{-1} satisfies Q^{-1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k.
5. If we start with a real symmetric positive definite matrix H_0, then our second demand is that the approximation H_{k+1} satisfy
H_{k+1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k.


Continued:
1. If n steps are involved, then moving in the n directions Δx^(0), Δx^(1), ..., Δx^(n−1) gives
2. H_n Δg^(0) = Δx^(0), H_n Δg^(1) = Δx^(1), ..., H_n Δg^(n−1) = Δx^(n−1).
3. The above set of equations can be written as
H_n [Δg^(0), Δg^(1), ..., Δg^(n−1)] = [Δx^(0), Δx^(1), ..., Δx^(n−1)].
4. Therefore, if [Δg^(0), Δg^(1), ..., Δg^(n−1)] is nonsingular, then
5. H_n = [Δx^(0), Δx^(1), ..., Δx^(n−1)] [Δg^(0), Δg^(1), ..., Δg^(n−1)]^{-1}.


Continued:
1. Note that Q satisfies
Q [Δx^(0), Δx^(1), ..., Δx^(n−1)] = [Δg^(0), Δg^(1), ..., Δg^(n−1)] and
Q^{-1} [Δg^(0), Δg^(1), ..., Δg^(n−1)] = [Δx^(0), Δx^(1), ..., Δx^(n−1)].
2. Therefore, if [Δg^(0), Δg^(1), ..., Δg^(n−1)] is nonsingular, then Q^{-1} is determined uniquely after n steps, via
Q^{-1} = [Δx^(0), Δx^(1), ..., Δx^(n−1)] [Δg^(0), Δg^(1), ..., Δg^(n−1)]^{-1}.
3. Thus, H_n = Q^{-1}.
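The identity H_n = Q^{-1} follows from stacking the difference vectors as columns; a quick numerical check (the SPD matrix Q and the steps Δx^(i) below are arbitrary choices of ours, with the Δx^(i) linearly independent):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # an SPD "Hessian" Q

DX = rng.standard_normal((n, n))     # columns: Δx^(0), ..., Δx^(n-1)
DG = Q @ DX                          # columns: Δg^(i) = Q Δx^(i)

# H_n = [Δx^(0), ..., Δx^(n-1)] [Δg^(0), ..., Δg^(n-1)]^(-1)
H_n = DX @ np.linalg.inv(DG)
print(np.allclose(H_n, np.linalg.inv(Q)))   # True: H_n recovers Q^(-1)
```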


Continued:
1. As a consequence, we conclude that if H_n satisfies the equations H_n Δg^(i) = Δx^(i), 0 ≤ i ≤ n − 1, then the algorithm
x^(k+1) = x^(k) − α_k H_k g^(k),
α_k = arg min_{α≥0} f(x^(k) − α H_k g^(k)),
is guaranteed to solve problems with quadratic objective functions in n + 1 steps.
2. The update x^(n+1) = x^(n) − α_n H_n g^(n) is equivalent to Newton's algorithm. If this method is applied to a quadratic with Q symmetric positive definite, then H_n = Q^{-1} and the process terminates after n stages.
3. We shall prove that such algorithms solve quadratic problems of n variables in at most n steps.


Quasi Newton Algorithm:
Quasi-Newton algorithms have the following form:
1. d^(k) = −H_k g^(k),
α_k = arg min_{α≥0} f(x^(k) + α d^(k)),
x^(k+1) = x^(k) + α_k d^(k).
2. The matrices H_0, H_1, ... are symmetric.
3. In the quadratic case, these matrices are required to satisfy
H_{k+1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k, where Δx^(i) = x^(i+1) − x^(i) = α_i d^(i) and Δg^(i) = g^(i+1) − g^(i) = Q Δx^(i).
4. It turns out that quasi-Newton methods are also conjugate direction methods.


Theorem:
Consider a quasi-Newton algorithm applied to a quadratic function with Hessian Q = Q^T such that for 0 ≤ k < n − 1,
H_{k+1} Δg^(i) = Δx^(i), 0 ≤ i ≤ k,
where H_{k+1} = H_{k+1}^T. If α_i ≠ 0, 0 ≤ i ≤ k, then d^(0), ..., d^(k+1) are Q-conjugate. Note that k + 1 < n.


Proof:
1. For k = 0, we must show that d^(0) and d^(1) are Q-conjugate.
2. As α_0 ≠ 0, we can write d^(0) = Δx^(0)/α_0.
3. d^(1)T Q d^(0) = −(g^(1))^T H_1 Q d^(0) = −(g^(1))^T H_1 Q Δx^(0)/α_0 (use d^(k) = −H_k g^(k) and H_1 = H_1^T).
4. = −(g^(1))^T H_1 Δg^(0)/α_0 (use Δg^(i) = Q Δx^(i)) = −(g^(1))^T Δx^(0)/α_0 (use H_1 Δg^(0) = Δx^(0)).
5. = −(g^(1))^T d^(0) = 0 (Exercise 11.1), as a consequence of α_0 > 0 being the minimizer of f(x^(0) + α d^(0)).


Proof (continued):
1. Assume the result is true for k − 1 (note that k < n − 1); that is, d^(0), ..., d^(k) are Q-conjugate.
2. We now prove the result for k; that is, that d^(0), ..., d^(k+1) are Q-conjugate.
3. It suffices to show that (d^(k+1))^T Q d^(i) = 0, 0 ≤ i ≤ k.
4. As α_i ≠ 0, we can write d^(i) = Δx^(i)/α_i.


Continued:
1. So, given i, 0 ≤ i ≤ k, we have
d^(k+1)T Q d^(i) = −(g^(k+1))^T H_{k+1} Q d^(i)
= −(g^(k+1))^T H_{k+1} Q Δx^(i)/α_i
= −(g^(k+1))^T H_{k+1} Δg^(i)/α_i
= −(g^(k+1))^T Δx^(i)/α_i
= −(g^(k+1))^T d^(i).
Because d^(0), ..., d^(k) are Q-conjugate by assumption, we conclude from the result proved earlier that (g^(k+1))^T d^(i) = 0. Hence, (d^(k+1))^T Q d^(i) = 0, which completes the proof.
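The theorem can be spot-checked numerically: run the rank one update with exact line search on a random quadratic and measure d^(i)T Q d^(j) for i ≠ j (the test problem below is our own construction; the theorem predicts these products vanish up to round-off):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)                    # SPD Hessian
x = rng.standard_normal(n)
H = np.eye(n)

directions = []
for _ in range(n):
    g = Q @ x
    if np.linalg.norm(g) < 1e-12:
        break
    d = -H @ g
    directions.append(d / np.linalg.norm(d))   # store normalized direction
    alpha = -(g @ d) / (d @ Q @ d)             # exact line search
    x_new = x + alpha * d
    dx, dg = x_new - x, Q @ (x_new - x)
    u = dx - H @ dg
    if abs(dg @ u) > 1e-12:
        H += np.outer(u, u) / (dg @ u)         # rank one update
    x = x_new

off_diag = max(abs(directions[i] @ Q @ directions[j])
               for i in range(len(directions)) for j in range(i))
print(off_diag)                                # tiny: zero up to round-off
```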
