
E1 251 Linear and Nonlinear Optimization

Chapter 10: Quasi-Newton method
10.1. The basic idea

x_{k+1} = x_k − α_k S_k ∇f(x_k)

S_k: approximation of the inverse Hessian.
Positive definiteness of S_k guarantees the existence of α_k such
that f(x_{k+1}) < f(x_k).

When applied to the quadratic function f(x) = 0.5 x^T Q x − b^T x,
the optimal α_k is given by

α_k = (g_k^T S_k g_k) / (g_k^T S_k Q S_k g_k)
10.2. Convergence rate for the quadratic function

E(x_k) = (1/2)(x_k − x*)^T Q (x_k − x*) = (1/2) y_k^T Q y_k, where y_k = x_k − x*.

E(x_{k+1}) = (1/2)(x_{k+1} − x*)^T Q (x_{k+1} − x*)

Substitute x_{k+1} = x_k − α_k S_k g_k:

E(x_{k+1}) = (1/2)(x_k − α_k S_k g_k − x*)^T Q (x_k − α_k S_k g_k − x*)
           = (1/2)(y_k − α_k S_k g_k)^T Q (y_k − α_k S_k g_k)
           = (1/2) y_k^T Q y_k − α_k g_k^T S_k Q y_k + (1/2) α_k^2 g_k^T S_k Q S_k g_k
           = [1 − (2 α_k g_k^T S_k Q y_k − α_k^2 g_k^T S_k Q S_k g_k) / (y_k^T Q y_k)] ((1/2) y_k^T Q y_k)
Substitute for α_k:

E(x_{k+1}) = [1 − (2 (g_k^T S_k g_k / g_k^T S_k Q S_k g_k) g_k^T S_k Q y_k
             − (g_k^T S_k g_k / g_k^T S_k Q S_k g_k)^2 g_k^T S_k Q S_k g_k) / (y_k^T Q y_k)] E(x_k)

Using the relation Q y_k = g_k (so that y_k^T Q y_k = g_k^T Q^{-1} g_k) we get

E(x_{k+1}) = [1 − (2 (g_k^T S_k g_k)^2 / (g_k^T S_k Q S_k g_k)
             − (g_k^T S_k g_k)^2 / (g_k^T S_k Q S_k g_k)) / (g_k^T Q^{-1} g_k)] E(x_k)

           = [1 − (g_k^T S_k g_k)^2 / ((g_k^T S_k Q S_k g_k)(g_k^T Q^{-1} g_k))] E(x_k) = [1 − γ_k] E(x_k)
γ_k = (g_k^T S_k g_k)^2 / ((g_k^T S_k Q S_k g_k)(g_k^T Q^{-1} g_k)).

Letting T_k = S_k^{1/2} Q S_k^{1/2} and p_k = S_k^{1/2} g_k gives

γ_k = (p_k^T p_k)^2 / ((p_k^T T_k p_k)(p_k^T T_k^{-1} p_k)).

To bound the convergence rate we need to minimize γ_k over p_k.
As in Chapter 8, we get

γ_min = 4 a_k A_k / (a_k + A_k)^2, where a_k and A_k are the smallest and
largest eigenvalues of T_k.
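The Kantorovich-type bound γ_k ≥ γ_min can be checked numerically. A small numpy sketch; Q and S are illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative SPD matrices Q and S_k.
M = rng.standard_normal((4, 4))
Q = M @ M.T + 4 * np.eye(4)
S = np.diag([1.0, 2.0, 0.5, 1.5])

# gamma_k = (g^T S g)^2 / ((g^T S Q S g)(g^T Q^{-1} g))
def gamma(g):
    Sg = S @ g
    return (g @ Sg) ** 2 / ((Sg @ Q @ Sg) * (g @ np.linalg.solve(Q, g)))

# Smallest/largest eigenvalues of T_k = S^{1/2} Q S^{1/2}
sqrtS = np.diag(np.sqrt(np.diag(S)))
eigs = np.linalg.eigvalsh(sqrtS @ Q @ sqrtS)
a, A = eigs[0], eigs[-1]
gamma_min = 4 * a * A / (a + A) ** 2

# gamma_k >= gamma_min for every gradient direction g
assert all(gamma(rng.standard_normal(4)) >= gamma_min - 1e-12 for _ in range(100))
print("bound holds")
```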
10.3. Constructing and updating the inverse Hessian

1) We know that for a quadratic function, (1/2)x^T Q x − x^T b, the Hessian is
constant and is given by F = Q, and ∇f(x) = Fx − b.

2) Let g_k = ∇f(x_k), p_k = x_{k+1} − x_k and q_k = g_{k+1} − g_k. Then q_k = F p_k.

3) Evaluation of q_k and p_k for k = 0, ..., n−1 completely determines F or F^{-1}.

The main idea of all quasi-Newton methods is to construct, for each k, an
approximate inverse Hessian H_{k+1} based on {q_j, p_j}_{j=0}^{k} in such a way
that H_{k+1} satisfies
H_{k+1} q_i = p_i for all 0 ≤ i ≤ k.
The update rule for H_{k+1} is constructed in such a way that H_n = F^{-1}.
10.4. The general form of quasi-Newton algorithms

Initialization: x_0, g_0 = F x_0 − b, H_0 = I.
For k = 0, ..., n−1 do
(1) d_k = −H_k g_k
(2) Compute α_k = −(d_k^T g_k) / (d_k^T F d_k)
(3) x_{k+1} = x_k + α_k d_k, g_{k+1} = g_k + α_k F d_k
    p_k = x_{k+1} − x_k = α_k d_k, q_k = g_{k+1} − g_k = α_k F d_k
(4) H_{k+1} = H_k + U_k, where the correction term, U_k,
    is chosen such that H_{k+1} q_i = p_i for all 0 ≤ i ≤ k.

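The general form can be sketched as a loop that takes the update rule as a parameter. A minimal numpy sketch; the quadratic data are illustrative, and the placeholder update U_k = 0 (which reduces the scheme to steepest descent) stands in for the corrections developed in the next sections:

```python
import numpy as np

def quasi_newton_quadratic(F, b, x0, update, n_iter):
    """General form from Section 10.4; `update` maps (H_k, p_k, q_k) to H_{k+1}."""
    x, H = x0.astype(float), np.eye(len(x0))
    g = F @ x - b
    for _ in range(n_iter):
        d = -H @ g                          # (1) search direction
        alpha = -(d @ g) / (d @ F @ d)      # (2) exact step for the quadratic
        x_new = x + alpha * d               # (3) update iterate and gradient
        g_new = g + alpha * (F @ d)
        p, q = x_new - x, g_new - g
        H = update(H, p, q)                 # (4) H_{k+1} = H_k + U_k
        x, g = x_new, g_new
    return x

# With U_k = 0 the scheme reduces to steepest descent (placeholder update).
F = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = quasi_newton_quadratic(F, b, np.zeros(2), lambda H, p, q: H, 100)
print(np.allclose(F @ x, b))               # -> True
```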
10.5. The rank one correction

The idea is to make H_{k+1} satisfy H_{k+1} q_k = p_k:

H_{k+1} = H_k + (p_k − H_k q_k)(p_k − H_k q_k)^T / (q_k^T (p_k − H_k q_k))

Check:

H_{k+1} q_k = H_k q_k + (p_k − H_k q_k)(p_k − H_k q_k)^T q_k / (q_k^T (p_k − H_k q_k))
            = H_k q_k + (p_k − H_k q_k) = p_k
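The check above can be reproduced numerically. A minimal numpy sketch of the rank one correction; F, p, and the starting H are illustrative:

```python
import numpy as np

def rank_one_update(H, p, q):
    """Rank one correction: H_{k+1} = H_k + (p - Hq)(p - Hq)^T / (q^T (p - Hq))."""
    r = p - H @ q
    return H + np.outer(r, r) / (q @ r)

# Illustrative data: a quadratic with Hessian F, one step (p, q = F p).
F = np.array([[2.0, 0.5], [0.5, 1.0]])
H = np.eye(2)
p = np.array([1.0, -1.0])
q = F @ p

H1 = rank_one_update(H, p, q)
print(np.allclose(H1 @ q, p))   # secant condition H_{k+1} q_k = p_k -> True
```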
To verify H_{k+1} q_i = p_i, for i ∈ {0, ..., k} ..........(E1)
Step 1: Verify for k = 0,
which requires only verifying H_{k+1} q_k = p_k:
done on the previous slide.
Step 2: Assume H_k q_i = p_i, for i ∈ {0, ..., k−1} ....(E2)
Step 3: Prove H_{k+1} q_i = p_i, for i ∈ {0, ..., k}:
3.a) Verify H_{k+1} q_k = p_k (done).
3.b) Verify H_{k+1} q_i = p_i, for i ∈ {0, ..., k−1}.
Verify H_{k+1} q_i = p_i, for i ∈ {0, ..., k−1}:

H_{k+1} q_i = H_k q_i + (p_k − H_k q_k)(p_k − H_k q_k)^T q_i / (q_k^T (p_k − H_k q_k))
            = H_k q_i + (p_k − H_k q_k)(p_k^T q_i − q_k^T H_k q_i) / (q_k^T (p_k − H_k q_k)).

Since H_k q_i = p_i (by (E2)),

H_{k+1} q_i = H_k q_i + y_k (p_k^T q_i − q_k^T p_i), .................(E3)
where y_k = (p_k − H_k q_k) / (q_k^T (p_k − H_k q_k)).

q_k^T p_i = p_k^T F p_i = p_k^T q_i, so the second term in (E3) vanishes.
This verifies H_{k+1} q_i = p_i, for i ∈ {0, ..., k−1}.
The main issue in the rank one correction

H_{k+1} = H_k + (p_k − H_k q_k)(p_k − H_k q_k)^T / (q_k^T (p_k − H_k q_k))

q_k^T (p_k − H_k q_k) may not be positive, hence H_{k+1} may not
be positive definite.
q_k^T (p_k − H_k q_k) may be close to zero, and hence H_{k+1} may
be ill-conditioned.
10.6. Advanced methods

Davidon-Fletcher-Powell (DFP) method

Optimization problem:
B_k: current estimate of the Hessian
B_{k+1}: next refined estimate of the Hessian

B_{k+1} = arg min_B f_M(B_k − B) subject to B p_k = q_k and B^T = B,
where f_M(·) = ||W^{-1/2} (·) W^{-1/2}||_F (weighted Frobenius norm), and
W = ∫_0^1 F(x_k + τ α_k p_k) dτ.

Solution:
B_{k+1} = (I − ρ_k q_k p_k^T) B_k (I − ρ_k p_k q_k^T) + ρ_k q_k q_k^T,  ρ_k = 1/(q_k^T p_k).

H_{k+1} = B_{k+1}^{-1} = H_k + (p_k p_k^T)/(p_k^T q_k) − (H_k q_k q_k^T H_k)/(q_k^T H_k q_k), where H_k = B_k^{-1}.
Broyden, Fletcher, Goldfarb, and Shanno (BFGS) method

Optimization problem:
H_k: current estimate of the inverse Hessian
H_{k+1}: next refined estimate of the inverse Hessian

H_{k+1} = arg min_H f_M(H_k − H) subject to H q_k = p_k and H^T = H,
where f_M(·) = ||W^{1/2} (·) W^{1/2}||_F (weighted Frobenius norm), and
W = ∫_0^1 F(x_k + τ α_k p_k) dτ.

Solution:
H_{k+1} = (I − ρ_k p_k q_k^T) H_k (I − ρ_k q_k p_k^T) + ρ_k p_k p_k^T,  ρ_k = 1/(q_k^T p_k).

B_{k+1} = H_{k+1}^{-1} = B_k − (B_k p_k p_k^T B_k)/(p_k^T B_k p_k) + (q_k q_k^T)/(q_k^T p_k), where H_k = B_k^{-1}.
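The BFGS inverse-Hessian update can be checked the same way; the vectors below are illustrative:

```python
import numpy as np

def bfgs_inverse_update(H, p, q):
    """BFGS update: H_{k+1} = (I - rho p q^T) H_k (I - rho q p^T) + rho p p^T."""
    rho = 1.0 / (q @ p)
    V = np.eye(len(p)) - rho * np.outer(p, q)
    return V @ H @ V.T + rho * np.outer(p, p)

# Illustrative vectors with q^T p > 0 (curvature condition).
H = np.eye(3)
p = np.array([0.5, 1.0, -0.5])
q = np.array([1.0, 2.0, 0.5])

H1 = bfgs_inverse_update(H, p, q)
print(np.allclose(H1 @ q, p))    # secant condition H_{k+1} q_k = p_k -> True
```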
Broyden Family

B_{k+1} = B_k − (B_k p_k p_k^T B_k)/(p_k^T B_k p_k) + (q_k q_k^T)/(p_k^T q_k) + φ_k (p_k^T B_k p_k) v_k v_k^T
(the first three terms equal B_{k+1}^BFGS), where

v_k = [ q_k/(q_k^T p_k) − B_k p_k/(p_k^T B_k p_k) ]

Alternative expressions:
B_{k+1} = φ B_{k+1}^DFP + (1 − φ) B_{k+1}^BFGS
H_{k+1} = (1 − φ) H_{k+1}^DFP + φ H_{k+1}^BFGS

Constraint on φ_k to ensure positive definiteness:
φ_k = 1/(1 − μ_k),  where μ_k = (q_k^T B_k^{-1} q_k)(p_k^T B_k p_k) / (p_k^T q_k)^2.
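The identity B_{k+1}^Broyden = φ B_{k+1}^DFP + (1 − φ) B_{k+1}^BFGS can be verified numerically against the v_k v_k^T form; the data below are illustrative:

```python
import numpy as np

def bfgs_B(B, p, q):
    """Direct BFGS update of the Hessian estimate."""
    return B - np.outer(B @ p, B @ p) / (p @ B @ p) + np.outer(q, q) / (p @ q)

def dfp_B(B, p, q):
    """Direct DFP update of the Hessian estimate."""
    rho = 1.0 / (q @ p)
    V = np.eye(len(p)) - rho * np.outer(q, p)
    return V @ B @ V.T + rho * np.outer(q, q)

# Illustrative data: any SPD B and vectors with p^T q > 0.
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
B = M @ M.T + 3 * np.eye(3)
p = rng.standard_normal(3)
q = B @ p          # ensures p^T q > 0

phi = 0.3
v = q / (q @ p) - (B @ p) / (p @ B @ p)
broyden = bfgs_B(B, p, q) + phi * (p @ B @ p) * np.outer(v, v)
blend = phi * dfp_B(B, p, q) + (1 - phi) * bfgs_B(B, p, q)
print(np.allclose(broyden, blend))   # the two forms agree -> True
```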
Theorem 10.6A:
Consider the quadratic function f(x) = 0.5 x^T Q x − x^T b with positive
definite Q. Let {H_k} represent the sequence of inverse Hessian
approximations obtained using the Broyden formula with φ_k ∈ [0,1]
and positive definite starting matrix H_0. Let {λ_j^(k)}_{j=1}^n be the
eigenvalues of the matrix Q^{1/2} H_k Q^{1/2}. Then for all k we have
min{λ_j^(k), 1} ≤ λ_j^(k+1) ≤ max{λ_j^(k), 1}. Further, this property is not
satisfied if the Broyden parameter is outside the interval [0,1].

Theorem 10.6B:
Consider the quadratic function f(x) = 0.5 x^T Q x − x^T b with positive
definite Q. Let {H_k} represent the sequence of inverse Hessian
approximations obtained using the Broyden formula with φ_k satisfying
the positive definiteness constraint, and with positive definite starting
matrix H_0. If the step size is computed by means of the exact line
search formula, then the following hold:
(i) The iterates are independent of φ_k and converge to the solution
in at most n steps.
(ii) B_k p_j = q_j, j = 1, 2, ..., k−1.
(iii) If B_0 = I, the iterates are identical to those generated by the
conjugate gradient method, and the search directions {p_j} are
Q-conjugate.
(iv) If n iterations are performed, we have B_n = Q.
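Parts (i) and (iv) can be observed numerically: BFGS (φ = 0) with exact line search on an illustrative quadratic terminates in n steps with H_n = Q^{-1}. A minimal numpy sketch:

```python
import numpy as np

def bfgs_inverse_update(H, p, q):
    rho = 1.0 / (q @ p)
    V = np.eye(len(p)) - rho * np.outer(p, q)
    return V @ H @ V.T + rho * np.outer(p, p)

# Illustrative quadratic f(x) = 0.5 x^T Q x - b^T x with SPD Q.
rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
Q = M @ M.T + 4 * np.eye(4)
b = rng.standard_normal(4)
x_star = np.linalg.solve(Q, b)

x, H = np.zeros(4), np.eye(4)
g = Q @ x - b
for k in range(4):                       # n = 4 iterations
    d = -H @ g
    alpha = -(d @ g) / (d @ Q @ d)       # exact line search for the quadratic
    x_new = x + alpha * d
    g_new = Q @ x_new - b
    H = bfgs_inverse_update(H, x_new - x, g_new - g)
    x, g = x_new, g_new

print(np.allclose(x, x_star))            # converged in n steps -> True
print(np.allclose(H, np.linalg.inv(Q)))  # H_n = Q^{-1} -> True
```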
Broyden method for non-quadratic functions

Initialization: x_0, g_0 = ∇f(x_0), H_0 = I.
For k = 0, ..., N−1 do
(1) d_k = −H_k g_k
(2) Compute α_k using the Wolfe conditions
(3) x_{k+1} = x_k + α_k d_k, g_{k+1} = ∇f(x_{k+1})
    p_k = x_{k+1} − x_k, q_k = g_{k+1} − g_k
(4) Compute H_{k+1} from H_k, p_k, and q_k using the
    Broyden update formula
The relation H_{k+1} q_j = p_j is ensured only for j = k.

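A minimal numpy sketch of this loop on an illustrative strictly convex non-quadratic function, using simple Armijo backtracking as a stand-in for a full Wolfe line search:

```python
import numpy as np

def f(x):
    return x @ x + np.exp(x[0] + x[1])

def grad(x):
    e = np.exp(x[0] + x[1])
    return 2 * x + np.array([e, e])

def bfgs_inverse_update(H, p, q):
    rho = 1.0 / (q @ p)
    V = np.eye(len(p)) - rho * np.outer(p, q)
    return V @ H @ V.T + rho * np.outer(p, p)

# Backtracking (Armijo) search -- a simple stand-in for a full Wolfe search.
def line_search(x, d, g):
    alpha = 1.0
    while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
        alpha *= 0.5
    return alpha

x, H = np.array([2.0, -3.0]), np.eye(2)
g = grad(x)
for _ in range(50):
    d = -H @ g
    alpha = line_search(x, d, g)
    x_new = x + alpha * d
    g_new = grad(x_new)
    p, q = x_new - x, g_new - g
    if q @ p > 1e-12:                 # curvature condition (Wolfe would ensure this)
        H = bfgs_inverse_update(H, p, q)
    x, g = x_new, g_new

print(np.linalg.norm(grad(x)) < 1e-6)   # -> True
```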
Convergence of the Broyden method with φ_k = 0
Theorem 10.6C:
Let f(x) be a strictly convex function. Then the sequence {x_k}
generated by the Broyden method with φ_k = 0 and using the Wolfe
line search converges to the minimum of f(x).

