E1 251 Linear and Nonlinear Optimization: Chapter 10: Quasi-Newton Method
10.1. The basic idea
x_{k+1} = x_k − α_k S_k ∇f(x_k)

S_k: approximation of the inverse Hessian.

Positive definiteness of S_k guarantees the existence of α_k such that f(x_{k+1}) < f(x_k). For the quadratic case with Hessian Q, the exact line-search step size is

α_k = (g_k^T S_k g_k) / (g_k^T S_k Q S_k g_k)
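The step above can be sketched numerically. This is a minimal example, assuming made-up values for Q, b, S_k, and the starting point x_k:

```python
import numpy as np

# One quasi-Newton step on the quadratic f(x) = 0.5 x^T Q x - b^T x.
# Q, b, S, and x are made-up illustrative values.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # positive definite Hessian
b = np.array([1.0, 2.0])
S = np.eye(2)                            # positive definite inverse-Hessian estimate S_k
x = np.array([2.0, -1.0])                # current iterate x_k

def f(v):
    return 0.5 * v @ Q @ v - b @ v

g = Q @ x - b                                 # gradient g_k
alpha = (g @ S @ g) / (g @ S @ Q @ S @ g)     # exact step size along -S_k g_k
x_next = x - alpha * S @ g                    # x_{k+1} = x_k - alpha_k S_k g_k

assert f(x_next) < f(x)   # positive definite S_k guarantees descent
```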
10.2 Convergence rate for a quadratic function
E(x_k) = (1/2)(x_k − x*)^T Q (x_k − x*) = (1/2) y_k^T Q y_k, where y_k = x_k − x*.

E(x_{k+1}) = (1/2)(x_{k+1} − x*)^T Q (x_{k+1} − x*)

Substitute x_{k+1} = x_k − α_k S_k g_k:

E(x_{k+1}) = (1/2)(x_k − α_k S_k g_k − x*)^T Q (x_k − α_k S_k g_k − x*)
           = (1/2)(y_k − α_k S_k g_k)^T Q (y_k − α_k S_k g_k)
           = (1/2) y_k^T Q y_k − α_k g_k^T S_k Q y_k + (1/2) α_k^2 g_k^T S_k Q S_k g_k
           = [1 − (2 α_k g_k^T S_k Q y_k − α_k^2 g_k^T S_k Q S_k g_k)/(y_k^T Q y_k)] ((1/2) y_k^T Q y_k)
Substituting for α_k:

E(x_{k+1}) = [1 − (2 (g_k^T S_k g_k/(g_k^T S_k Q S_k g_k)) g_k^T S_k Q y_k − (g_k^T S_k g_k/(g_k^T S_k Q S_k g_k))^2 g_k^T S_k Q S_k g_k)/(y_k^T Q y_k)] E(x_k)
Using the relation Q y_k = g_k we get

E(x_{k+1}) = [1 − (2 (g_k^T S_k g_k)^2/(g_k^T S_k Q S_k g_k) − (g_k^T S_k g_k)^2/(g_k^T S_k Q S_k g_k))/(g_k^T Q^{−1} g_k)] E(x_k)
           = [1 − (g_k^T S_k g_k)^2/((g_k^T S_k Q S_k g_k)(g_k^T Q^{−1} g_k))] E(x_k) = [1 − γ_k] E(x_k)
γ_k = (g_k^T S_k g_k)^2 / ((g_k^T S_k Q S_k g_k)(g_k^T Q^{−1} g_k)).

Letting T_k = S_k^{1/2} Q S_k^{1/2} and p_k = S_k^{1/2} g_k gives

γ_k = (p_k^T p_k)^2 / ((p_k^T T_k p_k)(p_k^T T_k^{−1} p_k)).
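The one-step identity E(x_{k+1}) = (1 − γ_k) E(x_k) can be checked numerically; the quadratic and S_k below are made-up examples:

```python
import numpy as np

# Check E(x_{k+1}) = (1 - gamma_k) E(x_k) on made-up data.
Q = np.array([[5.0, 1.0], [1.0, 2.0]])    # positive definite
b = np.array([1.0, 0.0])
S = np.array([[1.0, 0.2], [0.2, 0.5]])    # positive definite S_k
x = np.array([3.0, -2.0])
x_star = np.linalg.solve(Q, b)            # minimizer x*

def E(v):
    y = v - x_star
    return 0.5 * y @ Q @ y

g = Q @ x - b
alpha = (g @ S @ g) / (g @ S @ Q @ S @ g)   # exact line-search step
x_next = x - alpha * S @ g

gamma = (g @ S @ g) ** 2 / ((g @ S @ Q @ S @ g) * (g @ np.linalg.solve(Q, g)))
assert np.isclose(E(x_next), (1 - gamma) * E(x))
assert 0 < gamma <= 1    # Cauchy-Schwarz in the T_k inner product bounds gamma by 1
```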
10.3 Constructing and updating the inverse-Hessian
1) We know that for a quadratic function f(x) = (1/2) x^T Q x − x^T b, the Hessian is constant and is given by F = Q, and ∇f(x) = F x − b.
10.5. The rank one correction
H_{k+1} = H_k + (p_k − H_k q_k)(p_k − H_k q_k)^T / (q_k^T (p_k − H_k q_k))

Check:

H_{k+1} q_k = H_k q_k + (p_k − H_k q_k)(p_k − H_k q_k)^T q_k / (q_k^T (p_k − H_k q_k))
            = H_k q_k + (p_k − H_k q_k) = p_k
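The check above can be reproduced in a few lines; p_k, q_k below are arbitrary made-up vectors and H_k is taken as the identity:

```python
import numpy as np

# Rank-one (SR1) correction on made-up p_k, q_k, with H_k = I.
n = 4
H = np.eye(n)                                  # current estimate H_k
p = np.array([1.0, 2.0, 0.5, -1.0])            # p_k = x_{k+1} - x_k
q = np.array([0.5, -1.0, 2.0, 1.0])            # q_k = g_{k+1} - g_k

r = p - H @ q                                  # secant residual p_k - H_k q_k
H_next = H + np.outer(r, r) / (q @ r)          # rank-one update

assert np.allclose(H_next @ q, p)              # H_{k+1} q_k = p_k by construction
```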
To verify: H_{k+1} q_i = p_i, for i ∈ {0, …, k} ..........(E1)
Step 1: Verify for k = 0, which requires only verifying H_{k+1} q_k = p_k: done above.
Step 2: Assume H_k q_i = p_i, for i ∈ {0, …, k − 1} ....(E2)
Step 3: Prove H_{k+1} q_i = p_i, for i ∈ {0, …, k}:
 3.a) Verify H_{k+1} q_k = p_k — done.
 3.b) Verify H_{k+1} q_i = p_i, for i ∈ {0, …, k − 1}.
Verify H_{k+1} q_i = p_i, for i ∈ {0, …, k − 1}:

H_{k+1} q_i = H_k q_i + (p_k − H_k q_k)(p_k − H_k q_k)^T q_i / (q_k^T (p_k − H_k q_k))
            = H_k q_i + (p_k − H_k q_k)(p_k^T q_i − q_k^T H_k q_i) / (q_k^T (p_k − H_k q_k)).

By the induction hypothesis (E2), H_k q_i = p_i, and for a quadratic q_j = Q p_j, so p_k^T q_i = p_k^T Q p_i = q_k^T p_i = q_k^T H_k q_i. The correction term therefore vanishes, giving H_{k+1} q_i = p_i.
H_{k+1} = H_k + (p_k − H_k q_k)(p_k − H_k q_k)^T / (q_k^T (p_k − H_k q_k))

q_k^T (p_k − H_k q_k) may not be positive, hence H_{k+1} may not be positive definite.
q_k^T (p_k − H_k q_k) may be close to zero, and hence H_{k+1} may be ill-conditioned.
10.6. Advanced methods

Davidon-Fletcher-Powell (DFP) method

Optimization problem:
B_k: current estimate of the Hessian
B_{k+1}: next refined estimate of the Hessian

B_{k+1} = arg min_B f_M(B_k − B) subject to B p_k = q_k and B^T = B,

where f_M(·) = ‖W^{−1/2} (·) W^{−1/2}‖_F (weighted Frobenius norm) and W = ∫_0^1 F(x_k + τ α_k p_k) dτ.

Solution:

B_{k+1} = (I − ρ_k q_k p_k^T) B_k (I − ρ_k p_k q_k^T) + ρ_k q_k q_k^T,  ρ_k = 1/(q_k^T p_k).
H_{k+1} = B_{k+1}^{−1} = H_k + p_k p_k^T/(p_k^T q_k) − H_k q_k q_k^T H_k/(q_k^T H_k q_k), where H_k = B_k^{−1}.
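A small numeric sketch of the DFP pair, checking the secant condition B_{k+1} p_k = q_k (equivalently H_{k+1} q_k = p_k) and that the B- and H-forms are inverses of each other; the quadratic data below are made up:

```python
import numpy as np

# DFP update on made-up quadratic data: q_k = Q p_k, H_k = I.
Q = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, 1.0])
x0 = np.array([2.0, -1.0])
x1 = np.array([1.0, 0.5])

p = x1 - x0
q = (Q @ x1 - b) - (Q @ x0 - b)     # = Q p for a quadratic
H = np.eye(2)                        # current inverse-Hessian H_k
rho = 1.0 / (q @ p)
I = np.eye(2)

# Inverse-Hessian (H) form of the DFP update:
H_next = H + np.outer(p, p) / (p @ q) - (H @ np.outer(q, q) @ H) / (q @ H @ q)

# Hessian (B) form of the DFP update:
B = np.linalg.inv(H)
B_next = (I - rho * np.outer(q, p)) @ B @ (I - rho * np.outer(p, q)) + rho * np.outer(q, q)

assert np.allclose(H_next @ q, p)               # secant condition
assert np.allclose(B_next @ H_next, np.eye(2))  # B_{k+1} = H_{k+1}^{-1}
```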
Broyden, Fletcher, Goldfarb, and Shanno (BFGS) method

Optimization problem:

H_{k+1} = arg min_H f_M(H_k − H) subject to H q_k = p_k and H^T = H,

where f_M(·) = ‖W^{1/2} (·) W^{1/2}‖_F (weighted Frobenius norm) and W = ∫_0^1 F(x_k + τ α_k p_k) dτ.

Solution:

H_{k+1} = (I − ρ_k p_k q_k^T) H_k (I − ρ_k q_k p_k^T) + ρ_k p_k p_k^T,  ρ_k = 1/(q_k^T p_k).
B_{k+1} = H_{k+1}^{−1} = B_k + q_k q_k^T/(q_k^T p_k) − B_k p_k p_k^T B_k/(p_k^T B_k p_k), where H_k = B_k^{−1}.
13
Broyden Family

B_{k+1} = B_k − B_k p_k p_k^T B_k/(p_k^T B_k p_k) + q_k q_k^T/(p_k^T q_k) + φ_k (p_k^T B_k p_k) v_k v_k^T,

where the first three terms constitute B_{k+1}^{BFGS} and

v_k = q_k/(q_k^T p_k) − B_k p_k/(p_k^T B_k p_k).
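A numeric check, on made-up data, that the family formula reproduces BFGS at φ_k = 0, DFP at φ_k = 1, and the convex combination of the two in between:

```python
import numpy as np

# Broyden family on made-up data with q^T p > 0 and diagonal B_k.
p = np.array([1.0, 0.5, -0.2])
q = np.array([0.7, 1.2, 0.1])
B = np.diag([2.0, 1.0, 3.0])      # current Hessian estimate B_k
I = np.eye(3)
rho = 1.0 / (q @ p)

# BFGS direct update (the first three terms of the family formula):
B_bfgs = B - (B @ np.outer(p, p) @ B) / (p @ B @ p) + np.outer(q, q) / (p @ q)

# DFP direct update (product form):
B_dfp = (I - rho * np.outer(q, p)) @ B @ (I - rho * np.outer(p, q)) + rho * np.outer(q, q)

def broyden_B(phi):
    v = q / (q @ p) - (B @ p) / (p @ B @ p)
    return B_bfgs + phi * (p @ B @ p) * np.outer(v, v)

assert np.allclose(broyden_B(0.0), B_bfgs)                      # phi = 0: BFGS
assert np.allclose(broyden_B(1.0), B_dfp)                       # phi = 1: DFP
assert np.allclose(broyden_B(0.3), 0.3 * B_dfp + 0.7 * B_bfgs)  # convex combination
```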
Alternative expressions:

B_{k+1} = φ B_{k+1}^{DFP} + (1 − φ) B_{k+1}^{BFGS}
H_{k+1} = (1 − φ) H_{k+1}^{DFP} + φ H_{k+1}^{BFGS}

φ_k = 1/(1 − μ_k),  μ_k = (q_k^T B_k^{−1} q_k)(p_k^T B_k p_k)/(p_k^T q_k)^2
Theorem 10.6A:
Consider the quadratic function f(x) = 0.5 x^T Q x − x^T b with positive definite Q. Let {H_k} represent the sequence of inverse-Hessian approximations obtained using the Broyden formula with φ_k ∈ [0,1] and a positive definite starting matrix H_0. Let {λ_j^{(k)}}_{j=1}^n be the eigen…
Theorem 10.6B:
Consider the quadratic function f(x) = 0.5 x^T Q x − x^T b with positive definite Q. Let {H_k} represent the sequence of inverse-Hessian approximations obtained using the Broyden formula with φ_k ≥ 0 and a positive definite starting matrix H_0. If the step size is computed by means of the exact line search formula, then the following hold:
(i) The iterates are independent of φ_k and converge to the solution in at most n steps.
(ii) B_k p_j = q_j, j = 1, 2, …, k − 1.
(iii) If B_0 = I, the iterates are identical to those generated by the conjugate gradient method, and the search directions {p_j} are Q-conjugate.
(iv) If n iterations are performed, we have B_n = Q.
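Finite termination and B_n = Q can be checked numerically with the BFGS member (φ_k = 0) of the family; the quadratic below is a made-up 3×3 example:

```python
import numpy as np

# BFGS with exact line search on a made-up 3x3 quadratic:
# the minimizer is reached in n steps and H_n = Q^{-1} (i.e. B_n = Q).
n = 3
Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]])
b = np.array([1.0, -1.0, 2.0])
x = np.array([1.0, 1.0, 1.0])
H, I = np.eye(n), np.eye(n)
g = Q @ x - b

for _ in range(n):
    d = -H @ g
    alpha = -(g @ d) / (d @ Q @ d)        # exact line search for a quadratic
    x_new = x + alpha * d
    g_new = Q @ x_new - b
    p, q = x_new - x, g_new - g
    rho = 1.0 / (q @ p)
    H = (I - rho * np.outer(p, q)) @ H @ (I - rho * np.outer(q, p)) + rho * np.outer(p, p)
    x, g = x_new, g_new

x_star = np.linalg.solve(Q, b)
assert np.allclose(x, x_star)             # solved in n steps, property (i)
assert np.allclose(H, np.linalg.inv(Q))   # H_n = Q^{-1}, property (iv)
```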
Broyden method for non-quadratic functions

Initialization: x_0, g_0 = ∇f(x_0), H_0 = I.
For k = 0, …, N − 1 do
 (1) d_k = −H_k g_k
 (2) Compute α_k using the Wolfe conditions
 (3) x_{k+1} = x_k + α_k d_k,  g_{k+1} = ∇f(x_{k+1}),
   p_k = x_{k+1} − x_k,  q_k = g_{k+1} − g_k
 (4) Compute H_{k+1} from H_k, p_k, and q_k using the Broyden update formula

The relation H_{k+1} q_j = p_j is ensured only for j = k.
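The loop above can be sketched with φ_k = 0 (the BFGS member) and a simple bisection-based Wolfe line search; the strictly convex test function, starting point, and tolerances are illustrative choices, not from the notes:

```python
import numpy as np

# Broyden method, phi_k = 0 (BFGS), on a strictly convex non-quadratic
# function, with a basic bisection/doubling Wolfe line search.
def f(x):
    return x[0] ** 4 + x[0] ** 2 + x[1] ** 2 - 2.0 * x[0]

def grad(x):
    return np.array([4.0 * x[0] ** 3 + 2.0 * x[0] - 2.0, 2.0 * x[1]])

def wolfe_step(x, d, c1=1e-4, c2=0.9):
    f0, slope0 = f(x), grad(x) @ d
    lo, hi, alpha = 0.0, np.inf, 1.0
    for _ in range(50):
        if f(x + alpha * d) > f0 + c1 * alpha * slope0:   # sufficient decrease fails
            hi = alpha
        elif grad(x + alpha * d) @ d < c2 * slope0:       # curvature condition fails
            lo = alpha
        else:
            return alpha
        alpha = 2.0 * alpha if np.isinf(hi) else 0.5 * (lo + hi)
    return alpha

x = np.array([2.0, 3.0])
H, I = np.eye(2), np.eye(2)
g = grad(x)
for _ in range(100):
    if np.linalg.norm(g) < 1e-8:
        break
    d = -H @ g                               # (1) search direction
    alpha = wolfe_step(x, d)                 # (2) step size from Wolfe conditions
    x_new = x + alpha * d                    # (3)
    g_new = grad(x_new)
    p, q = x_new - x, g_new - g
    rho = 1.0 / (q @ p)                      # Wolfe conditions keep q^T p > 0
    H = (I - rho * np.outer(p, q)) @ H @ (I - rho * np.outer(q, p)) \
        + rho * np.outer(p, p)               # (4) BFGS update of H_k
    x, g = x_new, g_new

assert np.linalg.norm(grad(x)) < 1e-5        # converged to the minimizer
```

The curvature part of the Wolfe conditions guarantees q_k^T p_k > 0, which is what keeps H_{k+1} positive definite in step (4).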
Convergence of the Broyden method with φ_k = 0

Theorem 10.6C:
Let f(x) be a strictly convex function. Then the sequence {x_k} generated by the Broyden method with φ_k = 0 and the Wolfe line search converges to the minimum of f(x).