Newton Algorithm for Unconstrained Optimization: October 21, 2008
Outline
• Classic Analysis
Unconstrained Minimization
minimize $f(x)$

Suppose that:
• The function $f$ is convex and twice continuously differentiable over $\operatorname{dom} f$
• The optimal value is attained: there exists $x^*$ such that $f(x^*) = f^* = \inf_x f(x)$
• (Used throughout the analysis below) $f$ is strongly convex on the initial sublevel set, with constants $0 < m \le M$ such that $mI \preceq \nabla^2 f(x) \preceq MI$
Newton’s direction
$$d_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$$
Interpretations:
• Second-order Taylor expansion at $x_k$ yields
$$f(x_k + d) = f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2}\, d^T \nabla^2 f(x_k)\, d + o\!\left(\|d\|^2\right)$$
• The right-hand side without the small-order term provides a (quadratic) approximation of $f$ in a (small) neighborhood of $x_k$:
$$f(x_k + d) \approx f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2}\, d^T \nabla^2 f(x_k)\, d$$
• Minimizing the quadratic approximation with respect to $d$ yields
$$\nabla f(x_k) + \nabla^2 f(x_k)\, d = 0$$
• Newton's $d_k$ can also be viewed as solving the linearized optimality condition
$$\nabla f(x_k + d) \approx \nabla f(x_k) + \nabla^2 f(x_k)\, d = 0$$
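As a concrete illustration of the two views above, here is a minimal sketch (using NumPy and an assumed toy function $f(x) = e^{x_1} + e^{-x_1} + x_2^2$, not from the notes) that computes the Newton direction by solving the linearized optimality condition and checks that it minimizes the quadratic model:

```python
import numpy as np

# Assumed toy function: f(x) = exp(x1) + exp(-x1) + x2^2 (smooth, strongly convex).
f = lambda x: np.exp(x[0]) + np.exp(-x[0]) + x[1] ** 2
grad = lambda x: np.array([np.exp(x[0]) - np.exp(-x[0]), 2 * x[1]])
hess = lambda x: np.diag([np.exp(x[0]) + np.exp(-x[0]), 2.0])

xk = np.array([1.0, 1.0])

# Newton direction: solve grad f(xk) + hess f(xk) d = 0
# (solve the linear system instead of forming the inverse explicitly).
d = np.linalg.solve(hess(xk), -grad(xk))

# d minimizes the quadratic model q(v) = f(xk) + g^T v + (1/2) v^T H v.
q = lambda v: f(xk) + grad(xk) @ v + 0.5 * v @ hess(xk) @ v
print(q(d) <= q(d + 0.1), q(d) <= q(d - 0.1))  # True True
```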
Newton Decrement
The Newton decrement $\lambda(x_k) = \left(\nabla f(x_k)^T \nabla^2 f(x_k)^{-1} \nabla f(x_k)\right)^{1/2}$ measures the gap between $f(x_k)$ and the minimum of its quadratic approximation:
$$f(x_k) - \left( f(x_k) + \nabla f(x_k)^T d_k + \tfrac{1}{2}\, d_k^T \nabla^2 f(x_k)\, d_k \right) = \tfrac{1}{2}\,\lambda(x_k)^2$$
Properties:
• Equal to the norm of the Newton step in the norm induced by the Hessian:
$$\lambda(x_k) = \left[ d_k^T \nabla^2 f(x_k)\, d_k \right]^{1/2} = \|d_k\|_{\nabla^2 f(x_k)}$$
• Affine invariant (unlike $\|\nabla f(x)\|$)
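The two expressions for $\lambda(x_k)$ can be checked numerically; a short sketch (same assumed toy function as above, repeated so the snippet stands alone):

```python
import numpy as np

# Same assumed toy function as in the previous sketch.
grad = lambda x: np.array([np.exp(x[0]) - np.exp(-x[0]), 2 * x[1]])
hess = lambda x: np.diag([np.exp(x[0]) + np.exp(-x[0]), 2.0])

x = np.array([1.0, 1.0])
g, H = grad(x), hess(x)
d = np.linalg.solve(H, -g)             # Newton step

lam2_grad = g @ np.linalg.solve(H, g)  # grad^T H^{-1} grad
lam2_step = d @ H @ d                  # squared Hessian norm of the Newton step
print(np.isclose(lam2_grad, lam2_step))  # True
```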
Newton’s Method
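The algorithm analyzed in these slides is the damped Newton method with backtracking line search. A minimal sketch follows; the stopping rule $\lambda(x)^2/2 \le \epsilon$ and the parameter values are illustrative assumptions, not prescriptions from the notes:

```python
import numpy as np

def newton_method(f, grad, hess, x0, sigma=0.25, beta=0.5, eps=1e-10, max_iter=100):
    """Damped Newton method with backtracking line search (a sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        d = np.linalg.solve(H, -g)   # Newton direction
        lam2 = -g @ d                # Newton decrement squared: g^T H^{-1} g
        if lam2 / 2 <= eps:          # estimated suboptimality is small enough
            break
        # Backtracking: shrink alpha until the exit condition
        # f(x + alpha d) <= f(x) - sigma * alpha * lambda^2 holds.
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx - sigma * alpha * lam2:
            alpha *= beta
        x = x + alpha * d
    return x

# Example on the assumed toy function from the earlier sketches.
x_min = newton_method(
    lambda x: np.exp(x[0]) + np.exp(-x[0]) + x[1] ** 2,
    lambda x: np.array([np.exp(x[0]) - np.exp(-x[0]), 2 * x[1]]),
    lambda x: np.diag([np.exp(x[0]) + np.exp(-x[0]), 2.0]),
    x0=[1.0, 1.0],
)
print(x_min)  # close to the minimizer (0, 0)
```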
If $\frac{L}{2m^2}\,\|\nabla f(x_k)\| \le \frac{1}{2}$, then
$$\frac{L}{2m^2}\,\|\nabla f(x_l)\| \le \left( \frac{L}{2m^2}\,\|\nabla f(x_k)\| \right)^{2^{l-k}} \le \left( \frac{1}{2} \right)^{2^{l-k}} \quad \text{for } l \ge k.$$
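To make the rate concrete (simple arithmetic on the bound above): once $\frac{L}{2m^2}\|\nabla f(x_k)\| \le \frac{1}{2}$, six further iterations already give
$$\frac{L}{2m^2}\,\|\nabla f(x_{k+6})\| \le \left(\frac{1}{2}\right)^{2^6} = 2^{-64} \approx 5.4 \cdot 10^{-20},$$
which is where the constant of roughly $5 \cdot 10^{-20}$ quoted in the conclusions comes from.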
For the damped phase, use $\nabla^2 f(x) \preceq MI$: for any stepsize $\alpha > 0$,
$$f(x_k + \alpha d_k) \le f(x_k) + \alpha \nabla f(x_k)^T d_k + \frac{\alpha^2 M}{2}\, \|d_k\|^2.$$
Using the Newton decrement $\lambda_k = \left(\nabla f(x_k)^T \nabla^2 f(x_k)^{-1} \nabla f(x_k)\right)^{1/2}$ and the fact that $d_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$, we can write
$$f(x_k + \alpha d_k) \le f(x_k) - \alpha \lambda_k^2 + \frac{\alpha^2 M}{2}\, \|d_k\|^2.$$
Note that
$$\lambda_k^2 = \nabla f(x_k)^T \nabla^2 f(x_k)^{-1} \nabla f(x_k) = d_k^T \nabla^2 f(x_k)\, d_k \ge m \|d_k\|^2,$$
so that
$$f(x_k + \alpha d_k) \le f(x_k) - \alpha \lambda_k^2 + \frac{\alpha^2 M}{2m}\, \lambda_k^2 = f(x_k) - \alpha \left( 1 - \frac{\alpha M}{2m} \right) \lambda_k^2.$$
The bound on the right is minimized at $\alpha = \frac{m}{M}$. Thus the stepsize $\alpha = \frac{m}{M}$ satisfies the exit condition of the backtracking line search (with $\sigma \le 1/2$), since for $\alpha = \frac{m}{M}$ we have
$$f(x_k + \alpha d_k) \le f(x_k) - \frac{m}{2M}\,\lambda_k^2 \le f(x_k) - \sigma\,\frac{m}{M}\,\lambda_k^2.$$
Therefore, the backtracking line search stops with some $\alpha_k \ge \beta\,\frac{m}{M}$. Thus, when the line search exits with step $\alpha_k$, we have
$$f(x_k + \alpha_k d_k) \le f(x_k) - \sigma \alpha_k \lambda_k^2 \le f(x_k) - \sigma\beta\,\frac{m}{M}\,\lambda_k^2.$$
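A quick numeric sanity check of this claim, on an assumed strongly convex quadratic with known $m$ and $M$ (illustrative, not from the notes):

```python
import numpy as np

# Quadratic f(x) = (1/2) x^T A x with eigenvalues m = 1 and M = 10 (assumed).
m, M, sigma = 1.0, 10.0, 0.25
A = np.diag([m, M])
f = lambda x: 0.5 * x @ A @ x

x = np.array([3.0, -2.0])
g = A @ x
d = np.linalg.solve(A, -g)  # Newton direction (equals -x for a quadratic)
lam2 = -g @ d               # Newton decrement squared

alpha = m / M
# The backtracking exit condition holds at alpha = m/M:
print(f(x + alpha * d) <= f(x) - sigma * alpha * lam2)  # True
```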
Moreover, since $\nabla^2 f(x_k) \preceq MI$ implies $\nabla^2 f(x_k)^{-1} \succeq \frac{1}{M} I$,
$$\lambda_k^2 = \nabla f(x_k)^T \nabla^2 f(x_k)^{-1} \nabla f(x_k) \ge \frac{1}{M}\,\|\nabla f(x_k)\|^2.$$
Hence, in the damped phase, i.e., while $\|\nabla f(x_k)\| \ge \eta$, every iteration decreases $f$ by a fixed amount:
$$f(x_k + \alpha_k d_k) \le f(x_k) - \sigma\beta\,\frac{m}{M^2}\,\eta^2.$$
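Writing $\gamma = \sigma\beta\,\frac{m}{M^2}\,\eta^2$ for this fixed decrease (the standard counting step, made explicit here to connect with the iteration bound in the conclusions), the number of damped-phase iterations is at most
$$\frac{f(x_0) - f^*}{\gamma}.$$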
For the quadratic phase, assume in addition that the Hessian is Lipschitz continuous with constant $L$. For any map $g$ whose Jacobian $\nabla g$ is $L$-Lipschitz,
$$\|g(x + y) - g(x) - \nabla g(x)^T y\| \le \frac{L}{2}\,\|y\|^2.$$
Applying this with $g = \nabla f$ (so that $\nabla g = \nabla^2 f$) gives
$$\|\nabla f(x_k + d) - \nabla f(x_k) - \nabla^2 f(x_k)\, d\| \le \frac{L}{2}\,\|d\|^2.$$
Taking the full Newton step $x_{k+1} = x_k + d_k$ and using $\nabla f(x_k) + \nabla^2 f(x_k) d_k = 0$, the bound above yields
$$\|\nabla f(x_{k+1})\| \le \frac{L}{2}\,\|\nabla^2 f(x_k)^{-1} \nabla f(x_k)\|^2 \le \frac{L}{2}\,\|\nabla^2 f(x_k)^{-1}\|^2\, \|\nabla f(x_k)\|^2.$$
Since $\nabla^2 f(x_k) \succeq mI$ gives $\|\nabla^2 f(x_k)^{-1}\| \le \frac{1}{m}$,
$$\|\nabla f(x_{k+1})\| \le \frac{L}{2m^2}\,\|\nabla f(x_k)\|^2.$$
Iterating this recursion from index $k$ up to $K$,
$$\|\nabla f(x_{K+1})\| \le \frac{L}{2m^2}\,\|\nabla f(x_K)\|^2 \le \cdots \le \frac{2m^2}{L} \left( \frac{L}{2m^2}\,\|\nabla f(x_k)\| \right)^{2^{K+1-k}},$$
which is exactly the doubly exponential decay stated earlier.
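The squaring of the gradient norm is easy to observe numerically; the sketch below (same assumed toy function as in the earlier snippets) takes full Newton steps and prints the first few gradient norms, which shrink doubly exponentially until floating-point effects take over:

```python
import numpy as np

grad = lambda x: np.array([np.exp(x[0]) - np.exp(-x[0]), 2 * x[1]])
hess = lambda x: np.diag([np.exp(x[0]) + np.exp(-x[0]), 2.0])

x = np.array([1.0, 1.0])
for k in range(6):
    print(k, np.linalg.norm(grad(x)))
    x = x + np.linalg.solve(hess(x), -grad(x))  # full Newton step
```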
Conclusions
The number of iterations until $f(x) - f^* \le \epsilon$ is bounded above by
$$\frac{f(x_0) - f^*}{\gamma} + \log_2 \log_2 \frac{\epsilon_0}{\epsilon},$$
where $\gamma$ is the per-iteration decrease of the damped phase and $\epsilon_0$ is a constant determined by $m$ and $L$.
• The second term grows extremely slowly with the required accuracy and can be treated as a small constant: six quadratic-phase iterations already reduce the bound to $\epsilon \approx 5 \cdot 10^{-20}\,\epsilon_0$
• In practice, the constants $m$, $L$ (hence $\gamma$, $\epsilon_0$) are usually unknown
• The analysis provides qualitative insight into the convergence properties (i.e., it explains the algorithm's two phases)