
Lecture 14

Newton Algorithm for Unconstrained Optimization

October 21, 2008



Outline

• Newton Method for System of Nonlinear Equations

• Newton’s Method for Optimization

• Classic Analysis


Newton’s Method for System of Equations


• A numerical method for solving a system of equations

    G(x) = 0,   G : R^n → R^n

• When G is continuously differentiable, the classical Newton method is based on a natural (local) approximation of G: linearization
• Given an iterate x_k, the map G is approximated at x_k by the linear map

    L(x; x_k) = G(x_k) + J_G(x_k)(x − x_k),

  where J_G(x) is the Jacobian of G at x
• We have L(x; x_k) ≈ G(x) for x near x_k
• The system G(x) = 0 is "replaced" by the system L(x; x_k) = 0
• The resulting solution defines the new iterate x_{k+1},

    x_{k+1} = x_k − J_G(x_k)^{-1} G(x_k)

  (a minimal numerical sketch of this iteration is given below)
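To make the iteration concrete, here is a minimal Python sketch of the pure Newton iteration for a system G(x) = 0; the function name newton_system, the tolerance, and the small test system are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def newton_system(G, JG, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration for G(x) = 0 (illustrative sketch).

    G  : callable returning the residual vector G(x)
    JG : callable returning the Jacobian matrix J_G(x)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = G(x)
        if np.linalg.norm(g) < tol:
            break
        # Solve the linearized system L(x; x_k) = 0, i.e., J_G(x_k) d = -G(x_k),
        # rather than forming J_G(x_k)^{-1} explicitly.
        d = np.linalg.solve(JG(x), -g)
        x = x + d
    return x

# Hypothetical test system: x0^2 + x1^2 = 1 and x0 = x1.
G = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
JG = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
print(newton_system(G, JG, x0=[1.0, 0.5]))   # approx. [0.7071, 0.7071]
```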


Properties of Newton’s Method

    x_{k+1} = x_k − J_G(x_k)^{-1} G(x_k)

• Fast local convergence, but globally the method can fail:
• When started far from a solution


• Numerical instabilities occur when J_G(x*) is singular (or nearly singular)

• Two main properties make the process work:
• A linear model L(x; x_k) that provides a good approximation of G near x_k, when x_k is near a solution
• Solvability of the linear equation L(x; x_k) = 0 when J_G(x_k) is invertible

• These two properties are “guaranteed” when


• G is continuously differentiable and
• J_G(x*) is invertible (nonsingular)


Convergence Rate Terminology

Definition 7.2.1 Let {x_k} ⊆ R^n be a sequence converging to some x* ∈ R^n. The convergence rate is said to be

• Q-linear if

    lim sup_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖ < 1

• Q-superlinear if

    lim sup_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖ = 0

• Q-quadratic if

    lim sup_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖² < ∞

• R-linear if

    lim sup_{k→∞} ‖x_{k+1} − x*‖^{1/k} < 1
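A quick illustration of these definitions (an example added here, not on the slides): the scalar sequence x_k = ρ^k with ρ ∈ (0, 1) converges to x* = 0 Q-linearly, since ‖x_{k+1} − x*‖/‖x_k − x*‖ = ρ < 1, while x_k = ρ^{2^k} converges Q-quadratically, since ‖x_{k+1} − x*‖/‖x_k − x*‖² = 1 < ∞ and the number of correct digits roughly doubles at each step.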


Unconstrained Minimization
    minimize f(x)

Suppose that:
• The function f is convex and twice continuously differentiable over dom f
• The optimal value is attained: there exists x* such that

    f(x*) = inf_x f(x)

Newton's method can be applied to solve the corresponding optimality condition

    ∇f(x*) = 0,

resulting in

    x_{k+1} = x_k − ∇²f(x_k)^{-1} ∇f(x_k).

This is known as the pure Newton method.
As discussed, in this form the method may not always converge.


Newton’s direction
dk = −∇2f (xk )−1∇f (xk )
Interpretations:
• Second-order Taylor’s expansion at xk yields

T 1 T 2 
2

f (xk + d) = f (xk ) + ∇f (xk ) d + d ∇ f (xk )d + o kdk
2

• The right-hand side without the small order term, provides a (quadratic)
approximation of f in a (small) neighborhood of xk
1
f (xk + d) ≈ f (xk ) + ∇f (xk )T d + dT ∇2f (xk )d
2
• Minimizing the quadratic approximation w/r to d yields:
∇f (xk )T + ∇2f (xk )d = 0
• Newton’s dk can be also viewed as solving linearized optimality condition
∇f (xk + d) ≈ ∇f (xk ) + ∇2f (xk )d = 0
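A small numerical check of these interpretations (a sketch using an assumed strictly convex quadratic test function; the names Q, b, grad, and hess are mine, not from the lecture): for such an f the quadratic approximation is exact, so the Newton direction solves ∇f(x_k) + ∇²f(x_k)d = 0 and a single full step reaches the minimizer.

```python
import numpy as np

# Assumed test function: f(x) = 0.5 x^T Q x - b^T x with Q positive definite.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

grad = lambda x: Q @ x - b        # gradient of f
hess = lambda x: Q                # Hessian of f, constant for a quadratic

x_k = np.array([5.0, -7.0])       # arbitrary starting iterate
# Newton direction: solve the linear system  Hessian(x_k) d = -grad(x_k)
d_k = np.linalg.solve(hess(x_k), -grad(x_k))

x_next = x_k + d_k
print(np.allclose(grad(x_next), 0.0))   # True: one full Newton step solves grad f = 0
```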


Newton Decrement

• Newton's decrement at x_k is defined by

    λ(x_k) = ( ∇f(x_k)^T ∇²f(x_k)^{-1} ∇f(x_k) )^{1/2}

• Provides a measure of the proximity of x_k to x*
• Obtained by evaluating the difference between f(x_k) and the quadratic approximation of f at x_k, evaluated at the optimal d (Newton's direction):

    f(x_k) − ( f(x_k) + ∇f(x_k)^T d_k + (1/2) d_k^T ∇²f(x_k) d_k ) = (1/2) λ(x_k)²

Properties:
• Equal to the norm of the Newton step in the quadratic norm defined by the Hessian:

    λ(x_k) = ( d_k^T ∇²f(x_k) d_k )^{1/2} = ‖d_k‖_{∇²f(x_k)}

• Affine invariant (unlike ‖∇f(x)‖); a small numerical check is sketched below
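The following sketch (reusing the assumed quadratic test function from the previous example; the helper names are mine, not from the lecture) computes the decrement both from its definition and as the Hessian norm of the Newton step, and checks the affine invariance numerically against the reparametrized function g(y) = f(Ay).

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: Q @ x - b
hess = lambda x: Q

def newton_decrement(grad, hess, x):
    g, H = grad(x), hess(x)
    return np.sqrt(g @ np.linalg.solve(H, g))    # ( g^T H^{-1} g )^{1/2}

x = np.array([2.0, 1.0])
d = np.linalg.solve(hess(x), -grad(x))
print(np.isclose(newton_decrement(grad, hess, x),
                 np.sqrt(d @ hess(x) @ d)))      # True: equals the Hessian norm of d

# Affine invariance: for g(y) = f(A y) with A invertible, the decrement of g at
# y = A^{-1} x equals the decrement of f at x (the gradient norm does not).
A = np.array([[2.0, 0.5], [0.0, 1.0]])
grad_g = lambda y: A.T @ grad(A @ y)
hess_g = lambda y: A.T @ hess(A @ y) @ A
y = np.linalg.solve(A, x)
print(np.isclose(newton_decrement(grad, hess, x),
                 newton_decrement(grad_g, hess_g, y)))    # True
print(np.isclose(np.linalg.norm(grad(x)),
                 np.linalg.norm(grad_g(y))))              # False in general
```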


Newton’s Method

Given a starting point x ∈ dom f, an error tolerance ε > 0, and parameters σ ∈ (0, 1/2) and β ∈ (0, 1).

Repeat:
1. Compute the Newton direction and decrement:
   d := −∇²f(x)^{-1} ∇f(x),   λ² := ∇f(x)^T ∇²f(x)^{-1} ∇f(x)
2. Stopping criterion: quit if λ²/2 ≤ ε
3. Line search: choose the stepsize α by backtracking, i.e., starting with α = 1:
   (a) If f(x + αd) < f(x) + σα∇f(x)^T d, go to Step 4
   (b) Else set α := βα and go to (a)
4. Update x := x + αd.
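A minimal Python sketch of this damped Newton method with backtracking line search; the function newton_method, the default parameter values, and the smooth convex test problem are illustrative assumptions, not code from the lecture.

```python
import numpy as np

def newton_method(f, grad, hess, x0, eps=1e-8, sigma=0.25, beta=0.5, max_iter=100):
    """Damped Newton method with backtracking line search (sketch).

    sigma in (0, 1/2) and beta in (0, 1) are the line-search parameters;
    the stopping test uses the Newton decrement: quit when lambda^2 / 2 <= eps.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        d = np.linalg.solve(H, -g)        # Newton direction
        lam2 = -g @ d                     # squared Newton decrement, g^T H^{-1} g
        if lam2 / 2.0 <= eps:
            break
        alpha = 1.0                       # backtracking: steps 3(a)-3(b)
        while f(x + alpha * d) >= f(x) + sigma * alpha * (g @ d):
            alpha *= beta
        x = x + alpha * d
    return x

# Illustrative smooth convex test problem (not from the slides):
# f(x) = exp(x0 + 3 x1) + exp(x0 - 3 x1) + exp(-x0)
f = lambda x: np.exp(x[0] + 3*x[1]) + np.exp(x[0] - 3*x[1]) + np.exp(-x[0])
def grad(x):
    a, b, c = np.exp(x[0] + 3*x[1]), np.exp(x[0] - 3*x[1]), np.exp(-x[0])
    return np.array([a + b - c, 3*a - 3*b])
def hess(x):
    a, b, c = np.exp(x[0] + 3*x[1]), np.exp(x[0] - 3*x[1]), np.exp(-x[0])
    return np.array([[a + b + c, 3*a - 3*b], [3*a - 3*b, 9*a + 9*b]])

print(newton_method(f, grad, hess, x0=[1.0, 1.0]))   # approx. [-0.3466, 0.0]
```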


Classical Convergence Analysis - Main Results


Assumption 1:
• The level set L_0 = {x | f(x) ≤ f(x_0)} is closed
• f is strongly convex on L_0 with a constant m > 0
• ∇²f is Lipschitz continuous on L_0, with a constant L > 0:

    ‖∇²f(x) − ∇²f(y)‖ ≤ L‖x − y‖

  (L measures how well f can be approximated by a quadratic function)

Analysis outline: there exists a constant η ∈ (0, 2m²/L) such that
• When ‖∇f(x_k)‖ ≥ η, then f(x_{k+1}) − f(x_k) ≤ −γ for γ = σβη² m/M², where M is a constant with ∇²f(x) ⪯ M I on L_0 (introduced in the proof)
• When ‖∇f(x_k)‖ < η, then

    (L/(2m²)) ‖∇f(x_{k+1})‖ ≤ ( (L/(2m²)) ‖∇f(x_k)‖ )²


Two Phases of Newton's Method

Damped Newton phase (‖∇f(x)‖ ≥ η; far from x*)

• Most iterations require backtracking steps
• The function value decreases by at least γ at each iteration
• This phase ends after at most (f(x_0) − f*)/γ iterations

Quadratically convergent phase (‖∇f(x)‖ < η; locally near x*)

• All iterations in this phase use the stepsize α = 1
• The gradient ‖∇f(x)‖ converges to zero quadratically: for l ≥ k,

    (L/(2m²)) ‖∇f(x_l)‖ ≤ ( (L/(2m²)) ‖∇f(x_k)‖ )^{2^{l−k}} ≤ (1/2)^{2^{l−k}}
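For a rough sense of scale (an illustrative calculation, not on the slides): five iterations after entering this phase the bound gives (L/(2m²)) ‖∇f(x_{k+5})‖ ≤ (1/2)^{2^5} = 2^{−32} ≈ 2.3·10^{−10}, and one further iteration squares this to about 5.4·10^{−20}.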


Analysis of Newton Method

Theorem Let Assumption 1 hold. Then there exists a constant η ∈ (0, 2m²/L) such that
• When ‖∇f(x_k)‖ ≥ η, then f(x_{k+1}) − f(x_k) ≤ −γ for γ = σβη² m/M²
• When ‖∇f(x_k)‖ < η, then

    (L/(2m²)) ‖∇f(x_{k+1})‖ ≤ ( (L/(2m²)) ‖∇f(x_k)‖ )²

Proof Let ‖∇f(x_k)‖ ≥ η. Let us estimate the stepsize α_k at the end of the backtracking line search. First note that {x_k} ⊂ L_0. Since f is strongly convex, the level set L_0 is bounded, hence ∇f is Lipschitz continuous over L_0 with some constant M (equivalently, ∇²f(x) ⪯ M I on L_0).
By the Q-approximation lemma, we have for any α > 0

    f(x_k + α d_k) ≤ f(x_k) + α ∇f(x_k)^T d_k + (α²M/2) ‖d_k‖².


Using the Newton decrement λ_k = ( ∇f(x_k)^T ∇²f(x_k)^{-1} ∇f(x_k) )^{1/2} and the fact that d_k = −∇²f(x_k)^{-1} ∇f(x_k), we can write

    f(x_k + α d_k) ≤ f(x_k) − α λ_k² + (α²M/2) ‖d_k‖².

Note that

    λ_k² = ∇f(x_k)^T ∇²f(x_k)^{-1} ∇f(x_k) = d_k^T ∇²f(x_k) d_k ≥ m ‖d_k‖²,

by strong convexity of f. Hence ‖d_k‖² ≤ λ_k²/m, implying that

    f(x_k + α d_k) ≤ f(x_k) − α λ_k² + (α²M/(2m)) λ_k²
                  ≤ f(x_k) − α (1 − αM/(2m)) λ_k².


Thus the stepsize α = m/M satisfies the exit condition in the backtracking line search (with σ < 1/2), since for α = m/M we have

    f(x_k + α d_k) ≤ f(x_k) − (m/(2M)) λ_k² < f(x_k) − σ (m/M) λ_k².

Therefore, the backtracking line search stops with some α_k ≥ β m/M. Thus, when the line search is exited with the step α_k, we have

    f(x_k + α_k d_k) < f(x_k) − σ α_k λ_k² ≤ f(x_k) − σβ (m/M) λ_k².

Since ∇²f(x) ⪯ M I, we have

    λ_k² = ∇f(x_k)^T ∇²f(x_k)^{-1} ∇f(x_k) ≥ (1/M) ‖∇f(x_k)‖².


Hence, when ‖∇f(x_k)‖ ≥ η, we have

    f(x_k + α_k d_k) < f(x_k) − σβ (m/M²) η²,

i.e., f(x_{k+1}) − f(x_k) < −γ with γ = σβη² m/M².

Suppose now ‖∇f(x_k)‖ < η.

Strong Q-lemma: For a continuously differentiable (possibly vector-valued) function g over R^n whose derivative is Lipschitz continuous with constant L, we have

    ‖g(x + y) − g(x) − ∇g(x)^T y‖ ≤ (L/2) ‖y‖².

Applying this lemma to g = ∇f (whose derivative ∇²f is Lipschitz with constant L by Assumption 1), we have

    ‖∇f(x_k + d) − ∇f(x_k) − ∇²f(x_k) d‖ ≤ (L/2) ‖d‖².


Letting d = d_k be the Newton direction (in this phase the unit stepsize α = 1 is used, so x_{k+1} = x_k + d_k and ∇f(x_k) + ∇²f(x_k) d_k = 0), we obtain

    ‖∇f(x_{k+1})‖ ≤ (L/2) ‖∇²f(x_k)^{-1} ∇f(x_k)‖² ≤ (L/2) ‖∇²f(x_k)^{-1}‖² ‖∇f(x_k)‖².

Since m I ⪯ ∇²f(x), it follows that

    ‖∇f(x_{k+1})‖ ≤ (L/(2m²)) ‖∇f(x_k)‖².

Hence, ‖∇f(x_{k+1})‖ ≤ η. Furthermore, for any K ≥ k,

    ‖∇f(x_{K+1})‖ ≤ (2m²/L) ( (L/(2m²)) ‖∇f(x_K)‖ )² ≤ ··· ≤ (2m²/L) ( (L/(2m²)) ‖∇f(x_k)‖ )^{2^{K+1−k}},

showing the quadratic convergence rate.


Conclusions

The number of iterations until f(x) − f* ≤ ε is bounded above by

    (f(x_0) − f*)/γ + log₂ log₂ (ε₀/ε)

• γ = σβη² m/M²,  ε₀ = 2m³/L²
• The second term is small (on the order of 6) and almost constant for practical purposes: six iterations of the quadratically convergent phase result in an accuracy of ≈ 5·10^{−20} ε₀ (see the short calculation below)
• In practice, the constants m and L (and hence γ and ε₀) are usually unknown
• The analysis provides qualitative insight into the convergence properties (e.g., it explains the two phases of the algorithm)
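A quick check of the second term (an illustrative calculation, not from the slides): requiring ε/ε₀ = 2^{−2^6} = 2^{−64} ≈ 5.4·10^{−20} gives log₂ log₂(ε₀/ε) = log₂ 64 = 6, which is where the six-iteration figure comes from.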
