7 Newton-Raphson Method

Newton-Raphson Method

• Better performance than the steepest-descent method, due to the use of both the first and second derivatives.

• However, this holds only when the initial guess is sufficiently close to the minimum.

• The problem must be expressed in the form $f(x) = 0$; for minimization, the role of $f$ is played by the gradient, so we solve $\nabla f(x) = 0$ (a one-dimensional root-finding sketch follows this list).

• Question: Can you combine any other method with Newton-Raphson so that its performance can be improved?
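As a minimal illustration of the root-finding form $f(x) = 0$, here is a hedged one-dimensional sketch in Python (the example function $f(x) = x^2 - 2$, tolerance, and iteration cap are my own choices, not from the slides):

```python
# One-dimensional Newton-Raphson root finding: x_{k+1} = x_k - f(x_k) / f'(x_k).
def newton_raphson_1d(f, df, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:          # stop when f(x) is close enough to zero
            break
        x = x - fx / df(x)         # Newton-Raphson update
    return x

# Example: f(x) = x^2 - 2 has a root at sqrt(2).
root = newton_raphson_1d(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root)  # ~1.41421356
```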
Newton-Raphson Method

• At each point, a quadratic approximation of the original function is used


$$f(x) \approx f(x^k) + (x - x^k)^T \nabla f(x^k) + \frac{1}{2}(x - x^k)^T F(x^k)(x - x^k) \triangleq q(x)$$

Here, $F(x^k) = \nabla^2 f(x^k)$ is the Hessian.

• Apply the first-order necessary condition (FONC): $\nabla q(x) = 0$


$$\Rightarrow \nabla f(x^k) + F(x^k)(x - x^k) = 0$$
This is Newton's formula, as previously discussed.
If $F(x^k) > 0$ (positive definite) at every iterate, the gradient is driven to zero and the method converges.
Newton-Raphson Method

• The update equation:

$$x^{k+1} = x^k - F(x^k)^{-1} \nabla f(x^k)$$
• Here the order of the terms is important, since these are matrix operations.
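A minimal Python sketch of this update, assuming the gradient and Hessian are supplied as callables (the function names, tolerance, and iteration cap are my own choices):

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-8, max_iter=100):
    """Pure Newton iteration: x_{k+1} = x_k - F(x_k)^{-1} grad_f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # stop when the gradient is near zero
            break
        # Solve F(x) d = g rather than forming the inverse explicitly.
        d = np.linalg.solve(hess(x), g)
        x = x - d                          # Newton step
    return x
```

For a strictly convex quadratic, the model $q(x)$ is exact, so a single Newton step lands on the minimizer; the next slides examine what happens away from that ideal case.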
Newton-Raphson Method

• If we start far from the solution, convergence is not guaranteed.


• Let us consider the function
$$\phi(\alpha) = f(x^k + \alpha d^k),$$
where $d^k = -F(x^k)^{-1} \nabla f(x^k) = x^{k+1} - x^k$ is the search direction.
• Differentiating,
$$\phi'(\alpha) = \nabla f(x^k + \alpha d^k)^T d^k$$
Newton-Raphson Method

$$\phi'(\alpha) = \nabla f(x^k + \alpha d^k)^T d^k \;\Rightarrow\; \phi'(0) = \nabla f(x^k)^T d^k = -\nabla f(x^k)^T F(x^k)^{-1} \nabla f(x^k) < 0$$

for $F(x^k)^{-1} > 0$ and $\nabla f(x^k) \neq 0$.

• This means that the slope of $\phi(\alpha)$ at $0$ is negative, i.e., the function is decreasing along $d^k$.
• Hence it is possible to find an $\alpha \in (0, \bar{\alpha})$ for which $\phi(\alpha) < \phi(0)$,
• which means $f(x^k + \alpha d^k) < f(x^k)$ (verified numerically in the sketch below).
• Hence $F(x^k)^{-1} > 0$ and $\nabla f(x^k) \neq 0$ are necessary conditions for convergence.
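A quick numerical check of the claim $\phi'(0) < 0$, using an arbitrary positive definite quadratic of my own choosing (the matrix $Q$, vector $b$, and test point are illustrative only):

```python
import numpy as np

# Strictly convex quadratic f(x) = 0.5 x^T Q x - b^T x (illustrative choice).
Q = np.array([[4.0, 1.0], [1.0, 3.0]])    # symmetric positive definite
b = np.array([1.0, 2.0])

x = np.array([5.0, -3.0])                  # current iterate x^k
g = Q @ x - b                              # gradient of f at x^k
d = -np.linalg.solve(Q, g)                 # Newton direction d^k = -F^{-1} grad

slope_at_zero = g @ d                      # phi'(0) = grad^T d
print(slope_at_zero)                       # negative: d^k is a descent direction
```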
Newton-Raphson Method

• The update equation can be modified to add a learning rate $\alpha$:

$$x^{k+1} = x^k - \alpha F(x^k)^{-1} \nabla f(x^k)$$

• Some disadvantages:
  • The matrix $F(x^k)$ must be invertible to form $F(x^k)^{-1}$.
  • For a large $n$, computing and inverting the Hessian becomes computationally expensive.
  • We need to start sufficiently close to the solution.
Levenberg-Marquardt Modification

$$x^{k+1} = x^k - \alpha F(x^k)^{-1} \nabla f(x^k)$$

• Disadvantage: the matrix $F(x^k)$ must be invertible.

Solution:
$$x^{k+1} = x^k - \alpha \left( F(x^k) + \mu_k I \right)^{-1} \nabla f(x^k)$$

• where $\mu_k \geq 0$.
• If $\mu_k \to 0$, it approaches Newton's method.
• If $\mu_k \to \infty$, it approaches steepest descent.
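A minimal sketch of this modified update, assuming a fixed $\mu_k$ and fixed $\alpha$ for simplicity (practical implementations adapt $\mu_k$ per iteration; all names here are my own):

```python
import numpy as np

def lm_newton(grad, hess, x0, alpha=1.0, mu=1e-3, tol=1e-8, max_iter=100):
    """Levenberg-Marquardt-modified Newton step:
    x_{k+1} = x_k - alpha * (F(x_k) + mu * I)^{-1} grad_f(x_k)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # The shift mu * I keeps the system solvable even when F(x) is singular.
        d = np.linalg.solve(hess(x) + mu * np.eye(n), g)
        x = x - alpha * d
    return x
```

With $\mu$ large, the matrix is dominated by $\mu I$ and the step direction tends toward $-\nabla f$, i.e., steepest descent, matching the limits stated above.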
Example

• Using the Newton-Raphson method to minimize the Powell function

$$f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4$$

Start with $x^0 = (3, -1, 0, 1)^T$.

1. Find the gradient.
2. Find the Hessian.
3. Starting from $x^0 = (3, -1, 0, 1)^T$, perform iterations until $\nabla f(x) = 0$ or a stopping criterion is met. (A worked sketch in code follows.)
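A hedged end-to-end sketch of steps 1-3 in Python; the analytic gradient and Hessian below are derived by hand from the Powell function above, and the stopping tolerance and iteration cap are my own choices:

```python
import numpy as np

def grad(x):
    """Analytic gradient of the Powell function."""
    x1, x2, x3, x4 = x
    return np.array([
        2 * (x1 + 10 * x2) + 40 * (x1 - x4) ** 3,
        20 * (x1 + 10 * x2) + 4 * (x2 - 2 * x3) ** 3,
        10 * (x3 - x4) - 8 * (x2 - 2 * x3) ** 3,
        -10 * (x3 - x4) - 40 * (x1 - x4) ** 3,
    ])

def hess(x):
    """Analytic Hessian of the Powell function."""
    x1, x2, x3, x4 = x
    a = 120 * (x1 - x4) ** 2
    b = 12 * (x2 - 2 * x3) ** 2
    return np.array([
        [2 + a, 20, 0, -a],
        [20, 200 + b, -2 * b, 0],
        [0, -2 * b, 10 + 4 * b, -10],
        [-a, 0, -10, 10 + a],
    ])

x = np.array([3.0, -1.0, 0.0, 1.0])        # x^0
for k in range(30):
    g = grad(x)
    if np.linalg.norm(g) < 1e-6:            # stopping criterion on the gradient
        break
    x = x - np.linalg.solve(hess(x), g)     # Newton-Raphson update
print(k, x)                                 # x approaches the minimizer at the origin
```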
Newton’s Method for Curve Fitting

Least-Squares Method

• A number of data points $y_i$ are given, and we are required to fit a function $q$ to them.
• Example: $m$ data points collected over a time duration, as shown in the figure, are given.
• We need to fit them with a third-order polynomial $q(t) = at^3 + bt^2 + ct + d$, where $a, b, c, d$ are the unknown coefficients.
• We need to determine $a, b, c, d$.
Newton’s Method for Curve Fitting

• The unknown vector is $x = (a, b, c, d)^T = (x_1, x_2, x_3, x_4)^T$; we can take some initial guess

$$x^0 = \left( x_1^0, x_2^0, x_3^0, x_4^0 \right)^T$$

• And the function $q(t)$ can be written in terms of $x$ as

$$p(x) = x_1 t^3 + x_2 t^2 + x_3 t + x_4$$

• Define the error between the actual data and the estimated data as

$$r_i(x) = y_i - p_i(x) = y_i - x_1 t_i^3 - x_2 t_i^2 - x_3 t_i - x_4$$
Newton’s Method for Curve Fitting

• The cost function to be minimized can be formulated as

$$f(x) = \sum_{i=1}^{m} r_i(x)^2$$

• The optimization problem is now

$$\underset{x}{\text{minimize}} \; \sum_{i=1}^{m} r_i(x)^2$$

• Define $r = (r_1\ r_2\ \dots\ r_m)^T$, so that $f(x) = r^T r = \sum_{i=1}^{m} r_i(x)^2$.
Newton’s Method for Curve Fitting

• Now the gradient and the Hessian can be found. The $j$-th component of the gradient is

$$\left( \nabla f(x) \right)_j = \frac{\partial f(x)}{\partial x_j} = \sum_{i=1}^{m} 2 r_i \frac{\partial r_i}{\partial x_j}$$

Defining

$$J(x) = \begin{bmatrix} \dfrac{\partial r_1}{\partial x_1} & \dfrac{\partial r_1}{\partial x_2} & \cdots & \dfrac{\partial r_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial r_m}{\partial x_1} & \dfrac{\partial r_m}{\partial x_2} & \cdots & \dfrac{\partial r_m}{\partial x_n} \end{bmatrix}_{m \times n}$$

This is the Jacobian matrix, so that $\nabla f(x) = 2 J(x)^T r$.
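For the cubic model above, the rows of $J$ follow directly from the residual definition; as a concrete illustration (my own worked step, not on the slides), differentiating $r_i(x) = y_i - x_1 t_i^3 - x_2 t_i^2 - x_3 t_i - x_4$ gives the $i$-th row

$$\left( \frac{\partial r_i}{\partial x_1},\ \frac{\partial r_i}{\partial x_2},\ \frac{\partial r_i}{\partial x_3},\ \frac{\partial r_i}{\partial x_4} \right) = \left( -t_i^3,\ -t_i^2,\ -t_i,\ -1 \right)$$

so for this linear-in-parameters model $J$ does not depend on $x$.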
Newton’s Method for Curve Fitting

• Hessian:

$$F(x)_{kj} = \frac{\partial}{\partial x_k} \left( \frac{\partial f}{\partial x_j} \right) = \frac{\partial}{\partial x_k} \sum_{i=1}^{m} 2 r_i \frac{\partial r_i}{\partial x_j} = 2 \sum_{i=1}^{m} \left( \frac{\partial r_i}{\partial x_k} \frac{\partial r_i}{\partial x_j} + r_i \frac{\partial^2 r_i}{\partial x_k \partial x_j} \right) = 2 \left( J^T J + S \right)$$
Newton’s Method for Curve Fitting

• Here, $S = S(x)$ collects the terms $\sum_{i=1}^{m} r_i \dfrac{\partial^2 r_i}{\partial x_k \partial x_j}$, and it can be ignored since its contribution becomes negligible near the solution.

• Hence the update equation:

$$x^{k+1} = x^k - \left( J^T J + S \right)^{-1} J^T r$$

OR

$$x^{k+1} = x^k - \left( J^T J \right)^{-1} J^T r \quad \text{(Gauss-Newton method)}$$

OR

$$x^{k+1} = x^k - \left( J^T J + \mu_k I \right)^{-1} J^T r \quad \text{(Levenberg-Marquardt algorithm)}$$
Newton’s Method for Curve Fitting: Dimensions

• If there are 100 data points and $x = (x_1, x_2, x_3, x_4)^T$,

• then the dimensions in the update equation will be

$$x^{k+1}_{(4 \times 1)} = x^k_{(4 \times 1)} - \left( J^T_{(4 \times 100)}\, J_{(100 \times 4)} + \mu_k I_{(4 \times 4)} \right)^{-1}_{(4 \times 4)}\, J^T_{(4 \times 100)}\, r_{(100 \times 1)}$$
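To tie the pieces together, here is a hedged sketch of the Levenberg-Marquardt update for the cubic fit, on synthetic data of my own making (100 points and 4 unknowns, matching the dimensions above; the true coefficients, noise level, and $\mu_k$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 samples from a known cubic plus noise (illustrative choice).
t = np.linspace(0.0, 2.0, 100)
true_x = np.array([1.0, -2.0, 0.5, 3.0])              # (a, b, c, d)
y = np.polyval(true_x, t) + 0.05 * rng.standard_normal(t.size)

# Jacobian of the residuals r_i = y_i - (x1 t^3 + x2 t^2 + x3 t + x4):
# each row is (-t_i^3, -t_i^2, -t_i, -1), independent of x for this model.
J = -np.column_stack([t**3, t**2, t, np.ones_like(t)])   # shape (100, 4)

x = np.zeros(4)                                        # initial guess x^0
mu = 1e-3                                              # fixed mu_k (simplification)
for _ in range(50):
    r = y - np.polyval(x, t)                           # residual vector, shape (100,)
    step = np.linalg.solve(J.T @ J + mu * np.eye(4), J.T @ r)
    x = x - step                                       # Levenberg-Marquardt update
    if np.linalg.norm(step) < 1e-10:
        break
print(x)  # close to true_x
```

Because this model is linear in the parameters, $J$ is constant and the iteration converges in very few steps; for a model that is nonlinear in $x$, $J$ would be recomputed at each iterate.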
