7 Newton Raphson Method

• Converges faster than the steepest descent method because it uses both first and second derivatives.

• However, this holds only when the initial guess is close to the minimizer.

• The problem must be posed in the form $f(x) = 0$; for minimization, Newton's method is applied to $\nabla f(x) = 0$.

• Question: Can another method be combined with Newton-Raphson to improve its performance?

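• At each iterate $x^{(k)}$, the original function is approximated by a quadratic (its second-order Taylor expansion):

$$f(x) \approx f(x^{(k)}) + (x - x^{(k)})^T \nabla f(x^{(k)}) + \frac{1}{2}(x - x^{(k)})^T F(x^{(k)})\,(x - x^{(k)}) \triangleq q(x)$$

Here, $F(x^{(k)}) = \nabla^2 f(x^{(k)})$ is the Hessian.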

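• Apply the first-order necessary condition (FONC) to $q$: $\nabla q(x) = 0$

$$\Rightarrow \nabla f(x^{(k)}) + F(x^{(k)})\,(x - x^{(k)}) = 0$$
This is Newton's formula as previously discussed.
If $F(x^{(k)}) > 0$ (positive definite), $q$ has a unique minimizer, namely the solution of this equation, and that point is taken as the next iterate $x^{(k+1)}$.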
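Solving that linear equation for $x$ (assuming $F(x^{(k)})$ is invertible) gives the minimizer of $q$ in closed form:

```latex
F(x^{(k)})\,(x - x^{(k)}) = -\nabla f(x^{(k)})
\quad\Longrightarrow\quad
x = x^{(k)} - F(x^{(k)})^{-1}\,\nabla f(x^{(k)})
```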

• The update equation is

$$x^{(k+1)} = x^{(k)} - F(x^{(k)})^{-1}\,\nabla f(x^{(k)})$$
• The order of the factors matters, since these are matrix operations: the inverse Hessian multiplies the gradient from the left.
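A minimal NumPy sketch of one Newton update, assuming the gradient and Hessian are available as functions (the names `newton_step`, `grad`, `hess` and the test matrix are illustrative, not from the slides). It solves $F(x^{(k)})\,d = -\nabla f(x^{(k)})$ with `np.linalg.solve` instead of forming the inverse. For a quadratic $f(x) = \tfrac12 x^T A x - b^T x$ with $A > 0$ the model $q$ is exact, so one step reaches the minimizer $A^{-1}b$:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton-Raphson update: x - F(x)^{-1} grad f(x)."""
    # Solve F(x) d = -grad f(x); the Hessian (matrix) acts on the gradient (vector)
    d = np.linalg.solve(hess(x), -grad(x))
    return x + d

# Illustrative quadratic f(x) = 0.5 x^T A x - b^T x with A positive definite
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b
hess = lambda x: A

x1 = newton_step(grad, hess, np.array([10.0, -7.0]))
print(x1, np.linalg.solve(A, b))  # x1 matches A^{-1} b up to rounding
```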

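• If we start far from the solution, convergence is not guaranteed.

• Consider the one-dimensional function
$$\phi(\alpha) = f(x^{(k)} + \alpha d^{(k)}),$$
where $d^{(k)} = -F(x^{(k)})^{-1}\nabla f(x^{(k)}) = x^{(k+1)} - x^{(k)}$ is the search direction.
• Differentiating (chain rule):
$$\phi'(\alpha) = \nabla f(x^{(k)} + \alpha d^{(k)})^T d^{(k)}$$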

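$$\phi'(\alpha) = \nabla f(x^{(k)} + \alpha d^{(k)})^T d^{(k)} \;\Rightarrow\; \phi'(0) = \nabla f(x^{(k)})^T d^{(k)} = -\nabla f(x^{(k)})^T F(x^{(k)})^{-1} \nabla f(x^{(k)}) < 0$$
for $F(x^{(k)})^{-1} > 0$ and $\nabla f(x^{(k)}) \neq 0$.
• This means that the slope of $\phi(\alpha)$ at $\alpha = 0$ is negative, so $\phi$ is decreasing there.
• Hence there exists $\bar{\alpha} > 0$ such that $\phi(\alpha) < \phi(0)$ for all $\alpha \in (0, \bar{\alpha})$,
• which means $f(x^{(k)} + \alpha d^{(k)}) < f(x^{(k)})$.
• Hence $F(x^{(k)})^{-1} > 0$ (equivalently, $F(x^{(k)}) > 0$) and $\nabla f(x^{(k)}) \neq 0$ are sufficient conditions for $d^{(k)}$ to be a descent direction, i.e., for $f$ to decrease along it for small enough $\alpha$.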
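As a quick numerical check of this argument, the sketch below computes $\phi'(0) = \nabla f(x^{(k)})^T d^{(k)}$ for the Newton direction, using an arbitrary gradient and two hypothetical Hessians. With a positive definite Hessian the slope is negative; with an indefinite one the Newton direction can point uphill:

```python
import numpy as np

def newton_slope(g, F):
    """Return phi'(0) = g^T d for the Newton direction d = -F^{-1} g."""
    d = np.linalg.solve(F, -g)
    return g @ d

g = np.array([1.0, 2.0])                       # illustrative nonzero gradient
F_pd = np.array([[2.0, 0.5], [0.5, 1.0]])      # positive definite
F_indef = np.array([[1.0, 0.0], [0.0, -1.0]])  # indefinite (eigenvalues 1, -1)

print(newton_slope(g, F_pd))     # negative: descent direction
print(newton_slope(g, F_indef))  # 3.0 here: an ascent direction
```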

• The update equation can be modified to include a step size (learning rate) $\alpha$:

$$x^{(k+1)} = x^{(k)} - \alpha F(x^{(k)})^{-1} \nabla f(x^{(k)})$$

• Some disadvantages:
• The matrix $F(x^{(k)})$ must be invertible.
• For large $n$, forming and inverting the $n \times n$ Hessian becomes computationally expensive.
• We need to start sufficiently close to the solution.
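The step size $\alpha$ is often chosen by a line search on $\phi(\alpha)$. The sketch below uses backtracking with an Armijo sufficient-decrease test; that rule, the helper name `damped_newton`, and the test function $f(x) = \sum_i \sqrt{1 + x_i^2}$ are assumptions for illustration, not something the slides prescribe. The test function has a positive definite Hessian everywhere, yet the pure update ($\alpha = 1$) maps each coordinate to $-x_i^3$ and diverges from $x^{(0)} = 2$, while the damped version converges to $x^* = 0$:

```python
import numpy as np

def f(x):
    return np.sum(np.sqrt(1.0 + x**2))

def grad(x):
    return x / np.sqrt(1.0 + x**2)

def hess(x):
    # Diagonal Hessian with entries (1 + x_i^2)^(-3/2) > 0
    return np.diag((1.0 + x**2) ** -1.5)

def damped_newton(x0, alpha_fixed=None, max_iter=50, tol=1e-10, c=1e-4):
    """Newton's method with step size alpha.

    alpha_fixed=1.0 gives the pure update; alpha_fixed=None halves alpha
    from 1 until the Armijo test f(x + a d) <= f(x) + c a g^T d holds.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)  # Newton direction
        if alpha_fixed is not None:
            alpha = alpha_fixed
        else:
            alpha = 1.0
            while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
                alpha *= 0.5
        x = x + alpha * d
    return x

x_pure = damped_newton(np.array([2.0]), alpha_fixed=1.0, max_iter=3)  # roughly 2 -> -8 -> 512 -> -1.3e8
x_damped = damped_newton(np.array([2.0]))                             # converges toward 0
print(x_pure, x_damped)
```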
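Levenberg-Marquardt Modification
$$x^{(k+1)} = x^{(k)} - \alpha F(x^{(k)})^{-1} \nabla f(x^{(k)})$$
• Disadvantage: the matrix $F(x^{(k)})$ must be invertible.
Solution: $x^{(k+1)} = x^{(k)} - \alpha \left(F(x^{(k)}) + \mu_k I\right)^{-1} \nabla f(x^{(k)})$
• where $\mu_k \geq 0$. For sufficiently large $\mu_k$, $F(x^{(k)}) + \mu_k I$ is positive definite and hence invertible.
• If $\mu_k \to 0$, it approaches Newton's method.
• If $\mu_k \to \infty$, it approaches steepest descent (with a small step size).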
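To see the two limits numerically, the sketch below (hypothetical helper `lm_direction`, illustrative gradient and Hessian) computes $d = -(F + \mu I)^{-1}\nabla f$. Adding $\mu I$ shifts every eigenvalue of the symmetric matrix $F$ by $\mu$, so a large enough $\mu$ turns an indefinite Hessian into a positive definite one; as $\mu$ grows, $\mu d \to -\nabla f$, the steepest-descent direction:

```python
import numpy as np

def lm_direction(g, F, mu):
    """Levenberg-Marquardt direction d = -(F + mu I)^{-1} g."""
    n = F.shape[0]
    return np.linalg.solve(F + mu * np.eye(n), -g)

g = np.array([1.0, 2.0])
F = np.array([[1.0, 0.0], [0.0, -1.0]])  # indefinite Hessian, eigenvalues 1 and -1

d_newton = lm_direction(g, F, 0.0)   # mu = 0: plain Newton direction
d_shifted = lm_direction(g, F, 2.0)  # eigenvalues become 3 and 1: positive definite
d_large = lm_direction(g, F, 1e6)    # mu large: approximately -g / mu

print(g @ d_newton, g @ d_shifted)   # ascent (positive) vs descent (negative)
print(1e6 * d_large)                 # close to -g
```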

• Example: use the Newton-Raphson method to minimize the Powell function

$$f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4$$

Start with $x^{(0)} = [3, -1, 0, 1]^T$.


1. Find the gradient $\nabla f(x)$.
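For reference, differentiating each term of the Powell function gives the gradient below, together with its value at $x^{(0)} = [3, -1, 0, 1]^T$:

```latex
\nabla f(x) =
\begin{bmatrix}
2(x_1 + 10x_2) + 40(x_1 - x_4)^3 \\
20(x_1 + 10x_2) + 4(x_2 - 2x_3)^3 \\
10(x_3 - x_4) - 8(x_2 - 2x_3)^3 \\
-10(x_3 - x_4) - 40(x_1 - x_4)^3
\end{bmatrix},
\qquad
\nabla f(x^{(0)}) = \begin{bmatrix} 306 \\ -144 \\ -2 \\ -310 \end{bmatrix}
```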

2. Find the Hessian $F(x)$.
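Differentiating the gradient once more gives the (symmetric) Hessian, shown here with its value at $x^{(0)}$:

```latex
F(x) =
\begin{bmatrix}
2 + 120(x_1 - x_4)^2 & 20 & 0 & -120(x_1 - x_4)^2 \\
20 & 200 + 12(x_2 - 2x_3)^2 & -24(x_2 - 2x_3)^2 & 0 \\
0 & -24(x_2 - 2x_3)^2 & 10 + 48(x_2 - 2x_3)^2 & -10 \\
-120(x_1 - x_4)^2 & 0 & -10 & 10 + 120(x_1 - x_4)^2
\end{bmatrix},
\qquad
F(x^{(0)}) =
\begin{bmatrix}
482 & 20 & 0 & -480 \\
20 & 212 & -24 & 0 \\
0 & -24 & 58 & -10 \\
-480 & 0 & -10 & 490
\end{bmatrix}
```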


3. Start with $x^{(0)} = [3, -1, 0, 1]^T$, compute the first iteration, and repeat until $\nabla f(x) = 0$ or another stopping criterion is met.
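The three steps can be sketched in NumPy as follows; the function names, the tolerance `tol=1e-8` on $\|\nabla f\|$ and the cap `max_iter=50` are illustrative assumptions. The first iterate works out to $x^{(1)} = [100/63, -10/63, 16/63, 16/63]^T \approx [1.5873, -0.1587, 0.2540, 0.2540]^T$ with $f(x^{(1)}) = 2576/81 \approx 31.80$. Because the Hessian is singular at the minimizer $x^* = 0$, convergence here is only linear: after the first step each iteration scales $x$ by $2/3$.

```python
import numpy as np

def powell(x):
    x1, x2, x3, x4 = x
    return (x1 + 10*x2)**2 + 5*(x3 - x4)**2 + (x2 - 2*x3)**4 + 10*(x1 - x4)**4

def powell_grad(x):
    x1, x2, x3, x4 = x
    a, b, c, d = x1 + 10*x2, x3 - x4, x2 - 2*x3, x1 - x4
    return np.array([2*a + 40*d**3,
                     20*a + 4*c**3,
                     10*b - 8*c**3,
                     -10*b - 40*d**3])

def powell_hess(x):
    x1, x2, x3, x4 = x
    c2, d2 = (x2 - 2*x3)**2, (x1 - x4)**2
    return np.array([[2 + 120*d2, 20.0, 0.0, -120*d2],
                     [20.0, 200 + 12*c2, -24*c2, 0.0],
                     [0.0, -24*c2, 10 + 48*c2, -10.0],
                     [-120*d2, 0.0, -10.0, 10 + 120*d2]])

def newton(grad, hess, x0, tol=1e-8, max_iter=50):
    """Pure Newton-Raphson: stop when ||grad f|| < tol or after max_iter steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # stopping criterion
            break
        x = x + np.linalg.solve(hess(x), -g)
    return x

x0 = np.array([3.0, -1.0, 0.0, 1.0])
x1 = newton(powell_grad, powell_hess, x0, max_iter=1)  # first iteration
x_star = newton(powell_grad, powell_hess, x0)
print(powell(x0), powell(x1), powell(x_star))
```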
Newton’s Method for Curve Fitting

Least-Squares Method

• We are given $m$ data points $y_i$ (measured at times $t_i$) and need to fit them with a function $q$.
• Example: $m$ data points collected over a time interval, as shown in the figure.
• We need to fit them with a third-order polynomial $q(t) = at^3 + bt^2 + ct + d$,
where $a, b, c, d$ are the unknown coefficients.
• We need to determine $a, b, c, d$.

• The unknown vector is $x = [a, b, c, d]^T = [x_1, x_2, x_3, x_4]^T$; we start from an initial guess

$$x^{(0)} = [x_1^{(0)}, x_2^{(0)}, x_3^{(0)}, x_4^{(0)}]^T$$

• The function $q(t)$ can be written in terms of $x$ as

$$p(x) = x_1 t^3 + x_2 t^2 + x_3 t + x_4$$

• Define the error (residual) between the actual data and the estimated data as
$$r_i(x) = y_i - p_i(x) = y_i - x_1 t_i^3 - x_2 t_i^2 - x_3 t_i - x_4$$
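A minimal sketch of the model $p$ and the residual vector $r(x)$ in NumPy. The function names and the synthetic, noise-free data (generated from hypothetical coefficients) are assumptions for illustration only:

```python
import numpy as np

def model(x, t):
    """p(x) evaluated at times t: x1 t^3 + x2 t^2 + x3 t + x4."""
    return x[0]*t**3 + x[1]*t**2 + x[2]*t + x[3]

def residuals(x, t, y):
    """r_i(x) = y_i - p_i(x) for every data point."""
    return y - model(x, t)

# Synthetic, noise-free data from hypothetical coefficients a, b, c, d
t = np.linspace(0.0, 2.0, 5)
x_true = np.array([1.0, -2.0, 0.5, 3.0])
y = model(x_true, t)

print(residuals(x_true, t, y))       # all zeros at the true coefficients
print(residuals(np.zeros(4), t, y))  # equals y when every coefficient is 0
```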

• The cost function to be minimized is the sum of squared residuals:

$$f(x) = \sum_{i=1}^{m} r_i(x)^2$$
• The optimization problem is now

$$\text{minimize} \quad \sum_{i=1}^{m} r_i(x)^2$$

• Define $r = [r_1, r_2, \ldots, r_m]^T$, so that $f(x) = r^T r = \sum_{i=1}^{m} r_i(x)^2$.

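• Now the gradient and the Hessian can be found. The gradient is

$$(\nabla f(x))_j = \frac{\partial f(x)}{\partial x_j} = \sum_{i=1}^{m} 2 r_i \frac{\partial r_i}{\partial x_j}$$

Defining

$$J(x) = \begin{bmatrix} \dfrac{\partial r_1}{\partial x_1} & \dfrac{\partial r_1}{\partial x_2} & \cdots & \dfrac{\partial r_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial r_m}{\partial x_1} & \dfrac{\partial r_m}{\partial x_2} & \cdots & \dfrac{\partial r_m}{\partial x_n} \end{bmatrix}_{m \times n}$$

This is the Jacobian matrix. In terms of $J$, the gradient is $\nabla f(x) = 2J(x)^T r(x)$.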
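For the cubic model each residual is linear in $x$, so the Jacobian is constant: row $i$ is $-[t_i^3,\ t_i^2,\ t_i,\ 1]$. The sketch below builds it with `np.vander` and compares it with a forward-difference approximation; the helper names, sample times and data are hypothetical:

```python
import numpy as np

def residuals(x, t, y):
    return y - (x[0]*t**3 + x[1]*t**2 + x[2]*t + x[3])

def jacobian(t):
    """J[i, j] = d r_i / d x_j = -t_i^(3 - j); independent of x for this model."""
    return -np.vander(t, 4)  # columns t^3, t^2, t, 1

def jacobian_fd(x, t, y, h=1e-6):
    """Forward-difference approximation of J, column by column."""
    r0 = residuals(x, t, y)
    J = np.empty((t.size, x.size))
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = h
        J[:, j] = (residuals(x + e, t, y) - r0) / h
    return J

t = np.linspace(0.0, 2.0, 6)          # hypothetical sample times
y = np.sin(t)                         # hypothetical data
x = np.array([0.3, -0.1, 1.0, 0.2])   # arbitrary point
print(np.max(np.abs(jacobian(t) - jacobian_fd(x, t, y))))  # small: rounding error only
```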

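• Hessian

$$F(x)_{kj} = \frac{\partial}{\partial x_k}\left(\frac{\partial f}{\partial x_j}\right) = \frac{\partial}{\partial x_k}\left(\sum_{i=1}^{m} 2 r_i \frac{\partial r_i}{\partial x_j}\right) = 2\sum_{i=1}^{m}\left(\frac{\partial r_i}{\partial x_k}\frac{\partial r_i}{\partial x_j} + r_i \frac{\partial^2 r_i}{\partial x_k \partial x_j}\right)$$

$$F(x) = 2\left(J^T J + S\right)$$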

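• Here, $S = S(x)$ is the matrix with entries $S_{kj} = \sum_{i=1}^{m} r_i \dfrac{\partial^2 r_i}{\partial x_k \partial x_j}$. It can be ignored when its contribution is negligible, e.g. when the residuals $r_i$ are small or nearly linear in $x$; for the cubic model above, each $r_i$ is linear in $x$, so $S = 0$ exactly.

• Hence the update equation (the factors of 2 in $F$ and $\nabla f$ cancel):

$$x_{k+1} = x_k - \left(J^T J + S\right)^{-1} J^T r$$

$$x_{k+1} = x_k - \left(J^T J\right)^{-1} J^T r \qquad \text{Gauss-Newton method}$$

$$x_{k+1} = x_k - \left(J^T J + \mu_k I\right)^{-1} J^T r \qquad \text{Levenberg-Marquardt algorithm}$$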
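A minimal sketch of both updates for the cubic fit, assuming noise-free synthetic data and a hypothetical helper `fit_cubic`. Because the residuals are linear in $x$ ($S = 0$), one Gauss-Newton step from any start lands on the least-squares solution. The Levenberg-Marquardt variant here keeps $\mu_k = \mu$ fixed, which is a simplification (practical implementations usually adapt $\mu_k$ between iterations), so it takes shorter steps and needs several iterations:

```python
import numpy as np

def fit_cubic(t, y, x0, mu=0.0, iters=1):
    """Iterate x <- x - (J^T J + mu I)^{-1} J^T r; mu = 0 is Gauss-Newton."""
    x = np.asarray(x0, dtype=float)
    V = np.vander(t, 4)            # columns t^3, t^2, t, 1
    J = -V                         # constant Jacobian of r(x) = y - V x
    for _ in range(iters):
        r = y - V @ x              # residual vector r(x)
        A = J.T @ J + mu * np.eye(4)
        x = x - np.linalg.solve(A, J.T @ r)
    return x

t = np.linspace(-1.0, 1.0, 20)
x_true = np.array([2.0, -1.0, 0.5, 3.0])   # hypothetical a, b, c, d
y = np.vander(t, 4) @ x_true               # noise-free synthetic data

x_gn = fit_cubic(t, y, np.zeros(4))                     # one Gauss-Newton step
x_lm = fit_cubic(t, y, np.zeros(4), mu=0.1, iters=100)  # fixed-mu LM steps
print(x_gn, x_lm)
```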
Newton’s Method for Curve Fitting: Dimensions

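• If there are 100 data points and $x = [x_1, x_2, x_3, x_4]^T$,

• then the dimensions in the update equation are

$$\underbrace{x_{k+1}}_{4\times 1} = \underbrace{x_k}_{4\times 1} - \Big(\underbrace{J^T}_{4\times 100}\underbrace{J}_{100\times 4} + \mu_k \underbrace{I}_{4\times 4}\Big)^{-1}_{4\times 4}\underbrace{J^T}_{4\times 100}\underbrace{r}_{100\times 1}$$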
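The shapes can be checked directly in NumPy. This sketch fills $J$ and $r$ with random placeholder values (only the shapes matter), and the variable names are illustrative:

```python
import numpy as np

m, n = 100, 4                    # 100 data points, 4 unknowns
rng = np.random.default_rng(0)
J = rng.standard_normal((m, n))  # placeholder Jacobian, m x n
r = rng.standard_normal((m, 1))  # placeholder residual vector, m x 1
x_k = np.zeros((n, 1))
mu_k = 0.1

A = J.T @ J + mu_k * np.eye(n)      # (4 x 100)(100 x 4) + (4 x 4) -> 4 x 4
step = np.linalg.solve(A, J.T @ r)  # (4 x 4)^{-1} (4 x 100)(100 x 1) -> 4 x 1
x_next = x_k - step

print(A.shape, (J.T @ r).shape, x_next.shape)  # (4, 4) (4, 1) (4, 1)
```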

