

STA 5106
Computational Methods
in Statistics I
Department of Statistics
Florida State University

Class 8
September 26, 2023


Chapter 3

Non-Linear Statistical Methods

3.1 Introduction


Non-linear Optimization
• This chapter is devoted to solving for the optimal points of given functions, linear or nonlinear.

• An instance of nonlinear optimization problems in statistics occurs in maximum likelihood estimation.

• Example:
y = F(x, b) + ε
where x and y are random variables of interest, b is a constant vector of parameters, and ε is a random error with density function fε.

• In many practical situations, we have sampled values of y and x, and our goal is to estimate b according to some criterion.

Maximum Likelihood Estimation (MLE)
• Assuming the sample size is n and the measurements are independent of each other, b can be estimated as follows:
b̂ = arg max_b ∏_{i=1}^{n} fε(yi − F(xi, b))

• The quantity on the right-hand side of this equation is called the likelihood function.

• b̂ is called the maximum likelihood estimate (MLE) of b.

• In general, a value of b that maximizes the likelihood function can be found by seeking the roots of the first derivative of the likelihood function.
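• For example (a standard special case, stated here for illustration): if ε ~ N(0, σ²), so that fε(e) ∝ exp(−e²/(2σ²)), then maximizing the likelihood is equivalent to minimizing Σi (yi − F(xi, b))², i.e., nonlinear least squares.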

3.2 Numerical Root Finding: Scalar Case


Goal
• We are given a real-valued function f(x), where x ∈ R, and the goal is to find the roots of f, i.e., find x∗ such that f(x∗) = 0.

• Due to the non-linearity of the cost function, its roots are found in an iterative way: an initial estimate is chosen and is modified iteratively until some stopping criterion is met.

• There are many different methods for root-finding. They all have the following three essential elements:
1. Some method of determining the starting value, x0.
2. Given the i-th iterate, some formula for calculating the (i+1)-th iterate.
3. Some stopping criterion.

Order of Convergence
• For each of these algorithms we are interested in finding their rate of convergence towards the roots of the function.

• The rate of convergence is characterized by a quantity called the order of convergence.

• Definition 10: For a converging sequence {xi}, the order of convergence is β if
lim_{i→∞} E_{i+1} / E_i^β = K,
where Ei = |xi − xi−1| and K is a constant.

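• In practice, β can be estimated from the iterates: since E_{i+1} ≈ K E_i^β, we have log E_{i+1} ≈ log K + β log E_i, so β is the slope of a straight-line fit. A minimal sketch (assuming the vector x holds the iterates produced by one of the algorithms below):

E = abs(diff(x));                                % Ei = |xi - xi-1|
p = polyfit(log(E(1:end-1)), log(E(2:end)), 1);  % fit log E(i+1) vs. log E(i)
beta_hat = p(1);                                 % slope estimates beta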

Simple Iterations
• Instead of solving for the roots of f, we solve an equivalent problem of finding the fixed point of another function g. g is found in such a way that
f(x∗) = 0 ⇔ g(x∗) = x∗.

• The selection depends on which g is easier to handle and solve for a fixed point. One choice which is always valid is
g(x) = x + f(x).

• Let x0 be some starting value for the iterative search of the roots of f. At the i-th stage of the algorithm, the next iterate is given by the formula:
xi+1 = g(xi) = xi + f(xi).

Algorithm
• Algorithm: Given a function f(x), find its roots using simple iterations:

• Algorithm 21 (Simple Iterations)
% g(x) = x + f(x); epsilon > 0 is the stopping tolerance
x(1) = x0;
gx = g(x(1));
i = 1;
while (abs(x(i) - gx) > epsilon)   % stop when successive iterates agree
    gx = x(i);                     % store the previous iterate
    x(i + 1) = g(x(i));            % fixed-point update
    i = i + 1;
end

• The order of convergence of simple iterations is β = 1.


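• A minimal usage sketch (an assumed example for illustration): finding the root of f(x) = cos(x) − x, for which g(x) = x + f(x) = cos(x):

g = @(x) cos(x);   % fixed-point map for f(x) = cos(x) - x
epsilon = 1e-8;
x(1) = 0.5;        % starting value x0
gx = g(x(1));
i = 1;
while (abs(x(i) - gx) > epsilon)
    gx = x(i);
    x(i + 1) = g(x(i));
    i = i + 1;
end
% x(i) approaches the fixed point 0.7391..., the root of f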

Newton-Raphson’s Method
• The slow convergence of simple iteration is avoided by using the Newton-Raphson method (1690).

• The Newton-Raphson method is one of the most popular techniques used in numerical root finding.

• Stricter requirements: for Newton’s method to apply, f should be continuously differentiable and
f′(x) ≠ 0
near the solution x∗.

• We will derive an iterative procedure for computing xi+1 given xi.

Basic Idea
• Approximate the function at a given point by a straight line. In this case, the line is the tangent to f at that point.


Newton-Raphson’s Method
• Assume xi is the current estimate. Then
f′(xi) = (0 − f(xi)) / (xi+1 − xi)

• Therefore,
xi+1 = xi − f(xi)/f′(xi),   (f′(xi) ≠ 0)

• Newton-Raphson is one of the fastest known algorithms for root finding in general situations.

• Its order of convergence is β = 2.


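• A quick worked example (assumed for illustration): for f(x) = x² − 2 with x0 = 1, the update gives x1 = 1.5, x2 ≈ 1.4167, x3 ≈ 1.41422, converging rapidly to √2 ≈ 1.41421.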

Algorithm
• Algorithm: Given a function f(x), find its roots using the Newton-Raphson method:

• Algorithm 23 (Newton-Raphson Method)
% f and its derivative fp (= f') are given; epsilon > 0 is the stopping tolerance
x(1) = x0;
i = 1;
gx = x(i) - f(x(i))/fp(x(i));      % first Newton step
while (abs(x(i) - gx) > epsilon)   % stop when successive iterates agree
    gx = x(i);                     % store the previous iterate
    x(i + 1) = x(i) - f(x(i))/fp(x(i));   % Newton update
    i = i + 1;
end

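• A minimal usage sketch (an assumed example for illustration): the same root of f(x) = cos(x) − x, now with the derivative supplied:

f  = @(x) cos(x) - x;
fp = @(x) -sin(x) - 1;   % f'(x)
epsilon = 1e-12;
x(1) = 0.5;              % starting value x0
i = 1;
gx = x(i) - f(x(i))/fp(x(i));
while (abs(x(i) - gx) > epsilon)
    gx = x(i);
    x(i + 1) = x(i) - f(x(i))/fp(x(i));
    i = i + 1;
end
% reaches 0.7390851... to machine precision in a handful of iterations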

Important Application
• The Newton-Raphson method is particularly useful for minimizing (or maximizing) functions that are convex (or concave): a minimizer solves f′(x) = 0, so root finding applies directly (a sketch follows this slide).

• Definition: A real-valued function f(x) defined on an interval [a, b] is called convex if, for any two points x1 and x2 in [a, b] and any t in [0, 1],
f(t x1 + (1–t) x2) ≤ t f(x1) + (1–t) f(x2)

• A function f is said to be concave if −f is convex. That is, if for any two points x1 and x2 in [a, b] and any t in [0, 1],
f(t x1 + (1–t) x2) ≥ t f(x1) + (1–t) f(x2)

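• A minimal sketch of this use (an assumed example for illustration): minimizing the convex function f(x) = exp(x) + x² by running Newton-Raphson on f′:

fp  = @(x) exp(x) + 2*x;   % f'(x)
fpp = @(x) exp(x) + 2;     % f''(x) > 0, so f is convex
x = 0;                     % starting value
for i = 1:50
    step = fp(x)/fpp(x);
    x = x - step;          % Newton update applied to f'
    if abs(step) < 1e-12, break; end
end
% x is now near the minimizer (about -0.3517), where f'(x) = 0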

Convex and Concave
[Figure: plots of a convex function and a concave function]

Convex Condition
• Theorem: Assume f is twice continuously differentiable on [a, b]. If f′′ ≥ 0, then f is convex.
Proof: For any two points x1 and x2 in [a, b] and any t in [0, 1], let s = t x1 + (1–t) x2. Then, by Taylor expansion about s,
f(x1) = f(s) + f′(s)(x1 – s) + f′′(ξ1)(x1 – s)²/2,
f(x2) = f(s) + f′(s)(x2 – s) + f′′(ξ2)(x2 – s)²/2.
Since t(x1 – s) + (1–t)(x2 – s) = t x1 + (1–t) x2 – s = 0, the first-order terms cancel, giving
t f(x1) + (1–t) f(x2) = f(s) + t f′′(ξ1)(x1 – s)²/2 + (1–t) f′′(ξ2)(x2 – s)²/2.
As f′′ ≥ 0,
t f(x1) + (1–t) f(x2) ≥ f(s) = f(t x1 + (1–t) x2).
Therefore, f is convex. (For instance, f(x) = x² has f′′ = 2 ≥ 0 and is convex.)
