ANN8
[Figure: training error and test error curves, with the underfitting and overfitting regions marked; vertical axis: Error]
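$$
\begin{aligned}
F(x) = F(x^*) &+ \left.\frac{d}{dx}F(x)\right|_{x=x^*}(x - x^*) \\
&+ \frac{1}{2}\left.\frac{d^2}{dx^2}F(x)\right|_{x=x^*}(x - x^*)^2 + \cdots \\
&+ \frac{1}{n!}\left.\frac{d^n}{dx^n}F(x)\right|_{x=x^*}(x - x^*)^n + \cdots
\end{aligned}
$$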
Example
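$$F(x) = e^{-x}$$

Taylor series expansion about $x^* = 0$:

$$F(x) = e^{-x} = e^{-0} - e^{-0}(x - 0) + \frac{1}{2}e^{-0}(x - 0)^2 - \frac{1}{6}e^{-0}(x - 0)^3 + \cdots$$

$$F(x) = 1 - x + \frac{1}{2}x^2 - \frac{1}{6}x^3 + \cdots$$

Approximations of increasing order:

$$F(x) \approx F_0(x) = 1$$

$$F(x) \approx F_1(x) = 1 - x$$

$$F(x) \approx F_2(x) = 1 - x + \frac{1}{2}x^2$$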
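The truncated series above can be checked numerically. The following is a minimal sketch, assuming the expansion point x* = 0 from the example; the function name `taylor_exp_neg` is introduced here for illustration only.

```python
import math

def taylor_exp_neg(x, order):
    """Taylor polynomial of F(x) = exp(-x) about x* = 0, truncated at `order`.

    The k-th derivative of exp(-x) at 0 is (-1)**k, so the k-th term
    is (-1)**k * x**k / k!.
    """
    return sum((-1) ** k * x ** k / math.factorial(k) for k in range(order + 1))

# Compare F0, F1, F2 with the exact value at a few points.
for x in (0.1, 0.5, 1.0):
    exact = math.exp(-x)
    errors = [abs(taylor_exp_neg(x, n) - exact) for n in (0, 1, 2)]
    print(f"x={x}: exact={exact:.4f}, |error| for F0, F1, F2 = {errors}")
```

At each of these points the error shrinks as the order goes from 0 to 2, and all three approximations get worse as x moves away from the expansion point.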
Plot of Approximations
[Plot: the approximations F_0(x), F_1(x), and F_2(x) for x from -2 to 2]
Vector Case
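$$F(\mathbf{x}) = F(x_1, x_2, \ldots, x_n)$$

$$
\begin{aligned}
F(\mathbf{x}) = F(\mathbf{x}^*) &+ \left.\frac{\partial}{\partial x_1}F(\mathbf{x})\right|_{\mathbf{x}=\mathbf{x}^*}(x_1 - x_1^*) + \left.\frac{\partial}{\partial x_2}F(\mathbf{x})\right|_{\mathbf{x}=\mathbf{x}^*}(x_2 - x_2^*) \\
&+ \cdots + \left.\frac{\partial}{\partial x_n}F(\mathbf{x})\right|_{\mathbf{x}=\mathbf{x}^*}(x_n - x_n^*) + \frac{1}{2}\left.\frac{\partial^2}{\partial x_1^2}F(\mathbf{x})\right|_{\mathbf{x}=\mathbf{x}^*}(x_1 - x_1^*)^2 \\
&+ \frac{1}{2}\left.\frac{\partial^2}{\partial x_1 \partial x_2}F(\mathbf{x})\right|_{\mathbf{x}=\mathbf{x}^*}(x_1 - x_1^*)(x_2 - x_2^*) + \cdots
\end{aligned}
$$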
Matrix Form
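$$
\begin{aligned}
F(\mathbf{x}) = F(\mathbf{x}^*) &+ \left.\nabla F(\mathbf{x})^T\right|_{\mathbf{x}=\mathbf{x}^*}(\mathbf{x} - \mathbf{x}^*) \\
&+ \frac{1}{2}(\mathbf{x} - \mathbf{x}^*)^T \left.\nabla^2 F(\mathbf{x})\right|_{\mathbf{x}=\mathbf{x}^*}(\mathbf{x} - \mathbf{x}^*) + \cdots
\end{aligned}
$$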
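Gradient

$$
\nabla F(\mathbf{x}) =
\begin{bmatrix}
\dfrac{\partial}{\partial x_1}F(\mathbf{x}) \\
\dfrac{\partial}{\partial x_2}F(\mathbf{x}) \\
\vdots \\
\dfrac{\partial}{\partial x_n}F(\mathbf{x})
\end{bmatrix}
$$

Hessian

$$
\nabla^2 F(\mathbf{x}) =
\begin{bmatrix}
\dfrac{\partial^2}{\partial x_1^2}F(\mathbf{x}) & \dfrac{\partial^2}{\partial x_1 \partial x_2}F(\mathbf{x}) & \cdots & \dfrac{\partial^2}{\partial x_1 \partial x_n}F(\mathbf{x}) \\
\dfrac{\partial^2}{\partial x_2 \partial x_1}F(\mathbf{x}) & \dfrac{\partial^2}{\partial x_2^2}F(\mathbf{x}) & \cdots & \dfrac{\partial^2}{\partial x_2 \partial x_n}F(\mathbf{x}) \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial^2}{\partial x_n \partial x_1}F(\mathbf{x}) & \dfrac{\partial^2}{\partial x_n \partial x_2}F(\mathbf{x}) & \cdots & \dfrac{\partial^2}{\partial x_n^2}F(\mathbf{x})
\end{bmatrix}
$$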
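When analytic derivatives are not at hand, the gradient and Hessian can be approximated with central finite differences. This is a rough sketch, not a production routine: the helper names, the step sizes `h`, and the test function `example_F` are all assumptions chosen for illustration.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at the point x (a list)."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

def numerical_hessian(f, x, h=1e-4):
    """Central-difference estimate of the Hessian of f at the point x (a list)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp, xpm, xmp, xmm = list(x), list(x), list(x), list(x)
            xpp[i] += h; xpp[j] += h
            xpm[i] += h; xpm[j] -= h
            xmp[i] -= h; xmp[j] += h
            xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

def example_F(x):
    # Hypothetical test function: x1^2 + 2*x1*x2 + 2*x2^2
    return x[0] ** 2 + 2 * x[0] * x[1] + 2 * x[1] ** 2

print(numerical_gradient(example_F, [0.5, 0.0]))  # close to [1, 1]
print(numerical_hessian(example_F, [0.5, 0.0]))   # close to [[2, 2], [2, 4]]
```

For a quadratic test function the central differences are exact up to rounding, which makes it a convenient sanity check before applying the helpers to other functions.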
Directional Derivatives
First derivative (slope) of F(x) along the x_i axis: $\partial F(\mathbf{x}) / \partial x_i$

Second derivative (curvature) of F(x) along the x_i axis: $\partial^2 F(\mathbf{x}) / \partial x_i^2$

First derivative (slope) of F(x) along vector p:

$$\frac{\mathbf{p}^T \nabla F(\mathbf{x})}{\|\mathbf{p}\|}$$

Second derivative (curvature) of F(x) along vector p:

$$\frac{\mathbf{p}^T \nabla^2 F(\mathbf{x})\, \mathbf{p}}{\|\mathbf{p}\|^2}$$
Example
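$$F(\mathbf{x}) = x_1^2 + 2x_1 x_2 + 2x_2^2$$

$$\mathbf{x}^* = \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}, \qquad \mathbf{p} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

$$
\left.\nabla F(\mathbf{x})\right|_{\mathbf{x}=\mathbf{x}^*}
= \left.\begin{bmatrix} \dfrac{\partial}{\partial x_1}F(\mathbf{x}) \\ \dfrac{\partial}{\partial x_2}F(\mathbf{x}) \end{bmatrix}\right|_{\mathbf{x}=\mathbf{x}^*}
= \left.\begin{bmatrix} 2x_1 + 2x_2 \\ 2x_1 + 4x_2 \end{bmatrix}\right|_{\mathbf{x}=\mathbf{x}^*}
= \begin{bmatrix} 1 \\ 1 \end{bmatrix}
$$

$$
\frac{\mathbf{p}^T \nabla F(\mathbf{x})}{\|\mathbf{p}\|}
= \frac{\begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}}{\left\|\begin{bmatrix} 1 \\ -1 \end{bmatrix}\right\|}
= \frac{0}{\sqrt{2}} = 0
$$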
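The worked example can be reproduced in a few lines. This sketch hard-codes the gradient and Hessian of F(x) = x1^2 + 2 x1 x2 + 2 x2^2; the function names are hypothetical, and the curvature along p is an extra quantity that the example above does not compute.

```python
import math

def grad_F(x1, x2):
    """Gradient of F(x) = x1^2 + 2*x1*x2 + 2*x2^2."""
    return (2 * x1 + 2 * x2, 2 * x1 + 4 * x2)

# The Hessian of this quadratic is constant.
HESSIAN = ((2.0, 2.0), (2.0, 4.0))

def slope_along(p, x):
    """First directional derivative: p^T grad F(x) / ||p||."""
    g = grad_F(*x)
    return (p[0] * g[0] + p[1] * g[1]) / math.hypot(p[0], p[1])

def curvature_along(p):
    """Second directional derivative: p^T H p / ||p||^2."""
    Hp = (HESSIAN[0][0] * p[0] + HESSIAN[0][1] * p[1],
          HESSIAN[1][0] * p[0] + HESSIAN[1][1] * p[1])
    return (p[0] * Hp[0] + p[1] * Hp[1]) / (p[0] ** 2 + p[1] ** 2)

x_star = (0.5, 0.0)
print(slope_along((1, -1), x_star))  # slope along p = [1, -1]: zero, as in the example
print(slope_along((1, 1), x_star))   # slope along the gradient direction: sqrt(2)
print(curvature_along((1, -1)))      # curvature along p = [1, -1]
```

The slope is largest along the gradient direction and zero along any direction orthogonal to the gradient, which is why p = [1, -1] gives zero here.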
Plots
[Figure: contour plot and surface plot of F(x) for x_1 and x_2 from -2 to 2; the contour plot is annotated with directional derivatives of 1.4, 1.3, 1.0, 0.5, and 0.0]
Minima
Strong Minimum
The point x* is a strong minimum of F(x) if a scalar δ > 0 exists, such that F(x*) <
F(x* + Δx) for all Δx such that δ > ||Δx|| > 0.
Global Minimum
The point x* is a unique global minimum of F(x) if F(x*) < F(x* + Δx) for all Δx ≠ 0.
Weak Minimum
The point x* is a weak minimum of F(x) if it is not a strong minimum, and a scalar δ > 0
exists, such that F(x*) ≤ F(x* + Δx) for all Δx such that δ > ||Δx|| > 0.
Scalar Example

$$F(x) = 3x^4 - 7x^2 - \frac{1}{2}x + 6$$

[Plot: F(x) for x from -2 to 2, with a strong maximum, a strong minimum, and the global minimum labeled]
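The stationary points of the scalar example can be located and classified numerically. This is a sketch under stated assumptions: the derivative is worked out by hand, the three search brackets are picked by inspecting the sign of F'(x), and bisection is just one convenient root finder, not a method taken from the slides.

```python
def F(x):
    return 3 * x ** 4 - 7 * x ** 2 - 0.5 * x + 6

def dF(x):
    return 12 * x ** 3 - 14 * x - 0.5

def d2F(x):
    return 36 * x ** 2 - 14

def bisect_root(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if flo * fmid <= 0:
            hi = mid
        else:
            lo, flo = mid, fmid
    return (lo + hi) / 2

def stationary_points():
    # Brackets chosen by hand so that F'(x) changes sign inside each one.
    brackets = [(-2.0, -0.5), (-0.5, 0.5), (0.5, 2.0)]
    points = []
    for lo, hi in brackets:
        x = bisect_root(dF, lo, hi)
        kind = "strong minimum" if d2F(x) > 0 else "strong maximum"
        points.append((x, kind))
    return points

for x, kind in stationary_points():
    print(f"x = {x:.4f}, F(x) = {F(x):.4f}, {kind}")
```

Comparing F(x) at the two minima shows which one is the global minimum: the one at positive x has the lower function value.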
Vector Example
$$F(\mathbf{x}) = (x_2 - x_1)^4 + 8x_1 x_2 - x_1 + x_2 + 3$$

$$F(\mathbf{x}) = (x_1^2 - 1.5x_1 x_2 + 2x_2^2)\,x_1^2$$

[Figure: contour plots and surface plots of the two functions for x_1 and x_2 from -2 to 2]
Quadratic Functions
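$$F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \mathbf{A}\mathbf{x} + \mathbf{d}^T \mathbf{x} + c \qquad \text{(symmetric } \mathbf{A}\text{)}$$

$$\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}$$

$$\nabla^2 F(\mathbf{x}) = \mathbf{A}$$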
• If the eigenvalues of the Hessian matrix are all positive,
the function will have a single strong minimum.
• If the eigenvalues are all negative, the function will have
a single strong maximum.
• If some eigenvalues are positive and other eigenvalues
are negative, the function will have a single saddle point.
• If the eigenvalues are all nonnegative, but some
eigenvalues are zero, then the function will either have a
weak minimum or will have no stationary point.
• If the eigenvalues are all nonpositive, but some
eigenvalues are zero, then the function will either have a
weak maximum or will have no stationary point.
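The eigenvalue rules listed above translate directly into a small classifier. The sketch below is limited to 2x2 symmetric matrices so that the eigenvalues have a closed form; the function names and the zero tolerance `tol` are assumptions made for illustration.

```python
import math

def eig_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return (mean - radius, mean + radius)

def classify(eigs, tol=1e-12):
    """Apply the eigenvalue rules for a quadratic function with Hessian eigenvalues `eigs`."""
    pos = any(lam > tol for lam in eigs)
    neg = any(lam < -tol for lam in eigs)
    zero = any(abs(lam) <= tol for lam in eigs)
    if pos and neg:
        return "saddle point"
    if pos:
        return "weak minimum or no stationary point" if zero else "strong minimum"
    if neg:
        return "weak maximum or no stationary point" if zero else "strong maximum"
    return "all eigenvalues zero (not covered by the rules above)"

print(classify(eig_sym_2x2(2, 2, 4)))    # both eigenvalues positive
print(classify(eig_sym_2x2(1, 1, 1)))    # eigenvalues 0 and 2
print(classify(eig_sym_2x2(-1, 0, 2)))   # eigenvalues -1 and 2
```

For larger matrices a general symmetric eigenvalue routine would replace `eig_sym_2x2`; the classification logic stays the same.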
Stationary point nature summary
x^T A x            λ_i           Definiteness of H    Nature of x*
> 0                all > 0       Positive d.          Minimum
sign depends on x  mixed signs   Indefinite           Saddlepoint
≤ 0                all ≤ 0       Negative semi-d.     Ridge
< 0                all < 0       Negative d.          Maximum
Steepest Descent
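$$F(\mathbf{x}) = x_1^2 + 2x_1 x_2 + 2x_2^2 + x_1$$

$$\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}, \qquad \alpha = 0.1$$

$$
\nabla F(\mathbf{x})
= \begin{bmatrix} \dfrac{\partial}{\partial x_1}F(\mathbf{x}) \\ \dfrac{\partial}{\partial x_2}F(\mathbf{x}) \end{bmatrix}
= \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix},
\qquad
\mathbf{g}_0 = \left.\nabla F(\mathbf{x})\right|_{\mathbf{x}=\mathbf{x}_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}
$$

[Plot: x_1 and x_2 axes from -2 to 2]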
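Continuing from the initial gradient above, the iteration x_{k+1} = x_k - α g_k can be run for a few steps. This is a minimal sketch using the gradient of the example function; the function names and the fixed step count are assumptions, and a practical implementation would add a stopping criterion.

```python
def gradient(x):
    """Gradient of F(x) = x1^2 + 2*x1*x2 + 2*x2^2 + x1."""
    x1, x2 = x
    return (2 * x1 + 2 * x2 + 1, 2 * x1 + 4 * x2)

def steepest_descent(x0, alpha, steps):
    """Return the list of iterates x_0, x_1, ..., x_steps."""
    x = tuple(x0)
    path = [x]
    for _ in range(steps):
        g = gradient(x)
        x = (x[0] - alpha * g[0], x[1] - alpha * g[1])
        path.append(x)
    return path

path = steepest_descent((0.5, 0.5), alpha=0.1, steps=300)
print(path[1])   # approximately (0.2, 0.2)
print(path[2])   # approximately (0.02, 0.08)
print(path[-1])  # approaches the minimum at (-1, 0.5)
```

The minimum at (-1, 0.5) follows from setting the gradient to zero, that is, solving 2 x1 + 2 x2 + 1 = 0 and 2 x1 + 4 x2 = 0.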
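$$\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}$$

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha \mathbf{g}_k = \mathbf{x}_k - \alpha(\mathbf{A}\mathbf{x}_k + \mathbf{d}) \quad\Longrightarrow\quad \mathbf{x}_{k+1} = [\mathbf{I} - \alpha\mathbf{A}]\,\mathbf{x}_k - \alpha\mathbf{d}$$

Stability is determined by the eigenvalues of the matrix $[\mathbf{I} - \alpha\mathbf{A}]$.

$$[\mathbf{I} - \alpha\mathbf{A}]\,\mathbf{z}_i = \mathbf{z}_i - \alpha\mathbf{A}\mathbf{z}_i = \mathbf{z}_i - \alpha\lambda_i \mathbf{z}_i = (1 - \alpha\lambda_i)\,\mathbf{z}_i$$

Stability Requirement:

$$|1 - \alpha\lambda_i| < 1 \quad\Longrightarrow\quad \alpha < \frac{2}{\lambda_i} \quad\Longrightarrow\quad \alpha < \frac{2}{\lambda_{max}}$$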
Example
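$$
\mathbf{A} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}, \qquad
\lambda_1 = 0.764,\ \mathbf{z}_1 = \begin{bmatrix} 0.851 \\ -0.526 \end{bmatrix}, \qquad
\lambda_2 = 5.24,\ \mathbf{z}_2 = \begin{bmatrix} 0.526 \\ 0.851 \end{bmatrix}
$$

$$\alpha < \frac{2}{\lambda_{max}} = \frac{2}{5.24} = 0.38$$

[Figure: two trajectory plots for x_1 and x_2 from -2 to 2, one with α = 0.37 and one with α = 0.39]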
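The stability bound can be checked by running the iteration just below and just above α = 2/λ_max. This sketch makes two assumptions that the example does not state: the linear term d = [1, 0] and the starting point x_0 = [0.5, 0.5], both borrowed from the earlier steepest descent example. All function names are hypothetical.

```python
import math

A = ((2.0, 2.0), (2.0, 4.0))
d = (1.0, 0.0)        # assumption: linear term from the earlier example
X_MIN = (-1.0, 0.5)   # solves A x + d = 0 for the values above

def eigenvalues_sym_2x2(M):
    """Eigenvalues of a symmetric 2x2 matrix, smallest first."""
    mean = (M[0][0] + M[1][1]) / 2
    radius = math.hypot((M[0][0] - M[1][1]) / 2, M[0][1])
    return (mean - radius, mean + radius)

def is_stable(alpha):
    """Stability requirement: |1 - alpha * lambda_i| < 1 for every eigenvalue of A."""
    return all(abs(1 - alpha * lam) < 1 for lam in eigenvalues_sym_2x2(A))

def distance_after(alpha, steps=200, x0=(0.5, 0.5)):
    """Distance from the minimum after `steps` iterations of x <- x - alpha * (A x + d)."""
    x = x0
    for _ in range(steps):
        g = (A[0][0] * x[0] + A[0][1] * x[1] + d[0],
             A[1][0] * x[0] + A[1][1] * x[1] + d[1])
        x = (x[0] - alpha * g[0], x[1] - alpha * g[1])
    return math.hypot(x[0] - X_MIN[0], x[1] - X_MIN[1])

print("bound:", 2 / max(eigenvalues_sym_2x2(A)))
for alpha in (0.37, 0.39):
    print(alpha, is_stable(alpha), distance_after(alpha))
```

With α = 0.37 the distance to the minimum shrinks toward zero, while with α = 0.39 it grows, because the factor |1 - α λ_max| exceeds 1 and the error component along z_2 is amplified at every step.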