Machine Learning
Chapter 5: Linear Discriminant Functions
Introduction
Linear Discriminant Functions and Decision
Surface
Linear Separability
Learning
Gradient Descent Algorithm
Newton’s Method
The general setting: one discriminant function per class,
  g_i : R^d → R  (1 ≤ i ≤ c)
Decide ω_i if g_i(x) > g_j(x) for all j ≠ i
Two-category case:
  g1(x) = w1^T x + w10
  g2(x) = w2^T x + w20
Define g(x) = g1(x) − g2(x):
  Decide ω1 if g(x) > 0
  Decide ω2 if g(x) < 0
[Figure: the discriminant drawn as a network with inputs x1, …, xd, weights w1, …, wd, and bias w0]
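The two-category rule above can be sketched as follows; the weight values here are illustrative, not taken from the slides:

```python
import numpy as np

def decide(x, w, w0):
    """Two-category rule: return 1 if g(x) = w^T x + w0 > 0, else 2."""
    g = np.dot(w, x) + w0
    return 1 if g > 0 else 2

# Hypothetical weights for a 2-D feature space.
w = np.array([1.0, -1.0])
w0 = 0.5

decide(np.array([2.0, 0.0]), w, w0)  # g = 2.5 > 0, so class 1
```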
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 7
Linear Discriminant Functions MIMA
Two-category case: Implementation
With an augmented constant input x0 = 1 carrying the bias weight w0,
  g(x) = w0 x0 + w1 x1 + … + wd xd = w^T x + w0
  Decide ω1 if g(x) > 0
  Decide ω2 if g(x) < 0
[Figure: the same network with the extra constant input x0 = 1 and weight w0]
Decision Surface
The decision surface is the set of points where g(x) = w^T x + w0 = 0.
Write any x as x = x_p + r · w/||w||, where x_p is the projection of x onto the surface; then g(x) = r ||w||, so the signed distance from x to the surface is r = g(x)/||w||. The distance from the origin to the surface is w0/||w||.
1. A linear discriminant function g(x) = w^T x + w0 divides the feature space by a hyperplane g(x) = 0.
2. The orientation of the surface is determined by the normal vector w.
3. The location of the surface is determined by the bias w0.
[Figure: the hyperplane g(x) = 0 in the (x1, x2) plane with normal vector w, separating region R1 (decide ω1, g(x) > 0) from region R2 (decide ω2, g(x) < 0)]
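The distance formula r = g(x)/||w|| can be checked numerically; the vectors below are made-up example values:

```python
import numpy as np

def signed_distance(x, w, w0):
    """Signed distance r = g(x) / ||w|| from x to the hyperplane g(x) = 0."""
    return (np.dot(w, x) + w0) / np.linalg.norm(w)

w = np.array([3.0, 4.0])   # ||w|| = 5
w0 = -5.0
x = np.array([3.0, 4.0])   # g(x) = 9 + 16 - 5 = 20, so r = 4

r = signed_distance(x, w, w0)
# The origin's signed distance is w0 / ||w|| = -1.
r0 = signed_distance(np.zeros(2), w, w0)
```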
Augmented Space
g(x) = w^T x + w0  (d-dimensional)
Let
  a = [a0, a1, …, ad]^T = [w0, w]^T
  y = [y0, y1, …, yd]^T = [1, x]^T   ((d+1)-dimensional)
Decision surface: a^T y = 0, which passes through the origin of the augmented space.
  Decide ω1 if a^T y > 0
  Decide ω2 if a^T y < 0
[Figure: the decision boundary in the original (x1, x2) space and the corresponding plane a^T y = 0 through the origin of the augmented (y0, y1, y2) space, separating R1 from R2]
Augmented Space
  a = [w0; w],  y = [1; x]
By using this mapping, the problem of finding the weight vector w and the threshold w0 is reduced to finding a single vector a.
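The augmentation mapping is a one-line operation; a quick sketch confirms that a^T y reproduces w^T x + w0 (example values are hypothetical):

```python
import numpy as np

def augment(x):
    """Map x in R^d to y = [1, x] in R^(d+1)."""
    return np.concatenate(([1.0], x))

def pack(w0, w):
    """Stack threshold and weights into a single vector a = [w0, w]."""
    return np.concatenate(([w0], w))

w, w0 = np.array([1.0, -1.0]), 0.5
x = np.array([2.0, 0.0])
a, y = pack(w0, w), augment(x)
# a^T y equals w^T x + w0, so the two formulations agree.
```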
Linear Separability
Two-Category Case
If there exists a vector a such that
  a^T y_i > 0 if y_i is labeled ω1
  a^T y_i < 0 if y_i is labeled ω2
then the samples are said to be Linearly Separable.
Separating plane: a1 y1 + a2 y2 + … + an yn = 0
Normalized case: replace every sample labeled ω2 by its negation −y_i; the samples are then linearly separable if there exists a vector a with a^T y_i > 0 for all i.
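The normalization trick can be sketched as follows; the sample values and candidate vectors are illustrative assumptions:

```python
import numpy as np

def normalize(ys, labels):
    """Negate every augmented sample labeled class 2, so separability
    becomes the single condition a^T y_i > 0 for all i."""
    return np.array([y if lab == 1 else -y for y, lab in zip(ys, labels)])

def separates(a, ys_norm):
    """Check whether a satisfies a^T y_i > 0 for every normalized sample."""
    return bool(np.all(ys_norm @ a > 0))

# Two augmented samples y = [1, x]: one from each class.
ys = np.array([[1.0, 2.0], [1.0, -2.0]])
norm = normalize(ys, [1, 2])          # second sample becomes [-1, 2]
separates(np.array([0.0, 1.0]), norm)  # this a separates the two samples
```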
Solution Region in Weight Space
Function optimization.
[Figure: the solution region for the vector (a1, a2) in weight space]
Taylor Expansion
  f(x + Δx) = f(x) + ∇f(x)^T Δx + O(Δx^T Δx)
where f : R^d → R is a real-valued d-variate function and O(Δx^T Δx) denotes the big-O order of Δx^T Δx.
What happens if we set Δx to be negatively proportional to the gradient at x, i.e., Δx = −η ∇f(x) with η > 0? Then f(x + Δx) ≈ f(x) − η ||∇f(x)||², so the function value decreases.
Basic strategy
To minimize some function f(·), the general gradient descent techniques work in the following iterative way:
  x_{k+1} = x_k − η_k ∇_x f(x_k)
where ∇_x f = [∂f/∂x1, ∂f/∂x2, …, ∂f/∂xd]^T.
The differential df = (∇_x f)^T dx is:
  steepest ascent if dx ∝ ∇_x f
  0 if dx ⊥ ∇_x f
  steepest descent if dx ∝ −∇_x f
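The iterative strategy above can be sketched directly; the objective, learning rate, and step count are illustrative choices:

```python
import numpy as np

def gradient_descent(grad, x0, eta=0.1, steps=100):
    """Iterate x_{k+1} = x_k - eta * grad(x_k) for a fixed number of steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad(x)
    return x

# Minimize f(x) = (x1 - 1)^2 + (x2 + 2)^2, whose gradient is 2(x - [1, -2]).
x_min = gradient_descent(lambda x: 2 * (x - np.array([1.0, -2.0])), [0.0, 0.0])
```

With a fixed step size η the iterates contract toward the minimizer by a factor of (1 − 2η) per step on this quadratic, so convergence is geometric.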
Gradient Descent Algorithm
[Figure: contours of f(x) over (x1, x2); from the start point x0 the negative gradient −∇f points toward the minimizer x* inside its basin of attraction]
Paraboloid
  f(x) = c + a^T x + (1/2) x^T Q x
We can find the global minimum of a paraboloid by setting its gradient to zero:
  ∇_x f(x) = a + Q x = 0
  x* = −Q^{-1} a
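The closed-form minimum x* = −Q^{-1} a can be computed with a linear solve rather than an explicit inverse; the matrix Q and vector a below are made-up example values:

```python
import numpy as np

# f(x) = c + a^T x + (1/2) x^T Q x, with Q symmetric positive definite.
Q = np.array([[2.0, 0.0],
              [0.0, 4.0]])
a = np.array([-2.0, 4.0])

# Gradient a + Q x = 0 gives x* = -Q^{-1} a; solve() avoids forming Q^{-1}.
x_star = -np.linalg.solve(Q, a)

# The gradient vanishes at x*, confirming it is the stationary point.
grad_at_min = a + Q @ x_star
```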
Newton's Method
Taylor Series Expansion (second order):
  f(x_k + Δx) ≈ f(x_k) + g_k^T Δx + (1/2) Δx^T Q_k Δx
where
  g_k = ∇f(x)|_{x = x_k}
and Q_k is the Hessian matrix
  Q_k = ∂²f / (∂x ∂x^T)|_{x = x_k}, with entries ∂²f / (∂x_i ∂x_j), i, j = 1, …, d.
Setting ∇_{Δx} f = g_k + Q_k Δx = 0 gives the update
  x_{k+1} = x_k − Q_k^{-1} g_k
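A single Newton step can be sketched as below; the quadratic objective is an illustrative choice, picked because Newton's method reaches its minimum in exactly one step:

```python
import numpy as np

def newton_step(x, grad, hessian):
    """One Newton update: x_{k+1} = x_k - Q_k^{-1} g_k (via a linear solve)."""
    return x - np.linalg.solve(hessian(x), grad(x))

# For f(x) = (1/2) x^T Q x + a^T x the gradient is Q x + a and the Hessian is Q.
Q = np.array([[2.0, 0.0],
              [0.0, 4.0]])
a = np.array([-2.0, 4.0])

x1 = newton_step(np.array([5.0, 5.0]),
                 grad=lambda x: Q @ x + a,
                 hessian=lambda x: Q)
# x1 coincides with the exact minimizer -Q^{-1} a.
```

On non-quadratic objectives the second-order expansion is only approximate, so the step is iterated until convergence.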
Summary
Discriminant functions
Linear Discriminant Functions and Decision Surface
  The general setting: one function for each class
  The two-category case
Minimization of a criterion/objective function
Linear Separability
Learning
  Gradient descent: x_k = x_{k−1} − η ∇f(x_{k−1})
  Newton's method: x_{k+1} = x_k − Q_k^{-1} g_k
Any questions?