Support Vector Machine: Dr. Shaifu Gupta
▪ Lots of possible solutions for w, b. Any of these would be fine, but which is best?
Margin
[Figure: input x is fed to the classifier f(x, w, b), producing the estimate y_est; the two marker types denote the +1 and -1 classes]
f(x, w, b) = sign(w · x + b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Margin
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called the Linear SVM.
Support Vectors
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the maximum margin (the Linear SVM). Support vectors are the datapoints that the margin pushes up against.
[Figure: the decision boundary w · x + b = 0, flanked by the plus-plane w · x + b = +1 ("Predict Class = +1" zone) and the minus-plane w · x + b = -1 ("Predict Class = -1" zone); M = margin width]
• Plus-plane = { x : w . x + b = +1 }
• Minus-plane = { x : w . x + b = -1 }
Claim: The vector w is perpendicular to the plus-plane. Why? For any two points u, v on the plus-plane, w · u + b = w · v + b = 1, so w · (u - v) = 0: w is orthogonal to every vector lying in the plane.
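With the plane definitions above, the margin width M follows by projecting a point on the minus-plane onto the plus-plane along w (a standard derivation, written out here):

\[
x^{+} = x^{-} + \lambda w, \qquad
w \cdot x^{+} + b = 1, \quad w \cdot x^{-} + b = -1
\;\Rightarrow\; \lambda \lVert w \rVert^{2} = 2,
\]
\[
M = \lVert x^{+} - x^{-} \rVert = \lambda \lVert w \rVert = \frac{2}{\lVert w \rVert}.
\]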
Functional and Geometric Margin
● Scaling the weights and bias, w -> 2w and b -> 2b, leaves the classifier sign(w · x + b) unchanged but inflates the apparent margin, so a scale-invariant notion of margin is needed.
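A sketch of the two margin notions, assuming the usual convention y_i ∈ {-1, +1}:

\[
\hat{\gamma}_i = y_i \,(w \cdot x_i + b) \quad \text{(functional margin)}, \qquad
\gamma_i = \frac{\hat{\gamma}_i}{\lVert w \rVert} \quad \text{(geometric margin)}.
\]

Replacing (w, b) by (2w, 2b) doubles every functional margin but leaves every geometric margin, and the decision boundary, unchanged.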
Lagrange multipliers
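A minimal sketch of the method of Lagrange multipliers for an equality-constrained problem (standard notation, assumed here):

\[
\min_{w} \; f(w) \;\;\text{s.t.}\;\; h_i(w) = 0, \qquad
\mathcal{L}(w, \beta) = f(w) + \sum_{i} \beta_i \, h_i(w),
\]

where the β_i are the Lagrange multipliers; candidate solutions satisfy ∂L/∂w = 0 and ∂L/∂β_i = 0.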
Generalized Lagrangian and the primal problem
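A sketch of the generalized Lagrangian for a problem with inequality constraints g_i(w) ≤ 0 and equality constraints h_i(w) = 0, together with the primal value:

\[
\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_i \alpha_i \, g_i(w) + \sum_i \beta_i \, h_i(w), \qquad
\theta_{\mathcal{P}}(w) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta),
\]

so θ_P(w) equals f(w) when w is feasible and +∞ otherwise, and minimizing θ_P over w recovers the original problem.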
Primal and dual forms of optimization
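The two forms are related by weak duality; a sketch in the notation above:

\[
d^{*} = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \; \min_{w} \; \mathcal{L}(w, \alpha, \beta)
\;\le\;
\min_{w} \; \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \; \mathcal{L}(w, \alpha, \beta) = p^{*},
\]

with equality (strong duality) under conditions such as Slater's, in which case the KKT conditions characterize the optimum.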
Optimal margin solution for SVM (primal)
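A sketch of the optimal (hard) margin primal problem, with the functional margin normalized to 1:

\[
\min_{w, b} \; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad y_i \,(w \cdot x_i + b) \ge 1, \;\; i = 1, \dots, m.
\]

Maximizing the geometric margin 1/||w|| is equivalent to minimizing ||w||²/2, which gives a convex quadratic objective.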
Optimal margin solution for SVM (continued)
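Continuing the derivation, the Lagrangian of this primal and its stationarity conditions (standard form, assumed here):

\[
\mathcal{L}(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^{2} - \sum_i \alpha_i \bigl[\, y_i \,(w \cdot x_i + b) - 1 \,\bigr],
\]
\[
\nabla_w \mathcal{L} = 0 \;\Rightarrow\; w = \sum_i \alpha_i \, y_i \, x_i, \qquad
\frac{\partial \mathcal{L}}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i \, y_i = 0.
\]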
Dual Optimization Problem Obtained
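Substituting w = Σ α_i y_i x_i back into the Lagrangian yields the dual this title refers to:

\[
\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i, j} y_i \, y_j \, \alpha_i \, \alpha_j \, \langle x_i, x_j \rangle
\quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_i \alpha_i \, y_i = 0.
\]

Only the support vectors end up with α_i > 0, and the data enter only through inner products, which is what later makes kernels possible.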
Soft Margin Classification
• If the training data is not linearly separable, slack variables ξi can be added to allow misclassification of difficult or noisy examples.
• Allow some errors.
• Still try to minimize training-set errors and to place the hyperplane "far" from each class (large margin).
Formulation
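A sketch of the soft-margin formulation described above, with C > 0 trading margin size against total slack:

\[
\min_{w, b, \xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_i \xi_i
\quad \text{s.t.} \quad y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0.
\]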
Kernels
Mapping attributes (d-dimensional) to features (D-dimensional)
Basis function expansion
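An illustrative basis function expansion (this particular map is an example, not necessarily the one on the slide): a 2-dimensional attribute vector x = (x1, x2) expanded to six features,

\[
\phi(x) = \bigl(1, \; x_1, \; x_2, \; x_1^{2}, \; x_2^{2}, \; x_1 x_2\bigr),
\]

so a linear classifier in φ-space is a quadratic classifier in the original attribute space.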
SVM formulation with kernels
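Since the dual uses the data only through ⟨x_i, x_j⟩, replacing that inner product with K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ gives the kernelized classifier (standard form, assumed here):

\[
f(x) = \operatorname{sign}\Bigl( \sum_i \alpha_i \, y_i \, K(x_i, x) + b \Bigr),
\]

which never computes φ(x) explicitly (the kernel trick).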
Polynomial Kernel
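The standard polynomial kernel, with the degree-2, c = 0 case expanded to show the implicit feature map (the slide's exact parameters are not recoverable, so this is illustrative):

\[
K(x, z) = (x^{\top} z + c)^{d}; \qquad
(x^{\top} z)^{2} = \sum_{i, j} (x_i x_j)(z_i z_j) = \langle \phi(x), \phi(z) \rangle,
\quad \phi(x) = (x_i x_j)_{i, j = 1}^{n}.
\]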
Gaussian Kernel / Radial Basis Function
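The Gaussian (RBF) kernel with bandwidth σ (standard form):

\[
K(x, z) = \exp\!\left( -\frac{\lVert x - z \rVert^{2}}{2\sigma^{2}} \right),
\]

whose implicit feature space is infinite-dimensional; K is near 1 for nearby points and decays toward 0 as they move apart.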
Combining kernels
● Sum of kernels: if K1 and K2 are valid kernels, K(x, z) = K1(x, z) + K2(x, z) is a valid kernel.
● Scaling of a kernel: K(x, z) = c K1(x, z) is a valid kernel for any c > 0.
● Product of kernels: K(x, z) = K1(x, z) K2(x, z) is a valid kernel.
Mercer’s Theorem
● For K to be a valid (Mercer) kernel, it is necessary and sufficient that for any finite set {x(1), . . . , x(m)} (m < ∞), the corresponding kernel matrix is symmetric positive semi-definite.
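A minimal Python sketch of Mercer's condition in practice: build a Gaussian kernel matrix on a few random points and check that it is symmetric positive semi-definite (the function and variable names are illustrative, not from the slides):

    import numpy as np

    def gaussian_kernel_matrix(X, sigma=1.0):
        # Pairwise squared Euclidean distances, then the RBF kernel.
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq_dists / (2 * sigma ** 2))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 3))        # 10 points in 3 dimensions
    K = gaussian_kernel_matrix(X)

    # Mercer's condition: K must be symmetric PSD for any finite point set.
    assert np.allclose(K, K.T)
    eigvals = np.linalg.eigvalsh(K)
    print("smallest eigenvalue:", eigvals.min())  # >= 0 up to numerical error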
Multiclass classification
● One vs Rest (also called One vs All): train one binary SVM per class, separating that class from all the others.
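A minimal one-vs-rest sketch, assuming scikit-learn is available (the dataset and parameters are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)   # 3 classes

    # One binary SVM per class, each separating that class from the rest.
    clf = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma="scale"))
    clf.fit(X, y)
    print(clf.predict(X[:5]))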