
Support Vector Machine

Dr. Shaifu Gupta


shaifu.gupta@iitjammu.ac.in
Classifiers

x → f → y_est, with f(x, w, b) = sign(w · x − b)

[Figure: a 2-D dataset; one marker denotes +1, the other denotes −1.]

How would you classify this data?

(Slides copyright © 2001, 2003, Andrew W. Moore.)
Classifiers

f(x, w, b) = sign(w · x − b)

Any of these boundaries would be fine... but which is best?
▪ There are lots of possible solutions for w and b.
Margin

f(x, w, b) = sign(w · x − b)

Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a datapoint.
Margin

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, the linear SVM.
Support vectors

Support vectors are the datapoints that the margin pushes up against. The maximum margin linear classifier is the linear classifier with the maximum margin (the linear SVM).

Specifying a line and margin

[Figure: the classifier boundary w · x + b = 0, with the plus-plane bounding the "Predict Class = +1" zone and the minus-plane bounding the "Predict Class = −1" zone; M = margin width between the two planes.]

• Plus-plane  = { x : w · x + b = +1 }
• Minus-plane = { x : w · x + b = −1 }

Claim: the vector w is perpendicular to the plus-plane. Why?
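Verifying the claim also yields the margin width; a short derivation in the notation above (this reasoning follows Moore's original slides):

For any two points $u, v$ on the plus-plane, $w \cdot u + b = w \cdot v + b = 1$, hence $w \cdot (u - v) = 0$: $w$ is orthogonal to every direction lying in the plane. Now take $x^-$ on the minus-plane and let $x^+ = x^- + \lambda w$ be the closest point on the plus-plane. Then

$$w \cdot (x^- + \lambda w) + b = 1 \;\Rightarrow\; \lambda = \frac{2}{\|w\|^2}, \qquad M = \|x^+ - x^-\| = \lambda \|w\| = \frac{2}{\|w\|}.$$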


 
Specifying a line and margin

[Figure: the same diagram, labeling the classifier boundary, the plus-plane, and the minus-plane.]

How do we represent this mathematically?
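One standard way to write it, consistent with the plus- and minus-planes above: require every training point to lie on the correct side of its plane,

$$w \cdot x_i + b \ge +1 \ \text{ for } y_i = +1, \qquad w \cdot x_i + b \le -1 \ \text{ for } y_i = -1,$$

or, compactly, $y_i (w \cdot x_i + b) \ge 1$ for all $i$.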


Functional margin and Geometric margin
● Functional margin

A large functional margin represents a confident and correct prediction.

But…
Functional and Geometric margin
● Scaling the weights and bias,

w → 2w
b → 2b

has NO effect on the actual prediction result, but it scales the functional margin by the same factor. By rescaling, the functional margin can be made arbitrarily large without anything meaningful changing.

● Geometric margin (refer to Andrew Ng notes)
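In the notation of Ng's notes (cited above), the two margins of a training example $(x^{(i)}, y^{(i)})$ are

$$\hat{\gamma}^{(i)} = y^{(i)} \big( w \cdot x^{(i)} + b \big) \ \ \text{(functional)}, \qquad \gamma^{(i)} = \frac{\hat{\gamma}^{(i)}}{\|w\|} \ \ \text{(geometric)},$$

and the geometric margin is invariant to the rescaling $(w, b) \to (2w, 2b)$.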
Optimization problem
This optimization problem can be solved using commercial quadratic
programming (QP) code
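Concretely, fixing the functional margin to 1 and maximizing the geometric margin $1/\|w\|$ gives the standard primal:

$$\min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y^{(i)} \big( w \cdot x^{(i)} + b \big) \ge 1, \quad i = 1, \dots, m,$$

a convex quadratic objective with linear constraints, which is exactly the form QP solvers accept.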
Lagrange duality

● A technique for solving constrained optimization problems using Lagrange multipliers.

Generalized Lagrangian (primal)
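In Ng's notation, for a problem $\min_w f(w)$ subject to $g_i(w) \le 0$ and $h_i(w) = 0$, the generalized Lagrangian is

$$\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w), \qquad \alpha_i \ge 0,$$

and the primal optimal value is $p^* = \min_w \max_{\alpha \ge 0,\, \beta} \mathcal{L}(w, \alpha, \beta)$.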
Primal and dual forms of optimization

● An optimization problem can be viewed from either of two perspectives:
  ○ the primal problem
  ○ the dual problem

● The Lagrangian dual problem is obtained by
  ○ forming the Lagrangian of the minimization problem, using nonnegative Lagrange multipliers to add the constraints to the objective function, and
  ○ minimizing the Lagrangian over the primal variables; the resulting dual function is then maximized over the multipliers (made concrete below).
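Concretely, the dual swaps the order of min and max relative to the primal:

$$d^* = \max_{\alpha \ge 0,\, \beta} \min_w \mathcal{L}(w, \alpha, \beta) \;\le\; \min_w \max_{\alpha \ge 0,\, \beta} \mathcal{L}(w, \alpha, \beta) = p^*,$$

where the inequality (weak duality) always holds; the next slide addresses when it is tight.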
Lagrange dual problem

● Under certain conditions we have d* = p* (the solution of the dual form equals that of the primal form), so we can solve the dual problem in lieu of the primal problem.

● In that case there exist w*, α*, β* such that w* is the solution to the primal problem, α* and β* are the solution to the dual problem, and moreover p* = d* = L(w*, α*, β*).

● Moreover, if some w*, α*, and β* satisfy the Karush-Kuhn-Tucker (KKT) conditions, then they are a solution to the primal and dual problems.
Karush-Kuhn-Tucker (KKT) conditions

KKT dual complementarity condition:
 
Optimal margin solution for SVM (primal)

Dual Optimization Problem Obtained
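Carrying out the Lagrangian steps above on the primal yields the standard dual (in Ng's notation):

$$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle \quad \text{s.t.} \quad \alpha_i \ge 0, \ \ \sum_{i=1}^{m} \alpha_i y^{(i)} = 0,$$

with $w = \sum_i \alpha_i y^{(i)} x^{(i)}$ recovered from the optimal $\alpha$. As a minimal sketch of handing this dual to generic QP code (the open-source cvxopt package and the helper name are illustrative, not from the deck):

import numpy as np
from cvxopt import matrix, solvers

def svm_dual_fit(X, y):
    """Hard-margin linear SVM trained by passing the dual to a QP solver.
    X: (n, d) array of inputs; y: (n,) array of +/-1 labels.
    Note: the hard-margin QP is only feasible for separable data."""
    X, y = X.astype(float), y.astype(float)
    n = X.shape[0]
    K = X @ X.T                             # Gram matrix <x_i, x_j>
    P = matrix(np.outer(y, y) * K)          # quadratic term y_i y_j <x_i, x_j>
    q = matrix(-np.ones(n))                 # maximizing sum(alpha) = minimizing -sum(alpha)
    G = matrix(-np.eye(n))                  # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1))            # equality constraint: sum_i alpha_i y_i = 0
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
    w = (alpha * y) @ X                     # w = sum_i alpha_i y_i x_i
    sv = alpha > 1e-6                       # support vectors: alpha_i > 0
    b0 = float(np.mean(y[sv] - X[sv] @ w))  # offset recovered from the support vectors
    return w, b0, alpha

The returned alpha makes KKT complementarity visible: it is (numerically) zero everywhere except at the support vectors.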
Soft Margin Classification

• If the training data is not linearly separable, slack variables ξᵢ can be added to allow misclassification of difficult or noisy examples.
• Allow some errors.
• Still try to minimize training-set errors and to place the hyperplane "far" from each class (large margin).

Formulation
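A standard statement of the soft-margin formulation, consistent with the bullets above:

$$\min_{w, b, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y^{(i)} \big( w \cdot x^{(i)} + b \big) \ge 1 - \xi_i, \quad \xi_i \ge 0,$$

where the hyperparameter C trades off margin width against training errors; in the dual, this simply caps the multipliers at $0 \le \alpha_i \le C$.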
Kernels

● Mapping attributes (d-dimensional) to features (D-dimensional)
● Basis function expansion
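Written out (standard notation, assumed here): a feature mapping $\phi : \mathbb{R}^d \to \mathbb{R}^D$ replaces the raw attributes $x$ by features $\phi(x)$, and a kernel is the feature-space inner product

$$K(x, z) = \langle \phi(x), \phi(z) \rangle.$$

For example, with $d = 2$, the quadratic basis expansion $\phi(x) = (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2)$ maps 2 attributes to $D = 6$ features.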
SVM formulation with kernels
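Because $x$ enters the dual only through inner products, the kernelized SVM replaces $\langle x^{(i)}, x^{(j)} \rangle$ with $K(x^{(i)}, x^{(j)})$ and classifies with

$$f(x) = \operatorname{sign}\Big( \sum_i \alpha_i y^{(i)} K(x^{(i)}, x) + b \Big),$$

so the (possibly very high-dimensional) features $\phi(x)$ never need to be computed explicitly: the "kernel trick".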

 
Polynomial Kernel
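One common form (constant $c \ge 0$ and degree $p$, as usually defined):

$$K(x, z) = (x \cdot z + c)^p.$$

For $d = 2$, $p = 2$, $c = 1$, expanding $(x \cdot z + 1)^2$ shows it equals $\langle \phi(x), \phi(z) \rangle$ for a quadratic feature map like the one above (with suitable $\sqrt{2}$ scalings), computed in $O(d)$ time rather than $O(d^2)$.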

 
Gaussian Kernel / Radial basis function
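The usual definition (bandwidth $\sigma$ assumed):

$$K(x, z) = \exp\!\left( - \frac{\|x - z\|^2}{2 \sigma^2} \right),$$

whose implicit feature space is infinite-dimensional. A minimal NumPy sketch for computing the kernel matrix (the helper name is illustrative):

import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian/RBF kernel matrix: K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))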
Combining kernels
● Sum of kernels
● Scaling of kernel
● Product of kernels
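These bullets correspond to standard closure properties: if $K_1$ and $K_2$ are valid kernels and $c > 0$, then

$$K_1(x,z) + K_2(x,z), \qquad c\,K_1(x,z), \qquad K_1(x,z)\,K_2(x,z)$$

are all valid kernels, since each operation preserves the symmetry and positive semi-definiteness of the kernel matrix (cf. Mercer's theorem below).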
Mercer’s Theorem
● For K to be a valid (Mercer) kernel, it is necessary and sufficient that for any finite set {x⁽¹⁾, . . . , x⁽ᵐ⁾} (m < ∞), the corresponding kernel matrix is symmetric positive semi-definite.
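This finite-sample condition can be checked numerically; a small illustrative helper (the function name and tolerance are assumptions):

import numpy as np

def is_mercer_on_sample(K, tol=1e-8):
    """Check the finite-sample Mercer condition:
    the kernel matrix must be symmetric positive semi-definite."""
    symmetric = np.allclose(K, K.T)
    psd = np.linalg.eigvalsh(K).min() >= -tol  # eigvalsh assumes symmetry
    return symmetric and psd

For instance, is_mercer_on_sample(rbf_kernel(X, X)) should return True for the Gaussian kernel above on any sample X.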
Multiclass classification
● One vs Rest (also called One vs All)
● One vs One
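Both strategies are available off the shelf; a minimal scikit-learn sketch (the dataset choice and parameters are illustrative, not from the slides):

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One vs Rest: fit one binary SVM per class (class k vs. everything else)
ovr = OneVsRestClassifier(SVC(kernel="rbf", C=1.0)).fit(X, y)

# One vs One: fit one binary SVM per pair of classes, predict by voting
ovo = OneVsOneClassifier(SVC(kernel="rbf", C=1.0)).fit(X, y)

print(ovr.predict(X[:3]), ovo.predict(X[:3]))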
