Support Vector Machine: Dr. Shaifu Gupta
▪ Lots of possible solutions for w, b. Any of these would be fine, but which is best?
Margin
[Figure: input x is fed to the classifier f(x, w, b), producing the estimate y_est; the two marker types denote the +1 and -1 classes]
f(x, w, b) = sign(w · x + b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Margin
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called the Linear SVM.
Support Vectors
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the maximum margin (the Linear SVM). Support vectors are the datapoints that the margin pushes up against.
[Figure: the decision boundary w · x + b = 0, flanked by the plus-plane w · x + b = +1 ("Predict Class = +1" zone) and the minus-plane w · x + b = -1 ("Predict Class = -1" zone); M = margin width]
• Plus-plane = { x : w . x + b = +1 }
• Minus-plane = { x : w . x + b = -1 }
Claim: The vector w is perpendicular to the plus-plane. Why? For any two points u, v on the plus-plane, w · u + b = w · v + b = 1, so w · (u - v) = 0: w is orthogonal to every vector lying in the plane.
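With the plane definitions above, the margin width M follows by projecting a point on the minus-plane onto the plus-plane along w (a standard derivation, written out here):

\[
x^{+} = x^{-} + \lambda w, \qquad
w \cdot x^{+} + b = 1, \quad w \cdot x^{-} + b = -1
\;\Rightarrow\; \lambda \lVert w \rVert^{2} = 2,
\]
\[
M = \lVert x^{+} - x^{-} \rVert = \lambda \lVert w \rVert = \frac{2}{\lVert w \rVert}.
\]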
Functional and Geometric Margin
● Scaling the weights and bias, w -> 2w and b -> 2b, leaves the classifier sign(w · x + b) unchanged but inflates the apparent margin, so a scale-invariant notion of margin is needed.
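A sketch of the two margin notions, assuming the usual convention y_i ∈ {-1, +1}:

\[
\hat{\gamma}_i = y_i \,(w \cdot x_i + b) \quad \text{(functional margin)}, \qquad
\gamma_i = \frac{\hat{\gamma}_i}{\lVert w \rVert} \quad \text{(geometric margin)}.
\]

Replacing (w, b) by (2w, 2b) doubles every functional margin but leaves every geometric margin, and the decision boundary, unchanged.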
Lagrange multipliers
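A minimal sketch of the method of Lagrange multipliers for an equality-constrained problem (standard notation, assumed here):

\[
\min_{w} \; f(w) \;\;\text{s.t.}\;\; h_i(w) = 0, \qquad
\mathcal{L}(w, \beta) = f(w) + \sum_{i} \beta_i \, h_i(w),
\]

where the β_i are the Lagrange multipliers; candidate solutions satisfy ∂L/∂w = 0 and ∂L/∂β_i = 0.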
Generalized Lagrangian and the primal problem
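A sketch of the generalized Lagrangian for a problem with inequality constraints g_i(w) ≤ 0 and equality constraints h_i(w) = 0, together with the primal value:

\[
\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_i \alpha_i \, g_i(w) + \sum_i \beta_i \, h_i(w), \qquad
\theta_{\mathcal{P}}(w) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta),
\]

so θ_P(w) equals f(w) when w is feasible and +∞ otherwise, and minimizing θ_P over w recovers the original problem.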
Primal and dual forms of optimization
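The two forms are related by weak duality; a sketch in the notation above:

\[
d^{*} = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \; \min_{w} \; \mathcal{L}(w, \alpha, \beta)
\;\le\;
\min_{w} \; \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \; \mathcal{L}(w, \alpha, \beta) = p^{*},
\]

with equality (strong duality) under conditions such as Slater's, in which case the KKT conditions characterize the optimum.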
Optimal margin solution for SVM (primal)
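A sketch of the optimal (hard) margin primal problem, with the functional margin normalized to 1:

\[
\min_{w, b} \; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad y_i \,(w \cdot x_i + b) \ge 1, \;\; i = 1, \dots, m.
\]

Maximizing the geometric margin 1/||w|| is equivalent to minimizing ||w||²/2, which gives a convex quadratic objective.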
Optimal margin solution for SVM (continued)
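Continuing the derivation, the Lagrangian of this primal and its stationarity conditions (standard form, assumed here):

\[
\mathcal{L}(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^{2} - \sum_i \alpha_i \bigl[\, y_i \,(w \cdot x_i + b) - 1 \,\bigr],
\]
\[
\nabla_w \mathcal{L} = 0 \;\Rightarrow\; w = \sum_i \alpha_i \, y_i \, x_i, \qquad
\frac{\partial \mathcal{L}}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i \, y_i = 0.
\]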
Dual Optimization Problem Obtained
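Substituting w = Σ α_i y_i x_i back into the Lagrangian yields the dual this title refers to:

\[
\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i, j} y_i \, y_j \, \alpha_i \, \alpha_j \, \langle x_i, x_j \rangle
\quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_i \alpha_i \, y_i = 0.
\]

Only the support vectors end up with α_i > 0, and the data enter only through inner products, which is what later makes kernels possible.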
Soft Margin Classification
• If the training data is not linearly separable, slack variables ξi can be added to allow misclassification of difficult or noisy examples.
• Allow some errors.
• Still try to minimize training-set errors and to place the hyperplane "far" from each class (large margin).
Formulation
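A sketch of the soft-margin formulation described above, with C > 0 trading margin size against total slack:

\[
\min_{w, b, \xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_i \xi_i
\quad \text{s.t.} \quad y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0.
\]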
Kernels
Mapping attributes (d-dimensional) to features (D-dimensional)
Basis function expansion
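An illustrative basis function expansion (this particular map is an example, not necessarily the one on the slide): a 2-dimensional attribute vector x = (x1, x2) expanded to six features,

\[
\phi(x) = \bigl(1, \; x_1, \; x_2, \; x_1^{2}, \; x_2^{2}, \; x_1 x_2\bigr),
\]

so a linear classifier in φ-space is a quadratic classifier in the original attribute space.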
SVM formulation with kernels
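Since the dual uses the data only through ⟨x_i, x_j⟩, replacing that inner product with K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ gives the kernelized classifier (standard form, assumed here):

\[
f(x) = \operatorname{sign}\Bigl( \sum_i \alpha_i \, y_i \, K(x_i, x) + b \Bigr),
\]

which never computes φ(x) explicitly (the kernel trick).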
Polynomial Kernel
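The standard polynomial kernel, with the degree-2, c = 0 case expanded to show the implicit feature map (the slide's exact parameters are not recoverable, so this is illustrative):

\[
K(x, z) = (x^{\top} z + c)^{d}; \qquad
(x^{\top} z)^{2} = \sum_{i, j} (x_i x_j)(z_i z_j) = \langle \phi(x), \phi(z) \rangle,
\quad \phi(x) = (x_i x_j)_{i, j = 1}^{n}.
\]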
Gaussian Kernel / Radial Basis Function
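The Gaussian (RBF) kernel with bandwidth σ (standard form):

\[
K(x, z) = \exp\!\left( -\frac{\lVert x - z \rVert^{2}}{2\sigma^{2}} \right),
\]

whose implicit feature space is infinite-dimensional; K is near 1 for nearby points and decays toward 0 as they move apart.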
Combining kernels
● Sum of kernels: if K1 and K2 are valid kernels, K(x, z) = K1(x, z) + K2(x, z) is a valid kernel.
● Scaling of a kernel: K(x, z) = c K1(x, z) is a valid kernel for any c > 0.
● Product of kernels: K(x, z) = K1(x, z) K2(x, z) is a valid kernel.
Mercer’s Theorem
● For K to be a valid (Mercer) kernel, it is necessary and sufficient that for any finite set {x(1), . . . , x(m)} (m < ∞), the corresponding kernel matrix is symmetric positive semi-definite.
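A minimal Python sketch of Mercer's condition in practice: build a Gaussian kernel matrix on a few random points and check that it is symmetric positive semi-definite (the function and variable names are illustrative, not from the slides):

    import numpy as np

    def gaussian_kernel_matrix(X, sigma=1.0):
        # Pairwise squared Euclidean distances, then the RBF kernel.
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq_dists / (2 * sigma ** 2))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 3))        # 10 points in 3 dimensions
    K = gaussian_kernel_matrix(X)

    # Mercer's condition: K must be symmetric PSD for any finite point set.
    assert np.allclose(K, K.T)
    eigvals = np.linalg.eigvalsh(K)
    print("smallest eigenvalue:", eigvals.min())  # >= 0 up to numerical error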
Multiclass classification
● One vs Rest (also called One vs All): train one binary SVM per class, separating that class from all the others.
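A minimal one-vs-rest sketch, assuming scikit-learn is available (the dataset and parameters are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)   # 3 classes

    # One binary SVM per class, each separating that class from the rest.
    clf = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma="scale"))
    clf.fit(X, y)
    print(clf.predict(X[:5]))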