
Support Vector Machines
Optimization objective
Machine Learning
Alternative view of logistic regression

$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

If $y = 1$, we want $h_\theta(x) \approx 1$, i.e. $\theta^T x \gg 0$
If $y = 0$, we want $h_\theta(x) \approx 0$, i.e. $\theta^T x \ll 0$
Alternative view of logistic regression

Cost of a single example:
$-\left( y \log h_\theta(x) + (1 - y)\log(1 - h_\theta(x)) \right)$, with $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

If $y = 1$ (want $\theta^T x \gg 0$): the cost is $-\log \frac{1}{1 + e^{-\theta^T x}}$
If $y = 0$ (want $\theta^T x \ll 0$): the cost is $-\log\left(1 - \frac{1}{1 + e^{-\theta^T x}}\right)$

The SVM replaces each of these curves with a close piecewise-linear approximation, written $\text{cost}_1(\theta^T x)$ and $\text{cost}_0(\theta^T x)$: flat at zero past the margin, linear elsewhere. This close approximation gives the SVM a computational advantage.
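As a minimal sketch (not from the slides), the two piecewise-linear costs can be written directly in Octave; cost1 penalizes z below 1, cost0 penalizes z above -1:

cost1 = @(z) max(0, 1 - z);   % zero once z >= 1, linear below that
cost0 = @(z) max(0, 1 + z);   % zero once z <= -1, linear above that
% e.g. cost1(2) = 0, cost1(0) = 1, cost0(-2) = 0, cost0(0) = 1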
Support vector machine

Logistic regression:
$\min_\theta \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \left(-\log h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \left(-\log(1 - h_\theta(x^{(i)}))\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

Removing the constant factor $\frac{1}{m}$ does not change the optimal value of $\theta$.

Support vector machine:
$\min_\theta C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$

Example: write the objective as $A + \lambda B$ (logistic regression) versus $C A + B$ (SVM, with $C$ playing the role of $\frac{1}{\lambda}$). In logistic regression a larger $\lambda$ puts more weight on the second term $B$ (regularization); in the SVM a larger $C$ puts more weight on the first term $A$ (data fit).
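A minimal Octave sketch of this objective (assumptions: X holds one example per row with a leading column of ones, y is a 0/1 vector, and the hinge costs are as above):

function J = svmObjective(theta, X, y, C)
  % Data term weighted by C, plus (1/2)*||theta||^2,
  % excluding the intercept theta(1) from regularization.
  z = X * theta;
  J = C * sum(y .* max(0, 1 - z) + (1 - y) .* max(0, 1 + z)) ...
      + 0.5 * sum(theta(2:end).^2);
end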
SVM hypothesis

$\min_\theta C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$

Hypothesis:
$h_\theta(x) = 1$ if $\theta^T x \ge 0$, and $0$ otherwise
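Unlike logistic regression, the SVM hypothesis outputs a hard label rather than a probability; a one-line sketch:

predict = @(theta, X) double(X * theta >= 0);   % 1 if theta'*x >= 0, else 0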
Support Vector Machines
Large Margin Intuition
Machine Learning
Support Vector Machine

[Plots of $\text{cost}_1(z)$ and $\text{cost}_0(z)$ against $z = \theta^T x$, flat at zero beyond $z = 1$ and $z = -1$ respectively.]

If $y = 1$, we want $\theta^T x \ge 1$ (not just $\ge 0$)
If $y = 0$, we want $\theta^T x \le -1$ (not just $< 0$)
SVM Decision Boundary

Optimization objective of the SVM:
$\min_\theta C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$

Whenever $y^{(i)} = 1$: we want $\theta^T x^{(i)} \ge 1$
Whenever $y^{(i)} = 0$: we want $\theta^T x^{(i)} \le -1$

Suppose we set $C$ to a very large value. To minimize the objective we must drive the first (boxed) term to zero, i.e. make every $\text{cost}_1(\cdot)$ and $\text{cost}_0(\cdot)$ equal to zero, which is achieved by choosing $\theta$ so that $\theta^T x^{(i)} \ge 1$ whenever $y^{(i)} = 1$ and $\theta^T x^{(i)} \le -1$ whenever $y^{(i)} = 0$. The problem then reduces to:

$\min_\theta \frac{1}{2} \sum_{j=1}^{n} \theta_j^2 \quad \text{s.t.} \quad \theta^T x^{(i)} \ge 1 \text{ if } y^{(i)} = 1, \;\; \theta^T x^{(i)} \le -1 \text{ if } y^{(i)} = 0$
SVM Decision Boundary: Linearly separable case

[Plot in the $(x_1, x_2)$ plane: linearly separable data with several candidate separating lines; the SVM picks the one with the largest minimum distance to the training examples.]

The SVM is therefore called a large margin classifier.
Large margin classifier in presence of outliers

[Plot in the $(x_1, x_2)$ plane: a single outlier pulls the large-margin boundary far from where it would otherwise lie.]

This sensitivity to outliers is the disadvantage of a pure large margin classifier: if $C$ is very large, the boundary swings to accommodate individual outliers, while with a more moderate $C$ it stays essentially unchanged.
Support Vector Machines
The mathematics behind large margin classification (optional)
Machine Learning
Vector Inner Product

$u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \quad v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \quad \|u\| = \sqrt{u_1^2 + u_2^2}$

$u^T v = u_1 v_1 + u_2 v_2 = p \cdot \|u\|$, where $p$ is the signed length of the projection of $v$ onto $u$.

Projecting $u$ onto $v$ instead gives the same result, since $u^T v = v^T u$.
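A quick numeric check of the identity (illustrative values, not from the slides):

u = [4; 2];  v = [1; 3];
p = (u' * v) / norm(u);   % signed length of the projection of v onto u
u' * v                    % = 10
p * norm(u)               % = 10, the same value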
SVM Decision Boundary

Optimization objective of the SVM (with $C$ large):
$\min_\theta \frac{1}{2} \sum_{j=1}^{n} \theta_j^2 \quad \text{s.t.} \quad \theta^T x^{(i)} \ge 1 \text{ if } y^{(i)} = 1, \;\; \theta^T x^{(i)} \le -1 \text{ if } y^{(i)} = 0$

Since $\theta^T x^{(i)} = p^{(i)} \, \|\theta\|$, where $p^{(i)}$ is the projection of $x^{(i)}$ onto $\theta$, the constraints read $p^{(i)} \|\theta\| \ge 1$ (or $\le -1$). If the projections $p^{(i)}$ are small, $\|\theta\|$ must be large to satisfy them; minimizing $\frac{1}{2}\|\theta\|^2$ therefore favors boundaries with large projections $p^{(i)}$, i.e. a large margin.
SVM Decision Boundary

If we set $\theta_0 = 0$, we are restricting ourselves to decision boundaries that pass through the origin; the lecture makes this simplification, and the large-margin argument carries over when $\theta_0 \ne 0$.
Support Vector Machines
Kernels I
Machine Learning
Non-linear Decision Boundary

[Plot in the $(x_1, x_2)$ plane: positive and negative examples separated by a closed, non-linear boundary.]

Predict $y = 1$ if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2 + \theta_4 x_1^2 + \theta_5 x_2^2 + \dots \ge 0$

Is there a different / better choice of the features than these high-order polynomial terms?
Kernel

Given $x$, compute new features depending on proximity to landmarks $l^{(1)}, l^{(2)}, l^{(3)}$:

$f_1 = \text{similarity}(x, l^{(1)}), \quad f_2 = \text{similarity}(x, l^{(2)}), \quad f_3 = \text{similarity}(x, l^{(3)})$

[Plot: the three landmarks $l^{(1)}, l^{(2)}, l^{(3)}$ marked as points in the $(x_1, x_2)$ plane.]
Kernels and Similarity

$f_1 = \text{similarity}(x, l^{(1)}) = \exp\left( -\frac{\|x - l^{(1)}\|^2}{2\sigma^2} \right)$

If $x \approx l^{(1)}$: $f_1 \approx \exp(0) = 1$.
If $x$ is far from $l^{(1)}$: $f_1 \approx \exp(-\text{large number}) \approx 0$.

This similarity function is called a Gaussian kernel.
Example:

$l^{(1)} = \begin{bmatrix} 3 \\ 5 \end{bmatrix}, \quad f_1 = \exp\left( -\frac{\|x - l^{(1)}\|^2}{2\sigma^2} \right)$

$f_1$ reaches its maximum value of 1 exactly when $x_1 = 3$ and $x_2 = 5$, i.e. when $x = l^{(1)}$; a larger $\sigma^2$ makes $f_1$ fall off more slowly with distance from the landmark.
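A minimal sketch of the Gaussian similarity as an Octave function (sigma is an assumed free parameter):

function f = gaussianKernel(x, l, sigma)
  % f = 1 when x equals the landmark l, decaying toward 0 with distance
  f = exp(-sum((x - l).^2) / (2 * sigma^2));
end

% gaussianKernel([3; 5], [3; 5], 1)  -> 1
% gaussianKernel([4; 6], [3; 5], 1)  -> exp(-1), about 0.37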
Example: $\theta_0 = -0.5$, $\theta_1 = 1$, $\theta_2 = 1$, $\theta_3 = 0$. Predict $y = 1$ if $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$.

If $x$ is close to $l^{(1)}$ and far away from $l^{(2)}$ and $l^{(3)}$: $f_1 \approx 1$, $f_2 \approx f_3 \approx 0$, so the score is $-0.5 + 1 \cdot 1 = 0.5 \ge 0$, and we predict $y = 1$.

For points near $l^{(1)}$ or $l^{(2)}$ (since $\theta_1 = \theta_2 = 1$) we predict $y = 1$; otherwise we predict $y = 0$.

[Plot in the $(x_1, x_2)$ plane: the resulting non-linear decision boundary encloses the regions around $l^{(1)}$ and $l^{(2)}$.]
Support Vector Machines
Kernels II
Machine Learning
Choosing the landmarks

Predict $y = 1$ if $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$. Where do we get $l^{(1)}, l^{(2)}, l^{(3)}, \dots$?

Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$, choose
$l^{(1)} = x^{(1)}, \; l^{(2)} = x^{(2)}, \dots, l^{(m)} = x^{(m)}$:
the landmarks are chosen to be exactly the training examples.
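A sketch of the resulting feature mapping, reusing the gaussianKernel function above (the name kernelFeatures is mine, not the course's):

function f = kernelFeatures(x, landmarks, sigma)
  % landmarks holds one training example per row; f(1) = 1 is the
  % intercept feature f0, and f(i+1) measures proximity to l(i).
  m = size(landmarks, 1);
  f = ones(m + 1, 1);
  for i = 1:m
    f(i + 1) = gaussianKernel(x, landmarks(i, :)', sigma);
  end
end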
SVM with Kernels

Given $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$, choose $l^{(1)} = x^{(1)}, \dots, l^{(m)} = x^{(m)}$.

Given an example $x$, compute
$f_1 = \text{similarity}(x, l^{(1)}), \; f_2 = \text{similarity}(x, l^{(2)}), \dots$
and collect them (with $f_0 = 1$) into the feature vector $f \in \mathbb{R}^{m+1}$.

For a training example $(x^{(i)}, y^{(i)})$: $f^{(i)}_j = \text{similarity}(x^{(i)}, l^{(j)})$, giving $f^{(i)} \in \mathbb{R}^{m+1}$.
SVM with Kernels

Hypothesis: given $x$, compute features $f \in \mathbb{R}^{m+1}$. Predict "$y = 1$" if $\theta^T f \ge 0$.

Training:
$\min_\theta C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T f^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T f^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{m} \theta_j^2$
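Putting the pieces together, prediction with already-learned parameters Theta (a sketch; Theta is assumed to come from one of the packages discussed below):

f = kernelFeatures(x, Xtrain, sigma);   % kernel features for a new example x
prediction = double(Theta' * f >= 0);   % predict y = 1 if theta'*f >= 0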
SVM parameters:

$C$ ($= \frac{1}{\lambda}$).
Large $C$: lower bias, high variance.
Small $C$: higher bias, low variance.

$\sigma^2$.
Large $\sigma^2$: features $f_i$ vary more smoothly. Higher bias, lower variance.
Small $\sigma^2$: features $f_i$ vary less smoothly. Lower bias, higher variance.
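In practice $C$ and $\sigma^2$ are usually picked by cross-validation. A hedged sketch, where svmTrain and svmPredict stand in for whatever training and prediction routines your package provides:

bestErr = Inf;
for C = [0.01 0.03 0.1 0.3 1 3 10 30]
  for sigma = [0.01 0.03 0.1 0.3 1 3 10 30]
    model = svmTrain(Xtrain, ytrain, C, @(x1, x2) gaussianKernel(x1, x2, sigma));
    err = mean(double(svmPredict(model, Xval) ~= yval));   % validation error
    if err < bestErr
      bestErr = err;  bestC = C;  bestSigma = sigma;
    end
  end
end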
Support Vector Machines
Using an SVM
Machine Learning
Use an SVM software package (e.g. liblinear, libsvm, …) to solve for the parameters $\theta$.
Need to specify:
Choice of parameter $C$.
Choice of kernel (similarity function):
E.g. no kernel ("linear kernel"): predict "$y = 1$" if $\theta^T x \ge 0$.
Gaussian kernel: $f_i = \exp\left( -\frac{\|x - l^{(i)}\|^2}{2\sigma^2} \right)$, where $l^{(i)} = x^{(i)}$. Need to choose $\sigma^2$.
Kernel (similarity) functions:

function f = kernel(x1, x2)
  % Gaussian similarity between feature vectors x1 and x2;
  % sigma is assumed to be set here (or passed in instead).
  sigma = 1.0;
  f = exp(-sum((x1 - x2).^2) / (2 * sigma^2));
end

Note: do perform feature scaling before using the Gaussian kernel.
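Scaling matters because $\|x - l\|^2$ is otherwise dominated by whichever features have the largest ranges. A standard zero-mean, unit-variance rescaling (sketch):

mu = mean(X);        % per-feature means (X: one example per row)
s = std(X);          % per-feature standard deviations
X = (X - mu) ./ s;   % standardize each feature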
Other choices of kernel

Note: not all similarity functions make valid kernels. (They need to satisfy a technical condition called "Mercer's Theorem" to make sure SVM packages' optimizations run correctly and do not diverge.)

Many off-the-shelf kernels are available:
- Polynomial kernel: $k(x, l) = (x^T l + \text{constant})^{\text{degree}}$
- More esoteric: string kernel, chi-square kernel, histogram intersection kernel, …
Multi-class classification

Many SVM packages already have built-in multi-class classification functionality.
Otherwise, use the one-vs.-all method: train $K$ SVMs, one to distinguish $y = i$ from the rest (for $i = 1, 2, \dots, K$), obtaining $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(K)}$. Pick the class $i$ with the largest $(\theta^{(i)})^T x$.
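A sketch of one-vs.-all with a linear kernel, where trainLinearSvm is a hypothetical stand-in for your package's binary trainer:

K = max(y);                    % classes labeled 1..K
Theta = zeros(size(X, 2), K);
for i = 1:K
  Theta(:, i) = trainLinearSvm(X, double(y == i));   % hypothetical trainer
end
[~, prediction] = max(X * Theta, [], 2);   % class with largest theta'*x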
Logistic regression vs. SVMs

$n$ = number of features ($x \in \mathbb{R}^{n+1}$), $m$ = number of training examples.

If $n$ is large (relative to $m$):
use logistic regression, or SVM without a kernel ("linear kernel").
If $n$ is small and $m$ is intermediate:
use SVM with a Gaussian kernel.
If $n$ is small and $m$ is large:
create/add more features, then use logistic regression or SVM without a kernel.

A neural network is likely to work well for most of these settings, but may be slower to train.
