Support Vector Machines: Optimization Objective
Machine Learning (Andrew Ng)
Alternative view of logistic regression
For logistic regression, $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.
If $y = 1$, we want $h_\theta(x) \approx 1$, i.e. $\theta^T x \gg 0$.
If $y = 0$, we want $h_\theta(x) \approx 0$, i.e. $\theta^T x \ll 0$.
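A quick numerical check of this saturation behaviour, as a minimal Octave sketch (the sigmoid handle below is illustrative, not from the slides):

    sigmoid = @(z) 1 ./ (1 + exp(-z));   % h_theta(x) with z = theta' * x
    disp(sigmoid(10))    % ~1.0000: large positive z, what we want when y = 1
    disp(sigmoid(-10))   % ~0.0000: large negative z, what we want when y = 0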
Alternative view of logistic regression
Cost of a single example:
$-\bigl(y \log h_\theta(x) + (1 - y)\log(1 - h_\theta(x))\bigr)$

If $y = 1$ (want $\theta^T x \gg 0$): the cost is $-\log\frac{1}{1 + e^{-z}}$, with $z = \theta^T x$.
If $y = 0$ (want $\theta^T x \ll 0$): the cost is $-\log\left(1 - \frac{1}{1 + e^{-z}}\right)$.

The SVM replaces each of these curves with a close piecewise-linear approximation: $\text{cost}_1(z)$ for $y = 1$ and $\text{cost}_0(z)$ for $y = 0$. These flat-then-linear costs give the SVM a computational advantage.
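The slides draw $\text{cost}_1$ and $\text{cost}_0$ rather than writing formulas; one common piecewise-linear choice matching the drawings is the hinge function, sketched here in Octave (the exact slope is not pinned down in the lecture):

    cost1 = @(z) max(0, 1 - z);   % zero once z >= 1, linear below
    cost0 = @(z) max(0, 1 + z);   % zero once z <= -1, linear above
    disp(cost1(2))    % 0: confidently correct for y = 1, no penalty
    disp(cost1(-1))   % 2: wrong side of the margin, linear penalty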
Support vector machine

Logistic regression:
$\min_\theta \frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\left(-\log h_\theta(x^{(i)})\right) + (1 - y^{(i)})\left(-\log(1 - h_\theta(x^{(i)}))\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$

Support vector machine:
$\min_\theta C \sum_{i=1}^m \left[y^{(i)}\,\text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\,\text{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^n \theta_j^2$
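As a minimal sketch, this Octave function evaluates the SVM objective above for a given theta, assuming the hinge versions of cost_1/cost_0 from the earlier sketch (X, y, C are assumed inputs; this is not the course's own code):

    function J = svm_cost(theta, X, y, C)
      % X: m x (n+1) design matrix (first column all ones), y: m x 1 in {0,1}
      z = X * theta;                        % margins theta' * x(i)
      cost1 = max(0, 1 - z);                % applies to y = 1 examples
      cost0 = max(0, 1 + z);                % applies to y = 0 examples
      J = C * sum(y .* cost1 + (1 - y) .* cost0) ...
          + 0.5 * sum(theta(2:end) .^ 2);   % theta_0 is not regularized
    end

Note the two conventions that differ from logistic regression: the 1/m factor is dropped, and the regularization trade-off is controlled by C on the data term rather than lambda on the penalty term.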
SVM hypothesis

Hypothesis: unlike logistic regression, the SVM outputs a label directly rather than a probability:
$h_\theta(x) = \begin{cases} 1 & \text{if } \theta^T x \ge 0 \\ 0 & \text{otherwise} \end{cases}$
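In Octave the hypothesis is a one-line threshold (a sketch; predict is an illustrative name):

    predict = @(X, theta) double(X * theta >= 0);   % 1 where theta' * x >= 0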
Support Vector Machines: Large Margin Intuition
Support Vector Machine

[Figure: plots of $\text{cost}_1(z)$ and $\text{cost}_0(z)$; $\text{cost}_1(z) = 0$ for $z \ge 1$, and $\text{cost}_0(z) = 0$ for $z \le -1$.]

If $y = 1$, we want $\theta^T x \ge 1$ (not just $\ge 0$).
If $y = 0$, we want $\theta^T x \le -1$ (not just $< 0$).
SVM Decision Boundary

$\min_\theta C \sum_{i=1}^m \left[y^{(i)}\,\text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\,\text{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^n \theta_j^2$

If $C$ is very large, the first term must be driven to zero:
Whenever $y^{(i)} = 1$: we need $\theta^T x^{(i)} \ge 1$ (so $\text{cost}_1 = 0$).
Whenever $y^{(i)} = 0$: we need $\theta^T x^{(i)} \le -1$ (so $\text{cost}_0 = 0$).

The optimization objective of the SVM then simplifies to
$\min_\theta \frac{1}{2}\sum_{j=1}^n \theta_j^2$
subject to $\theta^T x^{(i)} \ge 1$ if $y^{(i)} = 1$, and $\theta^T x^{(i)} \le -1$ if $y^{(i)} = 0$.
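A tiny Octave sketch of the feasibility check implied by these constraints (X, y, theta are assumed inputs; illustrative only):

    % true only if theta satisfies every margin constraint
    feasible = all(X(y == 1, :) * theta >=  1) && ...
               all(X(y == 0, :) * theta <= -1);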
SVM Decision Boundary: Linearly separable case

[Figure: linearly separable data in the $(x_1, x_2)$ plane. Many lines separate the classes; the SVM picks the one with the largest margin, hence the name "large margin classifier".]

Large margin classifier disadvantage: with $C$ very large, the boundary is sensitive to outliers, and a single outlier can swing it dramatically. With a more moderate $C$, the SVM tolerates the outlier and keeps the sensible boundary. [Figure: the same data with one outlier added.]
Support Vector Machines: The Mathematics Behind Large Margin Classification (optional)
Vector Inner Product

For $u, v \in \mathbb{R}^2$: $u^T v = u_1 v_1 + u_2 v_2 = p \cdot \|u\|$, where $p$ is the signed length of the projection of $v$ onto $u$ and $\|u\| = \sqrt{u_1^2 + u_2^2}$.
Projecting $u$ onto $v$ will also give the same result, since $u^T v = v^T u$.
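A numeric check of the equivalent computations, as a small Octave sketch (the vectors are made-up examples):

    u = [2; 1];  v = [1; 3];
    disp(u' * v)                          % 5: direct inner product
    p = (u' * v) / norm(u);               % signed projection of v onto u
    disp(p * norm(u))                     % 5: same value via projection
    disp(((v' * u) / norm(v)) * norm(v))  % 5: projecting u onto v instead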
SVM Decision Boundary

Optimization objective of the SVM (large $C$; for simplicity take $\theta_0 = 0$ and $n = 2$):
$\min_\theta \frac{1}{2}\sum_{j=1}^n \theta_j^2 = \frac{1}{2}\|\theta\|^2$
subject to $\theta^T x^{(i)} \ge 1$ if $y^{(i)} = 1$, and $\theta^T x^{(i)} \le -1$ if $y^{(i)} = 0$.

By the inner-product view, $\theta^T x^{(i)} = p^{(i)} \cdot \|\theta\|$, where $p^{(i)}$ is the projection of $x^{(i)}$ onto $\theta$. The constraints become $p^{(i)} \cdot \|\theta\| \ge 1$ (or $\le -1$): to keep $\|\theta\|$ small, the projections $p^{(i)}$ must be large, which is exactly a large margin.
SVM Decision Boundary

If we set $\theta_0 = 0$, we allow only decision boundaries that pass through the origin; the same large-margin reasoning carries over when $\theta_0 \ne 0$.
Support Vector Machines: Kernels I
Non-linear Decision Boundary

[Figure: data in the $(x_1, x_2)$ plane that no straight line separates.]

One option is high-order polynomial features: predict $y = 1$ if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2 + \theta_4 x_1^2 + \theta_5 x_2^2 + \dots \ge 0$. Writing this as $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \dots$, is there a different or better choice of the features $f_1, f_2, f_3, \dots$ than polynomials?
Kernels and Similarity

Given $x$, compute new features based on proximity to landmarks $l^{(1)}, l^{(2)}, l^{(3)}$:
$f_i = \text{similarity}(x, l^{(i)}) = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$
This similarity function is the Gaussian kernel. If $x \approx l^{(i)}$, then $f_i \approx 1$; if $x$ is far from $l^{(i)}$, then $f_i \approx 0$.
Example: with landmark $l^{(1)} = \begin{bmatrix} 3 \\ 5 \end{bmatrix}$, the feature $f_1$ is maximal, i.e. $f_1 = 1$, when $x_1 = 3$ and $x_2 = 5$ (that is, when $x = l^{(1)}$); it falls toward 0 as $x$ moves away, and $\sigma^2$ controls how quickly.
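Evaluating the example numerically in Octave (sigma^2 = 1 is an assumed value; the slide's exact sigma is not recorded here):

    l1 = [3; 5];  sigma = 1;
    f1 = @(x) exp(-sum((x - l1) .^ 2) / (2 * sigma ^ 2));
    disp(f1([3; 5]))     % 1: x sits exactly on the landmark
    disp(f1([6; 9]))     % ~3.7e-6: far from the landmark, feature is ~0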
With $\theta_0 = -0.5$, $\theta_1 = \theta_2 = 1$, and $\theta_3 = 0$, we predict $y = 1$ for points near $l^{(1)}$ or $l^{(2)}$ (there $f_1$ or $f_2$ is close to 1, so $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$); otherwise we predict $y = 0$. [Figure: the resulting non-linear decision boundary encloses $l^{(1)}$ and $l^{(2)}$ in the $(x_1, x_2)$ plane.]
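The same decision rule as a short Octave sketch (l2, l3, and the query point are made-up values; theta matches the example above):

    sim = @(x, l, s) exp(-sum((x - l) .^ 2) / (2 * s ^ 2));
    l1 = [3; 5];  l2 = [0; 1];  l3 = [2; -2];      % l2, l3 are illustrative
    x  = [3.2; 4.8];                               % a point near l1
    f  = [1; sim(x, l1, 1); sim(x, l2, 1); sim(x, l3, 1)];
    theta = [-0.5; 1; 1; 0];
    disp(theta' * f >= 0)                          % 1: predict y = 1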
Support Vector Machines: Kernels II
Choosing the landmarks

Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$, put a landmark at every training example: $l^{(1)} = x^{(1)}, l^{(2)} = x^{(2)}, \dots, l^{(m)} = x^{(m)}$.
SVM with Kernels

Given $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$, choose $l^{(1)} = x^{(1)}, \dots, l^{(m)} = x^{(m)}$.

Given an example $x$: compute $f_1 = \text{similarity}(x, l^{(1)})$, ..., $f_m = \text{similarity}(x, l^{(m)})$ and collect them (with $f_0 = 1$) into a feature vector $f \in \mathbb{R}^{m+1}$. For a training example $(x^{(i)}, y^{(i)})$, the corresponding vector is $f^{(i)}$, with $f^{(i)}_j = \text{similarity}(x^{(i)}, l^{(j)})$.
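A sketch of this feature construction in Octave (gaussian_features is an illustrative name, not course code):

    function F = gaussian_features(X, L, sigma)
      % X: m x n inputs, L: k x n landmarks; F(i,j) = similarity(x(i), l(j))
      m = size(X, 1);  k = size(L, 1);
      F = zeros(m, k);
      for i = 1:m
        for j = 1:k
          F(i, j) = exp(-sum((X(i, :) - L(j, :)) .^ 2) / (2 * sigma ^ 2));
        end
      end
    end

With landmarks at the training examples the call is F = gaussian_features(X, X, sigma), giving an m x m feature matrix; prepend a column of ones for $f_0$.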
SVM with Kernels

Hypothesis: Given $x$, compute the features $f \in \mathbb{R}^{m+1}$. Predict "$y = 1$" if $\theta^T f \ge 0$.

Training:
$\min_\theta C \sum_{i=1}^m \left[y^{(i)}\,\text{cost}_1(\theta^T f^{(i)}) + (1 - y^{(i)})\,\text{cost}_0(\theta^T f^{(i)})\right] + \frac{1}{2}\sum_{j=1}^m \theta_j^2$
This is the earlier objective with $f^{(i)}$ in place of $x^{(i)}$; note there are now $m$ parameters besides $\theta_0$, one per landmark.
SVM parameters:

$C$ ($= \frac{1}{\lambda}$).
Large $C$: lower bias, high variance (like a small $\lambda$, weak regularization).
Small $C$: higher bias, low variance (like a large $\lambda$, strong regularization).

$\sigma^2$ (for the Gaussian kernel).
Large $\sigma^2$: features $f_i$ vary more smoothly; higher bias, lower variance.
Small $\sigma^2$: features $f_i$ vary less smoothly; lower bias, higher variance.
Support Vector Machines: Using an SVM
Use an SVM software package (e.g. liblinear, libsvm, ...) to solve for the parameters $\theta$.

Need to specify:
Choice of parameter $C$.
Choice of kernel (similarity function), e.g.:
No kernel ("linear kernel"): predict "$y = 1$" if $\theta^T x \ge 0$.
Gaussian kernel: $f_i = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$, where $l^{(i)} = x^{(i)}$. Need to choose $\sigma^2$.
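As a sketch of what "use a package" looks like in practice, assuming the libsvm Octave/MATLAB interface (svmtrain/svmpredict) is installed; X is m x n, y is m x 1:

    sigma = 0.5;                  % assumed value; tune on a cross-validation set
    gamma = 1 / (2 * sigma ^ 2);  % libsvm's RBF kernel is exp(-gamma*||u-v||^2)
    opts  = sprintf('-s 0 -t 2 -c %g -g %g', 1, gamma);  % C-SVC, Gaussian kernel
    model = svmtrain(y, X, opts);
    preds = svmpredict(y, X, model);   % predicted labels for the inputs

Note that libsvm parameterizes the Gaussian kernel with gamma rather than sigma^2, hence the conversion above.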
Kernel (similarity) functions:

function f = kernel(x1, x2)
  % Gaussian similarity between feature vectors x1 and x2
  sigma = 1.0;                                     % kernel bandwidth
  f = exp(-sum((x1 - x2) .^ 2) / (2 * sigma ^ 2));
end

Note: perform feature scaling before using the Gaussian kernel, so that no single feature dominates the distance $\|x_1 - x_2\|^2$.
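Quick usage check of the function above (the input values are illustrative):

    disp(kernel([1; 2], [1; 2]))   % 1: identical inputs are maximally similar
    disp(kernel([1; 2], [9; 9]))   % ~0: distant inputs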
Other choices of kernel

Note: not all similarity functions make valid kernels. They need to satisfy a technical condition called Mercer's Theorem so that SVM packages' optimizations run correctly and do not diverge. Off-the-shelf examples include the polynomial kernel, string kernel, chi-square kernel, and histogram intersection kernel.