Slide 9 - SVM
Support Vector Machines
Optimization objective
Machine Learning
Alternative view of logistic regression
If y = 1, we want hθ(x) ≈ 1 (θᵀx ≫ 0).
If y = 0, we want hθ(x) ≈ 0 (θᵀx ≪ 0).
Andrew'Ng'
Alternative view of logistic regression
Cost of example:
If y = 1 (want θᵀx ≫ 0): cost1(θᵀx)
If y = 0 (want θᵀx ≪ 0): cost0(θᵀx)
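The two per-example cost curves on this slide can be sketched in code. A minimal sketch, assuming the slide's handwritten curves are the standard piecewise-linear (hinge-style) surrogates cost1 and cost0:

```python
def cost1(z):
    """Cost when y = 1: zero once z = theta'x >= 1, linear penalty below that."""
    return max(0.0, 1.0 - z)

def cost0(z):
    """Cost when y = 0: zero once z = theta'x <= -1, linear penalty above that."""
    return max(0.0, 1.0 + z)

# The SVM pays nothing for confidently correct examples:
print(cost1(2.0), cost0(-2.0))   # 0.0 0.0
# But an example sitting at the decision boundary (z = 0) is penalized:
print(cost1(0.0))                # 1.0
```

This is what makes the flat-then-linear shape different from the smooth logistic cost: beyond the margin the example contributes exactly zero.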
We can simply drop the 1/m terms, and this gives the same optimal value of θ, because 1/m is just a constant.
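A quick sanity check of that point: scaling an objective by a positive constant such as 1/m leaves its argmin unchanged. A toy grid-search sketch (the quadratic objective and m = 10 are made up for illustration):

```python
def argmin(f, grid):
    """Return the grid point where f is smallest."""
    return min(grid, key=f)

def J(theta):
    # Illustrative objective with its minimum at theta = 3.
    return (theta - 3.0) ** 2 + 1.0

m = 10
grid = [i * 0.1 for i in range(-100, 101)]

best_plain = argmin(J, grid)                 # minimize J(theta)
best_scaled = argmin(lambda t: J(t) / m, grid)  # minimize (1/m) * J(theta)
print(best_plain == best_scaled)             # same minimizer either way
```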
Support vector machine:
Support Vector Machines
Large Margin Intuition
Machine Learning
Support Vector Machine
If C is large:
SVM Decision Boundary
Whenever y(i) = 1: θᵀx(i) ≥ 1
Whenever y(i) = 0: θᵀx(i) ≤ −1
The SVM is a large margin classifier.
Large margin classifier in presence of outliers
Support Vector Machines
Kernels I
Machine Learning
Non-linear Decision Boundary
Is there a different / better choice of the features f1, f2, f3, …?
Kernel
Given x, compute new features depending on proximity to landmarks l(1), l(2), l(3).
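The proximity idea can be sketched with the Gaussian similarity used later in these slides: the feature is near 1 when x is close to the landmark and near 0 when it is far away. The landmark and query points below are illustrative.

```python
import math

def gaussian_similarity(x, l, sigma=1.0):
    """f = exp(-||x - l||^2 / (2*sigma^2)); sigma=1.0 is an assumed default."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, l))
    return math.exp(-sq_dist / (2 * sigma ** 2))

landmark = [3.0, 5.0]
print(gaussian_similarity([3.0, 5.0], landmark))      # 1.0: x is on the landmark
print(gaussian_similarity([100.0, 100.0], landmark))  # ~0: x is far from it
```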
Support Vector Machines
Kernels II
Machine Learning
Choosing the landmarks
Given x:
f1 = similarity(x, l(1)), f2 = similarity(x, l(2)), f3 = similarity(x, l(3))
Predict y = 1 if θ0 + θ1 f1 + θ2 f2 + θ3 f3 ≥ 0.
Where to get l(1), l(2), l(3), …?
SVM with Kernels
Given (x(1), y(1)), (x(2), y(2)), …, (x(m), y(m)),
choose l(1) = x(1), l(2) = x(2), …, l(m) = x(m).
Given example x:
f1 = similarity(x, l(1)), f2 = similarity(x, l(2)), …
For training example (x(i), y(i)): f(i) = [f0(i) = 1, f1(i), …, fm(i)]
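The "landmarks = training examples" mapping above can be sketched directly: with m training examples, every x is re-represented as an m-dimensional feature vector (plus the intercept feature f0 = 1). The tiny dataset below is made up for illustration.

```python
import math

def similarity(x, l, sigma=1.0):
    """Gaussian similarity; sigma=1.0 is an assumed default."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, l)) / (2 * sigma ** 2))

def features(x, landmarks, sigma=1.0):
    """Map x to [f0 = 1, f1, ..., fm] using one landmark per training example."""
    return [1.0] + [similarity(x, l, sigma) for l in landmarks]

X = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]   # m = 3 training examples
f = features(X[0], X)                      # feature vector for x = x(1)
print(len(f))   # 4: f0 plus one feature per landmark
print(f[1])     # 1.0: x(1) is its own landmark
```

Note the consequence mentioned in the lecture: the number of features grows with the number of training examples.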
SVM with Kernels
Hypothesis: Given x, compute features f ∈ ℝ^(m+1).
Predict "y = 1" if θᵀf ≥ 0.
Training:
min over θ of  C Σ(i=1..m) [ y(i) cost1(θᵀf(i)) + (1 − y(i)) cost0(θᵀf(i)) ] + (1/2) Σ(j=1..m) θj²
Solve this minimization problem to get the parameters θ of your SVM. In practice, use off-the-shelf software packages that people have developed to minimize this cost function; those packages already embody the necessary numerical optimization tricks.
SVM parameters:
C ( = 1/λ ).  Large C: Lower bias, high variance.
              Small C: Higher bias, low variance.
σ².  Large σ²: Features fi vary more smoothly. Higher bias, lower variance.
     Small σ²: Features fi vary less smoothly. Lower bias, higher variance.
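The σ² effect on smoothness can be seen numerically: for the same pair of points, a small σ makes the similarity feature fall off sharply with distance, while a large σ keeps it high. Points and σ values below are illustrative.

```python
import math

def similarity(x, l, sigma):
    """Gaussian similarity between x and landmark l."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, l)) / (2 * sigma ** 2))

x, l = [0.0], [2.0]                      # two points at distance 2
f_small = similarity(x, l, sigma=0.5)    # sharp feature: nearly 0 already
f_large = similarity(x, l, sigma=2.0)    # smooth feature: still well above 0
print(f_small < f_large)                 # True
```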
Support Vector Machines
Using an SVM
Machine Learning
Use SVM software package (e.g. liblinear, libsvm, …) to solve for parameters θ.
Need to specify:
Choice of parameter C.
Choice of kernel (similarity function):
E.g. No kernel ("linear kernel"): predict "y = 1" if θᵀx ≥ 0.
Gaussian kernel:
fi = exp(−‖x − l(i)‖² / (2σ²)), where l(i) = x(i).
Need to choose σ².
Kernel (similarity) functions:

function f = kernel(x1, x2)
  % Gaussian similarity between vectors x1 and x2.
  % sigma2 (= sigma^2) must be set; the value here is an illustrative assumption.
  sigma2 = 1;
  f = exp(-sum((x1 - x2) .^ 2) / (2 * sigma2));
return
Note: Do perform feature scaling before using the Gaussian kernel.
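Why that note matters: without scaling, a large-range feature dominates ‖x − l‖² and the small-range features barely affect the similarity at all. A sketch with made-up house data (size in square feet vs. number of bedrooms):

```python
def sq_dist(x, l):
    """Squared Euclidean distance, the quantity inside the Gaussian kernel."""
    return sum((a - b) ** 2 for a, b in zip(x, l))

x = [2000.0, 3.0]   # [size in sq ft, bedrooms]; illustrative values
l = [1000.0, 5.0]

d = sq_dist(x, l)
size_part = (2000.0 - 1000.0) ** 2   # contribution of the size feature alone
print(size_part / d)   # ~0.999996: the distance is almost entirely the size term
```

After scaling both features to comparable ranges, each one contributes meaningfully to the kernel value.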
Other choices of kernel
Note: Not all similarity functions similarity(x, l) make valid kernels.
(Need to satisfy technical condition called "Mercer's Theorem" to make sure SVM packages' optimizations run correctly, and do not diverge.)
Many off-the-shelf kernels available:
- Polynomial kernel: k(x, l) = (θᵀl substituted as xᵀl + constant)^degree
- More esoteric: String kernel, chi-square kernel, histogram intersection kernel, …
Multi-class classification
Many SVM packages already have built-in multi-class classification functionality.
Otherwise, use one-vs.-all method: train K SVMs, one to distinguish y = i from the rest, for i = 1, 2, …, K; get θ(1), θ(2), …, θ(K).
Pick class i with largest (θ(i))ᵀx.
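The one-vs.-all prediction step can be sketched as follows; the K = 3 parameter vectors below are made up for illustration:

```python
def predict(thetas, x):
    """Return the index i of the class with the largest (theta(i))' x."""
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    return max(range(len(scores)), key=lambda i: scores[i])

thetas = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # one theta per class
# Scores for x = [2.0, 0.5] are 2.0, 0.5, -2.5, so class 0 wins.
print(predict(thetas, [2.0, 0.5]))   # 0
```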
Logistic regression vs. SVMs
n = number of features (x ∈ ℝ^(n+1)), m = number of training examples.
If n is large (relative to m):
Use logistic regression, or SVM without a kernel ("linear kernel").
If n is small, m is intermediate:
Use SVM with Gaussian kernel.
If n is small, m is large:
Create/add more features, then use logistic regression or SVM without a kernel.
A neural network is likely to work well for most of these settings, but may be slower to train.
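The decision guide above can be codified as a small helper. The numeric thresholds for "large" and "intermediate" are illustrative assumptions, not values from the lecture:

```python
def choose_model(n, m):
    """Pick an approach from the slide's heuristic, given n features and m examples."""
    if n >= m:                     # n large relative to m
        return "logistic regression or linear-kernel SVM"
    if m <= 10000:                 # n small, m intermediate (assumed cutoff)
        return "SVM with Gaussian kernel"
    return "add features, then logistic regression or linear-kernel SVM"

print(choose_model(10000, 1000))   # many features, few examples
print(choose_model(100, 5000))     # few features, moderate data
print(choose_model(100, 500000))   # few features, lots of data
```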