
Introduction to Machine Learning
Lecture 12
Support Vector Machines

Albert Orriols i Puig


aorriols@salle.url.edu

Artificial Intelligence – Machine Learning


Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lecture 11
† 1st generation NN: Perceptrons and others

† Also multi-layer perceptrons



Recap of Lecture 11
† 2nd generation NN
„ Some people figured out how to adapt the weights of the internal
layers

„ Seemed to be very powerful and able to solve almost anything


„ The reality showed that this was not exactly true
Today’s Agenda

† Moving to SVM
† Linear SVM
„ The separable case
„ The non-separable case
† Non-Linear SVM



Introduction
† SVM (Vapnik, 1995)
„ Clever type of perceptron
„ Instead of hand-coding the layer of non-adaptive features, each
training example is used to create a new feature using a fixed
recipe
„ A clever optimization technique is used to select the best
subset of features
† Many NN researchers switched to SVMs in the 1990s
because they work better
† Here, we’ll take a slow path into SVM concepts



Shattering Points with Oriented Hyperplanes
† Remember the idea
„ I want to build hyperplanes that separate points of two classes
„ In a two-dimensional space → lines
† E.g.: Linear Classifier

‰ Which is the best separating line?

‰ Remember, a hyperplane is represented by the equation

w · x + b = 0

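As a minimal sketch (not part of the original slides), the snippet below evaluates such a linear classifier in Python; the weight vector w, the bias b, and the sample point are made-up values used only for illustration.

import numpy as np

# Illustrative hyperplane parameters (assumed values, not from the slides)
w = np.array([2.0, -1.0])   # normal vector of the hyperplane
b = -0.5                    # bias term
x = np.array([1.0, 0.5])    # a point to classify

score = np.dot(w, x) + b          # w · x + b
label = 1 if score > 0 else -1    # points on either side of the hyperplane get opposite labels
print(score, label)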


Linear SVM
† I want the line that maximizes the margin between
examples of both classes!

[Figure: maximum-margin separating line; the points lying on the margin are the support vectors]



Linear SVM
† In more detail
„ Let’s assume two classes
¾ yi ∈ {-1, +1}
„ Each example is described by a set of features x (x is a
vector; for clarity, we will mark vectors in bold in the
remainder of the slides)
† The problem can be formulated as follows
„ All training examples must satisfy (in the separable case)

xi · w + b ≥ +1 for yi = +1
xi · w + b ≤ −1 for yi = −1

„ This can be combined into

yi (xi · w + b) − 1 ≥ 0  ∀i

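A small hedged check of the combined constraint yi (xi · w + b) − 1 ≥ 0 on a toy separable data set; the points, labels, and the candidate (w, b) are invented for illustration.

import numpy as np

# Toy separable data set (assumed)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])

# Candidate hyperplane parameters (assumed)
w = np.array([0.5, 0.5])
b = -1.0

# In the separable case every training example must satisfy y_i (x_i · w + b) - 1 >= 0
margins = y * (X @ w + b) - 1
print(margins)
print("all constraints satisfied:", np.all(margins >= 0))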


Linear SVM
† What are the support vectors?
„ Let’s find the points that lie on the hyperplane H1: xi · w + b = +1
„ Their perpendicular distance to the origin is |1 − b| / ||w||

„ Let’s find the points that lie on the hyperplane H2: xi · w + b = −1
„ Their perpendicular distance to the origin is |−1 − b| / ||w||

The margin is: 2 / ||w||

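Given any candidate w and b, the two distances above and the margin 2 / ||w|| can be computed directly; a minimal sketch with assumed values follows.

import numpy as np

w = np.array([0.5, 0.5])   # assumed weight vector
b = -0.25                  # assumed bias

norm_w = np.linalg.norm(w)
d_H1 = abs(1 - b) / norm_w     # distance of H1 (x · w + b = +1) to the origin
d_H2 = abs(-1 - b) / norm_w    # distance of H2 (x · w + b = -1) to the origin
margin = 2.0 / norm_w          # distance between H1 and H2
print(d_H1, d_H2, margin)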


Linear SVM
† Therefore, the problem is
„ Find the hyperplane that minimizes ½ ||w||²

„ Subject to yi (xi · w + b) − 1 ≥ 0  ∀i

† But let us change to the Lagrange formulation because


„ The constraints will be placed on the Lagrange multipliers
themselves (easier to handle)
„ Training data will appear only in the form of dot products between
vectors



Linear SVM
† The Lagrangian formulation comes to be

Lp = ½ ||w||² − Σi αi yi (xi · w + b) + Σi αi

„ Where αi are the Lagrange multipliers


† So now we need to
„ Minimize Lp w.r.t w, b
„ Simultaneously require that the derivatives of Lp w.r.t. the αi
vanish
„ All subject to the constraints αi ≥ 0



Linear SVM
† Transformation to the dual problem
„ This is a convex problem
„ We can equivalently solve the dual problem

† That is, maximize LD = Σi αi − ½ Σi Σj αi αj yi yj xi · xj

„ W.r.t. the αi
„ Subject to the constraint Σi αi yi = 0
„ And with αi ≥ 0



Linear SVM

† This is a quadratic programming problem. You can solve
it with many methods, such as gradient descent
„ We’ll not see these methods in class

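As a hedged sketch (not the method used in the course), the dual above can be solved numerically with scipy's general-purpose SLSQP optimizer on a toy data set, and w and b recovered from the resulting αi; the data and starting point are invented for illustration.

import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data set (assumed)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Matrix of y_i y_j x_i · x_j used by the dual objective
K = (y[:, None] * y[None, :]) * (X @ X.T)

def neg_dual(alpha):
    # Negative of L_D: minimizing this maximizes the dual objective
    return 0.5 * alpha @ K @ alpha - alpha.sum()

constraints = [{"type": "eq", "fun": lambda a: a @ y}]   # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * n                               # alpha_i >= 0 (separable case)

res = minimize(neg_dual, x0=np.zeros(n), method="SLSQP",
               bounds=bounds, constraints=constraints)
alpha = res.x

# Recover w = sum_i alpha_i y_i x_i and b from one support vector (alpha_i > 0)
w = (alpha * y) @ X
sv = np.argmax(alpha)
b = y[sv] - X[sv] @ w
print("alpha:", np.round(alpha, 3))
print("w:", w, "b:", b)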


The Non-Separable case
† What if I cannot separate the two classes?

„ We will not be able to solve the Lagrangian formulation
proposed
„ Any idea?



The Non-Separable Case
† Just relax the constraints by permitting some errors
„ Introduce slack variables ξi ≥ 0 and require yi (xi · w + b) ≥ 1 − ξi



The Non-Separable Case
† That means that the Lagrangian is rewritten
„ We change the objective function to be minimized to

½ ||w||² + C Σi ξi
„ Therefore, we are maximizing the margin and minimizing the error
„ C is a constant to be chosen by the user
† The dual problem becomes

„ Subject to 0 ≤ αi ≤ C and Σi αi yi = 0

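In practice the soft-margin problem is rarely coded by hand; as a hedged illustration, scikit-learn's SVC exposes the constant C directly (a small C tolerates more margin violations, a large C penalizes them heavily). The data below is synthetic and only for illustration.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping (non-separable) Gaussian clouds, one per class
X = np.vstack([rng.normal(loc=-1.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=+1.0, scale=1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {len(clf.support_)} support vectors, "
          f"training accuracy = {clf.score(X, y):.2f}")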


Non-Linear SVM
† What happens if the decision function is not a linear function of
the data?

† In our equations, data appears in the form of dot products xi · xj
† Wouldn’t you like to have polynomial, logarithmic, …
functions to fit the data?



Non-Linear SVM
† The kernel trick
„ Map the data into a higher-dimensional space
„ Mercer’s theorem: any continuous, symmetric, positive semi-
definite kernel function K(x, y) can be expressed as a dot
product in a high-dimensional space
† Now, we have a kernel function K(xi, xj) = Φ(xi) · Φ(xj)
† An example
† All we have talked about still holds when using the
kernel function
† The only difference is that now my decision function will be
f(x) = Σi αi yi K(si, x) + b, where the si are the support vectors

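A hedged numerical sketch of the kernel idea: for 2-D inputs and a polynomial kernel with p = 2, the kernel value (x · z + 1)² equals an explicit dot product Φ(x) · Φ(z) in a 6-dimensional space, so Φ never has to be computed during training. The mapping Φ below is the standard textbook one, not taken from the slides.

import numpy as np

def poly_kernel(x, z, p=2):
    # Polynomial kernel K(x, z) = (x · z + 1)^p
    return (np.dot(x, z) + 1.0) ** p

def phi(x):
    # Explicit feature map for the 2-D polynomial kernel with p = 2
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(poly_kernel(x, z))   # kernel computed in the original 2-D space
print(phi(x) @ phi(z))     # same value as a dot product in the 6-D space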


Non-Linear SVM
† Some typical kernels

† A visual example of a polynomial kernel with p = 3

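As a hedged usage example, a polynomial kernel with p = 3 (and, for comparison, an RBF kernel) can be selected in scikit-learn's SVC without changing anything else in the training procedure; the data set is synthetic.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
# Non-linearly separable labels: the class depends on the distance from the origin
y = np.where((X ** 2).sum(axis=1) > 1.0, 1, -1)

poly = SVC(kernel="poly", degree=3, coef0=1.0, C=1.0).fit(X, y)   # K(x, z) = (gamma x · z + coef0)^3
rbf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)               # K(x, z) = exp(-gamma ||x - z||^2)
print("poly training accuracy:", round(poly.score(X, y), 2))
print("rbf  training accuracy:", round(rbf.score(X, y), 2))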


Some Further Issues
† We have to classify data
„ Described by nominal attributes and continuous attributes
„ Probably with missing values
„ That may have more than two classes
† How do SVMs deal with them?
„ SVMs are defined over continuous attributes. No problem!
„ Nominal attributes → Map them into a continuous space
„ Multiple classes → Build SVMs that discriminate each pair of
classes (a hedged sketch of both follows below)

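A hedged sketch of the two workarounds above with scikit-learn: a nominal attribute is mapped into a continuous space via one-hot encoding, and with more than two classes SVC trains one classifier per pair of classes (one-vs-one). The data set and attribute values are invented for illustration.

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# Invented data: one nominal attribute, one continuous attribute, three classes
nominal = np.array([["red"], ["green"], ["blue"], ["red"], ["green"], ["blue"]])
continuous = np.array([[0.5], [1.5], [2.5], [0.7], [1.3], [2.8]])
y = np.array([0, 1, 2, 0, 1, 2])

# Nominal attribute -> continuous space via one-hot encoding
X_nom = OneHotEncoder().fit_transform(nominal).toarray()
X = np.hstack([X_nom, continuous])

# With three classes, SVC builds one SVM per pair of classes (one-vs-one)
clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
print(clf.predict(X))                       # multiclass predictions
print(clf.decision_function(X).shape[1])    # 3 = number of class pairs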


Some Further Issues
† I’ve seen lots of formulas… But I want to program an SVM
builder. How do I get my SVM?
„ We have already mentioned that there are many methods to
solve the quadratic programming problem
„ Many algorithms designed for SVM
„ One of the most significant: Sequential Minimal Optimization
„ Currently, there are many new algorithms



Next Class

† Association Rules


