
Linear Support Vector Machine

Classification
Linear SVM

Huiping Cao



Support Vector Machines

Find a linear hyperplane (decision boundary) that will separate the data.

Support Vector Machines

One possible solution: decision boundary B1.

Support Vector Machines

Another possible solution: decision boundary B2.

Support Vector Machines

Many other solutions are possible as well.

Support Vector Machines

Which one is better, B1 or B2? How do you define "better"? The bigger the margin, the better.

Support Vector Machines

Find the hyperplane that maximizes the margin: B1 (with margin boundaries b11 and b12) is better than B2 (with margin boundaries b21 and b22).

Support Vector Machines

Decision boundary B1 and its margin hyperplanes:

  B1:  w^T x + b = 0
  b11: w^T x + b = +1
  b12: w^T x + b = −1

Classification rule:

  y = +1 if w^T x + b ≥ +1
  y = −1 if w^T x + b ≤ −1

Margin = 2 / ‖w‖

Support Vector Machines

[Figure: boundary B1 (w^T x + b = 0) with margin hyperplanes b11 (w^T x + b = +1) and b12 (w^T x + b = −1); x1 lies on b11, x2 lies on b12, and d is the distance between the margin hyperplanes along the direction of w.]

The margin of the decision boundary:

  w^T (x1 − x2) = 2
  w^T (x1 − x2) = ‖w‖ · ‖x1 − x2‖ · cos(θ), where ‖·‖ is the norm of a vector.
  ‖w‖ · ‖x1 − x2‖ · cos(θ) = ‖w‖ · d, where d is the length of the vector (x1 − x2) in the direction of w.
  Thus ‖w‖ · d = 2, and the margin is d = 2/‖w‖.

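For example, with the weight vector w = (−10, −10) found in the worked example later in these slides, ‖w‖ = 10√2 ≈ 14.14, so the margin is d = 2/‖w‖ ≈ 0.141.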


Learn a linear SVM model

The training phase of SVM estimates the parameters w and b.

The parameters must satisfy the following two conditions:

  y_i = +1 if w^T x_i + b ≥ +1
  y_i = −1 if w^T x_i + b ≤ −1


Formulate the problem – rationale

We want to maximize the margin d = 2/‖w‖, which is equivalent to minimizing L(w) = ‖w‖²/2, subject to the constraints:

  y_i = +1 if w^T x_i + b ≥ +1
  y_i = −1 if w^T x_i + b ≤ −1

Multiplying each inequality by its label y_i merges the two cases into the equivalent condition

  y_i (w^T x_i + b) ≥ 1,  i = 1, …, N


Formulate the problem

Formalized as the following constrained optimization problem.

Objective function:

  minimize ‖w‖²/2
  subject to y_i (w^T x_i + b) ≥ 1,  i = 1, …, N

This is a convex optimization problem:
  The objective function is quadratic.
  The constraints are linear.
It can be solved using the standard Lagrange multiplier method.

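Before turning to the Lagrangian machinery, note that this small problem can already be handed to a general-purpose constrained solver. The sketch below is not from the slides: it feeds the primal problem to scipy's SLSQP method, and the data and names are illustrative (the two points are the eventual support vectors from the worked example later in the deck).

```python
# Minimal sketch: solve the primal SVM problem directly.
# minimize ||w||^2 / 2  subject to  y_i (w^T x_i + b) >= 1
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.4, 0.5], [0.5, 0.6]])   # toy data: the two eventual support vectors
y = np.array([1.0, -1.0])

def objective(p):                        # p = [w1, w2, b]
    w = p[:2]
    return 0.5 * w @ w

constraints = [{"type": "ineq",          # "ineq" means fun(p) >= 0
                "fun": lambda p, i=i: y[i] * (X[i] @ p[:2] + p[2]) - 1.0}
               for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
print(res.x)   # expected: approx. [-10. -10.  10.], i.e. w = (-10, -10), b = 10
```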


Lagrange formulation

Lagrangian for the optimization problem (take the constraints into account by rewriting the objective function):

  L_P(w, b, λ) = ‖w‖²/2 − Σ_{i=1}^N λ_i (y_i (w^T x_i + b) − 1)

where the λ_i are Lagrange multipliers: minimize w.r.t. w and b, and maximize w.r.t. each λ_i ≥ 0.

To minimize the Lagrangian, take the derivatives of L_P(w, b, λ) w.r.t. w and b and set them to 0:

  ∂L_P/∂w = 0  ⟹  w = Σ_{i=1}^N λ_i y_i x_i
  ∂L_P/∂b = 0  ⟹  Σ_{i=1}^N λ_i y_i = 0

Substituting L_P(w, b, λ) to get L_D(λ)

Solving L_P(w, b, λ) directly is still difficult because it involves a large number of parameters: w, b, and the λ_i.

Idea: transform the Lagrangian into a function of the Lagrange multipliers only. Substituting w and b into L_P(w, b, λ) gives the dual problem:

  L_D(λ) = Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j

Details: see the next slide.

L_D(λ) is very convenient to work with because it is a simple quadratic form in the vector λ.


Substitution details: deriving the dual problem L_D(λ)

Substitute w = Σ_{i=1}^N λ_i y_i x_i into the first term of L_P:

  ‖w‖²/2 = (1/2) w^T w = (1/2) (Σ_{i=1}^N λ_i y_i x_i^T)(Σ_{j=1}^N λ_j y_j x_j)
         = (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j                  (1)

Expand the second term, using Σ_{i=1}^N λ_i y_i = 0 to drop the b term:

  − Σ_{i=1}^N λ_i (y_i (w^T x_i + b) − 1)
         = − Σ_{i=1}^N λ_i y_i w^T x_i − b Σ_{i=1}^N λ_i y_i + Σ_{i=1}^N λ_i    (2)
         = − Σ_{i=1}^N λ_i y_i (Σ_{j=1}^N λ_j y_j x_j^T) x_i + Σ_{i=1}^N λ_i    (3)
         = Σ_{i=1}^N λ_i − Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j        (4)

Adding (1) and (4) gives

  L_D(λ) = Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j


Finalized L_D(λ)

  L_D(λ) = Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j

Maximize w.r.t. each λ_i, subject to

  λ_i ≥ 0 for i = 1, 2, …, N, and
  Σ_{i=1}^N λ_i y_i = 0

Solve L_D(λ) using quadratic programming (QP) to obtain all the λ_i (the internals of QP solvers are out of scope here).


Solve L_D(λ) – QP

We are maximizing L_D(λ):

  max_λ ( Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j )

subject to the constraints
  (1) λ_i ≥ 0 for i = 1, 2, …, N, and
  (2) Σ_{i=1}^N λ_i y_i = 0.

Translate the objective into a minimization, because QP packages generally expect minimization:

  min_λ ( (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j − Σ_{i=1}^N λ_i )


Solve L_D(λ) – QP

In matrix form:

  min_λ ( (1/2) λ^T Q λ + (−1)^T λ )

where Q is the N×N matrix of quadratic coefficients, Q_ij = y_i y_j x_i^T x_j:

  Q = [ y_1 y_1 x_1^T x_1   y_1 y_2 x_1^T x_2   …   y_1 y_N x_1^T x_N ]
      [ y_2 y_1 x_2^T x_1   y_2 y_2 x_2^T x_2   …   y_2 y_N x_2^T x_N ]
      [        …                    …           …          …          ]
      [ y_N y_1 x_N^T x_1   y_N y_2 x_N^T x_2   …   y_N y_N x_N^T x_N ]

and (−1)^T λ is the linear term, subject to

  y^T λ = 0   (linear constraint)
  0 ≤ λ ≤ ∞   (lower and upper bounds)

In short: min_λ ( (1/2) λ^T Q λ + (−1)^T λ ) subject to y^T λ = 0 and λ ≥ 0.

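To make the mapping concrete, here is a small sketch (not the slides' code) that builds Q for a four-point subset of the example dataset used later and solves the dual. It uses scipy's SLSQP as a stand-in solver; a dedicated QP package, as discussed at the end of the deck, would be preferable in practice.

```python
# Sketch: solve min (1/2) lam^T Q lam - 1^T lam  s.t.  y^T lam = 0, lam >= 0.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.4, 0.5], [0.5, 0.6], [0.9, 0.4], [0.7, 0.9]])
y = np.array([1.0, -1.0, -1.0, -1.0])
N = len(y)

Q = (y[:, None] * y[None, :]) * (X @ X.T)    # Q_ij = y_i y_j x_i^T x_j

def dual(lam):                               # (1/2) lam^T Q lam - 1^T lam
    return 0.5 * lam @ Q @ lam - lam.sum()

res = minimize(dual, x0=np.zeros(N), method="SLSQP",
               bounds=[(0.0, None)] * N,                                  # lam >= 0
               constraints=[{"type": "eq", "fun": lambda lam: lam @ y}])  # y^T lam = 0
print(np.round(res.x, 1))   # expected: approx. [100. 100.   0.   0.]
```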


Lagrange multipliers

QP solves for λ = (λ_1, λ_2, …, λ_N), where most of the λ_i are zero.

Karush-Kuhn-Tucker (KKT) conditions:

  λ_i ≥ 0
  Complementary slackness (either the constraint holds with equality, or the multiplier is zero):

    λ_i (y_i (w^T x_i + b) − 1) = 0

  So either λ_i = 0, or y_i (w^T x_i + b) − 1 = 0.

A support vector x_i satisfies y_i (w^T x_i + b) − 1 = 0 with λ_i > 0. Training instances that do not lie on the margin hyperplanes have λ_i = 0.

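Numerically, a QP solver returns small nonzero values rather than exact zeros, so in practice support vectors are identified by thresholding λ. A tiny sketch with illustrative values (the array contents and names are mine, not from the slides):

```python
import numpy as np

lam = np.array([100.0, 100.0, 3e-9, 0.0])   # typical solver output: tiny noise, not exact zeros
sv = lam > 1e-6 * lam.max()                 # threshold relative to the largest multiplier
print(np.flatnonzero(sv))                   # -> [0 1]: indices of the support vectors
```
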
Get w and b

w and b depend only on the support vectors x_i and their class labels y_i.

w value:
  w = Σ_{i=1}^N λ_i y_i x_i

b value (for any support vector (x_i, y_i)):
  b = y_i − w^T x_i

Idea:
  Given a support vector (x_i, y_i), we have y_i (w^T x_i + b) − 1 = 0.
  Multiply both sides by y_i: y_i² (w^T x_i + b) − y_i = 0.
  y_i² = 1 because y_i = +1 or y_i = −1.
  Then (w^T x_i + b) − y_i = 0, i.e., b = y_i − w^T x_i.

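A minimal sketch of these two formulas in code (array names are mine; the numbers are the two support vectors from the example on the next slide):

```python
import numpy as np

# Support vectors, labels, and multipliers (from the example on the next slide).
X_sv   = np.array([[0.4, 0.5], [0.5, 0.6]])
y_sv   = np.array([1.0, -1.0])
lam_sv = np.array([100.0, 100.0])

w = (lam_sv * y_sv) @ X_sv    # w = sum_i lam_i y_i x_i  -> [-10. -10.]
b = y_sv[0] - X_sv[0] @ w     # b = y_i - w^T x_i, using any support vector -> 10.0
print(w, b)
```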


Get w and b – Example

  xi1    xi2    yi    λi
  0.4    0.5     1    100
  0.5    0.6    -1    100
  0.9    0.4    -1      0
  0.7    0.9    -1      0
  0.17   0.05    1      0
  0.4    0.35    1      0
  0.9    0.8    -1      0
  0.2    0       1      0

λ is obtained from a quadratic programming package; only the first two instances (the support vectors) have nonzero multipliers.

  w^T = (w1, w2)
  w1 = Σ_{i=1}^2 λ_i y_i x_i1 = 100 × 1 × 0.4 + 100 × (−1) × 0.5 = −10
  w2 = Σ_{i=1}^2 λ_i y_i x_i2 = 100 × 1 × 0.5 + 100 × (−1) × 0.6 = −10
  b = y_1 − w^T x_1 = 1 − ((−10) × 0.4 + (−10) × 0.5) = 1 − (−9) = 10

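As a quick check (a sketch, not part of the slides), every instance in the table satisfies y_i (w^T x_i + b) ≥ 1, with equality exactly at the two support vectors:

```python
import numpy as np

X = np.array([[0.4, 0.5], [0.5, 0.6], [0.9, 0.4], [0.7, 0.9],
              [0.17, 0.05], [0.4, 0.35], [0.9, 0.8], [0.2, 0.0]])
y = np.array([1.0, -1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0])
w, b = np.array([-10.0, -10.0]), 10.0

margins = y * (X @ w + b)   # y_i (w^T x_i + b) for every row
print(margins)              # -> [1.  1.  3.  6.  7.8 2.5 7.  8. ]; support vectors give exactly 1
```
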
SVM: classify a test data point z

  y_z = sign(w^T z + b) = sign((Σ_{i=1}^N λ_i y_i x_i^T) z + b)

If y_z = +1, the test instance is classified as the positive class.
If y_z = −1, the test instance is classified as the negative class.

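In code, prediction is a single line once w and b are known (a sketch; the test points are illustrative):

```python
import numpy as np

def predict(Z, w, b):
    # y_z = sign(w^T z + b), applied to each row of Z
    return np.sign(Z @ w + b)

w, b = np.array([-10.0, -10.0]), 10.0
print(predict(np.array([[0.2, 0.1], [0.8, 0.8]]), w, b))   # -> [ 1. -1.]
```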


SVM – discussions

What if the problem is not linearly separable?


SVM – discussions

Nonlinear support vector machines: what if the decision boundary is not linear?

Do not work in the original X space. Transform the data into a higher-dimensional Z space in which the data are linearly separable, and apply the linear SVM there. The dual is unchanged except that x is replaced by z:

  L_D(λ) = Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j

becomes

  L_D(λ) = Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j z_i^T z_j

Apply the linear SVM; the support vectors live in the Z space.

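As one concrete illustration (not from the slides), a quadratic feature map is a common choice of transformation; the map φ below is an assumption for this sketch, not the only option. Running the same dual QP on Z = φ(X) amounts to using Q_ij = y_i y_j z_i^T z_j:

```python
import numpy as np

def phi(X):
    # Quadratic feature map: (x1, x2) -> (x1, x2, x1^2, x2^2, x1*x2).
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

X = np.array([[0.0, 0.1], [2.0, 2.0]])
Z = phi(X)          # work in Z space: a circular boundary in X becomes linear in Z
print(Z.shape)      # -> (2, 5)
```
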
Quadratic programming packages – Octave

https://www.gnu.org/software/octave/doc/interpreter/Quadratic-Programming.html

Octave's qp solves the quadratic program

  min_x (0.5 x^T H x + x^T q)

subject to

  A x = b
  lb ≤ x ≤ ub
  A_lb ≤ A_in x ≤ A_ub

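For the SVM dual derived above, this corresponds to H = Q, q = −(1, …, 1)^T, a single equality constraint y^T λ = 0 (A = y^T, b = 0), lb = 0, and no upper bound.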


Quadratic programming packages – MATLAB

The Optimization Toolbox in MATLAB solves

  min_x ( (1/2) x^T H x + f^T x )

such that

  A x ≤ b
  Aeq x = beq
  lb ≤ x ≤ ub
