
Linear Support Vector Machine

Classification
Linear SVM

Huiping Cao



Support Vector Machines

Find a linear hyperplane (decision boundary) that will separate the data.

Support Vector Machines

One possible solution: decision boundary B1.

Support Vector Machines

Another possible solution: decision boundary B2.

Support Vector Machines

Many other solutions are possible as well.

Support Vector Machines

Which one is better, B1 or B2? How do you define "better"? The bigger the margin, the better.

Support Vector Machines

Find the hyperplane that maximizes the margin: B1 (with margin boundaries b11 and b12) is better than B2 (with margin boundaries b21 and b22).

Support Vector Machines

Decision boundary B1 and its margin hyperplanes:

  B1:  w^T x + b = 0
  b11: w^T x + b = +1
  b12: w^T x + b = −1

Classification rule:

  y = +1 if w^T x + b ≥ +1
  y = −1 if w^T x + b ≤ −1

Margin = 2 / ‖w‖

Support Vector Machines

[Figure: boundary B1 (w^T x + b = 0) with margin hyperplanes b11 (w^T x + b = +1) and b12 (w^T x + b = −1); x1 lies on b11, x2 lies on b12, and d is the distance between the margin hyperplanes along the direction of w.]

The margin of the decision boundary:

  w^T (x1 − x2) = 2
  w^T (x1 − x2) = ‖w‖ · ‖x1 − x2‖ · cos(θ), where ‖·‖ is the norm of a vector.
  ‖w‖ · ‖x1 − x2‖ · cos(θ) = ‖w‖ · d, where d is the length of the vector (x1 − x2) in the direction of w.
  Thus ‖w‖ · d = 2, and the margin is d = 2/‖w‖.

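For example, with the weight vector w = (−10, −10) found in the worked example later in these slides, ‖w‖ = 10√2 ≈ 14.14, so the margin is d = 2/‖w‖ ≈ 0.141.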


Learn a linear SVM model

The training phase of SVM estimates the parameters w and b.

The parameters must satisfy the following two conditions:

  y_i = +1 if w^T x_i + b ≥ +1
  y_i = −1 if w^T x_i + b ≤ −1


Formulate the problem – rationale

We want to maximize the margin d = 2/‖w‖, which is equivalent to minimizing L(w) = ‖w‖²/2, subject to the constraints:

  y_i = +1 if w^T x_i + b ≥ +1
  y_i = −1 if w^T x_i + b ≤ −1

Multiplying each inequality by its label y_i merges the two cases into the equivalent condition

  y_i (w^T x_i + b) ≥ 1,  i = 1, …, N


Formulate the problem

Formalized as the following constrained optimization problem.

Objective function:

  minimize ‖w‖²/2
  subject to y_i (w^T x_i + b) ≥ 1,  i = 1, …, N

This is a convex optimization problem:
  The objective function is quadratic.
  The constraints are linear.
It can be solved using the standard Lagrange multiplier method.

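Before turning to the Lagrangian machinery, note that this small problem can already be handed to a general-purpose constrained solver. The sketch below is not from the slides: it feeds the primal problem to scipy's SLSQP method, and the data and names are illustrative (the two points are the eventual support vectors from the worked example later in the deck).

```python
# Minimal sketch: solve the primal SVM problem directly.
# minimize ||w||^2 / 2  subject to  y_i (w^T x_i + b) >= 1
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.4, 0.5], [0.5, 0.6]])   # toy data: the two eventual support vectors
y = np.array([1.0, -1.0])

def objective(p):                        # p = [w1, w2, b]
    w = p[:2]
    return 0.5 * w @ w

constraints = [{"type": "ineq",          # "ineq" means fun(p) >= 0
                "fun": lambda p, i=i: y[i] * (X[i] @ p[:2] + p[2]) - 1.0}
               for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
print(res.x)   # expected: approx. [-10. -10.  10.], i.e. w = (-10, -10), b = 10
```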


Lagrange formulation

Lagrangian for the optimization problem (take the constraints into account by rewriting the objective function):

  L_P(w, b, λ) = ‖w‖²/2 − Σ_{i=1}^N λ_i (y_i (w^T x_i + b) − 1)

where the λ_i are Lagrange multipliers: minimize w.r.t. w and b, and maximize w.r.t. each λ_i ≥ 0.

To minimize the Lagrangian, take the derivatives of L_P(w, b, λ) w.r.t. w and b and set them to 0:

  ∂L_P/∂w = 0  ⟹  w = Σ_{i=1}^N λ_i y_i x_i
  ∂L_P/∂b = 0  ⟹  Σ_{i=1}^N λ_i y_i = 0

Substituting L_P(w, b, λ) to get L_D(λ)

Solving L_P(w, b, λ) directly is still difficult because it involves a large number of parameters: w, b, and the λ_i.

Idea: transform the Lagrangian into a function of the Lagrange multipliers only. Substituting w and b into L_P(w, b, λ) gives the dual problem:

  L_D(λ) = Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j

Details: see the next slide.

L_D(λ) is very convenient to work with because it is a simple quadratic form in the vector λ.


Substitution details: deriving the dual problem L_D(λ)

Substitute w = Σ_{i=1}^N λ_i y_i x_i into the first term of L_P:

  ‖w‖²/2 = (1/2) w^T w = (1/2) (Σ_{i=1}^N λ_i y_i x_i^T)(Σ_{j=1}^N λ_j y_j x_j)
         = (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j                  (1)

Expand the second term, using Σ_{i=1}^N λ_i y_i = 0 to drop the b term:

  − Σ_{i=1}^N λ_i (y_i (w^T x_i + b) − 1)
         = − Σ_{i=1}^N λ_i y_i w^T x_i − b Σ_{i=1}^N λ_i y_i + Σ_{i=1}^N λ_i    (2)
         = − Σ_{i=1}^N λ_i y_i (Σ_{j=1}^N λ_j y_j x_j^T) x_i + Σ_{i=1}^N λ_i    (3)
         = Σ_{i=1}^N λ_i − Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j        (4)

Adding (1) and (4) gives

  L_D(λ) = Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j


Finalized L_D(λ)

  L_D(λ) = Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j

Maximize w.r.t. each λ_i, subject to

  λ_i ≥ 0 for i = 1, 2, …, N, and
  Σ_{i=1}^N λ_i y_i = 0

Solve L_D(λ) using quadratic programming (QP) to obtain all the λ_i (the internals of QP solvers are out of scope here).


Solve L_D(λ) – QP

We are maximizing L_D(λ):

  max_λ ( Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j )

subject to the constraints
  (1) λ_i ≥ 0 for i = 1, 2, …, N, and
  (2) Σ_{i=1}^N λ_i y_i = 0.

Translate the objective into a minimization, because QP packages generally expect minimization:

  min_λ ( (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j − Σ_{i=1}^N λ_i )


Solve L_D(λ) – QP

In matrix form:

  min_λ ( (1/2) λ^T Q λ + (−1)^T λ )

where Q is the N×N matrix of quadratic coefficients, Q_ij = y_i y_j x_i^T x_j:

  Q = [ y_1 y_1 x_1^T x_1   y_1 y_2 x_1^T x_2   …   y_1 y_N x_1^T x_N ]
      [ y_2 y_1 x_2^T x_1   y_2 y_2 x_2^T x_2   …   y_2 y_N x_2^T x_N ]
      [        …                    …           …          …          ]
      [ y_N y_1 x_N^T x_1   y_N y_2 x_N^T x_2   …   y_N y_N x_N^T x_N ]

and (−1)^T λ is the linear term, subject to

  y^T λ = 0   (linear constraint)
  0 ≤ λ ≤ ∞   (lower and upper bounds)

In short: min_λ ( (1/2) λ^T Q λ + (−1)^T λ ) subject to y^T λ = 0 and λ ≥ 0.

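To make the mapping concrete, here is a small sketch (not the slides' code) that builds Q for a four-point subset of the example dataset used later and solves the dual. It uses scipy's SLSQP as a stand-in solver; a dedicated QP package, as discussed at the end of the deck, would be preferable in practice.

```python
# Sketch: solve min (1/2) lam^T Q lam - 1^T lam  s.t.  y^T lam = 0, lam >= 0.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.4, 0.5], [0.5, 0.6], [0.9, 0.4], [0.7, 0.9]])
y = np.array([1.0, -1.0, -1.0, -1.0])
N = len(y)

Q = (y[:, None] * y[None, :]) * (X @ X.T)    # Q_ij = y_i y_j x_i^T x_j

def dual(lam):                               # (1/2) lam^T Q lam - 1^T lam
    return 0.5 * lam @ Q @ lam - lam.sum()

res = minimize(dual, x0=np.zeros(N), method="SLSQP",
               bounds=[(0.0, None)] * N,                                  # lam >= 0
               constraints=[{"type": "eq", "fun": lambda lam: lam @ y}])  # y^T lam = 0
print(np.round(res.x, 1))   # expected: approx. [100. 100.   0.   0.]
```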


Lagrange multipliers

QP solves for λ = (λ_1, λ_2, …, λ_N), where most of the λ_i are zero.

Karush-Kuhn-Tucker (KKT) conditions:

  λ_i ≥ 0
  Complementary slackness (either the constraint holds with equality, or the multiplier is zero):

    λ_i (y_i (w^T x_i + b) − 1) = 0

  So either λ_i = 0, or y_i (w^T x_i + b) − 1 = 0.

A support vector x_i satisfies y_i (w^T x_i + b) − 1 = 0 with λ_i > 0. Training instances that do not lie on the margin hyperplanes have λ_i = 0.

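Numerically, a QP solver returns small nonzero values rather than exact zeros, so in practice support vectors are identified by thresholding λ. A tiny sketch with illustrative values (the array contents and names are mine, not from the slides):

```python
import numpy as np

lam = np.array([100.0, 100.0, 3e-9, 0.0])   # typical solver output: tiny noise, not exact zeros
sv = lam > 1e-6 * lam.max()                 # threshold relative to the largest multiplier
print(np.flatnonzero(sv))                   # -> [0 1]: indices of the support vectors
```
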
Get w and b

w and b depend only on the support vectors x_i and their class labels y_i.

w value:
  w = Σ_{i=1}^N λ_i y_i x_i

b value (for any support vector (x_i, y_i)):
  b = y_i − w^T x_i

Idea:
  Given a support vector (x_i, y_i), we have y_i (w^T x_i + b) − 1 = 0.
  Multiply both sides by y_i: y_i² (w^T x_i + b) − y_i = 0.
  y_i² = 1 because y_i = +1 or y_i = −1.
  Then (w^T x_i + b) − y_i = 0, i.e., b = y_i − w^T x_i.

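A minimal sketch of these two formulas in code (array names are mine; the numbers are the two support vectors from the example on the next slide):

```python
import numpy as np

# Support vectors, labels, and multipliers (from the example on the next slide).
X_sv   = np.array([[0.4, 0.5], [0.5, 0.6]])
y_sv   = np.array([1.0, -1.0])
lam_sv = np.array([100.0, 100.0])

w = (lam_sv * y_sv) @ X_sv    # w = sum_i lam_i y_i x_i  -> [-10. -10.]
b = y_sv[0] - X_sv[0] @ w     # b = y_i - w^T x_i, using any support vector -> 10.0
print(w, b)
```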


Get w and b – Example

  xi1    xi2    yi    λi
  0.4    0.5     1    100
  0.5    0.6    -1    100
  0.9    0.4    -1      0
  0.7    0.9    -1      0
  0.17   0.05    1      0
  0.4    0.35    1      0
  0.9    0.8    -1      0
  0.2    0       1      0

λ is obtained from a quadratic programming package; only the first two instances (the support vectors) have nonzero multipliers.

  w^T = (w1, w2)
  w1 = Σ_{i=1}^2 λ_i y_i x_i1 = 100 × 1 × 0.4 + 100 × (−1) × 0.5 = −10
  w2 = Σ_{i=1}^2 λ_i y_i x_i2 = 100 × 1 × 0.5 + 100 × (−1) × 0.6 = −10
  b = y_1 − w^T x_1 = 1 − ((−10) × 0.4 + (−10) × 0.5) = 1 − (−9) = 10

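As a quick check (a sketch, not part of the slides), every instance in the table satisfies y_i (w^T x_i + b) ≥ 1, with equality exactly at the two support vectors:

```python
import numpy as np

X = np.array([[0.4, 0.5], [0.5, 0.6], [0.9, 0.4], [0.7, 0.9],
              [0.17, 0.05], [0.4, 0.35], [0.9, 0.8], [0.2, 0.0]])
y = np.array([1.0, -1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0])
w, b = np.array([-10.0, -10.0]), 10.0

margins = y * (X @ w + b)   # y_i (w^T x_i + b) for every row
print(margins)              # -> [1.  1.  3.  6.  7.8 2.5 7.  8. ]; support vectors give exactly 1
```
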
SVM: classify a test data point z

  y_z = sign(w^T z + b) = sign((Σ_{i=1}^N λ_i y_i x_i^T) z + b)

If y_z = +1, the test instance is classified as the positive class.
If y_z = −1, the test instance is classified as the negative class.

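In code, prediction is a single line once w and b are known (a sketch; the test points are illustrative):

```python
import numpy as np

def predict(Z, w, b):
    # y_z = sign(w^T z + b), applied to each row of Z
    return np.sign(Z @ w + b)

w, b = np.array([-10.0, -10.0]), 10.0
print(predict(np.array([[0.2, 0.1], [0.8, 0.8]]), w, b))   # -> [ 1. -1.]
```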


SVM – discussions

What if the problem is not linearly separable?


SVM – discussions

Nonlinear support vector machines: what if the decision boundary is not linear?

Do not work in the original X space. Transform the data into a higher-dimensional Z space in which the data are linearly separable, and apply the linear SVM there. The dual is unchanged except that x is replaced by z:

  L_D(λ) = Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i^T x_j

becomes

  L_D(λ) = Σ_{i=1}^N λ_i − (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j z_i^T z_j

Apply the linear SVM; the support vectors live in the Z space.

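As one concrete illustration (not from the slides), a quadratic feature map is a common choice of transformation; the map φ below is an assumption for this sketch, not the only option. Running the same dual QP on Z = φ(X) amounts to using Q_ij = y_i y_j z_i^T z_j:

```python
import numpy as np

def phi(X):
    # Quadratic feature map: (x1, x2) -> (x1, x2, x1^2, x2^2, x1*x2).
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

X = np.array([[0.0, 0.1], [2.0, 2.0]])
Z = phi(X)          # work in Z space: a circular boundary in X becomes linear in Z
print(Z.shape)      # -> (2, 5)
```
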
Quadratic programming packages – Octave

https://www.gnu.org/software/octave/doc/interpreter/Quadratic-Programming.html

Octave's qp solves the quadratic program

  min_x (0.5 x^T H x + x^T q)

subject to

  A x = b
  lb ≤ x ≤ ub
  A_lb ≤ A_in x ≤ A_ub

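For the SVM dual derived above, this corresponds to H = Q, q = −(1, …, 1)^T, a single equality constraint y^T λ = 0 (A = y^T, b = 0), lb = 0, and no upper bound.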


Quadratic programming packages – MATLAB

The Optimization Toolbox in MATLAB solves

  min_x ( (1/2) x^T H x + f^T x )

such that

  A x ≤ b
  Aeq x = beq
  lb ≤ x ≤ ub
