
CSE 473

Pattern Recognition

Instructor:
Dr. Md. Monirul Islam
Linear Classifier

Recall from Previous Lectures

[Figure: two-class training samples plotted in the (x1, x2) plane]
Recall from Previous Lectures

[Figure: the two classes separated by the decision line $x_1 w_1 + x_2 w_2 + w_0 = 0$ in the (x1, x2) plane]
Recall from Previous Lectures

In vector form, the decision boundary is

$\mathbf{w} \cdot \mathbf{x} + w_0 = 0$, i.e. $\mathbf{w}^T \mathbf{x} + w_0 = 0$

where $\mathbf{x} = [x_1, x_2]^T$ and $\mathbf{w} = [w_1, w_2]^T$.
The Perceptron Algorithm
– Use the training set to learn w
– Steps
  – Initialize w
  – Classify all training samples using the current w
  – Find the new w using

    $w(t+1) = w(t) - \rho_t \sum_{x \in Y} \delta_x\, x$

    where $Y$ is the set of training samples misclassified by $w(t)$ and
    $\delta_x = -1$ if $x \in \omega_1$, $\delta_x = +1$ if $x \in \omega_2$

– Repeat until w converges (a code sketch follows below)
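As a concrete illustration of the steps above, here is a minimal batch-perceptron sketch in Python/NumPy. It assumes augmented feature vectors (a trailing 1 standing in for the bias w0), labels coded +1 for ω1 and -1 for ω2, and a fixed learning rate rho; the function name and defaults are illustrative, not part of the slides.

```python
import numpy as np

def batch_perceptron(X, y, rho=0.1, max_iter=1000):
    """Batch perceptron: X is (N, l+1) augmented feature vectors,
    y holds +1 for class w1 and -1 for class w2 (illustrative encoding)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])                  # initialize w
    for _ in range(max_iter):
        scores = X @ w                        # classify all samples with current w
        mis = y * scores <= 0                 # Y = set of misclassified samples
        if not mis.any():                     # w has converged
            return w
        # delta_x = -y_i for a misclassified x_i, so
        # w(t+1) = w(t) - rho * sum_{x in Y} delta_x x = w(t) + rho * sum y_i x_i
        w = w + rho * (y[mis, None] * X[mis]).sum(axis=0)
    return w
```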
Variants of Perceptron Algorithm (1)

$w(t+1) = w(t) + \rho\, x^{(t)}$, if $w^T(t)\, x^{(t)} \le 0$ and $x^{(t)} \in \omega_1$   (update)

$w(t+1) = w(t) - \rho\, x^{(t)}$, if $w^T(t)\, x^{(t)} \ge 0$ and $x^{(t)} \in \omega_2$   (update)

$w(t+1) = w(t)$ otherwise   (no update)

– It is a reward and punishment type of algorithm (sketched in code below)
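A sketch of this online reward-and-punishment variant, cycling through the samples one at a time; the augmented-vector and +1/-1 label conventions are the same assumptions as in the previous sketch.

```python
import numpy as np

def perceptron_reward_punishment(X, y, rho=1.0, max_epochs=100):
    """Visit samples one at a time; update only when x(t) is misclassified."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        updated = False
        for x_t, y_t in zip(X, y):
            if y_t > 0 and w @ x_t <= 0:      # x(t) in w1 but w^T x(t) <= 0
                w = w + rho * x_t             # "reward" update: add rho * x(t)
                updated = True
            elif y_t < 0 and w @ x_t >= 0:    # x(t) in w2 but w^T x(t) >= 0
                w = w - rho * x_t             # "punishment" update: subtract rho * x(t)
                updated = True
        if not updated:                       # a full pass without updates: converged
            return w
    return w
```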
Variants of Perceptron Algorithm (2)

• Initialize the weight vector w(0)
• Define the pocket vector ws and its history (score) hs
• Generate the next w(t+1). If it is better than the weight vector stored in the pocket ws, store w(t+1) in ws and update hs

– This is the pocket algorithm (sketched in code below)
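A minimal pocket-algorithm sketch under the same assumed conventions. Here ws is the "pocket" weight vector and hs its score; scoring by the number of correctly classified training samples is one common way to measure "better", the slides do not fix a particular measure.

```python
import numpy as np

def pocket_algorithm(X, y, rho=1.0, max_iter=1000, seed=0):
    """Run ordinary perceptron updates, but keep in the 'pocket' (ws, hs)
    the best weight vector seen so far."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])                  # w(0)
    ws, hs = w.copy(), 0                      # pocket weights ws and their score hs
    for _ in range(max_iter):
        i = rng.integers(len(X))              # pick a training sample at random
        if y[i] * (w @ X[i]) <= 0:            # misclassified: perceptron update
            w = w + rho * y[i] * X[i]
        score = int(np.sum(y * (X @ w) > 0))  # samples classified correctly by w
        if score > hs:                        # better than the pocket: replace it
            ws, hs = w.copy(), score
        if hs == len(X):                      # pocket separates all samples
            break
    return ws
```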
Convergence Proof of Perceptron Algorithm

[Equation slide: the proof shows that one form of the update rule "is equivalent to" another; the equations did not survive extraction]
Generalization of Perceptron Algorithm for M-Class case
• For each training vector x of class ωi, construct

$x_{i,j} = [\,0^T, \ldots, 0^T, x^T, 0^T, \ldots, 0^T, -x^T, 0^T, \ldots, 0^T\,]^T, \qquad j \ne i$

with $x^T$ at the ith block and $-x^T$ at the jth block; each $x_{i,j}$ has dimension $(l+1)M$.

• Concatenate the weight vectors:

$w = [\,w_1^T, w_2^T, \ldots, w_i^T, \ldots, w_j^T, \ldots, w_M^T\,]^T$
Generalization of Perceptron Algorithm for M-Class case

$x_{i,j} = [\,0^T, \ldots, x^T, \ldots, -x^T, \ldots, 0^T\,]^T$

$w = [\,w_1^T, w_2^T, \ldots, w_i^T, \ldots, w_j^T, \ldots, w_M^T\,]^T$

• Use a single Perceptron to solve $w^T x_{i,j} > 0$ for all constructed vectors
• Parameters:
  • the feature dimension becomes (l+1)M
  • all N(M-1) constructed training vectors are required to lie on the positive side
  • this reorganization is known as Kesler's construction (a construction sketch follows below)
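A sketch of Kesler's construction in Python/NumPy: for every training vector x of class ωi and every competing class j ≠ i it builds the (l+1)M-dimensional vector x_{i,j} with +x in the ith block and -x in the jth block, so a single two-class perceptron can be run on the N(M-1) resulting vectors, all of which must end up on the positive side. The helper name is illustrative.

```python
import numpy as np

def kesler_construction(X, labels, M):
    """X: (N, l+1) augmented feature vectors; labels: class indices 0..M-1.
    For each sample of class i and each j != i, place +x in block i and
    -x in block j of a zero vector of length (l+1)*M."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    out = []
    for x, i in zip(X, labels):
        for j in range(M):
            if j == i:
                continue
            z = np.zeros(M * d)
            z[i * d:(i + 1) * d] = x          # +x at the ith block
            z[j * d:(j + 1) * d] = -x         # -x at the jth block
            out.append(z)
    return np.asarray(out)                    # N*(M-1) vectors, all to be on the positive side
```

The stacked weight vector found by the perceptron on these vectors can then be split back into the M class weight vectors w_1, ..., w_M of length l+1 each.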
Two-Class case again

Suppose we have N training samples.

[Figure: the N two-class training samples in feature space]

• Find a linear hyperplane (decision boundary) that will separate the data
Two-Class case again

[Figure: several candidate separating hyperplanes, labelled B1 and B2]

• One possible solution: B1
• Another possible solution: B2
• Many other possible solutions exist
Two-Class case again

[Figure: the two hyperplanes B1 and B2 on the same data]

• Which one is better? B1 or B2?
• How do you define better?
Two-Class case again

[Figure: B1 with margin boundaries b11, b12 and B2 with margin boundaries b21, b22]

• Find the hyperplane that maximizes the margin => B1 is better than B2
Two-Class case again

[Figure: hyperplane B1 with its margin boundaries b11 and b12]

Decision boundary: $\mathbf{w} \cdot \mathbf{x} + b = 0$

Margin boundaries: $\mathbf{w} \cdot \mathbf{x} + b = +1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$

$f(\mathbf{x}) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \le -1 \end{cases}
\qquad \text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$

(a short derivation of the margin follows below)
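A one-line check of the margin formula: take any point $\mathbf{x}_+$ on the boundary $\mathbf{w} \cdot \mathbf{x} + b = +1$ and any point $\mathbf{x}_-$ on $\mathbf{w} \cdot \mathbf{x} + b = -1$, subtract the two equations, and project the connecting vector onto the unit normal $\mathbf{w}/\|\mathbf{w}\|$:

$\mathbf{w} \cdot (\mathbf{x}_+ - \mathbf{x}_-) = (1 - b) - (-1 - b) = 2
\;\Longrightarrow\;
\text{Margin} = \dfrac{\mathbf{w}}{\|\mathbf{w}\|} \cdot (\mathbf{x}_+ - \mathbf{x}_-) = \dfrac{2}{\|\mathbf{w}\|}$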
Two-Class case again

• We want to maximize: $\text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$

– subject to the following constraints:

$y_i = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \end{cases} \qquad \forall i$

The classifier is known as the Support Vector Machine.
Support Vector Machine

• We want to maximize: $\text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$

– which is equivalent to minimizing: $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$

– subject to the following constraints:

$y_i = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \end{cases}$
Support Vector Machine

The expression
$y_i = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \end{cases}$
can be written as $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$.

• We can say:

– minimize: $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$

– subject to: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$, or equivalently $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0$

(a numerical sketch follows below)
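The primal problem just stated is a quadratic program. Below is a minimal sketch of solving it numerically with SciPy's general-purpose SLSQP solver; this is purely an illustration under the assumption of linearly separable data, the slides do not prescribe a particular solver, and dedicated QP/SVM solvers would normally be used.

```python
import numpy as np
from scipy.optimize import minimize

def svm_primal(X, y):
    """Minimize ||w||^2 / 2 subject to y_i (w . x_i + b) - 1 >= 0.
    X is (N, l) raw features (no augmentation), y in {+1, -1}."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n_features = X.shape[1]

    def objective(params):                     # params = [w_1, ..., w_l, b]
        w = params[:n_features]
        return 0.5 * np.dot(w, w)

    constraints = [{"type": "ineq",            # "ineq" means fun(params) >= 0
                    "fun": lambda p, xi=xi, yi=yi:
                        yi * (np.dot(p[:n_features], xi) + p[-1]) - 1.0}
                   for xi, yi in zip(X, y)]

    x0 = np.zeros(n_features + 1)
    res = minimize(objective, x0, method="SLSQP", constraints=constraints)
    w, b = res.x[:n_features], res.x[-1]
    return w, b
```

For linearly separable data, svm_primal returns the maximum-margin w and b; any separable toy X, y pair can be used to try it.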
Support Vector Machine

• $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$ is a quadratic function

• Solving for $\mathbf{w}$ and $b$ under the constraints is not easy

• What happens if $\mathbf{w} = 0$? The unconstrained minimum is $\mathbf{w} = 0$, but then $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0$ becomes $y_i b \ge 1$ for every $i$, which is infeasible when both classes are present
Support Vector Machines

– minimize: $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$

– subject to: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \quad \forall i$

• Use the Lagrange function:

$L_p = \dfrac{\|\mathbf{w}\|^2}{2} - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]$
Support Vector Machines
• Lagrange function:

$L_p = \dfrac{\|\mathbf{w}\|^2}{2} - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]$

• New constraints are:

$\dfrac{\partial L_p}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N} \lambda_i y_i \mathbf{x}_i$

$\dfrac{\partial L_p}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \lambda_i y_i = 0$

Still not solvable directly: many variables remain.
Support Vector Machines
• Lagrange function:

$L_p = \dfrac{\|\mathbf{w}\|^2}{2} - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]$

• From the Karush-Kuhn-Tucker (KKT) conditions:

$\lambda_i \ge 0$   (non-negative)

$\lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] = 0$

together with

$\mathbf{w} = \sum_{i=1}^{N} \lambda_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{N} \lambda_i y_i = 0$
Support Vector Machines

[Figure: hyperplane B1 with its margin boundaries; the samples lying on the margin boundaries are the support vectors]

• $\lambda_i \ne 0$ only for the support vectors (samples on the margin boundaries)

• $\lambda_i = 0$ for all other samples, because $\lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] = 0$ and those samples satisfy $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 > 0$

(a worked example follows below)
