
CSE 473

Pattern Recognition

Instructor:
Dr. Md. Monirul Islam
Linear Classifier

Recall from Previous Lectures

[Figure: two-class training samples plotted in the (x1, x2) plane]
Recall from Previous Lectures

[Figure: the two classes separated by the decision line $x_1 w_1 + x_2 w_2 + w_0 = 0$ in the (x1, x2) plane]
Recall from Previous Lectures

In vector form, the decision boundary is

$\mathbf{w} \cdot \mathbf{x} + w_0 = 0$, i.e. $\mathbf{w}^T \mathbf{x} + w_0 = 0$

where $\mathbf{x} = [x_1, x_2]^T$ and $\mathbf{w} = [w_1, w_2]^T$.
The Perceptron Algorithm
– Use the training set to learn w
– Steps
  – Initialize w
  – Classify all training samples using the current w
  – Find the new w using

    $w(t+1) = w(t) - \rho_t \sum_{x \in Y} \delta_x\, x$

    where $Y$ is the set of training samples misclassified by $w(t)$ and
    $\delta_x = -1$ if $x \in \omega_1$, $\delta_x = +1$ if $x \in \omega_2$

– Repeat until w converges (a code sketch follows below)
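As a concrete illustration of the steps above, here is a minimal batch-perceptron sketch in Python/NumPy. It assumes augmented feature vectors (a trailing 1 standing in for the bias w0), labels coded +1 for ω1 and -1 for ω2, and a fixed learning rate rho; the function name and defaults are illustrative, not part of the slides.

```python
import numpy as np

def batch_perceptron(X, y, rho=0.1, max_iter=1000):
    """Batch perceptron: X is (N, l+1) augmented feature vectors,
    y holds +1 for class w1 and -1 for class w2 (illustrative encoding)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])                  # initialize w
    for _ in range(max_iter):
        scores = X @ w                        # classify all samples with current w
        mis = y * scores <= 0                 # Y = set of misclassified samples
        if not mis.any():                     # w has converged
            return w
        # delta_x = -y_i for a misclassified x_i, so
        # w(t+1) = w(t) - rho * sum_{x in Y} delta_x x = w(t) + rho * sum y_i x_i
        w = w + rho * (y[mis, None] * X[mis]).sum(axis=0)
    return w
```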
Variants of Perceptron Algorithm (1)

$w(t+1) = w(t) + \rho\, x^{(t)}$, if $w^T(t)\, x^{(t)} \le 0$ and $x^{(t)} \in \omega_1$   (update)

$w(t+1) = w(t) - \rho\, x^{(t)}$, if $w^T(t)\, x^{(t)} \ge 0$ and $x^{(t)} \in \omega_2$   (update)

$w(t+1) = w(t)$ otherwise   (no update)

– It is a reward and punishment type of algorithm (sketched in code below)
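A sketch of this online reward-and-punishment variant, cycling through the samples one at a time; the augmented-vector and +1/-1 label conventions are the same assumptions as in the previous sketch.

```python
import numpy as np

def perceptron_reward_punishment(X, y, rho=1.0, max_epochs=100):
    """Visit samples one at a time; update only when x(t) is misclassified."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        updated = False
        for x_t, y_t in zip(X, y):
            if y_t > 0 and w @ x_t <= 0:      # x(t) in w1 but w^T x(t) <= 0
                w = w + rho * x_t             # "reward" update: add rho * x(t)
                updated = True
            elif y_t < 0 and w @ x_t >= 0:    # x(t) in w2 but w^T x(t) >= 0
                w = w - rho * x_t             # "punishment" update: subtract rho * x(t)
                updated = True
        if not updated:                       # a full pass without updates: converged
            return w
    return w
```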
Variants of Perceptron Algorithm (2)

• Initialize the weight vector w(0)
• Define the pocket vector ws and its history (score) hs
• Generate the next w(t+1). If it is better than the weight vector stored in the pocket ws, store w(t+1) in ws and update hs

– This is the pocket algorithm (sketched in code below)
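A minimal pocket-algorithm sketch under the same assumed conventions. Here ws is the "pocket" weight vector and hs its score; scoring by the number of correctly classified training samples is one common way to measure "better", the slides do not fix a particular measure.

```python
import numpy as np

def pocket_algorithm(X, y, rho=1.0, max_iter=1000, seed=0):
    """Run ordinary perceptron updates, but keep in the 'pocket' (ws, hs)
    the best weight vector seen so far."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])                  # w(0)
    ws, hs = w.copy(), 0                      # pocket weights ws and their score hs
    for _ in range(max_iter):
        i = rng.integers(len(X))              # pick a training sample at random
        if y[i] * (w @ X[i]) <= 0:            # misclassified: perceptron update
            w = w + rho * y[i] * X[i]
        score = int(np.sum(y * (X @ w) > 0))  # samples classified correctly by w
        if score > hs:                        # better than the pocket: replace it
            ws, hs = w.copy(), score
        if hs == len(X):                      # pocket separates all samples
            break
    return ws
```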
Convergence Proof of Perceptron Algorithm

[Equation slide: the proof shows that one form of the update rule "is equivalent to" another; the equations did not survive extraction]
Generalization of Perceptron Algorithm for M-Class case
• For each training vector x of class ωi, construct

$x_{i,j} = [\,0^T, \ldots, 0^T, x^T, 0^T, \ldots, 0^T, -x^T, 0^T, \ldots, 0^T\,]^T, \qquad j \ne i$

with $x^T$ at the ith block and $-x^T$ at the jth block; each $x_{i,j}$ has dimension $(l+1)M$.

• Concatenate the weight vectors:

$w = [\,w_1^T, w_2^T, \ldots, w_i^T, \ldots, w_j^T, \ldots, w_M^T\,]^T$
Generalization of Perceptron Algorithm for M-Class case

$x_{i,j} = [\,0^T, \ldots, x^T, \ldots, -x^T, \ldots, 0^T\,]^T$

$w = [\,w_1^T, w_2^T, \ldots, w_i^T, \ldots, w_j^T, \ldots, w_M^T\,]^T$

• Use a single Perceptron to solve $w^T x_{i,j} > 0$ for all constructed vectors
• Parameters:
  • the feature dimension becomes (l+1)M
  • all N(M-1) constructed training vectors are required to lie on the positive side
  • this reorganization is known as Kesler's construction (a construction sketch follows below)
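A sketch of Kesler's construction in Python/NumPy: for every training vector x of class ωi and every competing class j ≠ i it builds the (l+1)M-dimensional vector x_{i,j} with +x in the ith block and -x in the jth block, so a single two-class perceptron can be run on the N(M-1) resulting vectors, all of which must end up on the positive side. The helper name is illustrative.

```python
import numpy as np

def kesler_construction(X, labels, M):
    """X: (N, l+1) augmented feature vectors; labels: class indices 0..M-1.
    For each sample of class i and each j != i, place +x in block i and
    -x in block j of a zero vector of length (l+1)*M."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    out = []
    for x, i in zip(X, labels):
        for j in range(M):
            if j == i:
                continue
            z = np.zeros(M * d)
            z[i * d:(i + 1) * d] = x          # +x at the ith block
            z[j * d:(j + 1) * d] = -x         # -x at the jth block
            out.append(z)
    return np.asarray(out)                    # N*(M-1) vectors, all to be on the positive side
```

The stacked weight vector found by the perceptron on these vectors can then be split back into the M class weight vectors w_1, ..., w_M of length l+1 each.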
Two-Class case again

Suppose we have N training samples.

[Figure: the N two-class training samples in feature space]

• Find a linear hyperplane (decision boundary) that will separate the data
Two-Class case again

[Figure: several candidate separating hyperplanes, labelled B1 and B2]

• One possible solution: B1
• Another possible solution: B2
• Many other possible solutions exist
Two-Class case again

[Figure: the two hyperplanes B1 and B2 on the same data]

• Which one is better? B1 or B2?
• How do you define better?
Two-Class case again

[Figure: B1 with margin boundaries b11, b12 and B2 with margin boundaries b21, b22]

• Find the hyperplane that maximizes the margin => B1 is better than B2
Two-Class case again

[Figure: hyperplane B1 with its margin boundaries b11 and b12]

Decision boundary: $\mathbf{w} \cdot \mathbf{x} + b = 0$

Margin boundaries: $\mathbf{w} \cdot \mathbf{x} + b = +1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$

$f(\mathbf{x}) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \le -1 \end{cases}
\qquad \text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$

(a short derivation of the margin follows below)
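A one-line check of the margin formula: take any point $\mathbf{x}_+$ on the boundary $\mathbf{w} \cdot \mathbf{x} + b = +1$ and any point $\mathbf{x}_-$ on $\mathbf{w} \cdot \mathbf{x} + b = -1$, subtract the two equations, and project the connecting vector onto the unit normal $\mathbf{w}/\|\mathbf{w}\|$:

$\mathbf{w} \cdot (\mathbf{x}_+ - \mathbf{x}_-) = (1 - b) - (-1 - b) = 2
\;\Longrightarrow\;
\text{Margin} = \dfrac{\mathbf{w}}{\|\mathbf{w}\|} \cdot (\mathbf{x}_+ - \mathbf{x}_-) = \dfrac{2}{\|\mathbf{w}\|}$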
Two-Class case again

• We want to maximize: $\text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$

– subject to the following constraints:

$y_i = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \end{cases} \qquad \forall i$

The classifier is known as the Support Vector Machine.
Support Vector Machine

• We want to maximize: $\text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$

– which is equivalent to minimizing: $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$

– subject to the following constraints:

$y_i = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \end{cases}$
Support Vector Machine

The expression
$y_i = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \end{cases}$
can be written as $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$.

• We can say:

– minimize: $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$

– subject to: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$, or equivalently $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0$

(a numerical sketch follows below)
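The primal problem just stated is a quadratic program. Below is a minimal sketch of solving it numerically with SciPy's general-purpose SLSQP solver; this is purely an illustration under the assumption of linearly separable data, the slides do not prescribe a particular solver, and dedicated QP/SVM solvers would normally be used.

```python
import numpy as np
from scipy.optimize import minimize

def svm_primal(X, y):
    """Minimize ||w||^2 / 2 subject to y_i (w . x_i + b) - 1 >= 0.
    X is (N, l) raw features (no augmentation), y in {+1, -1}."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n_features = X.shape[1]

    def objective(params):                     # params = [w_1, ..., w_l, b]
        w = params[:n_features]
        return 0.5 * np.dot(w, w)

    constraints = [{"type": "ineq",            # "ineq" means fun(params) >= 0
                    "fun": lambda p, xi=xi, yi=yi:
                        yi * (np.dot(p[:n_features], xi) + p[-1]) - 1.0}
                   for xi, yi in zip(X, y)]

    x0 = np.zeros(n_features + 1)
    res = minimize(objective, x0, method="SLSQP", constraints=constraints)
    w, b = res.x[:n_features], res.x[-1]
    return w, b
```

For linearly separable data, svm_primal returns the maximum-margin w and b; any separable toy X, y pair can be used to try it.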
Support Vector Machine

• $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$ is a quadratic function

• Solving for $\mathbf{w}$ and $b$ under the constraints is not easy

• What happens if $\mathbf{w} = 0$? The unconstrained minimum is $\mathbf{w} = 0$, but then $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0$ becomes $y_i b \ge 1$ for every $i$, which is infeasible when both classes are present
Support Vector Machines

– minimize: $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$

– subject to: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \quad \forall i$

• Use the Lagrange function:

$L_p = \dfrac{\|\mathbf{w}\|^2}{2} - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]$
Support Vector Machines
• Lagrange function:

$L_p = \dfrac{\|\mathbf{w}\|^2}{2} - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]$

• New constraints are:

$\dfrac{\partial L_p}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N} \lambda_i y_i \mathbf{x}_i$

$\dfrac{\partial L_p}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \lambda_i y_i = 0$

Still not solvable directly: many variables remain.
Support Vector Machines
• Lagrange function:

$L_p = \dfrac{\|\mathbf{w}\|^2}{2} - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]$

• From the Karush-Kuhn-Tucker (KKT) conditions:

$\lambda_i \ge 0$   (non-negative)

$\lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] = 0$

together with

$\mathbf{w} = \sum_{i=1}^{N} \lambda_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{N} \lambda_i y_i = 0$
Support Vector Machines

[Figure: hyperplane B1 with its margin boundaries; the samples lying on the margin boundaries are the support vectors]

• $\lambda_i \ne 0$ only for the support vectors (samples on the margin boundaries)

• $\lambda_i = 0$ for all other samples, because $\lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] = 0$ and those samples satisfy $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 > 0$

(a worked example follows below)
