
SVM Incremental Learning, Adaptation and Optimization

Chris Diehl                       Gert Cauwenberghs
Applied Physics Laboratory        ECE Department
Johns Hopkins University          Johns Hopkins University
Chris.Diehl@jhuapl.edu            gert@jhu.edu

July 23, 2003

IJCNN – Incremental Learning Workshop
Motivation

• What is the objective of machine learning?
  – To identify a model from the data that generalizes well
• When learning incrementally, the search for a good model involves repeatedly
  – Selecting a hypothesis class (HC)
  – Searching the HC by minimizing an objective function over the model parameter space
  – Evaluating the resulting model's generalization performance
Contributions

• Unified framework for incremental learning and adaptation of SVM classifiers that supports
  – Learning and unlearning of individual or multiple examples
  – Adaptation of the current SVM to changes in regularization and kernel parameters
  – Generalization performance assessment through exact and approximate leave-one-out error estimation
• Generalization of the incremental learning algorithm presented in G. Cauwenberghs and T. Poggio, "Incremental and Decremental SVM Learning," Advances in Neural Information Processing Systems (NIPS 2000), vol. 13, 2001.
SVM Learning for Classification

• Quadratic Programming Problem (Dual Form)

    min_{0 ≤ α_i ≤ C}  W = (1/2) Σ_{i,j=1}^{N} α_i Q_ij α_j − Σ_{i=1}^{N} α_i + b Σ_{i=1}^{N} y_i α_i

    Q_ij = y_i y_j K(x_i, x_j; θ)        f(x) = Σ_{i=1}^{N} y_i α_i K(x_i, x; θ) + b

  (these quantities are sketched in code after this slide)

• Model Selection
  – Repeatedly select (C, θ)
  – Solve the quadratic program
  – Perform cross-validation (approximate or exact)
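For concreteness, here is a minimal Python sketch of the dual objective and decision function (not part of the original slides); it assumes an RBF kernel K(x_i, x_j; θ) = exp(−θ‖x_i − x_j‖²) and illustrative variable names.

    import numpy as np

    def rbf_kernel(X1, X2, theta):
        # Assumed kernel form: K(x_i, x_j; theta) = exp(-theta * ||x_i - x_j||^2)
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-theta * d2)

    def dual_objective(alpha, b, X, y, theta):
        # W = 1/2 sum_ij alpha_i Q_ij alpha_j - sum_i alpha_i + b sum_i y_i alpha_i
        Q = np.outer(y, y) * rbf_kernel(X, X, theta)
        return 0.5 * alpha @ Q @ alpha - alpha.sum() + b * (y @ alpha)

    def decision_function(x, alpha, b, X, y, theta):
        # f(x) = sum_i y_i alpha_i K(x_i, x; theta) + b
        k = rbf_kernel(X, x[None, :], theta).ravel()
        return (y * alpha) @ k + b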
Karush-Kuhn-Tucker (KKT) Conditions

    g_i = ∂W/∂α_i = y_i f(x_i) − 1,  with  g_i > 0 ∀ i ∈ R,   g_i = 0 ∀ i ∈ S,   g_i < 0 ∀ i ∈ E

    h = ∂W/∂b = Σ_{j=1}^{N} y_j α_j = 0

R: Reserve Vectors (α = 0)     S: Margin Vectors (0 ≤ α ≤ C)     E: Error Vectors (α = C)
(a sketch of this partition in code follows below)

[Figure: examples of both classes plotted against the margins f(x) = 1 and f(x) = −1, illustrating reserve, margin and error vectors]
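The partition of the training set implied by the KKT conditions can be computed directly from the g_i values; the following is an illustrative sketch, not the authors' implementation.

    def partition_examples(g, tol=1e-8):
        # g[i] = y_i f(x_i) - 1 for each training example i
        # Reserve vectors R: g_i > 0 (alpha_i = 0, strictly beyond the margin)
        # Margin vectors  S: g_i = 0 (exactly on the margin)
        # Error vectors   E: g_i < 0 (alpha_i = C, inside the margin or misclassified)
        R = [i for i, gi in enumerate(g) if gi > tol]
        S = [i for i, gi in enumerate(g) if abs(gi) <= tol]
        E = [i for i, gi in enumerate(g) if gi < -tol]
        return R, S, E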
Incrementing Additional Examples into the Solution

    f(x) = Σ_{i∈E} y_i α_i K(x_i, x; θ) + Σ_{j∈S} y_j α_j K(x_j, x; θ) + b

    f'(x) = Σ_{i∈E} y_i α_i K(x_i, x; θ) + Σ_{j∈S} y_j (α_j + ∆α_j) K(x_j, x; θ)
            + Σ_{k∈U} y_k ∆α_k K(x_k, x; θ) + b + ∆b

U: Unlearned Vectors (0 ≤ α < C)

Strategy: Preserve the KKT conditions for the examples in R, S and E while incrementing the examples in U into the solution (a code sketch of f'(x) follows below).

[Figure: decision boundary and margins f(x) = 1 and f(x) = −1, showing unlearned examples being incremented into the solution]
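A minimal sketch of evaluating the perturbed decision function f'(x) at a single point, assuming the kernel values k[i] = K(x_i, x; θ) against the training set have already been computed (illustrative helper, not the authors' code).

    import numpy as np

    def perturbed_decision_value(k, y, alpha, b, S, U, d_alpha_S, d_alpha_U, d_b):
        # Margin-vector coefficients (indices S) move by d_alpha_S, unlearned-vector
        # coefficients (indices U) move by d_alpha_U, the bias moves by d_b; all other
        # coefficients (error and reserve vectors) are held fixed.
        alpha_new = np.asarray(alpha, dtype=float).copy()
        alpha_new[S] += d_alpha_S
        alpha_new[U] += d_alpha_U
        return (y * alpha_new) @ k + (b + d_b)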
Part 1: Computing Coefficient and Margin Sensitivities

• For small perturbations of the unlearned vector coefficients, the error and reserve vectors will not change category
• Need to modify {∆α_k ∀ k ∈ S} and ∆b so that the margin vectors remain on the margin
• Transformation from original SVM to final SVM will be controlled via the perturbation parameter p
Part 1: Computing Coefficient and Margin Sensitivities

• The coefficient sensitivities {β_k = ∆α_k/∆p ∀ k ∈ S} and β = ∆b/∆p will be computed by enforcing the constraints {γ_i = ∆g_i/∆p = 0 ∀ i ∈ S} and ∆h/∆p = 0
• Margin sensitivities {γ_i ∀ i ∈ E, R, U} can then be computed based on the coefficient sensitivities (see the sketch after this slide)
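One way to realize this step is to solve the linear system obtained from ∆g_i/∆p = 0 (i ∈ S) and ∆h/∆p = 0, following the structure of the Cauwenberghs and Poggio algorithm. The sketch below uses assumed names: Q is the full N×N matrix with Q_ij = y_i y_j K(x_i, x_j; θ), and dalpha_U_dp prescribes how the unlearned coefficients change per unit of p. It is an illustration, not the authors' code.

    import numpy as np

    def coefficient_sensitivities(Q, y, S, U, dalpha_U_dp):
        # Enforce dg_i/dp = 0 for i in S and dh/dp = 0:
        #   [ 0     y_S^T ] [ beta   ]     [ y_U . dalpha_U_dp ]
        #   [ y_S   Q_SS  ] [ beta_S ] = - [ Q_SU  dalpha_U_dp ]
        y_S = y[S].astype(float)
        jac = np.block([[np.zeros((1, 1)), y_S[None, :]],
                        [y_S[:, None], Q[np.ix_(S, S)]]])
        rhs = -np.concatenate(([y[U] @ dalpha_U_dp],
                               Q[np.ix_(S, U)] @ dalpha_U_dp))
        sol = np.linalg.solve(jac, rhs)
        return sol[0], sol[1:]          # beta = db/dp, beta_S = dalpha_S/dp

    def margin_sensitivities(Q, y, S, U, others, beta, beta_S, dalpha_U_dp):
        # gamma_i = dg_i/dp for examples outside S (error, reserve and unlearned vectors)
        return (Q[np.ix_(others, U)] @ dalpha_U_dp
                + Q[np.ix_(others, S)] @ beta_S
                + y[others] * beta)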
Part 2: Bookkeeping

• Fundamental Assumption
  – No examples change category (e.g. margin to error vector) during a perturbation
• Perturbation Process
  – Using the coefficient and margin sensitivities, compute the smallest ∆p that leads to a category change (see the sketch after this list)
      · Margin vectors: track changes in α
      · Error, reserve, unlearned vectors: track changes in g
  – Update the example categories and classifier parameters
  – Recompute the coefficient and margin sensitivities and repeat the process until p = 1
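A sketch of the step-size computation: given the current coefficients, margins and sensitivities, find the largest admissible increase in p before some example would change category. Variable names are illustrative and the handling of ties and degenerate cases is omitted.

    def smallest_step(alpha, g, beta_S, gamma, S, others, C, p, tol=1e-12):
        # Margin vectors: alpha_k + beta_k * dp must stay within [0, C].
        # Error, reserve and unlearned vectors: g_i + gamma_i * dp must not cross zero.
        dp = 1.0 - p                               # never step past p = 1
        for k, bk in zip(S, beta_S):
            if bk > tol:
                dp = min(dp, (C - alpha[k]) / bk)  # heading toward alpha = C
            elif bk < -tol:
                dp = min(dp, -alpha[k] / bk)       # heading toward alpha = 0
        for i, rate in zip(others, gamma):
            if g[i] * rate < 0:                    # g_i is moving toward zero
                dp = min(dp, -g[i] / rate)
        return dp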
Decremental Learning and Cross-Validation

• The perturbation process is fully reversible!
• This gives one the option of unlearning individual or multiple examples to perform exact leave-one-out (LOO) or k-fold cross-validation (see the sketch after this list)
• The LOO approximation based on Vapnik and Chapelle's span rule is easily computed using the margin sensitivities
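A high-level sketch of exact LOO error estimation via decremental unlearning; train_incremental, unlearn and relearn are hypothetical helpers standing in for the incremental/decremental machinery described above.

    def exact_loo_error(train_incremental, unlearn, relearn, data):
        # data: list of (x_i, y_i) pairs
        errors = 0
        svm = train_incremental(data)              # full incremental solution
        for i in range(len(data)):
            reduced = unlearn(svm, i)              # decrement example i out of the solution
            x_i, y_i = data[i]
            if y_i * reduced.decision_function(x_i) < 0:
                errors += 1                        # held-out example misclassified
            svm = relearn(reduced, i)              # reversibility: re-increment example i
        return errors / len(data)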
Regularization and Kernel Parameter Perturbation Strategies

• Regularization Parameter Perturbation
  – Involves incrementing/decrementing error vector coefficients
  – Bookkeeping changes slightly since the regularization parameters change during a perturbation of the SVM
• Kernel Parameter Perturbation
  – First modify the kernel parameter
  – Then correct violations of the KKT conditions (see the sketch after this list)
  – Becomes increasingly expensive as the number of margin vectors in the original SVM increases
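An outline of the kernel-parameter strategy in code, with hypothetical helpers (decision_values, resatisfy_kkt) standing in for the perturbation machinery above; this is a sketch of the two-step idea, not the authors' implementation.

    def perturb_kernel_parameter(svm, theta_new, resatisfy_kkt):
        # Step 1: swap in the new kernel parameter, which in general breaks the KKT conditions.
        svm.theta = theta_new
        svm.g = svm.y * svm.decision_values(svm.X) - 1   # recompute margins under the new kernel
        # Step 2: run the incremental perturbation/bookkeeping process to correct the violations.
        return resatisfy_kkt(svm)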
