
SVM Incremental Learning, Adaptation and Optimization

Chris Diehl                       Gert Cauwenberghs
Applied Physics Laboratory        ECE Department
Johns Hopkins University          Johns Hopkins University
Chris.Diehl@jhuapl.edu            gert@jhu.edu

July 23, 2003

IJCNN – Incremental Learning Workshop
Motivation

• What is the objective of machine learning?
  – To identify a model from the data that generalizes well
• When learning incrementally, the search for a good model involves repeatedly
  – Selecting a hypothesis class (HC)
  – Searching the HC by minimizing an objective function over the model parameter space
  – Evaluating the resulting model's generalization performance
Contributions

• Unified framework for incremental learning and adaptation of SVM classifiers that supports
  – Learning and unlearning of individual or multiple examples
  – Adaptation of the current SVM to changes in regularization and kernel parameters
  – Generalization performance assessment through exact and approximate leave-one-out error estimation
• Generalization of the incremental learning algorithm presented in G. Cauwenberghs and T. Poggio, "Incremental and Decremental SVM Learning," Advances in Neural Information Processing Systems (NIPS 2000), vol. 13, 2001.
SVM Learning for Classification

• Quadratic Programming Problem (Dual Form)

    min_{0 ≤ α_i ≤ C}  W = (1/2) Σ_{i,j=1}^{N} α_i Q_ij α_j − Σ_{i=1}^{N} α_i + b Σ_{i=1}^{N} y_i α_i

    Q_ij = y_i y_j K(x_i, x_j; θ)        f(x) = Σ_{i=1}^{N} y_i α_i K(x_i, x; θ) + b

  (these quantities are sketched in code after this slide)

• Model Selection
  – Repeatedly select (C, θ)
  – Solve the quadratic program
  – Perform cross-validation (approximate or exact)
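For concreteness, here is a minimal Python sketch of the dual objective and decision function (not part of the original slides); it assumes an RBF kernel K(x_i, x_j; θ) = exp(−θ‖x_i − x_j‖²) and illustrative variable names.

    import numpy as np

    def rbf_kernel(X1, X2, theta):
        # Assumed kernel form: K(x_i, x_j; theta) = exp(-theta * ||x_i - x_j||^2)
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-theta * d2)

    def dual_objective(alpha, b, X, y, theta):
        # W = 1/2 sum_ij alpha_i Q_ij alpha_j - sum_i alpha_i + b sum_i y_i alpha_i
        Q = np.outer(y, y) * rbf_kernel(X, X, theta)
        return 0.5 * alpha @ Q @ alpha - alpha.sum() + b * (y @ alpha)

    def decision_function(x, alpha, b, X, y, theta):
        # f(x) = sum_i y_i alpha_i K(x_i, x; theta) + b
        k = rbf_kernel(X, x[None, :], theta).ravel()
        return (y * alpha) @ k + b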
Karush-Kuhn-Tucker (KKT) Conditions

    g_i = ∂W/∂α_i = y_i f(x_i) − 1,  with  g_i > 0 ∀ i ∈ R,   g_i = 0 ∀ i ∈ S,   g_i < 0 ∀ i ∈ E

    h = ∂W/∂b = Σ_{j=1}^{N} y_j α_j = 0

R: Reserve Vectors (α = 0)     S: Margin Vectors (0 ≤ α ≤ C)     E: Error Vectors (α = C)
(a sketch of this partition in code follows below)

[Figure: examples of both classes plotted against the margins f(x) = 1 and f(x) = −1, illustrating reserve, margin and error vectors]
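The partition of the training set implied by the KKT conditions can be computed directly from the g_i values; the following is an illustrative sketch, not the authors' implementation.

    def partition_examples(g, tol=1e-8):
        # g[i] = y_i f(x_i) - 1 for each training example i
        # Reserve vectors R: g_i > 0 (alpha_i = 0, strictly beyond the margin)
        # Margin vectors  S: g_i = 0 (exactly on the margin)
        # Error vectors   E: g_i < 0 (alpha_i = C, inside the margin or misclassified)
        R = [i for i, gi in enumerate(g) if gi > tol]
        S = [i for i, gi in enumerate(g) if abs(gi) <= tol]
        E = [i for i, gi in enumerate(g) if gi < -tol]
        return R, S, E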
Incrementing Additional Examples into the Solution

    f(x) = Σ_{i∈E} y_i α_i K(x_i, x; θ) + Σ_{j∈S} y_j α_j K(x_j, x; θ) + b

    f'(x) = Σ_{i∈E} y_i α_i K(x_i, x; θ) + Σ_{j∈S} y_j (α_j + ∆α_j) K(x_j, x; θ)
            + Σ_{k∈U} y_k ∆α_k K(x_k, x; θ) + b + ∆b

U: Unlearned Vectors (0 ≤ α < C)

Strategy: Preserve the KKT conditions for the examples in R, S and E while incrementing the examples in U into the solution (a code sketch of f'(x) follows below).

[Figure: decision boundary and margins f(x) = 1 and f(x) = −1, showing unlearned examples being incremented into the solution]
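A minimal sketch of evaluating the perturbed decision function f'(x) at a single point, assuming the kernel values k[i] = K(x_i, x; θ) against the training set have already been computed (illustrative helper, not the authors' code).

    import numpy as np

    def perturbed_decision_value(k, y, alpha, b, S, U, d_alpha_S, d_alpha_U, d_b):
        # Margin-vector coefficients (indices S) move by d_alpha_S, unlearned-vector
        # coefficients (indices U) move by d_alpha_U, the bias moves by d_b; all other
        # coefficients (error and reserve vectors) are held fixed.
        alpha_new = np.asarray(alpha, dtype=float).copy()
        alpha_new[S] += d_alpha_S
        alpha_new[U] += d_alpha_U
        return (y * alpha_new) @ k + (b + d_b)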
Part 1: Computing Coefficient and Margin Sensitivities

• For small perturbations of the unlearned vector coefficients, the error and reserve vectors will not change category
• Need to modify {∆α_k ∀ k ∈ S} and ∆b so that the margin vectors remain on the margin
• Transformation from original SVM to final SVM will be controlled via the perturbation parameter p
Part 1: Computing Coefficient and Margin Sensitivities

• The coefficient sensitivities {β_k = ∆α_k/∆p ∀ k ∈ S} and β = ∆b/∆p will be computed by enforcing the constraints {γ_i = ∆g_i/∆p = 0 ∀ i ∈ S} and ∆h/∆p = 0
• Margin sensitivities {γ_i ∀ i ∈ E, R, U} can then be computed based on the coefficient sensitivities (see the sketch after this slide)
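One way to realize this step is to solve the linear system obtained from ∆g_i/∆p = 0 (i ∈ S) and ∆h/∆p = 0, following the structure of the Cauwenberghs and Poggio algorithm. The sketch below uses assumed names: Q is the full N×N matrix with Q_ij = y_i y_j K(x_i, x_j; θ), and dalpha_U_dp prescribes how the unlearned coefficients change per unit of p. It is an illustration, not the authors' code.

    import numpy as np

    def coefficient_sensitivities(Q, y, S, U, dalpha_U_dp):
        # Enforce dg_i/dp = 0 for i in S and dh/dp = 0:
        #   [ 0     y_S^T ] [ beta   ]     [ y_U . dalpha_U_dp ]
        #   [ y_S   Q_SS  ] [ beta_S ] = - [ Q_SU  dalpha_U_dp ]
        y_S = y[S].astype(float)
        jac = np.block([[np.zeros((1, 1)), y_S[None, :]],
                        [y_S[:, None], Q[np.ix_(S, S)]]])
        rhs = -np.concatenate(([y[U] @ dalpha_U_dp],
                               Q[np.ix_(S, U)] @ dalpha_U_dp))
        sol = np.linalg.solve(jac, rhs)
        return sol[0], sol[1:]          # beta = db/dp, beta_S = dalpha_S/dp

    def margin_sensitivities(Q, y, S, U, others, beta, beta_S, dalpha_U_dp):
        # gamma_i = dg_i/dp for examples outside S (error, reserve and unlearned vectors)
        return (Q[np.ix_(others, U)] @ dalpha_U_dp
                + Q[np.ix_(others, S)] @ beta_S
                + y[others] * beta)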
Part 2: Bookkeeping

• Fundamental Assumption
  – No examples change category (e.g. margin to error vector) during a perturbation
• Perturbation Process
  – Using the coefficient and margin sensitivities, compute the smallest ∆p that leads to a category change (see the sketch after this list)
      · Margin vectors: track changes in α
      · Error, reserve, unlearned vectors: track changes in g
  – Update the example categories and classifier parameters
  – Recompute the coefficient and margin sensitivities and repeat the process until p = 1
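A sketch of the step-size computation: given the current coefficients, margins and sensitivities, find the largest admissible increase in p before some example would change category. Variable names are illustrative and the handling of ties and degenerate cases is omitted.

    def smallest_step(alpha, g, beta_S, gamma, S, others, C, p, tol=1e-12):
        # Margin vectors: alpha_k + beta_k * dp must stay within [0, C].
        # Error, reserve and unlearned vectors: g_i + gamma_i * dp must not cross zero.
        dp = 1.0 - p                               # never step past p = 1
        for k, bk in zip(S, beta_S):
            if bk > tol:
                dp = min(dp, (C - alpha[k]) / bk)  # heading toward alpha = C
            elif bk < -tol:
                dp = min(dp, -alpha[k] / bk)       # heading toward alpha = 0
        for i, rate in zip(others, gamma):
            if g[i] * rate < 0:                    # g_i is moving toward zero
                dp = min(dp, -g[i] / rate)
        return dp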
Decremental Learning and Cross-Validation

• The perturbation process is fully reversible!
• This gives one the option of unlearning individual or multiple examples to perform exact leave-one-out (LOO) or k-fold cross-validation (see the sketch after this list)
• The LOO approximation based on Vapnik and Chapelle's span rule is easily computed using the margin sensitivities
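A high-level sketch of exact LOO error estimation via decremental unlearning; train_incremental, unlearn and relearn are hypothetical helpers standing in for the incremental/decremental machinery described above.

    def exact_loo_error(train_incremental, unlearn, relearn, data):
        # data: list of (x_i, y_i) pairs
        errors = 0
        svm = train_incremental(data)              # full incremental solution
        for i in range(len(data)):
            reduced = unlearn(svm, i)              # decrement example i out of the solution
            x_i, y_i = data[i]
            if y_i * reduced.decision_function(x_i) < 0:
                errors += 1                        # held-out example misclassified
            svm = relearn(reduced, i)              # reversibility: re-increment example i
        return errors / len(data)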
Regularization and Kernel Parameter Perturbation Strategies

• Regularization Parameter Perturbation
  – Involves incrementing/decrementing error vector coefficients
  – Bookkeeping changes slightly since the regularization parameters change during a perturbation of the SVM
• Kernel Parameter Perturbation
  – First modify the kernel parameter
  – Then correct violations of the KKT conditions (see the sketch after this list)
  – Becomes increasingly expensive as the number of margin vectors in the original SVM increases
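An outline of the kernel-parameter strategy in code, with hypothetical helpers (decision_values, resatisfy_kkt) standing in for the perturbation machinery above; this is a sketch of the two-step idea, not the authors' implementation.

    def perturb_kernel_parameter(svm, theta_new, resatisfy_kkt):
        # Step 1: swap in the new kernel parameter, which in general breaks the KKT conditions.
        svm.theta = theta_new
        svm.g = svm.y * svm.decision_values(svm.X) - 1   # recompute margins under the new kernel
        # Step 2: run the incremental perturbation/bookkeeping process to correct the violations.
        return resatisfy_kkt(svm)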
