(Suykens J.) A Short Introduction To Support Vecto
Overview
Disadvantages of classical neural nets
SVM properties and standard SVM classifier
Related kernel-based learning methods
Use of the kernel trick (Mercer Theorem)
LS-SVMs: extending the SVM framework
Towards a next generation of universally applicable models?
The problem of learning and generalization
Classical MLPs
[Figure: neuron with inputs x_1, ..., x_n, weights w_1, ..., w_n, bias b, and activation function h(·)]
Multilayer Perceptron (MLP) properties:
[Figure: cost function versus weights, MLP compared with SVM]
[Figure: two plots in the (x_1, x_2) plane]
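[Figure: classes "+" and "x" mapped from the input space to the feature space by φ(x)]
[Figure: primal network with hidden units φ_1(x), ..., φ_{n_h}(x), weights w_1, ..., w_{n_h}, and output y(x)]
Dual space (non-parametric): estimate $\alpha \in \mathbb{R}^N$
$$y(x) = \mathrm{sign}\Big[\sum_{i=1}^{\#sv} \alpha_i y_i K(x, x_i) + b\Big]$$
[Figure: dual network with hidden units K(x, x_1), ..., K(x, x_{#sv}), weights α_1, ..., α_{#sv}, and output y(x)]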
Introduction to SVM and kernelbased learning Johan Suykens ESANN 2003
Classifier:
$$y(x) = \mathrm{sign}[w^T \varphi(x) + b]$$
with $\varphi(\cdot): \mathbb{R}^n \to \mathbb{R}^{n_h}$ a mapping to a high-dimensional feature space (which can be infinite-dimensional!)
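For separable data, assume
$$\begin{cases} w^T \varphi(x_i) + b \ge +1, & \text{if } y_i = +1 \\ w^T \varphi(x_i) + b \le -1, & \text{if } y_i = -1 \end{cases} \quad \Longleftrightarrow \quad y_i[w^T \varphi(x_i) + b] \ge 1, \ \forall i$$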
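Optimization problem (non-separable case):
$$\min_{w,b,\xi} J(w,\xi) = \frac{1}{2} w^T w + c \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad \begin{cases} y_i[w^T \varphi(x_i) + b] \ge 1 - \xi_i \\ \xi_i \ge 0, \quad i = 1, ..., N \end{cases}$$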
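To make the roles of the slack variables and the constant c concrete, here is a minimal sketch that evaluates the primal cost J at a fixed (w, b). It assumes the identity feature map φ(x) = x and uses the smallest feasible slack for each point, ξ_i = max(0, 1 − y_i[w^T φ(x_i) + b]). The data set, the choice of (w, b), and the function names are illustrative assumptions, not taken from the slides.

```python
def decision_value(w, b, x):
    """w^T phi(x) + b, with the identity feature map phi(x) = x (an assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def primal_objective(w, b, X, y, c):
    """Return (J, slacks) for the soft-margin primal problem at a fixed (w, b).

    Each slack is the smallest value satisfying both constraints:
    xi_i = max(0, 1 - y_i * (w^T phi(x_i) + b)).
    """
    slacks = [max(0.0, 1.0 - yi * decision_value(w, b, xi)) for xi, yi in zip(X, y)]
    J = 0.5 * sum(wi * wi for wi in w) + c * sum(slacks)
    return J, slacks


# Hypothetical toy data: four points outside the margin, one point inside it.
X = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.5, 0.0)]
y = [-1, -1, 1, 1, 1]
J, slacks = primal_objective((1.0, 0.0), 0.0, X, y, c=10.0)
print(J, slacks)
```

Only the last point violates the margin, so it alone contributes a nonzero slack. A larger c penalizes that violation more heavily relative to the margin term.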
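Lagrangian, with multipliers $\alpha_i \ge 0$ and $\nu_i \ge 0$:
$$L(w,b,\xi;\alpha,\nu) = J(w,\xi) - \sum_{i=1}^{N} \alpha_i \{ y_i[w^T \varphi(x_i) + b] - 1 + \xi_i \} - \sum_{i=1}^{N} \nu_i \xi_i$$
Find the saddle point:
$$\max_{\alpha,\nu} \ \min_{w,b,\xi} \ L(w,b,\xi;\alpha,\nu)$$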
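One obtains
$$\frac{\partial L}{\partial w} = 0 \ \to \ w = \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i)$$
$$\frac{\partial L}{\partial b} = 0 \ \to \ \sum_{i=1}^{N} \alpha_i y_i = 0$$
$$\frac{\partial L}{\partial \xi_i} = 0 \ \to \ 0 \le \alpha_i \le c, \quad i = 1, ..., N$$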
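The step from these conditions to the dual cost is left implicit on the slide. The following worked derivation is a sketch of the standard argument; the multiplier names α_i and ν_i and the shorthand K(x_i, x_j) = φ(x_i)^T φ(x_j) are notation assumed here.

```latex
\begin{aligned}
L &= \tfrac{1}{2} w^T w + c \sum_{i=1}^{N} \xi_i
     - \sum_{i=1}^{N} \alpha_i \{ y_i [ w^T \varphi(x_i) + b ] - 1 + \xi_i \}
     - \sum_{i=1}^{N} \nu_i \xi_i \\
  &= \tfrac{1}{2} w^T w - w^T \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i)
     - b \sum_{i=1}^{N} \alpha_i y_i + \sum_{i=1}^{N} \alpha_i
     + \sum_{i=1}^{N} (c - \alpha_i - \nu_i)\, \xi_i \\[4pt]
\frac{\partial L}{\partial \xi_i} = 0
  &\;\Rightarrow\; c - \alpha_i - \nu_i = 0,
  \quad\text{and } \alpha_i \ge 0,\ \nu_i \ge 0 \text{ give } 0 \le \alpha_i \le c \\[4pt]
\text{Insert } w = \sum_{i} \alpha_i y_i \varphi(x_i)
  &\text{ and } \sum_{i} \alpha_i y_i = 0: \\
L &= -\tfrac{1}{2} w^T w + \sum_{i=1}^{N} \alpha_i
   = -\tfrac{1}{2} \sum_{i,j=1}^{N} y_i y_j\, \varphi(x_i)^T \varphi(x_j)\, \alpha_i \alpha_j
     + \sum_{j=1}^{N} \alpha_j \\
  &= -\tfrac{1}{2} \sum_{i,j=1}^{N} y_i y_j\, K(x_i, x_j)\, \alpha_i \alpha_j
     + \sum_{j=1}^{N} \alpha_j
\end{aligned}
```

The feature map enters the final expression only through inner products, which is what allows a kernel to replace φ.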
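Dual problem:
$$\max_{\alpha} Q(\alpha) = -\frac{1}{2} \sum_{i,j=1}^{N} y_i y_j K(x_i, x_j) \, \alpha_i \alpha_j + \sum_{j=1}^{N} \alpha_j \quad \text{s.t.} \quad \begin{cases} \sum_{i=1}^{N} \alpha_i y_i = 0 \\ 0 \le \alpha_i \le c, \ \forall i \end{cases}$$
Resulting classifier:
$$y(x) = \mathrm{sign}\Big[\sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b\Big]$$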
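As an illustration of how the dual problem and the resulting classifier fit together, the sketch below maximizes Q(α) on a tiny data set with a simplified pairwise update in the style of SMO. This is a swapped-in technique chosen for brevity, not the solver used in the slides, and the data, the linear kernel, and all function names are assumptions made for the example.

```python
def linear_kernel(x, z):
    """K(x, z) = x^T z; any Mercer kernel could be substituted."""
    return sum(a * b for a, b in zip(x, z))


def train_dual_svm(X, y, c, kernel, sweeps=50):
    """Maximize the dual cost by exact pairwise updates (simplified SMO sketch).

    Each update changes two multipliers so that sum_i alpha_i y_i = 0 and
    0 <= alpha_i <= c stay satisfied.
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n

    def g(k):  # decision value at x_k without the bias term
        return sum(alpha[m] * y[m] * K[m][k] for m in range(n))

    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 1e-12:
                    continue  # identical points in feature space, skip the pair
                diff = (g(i) - y[i]) - (g(j) - y[j])  # E_i - E_j, the bias cancels
                if y[i] != y[j]:
                    lo = max(0.0, alpha[j] - alpha[i])
                    hi = min(c, c + alpha[j] - alpha[i])
                else:
                    lo = max(0.0, alpha[i] + alpha[j] - c)
                    hi = min(c, alpha[i] + alpha[j])
                new_j = min(hi, max(lo, alpha[j] + y[j] * diff / eta))
                new_i = alpha[i] + y[i] * y[j] * (alpha[j] - new_j)
                alpha[i] = min(c, max(0.0, new_i))  # guard against round-off
                alpha[j] = new_j

    # Bias from support vectors strictly inside the box (0 < alpha_k < c).
    free = [k for k in range(n) if 1e-8 < alpha[k] < c - 1e-8]
    b = sum(y[k] - g(k) for k in free) / len(free) if free else 0.0
    return alpha, b


def predict(x, X, y, alpha, b, kernel):
    """y(x) = sign[sum_i alpha_i y_i K(x, x_i) + b]."""
    s = sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1


# Hypothetical separable toy problem on a line.
X = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
y = [-1, -1, 1, 1]
c = 10.0
alpha, b = train_dual_svm(X, y, c, linear_kernel)
print(alpha, b, [predict(x, X, y, alpha, b, linear_kernel) for x in X])
```

On this data only the two points closest to the boundary end up with nonzero multipliers, which illustrates the sparseness of the support vector expansion. For realistic problem sizes a dedicated QP or SMO implementation would be the sensible choice.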
Related methods: LS-SVMs, Gaussian processes, Kriging
1910-1920: Moore
1940: Aronszajn
1951: Krige
1970: Parzen
1971: Kimeldorf & Wahba
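$$\cos\theta_{xz} = \frac{x^T z}{\|x\|_2 \|z\|_2}$$
$$\cos\theta_{\varphi(x)\varphi(z)} = \frac{\varphi(x)^T \varphi(z)}{\|\varphi(x)\|_2 \|\varphi(z)\|_2} = \frac{K(x, z)}{\sqrt{K(x, x)} \sqrt{K(z, z)}}$$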
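The identity can be checked numerically. The sketch below assumes a homogeneous quadratic kernel K(x, z) = (x^T z)^2 on 2-D inputs, for which an explicit feature map is known, and compares the cosine computed from kernel values alone with the cosine computed from the explicit features. The kernel choice and function names are illustrative assumptions.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous quadratic kernel K(x, z) = (x^T z)^2 (illustrative choice)."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D inputs: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def kernel_cosine(x, z, kernel):
    """Cosine of the angle in feature space, using kernel evaluations only."""
    return kernel(x, z) / math.sqrt(kernel(x, x) * kernel(z, z))


def explicit_cosine(u, v):
    """Ordinary cosine between two explicit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


x, z = (1.0, 2.0), (3.0, 1.0)
cos_kernel = kernel_cosine(x, z, poly2_kernel)
cos_explicit = explicit_cosine(poly2_features(x), poly2_features(z))
print(cos_kernel, cos_explicit)
```

Both routes give the same angle, but the kernel route never forms φ(x), which matters when the feature space is very high-dimensional or infinite-dimensional.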
Solving linear systems; link with Gaussian processes, regularization networks, and kernel versions of Fisher discriminant analysis.
Extensions to unsupervised learning: kernel PCA (and the related methods kernel PLS and kernel CCA), density estimation (linked to clustering).
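The slide does not spell out the linear system. As a sketch, assuming the standard LS-SVM classifier formulation (equality constraints and a squared error term with regularization constant γ), training reduces to solving [[0, y^T], [y, Ω + I/γ]] [b; α] = [0; 1] with Ω_ij = y_i y_j K(x_i, x_j). The small Gaussian-elimination solver, the toy data, and the function names below are illustrative assumptions.

```python
def linear_kernel(x, z):
    """K(x, z) = x^T z; any Mercer kernel could be substituted."""
    return sum(a * b for a, b in zip(x, z))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def train_lssvm(X, y, gamma, kernel):
    """Assemble and solve the LS-SVM classifier system; returns (alpha, b)."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            omega = y[i] * y[j] * kernel(X[i], X[j])
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[1:], sol[0]


def predict(x, X, y, alpha, b, kernel):
    """Same classifier form as the SVM: sign[sum_i alpha_i y_i K(x, x_i) + b]."""
    s = sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1


# Hypothetical toy problem.
X = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
y = [-1, -1, 1, 1]
alpha, b = train_lssvm(X, y, gamma=10.0, kernel=linear_kernel)
print(alpha, b, [predict(x, X, y, alpha, b, linear_kernel) for x in X])
```

Unlike the SVM dual, no inequality constraints are involved, so every training point generally receives a nonzero multiplier, and multipliers may be negative. The sparseness of the SVM solution is traded for the simplicity of a linear solve.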
[Figure: model families (FDA, PCA, PLS, CCA, classifiers, regression, clustering, recurrent) arranged across the levels linear, robust linear, kernel, robust kernel, LS-SVM, and SVM]
Research issues:
Large scale methods
Adaptive processing
Robustness issues
Statistical aspects
Application-specific kernels
[Figure: dual space, with Nyström method, kernel PCA, density estimate, entropy criteria, eigenfunctions, SV selection, and regression]
[Figure: two plots against x, and two time-series plots of y_k against discrete time k]
Random variables $x \in X$, $y \in Y \subseteq \mathbb{R}$
Draw i.i.d. samples from an (unknown) probability distribution $\rho(x, y)$
Generalization error:
$$E[f] = \int_{X,Y} L(y, f(x)) \, \rho(x, y) \, dx \, dy$$
Loss function $L(y, f(x))$; empirical error:
$$E_N[f] = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i))$$
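The empirical error is simply the average loss over the N drawn samples. A minimal sketch follows, assuming a squared loss and a hypothetical model f(x) = 2x. The loss, the model, and the three samples are illustrative and not from the slides.

```python
def empirical_error(f, samples, loss):
    """E_N[f] = (1/N) * sum_i L(y_i, f(x_i)) over a list of (x_i, y_i) pairs."""
    return sum(loss(y, f(x)) for x, y in samples) / len(samples)


def squared_loss(y, prediction):
    """L(y, f(x)) = (y - f(x))^2 (an illustrative choice of loss)."""
    return (y - prediction) ** 2


def model(x):
    """Hypothetical model f(x) = 2x."""
    return 2.0 * x


samples = [(0.0, 0.0), (1.0, 2.5), (2.0, 3.0)]
err = empirical_error(model, samples, squared_loss)
print(err)
```

The generalization error replaces this finite average by an expectation over the unknown distribution, so it cannot be computed directly and has to be estimated or bounded.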
approximation error: $E[f_H] - E[f_\rho]$
Interdisciplinary challenges
http://www.esat.kuleuven.ac.be/sista/natoasi/ltp2002.html
neural networks
data mining
linear algebra
pattern recognition
mathematics
SVM & kernel methods
machine learning
optimization
statistics
signal processing
www.esat.kuleuven.ac.be/sista/lssvmlab/
Introductory papers:
C.J.C. Burges (1998) A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2(2), 121-167.
A.J. Smola, B. Schölkopf (1998) A tutorial on support vector regression, NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK.
T. Evgeniou, M. Pontil, T. Poggio (2000) Regularization networks and support vector machines, Advances in Computational Mathematics, 13(1), 1-50.
K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, B. Schölkopf (2001) An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks, 12(2), 181-201.