
CS281B/Stat241B (Spring 2008) Statistical Learning Theory Lecture: 7

Reproducing Kernel Hilbert Spaces


Lecturer: Peter Bartlett Scribe: Chunhui Gu
1 Reproducing Kernel Hilbert Spaces
1.1 Hilbert Space and Kernel
An inner product $\langle v, w \rangle$ can be
1. a usual dot product: $\langle v, w \rangle = v^\top w = \sum_i v_i w_i$
2. a kernel product: $\langle v, w \rangle = k(v, w) = \phi(v)^\top \phi(w)$ (where $\phi(u)$ may have infinite dimensions)
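To make the kernel product concrete, here is a minimal sketch using a finite-dimensional feature map. The homogeneous degree-2 polynomial kernel on $\mathbb{R}^2$ and the names `poly2_kernel`, `poly2_features`, and `dot` are illustrative assumptions, not part of the lecture; the lecture's $\phi$ may be infinite-dimensional, which this example does not cover.

```python
import math


def poly2_kernel(v, w):
    """Homogeneous degree-2 polynomial kernel on R^2: k(v, w) = (v . w)^2."""
    return (v[0] * w[0] + v[1] * w[1]) ** 2


def poly2_features(v):
    """Explicit feature map phi such that phi(v) . phi(w) = (v . w)^2."""
    return (v[0] ** 2, math.sqrt(2.0) * v[0] * v[1], v[1] ** 2)


def dot(a, b):
    """Usual dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))


v, w = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(v, w)                            # computed in R^2
feature_value = dot(poly2_features(v), poly2_features(w))   # computed in R^3
print(kernel_value, feature_value)
```

Both numbers agree up to rounding, so evaluating the kernel is the same as taking a dot product in the feature space.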


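However, an inner product $\langle \cdot, \cdot \rangle$ must satisfy the following conditions:
1. Symmetry
$\langle u, v \rangle = \langle v, u \rangle \quad \forall u, v \in \mathcal{X}$
2. Bilinearity
$\langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle \quad \forall u, v, w \in \mathcal{X},\ \forall \alpha, \beta \in \mathbb{R}$
3. Positive definiteness
$\langle u, u \rangle \ge 0, \quad \forall u \in \mathcal{X}$
$\langle u, u \rangle = 0 \iff u = 0$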
Now we can define the notion of a Hilbert space.
Definition. A Hilbert Space is an inner product space that is complete and separable with respect to the
norm defined by the inner product.
Examples of Hilbert spaces include:
1. The vector space $\mathbb{R}^n$ with $\langle a, b \rangle = a^\top b$, the vector dot product of $a$ and $b$.
2. The space $\ell_2$ of square summable sequences, with inner product $\langle x, y \rangle = \sum_{i=1}^{\infty} x_i y_i$.
3. The space $L_2$ of square integrable functions (i.e., $\int_s f(x)^2 \, dx < \infty$), with inner product $\langle f, g \rangle = \int_s f(x) g(x) \, dx$.
Definition. $k(\cdot, \cdot)$ is a reproducing kernel of a Hilbert space $H$ if $\forall f \in H$, $f(x) = \langle k(x, \cdot), f(\cdot) \rangle$.
A Reproducing Kernel Hilbert Space (RKHS) is a Hilbert space H with a reproducing kernel whose span
is dense in H. We could equivalently define an RKHS as a Hilbert space of functions with all evaluation
functionals bounded and linear.
For instance, the $L_2$ space is a Hilbert space, but not an RKHS because the delta function which has the
reproducing property
$f(x) = \int_s \delta(x - u) f(u) \, du$
does not satisfy the square integrable condition, that is,
$\int_s \delta(u)^2 \, du \not< \infty$,
thus the delta function is not in $L_2$.
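Now let us define a kernel.
Definition. $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a kernel if
1. $k$ is symmetric: $k(x, y) = k(y, x)$.
2. $k$ is positive semi-definite, i.e., $\forall x_1, x_2, \ldots, x_n \in \mathcal{X}$, the Gram Matrix $K$ defined by $K_{ij} = k(x_i, x_j)$ is
positive semi-definite. (A matrix $M \in \mathbb{R}^{n \times n}$ is positive semi-definite if $\forall a \in \mathbb{R}^n$, $a^\top M a \ge 0$.)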
Here are some properties of a kernel that are worth noting:
1. $k(x, x) \ge 0$. (Think about the Gram matrix of n = 1)
2. $k(u, v) \le \sqrt{k(u, u) k(v, v)}$. (This is the Cauchy-Schwarz inequality.)
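To see why the second property holds, we consider the case when n = 2:
Let $a = \begin{bmatrix} k(v, v) \\ -k(u, v) \end{bmatrix}$. The Gram matrix $K = \begin{bmatrix} k(u, u) & k(u, v) \\ k(v, u) & k(v, v) \end{bmatrix} \succeq 0 \Rightarrow a^\top K a \ge 0$
$\Rightarrow [k(v, v) k(u, u) - k(u, v)^2] \, k(v, v) \ge 0$.
By the first property we know $k(v, v) \ge 0$, so when $k(v, v) > 0$ we get $k(v, v) k(u, u) \ge k(u, v)^2$. (If $k(v, v) = 0$, the same inequality follows because the determinant of a positive semi-definite matrix is non-negative.)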
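The kernel definition and both properties can be checked numerically. The sketch below assumes NumPy and uses the Gaussian (RBF) kernel, a standard positive semi-definite kernel that the notes do not name; `gaussian_kernel`, `gram_matrix`, and the random sample points are illustrative choices only.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel, used here as an assumed example of a PSD kernel."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2)))


def gram_matrix(kernel, points):
    """Gram matrix with K[i, j] = kernel(points[i], points[j])."""
    n = len(points)
    return np.array([[kernel(points[i], points[j]) for j in range(n)]
                     for i in range(n)])


rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))          # six arbitrary points in R^2
K = gram_matrix(gaussian_kernel, points)

# Symmetry and positive semi-definiteness (all eigenvalues >= 0 up to rounding).
is_symmetric = np.allclose(K, K.T)
min_eigenvalue = np.linalg.eigvalsh(K).min()

# Property 2: k(u, v)^2 <= k(u, u) k(v, v) for every pair of sample points.
cs_holds = all(
    K[i, j] ** 2 <= K[i, i] * K[j, j] + 1e-12
    for i in range(len(points))
    for j in range(len(points))
)
print(is_symmetric, min_eigenvalue, cs_holds)
```

A check on one sample does not prove that a function is a kernel, but a single Gram matrix with a clearly negative eigenvalue is enough to show that it is not.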
1.2 Building a Reproducing Kernel Hilbert Space (RKHS)
Given a kernel $k$, define the reproducing kernel feature map $\Phi : \mathcal{X} \to \mathbb{R}^{\mathcal{X}}$ as:
$\Phi(x) = k(\cdot, x)$
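Consider the vector space:
$\mathrm{span}(\{\Phi(x) : x \in \mathcal{X}\}) = \left\{ f(\cdot) = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i) : n \in \mathbb{N},\ x_i \in \mathcal{X},\ \alpha_i \in \mathbb{R} \right\}$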
For $f = \sum_i \alpha_i k(\cdot, u_i)$ and $g = \sum_j \beta_j k(\cdot, v_j)$, define $\langle f, g \rangle = \sum_{i,j} \alpha_i \beta_j k(u_i, v_j)$.
Note that:
$\langle f, k(\cdot, x) \rangle = \sum_i \alpha_i k(x, u_i) = f(x)$, i.e., $k$ has the reproducing property.
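The construction above can be sketched directly in code: a function in the span is stored as its coefficients and centers, and the inner product is the double sum over kernel values. The one-dimensional Gaussian kernel and the names `SpanFunction` and `inner` are assumptions made for illustration, not notation from the lecture.

```python
import math


def gaussian_kernel(x, y):
    """One-dimensional Gaussian kernel, assumed here as the kernel k."""
    return math.exp(-0.5 * (x - y) ** 2)


class SpanFunction:
    """f = sum_i alpha_i k(., u_i), stored as coefficients and centers."""

    def __init__(self, alphas, centers):
        self.alphas = list(alphas)
        self.centers = list(centers)

    def __call__(self, x):
        # Pointwise evaluation f(x) = sum_i alpha_i k(x, u_i).
        return sum(a * gaussian_kernel(x, u)
                   for a, u in zip(self.alphas, self.centers))


def inner(f, g):
    """<f, g> = sum_{i,j} alpha_i beta_j k(u_i, v_j)."""
    return sum(a * b * gaussian_kernel(u, v)
               for a, u in zip(f.alphas, f.centers)
               for b, v in zip(g.alphas, g.centers))


f = SpanFunction([1.0, -2.0, 0.5], [0.0, 1.0, 3.0])
g = SpanFunction([0.3, 0.7], [-1.0, 2.0])

x = 0.7
k_x = SpanFunction([1.0], [x])      # the function k(., x)
reproduced = inner(f, k_x)          # <f, k(., x)>
direct = f(x)                       # f(x) evaluated pointwise
print(reproduced, direct, inner(f, g), inner(g, f))
```

The first two printed values coincide, which is the reproducing property; the last two coincide because the kernel is symmetric.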
We show that $\langle f, g \rangle$ is an inner product by checking the following conditions:
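1. Symmetry: $\langle f, g \rangle = \sum_{i,j} \alpha_i \beta_j k(u_i, v_j) = \sum_{i,j} \beta_j \alpha_i k(v_j, u_i) = \langle g, f \rangle$
2. Bilinearity: $\langle f, g \rangle = \sum_i \alpha_i g(u_i) = \sum_j \beta_j f(v_j)$
3. Positive definiteness: $\langle f, f \rangle = \alpha^\top K \alpha \ge 0$ with equality iff $f = 0$.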
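From 3 we can also derive:
1. $\langle f, g \rangle^2 \le \langle f, f \rangle \langle g, g \rangle$
Proof. $\forall a \in \mathbb{R}$, $\langle af + g, af + g \rangle = a^2 \langle f, f \rangle + 2a \langle f, g \rangle + \langle g, g \rangle \ge 0$. This implies that the quadratic
expression has a non-positive discriminant. Therefore, $\langle f, g \rangle^2 - \langle f, f \rangle \langle g, g \rangle \le 0$.
2. $|f(x)|^2 = \langle k(\cdot, x), f \rangle^2 \le k(x, x) \langle f, f \rangle$, which implies that if $\langle f, f \rangle = 0$ then $f$ is identically zero.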
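The pointwise bound in item 2 can be checked numerically for one function in the span. This is a rough sketch assuming NumPy and a one-dimensional Gaussian kernel; the centers, coefficients, and evaluation grid are arbitrary illustrative values.

```python
import numpy as np


def gaussian_kernel(x, y):
    """One-dimensional Gaussian kernel (an assumed example); broadcasts over arrays."""
    return np.exp(-0.5 * (x - y) ** 2)


centers = np.array([-1.0, 0.0, 0.5, 2.0])     # the points u_i
alpha = np.array([0.3, -1.2, 0.8, 2.0])       # the coefficients alpha_i

# <f, f> = alpha^T K alpha, with K the Gram matrix of the centers.
K = gaussian_kernel(centers[:, None], centers[None, :])
norm_sq = float(alpha @ K @ alpha)

# Evaluate f on a grid and compare |f(x)|^2 with k(x, x) <f, f>.
xs = np.linspace(-3.0, 3.0, 61)
f_vals = gaussian_kernel(xs[:, None], centers[None, :]) @ alpha
k_xx = gaussian_kernel(xs, xs)                # equals 1 for this kernel
bound_holds = bool(np.all(f_vals ** 2 <= k_xx * norm_sq + 1e-12))
print(norm_sq, bound_holds)
```

The bound says that a small RKHS norm forces small function values everywhere, which is why a zero norm implies the zero function.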
Now we have defined an inner product space with inner product $\langle \cdot, \cdot \rangle$. Completing it gives the Hilbert space.
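Definition. For a (compact) $\mathcal{X} \subseteq \mathbb{R}^d$, and a Hilbert space $H$ of functions $f : \mathcal{X} \to \mathbb{R}$, we say $H$ is a
Reproducing Kernel Hilbert Space if $\exists k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, s.t.
1. $k$ has the reproducing property, i.e., $f(x) = \langle f(\cdot), k(\cdot, x) \rangle$
2. $k$ spans $H = \overline{\mathrm{span}\{k(\cdot, x) : x \in \mathcal{X}\}}$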
1.3 Mercer's Theorem
Another way to characterize a symmetric positive semi-definite kernel $k$ is via Mercer's Theorem.
Theorem 1.1 (Mercer's). Suppose $k$ is a continuous positive semi-definite kernel on a compact set $\mathcal{X}$, and
the integral operator $T_k : L_2(\mathcal{X}) \to L_2(\mathcal{X})$ defined by
$(T_k f)(\cdot) = \int_{\mathcal{X}} k(\cdot, x) f(x) \, dx$
is positive semi-definite, that is, $\forall f \in L_2(\mathcal{X})$,
$\int_{\mathcal{X}} \int_{\mathcal{X}} k(u, v) f(u) f(v) \, du \, dv \ge 0$
Then there is an orthonormal basis $\{\psi_i\}$ of $L_2(\mathcal{X})$ consisting of eigenfunctions of $T_k$ such that the corresponding sequence of eigenvalues $\{\lambda_i\}$ are non-negative. The eigenfunctions corresponding to non-zero eigenvalues
are continuous on $\mathcal{X}$ and $k(u, v)$ has the representation
$k(u, v) = \sum_{i=1}^{\infty} \lambda_i \psi_i(u) \psi_i(v)$
where the convergence is absolute and uniform, that is,
$\lim_{n \to \infty} \sup_{u, v} \left| k(u, v) - \sum_{i=1}^{n} \lambda_i \psi_i(u) \psi_i(v) \right| = 0$
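As an analogue, consider the finite case $\mathcal{X} = \{x_1, \ldots, x_n\}$. Let $K_{ij} = k(x_i, x_j)$, and identify $f : \mathcal{X} \to \mathbb{R}$ with the vector $f \in \mathbb{R}^n$ with
$f_i = f(x_i)$. Then,
$T_k f = \sum_{i=1}^{n} k(\cdot, x_i) f_i$
$\forall f,\ f^\top K f \ge 0 \Rightarrow K \succeq 0 \Rightarrow K = \sum_i \lambda_i v_i v_i^\top$
Hence,
$k(x_i, x_j) = K_{ij} = (V \Lambda V^\top)_{ij} = \sum_{k=1}^{n} \lambda_k v_{ki} v_{kj} = \sum_{k=1}^{n} \lambda_k \psi_k(x_i) \psi_k(x_j)$
with $\psi_k(x_i) = (v_k)_i$.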
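The finite analogue amounts to an eigendecomposition of the Gram matrix. The sketch below assumes NumPy, a one-dimensional Gaussian kernel, and five arbitrary points; none of these choices come from the lecture.

```python
import numpy as np

# A finite domain X = {x_1, ..., x_n} and its Gram matrix under a Gaussian kernel.
points = np.array([0.0, 0.4, 1.0, 1.5, 3.0])
K = np.exp(-0.5 * (points[:, None] - points[None, :]) ** 2)

# eigh handles symmetric matrices; the columns of V are orthonormal eigenvectors v_k.
eigenvalues, V = np.linalg.eigh(K)

# Finite Mercer representation: K = sum_k lambda_k v_k v_k^T,
# where psi_k(x_i) = (v_k)_i corresponds to V[i, k].
K_rebuilt = sum(eigenvalues[k] * np.outer(V[:, k], V[:, k])
                for k in range(len(points)))

print(eigenvalues.min(), np.allclose(K, K_rebuilt))
```

Here the eigenvectors play the role of the eigenfunctions $\psi_k$ restricted to the sample points, and the non-negative eigenvalues play the role of the $\lambda_k$.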
We summarize several equivalent conditions on continuous, symmetric $k$ defined on compact $\mathcal{X}$:
1. Every Gram matrix is positive semi-definite.
2. $T_k$ is positive semi-definite.
3. $k$ can be expressed as $k(u, v) = \sum_i \lambda_i \psi_i(u) \psi_i(v)$.
4. $k$ is the reproducing kernel of an RKHS of functions on $\mathcal{X}$.
