Gaussian Process
Sungjoon Choi
February 9, 2018
Seoul National University
Table of contents
1. Introduction
2. Gaussian process
Introduction
Most of the contents are from
Prof. Songhwai Oh’s slides.
Gaussian process
Gaussian process regression
Weight space view
Linear regression
• $f(x) = x^\top w$
• $y(x) = f(x) + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$.
• Suppose we have collected $n$ input–output pairs $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n$.
• Define $X = [x_1, \dots, x_n]$, so that each column is one input. Then
$$
p(y \mid X, w) = \prod_{i=1}^n p(y_i \mid x_i, w)
= \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\!\Big(-\frac{(y_i - x_i^\top w)^2}{2\sigma_n^2}\Big)
= \frac{1}{(2\pi\sigma_n^2)^{n/2}} \exp\!\Big(-\frac{1}{2\sigma_n^2}\|y - X^\top w\|^2\Big)
= \mathcal{N}(y; X^\top w, \sigma_n^2 I).
$$
• The goal of linear regression is to find $w$ such that $\|y - X^\top w\|^2$ is minimized.
• Solution: $\hat{w} = (XX^\top)^{-1} X y$
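A minimal sketch of this least-squares solution in Python with NumPy; the synthetic data and sizes below are purely illustrative:

```python
import numpy as np

# Illustrative least-squares fit: columns of X are the inputs x_i,
# matching the slide's convention y = X^T w + noise.
rng = np.random.default_rng(0)
D, n = 3, 50
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(D, n))
y = X.T @ w_true + 0.1 * rng.normal(size=n)

# w_hat = (X X^T)^{-1} X y, computed via a linear solve for stability
w_hat = np.linalg.solve(X @ X.T, X @ y)
print(w_hat)  # close to w_true
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse $(XX^\top)^{-1}$.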
Bayesian formulation
• Bayesian formulation:
$$
p(w \mid y, X) = \frac{p(y \mid X, w)\, p(w)}{p(y \mid X)}
\propto \exp\!\Big(-\frac{1}{2\sigma_n^2}(y - X^\top w)^\top (y - X^\top w)\Big) \exp\!\Big(-\frac{1}{2} w^\top \Sigma_p^{-1} w\Big)
\propto \exp\!\Big(-\frac{1}{2}(w - \bar{w})^\top A (w - \bar{w})\Big),
$$
where $\bar{w} = \frac{1}{\sigma_n^2} A^{-1} X y$ and $A = \frac{1}{\sigma_n^2} X X^\top + \Sigma_p^{-1}$.
• Hence,
$$
p(w \mid y, X) = \mathcal{N}(\bar{w}, A^{-1}).
$$
• Computing the analytic form of the posterior distribution is not always possible.
• In fact, there are not many such cases; the priors that enable analytic posterior forms are known as conjugate priors.
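The posterior above can be computed directly; a minimal sketch assuming an identity prior covariance $\Sigma_p = I$ and synthetic data:

```python
import numpy as np

# Sketch of the Gaussian posterior over w:
#   A = (1/sigma_n^2) X X^T + Sigma_p^{-1},  w_bar = (1/sigma_n^2) A^{-1} X y
rng = np.random.default_rng(1)
D, n, sigma_n = 3, 100, 0.1
Sigma_p = np.eye(D)                       # assumed prior covariance
w_true = rng.normal(size=D)
X = rng.normal(size=(D, n))
y = X.T @ w_true + sigma_n * rng.normal(size=n)

A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
w_bar = np.linalg.solve(A, X @ y) / sigma_n**2   # posterior mean
post_cov = np.linalg.inv(A)                      # posterior covariance A^{-1}
```

With plenty of low-noise data, the posterior mean concentrates near the true weights and the data term dominates the prior.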
Bayesian formulation
$$
\hat{w}_{\text{MAP}} = \Big(\frac{1}{\sigma_n^2} X X^\top + \Sigma_p^{-1}\Big)^{-1} \frac{1}{\sigma_n^2} X y
$$
Kernel trick
• Then
$$
f_* \mid x_*, X, y \sim \mathcal{N}\Big(\frac{1}{\sigma_n^2}\, \phi(x_*)^\top A^{-1} \Phi y,\; \phi(x_*)^\top A^{-1} \phi(x_*)\Big),
$$
where $A = \frac{1}{\sigma_n^2} \Phi \Phi^\top + \Sigma_p^{-1}$ and $A \in \mathbb{R}^{N \times N}$.
• If $N \gg 1$, then inverting $A \in \mathbb{R}^{N \times N}$ could be computationally intractable.
Kernel trick
Since $K = \Phi^\top \Sigma_p \Phi$, we have $\sigma_n^{-2}\, \Phi (K + \sigma_n^2 I) = A\, \Sigma_p \Phi$, and therefore
$$
\sigma_n^{-2} A^{-1} \Phi (K + \sigma_n^2 I)(K + \sigma_n^2 I)^{-1} = A^{-1} A\, \Sigma_p \Phi (K + \sigma_n^2 I)^{-1}
\;\Longrightarrow\;
\sigma_n^{-2} A^{-1} \Phi = \Sigma_p \Phi (K + \sigma_n^2 I)^{-1}.
$$
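This matrix identity can be checked numerically; the sizes and random matrices below are purely illustrative:

```python
import numpy as np

# Check: sigma_n^{-2} A^{-1} Phi = Sigma_p Phi (K + sigma_n^2 I)^{-1},
# where K = Phi^T Sigma_p Phi and A = sigma_n^{-2} Phi Phi^T + Sigma_p^{-1}.
rng = np.random.default_rng(2)
N, n, sigma_n = 5, 8, 0.3
Phi = rng.normal(size=(N, n))
B = rng.normal(size=(N, N))
Sigma_p = B @ B.T + np.eye(N)             # a valid (SPD) prior covariance

A = Phi @ Phi.T / sigma_n**2 + np.linalg.inv(Sigma_p)
K = Phi.T @ Sigma_p @ Phi

lhs = np.linalg.solve(A, Phi) / sigma_n**2
rhs = Sigma_p @ Phi @ np.linalg.inv(K + sigma_n**2 * np.eye(n))
print(np.allclose(lhs, rhs))  # True
```

The identity is what lets the $N \times N$ inverse of $A$ be replaced by an $n \times n$ inverse of $K + \sigma_n^2 I$.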
Kernel trick
• Predictive distribution of $f_*$:
$$
f_* \mid x_*, X, y \sim \mathcal{N}(\mu_*, \Sigma_*),
$$
where $\mu_* = \phi_*^\top \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} y$ and
$\Sigma_* = \phi_*^\top \Sigma_p \phi_* - \phi_*^\top \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} \Phi^\top \Sigma_p \phi_*$.
Function space view
Function space view
• Consider $g(x) = \phi(x)^\top w$, where $w \sim \mathcal{N}(0, \Sigma_p)$.
• Is $g(x)$ a Gaussian process? Yes!
• $\mathbb{E}[g(x)] = \phi(x)^\top \mathbb{E}[w] = 0$
• $\mathbb{E}[g(x)\, g(x')] = \phi(x)^\top \mathbb{E}[w w^\top] \phi(x') = \phi(x)^\top \Sigma_p \phi(x') = k(x, x')$
• Hence, $[g(x_1), \dots, g(x_n)]$ are jointly Gaussian for any finite set of inputs.
• Therefore, $g(x)$ is a Gaussian process.
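The covariance identity $\mathbb{E}[g(x)\,g(x')] = \phi(x)^\top \Sigma_p \phi(x')$ can be checked by Monte Carlo sampling; the polynomial features and diagonal prior below are illustrative assumptions:

```python
import numpy as np

# g(x) = phi(x)^T w with w ~ N(0, Sigma_p); its covariance should match
# k(x, x') = phi(x)^T Sigma_p phi(x').
rng = np.random.default_rng(3)
Sigma_p = np.diag([1.0, 0.5, 0.25, 0.1])          # illustrative prior
phi = lambda x: np.array([1.0, x, x**2, x**3])    # illustrative features

x1, x2 = 0.3, -0.7
k_exact = phi(x1) @ Sigma_p @ phi(x2)

# Sample many weight vectors and estimate E[g(x1) g(x2)] empirically
W = rng.multivariate_normal(np.zeros(4), Sigma_p, size=200_000)
g1, g2 = W @ phi(x1), W @ phi(x2)
k_mc = np.mean(g1 * g2)
print(k_exact, k_mc)
```

The sampled covariance converges to $k(x_1, x_2)$ as the number of weight draws grows.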
Function space view
By conditioning, we get
$$
f_* \mid x_*, X, f \sim \mathcal{N}(\mu_*, \sigma_*),
$$
where $\mu_* = K(x_*, X) K(X, X)^{-1} f$ and $\sigma_* = k(x_*, x_*) - K(x_*, X) K(X, X)^{-1} K(X, x_*)$.
Function space view
By conditioning, we get
$$
f_* \mid x_*, X, y \sim \mathcal{N}(\mu_*, \sigma_*),
$$
where $\mu_* = K(x_*, X)(K(X, X) + \sigma_n^2 I)^{-1} y$ and $\sigma_* = k(x_*, x_*) - K(x_*, X)(K(X, X) + \sigma_n^2 I)^{-1} K(X, x_*)$.
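A minimal sketch of these noisy predictive equations with a squared-exponential kernel (the kernel choice, its length scale, and the toy data are assumptions for illustration):

```python
import numpy as np

# GP regression on noisy 1-D data using the equations above:
#   mu_*    = K(x_*, X)(K + sigma_n^2 I)^{-1} y
#   sigma_* = k(x_*, x_*) - K(x_*, X)(K + sigma_n^2 I)^{-1} K(X, x_*)
def rbf(a, b, ell=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=20)
y = np.sin(X) + 0.1 * rng.normal(size=20)
x_star = np.array([0.0, 1.5])
sigma_n = 0.1

Ky = rbf(X, X) + sigma_n**2 * np.eye(20)  # K(X, X) + sigma_n^2 I
K_s = rbf(x_star, X)                      # K(x_*, X)
mu_star = K_s @ np.linalg.solve(Ky, y)
Sigma_star = rbf(x_star, x_star) - K_s @ np.linalg.solve(Ky, K_s.T)
print(mu_star)  # roughly sin(x_star)
```

The posterior variance (the diagonal of `Sigma_star`) shrinks near observed inputs and grows away from them.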
Comments on Gaussian process regression
Gaussian process latent variable model (GPLVM)
Gaussian process latent variable model
• Notations
  • observed data: $Y = [y_{(1)}^\top; \dots; y_{(n)}^\top] \in \mathbb{R}^{n \times d}$
  • latent data: $X = [x_{(1)}^\top; \dots; x_{(n)}^\top] \in \mathbb{R}^{n \times q}$
  • linear mapping matrix $W \in \mathbb{R}^{d \times q}$ $\Rightarrow$ $Y = X W^\top$
  • $a_{(i)}$: $i$-th row vector of $A$
  • $a_i$: $i$-th column vector of $A$
  • $Y Y^\top$: matrix inner product $\Leftarrow$ the kernel trick can be used.
  • $Y^\top Y$: covariance matrix
Gaussian process latent variable model
• Linear mapping
Gaussian process latent variable model
• Probabilistic PCA
  • Prior distribution:
$$
p(X) = \prod_{i=1}^n \mathcal{N}(x_{(i)} \mid 0, I)
$$
  • Likelihood:
$$
p(Y \mid X, W) = \prod_{i=1}^n \mathcal{N}(y_{(i)} \mid W x_{(i)}, \beta^{-1} I)
$$
  • Marginal likelihood:
$$
p(Y \mid W) = \int_X p(Y \mid X, W)\, p(X)\, dX = \prod_{i=1}^n \mathcal{N}(y_{(i)} \mid 0, W W^\top + \beta^{-1} I)
$$
Gaussian process latent variable model
• Probabilistic PCA
  • Solution:
$$
\frac{\partial \ln p(Y \mid W)}{\partial W} = 0 \;\Rightarrow\; \hat{W} = U_q L V^\top,
$$
where $U_q$ and $\Lambda_q$ are the first $q$ eigenvectors and eigenvalues of $Y^\top Y$, $L = (\Lambda_q - \beta^{-1} I)^{1/2}$, and $V$ is an arbitrary rotation matrix.
  • Note that $U_q$ is the solution of standard PCA.
  • $L = (\Lambda_q - \beta^{-1} I)^{1/2}$ is a scaling diagonal matrix.
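A sketch of this closed-form solution on synthetic data (using the sample covariance $\frac{1}{n} Y^\top Y$, an assumption about normalization, and taking $V = I$; all sizes are illustrative):

```python
import numpy as np

# Closed-form PPCA: W_hat = U_q L V^T, with U_q / Lambda_q the top-q
# eigenvectors/eigenvalues of the data covariance and
# L = (Lambda_q - beta^{-1} I)^{1/2}.
rng = np.random.default_rng(5)
n, d, q, beta = 500, 5, 2, 10.0
W_true = rng.normal(size=(d, q))
X_lat = rng.normal(size=(n, q))
Y = X_lat @ W_true.T + rng.normal(scale=beta**-0.5, size=(n, d))

S = Y.T @ Y / n                           # d x d sample covariance
evals, evecs = np.linalg.eigh(S)          # eigenvalues in ascending order
U_q = evecs[:, -q:][:, ::-1]              # top-q eigenvectors
L = np.diag(np.sqrt(evals[-q:][::-1] - 1.0 / beta))
W_hat = U_q @ L                           # take V = I
```

Up to the arbitrary rotation $V$, the columns of `W_hat` span the same subspace as the true mapping.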
Gaussian process latent variable model
• Nonlinear mapping
Gaussian process latent variable model
• Likelihood:
$$
p(Y \mid X, W) = \prod_{i=1}^n \mathcal{N}(y_{(i)} \mid W x_{(i)}, \beta^{-1} I)
$$
• Marginal likelihood (marginalizing over $W$ instead of $X$):
$$
p(Y \mid X) = \int_W p(Y \mid X, W)\, p(W)\, dW = \prod_{j=1}^d \mathcal{N}(y_j \mid 0, X X^\top + \beta^{-1} I)
$$
Gaussian process latent variable model
$$
p(Y \mid X) = \mathcal{N}(Y \mid 0, K)
$$
Gaussian process latent variable model
• GPLVM likelihood:
$$
p(Y \mid X) = \prod_{j=1}^d \mathcal{N}(y_j \mid 0, K),
$$
where $K$ is the $n \times n$ kernel matrix on the latent points ($K = X X^\top + \beta^{-1} I$ in the linear case).
Gaussian process latent variable model
• $\ln p(Y \mid X) = -\frac{d}{2} \ln |K| - \frac{nd}{2} \ln 2\pi - \frac{1}{2} \operatorname{trace}(K^{-1} Y Y^\top)$
• Partial derivatives:
  • $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial K} \frac{\partial K}{\partial x}$
  • $\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial K} \frac{\partial K}{\partial \theta}$
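The log-likelihood above can be evaluated directly; a minimal sketch with the linear kernel $K = XX^\top + \beta^{-1} I$ from the earlier slides (the data are synthetic and illustrative):

```python
import numpy as np

# GPLVM log-likelihood:
#   ln p(Y|X) = -(d/2) ln|K| - (nd/2) ln(2*pi) - (1/2) tr(K^{-1} Y Y^T)
def gplvm_loglik(X, Y, beta):
    n, d = Y.shape
    K = X @ X.T + np.eye(n) / beta        # linear kernel on latent points
    _, logdet = np.linalg.slogdet(K)      # stable log-determinant
    return (-0.5 * d * logdet
            - 0.5 * n * d * np.log(2.0 * np.pi)
            - 0.5 * np.trace(np.linalg.solve(K, Y @ Y.T)))

rng = np.random.default_rng(6)
n, d, q, beta = 30, 4, 2, 10.0
X = rng.normal(size=(n, q))
Y = X @ rng.normal(size=(d, q)).T + rng.normal(scale=beta**-0.5, size=(n, d))
print(gplvm_loglik(X, Y, beta))
```

By the trace identity $\operatorname{tr}(K^{-1} Y Y^\top) = \sum_j y_j^\top K^{-1} y_j$, this equals the sum of the $d$ per-column Gaussian log-densities from the likelihood above.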
Questions?
Backup slides