GP Gpss15 Session1
Neil D. Lawrence
GPSS
14th September 2015
Outline

The Gaussian density:

p(y|µ, σ²) = 1/√(2πσ²) exp(−(y − µ)²/(2σ²))
           = N(y|µ, σ²)

Figure: The Gaussian density p(h|µ, σ²) plotted against height, h (in metres).

N(y|µ, σ²) = 1/√(2πσ²) exp(−(y − µ)²/(2σ²))

σ² is the variance of the density and µ is the mean.
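As a quick check of the density above, a minimal sketch (assuming NumPy; the mean 1.7 m and variance 0.01 m² are illustrative, not from the slides) that evaluates N(y|µ, σ²) and confirms it integrates to one:

```python
import numpy as np

def gaussian_density(y, mu, sigma2):
    """Evaluate the univariate Gaussian density N(y | mu, sigma2)."""
    return np.exp(-(y - mu)**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Density over a grid of heights, mean 1.7 m, variance 0.01 m^2.
h = np.linspace(0, 3, 3001)
p = gaussian_density(h, mu=1.7, sigma2=0.01)

# A density should integrate to (approximately) one.
area = p.sum() * (h[1] - h[0])
print(area)  # ≈ 1.0
```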
Two Important Gaussian Properties

Sum of Gaussians

Scaling a Gaussian
Figure: A linear regression between x and y, showing two data points and the best fit line.
Regression Examples

y = mx + c

Figure: A sequence of plots of the straight line y = mx + c, with gradient m and intercept c (axes x and y from 0 to 5).
y = mx + c
point 1: x = 1, y = 3
3=m+c
point 2: x = 3, y = 1
1 = 3m + c
point 3: x = 2, y = 2.5
2.5 = 2m + c
(Laplace, A Philosophical Essay on Probabilities)
point 1: x = 1, y = 3
3 = m + c + ε₁
point 2: x = 3, y = 1
1 = 3m + c + ε₂
point 3: x = 2, y = 2.5
2.5 = 2m + c + ε₃
Underdetermined System

y₁ = mx₁ + c

Figure: One observation and two parameters: any pair (m, c) whose line passes through (x₁, y₁) is a solution.

If we assume

c ∼ N(0, 4),

we find a distribution of solutions.
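The distribution of solutions can be sampled directly. A minimal sketch (assuming NumPy; the data point x₁ = 1, y₁ = 3 is point 1 from the earlier slide) that draws c ∼ N(0, 4) and solves for the gradient each time:

```python
import numpy as np

# One data point and two unknowns: the system is underdetermined.
x1, y1 = 1.0, 3.0

# Place a prior on the intercept, c ~ N(0, 4), and solve for the gradient.
rng = np.random.default_rng(0)
c = rng.normal(loc=0.0, scale=2.0, size=10000)  # variance 4 => std 2
m = (y1 - c) / x1                               # each c gives an exact solution

# Every sampled (m, c) pair fits the data point exactly.
residuals = m * x1 + c - y1
print(np.abs(residuals).max())  # ~ 0
```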
Probability for Under- and Overdetermined

yᵢ = w⊤xᵢ,: + εᵢ

p(c|y) = p(y|c)p(c) / p(y)
Bayes Update

p(c) = N(c|0, α₁)

p(y|m, c, x, σ²) = N(y|mx + c, σ²)

p(c|y, m, x, σ²) = N(c | (y − mx)/(1 + σ²/α₁), (σ⁻² + α₁⁻¹)⁻¹)

Figure: A Gaussian prior combines with a Gaussian likelihood for a Gaussian posterior.
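The posterior above has a closed form, so it is easy to compute. A short sketch (assuming NumPy; the numerical values y = 3, m = 1, x = 1, σ² = 0.5, α₁ = 4 are illustrative):

```python
def posterior_c(y, m, x, sigma2, alpha1):
    """Posterior over the offset c given one observation y at input x.

    Prior: c ~ N(0, alpha1); likelihood: y ~ N(mx + c, sigma2).
    Returns the posterior mean and variance, as on the slide.
    """
    mean = (y - m * x) / (1.0 + sigma2 / alpha1)
    var = 1.0 / (1.0 / sigma2 + 1.0 / alpha1)
    return mean, var

mean, var = posterior_c(y=3.0, m=1.0, x=1.0, sigma2=0.5, alpha1=4.0)
# The posterior shrinks y - mx = 2 toward the prior mean 0,
# and its variance is smaller than both prior and noise variance.
print(mean, var)
```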
Stages to Derivation of the Posterior

Multivariate Regression Likelihood

yᵢ = w⊤xᵢ,: + εᵢ

Prior density on the weights:

p(w) = 1/(2πα)^(p/2) exp(−w⊤w/(2α)),

where p is the dimension of w.
Two Dimensional Gaussian

Consider height, h/m, and weight, w/kg.

Figure: The joint distribution over height and weight, with the marginal densities p(h) and p(w) shown along the axes.

If the two variables are independent the joint density factorises:

p(h, w) = p(h)p(w)
Sampling Two Dimensional Variables

Figure: Samples from the joint distribution over height, h/m, and weight, w/kg, with the marginal distributions p(h) and p(w) accumulating along the axes.
Independent Gaussians

p(w, h) = p(w)p(h)

p(w, h) = 1/√(2πσ₁²) · 1/√(2πσ₂²) · exp(−½((w − µ₁)²/σ₁² + (h − µ₂)²/σ₂²))
Independent Gaussians

Rewriting with vectors y = [w, h]⊤ and µ = [µ₁, µ₂]⊤, and diagonal covariance D = diag(σ₁², σ₂²):

p(y) = 1/|2πD|^(1/2) exp(−½ (y − µ)⊤ D⁻¹ (y − µ))
Correlated Gaussian

Form a correlated Gaussian by rotating the independent one with an orthonormal matrix R:

p(y) = 1/|2πD|^(1/2) exp(−½ (R⊤y − R⊤µ)⊤ D⁻¹ (R⊤y − R⊤µ))

p(y) = 1/|2πD|^(1/2) exp(−½ (y − µ)⊤ R D⁻¹ R⊤ (y − µ))

Writing C⁻¹ = R D⁻¹ R⊤, this gives a covariance matrix:

p(y) = 1/|2πC|^(1/2) exp(−½ (y − µ)⊤ C⁻¹ (y − µ))

C = R D R⊤
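The rotation construction can be verified numerically. A sketch (assuming NumPy; the rotation angle 0.3 and diagonal variances 1 and 4 are illustrative):

```python
import numpy as np

# Build a correlated Gaussian by rotating an independent one.
# R is an orthonormal (rotation) matrix, D a diagonal covariance.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
D = np.diag([1.0, 4.0])

C = R @ D @ R.T  # the induced covariance matrix

# Check: R D^{-1} R^T equals C^{-1}, as used in the exponent above.
print(np.allclose(R @ np.linalg.inv(D) @ R.T, np.linalg.inv(C)))  # True
# And the normaliser is unchanged: |2 pi D| = |2 pi C|, since |R| = 1.
print(np.isclose(np.linalg.det(2 * np.pi * D), np.linalg.det(2 * np.pi * C)))  # True
```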
Recall Univariate Gaussian Properties

Sum of Gaussian variables is also Gaussian. If yᵢ ∼ N(µᵢ, σᵢ²), then

Σ_{i=1}^{n} yᵢ ∼ N(Σ_{i=1}^{n} µᵢ, Σ_{i=1}^{n} σᵢ²)

Scaling a Gaussian leads to a Gaussian. If y ∼ N(µ, σ²), then

wy ∼ N(wµ, w²σ²)
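Both properties can be checked by simulation. A quick Monte Carlo sketch (assuming NumPy; the means, variances, and scale factor are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sum of independent Gaussians: y_i ~ N(mu_i, sigma_i^2).
mus = np.array([1.0, -2.0, 0.5])
sig2 = np.array([0.5, 1.0, 2.0])
samples = rng.normal(mus, np.sqrt(sig2), size=(200000, 3)).sum(axis=1)
print(samples.mean())  # ≈ sum(mus) = -0.5
print(samples.var())   # ≈ sum(sig2) = 3.5

# Scaling: w * y ~ N(w * mu, w^2 * sigma^2).
w = 3.0
scaled = w * rng.normal(1.0, 1.0, size=200000)
print(scaled.mean(), scaled.var())  # ≈ 3.0, 9.0
```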
Multivariate Consequence
I If
x ∼ N(µ, Σ)
I And
y = Wx
I Then
y ∼ N(Wµ, WΣW⊤)
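The multivariate consequence can also be verified empirically. A sketch (assuming NumPy; µ, Σ, and W below are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# x ~ N(mu, Sigma), y = W x  =>  y ~ N(W mu, W Sigma W^T).
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.5, 0.3],
                  [0.2, 0.3, 2.0]])   # a positive definite covariance
W = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])       # maps R^3 -> R^2

x = rng.multivariate_normal(mu, Sigma, size=300000)
y = x @ W.T

print(y.mean(axis=0))  # ≈ W @ mu
print(np.cov(y.T))     # ≈ W @ Sigma @ W.T
```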
Sampling a Function

Multi-variate Gaussians

Figure: (a) A 25 dimensional correlated random variable (values plotted against index). (b) A colormap showing correlations between dimensions.

Figure: (a) A 25 dimensional correlated random variable (values plotted against index). (b) The correlation between f1 and f2 is 0.96587.
Figure: A plot of f1 against f2, which have correlation 0.96587.

Figure: A plot of f1 against f5, which have correlation 0.57375: points further apart along the function are less strongly correlated.
The conditional mean used for prediction is

µ = K∗,f K_{f,f}⁻¹ f

k(x, x′) = α exp(−‖x − x′‖₂² / (2ℓ²))

I Covariance matrix is built using the inputs to the function x.
I For the example above it was based on Euclidean distance.
I The covariance function is also known as a kernel.
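The conditional mean formula can be sketched directly (assuming NumPy; the inputs −3.0, 1.2, 1.4 follow the worked example later in the deck, while the observed values f and the test grid are illustrative):

```python
import numpy as np

def rbf(x, x2, alpha=1.0, lengthscale=2.0):
    """Exponentiated quadratic covariance: alpha * exp(-(x - x')^2 / (2 l^2))."""
    d2 = (x[:, None] - x2[None, :])**2
    return alpha * np.exp(-d2 / (2 * lengthscale**2))

# Condition a zero-mean GP on observed values f at inputs x_f, and compute
# the posterior mean at test inputs x_star:  mu = K_{*,f} K_{f,f}^{-1} f.
x_f = np.array([-3.0, 1.2, 1.4])
f = np.array([0.5, -0.3, -0.25])
x_star = np.linspace(-4, 4, 9)

K_ff = rbf(x_f, x_f) + 1e-8 * np.eye(len(x_f))  # jitter for numerical stability
K_sf = rbf(x_star, x_f)
mu = K_sf @ np.linalg.solve(K_ff, f)
print(mu)  # the mean interpolates the observations at the training inputs
```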
Covariance Functions

Where did this covariance matrix come from?

k(x, x′) = α exp(−‖x − x′‖₂² / (2ℓ²))

I Covariance matrix is built using the inputs to the function x.
I For the example above it was based on Euclidean distance.
I The covariance function is also known as a kernel.

Figure: A sample drawn from a Gaussian with this covariance (inputs x from −1 to 1).
k(xi, xj) = α exp(−‖xi − xj‖² / (2ℓ²))

Compute the covariance matrix entry by entry for the inputs x1 = −3.0, x2 = 1.20, x3 = 1.40, with α = 1.00 and ℓ = 2.00:

k1,1 = 1.00 × exp(−(−3.0 − −3.0)² / (2 × 2.00²)) = 1.00
k2,1 = 1.00 × exp(−(1.20 − −3.0)² / (2 × 2.00²)) = 0.110
k2,2 = 1.00 × exp(−(1.20 − 1.20)² / (2 × 2.00²)) = 1.00
k3,1 = 1.00 × exp(−(1.40 − −3.0)² / (2 × 2.00²)) = 0.0889
k3,2 = 1.00 × exp(−(1.40 − 1.20)² / (2 × 2.00²)) = 0.995
k3,3 = 1.00 × exp(−(1.40 − 1.40)² / (2 × 2.00²)) = 1.00

By symmetry, the full covariance matrix is

K = [ 1.00   0.110  0.0889 ]
    [ 0.110  1.00   0.995  ]
    [ 0.0889 0.995  1.00   ]
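The entry-by-entry computation above can be reproduced in one vectorised step. A sketch (assuming NumPy) using the same inputs and hyperparameters:

```python
import numpy as np

def kernel_matrix(x, alpha, lengthscale):
    """Exponentiated quadratic: k(x_i, x_j) = alpha * exp(-(x_i - x_j)^2 / (2 l^2))."""
    d2 = (x[:, None] - x[None, :])**2
    return alpha * np.exp(-d2 / (2 * lengthscale**2))

x = np.array([-3.0, 1.20, 1.40])
K = kernel_matrix(x, alpha=1.00, lengthscale=2.00)
print(np.round(K, 3))
# [[1.    0.11  0.089]
#  [0.11  1.    0.995]
#  [0.089 0.995 1.   ]]
```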
Repeating the computation at lower precision (α = 1.0, ℓ = 2.0) gives the same matrix, and adding a fourth input, x4 = 2.0, extends it with a further row and column, e.g.

k4,4 = 1.0 × exp(−(2.0 − 2.0)² / (2 × 2.0²)) = 1.0
Changing the hyperparameters to α = 4.00 and ℓ = 5.00 for the same inputs x1 = −3.0, x2 = 1.20, x3 = 1.40:

k1,1 = 4.00 × exp(−(−3.0 − −3.0)² / (2 × 5.00²)) = 4.00
k2,1 = 4.00 × exp(−(1.20 − −3.0)² / (2 × 5.00²)) = 2.81
k2,2 = 4.00 × exp(−(1.20 − 1.20)² / (2 × 5.00²)) = 4.00
k3,1 = 4.00 × exp(−(1.40 − −3.0)² / (2 × 5.00²)) = 2.72
k3,2 = 4.00 × exp(−(1.40 − 1.20)² / (2 × 5.00²)) = 4.00
k3,3 = 4.00 × exp(−(1.40 − 1.40)² / (2 × 5.00²)) = 4.00

K = [ 4.00 2.81 2.72 ]
    [ 2.81 4.00 4.00 ]
    [ 2.72 4.00 4.00 ]
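To see what the hyperparameters do, a sketch (assuming NumPy) comparing the two settings from the worked examples: α scales the marginal variance, while a longer lengthscale ℓ keeps distant inputs more correlated:

```python
import numpy as np

def kernel_matrix(x, alpha, lengthscale):
    """k(x_i, x_j) = alpha * exp(-(x_i - x_j)^2 / (2 l^2))."""
    d2 = (x[:, None] - x[None, :])**2
    return alpha * np.exp(-d2 / (2 * lengthscale**2))

x = np.array([-3.0, 1.20, 1.40])
K_narrow = kernel_matrix(x, alpha=1.00, lengthscale=2.00)
K_wide = kernel_matrix(x, alpha=4.00, lengthscale=5.00)

# alpha sets the diagonal (marginal variance) ...
print(K_wide[0, 0] / K_narrow[0, 0])  # 4.0
# ... while the lengthscale sets how fast correlation decays with distance:
print(K_narrow[1, 0] / K_narrow[0, 0],
      K_wide[1, 0] / K_wide[0, 0])    # ≈ 0.11 and 0.70
```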
I Basis function φ(x) maps data into a "feature space" in which a linear sum is a nonlinear function.

Figure: A set of radial basis functions with width ℓ = 2 and location parameters µ = [−4 0 4]⊤.
Basis Function Representations

Functions derived using:

f(x) = Σ_{k=1}^{m} wk φk(x),

where the elements of w = [w1, . . . , wm]⊤ are independently sampled from a Gaussian density,

wk ∼ N(0, α).

Figure: Functions sampled using the basis set from figure 3. Each line is a separate sample, generated by a weighted sum of the basis set. The weights, w, are sampled from a Gaussian density with variance α = 1.
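Sampling such functions is a few lines of code. A sketch (assuming NumPy; the basis width ℓ = 2 and centres µ = [−4, 0, 4] follow the figure, with φk(x) = exp(−(x − µk)²/(2ℓ²)) assumed as the basis form, and α = 1):

```python
import numpy as np

def rbf_basis(x, centres, lengthscale=2.0):
    """Radial basis functions phi_k(x), one column per centre."""
    return np.exp(-(x[:, None] - centres[None, :])**2 / (2 * lengthscale**2))

# Basis set from the figure: width l = 2, centres mu = [-4, 0, 4].
centres = np.array([-4.0, 0.0, 4.0])
x = np.linspace(-8, 8, 200)
Phi = rbf_basis(x, centres)

# Sample functions f(x) = sum_k w_k phi_k(x) with w_k ~ N(0, alpha), alpha = 1.
rng = np.random.default_rng(3)
W = rng.normal(0.0, 1.0, size=(3, 5))  # five sampled weight vectors
F = Phi @ W                            # each column is one sampled function
print(F.shape)  # (200, 5)
```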
Direct Construction of Covariance Matrix

In matrix notation,

f = Φw.

With

w ∼ N(0, αI),

f is Gaussian distributed.
Expectations
I We have
⟨f⟩ = Φ⟨w⟩.
I Prior mean of w was zero giving
⟨f⟩ = 0.
I Prior covariance of f is
K = ⟨ff⊤⟩ − ⟨f⟩⟨f⟩⊤
where
⟨ff⊤⟩ = Φ⟨ww⊤⟩Φ⊤,
giving
K = αΦΦ⊤.
Covariance between Two Points

In sum notation, K = αΦΦ⊤ becomes

k(xi, xj) = α Σ_{k=1}^{m} φk(xi) φk(xj)

For the radial basis functions this gives

k(xi, xj) = α Σ_{k=1}^{m} exp(−(‖xi − µk‖² + ‖xj − µk‖²) / (2ℓ²)).
Covariance Functions

φk(x) = exp(−‖x − µk‖₂² / (2ℓ²))

µ = [−1 0 1]⊤

Figure: The basis functions plotted over x from −3 to 3.
Selecting Number and Location of Basis
I Need to choose
1. location of centers
2. number of basis functions
Restrict analysis to 1-D input, x.
I Consider uniform spacing over a region:

k(xi, xj) = α φ(xi)⊤φ(xj)
          = α Σ_{k=1}^{m} φk(xi) φk(xj)
          = α Σ_{k=1}^{m} exp(−(xi − µk)²/(2ℓ²)) exp(−(xj − µk)²/(2ℓ²))
          = α Σ_{k=1}^{m} exp(−(xi² + xj² − 2µk(xi + xj) + 2µk²)/(2ℓ²))
Uniform Basis Functions

Set each center uniformly spaced:

µk = a + ∆µ · (k − 1).

Writing α = α′∆µ, the covariance becomes

k(xi, xj) = α′∆µ Σ_{k=1}^{m} exp(−(xi² + xj² − 2(a + ∆µ · (k − 1))(xi + xj) + 2(a + ∆µ · (k − 1))²) / (2ℓ²)).

I Take
µ1 = a and µm = b so b = a + ∆µ · (m − 1)
Infinite Basis Functions
I This implies
b − a = ∆µ(m − 1)
and therefore
m = (b − a)/∆µ + 1
I Take limit as ∆µ → 0 so m → ∞: the sum becomes an integral,

k(xi, xj) = α′ ∫_a^b exp(−(xi² + xj²)/(2ℓ²) + (½(xi + xj)² − 2(µ − ½(xi + xj))²)/(2ℓ²)) dµ,

where α = α′√(πℓ²).
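The limit can be checked numerically: completing the square and integrating the Gaussian in µ over a wide region [a, b] gives α′√(πℓ²) exp(−(xi − xj)²/(4ℓ²)). A sketch (assuming NumPy; ℓ = 1, α′ = 1, and the test inputs are illustrative) comparing a dense uniform basis sum against that closed form:

```python
import numpy as np

# Dense, uniformly spaced RBF centres on a wide region [a, b].
ell = 1.0
alpha_prime = 1.0
a, b = -20.0, 20.0
dmu = 0.01
mu = np.arange(a, b + dmu, dmu)

def k_sum(xi, xj):
    """Approximate the infinite-basis covariance by a Riemann sum over centres."""
    phi_i = np.exp(-(xi - mu)**2 / (2 * ell**2))
    phi_j = np.exp(-(xj - mu)**2 / (2 * ell**2))
    return alpha_prime * dmu * np.sum(phi_i * phi_j)

xi, xj = 0.3, 1.7
closed_form = alpha_prime * np.sqrt(np.pi * ell**2) * np.exp(-(xi - xj)**2 / (4 * ell**2))
print(k_sum(xi, xj), closed_form)  # the two agree closely
```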
Infinite Feature Space

k(x, x′) = α exp(−‖x − x′‖₂² / (2ℓ²))

I Covariance matrix is built using the inputs to the function x.
I For the example above it was based on Euclidean distance.
I The covariance function is also known as a kernel.
References I

P. S. Laplace. Essai philosophique sur les probabilités. Courcier, Paris, 2nd edition, 1814. Sixth edition of 1840 translated and reprinted (1951) as A Philosophical Essay on Probabilities, New York: Dover; fifth edition of 1825 reprinted 1986 with notes by Bernard Bru, Paris: Christian Bourgois Éditeur, translated by Andrew Dale (1995) as Philosophical Essay on Probabilities, New York: Springer-Verlag.

R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996. Lecture Notes in Statistics 118.

C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

C. K. I. Williams. Computation with infinite neural networks. Neural Computation, 10(5):1203–1216, 1998.