Professional Documents
Culture Documents
State Space Representation of Gaussian Processes
State Space Representation of Gaussian Processes
Simo Särkkä
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 1 / 51
Contents
1 Gaussian Processes
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 2 / 51
Definition of Gaussian Process: Spatial Case
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 4 / 51
Definition of Gaussian Process: Temporal and
Spatial-Temporal Cases
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 5 / 51
Modeling with Gaussian processes
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 7 / 51
Fourier Transform
The Fourier transform of function f (x) : Rd 7→ R is
Z
F[f ](i ω) = f (x) exp(− i ω T x) dx.
Rd
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 10 / 51
Representations of Temporal Gaussian Processes
Moment representation in terms of mean and covariance function
m(t) = E[f(t)]
K(t, t 0 ) = E[(f(t) − m(t)) (f(t 0 ) − m(t 0 ))T ].
Spectral representation in terms of spectral density function
S(ωt ) = E[f̃(i ωt ) f̃ T (− i ωt )].
Path or state space representation as solution to a stochastic
differential equation:
df = A f dt + L dβ,
where f ← (f, df/dt, . . .) and β(t) is a vector of Wiener processes, or
equivalently, but more informally
df
= A f + L w,
dt
where w(t) is white noise.
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 12 / 51
Representations of Temporal Gaussian Processes
Sample functions
3
1
x(t)
−1
−2
0 1 2 3 4 5 6 7 8 9 10
time t
Covariance function Spectral density
1 2
0.8
1.5
0.6
S(ω)
C(τ)
1
0.4
0.5
0.2
0 0
−5 0 5 −5 0 5
τ = t−t’ ω
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 13 / 51
Scalar 1d Gaussian Processes
Example of Gaussian process: Ornstein-Uhlenbeck process
m(t) = 0
k(t, t 0 ) = exp(−λ|t − t 0 |)
Path representation: stochastic differential equation (SDE)
df (t) = −λ f (t) dt + dβ(t),
where β(t) is a Brownian motion, or more informally
df (t)
= −λ f (t) + w (t),
dt
where w (t) is a formal white noise process.
The equation has the solution
Z t
f (t) = exp(−λ t) f (0) + exp(−λ(t − s)) w (s) ds.
0
Has the covariance k(t, t 0 ) at stationary state.
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 14 / 51
Markov Property of Scalar 1d Gaussian Process
dm(t)
= −λ m(t)
dt
dP(t)
= −2λ P(t) + 2λ.
dt
Due to the Markov property, these statistics are sufficient for
computations, i.e., the full covariance k(t, t 0 ) is not needed.
The related inference algorithms, which utilize the Markov property
are the Kalman filter and Rauch-Tung-Striebel smoother.
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 15 / 51
Spectral Representation of 1d Gaussian Process
The spectral density of f(t) can be obtained by computing Fourier
transform of the covariance k(t, t 0 ) = k(t − t 0 ):
Z ∞
2λ
S(ω) = exp(−λ|τ |) exp(− i ω τ ) dτ = 2 .
−∞ ω + λ2
Alternatively, we can take Fourier transform of the original equation
df (t)/dt = −λf (t) + w (t), which yields
(i ω) f˜(i ω) = −λ f˜(i ω) + w̃ (i ω),
and further
w̃ (ω)
f˜(i ω) =
(i ω) + λ
The spectral density is by definition S(ω) = |f˜(i ω)|2 :
|w̃ (i ω)|2 2λ
S(ω) = = 2
((i ω) + λ)((− i ω) + λ) ω + λ2
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 16 / 51
Vector Valued 1d Gaussian Processes
Vector valued 1d Gaussian processes correspond to linear
time-invariant state space models with white noise input w(t):
df(t)
= A f(t) + L w(t).
dt
Can be solved in terms of the matrix exponential exp(t A):
Z t
f(t) = exp(t A) f(0) + exp((t − s) A) L w(s) ds.
0
The covariance function and spectral density at stationary state are
P∞ exp((t 0 − t) A)T , if t 0 ≥ t
0
K(t, t ) =
exp((t − t 0 ) A) P∞ , if t 0 < t.
S(ω) = (A + i ω I)−1 L Q LT (A − i ω I)−T .
where P∞ is the solution to the equation
A P∞ + P∞ AT + L Q LT = 0.
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 17 / 51
Converting Covariance Functions to State Space Models
or actually
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 20 / 51
Kalman Filter and Rauch-Tung-Striebel Smoother
Kalman filter and RTS smoother can be used for efficiently
computing posteriors of models in form
df(t)
= A f(t) + L w(t)
dt
yi = H f(ti ) + ei ,
fi = Ui fi−1 + vi
yi = H fi + ei
Many Gaussian process regression problems with differentiable
covariance function can be efficiently solved with KF & RTS
With n measurements, complexity of KF/RTS is O(n), when the
brute-force GP solution is O(n3 ).
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 21 / 51
Example: Matérn Covariance Function
21−ν √ τ ν √ τ
k(τ ) = σ 2 2ν Kν 2ν ,
Γ(ν) l l
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 22 / 51
Example: Matérn Covariance Function (cont.)
Example (1D Matérn covariance function (cont.))
The spectral density can be factored as
q
S(ω) = (p+1)
,
(λ + i ω) (λ − i ω)(p+1)
where ν = p + 1/2.
The transfer function of the corresponding stable part is
1
G (i ω) = .
(λ + i ω)(p+1)
For integer values of p (ν = 1/2, 3/2, . . .), we can expand this. For
example, if p = 0 (ν = 1/2), we get the Ornstein–Uhlenbeck process
df (t)
= −λ f (t) + w (t)
dt
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 23 / 51
Example: Matérn Covariance Function (cont.)
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 24 / 51
Example: Exponentiated Quadratic Covariance Function
We can factor the result into stable and unstable parts, and further
convert into an n-dimensional state space model.
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 25 / 51
Example: Comparison of Exponentiated Quadratic and
Matérn
Matérn (ν = 3/2)
f (t)
Squared exponential
t
Matérn
SE (exact)
C(t − t0 )
SE (n = 2)
S(ω) SE (n = 4)
SE (n = 6)
−3l 3l −2π 2π
Covariance Function Spectral Density
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 26 / 51
Inference in Practice
Conventional GP regression:
1 Evaluate the covariance function at the training and test set points.
2 Use GP regression formulas to compute the posterior process statistics.
3 Use the mean function as the prediction.
State-space GP regression:
1 Form the state space model.
2 Run Kalman filter through the measurement sequence.
3 Run RTS smoother through the filter results.
4 Use the smoother mean function as the prediction.
→ Matern 5/2 animation
→ EQ animation
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 27 / 51
Benefits of SDE Representation
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 28 / 51
Representations of Spatial Gaussian Processes
m(x) = E[f(x)]
K(x, x0 ) = E[(f(x) − m(x)) (f(x0 ) − m(x0 ))T ].
S(ω x ) = E[f̃(i ω x ) f̃ T (− i ω x )]
A f(x) = w (x),
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 30 / 51
Representations of Spatial Gaussian Processes
Sample Function
−1
−2
5
5
0
0
−5 −5
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 31 / 51
Stochastic Partial Differential Equation Representation of
Spatial Gaussian Process
∂ 2 f (x1 , x2 ) ∂ 2 f (x1 , x2 )
+ − λ2 f (x1 , x2 ) = w (x1 , x2 ),
∂x12 ∂x22
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 32 / 51
Spectral and Covariance Representation of Spatial
Gaussian Process
By taking Fourier transform of the example SPDE we get
w̃ (i ω1 , i ω2 )
f˜(i ω1 , i ω2 ) = 2 .
ω1 + ω22 + λ2
The spectral density is then
1
S(ω1 , ω2 ) =
(ω12 + ω22 + λ2 )2
By inverse Fourier transform we get the covariance function
|x − x0 |
k(x, x0 ) = K1 (λ|x − x0 |)
2λ
where K1 is the modified Bessel function.
A special case of Matern class covariance functions - so called Whittle
covariance function.
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 33 / 51
Converting Covariance Functions to SPDEs
Ax = F −1 [A(i ω)].
Ax f (x) = w (x),
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 34 / 51
Benefits of SPDE Representation
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 35 / 51
From Temporal to Spatio-Temporal Processes
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 37 / 51
Representations of Spatio-Temporal Gaussian Processes
Moment representation in terms of mean and covariance function
m(x, t) = E[f(x, t)]
K(x, x0 ; t, t 0 ) = E[(f(x, t) − m(x, t)) (f(x, t) − m(x, t 0 ))T ].
Spectral representation in terms of spectral density function
S(ω x , ωt ) = E[f̃(i ω x , i ωt ) f̃ T (− i ω x , − i ωt )].
As infinite-dimensional state space model or stochastic differential
equation (SDE):
df(x, t) = A f(x, t) dt + L dβ(x, t).
or more informally
∂f(x, t)
= A f(x, t) + L w(x, t).
∂t
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 38 / 51
Spatio-Temporal Gaussian SPDEs
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 39 / 51
GP Regression as Infinite Linear Model
f ∼ N (0, K0 )
y = H f + e,
m̂ = K0 HT (H K0 HT + Σ)−1 y
K̂ = K0 − K0 HT (H K0 HT + Σ)−1 H K0 .
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 40 / 51
GP Regression as Infinite Linear Model (cont.)
× H C0 (x, x0 ),
∂f(x, t)
= A f(x, t) + L w(x, t),
∂t
where A and L are linear operators in x-variable and w(·) is a
time-space white noise.
The mild solution to the equation is:
Z t
f(x, t) = U(t) f(x, 0) + U(t − s) L w(x, s) ds.
0
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 42 / 51
Infinite-Dimensional Kalman Filtering and Smoothing
q(i ω x )
S(ω x , ωt ) = .
b0 (i ω x ) + b1 (i ω x ) (i ωt ) + · · · + (i ωt )N
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 44 / 51
Conversion of Spatio-Temporal Covariance into
Infinite-Dimensional State Space Model (cont.)
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 45 / 51
Example: 2D Matérn covariance function
21−ν √ r ν √ r
k(r ) = σ 2 2ν Kν 2ν .
Γ(ν) l l
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 46 / 51
Example: 2D Matérn covariance function (cont.)
Example (2D Matérn covariance function (cont.))
p
The denominator roots are (iωt ) = ± λ2 − ||iω x ||2 , which gives
1
G (iω x , iωt ) = p (ν+d/2) .
iωt + λ2 − ||iω x ||2
Example realization:
x
t
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 47 / 51
Benefits of Stochastic Evolution Equation Representation
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 48 / 51
Conclusion
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 50 / 51
References
Hartikainen, J. and Särkkä, S. (2010). Kalman Filtering and Smoothing
Solutions to Temporal Gaussian Process Regression Models. Proc. of MLSP.
Lindgren, F., Rue, H., and Lindström, J. (2011). An explicit link between
Gaussian fields and Gaussian Markov random fields: the stochastic partial
differential equation approach. JRSS B, 73(4):423–498.
Särkkä , S. (2011). Linear operators and stochastic partial differential
equations in Gaussian process regression. Proc. of ICANN.
Särkkä, S. and Hartikainen, J. (2012). Infinite-Dimensional Kalman Filtering
Approach to Spatio-Temporal Gaussian Process Regression. AISTATS 2012.
Särkkä, S., Solin, A. and Hartikainen, J. (2013). Spatio-Temporal Learning
via Infinite-Dimensional Bayesian Filtering and Smoothing. IEEE Signal
Processing Magazine (to appear).
Solin, A., Särkkä, S. (2013). Hilbert Space Methods for Reduced-Rank
Gaussian Process Regression. Submitted.
S. Särkkä (2013). Bayesian Filtering and Smoothing. Cambridge University
Press. (Forthcoming).
Simo Särkkä (Aalto University) State Space Representation of GPs June 12th, 2013 51 / 51