Summary
Sungjoon Choi
February 9, 2018
Seoul National University
Table of contents
1. Measure theory
2. Probability
3. Random variable
4. Random process
5. Functional analysis
6. Gaussian process
Measure theory
• Properties of a σ-field B on a set U
1. U ∈ B (the entire set is included).
2. B_i ∈ B ⇒ ∩_{i=1}^∞ B_i ∈ B (closed under countable intersection).
3. 2^U (the power set of U) is a σ-field.
4. B is either finite or uncountable, never denumerable.
5. If B and C are σ-fields, then B ∩ C is a σ-field, but B ∪ C need not be.
• Example with U = {a, b, c}:
• B = {∅, {a}, {b, c}, {a, b, c}}
• C = {∅, {a, b}, {c}, {a, b, c}}
• B ∩ C = {∅, {a, b, c}}
(this is a σ-field)
• B ∪ C = {∅, {a}, {c}, {a, b}, {b, c}, {a, b, c}}
(this is not a σ-field, as {a, c} = {a} ∪ {c} is not included.)
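The example above can be checked mechanically. The sketch below assumes the standard definition of a σ-field (contains U, closed under complement and countable union); the helper name is_sigma_field is made up for this illustration. On a finite set, closure under pairwise unions is enough.

```python
from itertools import combinations

def is_sigma_field(universe, family):
    """Check the sigma-field axioms for a family of subsets of a finite universe."""
    universe = frozenset(universe)
    family = {frozenset(s) for s in family}
    # The entire set must be included.
    if universe not in family:
        return False
    # Closed under complement.
    if any(universe - s not in family for s in family):
        return False
    # On a finite universe, pairwise unions cover all countable unions.
    return all(s | t in family for s, t in combinations(family, 2))

U = {"a", "b", "c"}
B = [set(), {"a"}, {"b", "c"}, U]
C = [set(), {"a", "b"}, {"c"}, U]

B_set = {frozenset(s) for s in B}
C_set = {frozenset(s) for s in C}

print(is_sigma_field(U, B_set & C_set))  # the intersection of the two families
print(is_sigma_field(U, B_set | C_set))  # the union of the two families
```

The union fails at the pairwise-union step because {a} ∪ {c} = {a, c} is missing.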
Probability
• Definition (probability)
• P defined on a measurable space (Ω, A) is a set function
P ∶ A → [0, 1] that satisfies the probability axioms:
1. P(∅) = 0
2. P(A) ≥ 0 ∀A ∈ A
3. For pairwise disjoint sets A_1, A_2, … ∈ A: P(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i) (countable
additivity)
4. P(Ω) = 1
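As a small illustration of the axioms, the sketch below uses a fair six-sided die with the power set as the σ-field. The example space and the names power_set and P are assumptions made for this illustration only. On a finite space, countable additivity reduces to additivity over disjoint pairs.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))  # outcomes of one die roll (illustrative choice)

def power_set(s):
    """All subsets of s; on a finite set this is the largest sigma-field."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def P(event):
    """Uniform probability of an event, i.e. a subset of omega."""
    return Fraction(len(event), len(omega))

events = power_set(omega)

axiom_empty = P(frozenset()) == 0
axiom_nonneg = all(P(A) >= 0 for A in events)
axiom_total = P(omega) == 1
# Additivity over every pair of disjoint events.
axiom_additive = all(P(A | B) == P(A) + P(B)
                     for A, B in combinations(events, 2) if not (A & B))

print(axiom_empty, axiom_nonneg, axiom_additive, axiom_total)
```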
Random variable
• Random variable:
A random variable X is a real-valued function defined on Ω that is
measurable w.r.t. the probability space (Ω, A, P) and the Borel
measurable space (R, B), i.e.,
X⁻¹(B) = {ω ∈ Ω ∶ X(ω) ∈ B} ∈ A for every B ∈ B.
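On a finite sample space, measurability can be checked directly: a function is measurable exactly when the preimage of each value it takes is an event in the σ-field. The sketch below assumes a three-point Ω and a small σ-field chosen only for illustration; is_measurable is a hypothetical helper.

```python
def is_measurable(X, omega, sigma_field):
    """True if the preimage of every value of X is an event of the sigma-field."""
    events = {frozenset(s) for s in sigma_field}
    values = {X[w] for w in omega}
    return all(frozenset(w for w in omega if X[w] == v) in events
               for v in values)

omega = {"a", "b", "c"}
A = [set(), {"a"}, {"b", "c"}, omega]   # a sigma-field on omega

X = {"a": 0.0, "b": 1.0, "c": 1.0}      # constant on {b, c}
Y = {"a": 0.0, "b": 1.0, "c": 2.0}      # separates b from c

print(is_measurable(X, omega, A))
print(is_measurable(Y, omega, A))
```

Y is not a random variable on this space because the preimage of 1.0 is {b}, which is not in the σ-field.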
Random process
• A random process X_t(ω), t ∈ I, can be viewed as:
1. a random sequence, random function, or random signal:
X_t ∶ Ω → the set of all sequences or functions
2. an indexed family of (possibly infinitely many) random variables:
X_t ∶ I → the set of all random variables defined on Ω
3. a function of two arguments: X_t ∶ Ω × I → R
4. If t is fixed, then a random process becomes a random variable.
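These views can be made concrete with a simulated process. The sketch below assumes a simple symmetric random walk over a finite index set, and a finite sample of outcomes stands in for Ω. Both are illustrative choices, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_outcomes, n_times = 1000, 50

# X[w, t]: the process as a function on (sampled outcomes) x (index set).
steps = rng.choice([-1, 1], size=(n_outcomes, n_times))
X = np.cumsum(steps, axis=1)

sample_path = X[0, :]   # fix an outcome: one realization, a sequence in t
X_t = X[:, 10]          # fix t: samples of a single random variable

print(sample_path.shape, X_t.shape)
```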
Functional analysis
• Definition (Kernel)
• Let X be a non-empty set. A function k ∶ X × X → R is a kernel if
there exists a Hilbert space H and a map φ ∶ X → H such that
∀x, x′ ∈ X,
k(x, x′) ≜ ⟨φ(x), φ(x′)⟩_H.
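A standard example that fits this definition is the homogeneous quadratic kernel on R², which has an explicit three-dimensional feature map. The kernel and feature map below are chosen for illustration and do not come from the slides.

```python
import numpy as np

def k(x, x_prime):
    """Quadratic kernel k(x, x') = (x . x')^2 on R^2."""
    return float(np.dot(x, x_prime)) ** 2

def phi(x):
    """Explicit feature map into H = R^3 for the quadratic kernel."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
x_prime = np.array([3.0, -1.0])

lhs = k(x, x_prime)                   # kernel evaluated directly
rhs = float(phi(x) @ phi(x_prime))    # inner product in feature space
print(lhs, rhs)
```

Both values agree up to floating-point rounding, since expanding (x₁x′₁ + x₂x′₂)² gives exactly the three products in ⟨φ(x), φ(x′)⟩.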
• Theorem (Mercer)
• Let (X, µ) be a finite measure space and k ∈ L∞(X², µ²) be a
kernel such that the integral operator Tk ∶ L2(X, µ) → L2(X, µ),
(Tk f)(x) = ∫ k(x, x′) f(x′) dµ(x′), is positive definite.
• Let φi ∈ L2(X, µ) be the normalized eigenfunctions of Tk associated
with the eigenvalues λi > 0. Then:
1. The eigenvalues {λi}, i = 1, 2, …, are absolutely summable.
2. The expansion
\[
k(x, x') = \sum_{i=1}^{\infty} \lambda_i \varphi_i(x) \varphi_i(x')
\]
holds µ² almost everywhere, where the series converges absolutely
and uniformly µ² almost everywhere.
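A finite analogue of the theorem can be checked numerically. The sketch below assumes X is a grid of n points with the uniform probability measure and an RBF kernel; both are illustrative choices. Under these assumptions the operator Tk is the matrix K/n, and eigenvectors are rescaled by √n so that they are normalized in L2(X, µ).

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 30)       # finite set X (illustrative grid)
n = x.size
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.5 ** 2)  # k(x_i, x_j)

# With mu({x_i}) = 1/n, (T_k f)(x_i) = (1/n) * sum_j k(x_i, x_j) f(x_j).
lam, V = np.linalg.eigh(K / n)
phi = np.sqrt(n) * V                 # columns satisfy (1/n) * sum_j phi_i(x_j)^2 = 1

# Mercer-style expansion: k(x, x') = sum_i lam_i * phi_i(x) * phi_i(x').
K_rec = (phi * lam) @ phi.T

print(np.allclose(K_rec, K))
print(lam.sum())                     # equals the integral of k(x, x) d mu
```

In this discrete setting the sum of the eigenvalues equals the trace of K/n, which mirrors the summability statement in item 1.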
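\[
\begin{aligned}
f(x) &= \Big\langle f, \sum_{i=1}^{\infty} \lambda_i \varphi_i(\cdot)\, \varphi_i(x) \Big\rangle_{\mathcal{H}} \\
     &= \sum_{i=1}^{\infty} \lambda_i \langle f, \varphi_i(\cdot) \rangle_{\mathcal{H}}\, \varphi_i(x) \\
     &= \sum_{i=1}^{\infty} \bar{\lambda}_i \varphi_i(x),
\end{aligned}
\]
where \bar{\lambda}_i = \lambda_i \langle f, \varphi_i(\cdot) \rangle_{\mathcal{H}}.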
Gaussian process
Function space view
Let f(x) be a (zero-mean) Gaussian process. Then f(X) = [f(x_1), …, f(x_n)]ᵀ ∈ Rⁿ
and f(x∗) ∈ R are jointly Gaussian, i.e.,
\[
\begin{bmatrix} f(x_1) \\ \vdots \\ f(x_n) \\ f(x_*) \end{bmatrix}
\sim \mathcal{N}\left( 0,
\begin{bmatrix}
k(x_1, x_1) & \cdots & k(x_1, x_n) & k(x_1, x_*) \\
\vdots & \ddots & \vdots & \vdots \\
k(x_n, x_1) & \cdots & k(x_n, x_n) & k(x_n, x_*) \\
k(x_*, x_1) & \cdots & k(x_*, x_n) & k(x_*, x_*)
\end{bmatrix} \right).
\]
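In block form,
\[
\begin{bmatrix} f \\ f_* \end{bmatrix}
\sim \mathcal{N}\left( 0,
\begin{bmatrix} K(X, X) & K(X, x_*) \\ K(x_*, X) & k(x_*, x_*) \end{bmatrix} \right).
\]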
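By conditioning, we get
\[
f_* \mid f \sim \mathcal{N}(\mu_*, \sigma_*),
\]
where the predictive mean is
\[
\mu_* = K(x_*, X)\, K(X, X)^{-1} f
\]
and the predictive variance is
\[
\sigma_* = k(x_*, x_*) - K(x_*, X)\, K(X, X)^{-1} K(X, x_*).
\]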
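The noise-free conditioning formulas translate almost line by line into NumPy. The sketch below assumes an RBF kernel and toy one-dimensional data; neither is specified in the slides. It uses np.linalg.solve instead of forming the inverse explicitly.

```python
import numpy as np

def rbf(A, B, length_scale=1.0):
    """RBF kernel matrix between 1-D input arrays A and B (assumed kernel)."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

X = np.array([-2.0, -1.0, 0.0, 1.5])   # training inputs (toy data)
f = np.sin(X)                           # noise-free function values
x_star = np.array([0.5])                # test input

K_XX = rbf(X, X)                        # K(X, X)
K_sX = rbf(x_star, X)                   # K(x*, X)
k_ss = rbf(x_star, x_star)              # k(x*, x*)

mu_star = K_sX @ np.linalg.solve(K_XX, f)
sigma_star = k_ss - K_sX @ np.linalg.solve(K_XX, K_sX.T)

print(mu_star, sigma_star)
```

Without observation noise the posterior mean interpolates the data: evaluating the same expression at a training input returns the observed value there.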
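With noisy observations y (noise variance σ_n²),
\[
\begin{bmatrix} y \\ f_* \end{bmatrix}
\sim \mathcal{N}\left( 0,
\begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, x_*) \\ K(x_*, X) & k(x_*, x_*) \end{bmatrix} \right).
\]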
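By conditioning, we get
\[
f_* \mid y \sim \mathcal{N}(\mu_*, \sigma_*),
\]
where the predictive mean is
\[
\mu_* = K(x_*, X)\, (K(X, X) + \sigma_n^2 I)^{-1} y
\]
and the predictive variance is
\[
\sigma_* = k(x_*, x_*) - K(x_*, X)\, (K(X, X) + \sigma_n^2 I)^{-1} K(X, x_*).
\]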
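For noisy observations, a common way to evaluate these formulas is through a Cholesky factorization of K(X, X) + σ_n² I rather than an explicit inverse. The sketch below assumes an RBF kernel and synthetic data; the function name gp_predict and its parameters are made up for this illustration.

```python
import numpy as np

def rbf(A, B, length_scale=1.0):
    """RBF kernel matrix between 1-D input arrays A and B (assumed kernel)."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X, y, X_star, noise_var=0.1, length_scale=1.0):
    """Predictive mean and variance of f* at each test input."""
    K = rbf(X, X, length_scale) + noise_var * np.eye(X.size)
    K_s = rbf(X, X_star, length_scale)            # column j is K(X, x*_j)
    L = np.linalg.cholesky(K)                     # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)                   # L^{-1} K(X, x*)
    var = np.diag(rbf(X_star, X_star, length_scale)) - np.sum(v ** 2, axis=0)
    return mu, var

rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 8)
y = np.sin(X) + 0.1 * rng.standard_normal(X.size)   # synthetic noisy targets
X_star = np.array([-1.0, 0.3, 2.2])

mu, var = gp_predict(X, y, X_star, noise_var=0.1)
print(mu, var)
```

The variance is computed per test point only, which is all the scalar formula for σ∗ requires.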
Comments on Gaussian process regression
Summary of Variational Inference
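• Derivation (the first equality uses ∫ q(w∣θ) dw = 1):
\[
\begin{aligned}
\ln p(D) &= \int \ln p(D)\, q(w \mid \theta)\, dw \\
&= \int q(w \mid \theta) \ln \frac{p(D)\, p(w \mid D)}{p(w \mid D)}\, dw \\
&= \int q(w \mid \theta) \ln \frac{p(w, D)}{p(w \mid D)}\, dw \\
&= \int q(w \mid \theta) \ln \frac{p(D \mid w)\, p(w)}{p(w \mid D)}\, dw \\
&= \int q(w \mid \theta) \ln \frac{q(w \mid \theta)\, p(D \mid w)\, p(w)}{q(w \mid \theta)\, p(w \mid D)}\, dw \\
&= \int q(w \mid \theta) \ln \frac{q(w \mid \theta)}{p(w \mid D)}\, dw
 + \int q(w \mid \theta) \ln \frac{p(D \mid w)\, p(w)}{q(w \mid \theta)}\, dw \\
&= D_{KL}\big(q(w \mid \theta)\,\|\,p(w \mid D)\big) + F[q]
\end{aligned}
\]
where F[q] is the evidence lower bound (ELBO), i.e. the negative of the variational free energy.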
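The decomposition can be verified numerically on a toy model. The sketch below assumes a discrete latent variable w with three values, so the integrals become sums; the prior, likelihood and q values are arbitrary numbers chosen for illustration.

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])      # p(w)
lik = np.array([0.10, 0.40, 0.25])     # p(D | w) for one fixed dataset D
q = np.array([0.2, 0.5, 0.3])          # variational distribution q(w | theta)

evidence = np.sum(lik * prior)         # p(D)
posterior = lik * prior / evidence     # p(w | D)

kl = np.sum(q * np.log(q / posterior))         # D_KL(q || p(w | D))
elbo = np.sum(q * np.log(lik * prior / q))     # F[q]

print(np.log(evidence), kl + elbo)
```

Because the KL term is non-negative, F[q] never exceeds ln p(D), which is why maximizing F[q] over θ tightens the bound.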
Stein Variational Gradient Descent
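At iteration l, each particle is updated with step size ε_l as
\[
x_i^{l+1} \leftarrow x_i^{l} + \epsilon_l\, \hat{\phi}^*(x_i^{l}),
\]
where
\[
\hat{\phi}^*(x) = \frac{1}{n} \sum_{j=1}^{n} \left[ k(x_j^{l}, x)\, \nabla_{x_j^{l}} \log p(x_j^{l}) + \nabla_{x_j^{l}} k(x_j^{l}, x) \right].
\]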
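The direction φ̂∗ can be sketched in a few lines of NumPy. The example below assumes an RBF kernel with a fixed bandwidth h, a one-dimensional Gaussian target N(2, 1), and a constant step size; all of these are illustrative choices rather than settings from the slides. The first term pulls particles toward high-density regions, and the second term pushes them apart.

```python
import numpy as np

def svgd_direction(particles, grad_log_p, h=1.0):
    """phi_hat evaluated at every particle, using an RBF kernel (assumed)."""
    diff = particles[:, None, :] - particles[None, :, :]   # diff[j, i] = x_j - x_i
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h ** 2))  # k[j, i] = k(x_j, x_i)
    grad_k = -diff / h ** 2 * k[:, :, None]   # gradient of k(x_j, x_i) w.r.t. x_j
    drive = k.T @ grad_log_p(particles)       # sum_j k(x_j, x_i) * grad log p(x_j)
    repulse = grad_k.sum(axis=0)              # sum_j grad_{x_j} k(x_j, x_i)
    return (drive + repulse) / particles.shape[0]

def grad_log_p(x):
    """Score of the illustrative target N(2, 1)."""
    return -(x - 2.0)

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=(50, 1))   # initial particles
step_size = 0.1

for _ in range(1000):
    particles = particles + step_size * svgd_direction(particles, grad_log_p)

print(particles.mean(), particles.std())
```

In practice the bandwidth is often set adaptively from the particle distances; a fixed h is used here only to keep the sketch short.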
Questions?