
Multivariate Normal Distribution



Brunero Liseo
Sapienza Università di Roma
brunero.liseo@uniroma1.it
February 10, 2014

Inference on N(μ, Σ)

Let $X_1, \dots, X_n \overset{iid}{\sim} N_p(\mu, \Sigma)$, with density
\[
f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x-\mu)' \Sigma^{-1} (x-\mu) \right\}.
\]
The likelihood is
\[
L(\mu, \Sigma) \propto \frac{1}{|\Sigma|^{n/2}} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} (x_i-\mu)' \Sigma^{-1} (x_i-\mu) \right\}.
\]
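
As an illustration (not part of the original slides), a minimal R sketch that evaluates this log-likelihood, up to the additive constant $-\frac{np}{2}\log(2\pi)$; the function and variable names are my own.

loglik_mvn <- function(X, mu, Sigma) {
  # log-likelihood of an iid N_p(mu, Sigma) sample, rows of X are observations
  n <- nrow(X)
  Sinv <- solve(Sigma)
  centred <- sweep(X, 2, mu)                           # rows are (x_i - mu)'
  quad <- sum(rowSums((centred %*% Sinv) * centred))   # sum_i (x_i - mu)' Sigma^{-1} (x_i - mu)
  -0.5 * n * as.numeric(determinant(Sigma)$modulus) - 0.5 * quad
}

set.seed(1)
p <- 3; n <- 50
X <- matrix(rnorm(n * p), n, p)                        # simulated sample with mu = 0, Sigma = I
loglik_mvn(X, mu = rep(0, p), Sigma = diag(p))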

Alternative expression of the quadratic form

\[
\sum_{i=1}^{n} (x_i-\mu)' \Sigma^{-1} (x_i-\mu)
= \sum_{i=1}^{n} (x_i-\bar{x}+\bar{x}-\mu)' \Sigma^{-1} (x_i-\bar{x}+\bar{x}-\mu)
\]
\[
= \sum_{i=1}^{n} (x_i-\bar{x})' \Sigma^{-1} (x_i-\bar{x}) + n\,(\bar{x}-\mu)' \Sigma^{-1} (\bar{x}-\mu)
\]
\[
= \operatorname{tr}\left[ \Sigma^{-1} \sum_{i=1}^{n} (x_i-\bar{x})(x_i-\bar{x})' \right] + n\,(\bar{x}-\mu)' \Sigma^{-1} (\bar{x}-\mu)
\]
\[
= \operatorname{tr}\left( \Sigma^{-1} S \right) + n\,(\bar{x}-\mu)' \Sigma^{-1} (\bar{x}-\mu),
\]
where $S = \sum_{i=1}^{n} (x_i-\bar{x})(x_i-\bar{x})'$.

Then
\[
L(\mu, \Sigma) \propto \frac{1}{|\Sigma|^{n/2}} \exp\left\{ -\frac{1}{2} \left[ n\,(\bar{x}-\mu)' \Sigma^{-1} (\bar{x}-\mu) + \operatorname{tr}\left( \Sigma^{-1} S \right) \right] \right\}.
\]
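
A quick numerical check of this decomposition in R (my own sketch, not from the slides): the two expressions for the exponent should agree up to floating point error.

set.seed(2)
p <- 3; n <- 20
X <- matrix(rnorm(n * p), n, p)
mu <- rnorm(p)
A <- matrix(rnorm(p * p), p, p); Sigma <- crossprod(A) + diag(p)   # a positive definite Sigma
Sinv <- solve(Sigma)

xbar <- colMeans(X)
S <- crossprod(sweep(X, 2, xbar))          # sum_i (x_i - xbar)(x_i - xbar)'

lhs <- sum(apply(X, 1, function(x) t(x - mu) %*% Sinv %*% (x - mu)))
rhs <- sum(diag(Sinv %*% S)) + n * t(xbar - mu) %*% Sinv %*% (xbar - mu)
all.equal(lhs, drop(rhs))                  # TRUE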

Conjugate prior

\[
\mu \mid \Sigma \sim N_p(\mu_0, c^{-1}\Sigma),
\]
that is,
\[
\pi(\mu \mid \Sigma) \propto \frac{1}{|\Sigma|^{1/2}} \exp\left\{ -\frac{c}{2} (\mu-\mu_0)' \Sigma^{-1} (\mu-\mu_0) \right\}.
\]
Then $\Sigma \sim IW_p(m, \Lambda)$, that is, $\Sigma$ has an Inverse Wishart distribution,
\[
\pi(\Sigma) \propto \frac{1}{|\Sigma|^{(m+p+1)/2}} \exp\left\{ -\frac{1}{2} \operatorname{tr}\left( \Lambda^{-1} \Sigma^{-1} \right) \right\}.
\]
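
A minimal R sketch (mine, not the authors' code) for drawing (μ, Σ) from this conjugate prior. It assumes, consistently with the Inverse Wishart slides below, that Σ ~ IW_p(m, Λ) means Σ^{-1} ~ W_p(m, Λ); the function and argument names (rprior, mu0, c0, m, Lambda) are my own.

rprior <- function(mu0, c0, m, Lambda) {
  p <- length(mu0)
  V <- rWishart(1, df = m, Sigma = Lambda)[, , 1]      # V ~ W_p(m, Lambda)
  Sigma <- solve(V)                                    # Sigma ~ IW_p(m, Lambda)
  mu <- mu0 + drop(t(chol(Sigma / c0)) %*% rnorm(p))   # mu | Sigma ~ N_p(mu0, Sigma / c0)
  list(mu = mu, Sigma = Sigma)
}

set.seed(3)
p <- 3
rprior(mu0 = rep(0, p), c0 = 1, m = p + 3, Lambda = diag(p))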

Wishart distribution

The Wishart distribution $W_k(m, \Lambda)$ has support the space of all symmetric positive definite matrices.

We say that a square $k$-dimensional positive definite matrix $V$ has a Wishart distribution with $m$ degrees of freedom and positive definite scale matrix $\Lambda$, and we denote it by $V \sim W_k(m, \Lambda)$, if the density is
\[
f(V) = \frac{1}{2^{mk/2}\, \Gamma_k(m/2)\, |\Lambda|^{m/2}}\; |V|^{(m-k-1)/2} \exp\left\{ -\frac{1}{2} \operatorname{tr}\left( \Lambda^{-1} V \right) \right\},
\]
with
\[
\Gamma_k(u) = \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\!\left( u - \frac{i-1}{2} \right), \qquad u > \frac{k-1}{2}.
\]

Construction of a random Wishart matrix

Let $Z_1, \dots, Z_m \overset{iid}{\sim} N_k(0, I)$; then the quantity
\[
W = \sum_{i=1}^{m} Z_i Z_i'
\]
has a Wishart distribution $W_k(m, I)$.

Each diagonal element of $W$, say $W_{jj}$, follows a $\chi^2_m$ distribution.

Starting from $Z_1, \dots, Z_m \overset{iid}{\sim} N_k(0, \Lambda)$ we can obtain a more general $W$.
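
This construction is easy to reproduce in R (a sketch of mine, not from the slides): build W as a sum of outer products and check that a diagonal entry behaves like a chi-squared variable with m degrees of freedom.

set.seed(4)
k <- 4; m <- 10; nrep <- 5000
W11 <- replicate(nrep, {
  Z <- matrix(rnorm(m * k), nrow = k)    # columns are Z_1, ..., Z_m ~ N_k(0, I)
  W <- Z %*% t(Z)                        # = sum_i Z_i Z_i' ~ W_k(m, I)
  W[1, 1]
})
c(mean(W11), var(W11))                   # close to m = 10 and 2m = 20, as for a chi^2_m
# for Z_i ~ N_k(0, Lambda), replace Z by t(chol(Lambda)) %*% Z to get W ~ W_k(m, Lambda)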

Inverse Wishart

An Inverse Wishart random variable ($W_k^{-1}(m, \Lambda)$, IW for short) has support the space of all symmetric positive definite matrices.

An IW random variable describes the distribution of the inverse of a Wishart matrix.

In Bayesian statistics it is often used as the conjugate prior for the covariance matrix of a multivariate Gaussian model.

Let $V \sim W_k(m, \Lambda)$. Since $V$ is positive definite with probability 1, it is easy to compute the density function of $Z = V^{-1}$:
\[
f(Z) = \frac{|Z|^{-(m+k+1)/2}}{2^{mk/2}\, \Gamma_k(m/2)\, |\Lambda|^{m/2}} \exp\left\{ -\frac{1}{2} \operatorname{tr}\left( \Lambda^{-1} Z^{-1} \right) \right\}.
\]
Also
\[
E(Z) = \frac{\Lambda^{-1}}{m - k - 1}.
\]
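
A Monte Carlo check of this mean formula in R (my own sketch): average the inverses of Wishart draws and compare with Λ^{-1}/(m − k − 1).

set.seed(5)
k <- 3; m <- 8; nrep <- 5000
A <- matrix(rnorm(k * k), k, k); Lambda <- crossprod(A) + diag(k)
draws <- rWishart(nrep, df = m, Sigma = Lambda)
inv_mean <- Reduce(`+`, lapply(seq_len(nrep), function(i) solve(draws[, , i]))) / nrep
inv_mean
solve(Lambda) / (m - k - 1)              # the two matrices should be close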

A useful Lemma

Lemma. Let $A$ and $B$ be positive real numbers and let $a$ and $b$ be any real numbers. Then
\[
A(x-a)^2 + B(x-b)^2 = (A+B)\left( x - \frac{aA+bB}{A+B} \right)^2 + \frac{AB}{A+B}(a-b)^2. \tag{1}
\]
Proof: see later.

(Multivariate version)

Let $x, a, b$ be vectors in $\mathbb{R}^k$ and let $A, B$ be symmetric $k \times k$ matrices such that $(A+B)^{-1}$ exists. Then
\[
(x-a)'A(x-a) + (x-b)'B(x-b) = (x-c)'(A+B)(x-c) + (a-b)'A(A+B)^{-1}B(a-b),
\]
where
\[
c = (A+B)^{-1}(Aa + Bb).
\]
When $x \in \mathbb{R}$ the result reduces exactly to (1).
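
A quick numerical verification of the multivariate identity in R (a sketch of mine, with random positive definite A and B):

set.seed(6)
k <- 4
A <- crossprod(matrix(rnorm(k * k), k)); B <- crossprod(matrix(rnorm(k * k), k))
a <- rnorm(k); b <- rnorm(k); x <- rnorm(k)
cc <- solve(A + B, A %*% a + B %*% b)      # c = (A+B)^{-1}(Aa + Bb)

lhs <- t(x - a) %*% A %*% (x - a) + t(x - b) %*% B %*% (x - b)
rhs <- t(x - cc) %*% (A + B) %*% (x - cc) +
       t(a - b) %*% A %*% solve(A + B) %*% B %*% (a - b)
all.equal(drop(lhs), drop(rhs))            # TRUE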

Proof

\[
(x-a)'A(x-a) + (x-b)'B(x-b) = x'(A+B)x - 2x'(Aa+Bb) + a'Aa + b'Bb.
\]
Add and subtract $c'(A+B)c$ to obtain
\[
(x-c)'(A+B)(x-c) + G,
\]
where $G = a'Aa + b'Bb - c'(A+B)c$. Also
\[
c'(A+B)c = (Aa+Bb)'(A+B)^{-1}(Aa+Bb) =
\]
(add and subtract $Ab$ in the first and third factors)

\[
= \left[ A(a-b) + (A+B)b \right]' (A+B)^{-1} \left[ (A+B)a - B(a-b) \right]
\]
\[
= -(a-b)'A(A+B)^{-1}B(a-b) + (a-b)'Aa + b'(A+B)a - b'B(a-b)
\]
\[
= -(a-b)'A(A+B)^{-1}B(a-b) + a'Aa + b'Bb.
\]
Therefore
\[
G = (a-b)'A(A+B)^{-1}B(a-b).
\]

Dickey's Theorem

Theorem. Let $X$ be a $k$-dimensional random vector and $Y$ a scalar random variable such that
\[
X \mid Y \sim N_k(\mu, Y\Sigma), \qquad Y \sim GI(a, b).
\]
Then the marginal distribution of $X$ is multivariate Student,
\[
X \sim St_k\!\left( 2a,\; \mu,\; \frac{b}{a}\Sigma \right).
\]
In particular, setting $a = \nu/2$ and $b = 1/2$,
\[
Y \sim \frac{1}{\chi^2_\nu}, \qquad X \sim St_k(\nu, \mu, \Sigma/\nu).
\]
Proof: easy.
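
The theorem can be checked by simulation. Below is an R sketch of mine; it assumes the reading GI(a, b) ⇔ 1/Y ~ Gamma(shape a, rate b), which is consistent with the special case a = ν/2, b = 1/2 giving Y ~ 1/χ²_ν. The marginal covariance of X should then equal bΣ/(a − 1), i.e. the covariance of St_k(2a, μ, (b/a)Σ).

set.seed(7)
k <- 3; a <- 3; b <- 2; nrep <- 100000
mu <- c(1, -1, 0)
Sigma <- crossprod(matrix(rnorm(k * k), k)) + diag(k)
L <- t(chol(Sigma))

Y <- 1 / rgamma(nrep, shape = a, rate = b)         # Y ~ GI(a, b) under the assumed reading
Z <- matrix(rnorm(nrep * k), k, nrep)
X <- t(mu + L %*% Z * rep(sqrt(Y), each = k))      # X | Y ~ N_k(mu, Y * Sigma)

cov(X)
b * Sigma / (a - 1)                                # should be close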

The Posterior

Using the Lemma, one gets
\[
\mu \mid \Sigma, x \sim N_p\!\left( \mu^*,\; (c+n)^{-1}\Sigma \right), \qquad \text{with } \mu^* = \frac{c\,\mu_0 + n\,\bar{x}}{c+n},
\]
and
\[
\Sigma \mid x \sim IW_p\!\left( m+n,\; \Lambda^* \right), \qquad \text{where } \Lambda^* = \left[ S + \Lambda^{-1} + \frac{nc}{n+c}\, (\bar{x}-\mu_0)(\bar{x}-\mu_0)' \right]^{-1}.
\]
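
A minimal R sketch of the conjugate update (my own function and argument names; mu0, c0, m, Lambda denote the prior hyperparameters):

posterior_pars <- function(X, mu0, c0, m, Lambda) {
  n <- nrow(X)
  xbar <- colMeans(X)
  S <- crossprod(sweep(X, 2, xbar))                  # sum_i (x_i - xbar)(x_i - xbar)'
  mu_star <- (c0 * mu0 + n * xbar) / (c0 + n)
  Lambda_star <- solve(S + solve(Lambda) +
                       (n * c0 / (n + c0)) * tcrossprod(xbar - mu0))
  list(mu_star = mu_star, c_star = c0 + n, df_star = m + n, Lambda_star = Lambda_star)
}

set.seed(8)
X <- matrix(rnorm(30 * 2), 30, 2)
posterior_pars(X, mu0 = c(0, 0), c0 = 1, m = 5, Lambda = diag(2))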

The hyperparameters

We need to specify the following quantities:
- $\mu_0$, the prior mean for $\mu$: the most reasonable estimate before the experiment;
- $c$, the degree of belief in your elicitation of $\mu_0$: smaller values of $c$ make the prior less informative;
- $\Lambda$ and $m$, the hyperparameters for $\Sigma^{-1}$: they can be elicited by taking into account the moments of the Inverse Wishart distribution, for example (see the sketch after this list)
\[
E(\Sigma) = \frac{\Lambda^{-1}}{m - p - 1}.
\]
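
Following this prior-mean formula, a small elicitation sketch in R (my own helper name elicit_Lambda; Sigma0 denotes a hypothetical prior guess for the covariance matrix):

elicit_Lambda <- function(Sigma0, m) {
  # choose Lambda so that E(Sigma) = Lambda^{-1} / (m - p - 1) equals Sigma0
  p <- nrow(Sigma0)
  stopifnot(m > p + 1)                   # needed for the prior mean to exist
  solve((m - p - 1) * Sigma0)
}

Sigma0 <- diag(c(2, 1, 0.5))             # hypothetical prior guess
elicit_Lambda(Sigma0, m = 7)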

Non-informative case

You get a noninformative prior if you set the hyperparameters equal to zero. When
\[
c \to 0, \qquad \Lambda^{-1} = 0, \qquad m = 0,
\]
you get the Jeffreys prior
\[
\pi(\mu, \Sigma) = \sqrt{\det I(\mu, \Sigma)} \;\propto\; \frac{1}{|\Sigma|^{(p+1)/2}}.
\]

Consequences of the use of the Jeffreys prior

$\Sigma$ is positive definite and symmetric. Using the Spectral Decomposition Theorem one can write $\Sigma = H'DH$, where $H$ is an orthogonal matrix built from the eigenvectors and $D$ is the diagonal matrix of the eigenvalues in non-increasing order:
\[
H'H = I_p, \qquad D = \operatorname{diag}(\lambda_1, \dots, \lambda_p).
\]
Then, assuming that all the eigenvalues are different,
\[
\pi(\Sigma)\, d\Sigma = \pi(H, D)\, I_{[\lambda_1 > \lambda_2 > \dots > \lambda_p]}\, dH\, dD.
\]
With a change of variable,
\[
\pi(H, D) = \pi(H'DH) \prod_{i<j} (\lambda_i - \lambda_j).
\]

Then
\[
\pi^J(\mu, \Sigma) = \pi^J(\mu, H, D) \propto \frac{1}{|D|^{(p+1)/2}} \prod_{i<j} (\lambda_i - \lambda_j).
\]
This prior, for no clear reason, introduces a factor which pushes the eigenvalues of $\Sigma$ apart.

This is counter-intuitive, since a priori one would often consider the eigenvalues to be similar.

This conclusion agrees with the classical result that the estimator $S$ must be modified, since it produces matrices whose eigenvalues are too spread out.

There exist alternative methods for obtaining objective priors for $\mu$ and $\Sigma$. Berger and Yang (Annals of Statistics, 1996) derive the reference priors.
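
The eigenvalue spreading of the sample covariance matrix can be seen in a quick R simulation (mine, not from the slides): with Σ = I_p, every true eigenvalue equals 1, yet the extreme eigenvalues of the sample covariance matrix are systematically far from 1.

set.seed(10)
p <- 10; n <- 30
eig <- replicate(500, {
  X <- matrix(rnorm(n * p), n, p)        # true covariance is the identity
  range(eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values)
})
rowMeans(eig)                             # average smallest and largest sample eigenvalue,
                                          # both far from 1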

Gibbs sampling for N(μ, Σ)

- Also in the multivariate case, it can be useful and convenient to adopt a computational approach rather than performing closed-form calculations.
- This solution is particularly important when you are interested in functions of $\mu$ and $\Sigma$.

Full conditionals

We need to write down the two full conditionals, that is,
\[
\mu \mid \Sigma, x \qquad \text{and} \qquad \Sigma \mid \mu, x.
\]
The first one is already known:
\[
\mu \mid \Sigma, x \sim N_p\!\left( \mu^*,\; (c+n)^{-1}\Sigma \right). \tag{2}
\]
The second one can easily be seen to be
\[
\Sigma \mid \mu, x \sim IW_p\!\left( m+n,\; \left[ \Lambda^{-1} + \sum_{i=1}^{n} (x_i-\mu)(x_i-\mu)' \right]^{-1} \right). \tag{3}
\]
R code
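
The original slide's R code is not reproduced in this extract; below is a minimal Gibbs sampler sketch of mine based on the full conditionals (2) and (3). The function and variable names (gibbs_mvn, mu0, c0, m, Lambda) are my own, and Σ ~ IW_p(d, Ψ) is sampled as the inverse of a W_p(d, Ψ) draw.

gibbs_mvn <- function(X, mu0, c0, m, Lambda, niter = 2000) {
  n <- nrow(X); p <- ncol(X)
  xbar <- colMeans(X)
  mu_star <- (c0 * mu0 + n * xbar) / (c0 + n)
  Lambda_inv <- solve(Lambda)

  mu <- xbar                                         # starting value
  mu_out <- matrix(NA_real_, niter, p)
  Sigma_out <- array(NA_real_, c(p, p, niter))

  for (t in 1:niter) {
    # Sigma | mu, x ~ IW_p(m + n, [Lambda^{-1} + sum_i (x_i - mu)(x_i - mu)']^{-1})
    R <- sweep(X, 2, mu)
    Psi <- solve(Lambda_inv + crossprod(R))
    Sigma <- solve(rWishart(1, df = m + n, Sigma = Psi)[, , 1])

    # mu | Sigma, x ~ N_p(mu_star, Sigma / (c0 + n))
    mu <- mu_star + drop(t(chol(Sigma / (c0 + n))) %*% rnorm(p))

    mu_out[t, ] <- mu
    Sigma_out[, , t] <- Sigma
  }
  list(mu = mu_out, Sigma = Sigma_out)
}

set.seed(11)
X <- matrix(rnorm(100 * 2), 100, 2)
fit <- gibbs_mvn(X, mu0 = c(0, 0), c0 = 1, m = 5, Lambda = diag(2))
colMeans(fit$mu)                          # posterior mean of mu, close to colMeans(X) here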