
Definitions, Notations and Properties of the Multivariate Normal Distribution


 
Definition 1: A random vector $X = (X_1, \ldots, X_p)'$ taking values in $\mathbb{R}^p$ is said to have a p-variate (or p-dimensional) Normal distribution, which will be denoted by $X \sim N_p$, if every linear combination of the components of X, given by $\ell'X$ for some $\ell \in \mathbb{R}^p$, has a univariate Normal distribution.
Thus by taking $\ell' = (0, \ldots, 0, 1, 0, \ldots, 0)$ with 1 in the i-th place and 0 elsewhere, it follows that if $X \sim N_p$, then for all $i = 1, \ldots, p$ its i-th component $X_i$ has a finite mean (say) $\mu_i$ and a finite variance (say) $\sigma_{ii}$. Since $|\mathrm{Cov}(X_i, X_j)| \le \sqrt{\sigma_{ii}\sigma_{jj}}$, the covariance exists, so denote it by $\sigma_{ij}$. Let

$$\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \sigma_{ij} & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}$$

respectively denote the $p \times 1$ mean-vector and the $p \times p$ variance-covariance (or dispersion) matrix of X. Then the linear combination $\ell'X \sim N_1(\ell'\mu, \ell'\Sigma\ell)$.
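As a quick numerical illustration of Definition 1 (a minimal sketch in Python/NumPy; the values of µ, Σ and ℓ are arbitrary choices, not taken from these notes), one can sample X and check that ℓ'X has the mean ℓ'µ and variance ℓ'Σℓ just stated:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (p = 2); any mean vector and
# positive-definite dispersion matrix would do
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
ell = np.array([0.5, -1.5])

# Draw many samples of X ~ N_2(mu, Sigma)
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# ell'X should be (univariate) N_1(ell'mu, ell' Sigma ell)
lin = X @ ell
print(lin.mean(), ell @ mu)          # empirical vs. theoretical mean
print(lin.var(), ell @ Sigma @ ell)  # empirical vs. theoretical variance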
Definition 2: The m.g.f. of a $p \times 1$ random vector X is the function $M_X(t)$ of t for $t \in \mathbb{R}^p$, given by $M_X(t) = E[\exp(t'X)]$, provided of course the expectation exists.
Thus if $X \sim N_p$, since for every $t \in \mathbb{R}^p$, $t'X \sim N_1(t'\mu, t'\Sigma t)$, $t'X$ has the m.g.f.

$$M_{t'X}(s) = E[\exp(st'X)] = \exp\left(st'\mu + \frac{1}{2}s^2\,t'\Sigma t\right).$$

Thus $M_X(t)$ exists, as it is given by

$$M_X(t) = M_{t'X}(1) = \exp\left(t'\mu + \frac{1}{2}t'\Sigma t\right). \qquad (1)$$

Since the m.g.f. completely determines a distribution, and the m.g.f. of an $N_p$ depends only on its mean-vector µ and dispersion-matrix Σ, a p-variate Normal distribution is completely specified by writing $X \sim N_p(\mu, \Sigma)$, which will be understood to have the m.g.f. given in equation (1).
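Formula (1) is easy to sanity-check by simulation (again a hedged sketch; t is kept small so that exp(t'X) has moderate variance, and all numbers are illustrative):

import numpy as np

rng = np.random.default_rng(1)

mu = np.array([0.5, -0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
t = np.array([0.2, 0.4])

X = rng.multivariate_normal(mu, Sigma, size=1_000_000)

mgf_mc = np.exp(X @ t).mean()                  # Monte Carlo estimate of E[exp(t'X)]
mgf_th = np.exp(t @ mu + 0.5 * t @ Sigma @ t)  # closed form from (1)
print(mgf_mc, mgf_th)                          # should agree closely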
Property 1: If Σ is non-singular then X has a (joint) p.d.f. in $\mathbb{R}^p$ given by

$$f_X(x) = (2\pi)^{-p/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right). \qquad (2)$$

Proof: First let's prove that the function in (2) is indeed a (p-dimensional joint) p.d.f. In order to do that, first note that the r.h.s. of (2) is always positive. Next note that, by the spectral decomposition theorem, any positive-definite matrix such as Σ (in general, being a dispersion matrix, Σ is positive semi-definite, but since it is also non-singular here, it is positive-definite) may be decomposed as $\Sigma = P\Lambda P'$, where P is an orthogonal matrix and Λ is a diagonal matrix with positive diagonal entries, so that $\Sigma^{-1} = P\Lambda^{-1}P'$. Now let $A = \Lambda^{-1/2}P'$, where $\Lambda^{-1/2}$ is the diagonal matrix with i-th diagonal element $1/\sqrt{\lambda_i}$, $\lambda_i\,(>0)$ being the i-th diagonal element of Λ. Then $\Sigma^{-1} = A'A$ and $|\Sigma|^{-1/2} = |A|$. Thus the r.h.s. of (2) may be written as $(2\pi)^{-p/2}|A|\exp\left(-\frac{1}{2}y'y\right)$, where $y = A(x-\mu)$.


Thus if one makes the p-dimensional change of variable $x \to A(x-\mu) = y$ (say), then

$$\int_{\mathbb{R}^p} (2\pi)^{-p/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right)dx = \int_{\mathbb{R}^p} (2\pi)^{-p/2}\exp\left(-\frac{1}{2}y'y\right)dy = \prod_{i=1}^{p}\left(\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}y_i^2}\,dy_i\right) = 1.$$

Thus $f_X(x)$ in (2) is a (p-dimensional joint) p.d.f. in $\mathbb{R}^p$.
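The construction of A is easy to mirror numerically; the following sketch (with an arbitrary positive-definite Σ, not one from the notes) builds A from the eigendecomposition and verifies the two identities used above:

import numpy as np

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])  # arbitrary positive-definite example

lam, P = np.linalg.eigh(Sigma)  # Sigma = P diag(lam) P'
A = np.diag(lam ** -0.5) @ P.T  # A = Lambda^{-1/2} P'

print(np.allclose(A.T @ A, np.linalg.inv(Sigma)))  # Sigma^{-1} = A'A
print(np.isclose(abs(np.linalg.det(A)),
                 np.linalg.det(Sigma) ** -0.5))    # |Sigma|^{-1/2} = |A|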


Now the m.g.f. of this p.d.f. may be found, by completing the square in the exponent, as

$$\begin{aligned}
M_X(t) = E[\exp(t'X)] &= \int_{\mathbb{R}^p} (2\pi)^{-p/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu) + t'x\right)dx \\
&= \int_{\mathbb{R}^p} (2\pi)^{-p/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(x-\mu-\Sigma t)'\Sigma^{-1}(x-\mu-\Sigma t) + t'\mu + \frac{1}{2}t'\Sigma t\right)dx \\
&= \exp\left(t'\mu + \frac{1}{2}t'\Sigma t\right)\left[\int_{\mathbb{R}^p} (2\pi)^{-p/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(x-\mu-\Sigma t)'\Sigma^{-1}(x-\mu-\Sigma t)\right)dx\right] \\
&= \exp\left(t'\mu + \frac{1}{2}t'\Sigma t\right)
\end{aligned}$$

(since the integrand in the square bracket may be recognized as the same as the r.h.s. of (2) with µ replaced by µ + Σt, which integrates to 1, as just shown above).

Thus we see that the r.h.s. of (2) is a legitimate (p-dimensional joint) p.d.f. (in $\mathbb{R}^p$), which has the same m.g.f. as that of an $N_p(\mu, \Sigma)$ distribution as established in (1). Now since the m.g.f. uniquely determines a distribution, the $N_p(\mu, \Sigma)$ distribution must have the p.d.f. given in (2). ▽
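As a check on (2) itself (a sketch with illustrative parameters; scipy.stats.multivariate_normal serves only as an independent reference implementation), the density formula can be coded directly:

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([0.5, -1.0])

p = len(mu)
quad = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)  # (x-mu)' Sigma^{-1} (x-mu)
f = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** -0.5 * np.exp(-0.5 * quad)

print(f, multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should coincide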
Although it has already been established in the paragraph following Definition 1 that the marginal distribution of each individual component of X is univariate Normal, i.e. $X_i \sim N_1(\mu_i, \sigma_{ii})$ for all $i = 1, \ldots, p$, the following result is more general.
Property 2: If $X \sim N_p(\mu, \Sigma)$, and Y is any $q \times 1$ sub-vector of X given by $Y' = (X_{i_1}, \ldots, X_{i_q})$ with $q \le p$ and $1 \le i_1, \ldots, i_q \le p$, then the marginal (joint) distribution of Y is a q-variate Normal with mean vector and dispersion matrix

$$\begin{pmatrix} \mu_{i_1} \\ \vdots \\ \mu_{i_q} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \sigma_{i_1 i_1} & \cdots & \sigma_{i_1 i_q} \\ \vdots & \sigma_{i_j i_k} & \vdots \\ \sigma_{i_q i_1} & \cdots & \sigma_{i_q i_q} \end{pmatrix}$$

respectively.
Proof: Follows from the fact that any linear combination of the components of Y is also a linear combination of the components of X, and the corresponding moments of Y trivially follow from the definitions of µ and Σ. ▽
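In code, Property 2 just says that picking out entries of µ and the corresponding rows and columns of Σ gives the marginal parameters (a minimal sketch; the indices and parameters are made up):

import numpy as np

rng = np.random.default_rng(2)

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.0, 0.1],
                  [0.2, 0.1, 0.8]])

idx = [0, 2]  # Y = (X_1, X_3)'; purely illustrative choice
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X[:, idx]

print(Y.mean(axis=0), mu[idx])                         # marginal mean vector
print(np.cov(Y.T), Sigma[np.ix_(idx, idx)], sep="\n")  # marginal dispersion matrix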

 
Property 3: If $X \sim N_p(\mu, \Sigma)$ and X is partitioned as $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$, where $X_i$ is $p_i \times 1$ for $i = 1, 2$ with $p_1 + p_2 = p$, so that

$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_p\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right)$$

with $\mu_i$ being $p_i \times 1$ and $\Sigma_{ij}$ being $p_i \times p_j$ for $i, j = 1, 2$, then $X_1$ and $X_2$ are independent iff $\Sigma_{12} = 0_{p_1 \times p_2}$, where $0_{p_1 \times p_2}$ is the $p_1 \times p_2$ null matrix.
Proof: $X_1$ and $X_2$ independent $\Rightarrow \mathrm{Cov}(X_1, X_2) = \Sigma_{12} = 0_{p_1 \times p_2}$. For proving the converse, first note that by Property 2, for $i = 1, 2$, $X_i \sim N_{p_i}(\mu_i, \Sigma_{ii})$ and thus (by (1)) has the m.g.f. $M_{X_i}(t_i) = \exp\left(t_i'\mu_i + \frac{1}{2}t_i'\Sigma_{ii}t_i\right)$ for $t_i \in \mathbb{R}^{p_i}$. Now for $t \in \mathbb{R}^p$ first partition t as $t = \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}$ so that for $i = 1, 2$, $t_i$ is $p_i \times 1$. Then

$$t'\mu = (t_1', t_2')\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = t_1'\mu_1 + t_2'\mu_2$$

and (since $\Sigma_{12} = 0_{p_1 \times p_2}$ and $\Sigma_{21} = \Sigma_{12}' = 0_{p_2 \times p_1}$)

$$t'\Sigma t = (t_1', t_2')\begin{pmatrix} \Sigma_{11} & 0_{p_1 \times p_2} \\ 0_{p_2 \times p_1} & \Sigma_{22} \end{pmatrix}\begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = t_1'\Sigma_{11}t_1 + t_2'\Sigma_{22}t_2.$$

Thus the m.g.f. of X,

$$M_X(t) = \exp\left(t'\mu + \frac{1}{2}t'\Sigma t\right) = \exp\left((t_1'\mu_1 + t_2'\mu_2) + \frac{1}{2}(t_1'\Sigma_{11}t_1 + t_2'\Sigma_{22}t_2)\right) = M_{X_1}(t_1)M_{X_2}(t_2),$$

is the product of the m.g.f.s of $X_1$ and $X_2$, showing that $X_1$ and $X_2$ are independent¹. ▽
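A small simulation makes the converse direction tangible: with a block-diagonal Σ (illustrative numbers below), the blocks behave independently even through nonlinear functions, not merely uncorrelated ones:

import numpy as np

rng = np.random.default_rng(3)

# Sigma_12 = 0, so X_1 (first two coordinates) and X_2 (third coordinate)
# should be independent
mu = np.zeros(3)
Sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, 0.0],
                  [0.0, 0.0, 0.8]])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
X1, X2 = X[:, :2], X[:, 2]

# Independence implies E[f(X_1) g(X_2)] = E[f(X_1)] E[g(X_2)] for any f, g
f = np.abs(X1[:, 0] * X1[:, 1])  # a nonlinear function of X_1
g = X2 ** 2                      # a nonlinear function of X_2
print((f * g).mean(), f.mean() * g.mean())  # should be close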
Property 4: If $X \sim N_p(\mu, \Sigma)$ then for any $q \times p$ matrix A and $q \times 1$ vector b, $Y = AX + b \sim N_q(A\mu + b, A\Sigma A')$.
Proof: Follows from the fact that for any $\ell \in \mathbb{R}^q$, $\ell'Y = \ell'AX + \ell'b \sim N_1(\ell'(A\mu + b), \ell'A\Sigma A'\ell)$, proving that Y has a (q-variate) Normal distribution. That $E[Y] = A\mu + b$ and $D[Y] = A\Sigma A'$ (as well as the expressions for $E[\ell'Y]$ and $V[\ell'Y]$ mentioned in the last sentence) follow from the moment results for random vectors (proved in the other supplementary notes). ▽
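Property 4 is likewise straightforward to verify by simulation (a sketch with an arbitrary A and b):

import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 1.0]])  # arbitrary 3x2 matrix (q = 3, p = 2)
b = np.array([0.5, 0.0, -1.0])

X = rng.multivariate_normal(mu, Sigma, size=300_000)
Y = X @ A.T + b              # row-wise Y = AX + b

print(Y.mean(axis=0), A @ mu + b, sep="\n")    # E[Y] = A mu + b
print(np.cov(Y.T), A @ Sigma @ A.T, sep="\n")  # D[Y] = A Sigma A'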
Corollary 4: Let $\Sigma^{-1/2} = P\Lambda^{-1/2}P'$, where P, Λ and $\Lambda^{-1/2}$ are as in the proof of Property 1, obtained through the spectral decomposition of Σ. Then $Z = \Sigma^{-1/2}(X - \mu) \sim N_p(0_p, I_p)$, where $0_p$ is the $p \times 1$ null vector and $I_p$ is the $p \times p$ identity matrix.
Proof: Immediately follows by taking $q = p$, $A = \Sigma^{-1/2}$ and $b = -\Sigma^{-1/2}\mu$ in Property 4. ▽
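A numerical check of this standardization (sketch; Σ^{-1/2} built from the eigendecomposition exactly as in the proof of Property 1, parameters illustrative):

import numpy as np

rng = np.random.default_rng(5)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

lam, P = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = P @ np.diag(lam ** -0.5) @ P.T  # Sigma^{-1/2} = P Lambda^{-1/2} P'

X = rng.multivariate_normal(mu, Sigma, size=300_000)
Z = (X - mu) @ Sigma_inv_sqrt.T                  # Z = Sigma^{-1/2}(X - mu)

print(Z.mean(axis=0))  # ~ null vector
print(np.cov(Z.T))     # ~ identity matrix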

This result (in Corollary 4) may be thought of as the multivariate generalization of standardization of a Normal variate, which in the univariate case simply states that if $X \sim N_1(\mu, \sigma^2)$ then $Z = \frac{X-\mu}{\sigma} \sim N_1(0, 1)$. And just as $Z^2 \sim \chi^2_1$ in the univariate case, here it generalizes to the following.
Property 5: If $X \sim N_p(\mu, \Sigma)$ then $(X - \mu)'\Sigma^{-1}(X - \mu) = Z'Z \sim \chi^2_p$.
Proof: Since $\Sigma^{-1/2}$ is symmetric with $\Sigma^{-1/2}\Sigma^{-1/2} = \Sigma^{-1}$, we have $(X - \mu)'\Sigma^{-1}(X - \mu) = Z'Z$, where $Z \sim N_p(0_p, I_p)$ by Corollary 4; thus $Z'Z$ is the sum of squares of p independent $N_1(0, 1)$ variates and hence $\sim \chi^2_p$. ▽
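An empirical sketch of Property 5 (scipy's chi2 is used only as the reference distribution; all parameters illustrative):

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
p = len(mu)

X = rng.multivariate_normal(mu, Sigma, size=300_000)
d = X - mu
q = np.einsum("ij,jk,ik->i", d, np.linalg.inv(Sigma), d)  # (X-mu)' Sigma^{-1} (X-mu)

# A few empirical quantiles vs. chi^2_p quantiles
for prob in (0.5, 0.9, 0.99):
    print(np.quantile(q, prob), chi2.ppf(prob, df=p))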
¹Using similar calculations and (2) it is also easy to show that if $\Sigma_{12} = 0_{p_1 \times p_2}$ then the joint p.d.f. of $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ is the product of the p.d.f.s of $X_1$ and $X_2$, thus establishing their statistical independence. However this requires the existence of the densities of X, $X_1$ and $X_2$ (which may fail because of singularity of the respective dispersion matrices), and thus the proof using the m.g.f. as given above is more general, remaining valid in much broader situations.

 
Property 6: If $X \sim N_p(\mu, \Sigma)$ and X is partitioned as $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ as in Property 3, then the conditional distribution of $X_1 | X_2 = x_2$ is $N_{p_1}\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$ (and likewise the conditional distribution of $X_2 | X_1 = x_1$ is $N_{p_2}\left(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1),\ \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\right)$).

Proof: Let $A = \begin{pmatrix} I_{p_1} & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0_{p_2 \times p_1} & I_{p_2} \end{pmatrix}$ and $Y = A(X - \mu)$. Then by Property 4, $Y \sim N_p(0_p, A\Sigma A')$.
Now

$$D[Y] = A\Sigma A' = \begin{pmatrix} I_{p_1} & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0_{p_2 \times p_1} & I_{p_2} \end{pmatrix}\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\begin{pmatrix} I_{p_1} & 0_{p_1 \times p_2} \\ -\Sigma_{22}^{-1}\Sigma_{21} & I_{p_2} \end{pmatrix} = \begin{pmatrix} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0_{p_1 \times p_2} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\begin{pmatrix} I_{p_1} & 0_{p_1 \times p_2} \\ -\Sigma_{22}^{-1}\Sigma_{21} & I_{p_2} \end{pmatrix} = \begin{pmatrix} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0_{p_1 \times p_2} \\ 0_{p_2 \times p_1} & \Sigma_{22} \end{pmatrix}.$$
 
Thus if we partition Y as $Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$ with $Y_i$ being $p_i \times 1$ for $i = 1, 2$, then by Property 3, $Y_1$ and $Y_2$ are independent. Thus the conditional distribution of $Y_1$ given $Y_2$ is the same as its unconditional distribution, which (since $Y \sim N_p(0_p, A\Sigma A')$) by Property 2 is a $p_1$-variate Normal with mean $0_{p_1}$ and dispersion $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.
Y1
Since

$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = Y = A(X - \mu) = \begin{pmatrix} I_{p_1} & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0_{p_2 \times p_1} & I_{p_2} \end{pmatrix}\begin{pmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{pmatrix} = \begin{pmatrix} X_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2) \\ X_2 - \mu_2 \end{pmatrix},$$

the conditional distribution of $Y_1$ given $X_2$ also is $N_{p_1}(0_{p_1}, \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$. But given $X_2 = x_2$, $Y_1 = X_1 - c$ where c is the constant vector $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$. Therefore the conditional distribution of $X_1$ given $X_2$ is the same as the conditional distribution of $Y_1 + c$, which (by Property 4) is $N_{p_1}(c, \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$, i.e., substituting the value of c, $N_{p_1}\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$. ▽
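Finally, Property 6 can be probed by simulation (a sketch; conditioning on a thin slab around x₂ approximates conditioning on X₂ = x₂, and the bivariate parameters are illustrative):

import numpy as np

rng = np.random.default_rng(7)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])  # p1 = p2 = 1 for simplicity

x2 = -1.5
X = rng.multivariate_normal(mu, Sigma, size=2_000_000)

# Approximate conditioning: keep draws with X_2 in a thin slab around x2
sel = np.abs(X[:, 1] - x2) < 0.01
X1_cond = X[sel, 0]

cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
print(X1_cond.mean(), cond_mean)  # empirical vs. theoretical conditional mean
print(X1_cond.var(), cond_var)    # empirical vs. theoretical conditional variance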
