
1 Organization of Data and Distance

Multivariate data: p ≥ 1 variables, each variable recorded for n distinct items, individuals or experimental trials.
Example: annual maximum temperature and annual maximum rainfall in Manchester from 1901-2000. Here, p = 2 and n = 100.

1.1 Data Matrix


Let xij denote the measurement of the ith variable on the jth item. Consequently,
one can display the measurements as

                Item 1   Item 2   · · ·   Item j   · · ·   Item n
  Variable 1     x11      x12     · · ·    x1j     · · ·    x1n
  Variable 2     x21      x22     · · ·    x2j     · · ·    x2n
      .           .        .                .                .
  Variable i     xi1      xi2     · · ·    xij     · · ·    xin
      .           .        .                .                .
  Variable p     xp1      xp2     · · ·    xpj     · · ·    xpn

or equivalently as the p × n matrix

        ( x11  x12  · · ·  x1j  · · ·  x1n )
        ( x21  x22  · · ·  x2j  · · ·  x2n )
        (  .    .           .           .  )
  X =   ( xi1  xi2  · · ·  xij  · · ·  xin )
        (  .    .           .           .  )
        ( xp1  xp2  · · ·  xpj  · · ·  xpn )

This is known as the data matrix.

1.2 Descriptive Statistics


The sample mean of the ith variable is defined by:

  x̄i = (1/n) Σ_{j=1}^{n} xij.                                (1.1)

The sample variance of the ith variable is defined by:

  s²i = sii = (1/(n−1)) Σ_{j=1}^{n} (xij − x̄i)².             (1.2)

Also denoted by sii. The sample standard deviation of the ith variable is the
square root √sii.
The sample covariance of the ith and the kth variables is defined by:

  sik = (1/(n−1)) Σ_{j=1}^{n} (xij − x̄i)(xkj − x̄k).

Note sik = ski.
The sample correlation coefficient between the ith and the kth variables is defined
by:

  rik = sik / √(sii skk).

Also rik = rki.
The corresponding matrix representations are:

        ( x̄1 )
        ( x̄2 )
  x̄ =   (  .  )
        ( x̄p )

        ( s11  s12  · · ·  s1p )
        ( s21  s22  · · ·  s2p )
  S =   (  .    .    ..     .  )
        ( sp1  sp2  · · ·  spp )

and

        ( 1    r12  · · ·  r1p )
        ( r21  1    · · ·  r2p )
  R =   (  .    .    ..     .  )
        ( rp1  rp2  · · ·  1   )
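The statistics above can be computed directly from their defining sums. A minimal pure-Python sketch (the helper names and the toy 2 × 4 data matrix are illustrative, not from the notes):

```python
from math import sqrt

# Toy data matrix: p = 2 variables (rows), n = 4 items (columns).
X = [[2.0, 4.0, 6.0, 8.0],
     [1.0, 3.0, 2.0, 6.0]]

def mean(row):
    # Sample mean (1.1): x̄_i = (1/n) Σ_j x_ij
    return sum(row) / len(row)

def cov(row_i, row_k):
    # Sample covariance: s_ik = (1/(n−1)) Σ_j (x_ij − x̄_i)(x_kj − x̄_k)
    mi, mk = mean(row_i), mean(row_k)
    return sum((a - mi) * (b - mk) for a, b in zip(row_i, row_k)) / (len(row_i) - 1)

def corr(row_i, row_k):
    # Sample correlation: r_ik = s_ik / sqrt(s_ii s_kk)
    return cov(row_i, row_k) / sqrt(cov(row_i, row_i) * cov(row_k, row_k))

xbar = [mean(r) for r in X]                # sample mean vector x̄
S = [[cov(a, b) for b in X] for a in X]    # sample covariance matrix
R = [[corr(a, b) for b in X] for a in X]   # sample correlation matrix
```

Note that S and R come out symmetric (sik = ski, rik = rki), and the diagonal of R is all ones, matching the matrix forms above.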

1.3 Graphical Displays


Consider the data matrix

        ( 9    2   6   5   8  )
  X =   ( 12   8   6   4   10 )
        ( 3    4   0   2   1  )

The command pairs produces pairwise scatterplots:

> x <- matrix(c(9, 2, 6, 5, 8, 12, 8, 6, 4, 10, 3, 4, 0, 2, 1), 5, 3)
> pairs(x)

Here matrix() fills column-wise, so x is the 5 × 3 transpose of X: each row of x
is an item and each column a variable, which is the layout pairs() expects.

1.4 Distance Measures


Consider two data points x = (x1 , · · · , xp ) and y = (y1 , · · · , yp ). Some distance
measures are:

• Euclidean distance between x and y:

  d(x, y) = √((x1 − y1)² + · · · + (xp − yp)²).

• City-block distance between x and y:

  d(x, y) = w1|x1 − y1| + · · · + wp|xp − yp|,

  where the wk's are weights (e.g. wk = 1/p).

• Minkowski distance between x and y:

  d(x, y) = (w1|x1 − y1|^λ + · · · + wp|xp − yp|^λ)^{1/λ}.

• Canberra distance between x and y (for positive coordinates):

  d(x, y) = |x1 − y1|/(x1 + y1) + · · · + |xp − yp|/(xp + yp).

• Bhattacharyya distance between x and y:

  d(x, y) = √((√x1 − √y1)² + · · · + (√xp − √yp)²).

• Statistical distance between x and y:

  d(x, y) = √((x1 − y1)²/s11 + · · · + (xp − yp)²/spp).

  Also known as the Karl Pearson distance (named after the British statistician).

• Mahalanobis distance between x and y:

  d(x, y) = √((x − y)ᵀ S⁻¹ (x − y)).

In general, a distance measure d(x, y) must satisfy the following conditions:


1. d(x, y) = d(y, x).
2. d(x, y) > 0 if x ̸= y.
3. d(x, y) = 0 if x = y.
4. d(x, y) ≤ d(x, z) + d(z, y).
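A minimal pure-Python sketch of a few of these measures, with a numeric check of the symmetry and triangle-inequality conditions (the function names and sample points are illustrative, not from the notes):

```python
from math import sqrt

def euclidean(x, y):
    # Euclidean: square root of the sum of squared coordinate differences
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y, w):
    # Weighted city-block distance: Σ_k w_k |x_k − y_k|
    return sum(wk * abs(a - b) for wk, a, b in zip(w, x, y))

def minkowski(x, y, w, lam):
    # Minkowski: (Σ_k w_k |x_k − y_k|^λ)^(1/λ)
    return sum(wk * abs(a - b) ** lam for wk, a, b in zip(w, x, y)) ** (1.0 / lam)

def canberra(x, y):
    # Canberra distance, assuming strictly positive coordinates
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y))

x, y, z = (3.0, 4.0), (1.0, 2.0), (2.0, 1.0)
w = (1.0, 1.0)
```

With unit weights and λ = 2, the Minkowski distance reduces to the Euclidean distance, and any of these can be checked against conditions 1-4 on sample points.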

2 Some Matrix Algebra


2.1 Vectors
The inner product between two vectors x and y is defined by

xT y = x1 y1 + · · · + xk yk .

The length of a vector xᵀ = (x1 , · · · , xk ) is

  √(xᵀx) = √(x1² + · · · + xk²).

The angle θ between x and y is given by

  cos θ = xᵀy / (√(xᵀx) √(yᵀy)).

If xᵀy = 0 then x and y are perpendicular.


A set of vectors x1 , · · · , xn is said to be linearly dependent if there exist constants
c1 , ..., cn , not all zero, such that

c1 x1 + · · · + cn xn = 0.

Otherwise, they are linearly independent.
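A minimal pure-Python sketch of the inner product, length, and angle (the helper names are illustrative, not from the notes):

```python
from math import sqrt, acos, pi

def inner(x, y):
    # Inner product xᵀy = Σ_k x_k y_k
    return sum(a * b for a, b in zip(x, y))

def length(x):
    # Length √(xᵀx)
    return sqrt(inner(x, x))

def angle(x, y):
    # Angle θ with cos θ = xᵀy / (length(x) · length(y))
    return acos(inner(x, y) / (length(x) * length(y)))

x = (1.0, 1.0)
y = (1.0, -1.0)
# xᵀy = 0, so x and y are perpendicular: θ = π/2
```

The vectors (1, 1) and (1, −1) also illustrate linear independence: no non-trivial combination c1 x + c2 y is zero, whereas (1, 1) and (2, 2) are linearly dependent.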

2.1.1 Matrices
Transpose

((A)T )T = A.
(A + B)T = AT + B T .
(AB)T = B T AT .
A matrix A is symmetric if A = AT .
A matrix A is idempotent if A = AT and A2 = A.

Trace
The trace of a square matrix A, denoted tr(A), is the sum of its diagonal elements.

tr(c) = c.

tr(A ± B) = tr(A) ± tr(B).

tr(cA) = c tr(A).

tr(AB) = tr(BA) = Σ_{i,j} aij bji.

tr(AAᵀ) = tr(AᵀA) = Σ_{i,j} a²ij.

tr(B −1 AB) = tr(A).
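The identities tr(AB) = tr(BA) and tr(AAᵀ) = Σ a²ij are easy to verify numerically. A minimal pure-Python sketch (the helper names and sample matrices are illustrative, not from the notes):

```python
def matmul(A, B):
    # Product of nested-list matrices: (AB)_ij = Σ_k a_ik b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    # Aᵀ: swap rows and columns
    return [list(col) for col in zip(*A)]

def trace(A):
    # Sum of the diagonal elements
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 6]]
# tr(AB) = tr(BA) even though AB != BA in general;
# tr(AAᵀ) equals the sum of the squared entries of A.
```

Note that the trace identity holds even though matrix multiplication itself does not commute.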

Determinants
|A| = Σ_{j=1}^{k} aij Aij (expansion along row i) = Σ_{i=1}^{k} aij Aij (expansion
along column j), where the Aij are the co-factors.

A is said to be non-singular if |A| ̸= 0.


If A is triangular or diagonal, |A| = ∏_{i=1}^{k} aii.

|A| = |AT |.

If A is non-singular |A−1 | = 1/|A|.

|cA| = ck |A|.

|AB| = |A||B|.

For k = 2, |A| = a11 a22 − a12 a21 .

For k = 3, |A| = a11 (a22 a33 −a23 a32 )−a12 (a21 a33 −a23 a31 )+a13 (a21 a32 −a31 a22 ).
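The k = 2 and k = 3 formulas above translate directly into code. A minimal pure-Python sketch (the function names are illustrative, not from the notes):

```python
def det2(A):
    # |A| = a11 a22 − a12 a21
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def det3(A):
    # Co-factor expansion along the first row (the k = 3 formula above)
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[2][0] * A[1][1]))
```

Sanity checks: the identity has determinant 1, a triangular matrix has determinant equal to the product of its diagonal, and |A| = |Aᵀ|.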

Rank
The rank of a matrix A is the maximum number of linearly independent rows or
columns.

If A is p × n then 0 ≤ rank(A) ≤ min(p, n).

rank(A) = rank(AT ).

rank(AT A) = rank(AAT ) = rank(A).

If n = p then rank(A) = p if and only if A is non-singular.

If A is idempotent then its rank is equal to tr(A).

Inverse
For a k × k matrix A if there exists a matrix B such that

BA = I

then B is called the inverse of A and is denoted by A−1 . Here, I denotes the k × k
identity matrix.

A−1 = (Aij )T /|A|, where (Aij ) is the matrix of co-factors.

(cA)−1 = c−1 A−1 .

(A−1 )T = (AT )−1 .

(AB)−1 = B −1 A−1 .

For k = 2,

                    1             (  a22  −a12 )
  A⁻¹ = ───────────────────────   ( −a21   a11 )
          a11 a22 − a12 a21
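A minimal pure-Python sketch of the k = 2 inverse formula, verifying A A⁻¹ = I (the helper name and sample matrix are illustrative, not from the notes):

```python
def inv2(A):
    # 2×2 inverse via the co-factor formula above
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4.0, 7.0], [2.0, 6.0]]   # |A| = 10, so A is non-singular
Ainv = inv2(A)
# A · Ainv should be the 2×2 identity matrix
```

The singular case (|A| = 0) is rejected explicitly, matching the condition that A⁻¹ exists only for non-singular A.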

Orthogonal Matrices
A k × k matrix A is said to be orthogonal if AT = A−1 .

AT A = I.

|A| = ±1.

aTi ai = 1 and aTi aj = 0 for i ̸= j, where aTi denotes the ith row of A.

AB is orthogonal if A and B are orthogonal.

Eigenvalues and Eigenvectors
The eigenvalues λ1 , · · · , λk of a k × k symmetric matrix A are the solutions of the
equation |A − λI| = 0. The eigenvector ej corresponding to λj is given by:

Aej = λj ej .

Usually the eigenvectors are chosen to satisfy eT1 e1 = · · · = eTk ek = 1 and be mutually
perpendicular.

|A| = ∏ λi.

tr(A) = Σ λi.

The rank of A equals the number of non-zero eigenvalues.

If A is idempotent then λi = 0 or 1 for all i.

Spectral Decomposition
The spectral decomposition of any k × k symmetric matrix A is:

A = λ1 e1 eT1 + · · · + λk ek eTk .

Also represented as:


A = P ΛP T ,
where
  P = [e1 , e2 , · · · , ek ]
and

        ( λ1   0   · · ·  0  )
        ( 0    λ2  · · ·  0  )
  Λ =   ( .    .    ..    .  )
        ( 0    0   · · ·  λk )
Some consequences of this representation are:

  A⁻¹ = Σ_{i=1}^{k} (1/λi) ei eiᵀ = P Λ⁻¹ Pᵀ,

  A^{1/2} = Σ_{i=1}^{k} √λi ei eiᵀ = P Λ^{1/2} Pᵀ,

  A^{−1/2} = Σ_{i=1}^{k} (1/√λi) ei eiᵀ = P Λ^{−1/2} Pᵀ,

  Aⁿ = Σ_{i=1}^{k} λiⁿ ei eiᵀ = P Λⁿ Pᵀ

and

  A^{m/n} = Σ_{i=1}^{k} λi^{m/n} ei eiᵀ = P Λ^{m/n} Pᵀ.

(The inverse requires all λi ̸= 0, and the fractional powers require λi ≥ 0.)
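For k = 2 the eigenvalues have a closed form, so the spectral decomposition and its power formulas can be checked by hand. A minimal pure-Python sketch, assuming a symmetric 2 × 2 matrix with b ≠ 0 and positive eigenvalues (the helper names are illustrative, not from the notes):

```python
from math import sqrt

def spectral_2x2(a, b, c):
    # Eigenpairs of the symmetric matrix [[a, b], [b, c]], assuming b != 0
    half_tr = (a + c) / 2.0
    root = sqrt((a - c) ** 2 / 4.0 + b ** 2)
    pairs = []
    for lam in (half_tr + root, half_tr - root):
        v = (b, lam - a)                       # solves (A − λI)v = 0 when b != 0
        norm = sqrt(v[0] ** 2 + v[1] ** 2)
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs

def power_from_spectrum(pairs, p):
    # A^p = Σ_i λi^p ei eiᵀ  (p = 1 recovers A, p = 0.5 gives A^{1/2}, ...)
    M = [[0.0, 0.0], [0.0, 0.0]]
    for lam, e in pairs:
        w = lam ** p
        for i in range(2):
            for j in range(2):
                M[i][j] += w * e[i] * e[j]
    return M

eig = spectral_2x2(2.0, 1.0, 2.0)          # eigenvalues 3 and 1
A_rebuilt = power_from_spectrum(eig, 1)    # should recover [[2, 1], [1, 2]]
H = power_from_spectrum(eig, 0.5)          # square root: H H = A
```

The rebuilt matrix matches A, and multiplying H by itself recovers A, illustrating A^{1/2} A^{1/2} = A.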

2.1.2 Definiteness
A k × k symmetric matrix A is said to be nonnegative definite (denoted by A ≥ 0) if

xT Ax ≥ 0

for all xT = (x1 , · · · , xk ).


A k × k symmetric matrix A is said to be positive definite (denoted by A > 0) if

xT Ax > 0

for all x ̸= 0.

A > 0 if and only if λi > 0 for all i.

If A > 0 then A is non-singular and |A| > 0.

A ≥ 0 if and only if λi ≥ 0 for all i.

If A > 0 then A−1 > 0.

If A ≥ 0 then one can write A = B 2 for a symmetric matrix B.
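Positive definiteness can be probed numerically through the quadratic form xᵀAx. A minimal pure-Python sketch (the helper name and sample matrix are illustrative, not from the notes):

```python
def quad_form(A, x):
    # xᵀAx = Σ_{i,j} x_i a_ij x_j for a square matrix A
    return sum(x[i] * A[i][j] * x[j]
               for i in range(len(x)) for j in range(len(x)))

A = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 3 and 1, both positive, so A > 0
probes = [(1.0, 0.0), (0.0, 1.0), (1.0, -1.0), (2.0, 3.0), (-1.0, 4.0)]
# every non-zero probe should give a strictly positive quadratic form,
# while x = 0 gives exactly zero
```

Checking a handful of probe vectors does not prove definiteness, of course; the eigenvalue criterion above (λi > 0 for all i) is the decisive test.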
