
1 Organization of Data and Distance

Multivariate data: p ≥ 1 variables, each variable recorded for n distinct items, individuals or experimental trials.
Example: annual maximum temperature and annual maximum rainfall in Manchester from 1901-2000. Here, p = 2 and n = 100.

1.1 Data Matrix


Let xij denote the measurement of the ith variable on the jth item. Consequently,
one can display the measurements as

                Item 1   Item 2   · · ·   Item j   · · ·   Item n
  Variable 1     x11      x12     · · ·    x1j     · · ·    x1n
  Variable 2     x21      x22     · · ·    x2j     · · ·    x2n
      .           .        .                .                .
  Variable i     xi1      xi2     · · ·    xij     · · ·    xin
      .           .        .                .                .
  Variable p     xp1      xp2     · · ·    xpj     · · ·    xpn

or equivalently as the p × n matrix

        ( x11  x12  · · ·  x1j  · · ·  x1n )
        ( x21  x22  · · ·  x2j  · · ·  x2n )
        (  .    .           .           .  )
  X =   ( xi1  xi2  · · ·  xij  · · ·  xin )
        (  .    .           .           .  )
        ( xp1  xp2  · · ·  xpj  · · ·  xpn )

This is known as the data matrix.

1.2 Descriptive Statistics


The sample mean of the ith variable is defined by:

  x̄i = (1/n) Σ_{j=1}^{n} xij.                                (1.1)

The sample variance of the ith variable is defined by:

  s²i = sii = (1/(n−1)) Σ_{j=1}^{n} (xij − x̄i)².             (1.2)

Also denoted by sii. The sample standard deviation of the ith variable is the
square root √sii.
The sample covariance of the ith and the kth variables is defined by:

  sik = (1/(n−1)) Σ_{j=1}^{n} (xij − x̄i)(xkj − x̄k).

Note sik = ski.
The sample correlation coefficient between the ith and the kth variables is defined
by:

  rik = sik / √(sii skk).

Also rik = rki.
The corresponding matrix representations are:

        ( x̄1 )
        ( x̄2 )
  x̄ =   (  .  )
        ( x̄p )

        ( s11  s12  · · ·  s1p )
        ( s21  s22  · · ·  s2p )
  S =   (  .    .    ..     .  )
        ( sp1  sp2  · · ·  spp )

and

        ( 1    r12  · · ·  r1p )
        ( r21  1    · · ·  r2p )
  R =   (  .    .    ..     .  )
        ( rp1  rp2  · · ·  1   )
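The statistics above can be computed directly from their defining sums. A minimal pure-Python sketch (the helper names and the toy 2 × 4 data matrix are illustrative, not from the notes):

```python
from math import sqrt

# Toy data matrix: p = 2 variables (rows), n = 4 items (columns).
X = [[2.0, 4.0, 6.0, 8.0],
     [1.0, 3.0, 2.0, 6.0]]

def mean(row):
    # Sample mean (1.1): x̄_i = (1/n) Σ_j x_ij
    return sum(row) / len(row)

def cov(row_i, row_k):
    # Sample covariance: s_ik = (1/(n−1)) Σ_j (x_ij − x̄_i)(x_kj − x̄_k)
    mi, mk = mean(row_i), mean(row_k)
    return sum((a - mi) * (b - mk) for a, b in zip(row_i, row_k)) / (len(row_i) - 1)

def corr(row_i, row_k):
    # Sample correlation: r_ik = s_ik / sqrt(s_ii s_kk)
    return cov(row_i, row_k) / sqrt(cov(row_i, row_i) * cov(row_k, row_k))

xbar = [mean(r) for r in X]                # sample mean vector x̄
S = [[cov(a, b) for b in X] for a in X]    # sample covariance matrix
R = [[corr(a, b) for b in X] for a in X]   # sample correlation matrix
```

Note that S and R come out symmetric (sik = ski, rik = rki), and the diagonal of R is all ones, matching the matrix forms above.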

1.3 Graphical Displays


Consider the data matrix

        ( 9    2   6   5   8  )
  X =   ( 12   8   6   4   10 )
        ( 3    4   0   2   1  )

The command pairs produces pairwise scatterplots:

> x <- matrix(c(9, 2, 6, 5, 8, 12, 8, 6, 4, 10, 3, 4, 0, 2, 1), 5, 3)
> pairs(x)

Here matrix() fills column-wise, so x is the 5 × 3 transpose of X: each row of x
is an item and each column a variable, which is the layout pairs() expects.

1.4 Distance Measures


Consider two data points x = (x1 , · · · , xp ) and y = (y1 , · · · , yp ). Some distance
measures are:

• Euclidean distance between x and y:

  d(x, y) = √((x1 − y1)² + · · · + (xp − yp)²).

• City-block distance between x and y:

  d(x, y) = w1|x1 − y1| + · · · + wp|xp − yp|,

  where the wk's are weights (e.g. wk = 1/p).

• Minkowski distance between x and y:

  d(x, y) = (w1|x1 − y1|^λ + · · · + wp|xp − yp|^λ)^{1/λ}.

• Canberra distance between x and y (for positive coordinates):

  d(x, y) = |x1 − y1|/(x1 + y1) + · · · + |xp − yp|/(xp + yp).

• Bhattacharyya distance between x and y:

  d(x, y) = √((√x1 − √y1)² + · · · + (√xp − √yp)²).

• Statistical distance between x and y:

  d(x, y) = √((x1 − y1)²/s11 + · · · + (xp − yp)²/spp).

  Also known as the Karl Pearson distance (named after the British statistician).

• Mahalanobis distance between x and y:

  d(x, y) = √((x − y)ᵀ S⁻¹ (x − y)).

In general, a distance measure d(x, y) must satisfy the following conditions:


1. d(x, y) = d(y, x).
2. d(x, y) > 0 if x ̸= y.
3. d(x, y) = 0 if x = y.
4. d(x, y) ≤ d(x, z) + d(z, y).
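A minimal pure-Python sketch of a few of these measures, with a numeric check of the symmetry and triangle-inequality conditions (the function names and sample points are illustrative, not from the notes):

```python
from math import sqrt

def euclidean(x, y):
    # Euclidean: square root of the sum of squared coordinate differences
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y, w):
    # Weighted city-block distance: Σ_k w_k |x_k − y_k|
    return sum(wk * abs(a - b) for wk, a, b in zip(w, x, y))

def minkowski(x, y, w, lam):
    # Minkowski: (Σ_k w_k |x_k − y_k|^λ)^(1/λ)
    return sum(wk * abs(a - b) ** lam for wk, a, b in zip(w, x, y)) ** (1.0 / lam)

def canberra(x, y):
    # Canberra distance, assuming strictly positive coordinates
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y))

x, y, z = (3.0, 4.0), (1.0, 2.0), (2.0, 1.0)
w = (1.0, 1.0)
```

With unit weights and λ = 2, the Minkowski distance reduces to the Euclidean distance, and any of these can be checked against conditions 1-4 on sample points.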

2 Some Matrix Algebra


2.1 Vectors
The inner product between two vectors x and y is defined by

xT y = x1 y1 + · · · + xk yk .

The length of a vector xᵀ = (x1 , · · · , xk ) is

  √(xᵀx) = √(x1² + · · · + xk²).

The angle θ between x and y is given by

  cos θ = xᵀy / (√(xᵀx) √(yᵀy)).

If xᵀy = 0 then x and y are perpendicular.


A set of vectors x1 , · · · , xn is said to be linearly dependent if there exist constants
c1 , ..., cn , not all zero, such that

c1 x1 + · · · + cn xn = 0.

Otherwise, they are linearly independent.
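A minimal pure-Python sketch of the inner product, length, and angle (the helper names are illustrative, not from the notes):

```python
from math import sqrt, acos, pi

def inner(x, y):
    # Inner product xᵀy = Σ_k x_k y_k
    return sum(a * b for a, b in zip(x, y))

def length(x):
    # Length √(xᵀx)
    return sqrt(inner(x, x))

def angle(x, y):
    # Angle θ with cos θ = xᵀy / (length(x) · length(y))
    return acos(inner(x, y) / (length(x) * length(y)))

x = (1.0, 1.0)
y = (1.0, -1.0)
# xᵀy = 0, so x and y are perpendicular: θ = π/2
```

The vectors (1, 1) and (1, −1) also illustrate linear independence: no non-trivial combination c1 x + c2 y is zero, whereas (1, 1) and (2, 2) are linearly dependent.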

2.1.1 Matrices
Transpose

((A)T )T = A.
(A + B)T = AT + B T .
(AB)T = B T AT .
A matrix A is symmetric if A = AT .
A matrix A is idempotent if A = AT and A2 = A.

Trace
The trace of a square matrix A, denoted tr(A), is the sum of its diagonal elements.

tr(c) = c.

tr(A ± B) = tr(A) ± tr(B).

tr(cA) = c tr(A).

tr(AB) = tr(BA) = Σ_{i,j} aij bji.

tr(AAᵀ) = tr(AᵀA) = Σ_{i,j} a²ij.

tr(B −1 AB) = tr(A).
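The identities tr(AB) = tr(BA) and tr(AAᵀ) = Σ a²ij are easy to verify numerically. A minimal pure-Python sketch (the helper names and sample matrices are illustrative, not from the notes):

```python
def matmul(A, B):
    # Product of nested-list matrices: (AB)_ij = Σ_k a_ik b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    # Aᵀ: swap rows and columns
    return [list(col) for col in zip(*A)]

def trace(A):
    # Sum of the diagonal elements
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 6]]
# tr(AB) = tr(BA) even though AB != BA in general;
# tr(AAᵀ) equals the sum of the squared entries of A.
```

Note that the trace identity holds even though matrix multiplication itself does not commute.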

Determinants
|A| = Σ_{j=1}^{k} aij Aij (expansion along row i) = Σ_{i=1}^{k} aij Aij (expansion
along column j), where the Aij are the co-factors.

A is said to be non-singular if |A| ̸= 0.


If A is triangular or diagonal, |A| = ∏_{i=1}^{k} aii.

|A| = |AT |.

If A is non-singular |A−1 | = 1/|A|.

|cA| = ck |A|.

|AB| = |A||B|.

For k = 2, |A| = a11 a22 − a12 a21 .

For k = 3, |A| = a11 (a22 a33 −a23 a32 )−a12 (a21 a33 −a23 a31 )+a13 (a21 a32 −a31 a22 ).
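The k = 2 and k = 3 formulas above translate directly into code. A minimal pure-Python sketch (the function names are illustrative, not from the notes):

```python
def det2(A):
    # |A| = a11 a22 − a12 a21
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def det3(A):
    # Co-factor expansion along the first row (the k = 3 formula above)
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[2][0] * A[1][1]))
```

Sanity checks: the identity has determinant 1, a triangular matrix has determinant equal to the product of its diagonal, and |A| = |Aᵀ|.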

Rank
The rank of a matrix A is the maximum number of linearly independent rows or
columns.

If A is p × n then 0 ≤ rank(A) ≤ min(p, n).

rank(A) = rank(AT ).

rank(AT A) = rank(AAT ) = rank(A).

If n = p then rank(A) = p if and only if A is non-singular.

If A is idempotent then its rank is equal to tr(A).

Inverse
For a k × k matrix A if there exists a matrix B such that

BA = I

then B is called the inverse of A and is denoted by A−1 . Here, I denotes the k × k
identity matrix.

A−1 = (Aij )T /|A|, where (Aij ) is the matrix of co-factors.

(cA)−1 = c−1 A−1 .

(A−1 )T = (AT )−1 .

(AB)−1 = B −1 A−1 .

For k = 2,

                    1             (  a22  −a12 )
  A⁻¹ = ───────────────────────   ( −a21   a11 )
          a11 a22 − a12 a21
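A minimal pure-Python sketch of the k = 2 inverse formula, verifying A A⁻¹ = I (the helper name and sample matrix are illustrative, not from the notes):

```python
def inv2(A):
    # 2×2 inverse via the co-factor formula above
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4.0, 7.0], [2.0, 6.0]]   # |A| = 10, so A is non-singular
Ainv = inv2(A)
# A · Ainv should be the 2×2 identity matrix
```

The singular case (|A| = 0) is rejected explicitly, matching the condition that A⁻¹ exists only for non-singular A.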

Orthogonal Matrices
A k × k matrix A is said to be orthogonal if AT = A−1 .

AT A = I.

|A| = ±1.

aTi ai = 1 and aTi aj = 0 for i ̸= j, where aTi denotes the ith row of A.

AB is orthogonal if A and B are orthogonal.

Eigenvalues and Eigenvectors
The eigenvalues λ1 , · · · , λk of a k × k symmetric matrix A are the solutions of the
equation |A − λI| = 0. The eigenvector ej corresponding to λj is given by:

Aej = λj ej .

Usually the eigenvectors are chosen to satisfy eT1 e1 = · · · = eTk ek = 1 and be mutually
perpendicular.

|A| = ∏ λi.

tr(A) = Σ λi.

The rank of A equals the number of non-zero eigenvalues.

If A is idempotent then λi = 0 or 1 for all i.

Spectral Decomposition
The spectral decomposition of any k × k symmetric matrix A is:

A = λ1 e1 eT1 + · · · + λk ek eTk .

Also represented as:


A = P ΛP T ,
where
  P = [e1 , e2 , · · · , ek ]
and

        ( λ1   0   · · ·  0  )
        ( 0    λ2  · · ·  0  )
  Λ =   ( .    .    ..    .  )
        ( 0    0   · · ·  λk )
Some consequences of this representation are:

  A⁻¹ = Σ_{i=1}^{k} (1/λi) ei eiᵀ = P Λ⁻¹ Pᵀ,

  A^{1/2} = Σ_{i=1}^{k} √λi ei eiᵀ = P Λ^{1/2} Pᵀ,

  A^{−1/2} = Σ_{i=1}^{k} (1/√λi) ei eiᵀ = P Λ^{−1/2} Pᵀ,

  Aⁿ = Σ_{i=1}^{k} λiⁿ ei eiᵀ = P Λⁿ Pᵀ

and

  A^{m/n} = Σ_{i=1}^{k} λi^{m/n} ei eiᵀ = P Λ^{m/n} Pᵀ.

(The inverse requires all λi ̸= 0, and the fractional powers require λi ≥ 0.)
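For k = 2 the eigenvalues have a closed form, so the spectral decomposition and its power formulas can be checked by hand. A minimal pure-Python sketch, assuming a symmetric 2 × 2 matrix with b ≠ 0 and positive eigenvalues (the helper names are illustrative, not from the notes):

```python
from math import sqrt

def spectral_2x2(a, b, c):
    # Eigenpairs of the symmetric matrix [[a, b], [b, c]], assuming b != 0
    half_tr = (a + c) / 2.0
    root = sqrt((a - c) ** 2 / 4.0 + b ** 2)
    pairs = []
    for lam in (half_tr + root, half_tr - root):
        v = (b, lam - a)                       # solves (A − λI)v = 0 when b != 0
        norm = sqrt(v[0] ** 2 + v[1] ** 2)
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs

def power_from_spectrum(pairs, p):
    # A^p = Σ_i λi^p ei eiᵀ  (p = 1 recovers A, p = 0.5 gives A^{1/2}, ...)
    M = [[0.0, 0.0], [0.0, 0.0]]
    for lam, e in pairs:
        w = lam ** p
        for i in range(2):
            for j in range(2):
                M[i][j] += w * e[i] * e[j]
    return M

eig = spectral_2x2(2.0, 1.0, 2.0)          # eigenvalues 3 and 1
A_rebuilt = power_from_spectrum(eig, 1)    # should recover [[2, 1], [1, 2]]
H = power_from_spectrum(eig, 0.5)          # square root: H H = A
```

The rebuilt matrix matches A, and multiplying H by itself recovers A, illustrating A^{1/2} A^{1/2} = A.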

2.1.2 Definiteness
A k × k symmetric matrix A is said to be nonnegative definite (denoted by A ≥ 0) if

xT Ax ≥ 0

for all xT = (x1 , · · · , xk ).


A k × k symmetric matrix A is said to be positive definite (denoted by A > 0) if

xT Ax > 0

for all x ̸= 0.

A > 0 if and only if λi > 0 for all i.

If A > 0 then A is non-singular and |A| > 0.

A ≥ 0 if and only if λi ≥ 0 for all i.

If A > 0 then A−1 > 0.

If A ≥ 0 then one can write A = B 2 for a symmetric matrix B.
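Positive definiteness can be probed numerically through the quadratic form xᵀAx. A minimal pure-Python sketch (the helper name and sample matrix are illustrative, not from the notes):

```python
def quad_form(A, x):
    # xᵀAx = Σ_{i,j} x_i a_ij x_j for a square matrix A
    return sum(x[i] * A[i][j] * x[j]
               for i in range(len(x)) for j in range(len(x)))

A = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 3 and 1, both positive, so A > 0
probes = [(1.0, 0.0), (0.0, 1.0), (1.0, -1.0), (2.0, 3.0), (-1.0, 4.0)]
# every non-zero probe should give a strictly positive quadratic form,
# while x = 0 gives exactly zero
```

Checking a handful of probe vectors does not prove definiteness, of course; the eigenvalue criterion above (λi > 0 for all i) is the decisive test.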
