
Multivariate data and Matrix algebra

By
Dr. Richard Tuyiragize
School of Statistics and Planning
Makerere University

January 12, 2022


1 Introduction
Multivariate analysis (MVA) deals with the statistical analysis of data collected on multiple
response variables on the same individual or subject or experimental unit. MVA enables a
joint or simultaneous examination of the variables in order to assess the effect (dependence
or correlation) of each variable in the presence of the others. For example, if economists want
to predict the likelihood of a recession, they might look at consumers’ spending. However, to
get a more accurate picture, a wide range of factors must be considered, from financial
indicators to human behavior.

MVA gathers and puts together all possible information on many variables to make predictions
and answer questions. MVA techniques play an important role in data analysis in almost all
branches of knowledge, including the physical sciences, medical sciences and engineering.
With the advance of computer technology, the applications of MVA are increasing because
hundreds of factors can be considered in solving a problem.

The methods of MVA can be broadly grouped into four categories:

1. Generalizations of univariate methods: These are methods with equivalents in
the single-variable case and are extensions of the latter to handle several variables
simultaneously. Thus, instead of computing the mean and variance for each variable one at
a time, one computes a mean vector and a variance-covariance matrix respectively. To
test a hypothesis about a single variable one can use the t-test, which generalizes to
Hotelling's T² test in the multivariate case. There is also Multivariate
Analysis of Variance (MANOVA), which is an extension of ANOVA.

2. Reduction of the dimensionality of the data: This refers to methods that are
primarily applied to reduce the data to a few manageable variables which can then
be used for further analysis. The new variables are in general uncorrelated linear
combinations of the original variables. The most widely used method for
data reduction is Principal Component Analysis, which forms linear combinations of
the original variables that are uncorrelated and have decreasing variance.

3. Classification: These techniques are used to classify individuals into distinct subgroups
on the basis of a number of (possibly independent) variables. When the
groups are known a priori, discriminant analysis is used to allocate members to the
correct subgroups; the a priori information required is group membership.
However, if it is only suspected that natural groupings may exist and the aim is to
discover such groups, one uses a technique known as cluster analysis to identify the
groups in question.

4. Dependency: The relationship between variables can be determined for the purpose
of predicting the values of one or more variables on the basis of observations on the other
variables. Example: Multivariate linear regression, multivariate analysis of variance.

2 Structure of multivariate data set
Most multivariate data sets can be represented in a rectangular format, in which the elements
of each row correspond to the variable values of a particular unit in the data set and
the elements of the columns correspond to the values taken by a particular variable.

Suppose there are p ≥ 2 variables (characteristics) measured from n items. Let xij denote
the value of the j th variable on the ith item (i = 1, 2, · · · ,n and j = 1, 2, · · · ,p, n ≫ p).

            Variable 1   Variable 2   · · ·   Variable j   · · ·   Variable p
Item 1         x11           x12      · · ·      x1j       · · ·      x1p
Item 2         x21           x22      · · ·      x2j       · · ·      x2p
 ...           ...           ...      · · ·      ...       · · ·      ...
Item n         xn1           xn2      · · ·      xnj       · · ·      xnp

          [ x11   x12   · · ·   x1j   · · ·   x1p ]
          [ x21   x22   · · ·   x2j   · · ·   x2p ]
    Xnp = [ ...   ...   · · ·   ...   · · ·   ... ]
          [ xn1   xn2   · · ·   xnj   · · ·   xnp ]
The data above form an n x p matrix, where n is the number of observations and p is the
number of variables; the matrix is usually denoted Xnp. We say that we have a p-variate or
p-dimensional dataset.
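As an illustration, here is a minimal sketch in Python (assuming numpy is available; the numbers are made up and not part of the original notes) of a small n x p data matrix together with its mean vector and variance-covariance matrix, the multivariate analogues of the univariate mean and variance mentioned in Section 1:

import numpy as np

# n = 4 items measured on p = 3 variables (hypothetical values)
X = np.array([[5.0, 2.1, 30.0],
              [6.2, 1.8, 28.0],
              [5.9, 2.5, 33.0],
              [4.8, 2.0, 27.0]])

n, p = X.shape                          # n = 4 observations, p = 3 variables
mean_vector = X.mean(axis=0)            # vector of the p variable means
cov_matrix = np.cov(X, rowvar=False)    # p x p variance-covariance matrix

print(n, p)
print(mean_vector)
print(cov_matrix)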

If i = 1, 2, · · · , n indexes points in time, we have a time series dataset. Time series data are
collected at different points in time, for instance the GDP of Uganda for the years 2000-2020.

If i = 1, 2, · · · , n indexes subjects or objects (such as individuals, firms or countries/regions),
then we have a cross-sectional dataset. Cross-sectional data are collected at the same
point in time, e.g. GDP for a given number of countries at a given time (annual, semi-annual,
quarterly, etc.).

If time series and cross-sectional data are combined, we have a panel or longitudinal
dataset. (A dataset is longitudinal if it tracks the same type of information on the same
subjects at multiple points in time.)

Special cases of nxp matrix

If p = 1, the matrix reduces to an n x 1 matrix (univariate dataset).

If p = 2, it reduces to an n x 2 matrix (bivariate dataset).

Similarly, tri-variate and higher-dimensional datasets can be deduced.

Note that multivariate techniques are generalizations of univariate and bivariate
techniques.

3 Types of variables
Variables are generally divided into two major categories: quantitative and categorical.

• Quantitative variables contain amounts.

• Categorical variables contain groupings.

Each of these types of variable can be broken down into further types.

Quantitative variables: When you collect quantitative data, the numbers you record represent
real amounts that can be added, subtracted, divided, etc. There are two types of
quantitative variables: discrete and continuous.

Discrete variables (aka integer variables) represent counts of individual items or values, for
example the number of students in a class or the number of different tree species in a forest.
Continuous variables (aka ratio variables) represent measurements of continuous or non-finite
values, for example distance, volume or age.

Categorical variables: These represent groupings of some kind. They are sometimes recorded
as numbers, but the numbers represent categories rather than actual amounts of things.
There are three types of categorical variables: binary, nominal and ordinal variables.

Binary variables (aka dichotomous variables) represent yes/no outcomes, for example
heads/tails in a coin flip or win/lose in a football game.
Nominal variables represent groups with no rank or order between them, for example
species names, colors or brands.
Ordinal variables represent groups that are ranked in a specific order, for example finishing
place in a race, income level or rating-scale responses.

Most of the techniques we are going to deal with in this course will assume variables of a
numerical nature for which we can compute descriptive statistics, that is, means, variances
and standard deviations.

The study of multivariate methods is greatly facilitated by the use of matrix algebra. The
next section presents a review of basic concepts of matrix algebra which are essential for the
interpretation and explanation of subsequent multivariate statistical techniques.

4 Matrix algebra

4.1 Definition of Matrix and Vector


A rectangular array of numbers with, for instance, n rows and p columns is called a matrix
of dimension n x p.

It is written as:

        [ x11   x12   · · ·   x1j   · · ·   x1p ]
        [ x21   x22   · · ·   x2j   · · ·   x2p ]
    X = [ ...   ...   · · ·   ...   · · ·   ... ]
        [ xn1   xn2   · · ·   xnj   · · ·   xnp ]
If

        [ a11   0     · · ·   0     · · ·   0   ]
        [ a21   a22   · · ·   0     · · ·   0   ]
    A = [ ...   ...   · · ·   ...   · · ·   ... ]
        [ ap1   ap2   · · ·   apj   · · ·   app ]

then A is a lower triangular matrix.

If

        [ a11   a12   · · ·   a1j   · · ·   a1p ]
        [ 0     a22   · · ·   a2j   · · ·   a2p ]
    A = [ ...   ...   · · ·   ...   · · ·   ... ]
        [ 0     0     · · ·   0     · · ·   app ]

then A is an upper triangular matrix.

 
A vector is an n x 1 matrix of real numbers,

    x = (x1, x2, · · · , xn)′   (a column vector),

and x′ = (x1, x2, · · · , xn) is the corresponding row vector.

If the vector has a single component (n = 1), it reduces to a 1 x 1 matrix, called a scalar.

If all the components are zeros, then the vector x is called the zero (null or empty) vector,
denoted 0.
A vector has both magnitude (length) and direction. The length of a vector
x′ = (x1, x2, · · · , xn) is defined by

    Lx = √(x1² + x2² + · · · + xn²) = √(x′x)

The length of a vector can be expanded or contracted by multiplying it by a constant a:

    ax = (ax1, ax2, · · · , axn)′

Such multiplication of a vector x by a scalar a changes the length as

    Lax = √(a²x1² + a²x2² + · · · + a²xn²) = |a| √(x′x)

When |a| > 1, vector x is expanded; when |a| < 1, it is contracted; when |a| = 1, its length
is unchanged. If a < 0, the direction of vector x is reversed.
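A quick numerical check of these two facts, as a minimal Python sketch (numpy assumed; the vector and scalar are arbitrary examples, not from the notes):

import numpy as np

x = np.array([1.0, 2.0, 2.0])
a = -3.0

Lx = np.sqrt(x @ x)                  # length of x: sqrt(x'x) = 3.0
Lax = np.sqrt((a * x) @ (a * x))     # length of ax

print(Lx, Lax)                       # 3.0 and 9.0
print(np.isclose(Lax, abs(a) * Lx))  # True: L_ax = |a| * L_x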

The dot/Inner/Scalar product

For two vectors x and y, we define the dot product x·y as the sum of the products of
corresponding components:

    x·y = x1y1 + x2y2 + · · · + xpyp = x′y
Example
   
If x = (1, 2)′ and y = (0, 3)′, then

 
    x·y = x′y = (1)(0) + (2)(3) = 6

Note that the dot product is always a 1 x 1 matrix, i.e. a scalar.

Consider the vector x; the sum of the squares of its components is

    x1² + x2² + · · · + xp² = x′x = x·x,

which is itself a dot product.

In multivariate analysis, the sums of squares and sums of cross products are dot products.
Thus

    (x1 − y1)² + (x2 − y2)² + · · · + (xp − yp)² = (x − y)′(x − y)
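These dot products are easy to verify numerically. A minimal Python sketch (numpy assumed; the vectors reuse the example above):

import numpy as np

x = np.array([1.0, 2.0])
y = np.array([0.0, 3.0])

print(x @ y)                  # dot product x'y = 1*0 + 2*3 = 6
print(x @ x)                  # sum of squares x'x = 1 + 4 = 5
print((x - y) @ (x - y))      # sum of squared differences (x-y)'(x-y) = 1 + 1 = 2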

4.2 Characteristics of a matrix


Rank

The rank of a matrix A is the maximum number of linearly independent rows (columns).

A set of k vectors x1, x2, · · · , xk is said to be linearly independent if

    a1x1 + a2x2 + · · · + akxk = 0   only if a1 = a2 = · · · = ak = 0.

That is, the vectors are linearly independent if the only combination giving the zero vector
is the one with every ai equal to zero.

Linear independence implies that no vector in the set can be written as a linear combination
of the other vectors. Vectors of the same dimension that are not linearly independent are said
to be linearly dependent, which means at least one vector can be written as a linear
combination of the others.

Example
     
Let A = [3 2; 4 1], with columns x1 = (3, 4)′ and x2 = (2, 1)′.

a1x1 + a2x2 = 0  ⇒  3a1 + 2a2 = 0 and 4a1 + a2 = 0, which holds only if a1 = a2 = 0.

This confirms that x1 and x2 are linearly independent. In other words, the columns of
matrix A = [3 2; 4 1] are linearly independent.
Example
       
Let A = [1 1 1; 2 5 −1; 0 1 −1], with columns x1 = (1, 2, 0)′, x2 = (1, 5, 1)′ and
x3 = (1, −1, −1)′.

a1x1 + a2x2 + a3x3 = 0

⇒ a1 + a2 + a3 = 0;  2a1 + 5a2 − a3 = 0;  a2 − a3 = 0

⇒ a3 = a2 and a1 + 2a2 = 0, so non-zero solutions exist; for example a1 = 1 gives
a2 = a3 = −0.5.

Therefore x1, x2 and x3 are not linearly independent (they are linearly dependent); indeed
x3 = 2x1 − x2.
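In practice the rank (defined above) is usually computed numerically. A minimal Python sketch (numpy assumed) checking both examples:

import numpy as np

A1 = np.array([[3, 2],
               [4, 1]])
A2 = np.array([[1,  1,  1],
               [2,  5, -1],
               [0,  1, -1]])

# rank = maximum number of linearly independent columns
print(np.linalg.matrix_rank(A1))   # 2: both columns are independent
print(np.linalg.matrix_rank(A2))   # 2 < 3: the columns are linearly dependent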

Trace
The trace of a square matrix A of order k is the sum of its diagonal elements:
tr(A) = a11 + a22 + · · · + akk.

Transpose of a matrix

When a new matrix is formed by interchanging the rows and columns of matrix A, the new
matrix is the transpose of matrix A, denoted A^T or A′.

For matrices A and B, (AB)′ = B′A′ and (A ± B)′ = A′ ± B′.
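A quick numerical check of the trace and transpose properties, as a minimal Python sketch (numpy assumed; the matrices are arbitrary examples):

import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [2, 5]])

print(np.trace(A))                          # trace: 1 + 4 = 5
print(np.allclose((A @ B).T, B.T @ A.T))    # (AB)' = B'A'  -> True
print(np.allclose((A + B).T, A.T + B.T))    # (A + B)' = A' + B'  -> True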

Square matrix

A p x p matrix Apxp is called a square matrix of order p.

If all the off-diagonal entries are zeros, the matrix A becomes a diagonal matrix, denoted
Diag_p(a_jj).

If every diagonal entry equals 1, i.e. Diag_p(1), it becomes the identity matrix:

                [ 1   0   · · ·   0   · · ·   0 ]
                [ 0   1   · · ·   0   · · ·   0 ]
    Diag_p(1) = [ ... ...  · · ·  ...  · · ·  ...]
                [ 0   0   · · ·   0   · · ·   1 ]

Symmetric matrix

A square matrix Apxp is said to be symmetric if A = A′, so that aij = aji for every i and j.

        [ a11   a12   · · ·   a1j   · · ·   a1p ]
        [ a21   a22   · · ·   a2j   · · ·   a2p ]
    A = [ ...   ...   · · ·   ...   · · ·   ... ]
        [ ap1   ap2   · · ·   apj   · · ·   app ]

All diagonal matrices are symmetric.

Since for a symmetric matrix A, aij = aji for all i, j, we may decide to write only the upper
or lower triangle. i.e.

   
        [ a11   a12   · · ·   a1j   · · ·   a1p ]
    A = [       a22   · · ·   a2j   · · ·   a2p ]      (upper triangle only)
        [                     ...               ]
        [                                   app ]

or

        [ a11                                   ]
    A = [ a21   a22                             ]      (lower triangle only)
        [ ...   ...   · · ·                     ]
        [ ap1   ap2   · · ·   apj   · · ·   app ]
Determinant of matrix

The determinant of a square matrix A of order p x p, denoted det(A) or |A|, is the scalar

    |A| = a11A11 + a12A12 + · · · + a1pA1p,

where Aij = (−1)^(i+j) times the determinant of the submatrix of A obtained by deleting the
ith row and jth column (the cofactor of aij).

Note that the determinant of a scalar, say a, is that very scalar, a.

If A = Diag_p(a_jj), then |A| is the product of the diagonal entries:

    |A| = a11 · a22 · · · app

Example


Given a diagonal matrix

    A = [ 2  0  0 ]
        [ 0  4  0 ]
        [ 0  0  1 ],

the determinant of A is |A| = 2 × 4 × 1 = 8.

Inverse of matrix

For a square matrix A of order p, i.e. Apxp, if there exists another square matrix B such that
AB = BA = Ip, then B is termed the inverse of A, written A^{-1} = B. We then say that A is
invertible or non-singular, i.e. |A| ≠ 0.

For a 1 x 1 matrix (scalar) a, a^{-1} = 1/a.

If A is a diagonal matrix Diag_p(a_jj), then A^{-1} = Diag_p(1/a_jj).

For example, if A = Diag(3, 1/4, 4), then A^{-1} = Diag(1/3, 4, 1/4).
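The determinant and inverse of this diagonal example can be checked numerically. A minimal Python sketch (numpy assumed):

import numpy as np

A = np.diag([3.0, 0.25, 4.0])       # the diagonal matrix Diag(3, 1/4, 4)

print(np.linalg.det(A))             # product of diagonal entries = 3.0
print(np.linalg.inv(A))             # Diag(1/3, 4, 1/4)
print(np.allclose(A @ np.linalg.inv(A), np.eye(3)))   # A A^{-1} = I -> True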

4.3 Eigen values and eigenvectors


Eigen values play an important role in multivariate techniques.

Let A be a k x k square matrix and Ik be the k x k identity matrix. Then the scalars
λ1, λ2, · · · , λk satisfying the polynomial equation |A − λI| = 0 are called the eigen values or
characteristic roots of matrix A.

The equation |A − λI| = 0 is also called a characteristic equation.

Example
 
Given A = [4 0; 1 2], obtain the eigen values of matrix A.

Solution

From |A − λI| = 0:

    | 4 − λ     0    |
    |   1     2 − λ  |  =  0   ⇒   (4 − λ)(2 − λ) = 0,

hence λ1 = 4 and λ2 = 2 are the eigen values of matrix A.
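The same eigen values can be obtained numerically. A minimal Python sketch (numpy assumed):

import numpy as np

A = np.array([[4.0, 0.0],
              [1.0, 2.0]])

eigenvalues = np.linalg.eigvals(A)
print(eigenvalues)        # 4 and 2 (order may vary), matching |A - lambda*I| = 0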

Let A be a matrix of dimension kxk and let λ be an eigenvalue of A. If x is a non-zero vector

(x ≠ 0) such that Ax = λx, then x is said to be an eigen vector of matrix A associated with
the eigen value λ.

If multiplying the vector by the square matrix changes the length of the vector but leaves its
direction unchanged (or exactly reverses it), the vector is called an eigen vector of that
matrix.

To each eigen value λi there exists a corresponding eigen vector xi. Eigen vectors are not
unique, as they contain an arbitrary scale factor, so they are usually normalized so that
x′x = 1.

The normalized eigen vector ei corresponding to xi is

    ei = (1/Lxi) xi = xi / √(xi′xi)

* ei′ei = 1, for all i

* ei′ej = xi′xj / (√(xi′xi) √(xj′xj)) = 0 for all i ≠ j, when A is symmetric

The normalized eigen vectors are chosen to satisfy e1′e1 = e2′e2 = · · · = ek′ek = 1 and, when
A is symmetric, to be mutually perpendicular, ei′ej = 0 for all i ≠ j.

Example

Let A = [6 16; −1 −4] and X = [−8; 1].

    AX = [6 16; −1 −4] [−8; 1] = [−32; 4] = 4 [−8; 1]

Thus [−8; 1] is an eigen vector of A with an eigenvalue of 4.
Example

Let A = [2 1; 1 2] and X = [3; −3].

    AX = [2 1; 1 2] [3; −3] = [3; −3] = 1 [3; −3]

Thus [3; −3] is an eigen vector of A with an eigenvalue of 1.
Example

Find the eigen values and eigen vectors of A = [1 2; 3 2].

From |A − λI| = 0:

    | 1 − λ     2    |
    |   3     2 − λ  |  =  0

⇒ (1 − λ)(2 − λ) − 6 = 0  ⇒  λ² − 3λ − 4 = 0

Thus, the eigen values of A are λ1 = 4 and λ2 = -1

To find the corresponding eigen vectors: Axi = λi xi , for i = 1, 2

For λ1 = 4
    
Ax1 = λ1x1  ⇒  [1 2; 3 2] [x11; x21] = 4 [x11; x21]

⇒ x11 + 2x21 = 4x11  ⇒  x21 = (3/2) x11.

Let x11 = 2 ⇒ x21 = 3. Thus x1 = (2, 3)′, which is not unique.

The normalized eigen vector of x1 = (2, 3)′ is

    e1 = x1 / √(x1′x1) = (1/√(4 + 9)) (2, 3)′ = (2/√13, 3/√13)′

Note that e1′e1 = 1.

For λ2 = −1
    
Ax2 = λ2x2  ⇒  [1 2; 3 2] [x12; x22] = −1 [x12; x22]

⇒ x12 + 2x22 = −x12  ⇒  x22 = −x12.

Let x12 = 1 ⇒ x22 = −1. Thus x2 = (1, −1)′, which is not unique.

The normalized eigen vector of x2 = (1, −1)′ is

    e2 = x2 / √(x2′x2) = (1/√(1 + 1)) (1, −1)′ = (1/√2, −1/√2)′
′ ′
Note that e2′e2 = 1. (Because this A is not symmetric, e1 and e2 are not orthogonal here:
e1′e2 = (2 − 3)/(√13 √2) = −1/√26 ≠ 0. Eigen vectors of distinct eigen values are guaranteed
to be mutually perpendicular only for symmetric matrices.)
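The whole example can be reproduced numerically. A minimal Python sketch (numpy assumed; note that numpy already returns unit-length eigenvectors):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])

vals, vecs = np.linalg.eig(A)          # eigenvalues 4 and -1; eigenvectors in columns
print(vals)

print(np.linalg.norm(vecs, axis=0))        # each column has length 1
print(np.allclose(A @ vecs, vecs * vals))  # A x = lambda x for each column -> True

# because this A is not symmetric, the two eigenvectors are not orthogonal:
print(vecs[:, 0] @ vecs[:, 1])             # magnitude close to 1/sqrt(26), not 0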

4.4 Matrix partitioning


It is sometimes convenient to partition a matrix into submatrices. Partitioning is a way of
expressing a matrix as being made up of two or more submatrices, called partitions.

It is also a useful way of determining the inverse or determinant of square matrices.

Given a matrix A = [2 3 4; 3 2 3; 4 3 4], A can be partitioned as A = [A11 A12; A21 A22],

where A11 = 2;  A12 = (3  4);  A21 = (3  4)′;  A22 = [2 3; 3 4].
4.5 HSU's result

If a square matrix A is conveniently partitioned as A = [A11 A12; A21 A22], where A11 and A22
are square matrices, then

    |A| = |A11| |A22 − A21 A11^{-1} A12|,   if |A11| ≠ 0

        = |A22| |A11 − A12 A22^{-1} A21|,   if |A22| ≠ 0
10 5 −2 0
 
For example, find the determinant of A =  6 3 2 1,

 4 5 12 3
5 1 3 8

Using HSU's result, |A11| = |10 5; 6 3| = 10(3) − 5(6) = 0, so this form fails.

The second form: |A22| = |12 3; 3 8| = 12(8) − 3(3) = 87 ≠ 0, so the second form is adopted.
|A| = |A22| |A11 − A12 A22^{-1} A21|

    = |12 3; 3 8| · | [10 5; 6 3] − [−2 0; 2 1] [12 3; 3 8]^{-1} [4 5; 5 1] |

But [12 3; 3 8]^{-1} = (1/87) [8 −3; −3 12]

⇒ A12 A22^{-1} A21 = (1/87) [−2 0; 2 1] [8 −3; −3 12] [4 5; 5 1] = (1/87) [−34 −74; 82 71]

∴ |A| = 87 · | [10 5; 6 3] − (1/87) [−34 −74; 82 71] |

    = student to complete!!!
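The block-determinant identity itself is easy to verify numerically on any partitioned matrix. A minimal Python sketch (numpy assumed; the 4 x 4 matrix below is made up and deliberately different from the exercise above, so the answer is not given away):

import numpy as np

# a made-up 4x4 matrix, partitioned into 2x2 blocks
A = np.array([[2.0, 1.0, 0.0, 3.0],
              [1.0, 3.0, 2.0, 0.0],
              [0.0, 2.0, 4.0, 1.0],
              [3.0, 0.0, 1.0, 5.0]])

A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]

lhs = np.linalg.det(A)
rhs = np.linalg.det(A22) * np.linalg.det(A11 - A12 @ np.linalg.inv(A22) @ A21)
print(np.isclose(lhs, rhs))     # True: |A| = |A22| |A11 - A12 A22^{-1} A21|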

Question:

 
Find the determinant of matrix A using matrix partitioning, where

    A = [ 1    x    x²   0  ]
        [ 0    1    x    x² ]
        [ x²   0    1    x  ]
        [ x    x²   0    1  ]

4.6 Quadratic forms of a matrix


For a square symmetric matrix A and a column vector x, both of order p, the expression
x′Ax is defined as the quadratic form of matrix A; it is always a scalar.
 
Let A = [a11 a12; a21 a22] and x = (x1, x2)′.

Then

    x′Ax = (x1  x2) [a11 a12; a21 a22] (x1, x2)′

         = a11x1² + a12x1x2 + a21x1x2 + a22x2²

All the terms in the expression are of second order, i.e. quadratic, and the expression is
called the quadratic form of matrix A, denoted Q(f).

A square symmetric matrix A and its Q(f) are said to be positive semi-definite if the Q(f)
is positive or equal to zero for all non-zero x: x′Ax ≥ 0 for all x ≠ 0.

A square symmetric matrix A of order p and its Q(f) are said to be (strictly) positive definite
if the Q(f) is strictly positive for all non-zero x: x′Ax > 0 for all x ≠ 0.
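As a quick illustration, a minimal Python sketch (numpy assumed; the matrix and vector are arbitrary examples) computing a quadratic form and checking positive definiteness through the eigenvalues of the symmetric matrix:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])         # a symmetric matrix
x = np.array([1.0, -2.0])

Q = x @ A @ x                      # the quadratic form x'Ax, always a scalar
print(Q)                           # 2(1)^2 + 2(1)(1)(-2) + 3(-2)^2 = 10

# for a symmetric matrix, positive definiteness <=> all eigenvalues are positive
print(np.all(np.linalg.eigvalsh(A) > 0))    # True for this A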

