
MULTIVARIATE METHODS
Lecture 1-2: Introduction

Outline
• Introduction: real-life examples of multivariate data; the data matrix; calculation of summary statistics, mean vectors, covariance and correlation matrices.
• Linear combinations.
• Multivariate normal distributions and assumptions.
• Tests for a mean vector: one-sample and Hotelling's T² tests based on the union-intersection approach.
• Simultaneous confidence intervals for detecting important components.
• Testing equality of two population means; paired tests.
• Discriminant functions; principal components.
• Distributions of linear functions of a random vector and of quadratic forms; multivariate normal regression and correlation analysis.
• Elements of multivariate analysis of variance; use of computer packages.

Introduction
Multivariate analysis consists of a collection of methods that can be used when several measurements are made on each individual or object in one or more samples. We will refer to the measurements as variables and to the individuals or objects as units (research units, sampling units, or experimental units) or observations.

Ordinarily the variables are measured simultaneously on each sampling unit. Typically, these variables are correlated. If this were not so, there would be little use for many of the techniques of multivariate analysis. We need to untangle the overlapping information provided by correlated variables and peer beneath the surface to see the underlying structure. Thus the goal of many multivariate approaches is simplification: we seek to express what is going on in terms of a reduced set of dimensions.

Such multivariate techniques are exploratory; they essentially generate hypotheses rather than test them. On the other hand, if our goal is a formal hypothesis test, we need a technique that will (1) allow several variables to be tested while still preserving the significance level and (2) do this for any intercorrelation structure of the variables. Many such tests are available.


Multivariate analysis is concerned generally with two areas, descriptive and inferential statistics. In the descriptive realm, we often obtain optimal linear combinations of variables. The optimality criterion varies from one technique to another, depending on the goal in each case. Although linear combinations may seem too simple to reveal the underlying structure, we use them for two obvious reasons:
1. they have mathematical tractability (linear approximations are used throughout all of science for the same reason) and
2. they often perform well in practice.

These linear functions may also be useful as a follow-up to inferential procedures. When we have a statistically significant test result that compares several groups, for example, we can find the linear combination (or combinations) of variables that led to rejection of the hypothesis. Then the contribution of each variable to these linear combinations is of interest.

In the inferential area, many multivariate techniques are extensions of univariate procedures. In such cases, we review the univariate procedure before presenting the analogous multivariate approach.

Mean And Variance Of A Univariate Random Variable
Informally, a random variable may be defined as a variable whose value depends on the outcome of a chance experiment. Generally, we will consider only continuous random variables. Some types of multivariate data are only approximations to this ideal, such as test scores or a seven-point semantic differential (Likert) scale consisting of ordered responses ranging from strongly disagree to strongly agree.

The density function f(y) indicates the relative frequency of occurrence of the random variable y. Thus, if f(y₁) > f(y₂), then points in the neighborhood of y₁ are more likely to occur than points in the neighborhood of y₂.

The population mean of a random variable y is defined (informally) as the mean of all possible values of y and is denoted by μ. The mean is also referred to as the expected value of y, or E(y). If the density f(y) is known, the mean can sometimes be found using methods of calculus.

If f(y) is unknown, the population mean μ will ordinarily remain unknown unless it has been established from extensive past experience with a stable population. If a large random sample from the population represented by f(y) is available, it is highly probable that the mean of the sample is close to μ.


The sample mean of a random sample of n observations y₁, y₂, ..., yₙ is given by the ordinary arithmetic average

  $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$

Generally, ȳ will never be equal to μ; by this we mean that the probability is zero that a sample will ever arise in which ȳ is exactly equal to μ. However, ȳ is considered a good estimator for μ because E(ȳ) = μ and var(ȳ) = σ²/n, where σ² is the variance of y. In other words, ȳ is an unbiased estimator of μ and has a smaller variance than a single observation y.

If every y in the population is multiplied by a constant a, the expected value is also multiplied by a:

  $E(ay) = a\,E(y) = a\mu.$

The sample mean has a similar property: if zᵢ = a yᵢ for i = 1, 2, ..., n, then z̄ = a ȳ.

The variance of the population is defined as

  $\sigma^2 = \mathrm{var}(y) = E(y - \mu)^2.$

The sample variance is defined as

  $s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1},$

which can be shown to be equal to

  $s^2 = \frac{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}{n-1}.$
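These formulas are easy to verify numerically. The sketch below uses a small made-up sample (not data from the lecture's tables) and numpy; the values are illustrative only.

```python
import numpy as np

# Hypothetical sample (not from the lecture's tables)
y = np.array([12.0, 15.0, 9.0, 14.0, 11.0])
n = len(y)

ybar = y.sum() / n                                # sample mean  ȳ = (1/n) Σ y_i
s2 = ((y - ybar) ** 2).sum() / (n - 1)            # sample variance with divisor n - 1
s2_alt = (np.sum(y ** 2) - n * ybar ** 2) / (n - 1)  # equivalent computational form

print(ybar, s2, s2_alt)          # s2 and s2_alt agree
print(np.var(y, ddof=1))         # numpy's unbiased sample variance matches s2
```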

The sample variance is generally never equal to the population variance (the probability of such an occurrence is zero), but it is an unbiased estimator; that is,

  $E(s^2) = \sigma^2.$

If each y is multiplied by a constant a, the population variance becomes

  $\mathrm{var}(ay) = a^2\sigma^2.$

Similarly, if zᵢ = a yᵢ for i = 1, 2, ..., n, then the sample variance of z is given by

  $s_z^2 = a^2 s_y^2.$

Covariance And Correlation Of Bivariate Random Variables
Covariance
If two variables x and y are measured on each research unit (object or subject), we have a bivariate random variable (x, y). Often x and y will tend to co-vary; if one is above its mean, the other is more likely to be above its mean, and vice versa. For example, height and weight were observed for a sample of 20 college-age males. The data are given in Table 1.

Table 1. Height and Weight for a Sample of 20 College-age Males

The values of height x and weight y from Table 1 are both plotted in the vertical direction in Figure 1. The tendency for x and y to stay on the same side of the mean is clear in Figure 1.

Figure 1. Two variables with a tendency to covary.

To obtain the sample covariance for the height and weight data in Table 1, we first calculate the sample means x̄ and ȳ. The sample covariance is

  $s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}.$

Now we have s_xy = 128.88. By itself, the sample covariance 128.88 is not very meaningful. We are not sure whether this represents a small, moderate, or large amount of relationship between y and x. A method of standardizing the covariance is given next.
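Since the Table 1 data are not reproduced in these notes, the sketch below uses a small hypothetical bivariate sample to show how the sample covariance formula above would be computed in practice.

```python
import numpy as np

# Hypothetical heights and weights; not the actual Table 1 values
x = np.array([69.0, 74.0, 68.0, 70.0, 72.0])
y = np.array([153.0, 175.0, 155.0, 135.0, 172.0])
n = len(x)

# s_xy = Σ (x_i - x̄)(y_i - ȳ) / (n - 1)
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
print(s_xy)
print(np.cov(x, y, ddof=1)[0, 1])   # same value from numpy's 2 x 2 covariance matrix
```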


Correlation
Since the covariance depends on the scale of measurement of x and y, it is difficult to compare covariances between different pairs of variables. For example, if we change a measurement from inches to centimeters, the covariance will change.

To find a measure of linear relationship that is invariant to changes of scale, we can standardize the covariance by dividing by the standard deviations of the two variables. This standardized covariance is called a correlation.

The population correlation of two random variables x and y is

  $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{E[(x-\mu_x)(y-\mu_y)]}{\sqrt{E(x-\mu_x)^2}\,\sqrt{E(y-\mu_y)^2}},$

and the sample correlation is

  $r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}.$

We obtain the correlation for the height and weight data by first calculating the sample variance of x, s_x², and, similarly, the sample variance of y, s_y². Then, using s_xy = 128.88, we have

  $r_{xy} = \frac{s_{xy}}{s_x s_y}.$
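Continuing the hypothetical bivariate sample used above, the sample correlation is just the covariance divided by the two standard deviations; numpy's corrcoef returns the same standardized value.

```python
import numpy as np

x = np.array([69.0, 74.0, 68.0, 70.0, 72.0])      # hypothetical heights
y = np.array([153.0, 175.0, 155.0, 135.0, 172.0])  # hypothetical weights
n = len(x)

s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
s_x = x.std(ddof=1)    # sample standard deviation of x
s_y = y.std(ddof=1)    # sample standard deviation of y

r_xy = s_xy / (s_x * s_y)          # r = s_xy / (s_x s_y)
print(r_xy)
print(np.corrcoef(x, y)[0, 1])     # agrees with the standardized covariance
```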


MEAN VECTORS
Let y represent a random vector of p variables measured on a sampling unit (subject or object). If there are n individuals in the sample, the n observation vectors are denoted by y₁, y₂, ..., yₙ, where

  $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{ip})', \quad i = 1, 2, \ldots, n.$

The sample mean vector ȳ can be found either as the average of the n observation vectors or by calculating the average of each of the p variables separately:

  $\bar{\mathbf{y}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{y}_i = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_p)'.$

All n observation vectors y₁, y₂, ..., yₙ can be transposed to row vectors and listed in the data matrix Y as follows:

  $\mathbf{Y} = \begin{pmatrix} \mathbf{y}_1' \\ \mathbf{y}_2' \\ \vdots \\ \mathbf{y}_n' \end{pmatrix} = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{np} \end{pmatrix}.$

Since n is usually greater than p, the data are more conveniently tabulated in this way, with the observation vectors entered as rows rather than columns. Note that the first subscript i corresponds to units (subjects or objects) and the second subscript j refers to variables. This convention will be followed whenever possible.

In addition to the two ways of calculating ȳ given above, it may be obtained directly from Y. We sum the n entries in each column of Y and divide by n, which gives

  $\bar{\mathbf{y}}' = \frac{1}{n}\mathbf{j}'\mathbf{Y},$

where j is an n × 1 vector of 1s. This can be transposed to obtain

  $\bar{\mathbf{y}} = \frac{1}{n}\mathbf{Y}'\mathbf{j}.$

For example, the second element of ȳ is

  $\bar{y}_2 = \frac{1}{n}\sum_{i=1}^{n} y_{i2}.$
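A minimal sketch of the three equivalent ways of computing the sample mean vector, including the matrix form ȳ = (1/n)Y'j, using an arbitrary made-up 5 × 3 data matrix (rows = units, columns = variables), not the calcium data of Table 2.

```python
import numpy as np

# Hypothetical data matrix: n = 5 units (rows), p = 3 variables (columns)
Y = np.array([[3.0, 10.0, 5.5],
              [4.0, 12.0, 6.1],
              [2.5,  9.0, 5.0],
              [5.0, 14.0, 7.2],
              [3.5, 11.0, 5.8]])
n, p = Y.shape

ybar_cols = Y.mean(axis=0)                      # average each of the p columns separately
ybar_rows = sum(Y[i] for i in range(n)) / n     # average of the n observation vectors
j = np.ones(n)                                  # n x 1 vector of 1s
ybar_mat = (Y.T @ j) / n                        # matrix form  ȳ = (1/n) Y'j

print(ybar_cols, ybar_rows, ybar_mat)           # all three agree
```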


In populations, the mean of y over all possible values in the population is called the population mean vector or expected value of y. It is defined as a vector of expected values of each variable,

  $\boldsymbol{\mu} = E(\mathbf{y}) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix},$

where μⱼ is the population mean of the j-th variable.

It can be shown that the expected value of each ȳⱼ is μⱼ, so that ȳ is an unbiased estimator of μ:

  $E(\bar{\mathbf{y}}) = \boldsymbol{\mu}.$

We emphasize again that ȳ is never equal to μ.

Example 2
Table 2 gives partial data from Kramer and Jensen (1969a). Three variables were measured (in milliequivalents per 100 g) at 10 different locations in the South; the data are given in Table 2.

Table 2. Calcium in Soil and Turnip Greens

To find the mean vector ȳ, we simply calculate the average of each column of Table 2.

COVARIANCE MATRICES
The sample covariance matrix S = (s_jk) is the matrix of sample variances and covariances of the p variables:

  $\mathbf{S} = (s_{jk}) = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}.$


In S the sample variances of the p variables appear on the diagonal, and all possible pairwise sample covariances appear off the diagonal. The j-th row (column) contains the covariances of yⱼ with the other p − 1 variables.

We obtain S by simply calculating each individual element: the sample variance of the j-th variable is

  $s_{jj} = s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_{ij} - \bar{y}_j)^2,$

and the sample covariance of the j-th and k-th variables is

  $s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(y_{ij} - \bar{y}_j)(y_{ik} - \bar{y}_k).$

The sample covariance matrix S can also be expressed in terms of the observation vectors:

  $\mathbf{S} = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})' = \frac{1}{n-1}\left(\sum_{i=1}^{n}\mathbf{y}_i\mathbf{y}_i' - n\,\bar{\mathbf{y}}\bar{\mathbf{y}}'\right).$

We can also obtain S directly from the data matrix Y as

  $\mathbf{S} = \frac{1}{n-1}\left(\mathbf{Y}'\mathbf{Y} - n\,\bar{\mathbf{y}}\bar{\mathbf{y}}'\right) = \frac{1}{n-1}\,\mathbf{Y}'\left(\mathbf{I} - \frac{1}{n}\mathbf{J}\right)\mathbf{Y},$

where J = jj' is an n × n matrix of 1s.
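A short sketch (again with an arbitrary made-up Y, not the calcium data) checking that the observation-vector form and the data-matrix form of S agree with numpy's built-in covariance routine.

```python
import numpy as np

Y = np.array([[3.0, 10.0, 5.5],
              [4.0, 12.0, 6.1],
              [2.5,  9.0, 5.0],
              [5.0, 14.0, 7.2],
              [3.5, 11.0, 5.8]])
n, p = Y.shape
ybar = Y.mean(axis=0)

# S = (1/(n-1)) Σ (y_i - ȳ)(y_i - ȳ)'
S_sum = sum(np.outer(Y[i] - ybar, Y[i] - ybar) for i in range(n)) / (n - 1)

# S = (1/(n-1)) (Y'Y - n ȳ ȳ')
S_mat = (Y.T @ Y - n * np.outer(ybar, ybar)) / (n - 1)

print(np.allclose(S_sum, S_mat))                               # both forms agree
print(np.allclose(S_mat, np.cov(Y, rowvar=False, ddof=1)))      # matches numpy
```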

If y is a random vector taking on any possible value in a multivariate population, the population covariance matrix is defined as

  $\boldsymbol{\Sigma} = \mathrm{cov}(\mathbf{y}) = E[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})'].$

The population covariance matrix can also be found as

  $\boldsymbol{\Sigma} = E(\mathbf{y}\mathbf{y}') - \boldsymbol{\mu}\boldsymbol{\mu}'.$

The sample covariance matrix S is an unbiased estimator of the population covariance matrix, i.e.,

  $E(\mathbf{S}) = \boldsymbol{\Sigma}.$


To calculate the sample covariance matrix for the calcium data of Table 2 using these computational forms, we need the sum of squares of each column and the sum of products of each pair of columns; for instance, the computation of s₁₃ uses the sums involving the first and third columns. Continuing in this fashion, we obtain the full matrix S.

CORRELATION MATRICES
The sample correlation between the j-th and k-th variables is defined as

  $r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}} = \frac{s_{jk}}{s_j s_k}.$

The sample correlation matrix is analogous to the covariance matrix, with correlations in place of covariances:

  $\mathbf{R} = (r_{jk}) = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}.$

The second row, for example, contains the correlation of y₂ with each of the y's (including the correlation of y₂ with itself, which is 1).

The correlation matrix can be obtained from the covariance matrix, and vice versa. Define the diagonal matrix

  $\mathbf{D}_s = \mathrm{diag}(\sqrt{s_{11}}, \sqrt{s_{22}}, \ldots, \sqrt{s_{pp}}) = \mathrm{diag}(s_1, s_2, \ldots, s_p).$

Then, by use of the matrix approach,

  $\mathbf{R} = \mathbf{D}_s^{-1}\,\mathbf{S}\,\mathbf{D}_s^{-1} \quad\text{and}\quad \mathbf{S} = \mathbf{D}_s\,\mathbf{R}\,\mathbf{D}_s.$

The analogous population correlation matrix is defined as

  $\mathbf{P}_\rho = (\rho_{jk}) = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix}.$
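A minimal sketch of the D_s⁻¹ S D_s⁻¹ conversion and its inverse, using an arbitrary covariance matrix for illustration.

```python
import numpy as np

def corr_from_cov(S):
    """R = D^-1 S D^-1, where D holds the sample standard deviations on its diagonal."""
    D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
    return D_inv @ S @ D_inv

def cov_from_corr(R, variances):
    """Recover S = D R D from the correlation matrix and the vector of variances."""
    D = np.diag(np.sqrt(variances))
    return D @ R @ D

# Arbitrary illustrative covariance matrix (not from the lecture's examples)
S = np.array([[4.0, 2.0, 1.0],
              [2.0, 9.0, 3.0],
              [1.0, 3.0, 16.0]])
R = corr_from_cov(S)
print(R)                                               # unit diagonal, correlations elsewhere
print(np.allclose(cov_from_corr(R, np.diag(S)), S))    # round trip recovers S
```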


To obtain the sample correlation matrix R for Example 2, we can either calculate the individual elements r_jk or use the direct matrix operation R = D_s⁻¹ S D_s⁻¹ with the diagonal matrix D_s of sample standard deviations. Note that .865 > .493 > .327, which is a different order than that of the covariances in S.

Mean Vectors And Covariance Matrices For Subsets Of Variables
Two Subsets
Sometimes a researcher is interested in two different kinds of variables, both measured on the same sampling unit. For example, several classroom behaviors are observed for students, and during the same time period (the basic experimental unit) several lecturer behaviors are also observed. The researcher wishes to study the relationships between the student variables and the lecturer variables.

We will denote the two sub-vectors by y and x, with p variables in y and q variables in x. Thus each observation vector in a sample is partitioned as

  $\begin{pmatrix} \mathbf{y}_i \\ \mathbf{x}_i \end{pmatrix}, \quad i = 1, 2, \ldots, n.$

Hence there are p + q variables in each of the n observation vectors.

For the sample of n observation vectors, the partitioned mean vector and covariance matrix have the form

  $\begin{pmatrix} \bar{\mathbf{y}} \\ \bar{\mathbf{x}} \end{pmatrix}, \qquad \mathbf{S} = \begin{pmatrix} \mathbf{S}_{yy} & \mathbf{S}_{yx} \\ \mathbf{S}_{xy} & \mathbf{S}_{xx} \end{pmatrix},$

where S_yy is the p × p covariance matrix of the y's, S_xx is the q × q covariance matrix of the x's, and S_yx = S'_xy is the p × q matrix of covariances between the y's and the x's. To illustrate, let p = 2 and q = 3; then the partitioned mean vector has 2 + 3 = 5 elements, S is a 5 × 5 matrix partitioned into the blocks above, and we could also write each block out in terms of its individual variances and covariances.
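A sketch of the two-subset partition with p = 2 and q = 3, using an arbitrary random data matrix whose first two columns play the role of y and last three the role of x (hypothetical data, for illustration only).

```python
import numpy as np

p, q, n = 2, 3, 8
rng = np.random.default_rng(1)
data = rng.normal(size=(n, p + q))    # columns 0..1 are the y's, columns 2..4 are the x's

mean_vec = data.mean(axis=0)          # partitioned mean vector (ȳ', x̄')'
S = np.cov(data, rowvar=False, ddof=1)

S_yy = S[:p, :p]   # p x p covariance matrix of the y variables
S_yx = S[:p, p:]   # p x q covariances between the y's and the x's
S_xy = S[p:, :p]   # q x p block, equal to the transpose of S_yx
S_xx = S[p:, p:]   # q x q covariance matrix of the x variables

print(mean_vec[:p], mean_vec[p:])     # ȳ and x̄
print(np.allclose(S_xy, S_yx.T))      # S_xy = S'_yx
```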


Example 3
The data in Table 3 contain measurements of five variables in a comparison of normal patients and diabetic patients in an experimental setup. We give partial data for normal patients only. Three of the five variables were of major interest.


Table 3. Relative Weight, Blood Glucose, and Insulin Levels


The two additional variables of minor interest were also measured on each patient. The mean vector, partitioned accordingly, is of the form

  $\begin{pmatrix} \bar{\mathbf{y}} \\ \bar{\mathbf{x}} \end{pmatrix},$

with ȳ containing the three variables of major interest and x̄ the two of minor interest.


The covariance matrix, partitioned in the same way, is of the form

  $\mathbf{S} = \begin{pmatrix} \mathbf{S}_{yy} & \mathbf{S}_{yx} \\ \mathbf{S}_{xy} & \mathbf{S}_{xx} \end{pmatrix}.$

Three or More Subsets
In some cases, three or more subsets of variables are of interest. If the observation vector y is partitioned as

  $\mathbf{y} = \begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_k \end{pmatrix},$

where y₁ has p₁ variables, y₂ has p₂, ..., y_k has p_k, with p = p₁ + p₂ + ··· + p_k, then the sample mean vector and covariance matrix are given by

  $\bar{\mathbf{y}} = \begin{pmatrix} \bar{\mathbf{y}}_1 \\ \bar{\mathbf{y}}_2 \\ \vdots \\ \bar{\mathbf{y}}_k \end{pmatrix}, \qquad \mathbf{S} = \begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} & \cdots & \mathbf{S}_{1k} \\ \mathbf{S}_{21} & \mathbf{S}_{22} & \cdots & \mathbf{S}_{2k} \\ \vdots & \vdots & & \vdots \\ \mathbf{S}_{k1} & \mathbf{S}_{k2} & \cdots & \mathbf{S}_{kk} \end{pmatrix}.$

The corresponding population results are

  $\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \\ \vdots \\ \boldsymbol{\mu}_k \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} & \cdots & \boldsymbol{\Sigma}_{1k} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} & \cdots & \boldsymbol{\Sigma}_{2k} \\ \vdots & \vdots & & \vdots \\ \boldsymbol{\Sigma}_{k1} & \boldsymbol{\Sigma}_{k2} & \cdots & \boldsymbol{\Sigma}_{kk} \end{pmatrix}.$
