
MULTIVARIATE METHODS
Lecture 1-2: Introduction

Outline
• Introduction: real-life examples of multivariate data; the data matrix; calculation of summary statistics, mean vectors, covariance and correlation matrices.
• Linear combinations.
• Multivariate normal distributions and assumptions.
• Tests for a mean vector: one-sample and Hotelling's T² tests based on the union-intersection approach.
• Simultaneous confidence intervals for detecting important components.
• Testing equality of two population means; paired tests.
• Discriminant functions; principal components.
• Distributions of linear functions of a random vector and of quadratic forms; multivariate normal regression and correlation analysis.
• Elements of multivariate analysis of variance; use of computer packages.

Introduction
Multivariate analysis consists of a collection of methods that can be used when several measurements are made on each individual or object in one or more samples. We will refer to the measurements as variables and to the individuals or objects as units (research units, sampling units, or experimental units) or observations.

Ordinarily the variables are measured simultaneously on each sampling unit. Typically, these variables are correlated. If this were not so, there would be little use for many of the techniques of multivariate analysis. We need to untangle the overlapping information provided by correlated variables and peer beneath the surface to see the underlying structure. Thus the goal of many multivariate approaches is simplification: we seek to express what is going on in terms of a reduced set of dimensions.

Such multivariate techniques are exploratory; they essentially generate hypotheses rather than test them. On the other hand, if our goal is a formal hypothesis test, we need a technique that will (1) allow several variables to be tested while still preserving the significance level and (2) do this for any intercorrelation structure of the variables. Many such tests are available.


Multivariate analysis is concerned generally with two areas, descriptive and inferential statistics. In the descriptive realm, we often obtain optimal linear combinations of variables. The optimality criterion varies from one technique to another, depending on the goal in each case. Although linear combinations may seem too simple to reveal the underlying structure, we use them for two obvious reasons:
1. they have mathematical tractability (linear approximations are used throughout all of science for the same reason) and
2. they often perform well in practice.

These linear functions may also be useful as a follow-up to inferential procedures. When we have a statistically significant test result that compares several groups, for example, we can find the linear combination (or combinations) of variables that led to rejection of the hypothesis. Then the contribution of each variable to these linear combinations is of interest.

In the inferential area, many multivariate techniques are extensions of univariate procedures. In such cases, we review the univariate procedure before presenting the analogous multivariate approach.

Mean And Variance Of A Univariate Random Variable
Informally, a random variable may be defined as a variable whose value depends on the outcome of a chance experiment. Generally, we will consider only continuous random variables. Some types of multivariate data are only approximations to this ideal, such as test scores or a seven-point semantic differential (Likert) scale consisting of ordered responses ranging from strongly disagree to strongly agree.

The density function f(y) indicates the relative frequency of occurrence of the random variable y. Thus, if f(y₁) > f(y₂), then points in the neighborhood of y₁ are more likely to occur than points in the neighborhood of y₂.

The population mean of a random variable y is defined (informally) as the mean of all possible values of y and is denoted by μ. The mean is also referred to as the expected value of y, or E(y). If the density f(y) is known, the mean can sometimes be found using methods of calculus.

If f(y) is unknown, the population mean μ will ordinarily remain unknown unless it has been established from extensive past experience with a stable population. If a large random sample from the population represented by f(y) is available, it is highly probable that the mean of the sample is close to μ.


The sample mean of a random sample of n observations y₁, y₂, ..., yₙ is given by the ordinary arithmetic average

  $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$

Generally, ȳ will never be equal to μ; by this we mean that the probability is zero that a sample will ever arise in which ȳ is exactly equal to μ. However, ȳ is considered a good estimator for μ because E(ȳ) = μ and var(ȳ) = σ²/n, where σ² is the variance of y. In other words, ȳ is an unbiased estimator of μ and has a smaller variance than a single observation y.

If every y in the population is multiplied by a constant a, the expected value is also multiplied by a:

  $E(ay) = a\,E(y) = a\mu.$

The sample mean has a similar property: if zᵢ = a yᵢ for i = 1, 2, ..., n, then z̄ = a ȳ.

The variance of the population is defined as

  $\sigma^2 = \mathrm{var}(y) = E(y - \mu)^2.$

The sample variance is defined as

  $s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1},$

which can be shown to be equal to

  $s^2 = \frac{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}{n-1}.$
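These formulas are easy to verify numerically. The sketch below uses a small made-up sample (not data from the lecture's tables) and numpy; the values are illustrative only.

```python
import numpy as np

# Hypothetical sample (not from the lecture's tables)
y = np.array([12.0, 15.0, 9.0, 14.0, 11.0])
n = len(y)

ybar = y.sum() / n                                # sample mean  ȳ = (1/n) Σ y_i
s2 = ((y - ybar) ** 2).sum() / (n - 1)            # sample variance with divisor n - 1
s2_alt = (np.sum(y ** 2) - n * ybar ** 2) / (n - 1)  # equivalent computational form

print(ybar, s2, s2_alt)          # s2 and s2_alt agree
print(np.var(y, ddof=1))         # numpy's unbiased sample variance matches s2
```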

The sample variance is generally never equal to the population variance (the probability of such an occurrence is zero), but it is an unbiased estimator; that is,

  $E(s^2) = \sigma^2.$

If each y is multiplied by a constant a, the population variance becomes

  $\mathrm{var}(ay) = a^2\sigma^2.$

Similarly, if zᵢ = a yᵢ for i = 1, 2, ..., n, then the sample variance of z is given by

  $s_z^2 = a^2 s_y^2.$

Covariance And Correlation Of Bivariate Random Variables
Covariance
If two variables x and y are measured on each research unit (object or subject), we have a bivariate random variable (x, y). Often x and y will tend to co-vary; if one is above its mean, the other is more likely to be above its mean, and vice versa. For example, height and weight were observed for a sample of 20 college-age males. The data are given in Table 1.

Table 1. Height and Weight for a Sample of 20 College-age Males

The values of height x and weight y from Table 1 are both plotted in the vertical direction in Figure 1. The tendency for x and y to stay on the same side of the mean is clear in Figure 1.

Figure 1. Two variables with a tendency to covary.

To obtain the sample covariance for the height and weight data in Table 1, we first calculate the sample means x̄ and ȳ. The sample covariance is

  $s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}.$

Now we have s_xy = 128.88. By itself, the sample covariance 128.88 is not very meaningful. We are not sure whether this represents a small, moderate, or large amount of relationship between y and x. A method of standardizing the covariance is given next.
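Since the Table 1 data are not reproduced in these notes, the sketch below uses a small hypothetical bivariate sample to show how the sample covariance formula above would be computed in practice.

```python
import numpy as np

# Hypothetical heights and weights; not the actual Table 1 values
x = np.array([69.0, 74.0, 68.0, 70.0, 72.0])
y = np.array([153.0, 175.0, 155.0, 135.0, 172.0])
n = len(x)

# s_xy = Σ (x_i - x̄)(y_i - ȳ) / (n - 1)
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
print(s_xy)
print(np.cov(x, y, ddof=1)[0, 1])   # same value from numpy's 2 x 2 covariance matrix
```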


Correlation
Since the covariance depends on the scale of measurement of x and y, it is difficult to compare covariances between different pairs of variables. For example, if we change a measurement from inches to centimeters, the covariance will change.

To find a measure of linear relationship that is invariant to changes of scale, we can standardize the covariance by dividing by the standard deviations of the two variables. This standardized covariance is called a correlation.

The population correlation of two random variables x and y is

  $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{E[(x-\mu_x)(y-\mu_y)]}{\sqrt{E(x-\mu_x)^2}\,\sqrt{E(y-\mu_y)^2}},$

and the sample correlation is

  $r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}.$

We obtain the correlation for the height and weight data by first calculating the sample variance of x, s_x², and, similarly, the sample variance of y, s_y². Then, using s_xy = 128.88, we have

  $r_{xy} = \frac{s_{xy}}{s_x s_y}.$
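Continuing the hypothetical bivariate sample used above, the sample correlation is just the covariance divided by the two standard deviations; numpy's corrcoef returns the same standardized value.

```python
import numpy as np

x = np.array([69.0, 74.0, 68.0, 70.0, 72.0])      # hypothetical heights
y = np.array([153.0, 175.0, 155.0, 135.0, 172.0])  # hypothetical weights
n = len(x)

s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
s_x = x.std(ddof=1)    # sample standard deviation of x
s_y = y.std(ddof=1)    # sample standard deviation of y

r_xy = s_xy / (s_x * s_y)          # r = s_xy / (s_x s_y)
print(r_xy)
print(np.corrcoef(x, y)[0, 1])     # agrees with the standardized covariance
```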


MEAN VECTORS
Let y represent a random vector of p variables measured on a sampling unit (subject or object). If there are n individuals in the sample, the n observation vectors are denoted by y₁, y₂, ..., yₙ, where

  $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{ip})', \quad i = 1, 2, \ldots, n.$

The sample mean vector ȳ can be found either as the average of the n observation vectors or by calculating the average of each of the p variables separately:

  $\bar{\mathbf{y}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{y}_i = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_p)'.$

All n observation vectors y₁, y₂, ..., yₙ can be transposed to row vectors and listed in the data matrix Y as follows:

  $\mathbf{Y} = \begin{pmatrix} \mathbf{y}_1' \\ \mathbf{y}_2' \\ \vdots \\ \mathbf{y}_n' \end{pmatrix} = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{np} \end{pmatrix}.$

Since n is usually greater than p, the data are more conveniently tabulated in this way, with the observation vectors entered as rows rather than columns. Note that the first subscript i corresponds to units (subjects or objects) and the second subscript j refers to variables. This convention will be followed whenever possible.

In addition to the two ways of calculating ȳ given above, it may be obtained directly from Y. We sum the n entries in each column of Y and divide by n, which gives

  $\bar{\mathbf{y}}' = \frac{1}{n}\mathbf{j}'\mathbf{Y},$

where j is an n × 1 vector of 1s. This can be transposed to obtain

  $\bar{\mathbf{y}} = \frac{1}{n}\mathbf{Y}'\mathbf{j}.$

For example, the second element of ȳ is

  $\bar{y}_2 = \frac{1}{n}\sum_{i=1}^{n} y_{i2}.$
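A minimal sketch of the three equivalent ways of computing the sample mean vector, including the matrix form ȳ = (1/n)Y'j, using an arbitrary made-up 5 × 3 data matrix (rows = units, columns = variables), not the calcium data of Table 2.

```python
import numpy as np

# Hypothetical data matrix: n = 5 units (rows), p = 3 variables (columns)
Y = np.array([[3.0, 10.0, 5.5],
              [4.0, 12.0, 6.1],
              [2.5,  9.0, 5.0],
              [5.0, 14.0, 7.2],
              [3.5, 11.0, 5.8]])
n, p = Y.shape

ybar_cols = Y.mean(axis=0)                      # average each of the p columns separately
ybar_rows = sum(Y[i] for i in range(n)) / n     # average of the n observation vectors
j = np.ones(n)                                  # n x 1 vector of 1s
ybar_mat = (Y.T @ j) / n                        # matrix form  ȳ = (1/n) Y'j

print(ybar_cols, ybar_rows, ybar_mat)           # all three agree
```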


In populations, the mean of y over all possible values in the population is called the population mean vector or expected value of y. It is defined as a vector of expected values of each variable,

  $\boldsymbol{\mu} = E(\mathbf{y}) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix},$

where μⱼ is the population mean of the j-th variable.

It can be shown that the expected value of each ȳⱼ is μⱼ, so that ȳ is an unbiased estimator of μ:

  $E(\bar{\mathbf{y}}) = \boldsymbol{\mu}.$

We emphasize again that ȳ is never equal to μ.

Example 2
Table 2 gives partial data from Kramer and Jensen (1969a). Three variables were measured (in milliequivalents per 100 g) at 10 different locations in the South; the data are given in Table 2.

Table 2. Calcium in Soil and Turnip Greens

To find the mean vector ȳ, we simply calculate the average of each column of Table 2.

COVARIANCE MATRICES
The sample covariance matrix S = (s_jk) is the matrix of sample variances and covariances of the p variables:

  $\mathbf{S} = (s_{jk}) = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}.$


In S the sample variances of the p variables appear on the diagonal, and all possible pairwise sample covariances appear off the diagonal. The j-th row (column) contains the covariances of yⱼ with the other p − 1 variables.

We obtain S by simply calculating each individual element: the sample variance of the j-th variable is

  $s_{jj} = s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_{ij} - \bar{y}_j)^2,$

and the sample covariance of the j-th and k-th variables is

  $s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(y_{ij} - \bar{y}_j)(y_{ik} - \bar{y}_k).$

The sample covariance matrix S can also be expressed in terms of the observation vectors:

  $\mathbf{S} = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})' = \frac{1}{n-1}\left(\sum_{i=1}^{n}\mathbf{y}_i\mathbf{y}_i' - n\,\bar{\mathbf{y}}\bar{\mathbf{y}}'\right).$

We can also obtain S directly from the data matrix Y as

  $\mathbf{S} = \frac{1}{n-1}\left(\mathbf{Y}'\mathbf{Y} - n\,\bar{\mathbf{y}}\bar{\mathbf{y}}'\right) = \frac{1}{n-1}\,\mathbf{Y}'\left(\mathbf{I} - \frac{1}{n}\mathbf{J}\right)\mathbf{Y},$

where J = jj' is an n × n matrix of 1s.
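A short sketch (again with an arbitrary made-up Y, not the calcium data) checking that the observation-vector form and the data-matrix form of S agree with numpy's built-in covariance routine.

```python
import numpy as np

Y = np.array([[3.0, 10.0, 5.5],
              [4.0, 12.0, 6.1],
              [2.5,  9.0, 5.0],
              [5.0, 14.0, 7.2],
              [3.5, 11.0, 5.8]])
n, p = Y.shape
ybar = Y.mean(axis=0)

# S = (1/(n-1)) Σ (y_i - ȳ)(y_i - ȳ)'
S_sum = sum(np.outer(Y[i] - ybar, Y[i] - ybar) for i in range(n)) / (n - 1)

# S = (1/(n-1)) (Y'Y - n ȳ ȳ')
S_mat = (Y.T @ Y - n * np.outer(ybar, ybar)) / (n - 1)

print(np.allclose(S_sum, S_mat))                               # both forms agree
print(np.allclose(S_mat, np.cov(Y, rowvar=False, ddof=1)))      # matches numpy
```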

If y is a random vector taking on any possible value in a multivariate population, the population covariance matrix is defined as

  $\boldsymbol{\Sigma} = \mathrm{cov}(\mathbf{y}) = E[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})'].$

The population covariance matrix can also be found as

  $\boldsymbol{\Sigma} = E(\mathbf{y}\mathbf{y}') - \boldsymbol{\mu}\boldsymbol{\mu}'.$

The sample covariance matrix S is an unbiased estimator of the population covariance matrix, i.e.,

  $E(\mathbf{S}) = \boldsymbol{\Sigma}.$


To calculate the sample covariance matrix for the calcium data of Table 2 using these computational forms, we need the sum of squares of each column and the sum of products of each pair of columns; for instance, the computation of s₁₃ uses the sums involving the first and third columns. Continuing in this fashion, we obtain the full matrix S.

CORRELATION MATRICES
The sample correlation between the j-th and k-th variables is defined as

  $r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}} = \frac{s_{jk}}{s_j s_k}.$

The sample correlation matrix is analogous to the covariance matrix, with correlations in place of covariances:

  $\mathbf{R} = (r_{jk}) = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}.$

The second row, for example, contains the correlation of y₂ with each of the y's (including the correlation of y₂ with itself, which is 1).

The correlation matrix can be obtained from the covariance matrix, and vice versa. Define the diagonal matrix

  $\mathbf{D}_s = \mathrm{diag}(\sqrt{s_{11}}, \sqrt{s_{22}}, \ldots, \sqrt{s_{pp}}) = \mathrm{diag}(s_1, s_2, \ldots, s_p).$

Then, by use of the matrix approach,

  $\mathbf{R} = \mathbf{D}_s^{-1}\,\mathbf{S}\,\mathbf{D}_s^{-1} \quad\text{and}\quad \mathbf{S} = \mathbf{D}_s\,\mathbf{R}\,\mathbf{D}_s.$

The analogous population correlation matrix is defined as

  $\mathbf{P}_\rho = (\rho_{jk}) = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix}.$
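A minimal sketch of the D_s⁻¹ S D_s⁻¹ conversion and its inverse, using an arbitrary covariance matrix for illustration.

```python
import numpy as np

def corr_from_cov(S):
    """R = D^-1 S D^-1, where D holds the sample standard deviations on its diagonal."""
    D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
    return D_inv @ S @ D_inv

def cov_from_corr(R, variances):
    """Recover S = D R D from the correlation matrix and the vector of variances."""
    D = np.diag(np.sqrt(variances))
    return D @ R @ D

# Arbitrary illustrative covariance matrix (not from the lecture's examples)
S = np.array([[4.0, 2.0, 1.0],
              [2.0, 9.0, 3.0],
              [1.0, 3.0, 16.0]])
R = corr_from_cov(S)
print(R)                                               # unit diagonal, correlations elsewhere
print(np.allclose(cov_from_corr(R, np.diag(S)), S))    # round trip recovers S
```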


To obtain the sample correlation matrix R for Example 2, we can either calculate the individual elements r_jk or use the direct matrix operation R = D_s⁻¹ S D_s⁻¹ with the diagonal matrix D_s of sample standard deviations. Note that .865 > .493 > .327, which is a different order than that of the covariances in S.

Mean Vectors And Covariance Matrices For Subsets Of Variables
Two Subsets
Sometimes a researcher is interested in two different kinds of variables, both measured on the same sampling unit. For example, several classroom behaviors are observed for students, and during the same time period (the basic experimental unit) several lecturer behaviors are also observed. The researcher wishes to study the relationships between the student variables and the lecturer variables.

We will denote the two sub-vectors by y and x, with p variables in y and q variables in x. Thus each observation vector in a sample is partitioned as

  $\begin{pmatrix} \mathbf{y}_i \\ \mathbf{x}_i \end{pmatrix}, \quad i = 1, 2, \ldots, n.$

Hence there are p + q variables in each of the n observation vectors.

For the sample of n observation vectors, the partitioned mean vector and covariance matrix have the form

  $\begin{pmatrix} \bar{\mathbf{y}} \\ \bar{\mathbf{x}} \end{pmatrix}, \qquad \mathbf{S} = \begin{pmatrix} \mathbf{S}_{yy} & \mathbf{S}_{yx} \\ \mathbf{S}_{xy} & \mathbf{S}_{xx} \end{pmatrix},$

where S_yy is the p × p covariance matrix of the y's, S_xx is the q × q covariance matrix of the x's, and S_yx = S'_xy is the p × q matrix of covariances between the y's and the x's. To illustrate, let p = 2 and q = 3; then the partitioned mean vector has 2 + 3 = 5 elements, S is a 5 × 5 matrix partitioned into the blocks above, and we could also write each block out in terms of its individual variances and covariances.
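A sketch of the two-subset partition with p = 2 and q = 3, using an arbitrary random data matrix whose first two columns play the role of y and last three the role of x (hypothetical data, for illustration only).

```python
import numpy as np

p, q, n = 2, 3, 8
rng = np.random.default_rng(1)
data = rng.normal(size=(n, p + q))    # columns 0..1 are the y's, columns 2..4 are the x's

mean_vec = data.mean(axis=0)          # partitioned mean vector (ȳ', x̄')'
S = np.cov(data, rowvar=False, ddof=1)

S_yy = S[:p, :p]   # p x p covariance matrix of the y variables
S_yx = S[:p, p:]   # p x q covariances between the y's and the x's
S_xy = S[p:, :p]   # q x p block, equal to the transpose of S_yx
S_xx = S[p:, p:]   # q x q covariance matrix of the x variables

print(mean_vec[:p], mean_vec[p:])     # ȳ and x̄
print(np.allclose(S_xy, S_yx.T))      # S_xy = S'_yx
```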


Example 3
The data in Table 3 contain measurements of five variables in a comparison of normal patients and diabetic patients in an experimental setup. We give partial data for normal patients only. Three of the five variables were of major interest.


Table 3. Relative Weight, Blood Glucose, and Insulin Levels


The two additional variables of minor interest were also measured on each patient. The mean vector, partitioned accordingly, is of the form

  $\begin{pmatrix} \bar{\mathbf{y}} \\ \bar{\mathbf{x}} \end{pmatrix},$

with ȳ containing the three variables of major interest and x̄ the two of minor interest.


The covariance matrix, partitioned in the same way, is of the form

  $\mathbf{S} = \begin{pmatrix} \mathbf{S}_{yy} & \mathbf{S}_{yx} \\ \mathbf{S}_{xy} & \mathbf{S}_{xx} \end{pmatrix}.$

Three or More Subsets
In some cases, three or more subsets of variables are of interest. If the observation vector y is partitioned as

  $\mathbf{y} = \begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_k \end{pmatrix},$

where y₁ has p₁ variables, y₂ has p₂, ..., y_k has p_k, with p = p₁ + p₂ + ··· + p_k, then the sample mean vector and covariance matrix are given by

  $\bar{\mathbf{y}} = \begin{pmatrix} \bar{\mathbf{y}}_1 \\ \bar{\mathbf{y}}_2 \\ \vdots \\ \bar{\mathbf{y}}_k \end{pmatrix}, \qquad \mathbf{S} = \begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} & \cdots & \mathbf{S}_{1k} \\ \mathbf{S}_{21} & \mathbf{S}_{22} & \cdots & \mathbf{S}_{2k} \\ \vdots & \vdots & & \vdots \\ \mathbf{S}_{k1} & \mathbf{S}_{k2} & \cdots & \mathbf{S}_{kk} \end{pmatrix}.$

The corresponding population results are

  $\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \\ \vdots \\ \boldsymbol{\mu}_k \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} & \cdots & \boldsymbol{\Sigma}_{1k} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} & \cdots & \boldsymbol{\Sigma}_{2k} \\ \vdots & \vdots & & \vdots \\ \boldsymbol{\Sigma}_{k1} & \boldsymbol{\Sigma}_{k2} & \cdots & \boldsymbol{\Sigma}_{kk} \end{pmatrix}.$
