Summary Papers- Applied Multivariate Statistical Modeling- Multivariate Descriptive Statistics-Chapter Six
Author: Mohammed R. Dahman. Ph.D. in MIS ED.10.18
License: CC-By Attribution 4.0 International
Citation: Dahman, M. R. (2018, October 31). AMSM- Multivariate Descriptive Statistics- Chapter Six. https://doi.org/10.31219/osf.io/pcdxb

1. Preface
This is the first chapter summary in the domain of multivariate analysis. We introduced the concept of multivariate analysis through the data matrix and the variable vector, and the transition from the univariate measures of central tendency and dispersion to their multivariate counterparts. This was followed by the definition and calculation of the mean vector. Furthermore, we discussed the steps for finding the covariance and correlation matrices between variables. Hands-on practice was the last section of this chapter: we worked through a few examples in R to cover what we have learned to this point.

2. Introduction
In chapter two (Dahman, 2018a, p. 3) we discussed, in the univariate domain, the central tendency (mean, median, and mode), as well as the spread or dispersion (variance and standard deviation). In the multivariate domain, the measurements have to change: for the central tendency we will measure a mean vector, and for the spread or dispersion we will measure a covariance matrix and a correlation matrix. Before that, let me introduce the vector representation of the data.

Assume we have identified a problem and named (p) variables to understand its nature: I can call my variables X1, X2, .., Xj, .., Xp. On each variable I can have (n) observations. Row-wise, I have a vector that represents a particular observation; column-wise, I have a vector that represents a particular variable. See the table below.

Vector of variables (px1), before I collect the data:

X = [X1, X2, .., Xj, .., Xp]^T

My data matrix (nxp): the column titles X1, .., Xp are the variables, and for each variable I have n observations. For example, variable X1 has the observations X11, X21, .., Xn1; likewise, variable Xp has the observations X1p, X2p, .., Xnp:

              X1   X2   ..  Xj   ..  Xp
         1   X11  X12  ..  X1j  ..  X1p
         2   X21  X22  ..  X2j  ..  X2p
        ..    ..   ..  ..   ..  ..   ..
X(nxp) = i   Xi1  Xi2  ..  Xij  ..  Xip
        ..    ..   ..  ..   ..  ..   ..
         n   Xn1  Xn2  ..  Xnj  ..  Xnp

Row-wise, Xi = [Xi1, Xi2, .., Xij, .., Xip]^T extracts the vector of a particular observation; column-wise, Xj = [X1j, X2j, .., Xnj]^T extracts the vector of a particular variable. Notice that, at this stage, all the entries are unknown.

3. Mean Vector
Once I start collecting the data, I will have exactly the same picture as in the table above. The only striking difference is that all the entries in my data matrix will be known, i.e. filled with known variable values (the population values). In other words, if my X(nxp) is the population,

then, if I calculate the mean of X1, X2, .., Xp, I will have a vector of the expected value of each variable (i.e. the mean parameter). See the figure below:

μ = [μ1, μ2, .., μj, .., μp]^T = [E(X1), E(X2), .., E(Xj), .., E(Xp)]^T   (px1)

See, I have the mean parameter μ1 for variable one, X1, and so on till μp for variable Xp. Notice that E(X) = Σ_all x  x·f(x) for the discrete case, and E(X) = ∫_{−∞}^{+∞} x·f(x) dx for the continuous case.
However, if my X(nxp) is a sample, which represents the population, then, if I calculate the means x̄1, x̄2, .., x̄p, I will have a vector of the estimated mean of each variable (i.e. the mean statistic). See the figure below:

x̄ = [x̄1, x̄2, .., x̄j, .., x̄p]^T   (px1),   where   x̄j = (1/n) Σ_{i=1}^{n} x_ij

Notice that, from the univariate perspective, we know that x̄ = μ̂ = (1/n) Σ_{i=1}^{n} x_i; just apply this formula to each variable you have.

However, we don't want to perform a separate calculation of the mean statistic for each variable; we want a single matrix calculation. See the steps from a matrix point of view:

1. Look at the matrix below (the sample data matrix), which is (nxp). What we want is to find x̄ for each variable. Note that x̄ is a mean vector (px1), so we need to get from (nxp) to (px1).
2. Transpose the data matrix (nxp) (i.e. (nxp)^T = (pxn)).
3. Create a vector (nx1) of ONEs.
4. Multiply (nxp)^T by (nx1); the result is (px1).
5. Finally, multiply the resulting (px1) vector by 1/n.

x̄ = (1/n) · (X^T · 1)

Here X(nxp) is the data matrix, X^T is its transpose (pxn), 1 is the (nx1) vector of ONEs, and the product X^T · 1, scaled by 1/n, is the (px1) mean vector x̄ = [x̄1, x̄2, .., x̄j, .., x̄p]^T.

If you are not familiar with matrix multiplication, you may refer to the summary papers I wrote on Advanced Matrix Theory & Linear Algebra (Dahman, 2016). See, for example, x̄1: the product gives (X11 * 1) + (X21 * 1) + .. + (Xn1 * 1), and the final value of this calculation is multiplied by 1/n. In fact, what we have done is simply follow (1/n) Σ_{i=1}^{n} x_i: first the summation terms, and finally the value multiplied by 1/n. To conclude, we have used matrix manipulation instead of a series of single calculations, which offers us more productivity and a less expensive calculation process.
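The five steps above can be sketched in R. The 4x2 data matrix below is made up purely for illustration; it is not from the dataset used in the hands-on section:

```r
# Minimal sketch of the matrix-based mean vector on a made-up 4x2 data
# matrix (n = 4 observations, p = 2 variables).
X <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40), nrow = 4, ncol = 2)
n <- nrow(X)
ones <- rep(1, n)                 # step 3: (nx1) vector of ONEs
xbar <- (1/n) * (t(X) %*% ones)   # steps 2, 4, 5: (pxn) %*% (nx1) -> (px1)
xbar                              # same result as colMeans(X): 2.5 and 25
stopifnot(all.equal(as.vector(xbar), colMeans(X)))
```

The final stopifnot simply confirms that the matrix route agrees with R's built-in column means.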

4. Covariance Matrix
First, let’s start from population point of view (i.e. my 𝑿𝒏𝒙𝒑 is the population). That means, a population
covariance matrix. Let me remind you, from a univariate perspective, what is variance of variable (𝒙𝒋 ).
it’s 𝑽(𝒙𝒋 ) = 𝑬(𝒙𝒋 − 𝝁𝒋 )𝟐 = 𝝈𝒋 𝟐 . In 𝑿𝒏𝒙𝒑 I have 𝑿𝟏 , 𝑿𝟐 , . . , 𝑿𝒋 , . . 𝑿𝒑 variables. That means, 𝑿𝟏 is 𝝈𝟐𝟏 . Similarly, 𝑿𝟐
is 𝝈𝟐𝟐 all to the way till 𝑿𝒑 is 𝝈𝟐𝒑 . The question is, what is the relationship between 𝑿𝟏 , and 𝑿𝟐 ? In a general
form 𝑿𝒋 , and 𝑿𝒌 ? This relationship is known as the “covariance” between two variables. How to find the
covariance between two variables? It’s 𝒄𝒐𝒗(𝒙𝒋 , 𝒙𝒌 ) = 𝑬(𝒙𝒋 − 𝝁𝒋 )(𝒙𝒌 − 𝝁𝒌 ) = 𝝈𝒋𝒌 𝟐 . Let me show you now
the pattern:

Finding the variance of a variable:  V(xj) = E(xj − μj)² = E[(xj − μj)(xj − μj)] = σj² = σjj

Finding the covariance between two variables:  cov(xj, xk) = E(xj − μj)(xk − μk) = σjk
Putting all that together, you will find that the variance scenario differs from the mean one: in the multivariate case the mean is a vector, whereas the variance becomes a matrix. That makes sense if you understand σjj and σjk. Now, if I have (p) variables, I will have a (pxp) matrix, where the diagonal holds the variance of the corresponding variable, and the entries above and below the diagonal hold the covariance between the corresponding pair of variables. See the table below: σ11 is the variance of X1, and σ12 is the covariance between X1 and X2. Note: (1) this matrix is symmetric; (2) one version is for a population data matrix (Σ, with entries σ), the other for a sample data matrix (S, with entries s).
Population covariance matrix (pxp):

      σ11  σ12  ..  σ1j  ..  σ1p
      σ12  σ22  ..  σ2j  ..  σ2p
Σ =    ..   ..  ..   ..  ..   ..
      σ1j  σ2j  ..  σjj  ..  σjp
       ..   ..  ..   ..  ..   ..
      σ1p  σ2p  ..  σjp  ..  σpp

Sample covariance matrix (pxp):

      s11  s12  ..  s1j  ..  s1p
      s12  s22  ..  s2j  ..  s2p
S =    ..   ..  ..   ..  ..   ..
      s1j  s2j  ..  sjj  ..  sjp
       ..   ..  ..   ..  ..   ..
      s1p  s2p  ..  sjp  ..  spp

Now, in case my data matrix is a sample (the S version above), the findings are:

Finding sjj:   sjj = (1/(n−1)) Σ_{i=1}^{n} (xij − x̄j)² = (1/(n−1)) Σ_{i=1}^{n} (xij − x̄j)(xij − x̄j)

Finding sjk:   sjk = (1/(n−1)) Σ_{i=1}^{n} (xij − x̄j)(xik − x̄k)
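As a quick check, these two formulas match R's built-in var and cov on any small sample. The two vectors below are made up for illustration:

```r
# Hand-computed sample variance and covariance vs. R's built-ins,
# on two made-up vectors of n = 4 observations.
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 5)
n <- length(x)
sjj <- sum((x - mean(x))^2) / (n - 1)                 # formula for s_jj
sjk <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)   # formula for s_jk
stopifnot(all.equal(sjj, var(x)), all.equal(sjk, cov(x, y)))
```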
Continuing: again, as we aimed in the calculation of the mean vector, instead of a series of single calculations we will use a matrix calculation to find the variance-covariance matrix. The steps are as follows:

1. First, from the data matrix (nxp), we calculate x̄j for all the variables.
2. Then we create a new value by subtracting the corresponding mean statistic from each value: x*ij = xij − x̄j.
3. That leads to a new matrix named x*(nxp).
4. Take the transpose of x*(nxp) to get (x*)^T, of dimension (pxn).
5. Multiply (x*)^T by x*(nxp) to get a new (pxp) matrix.
6. Multiply that (pxp) matrix by 1/(n−1); the result is the sample covariance matrix S(pxp), consistent with the formulas for sjj and sjk above.

From the data matrix X(nxp), with column means x̄1, x̄2, .., x̄j, .., x̄p, we build the centered matrix x*(nxp) with entries x*ij = xij − x̄j. Then

(1/(n−1)) · (x*)^T · x* = S(pxp),

the sample covariance matrix, with the variances sjj on its diagonal and the covariances sjk off the diagonal.

Again, if you are not familiar with matrix multiplication, you may refer to (Dahman, 2016) and find the summary papers on Advanced Matrix Theory & Linear Algebra.
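Steps 1-6 can be sketched in R, where scale() performs the centering of steps 1-3 in one call. The 4x2 data matrix below is made up for illustration:

```r
# Covariance matrix via the centered cross-product, on a made-up 4x2 matrix.
X <- matrix(c(1, 2, 3, 4,
              2, 1, 4, 3), nrow = 4)
n  <- nrow(X)
Xc <- scale(X, center = TRUE, scale = FALSE)  # steps 1-3: x*_ij = x_ij - xbar_j
S  <- (t(Xc) %*% Xc) / (n - 1)                # steps 4-6: (pxn)(nxp) * 1/(n-1)
stopifnot(all.equal(S, cov(X), check.attributes = FALSE))
```

The assertion confirms the matrix route reproduces R's built-in cov().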

5. Correlation Matrix
Having understood the concept and calculation of the mean vector and covariance matrix, you are ready to understand the meaning of the correlation matrix. The symbol for correlation is ρ, pronounced Rho (/roʊ/). This matrix is (pxp). Notice that the diagonal is (1), the matrix is symmetric, and the entries above or below the diagonal represent the correlation between two variables. The correlation between two variables is corr(xj, xk) = cov(xj, xk) / (σj · σk); note: σ is the standard deviation. Now see the matrix:

       1   ρ12  ..  ρ1j  ..  ρ1p
      ρ12   1   ..  ρ2j  ..  ρ2p
ρ =    ..   ..  ..   ..  ..   ..     (pxp)
      ρ1j  ρ2j  ..   1   ..  ρjp
       ..   ..  ..   ..  ..   ..
      ρ1p  ρ2p  ..  ρjp  ..   1

Suppose I want to find the correlation between Xj and Xk: ρjk = σjk / (σj σk). That is why the diagonal is "1": when the parameter corresponds to itself, ρjj = σjj / (σj σj) = σj² / σj² = 1. Notice that ρjk = 1 indicates a strong positive correlation, ρjk = −1 indicates a strong negative correlation, and ρjk = 0 indicates that there is no relation between j and k.

Please notice: ρ represents the population parameter, while r represents the sample statistic. Now, how do we calculate the correlation matrix? The steps are:

1. First, from the data matrix (nxp), we calculate x̄j and sjj for all the variables.
2. Then we create a new value by subtracting the corresponding mean statistic from each value and dividing by √sjj: x̃ij = (xij − x̄j) / √sjj.
3. That leads to a new matrix named x̃(nxp).
4. Take the transpose of x̃(nxp) to get (x̃)^T.
5. Multiply (x̃)^T by x̃(nxp) and the result by 1/(n−1) to get a new matrix R(pxp), consistent with the sample formulas above.
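The steps above can be sketched in R; scale() with its defaults does both the centering and the division by √sjj. The data matrix below is made up for illustration:

```r
# Correlation matrix via the standardized cross-product, on a made-up 4x2 matrix.
X <- matrix(c(1, 2, 3, 4,
              2, 1, 4, 3), nrow = 4)
n <- nrow(X)
Z <- scale(X)                 # steps 1-3: (x_ij - xbar_j) / sqrt(s_jj)
R <- (t(Z) %*% Z) / (n - 1)   # steps 4-5: diagonal is exactly 1
stopifnot(all.equal(R, cor(X), check.attributes = FALSE),
          all.equal(diag(R), rep(1, ncol(X))))
```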

6. DSR
So far, we are able to build three (pxp) matrices: the sample covariance matrix S = [sjk], with the variances sjj on its diagonal; the sample correlation matrix R = [rjk], with 1 on its diagonal; and the diagonal matrix D = diag(s11, s22, .., sjj, .., spp), which carries the variances on its diagonal and 0 everywhere else.

You will find that R = D^(−1/2) · S · D^(−1/2), and S = D^(1/2) · R · D^(1/2).
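These two identities are easy to verify numerically in R with diag(), since D^(1/2) and D^(−1/2) are just the diagonal matrices of standard deviations and their reciprocals. The data matrix below is made up:

```r
# Verifying R = D^(-1/2) S D^(-1/2) and S = D^(1/2) R D^(1/2) numerically.
X <- matrix(c(1, 2, 3, 5,
              2, 1, 4, 3), nrow = 4)
S <- cov(X)
Dneg <- diag(1 / sqrt(diag(S)))   # D^(-1/2): reciprocals of std deviations
Dpos <- diag(sqrt(diag(S)))       # D^(1/2):  std deviations
R <- Dneg %*% S %*% Dneg
stopifnot(all.equal(R, cor(X), check.attributes = FALSE),
          all.equal(Dpos %*% R %*% Dpos, S, check.attributes = FALSE))
```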

7. Sum Square and Cross Product Matrix (SSCP)


Please keep in mind this:

(n − 1)·S = (X*)^T X*
(n − 1)·R = (X̃)^T X̃

And you should know by now that X*, as well as X̃, are transformed matrices from the original data matrix (X* is the centered matrix, X̃ the centered-and-scaled one). Now, what is SSCP? In a nutshell, it is these cross-products: (X*)^T X*, (X̃)^T X̃, and (X)^T X. Keep in mind these three products.
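A quick numerical check of the first two products, taking X* as the centered matrix and X̃ as the centered-and-scaled one (data made up):

```r
# SSCP products: (X*)'X* = (n-1)S and (Xtilde)'Xtilde = (n-1)R.
X <- matrix(c(1, 2, 3, 4,
              2, 1, 4, 3), nrow = 4)
n      <- nrow(X)
Xstar  <- scale(X, center = TRUE, scale = FALSE)  # centered only
Xtilde <- scale(X)                                # centered and scaled
stopifnot(all.equal(t(Xstar) %*% Xstar, (n - 1) * cov(X),
                    check.attributes = FALSE),
          all.equal(t(Xtilde) %*% Xtilde, (n - 1) * cor(X),
                    check.attributes = FALSE))
```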

8. Hands on Practice
1. Mean Vector: I will use a simple example to find the mean vector using R; you can use MATLAB, Excel, or any other available software package.

Import DS from (Dahman, 2018b):
ds<-read.csv("https://mfr.ca-1.osf.io/render?url=https://osf.io/k3v2r/?action=download%26mode=render",header=TRUE,sep=",")

Create a vector of ONEs:
ones<-rep(1,308)

Transpose the DS:
dstrans<-t(ds)

Multiply the transposed matrix by the ones vector:
meanvect<-dstrans%*%ones

Find the final mean vector:
dsmeanvector<-(1/308)*meanvect


The result is a mean vector of the six variables.

2. Covariance Matrix: I will use the same dataset to find the covariance matrix using R; you can use MATLAB, Excel, or any other available software package.

Import DS from (Dahman, 2018b):
ds<-read.csv("https://mfr.ca-1.osf.io/render?url=https://osf.io/k3v2r/?action=download%26mode=render",header=TRUE,sep=",")

Use the cov function:
cov(ds)

The result is a covariance matrix. See that the diagonal is the variance of the corresponding variable, and the entries above and below it are the covariances between two variables:

                diameter      depth         length        weight        curve         Resistance
diameter    5.424243e-04 -2.729731e-04  4.334051e-03 -5.007373e-04 -8.277195e-24 -0.010087410
depth      -2.729731e-04  6.403787e-02  5.227447e-02  4.244711e-02  6.621756e-23 -0.011384202
length      4.334051e-03  5.227447e-02  3.005156e-01 -5.169159e-02  3.090153e-22 -0.103230456
weight     -5.007373e-04  4.244711e-02 -5.169159e-02  6.150320e-02  1.158807e-22 -0.003855537
curve      -8.277195e-24  6.621756e-23  3.090153e-22  1.158807e-22  1.018933e-02  1.239711319
Resistance -1.008741e-02 -1.138420e-02 -1.032305e-01 -3.855537e-03  1.239711e+00  229.840460133

3. Correlation Matrix: I will use the same dataset to find the correlation matrix using R; you can use MATLAB, Excel, or any other available software package.

Import DS from (Dahman, 2018b):
ds<-read.csv("https://mfr.ca-1.osf.io/render?url=https://osf.io/k3v2r/?action=download%26mode=render",header=TRUE,sep=",")

Use the cor function:
cor(ds)

The result is a correlation matrix. See that the diagonal is ONE, and the entries above and below it are the correlations between two variables:

                diameter      depth         length        weight        curve         Resistance
diameter    1.000000e+00 -4.631607e-02  3.394618e-01 -8.669450e-02 -3.520795e-21 -0.028569120
depth      -4.631607e-02  1.000000e+00  3.768233e-01  6.763646e-01  2.592280e-21 -0.002967365
length      3.394618e-01  3.768233e-01  1.000000e+00 -3.802223e-01  5.584362e-21 -0.012421130
weight     -8.669450e-02  6.763646e-01 -3.802223e-01  1.000000e+00  4.629025e-21 -0.001025470
curve      -3.520795e-21  2.592280e-21  5.584362e-21  4.629025e-21  1.000000e+00  0.810092224
Resistance -2.856912e-02 -2.967365e-03 -1.242113e-02 -1.025470e-03  8.100922e-01  1.000000000

Observe that some of the correlations practically do not exist. For example, the correlation between depth and diameter is -4.631×10^-2 = -0.04631; that means there is essentially no correlation. You can plot this to visualize it:

plot(ds$diameter,ds$depth)


• Dahman, M. R. (2016). AoDS Solution. Retrieved from https://bizmodule.blogspot.com/
• Dahman, M. R. (2018a). AMSM- Univariate Descriptive Statistics- Chapter Two. OSF Preprints. https://doi.org/10.31219/OSF.IO/THD3C
• Dahman, M. R. (2018b). Applied Multivariate Statistical Modeling. OSF. https://doi.org/10.17605/OSF.IO/3X76A
