AMSM-Multivariate Descriptive Statistics - Chapter Six: October 2018
1. Preface
This is the first chapter summary in the domain of multivariate analysis. We introduce the concept of multivariate analysis through the data matrix and the variable vector, and the transition from the univariate measures of central tendency and dispersion to the multivariate dimension. This is followed by the definition and calculation of the mean vector. Furthermore, we discuss the steps for finding the covariance and correlation matrices between variables. Hands-on practice is the last section of this chapter: we work through a few examples in R to cover what we have learned up to this point.
2. Introduction
In chapter two (Dahman, 2018a, p. 3) we discussed, in a univariate domain, the central tendency (mean, median, and mode), as well as the spread or dispersion (variance and standard deviation). In a multivariate domain, the measurements have to change: for the central tendency we will measure a mean vector, and for the spread or dispersion we will measure a covariance matrix and a correlation matrix. Before that, let me introduce the vector representation.
Assume we have identified a problem, and we have named (p) variables to understand the nature of the problem. I can name my variables X1, X2, .., Xj, .., Xp. On each variable I can have (n) observations. Row-wise, I have a vector that represents a particular observation; column-wise, I have a vector that represents a particular variable. See the table below.
The variable vector (p x 1):   X = [X1, X2, .., Xj, .., Xp]^T

The data matrix X (n x p):

        X1    X2   ..   Xj   ..   Xp
  1    X11   X12   ..  X1j   ..  X1p
  2    X21   X22   ..  X2j   ..  X2p
  ..    ..    ..   ..   ..   ..   ..
  i    Xi1   Xi2   ..  Xij   ..  Xip
  ..    ..    ..   ..   ..   ..   ..
  n    Xn1   Xn2   ..  Xnj   ..  Xnp

The i-th observation (row) vector (p x 1):   Xi = [Xi1, Xi2, .., Xij, .., Xip]^T
The j-th variable (column) vector (n x 1):   Xj = [X1j, X2j, .., Xij, .., Xnj]^T
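The layout above can be tried directly in R. This is a minimal sketch on a tiny made-up 4x3 data matrix (the values are arbitrary, chosen only to illustrate the row/column vector extraction):

```r
# A made-up data matrix with n = 4 observations and p = 3 variables
X <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 9,
              2, 4, 6),
            nrow = 4, byrow = TRUE,
            dimnames = list(1:4, c("X1", "X2", "X3")))
x_obs2 <- X[2, ]   # row-wise: the 2nd observation vector (length p)
x_var3 <- X[, 3]   # column-wise: the 3rd variable vector (length n)
```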
3. Mean Vector
Once I start collecting the data, I will have exactly the picture in the table above; the only striking difference is that all the entries in my data matrix will be known. Now suppose your data matrix is filled with known variable values (the population values). In other words, suppose my X(nxp) is the population.
Summary Papers- Applied Multivariate Statistical Modeling- Multivariate Descriptive Statistics-Chapter Six
Author: Mohammed R. Dahman. Ph.D. in MIS ED.10.18
License: CC-By Attribution 4.0 International
Citation: Dahman, M. R. (2018, October 31). AMSM- Multivariate Descriptive Statistics- Chapter Six. https://doi.org/10.31219/osf.io/pcdxb
Then, if I calculate the mean of X1, X2, .., Xp, I will have a vector of the expected value of each variable (i.e. the mean parameter). See the figure below.

  mu = [mu1, mu2, .., muj, .., mup]^T = [E(X1), E(X2), .., E(Xj), .., E(Xp)]^T   (p x 1)

See, I have the mean parameter mu1 for variable one, X1, and so on up to mup for variable Xp. Notice that E(X) = ∑_{all x} x f(x) for the discrete case, and E(X) = ∫_{-∞}^{+∞} x f(x) dx for the continuous case.
However, if my X(nxp) is a sample, which represents the population, then, if I calculate the means of X1, X2, .., Xp, I will have a vector of the estimated expected value of each variable (i.e. the mean statistic). See the figure below.

  x̄ = [x̄1, x̄2, .., x̄j, .., x̄p]^T,  where  x̄j = (1/n) ∑_{i=1}^{n} x_ij   (p x 1)

Notice that from a univariate perspective we know that x̄ = mû = (1/n) ∑_{i=1}^{n} x_i. Just apply this formula for each variable you have.
However, we don't want to perform a single calculation of the mean statistic for each variable separately. We want to do a matrix calculation. See the steps from a matrix point of view:
Take the transpose X^T (p x n) of the data matrix X (n x p), so that rows are now the variables X1, .., Xp and columns are the observations 1, .., n. Multiply X^T by the (n x 1) column vector of ones, 1 = [1, 1, .., 1]^T, and scale by 1/n:

  x̄ = (1/n) X^T 1 = (1/n) [∑_i x_i1, ∑_i x_i2, .., ∑_i x_ip]^T = [x̄1, x̄2, .., x̄p]^T   (p x 1)
If you are not familiar with matrix multiplication you may refer to the summary papers I wrote on Advanced Matrix Theory & Linear Algebra (Dahman, 2016). See, for the first entry, x̄1 = (X11 * 1) + (X21 * 1) + .. + (Xn1 * 1), with the final value of this calculation multiplied by 1/n. In fact, what we have done is simply follow (1/n) ∑_{i=1}^{n} x_i: first the summation terms, and finally the multiplication by 1/n. To conclude, we have used matrix manipulation instead of a series of single calculations, which offers us more productivity and a less expensive calculation process.
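The matrix formulation above can be sketched in R. This is a minimal example on a made-up 4x3 data matrix (arbitrary values), checking the product against the built-in colMeans():

```r
# Mean vector as a matrix product: x_bar = (1/n) * t(X) %*% ones
X <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 9,
              2, 4, 6), nrow = 4, byrow = TRUE)
n <- nrow(X)
ones <- matrix(1, nrow = n, ncol = 1)    # the n x 1 vector of ones
x_bar <- (1 / n) * t(X) %*% ones         # p x 1 mean vector
colMeans(X)                              # should agree with x_bar
```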
4. Covariance Matrix
First, let's start from the population point of view (i.e. my X(nxp) is the population); that means a population covariance matrix. Let me remind you, from a univariate perspective, of the variance of a variable x_j: it's V(x_j) = E(x_j - mu_j)^2 = sigma_j^2. In X(nxp) I have the variables X1, X2, .., Xj, .., Xp. That means the variance of X1 is sigma_1^2, similarly X2 has sigma_2^2, all the way to Xp with sigma_p^2. The question is, what is the relationship between X1 and X2, or in general form between Xj and Xk? This relationship is known as the "covariance" between two variables. How to find the covariance between two variables? It's cov(x_j, x_k) = E(x_j - mu_j)(x_k - mu_k) = sigma_jk. Note that sigma_jj = sigma_j^2, so the variances sit on the diagonal. Let me show you now the pattern:

        [ sigma_1^2  sigma_12   ..  sigma_1p  ]
        [ sigma_12   sigma_2^2  ..  sigma_2p  ]
  Sigma = [   ..         ..      ..     ..    ]
        [ sigma_1p   sigma_2p   ..  sigma_p^2 ]  (p x p)
Now, in case my data matrix is a sample, the corresponding sample statistics are:

Finding s_jj:   s_jj = (1/(n-1)) ∑_{i=1}^{n} (x_ij - x̄_j)^2 = (1/(n-1)) ∑_{i=1}^{n} (x_ij - x̄_j)(x_ij - x̄_j)

Finding s_jk:   s_jk = (1/(n-1)) ∑_{i=1}^{n} (x_ij - x̄_j)(x_ik - x̄_k)
Continuing: again, as we aimed in the calculation of the mean vector, instead of a series of single calculations we will use a matrix calculation to find the variance-covariance matrix. The steps are as follows:
1. First, from the data matrix (n x p), we calculate x̄ for all the variables.
2. Then we create a new value by subtracting from each entry its corresponding mean statistic: x*_ij = x_ij - x̄_j.
3. That will lead to a new matrix, named x* (n x p).
4. Take the transpose of x* (n x p) to get (x*)^T, which is (p x n).
5. Multiply (x*)^T by x*, and the result by 1/(n-1), to get the new matrix S (p x p).
From the original data matrix X (n x p) we build the centered matrix x* (n x p), with entries x*_ij = x_ij - x̄_j (the column means x̄1, x̄2, .., x̄j, .., x̄p are subtracted column by column):

         X1     X2    ..   Xj    ..   Xp
  1    x*11   x*12   ..  x*1j   ..  x*1p
  2    x*21   x*22   ..  x*2j   ..  x*2p
  ..     ..     ..   ..    ..   ..    ..
  i    x*i1   x*i2   ..  x*ij   ..  x*ip
  ..     ..     ..   ..    ..   ..    ..
  n    x*n1   x*n2   ..  x*nj   ..  x*np    (n x p)

Multiplying its transpose (x*)^T (p x n) by x* (n x p), and scaling by 1/(n-1), gives the variance-covariance matrix:

                                  X1    X2   ..   Xj   ..   Xp
                            X1   s11   s12   ..  s1j   ..  s1p
                            X2   s12   s22   ..  s2j   ..  s2p
  S = (1/(n-1)) (x*)^T x* = ..    ..    ..   ..   ..   ..   ..
                            Xj   s1j   s2j   ..  sjj   ..  sjp
                            ..    ..    ..   ..   ..   ..   ..
                            Xp   s1p   s2p   ..  sjp   ..  spp    (p x p)
Again, if you are not familiar with matrix multiplication you may refer to (Dahman, 2016) and find the summary papers on Advanced Matrix Theory & Linear Algebra.
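The matrix steps for the covariance matrix can be sketched in R. This is a minimal example on a made-up 4x3 data matrix (arbitrary values), checked against R's built-in cov():

```r
# Sample covariance matrix via matrix manipulation:
# center the columns, then S = (1/(n-1)) * t(Xc) %*% Xc
X <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 9,
              2, 4, 6), nrow = 4, byrow = TRUE)
n <- nrow(X)
Xc <- sweep(X, 2, colMeans(X))   # x*_ij = x_ij - x_bar_j
S <- t(Xc) %*% Xc / (n - 1)      # p x p variance-covariance matrix
cov(X)                           # built-in; should agree with S
```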
5. Correlation Matrix
Having understood the concept and calculation of the mean vector and the covariance matrix, you are ready to understand the meaning of the correlation matrix. The symbol for correlation is rho (ρ, pronounced /roʊ/). This matrix is (p x p). Notice that the diagonal is all 1s; the matrix is symmetric, and the entries above or below the diagonal represent the correlations between pairs of variables. The correlation between two variables is

  corr(x_j, x_k) = cov(x_j, x_k) / (sigma_j * sigma_k),   where sigma is the standard deviation.

Please notice: ρ represents the population parameter, while r represents the sample statistic. Now, how do we calculate the correlation matrix? The steps are:
1. First, from the data matrix (n x p), we calculate x̄_j and s_jj for all the variables.
2. Then we create a new value by subtracting from each entry its corresponding mean statistic and dividing by √s_jj: x̂_ij = (x_ij - x̄_j) / √s_jj.
3. That will lead to a new matrix, named x̂ (n x p).
4. Take the transpose of x̂ (n x p) to get (x̂)^T.
5. Multiply (x̂)^T by x̂, and the result by 1/(n-1), to get the new matrix R (p x p).
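The five steps above can be sketched in R. This is a minimal example on a made-up 4x3 data matrix (arbitrary values); scale() performs the centering and division by √s_jj, and the result is checked against the built-in cor():

```r
# Sample correlation matrix: standardize each column, then
# R = (1/(n-1)) * t(Z) %*% Z
X <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 9,
              2, 4, 6), nrow = 4, byrow = TRUE)
n <- nrow(X)
Z <- scale(X)                    # (x_ij - x_bar_j) / sqrt(s_jj)
Rm <- t(Z) %*% Z / (n - 1)       # p x p correlation matrix, diagonal all 1s
cor(X)                           # built-in; should agree with Rm
```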
6. DSR
So far, we are able to build three (p x p) matrices: the covariance matrix S, the correlation matrix R, and the diagonal matrix of variances D.
S (covariance matrix, p x p):

       X1    X2   ..   Xj   ..   Xp
 X1   s11   s12   ..  s1j   ..  s1p
 X2   s12   s22   ..  s2j   ..  s2p
 ..    ..    ..   ..   ..   ..   ..
 Xj   s1j   s2j   ..  sjj   ..  sjp
 ..    ..    ..   ..   ..   ..   ..
 Xp   s1p   s2p   ..  sjp   ..  spp

R (correlation matrix, p x p):

       X1    X2   ..   Xj   ..   Xp
 X1     1   r12   ..  r1j   ..  r1p
 X2   r12     1   ..  r2j   ..  r2p
 ..    ..    ..   ..   ..   ..   ..
 Xj   r1j   r2j   ..    1   ..  rjp
 ..    ..    ..   ..   ..   ..   ..
 Xp   r1p   r2p   ..  rjp   ..    1

D (diagonal matrix of variances, p x p):

       X1    X2   ..   Xj   ..   Xp
 X1   s11     0   ..    0   ..    0
 X2     0   s22   ..    0   ..    0
 ..    ..    ..   ..   ..   ..   ..
 Xj     0     0   ..  sjj   ..    0
 ..    ..    ..   ..   ..   ..   ..
 Xp     0     0   ..    0   ..  spp
(n-1)S = (X*)^T X*
(n-1)R = (X̃)^T X̃
Now, what is SSCP? It is, in a nutshell, a sums-of-squares-and-cross-products matrix: (X*)^T X*, (X̃)^T X̃, and X^T X are all SSCP matrices. Keep in mind these three products.
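One standard relationship ties the three matrices together: S = D^(1/2) R D^(1/2), where D^(1/2) is the diagonal matrix of standard deviations √s_jj. This check is my own sketch (the relation is not spelled out in the text above), on a made-up 4x3 data matrix:

```r
# Verify S = D^{1/2} %*% R %*% D^{1/2} on arbitrary example data
X <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 9,
              2, 4, 6), nrow = 4, byrow = TRUE)
S <- cov(X)
Rm <- cor(X)
D_half <- diag(sqrt(diag(S)))    # sqrt of the variance diagonal
D_half %*% Rm %*% D_half         # should reproduce S
```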
7. Hands on Practice
1. Mean Vector: I will use a simple example to find the mean vector by using R. You can use MATLAB or Excel or any other available software package.

Import DS from (Dahman, 2018b):
ds <- read.csv("https://mfr.ca-1.osf.io/render?url=https://osf.io/k3v2r/?action=download%26mode=render", header=TRUE, sep=",")
colMeans(ds)   # the mean vector of the imported data set
2. Covariance Matrix: I will use the same dataset to find the covariance matrix by using R. You can use MATLAB or Excel or any other available software package.

Import DS from (Dahman, 2018b):
ds <- read.csv("https://mfr.ca-1.osf.io/render?url=https://osf.io/k3v2r/?action=download%26mode=render", header=TRUE, sep=",")
cov(ds)   # the covariance matrix
3. Correlation Matrix: I will use the same dataset to find the correlation matrix by using R. You can use MATLAB or Excel or any other available software package.

Import DS from (Dahman, 2018b):
ds <- read.csv("https://mfr.ca-1.osf.io/render?url=https://osf.io/k3v2r/?action=download%26mode=render", header=TRUE, sep=",")
cor(ds)   # the correlation matrix

Observe that there is practically no correlation between the variables. For example, the correlation between depth and diameter is -4.631x10^-2 = -0.04631; that means essentially no correlation exists. You can plot this to visualize:
plot(ds$diameter,ds$depth)