Exploratory Multivariate Data Analysis. Julie Josse 2015
Julie Josse
Introduction Intensity Rows Columns Superimposed Interpretation To go further
Correspondence Analysis
1 Data - Issues
2 Rows Study
3 Columns Study
4 Superimposed representation
5 Interpretation tools
6 To go further
Data, examples
⇒ Contingency table. Symmetric role of the rows and the columns
• ecology: abundance of species i in environment j
• job - candidate: number of people in job i voting for candidate j
• sex, race - salary
• hobbies - student's major
• open questions...
History
⇒ P. Corneille did write the verse plays attributed to Molière and two of his prose plays (Dom Juan and L'Avare)
Horseshoes in Multidimensional Scaling and Kernel Methods. P. Diaconis, S. Goel & S. Holmes
Notations
Aim
• Rows typology
• Columns typology
• Relationship between these two typologies
Example
Margins
apply(perfume,1,sum)
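As a minimal sketch of what the margins look like, the block below applies the same `apply()` calls to the small taste-recognition table that appears later in these slides rather than to `perfume`. The object names (`tab`, `row_margin`, `col_margin`, `f_i`, `f_j`) are illustrative and not taken from the slides.

```r
# Taste table: true taste in rows, perceived taste in columns
tab <- matrix(c(10, 0, 0,
                 0, 9, 3,
                 0, 1, 7),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("sweet", "acid", "bitter"),
                              c("Perc sweet", "Perc acid", "Perc bitter")))

row_margin <- apply(tab, 1, sum)   # row totals, same as rowSums(tab)
col_margin <- apply(tab, 2, sum)   # column totals, same as colSums(tab)
n <- sum(tab)                      # grand total

f_i <- row_margin / n              # row masses f_i.
f_j <- col_margin / n              # column masses f_.j
```

Dividing the margins by the grand total gives the masses f_i. and f_.j used throughout the rest of the analysis.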
Relationship
Center of gravity: average row profile
$G_I = \left(\sum_{i=1}^{I} f_{i.} \times \frac{f_{ij}}{f_{i.}}\right)_{j=1,\dots,J} = (f_{.j})_{j=1,\dots,J}$
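Row profile i = conditional distribution: $\left(\frac{f_{ij}}{f_{i.}}\right)_{j=1,\dots,J}$, whose coordinates sum to 1
Average row profile $G_I$ = marginal distribution: $(f_{.j})_{j=1,\dots,J}$, whose coordinates sum to 1
CA compares the row profiles to the average profile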
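The row profiles and their center of gravity can be sketched in a few lines of base R. The example reuses the taste-recognition table from later in these slides; the names `P`, `row_profiles` and `G_I` are illustrative.

```r
tab <- matrix(c(10, 0, 0,
                 0, 9, 3,
                 0, 1, 7),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("sweet", "acid", "bitter"),
                              c("Perc sweet", "Perc acid", "Perc bitter")))

P   <- tab / sum(tab)    # relative frequencies f_ij
f_i <- rowSums(P)        # row masses f_i.
f_j <- colSums(P)        # column masses f_.j

# Row profile i: f_ij / f_i. (each row of the result sums to 1)
row_profiles <- sweep(P, 1, f_i, "/")

# Average row profile: row profiles weighted by the row masses
G_I <- colSums(row_profiles * f_i)
```

The weighted average `G_I` coincides with the column margin `f_j`, which is the sense in which the average row profile is the marginal distribution.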
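Distance between rows $i$ and $i'$:
$d^2_{\chi^2}(i, i') = \sum_{j=1}^{J} \frac{1}{f_{.j}} \left(\frac{f_{ij}}{f_{i.}} - \frac{f_{i'j}}{f_{i'.}}\right)^2$
Distance between row $i$ and $G_I$:
$d^2_{\chi^2}(i, G_I) = \sum_{j=1}^{J} \frac{1}{f_{.j}} \left(\frac{f_{ij}}{f_{i.}} - f_{.j}\right)^2$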
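A short sketch of the two distances on the taste-recognition table used later in these slides. The helper `chi2_dist2` is a hypothetical name introduced here for illustration only.

```r
tab <- matrix(c(10, 0, 0,
                 0, 9, 3,
                 0, 1, 7),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("sweet", "acid", "bitter"),
                              c("Perc sweet", "Perc acid", "Perc bitter")))

P   <- tab / sum(tab)
f_i <- rowSums(P)
f_j <- colSums(P)
row_profiles <- sweep(P, 1, f_i, "/")

# Squared chi-square distance between two profiles p and q,
# each coordinate being weighted by 1 / f_.j
chi2_dist2 <- function(p, q, w) sum((p - q)^2 / w)

# Distance between the rows "acid" and "bitter"
d2_acid_bitter <- chi2_dist2(row_profiles["acid", ], row_profiles["bitter", ], f_j)

# Distance between the row "acid" and the average profile G_I = f_.j
d2_acid_G <- chi2_dist2(row_profiles["acid", ], f_j, f_j)
```

The weights 1 / f_.j give more importance to differences on rare columns than the ordinary Euclidean distance would.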
Total inertia
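⇒ Total inertia: weighted sum of the squared distances of the row profiles to the average profile
$\text{Inertia}(N_I/G_I) = \sum_{i=1}^{I} \text{Inertia}(i/G_I) = \sum_{i=1}^{I} f_{i.}\, d^2_{\chi^2}(i, G_I)$
$= \sum_{i=1}^{I} f_{i.} \sum_{j=1}^{J} \frac{1}{f_{.j}} \left(\frac{f_{ij}}{f_{i.}} - f_{.j}\right)^2$
$= \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(f_{ij} - f_{i.} f_{.j})^2}{f_{i.} f_{.j}} = \frac{\chi^2}{n} = \phi^2$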
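The chain of equalities can be checked numerically. The sketch below computes the total inertia three ways on the taste-recognition table from later in these slides: directly from the last formula, as a weighted sum of squared distances to the average profile, and as the chi-square statistic divided by n. Variable names are illustrative.

```r
tab <- matrix(c(10, 0, 0,
                 0, 9, 3,
                 0, 1, 7),
              nrow = 3, byrow = TRUE)

n   <- sum(tab)
P   <- tab / n
f_i <- rowSums(P)
f_j <- colSums(P)

# (1) sum over cells of (f_ij - f_i. f_.j)^2 / (f_i. f_.j)
indep <- outer(f_i, f_j)
phi2  <- sum((P - indep)^2 / indep)

# (2) weighted sum of squared chi-square distances to the average profile
profiles <- P / f_i
d2       <- apply(profiles, 1, function(p) sum((p - f_j)^2 / f_j))
inertia  <- sum(f_i * d2)

# (3) chi-square statistic divided by n (warning about small counts silenced)
chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
phi2_from_chi2 <- chi2 / n
```

All three quantities are equal to 1.375 for this table.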
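Find $u_1$ maximizing $\sum_{i=1}^{I} f_{i.}\,(OF_{i1})^2$, then $u_2 \perp u_1$, etc.
CA = PCA of $\frac{f_{ij}}{f_{i.}}$ with metric $\frac{1}{f_{.j}}$ and weights $f_{i.}$: the $u_q$ are the eigenvectors of $SM$, and $F_q = XMu_q$
Inertia of axis $q$: $\sum_{i=1}^{I} f_{i.}\,(OF_{iq})^2 = \lambda_q$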
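A sketch of this weighted PCA on the taste-recognition table from later in these slides. Two assumptions are made that the slide does not spell out: X holds the row profiles centered on the average profile, and S is the weighted covariance X'DX with D the diagonal matrix of row masses. To stay with a symmetric eigenproblem, the code diagonalizes M^{1/2} S M^{1/2} and maps the eigenvectors back, which yields the same eigenvalues and M-normed eigenvectors of SM.

```r
tab <- matrix(c(10, 0, 0,
                 0, 9, 3,
                 0, 1, 7),
              nrow = 3, byrow = TRUE)

P   <- tab / sum(tab)
f_i <- rowSums(P)
f_j <- colSums(P)

X  <- sweep(P / f_i, 2, f_j, "-")   # centered row profiles (assumption)
D  <- diag(f_i)                     # row weights
M  <- diag(1 / f_j)                 # chi-square metric
S  <- t(X) %*% D %*% X              # weighted covariance (assumption)

# Symmetric form of the eigenproblem for SM
Mh  <- diag(1 / sqrt(f_j))
eig <- eigen(Mh %*% S %*% Mh, symmetric = TRUE)
lambda <- eig$values

u  <- diag(sqrt(f_j)) %*% eig$vectors   # eigenvectors of SM, with u' M u = 1
Fq <- X %*% M %*% u                     # row coordinates F_q = X M u_q

# Inertia carried by each axis: sum_i f_i. F_iq^2
axis_inertia <- colSums(f_i * Fq^2)
```

For this table the two non-zero eigenvalues are 1 and 0.375, and the inertia of each axis matches its eigenvalue.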
Graphical outputs
[Figure: CA factor map of the rows, showing the perfumes (Cinema, L_instant, Aromatics Elixir, J_adore, Shalimar, Coco Mademoiselle, Pure Poison, J_adore_et, Pleasures, Chanel 5).]
Center of gravity: average column profile
$G_J = \left(\sum_{j=1}^{J} f_{.j} \times \frac{f_{ij}}{f_{.j}}\right)_{i=1,\dots,I} = (f_{i.})_{i=1,\dots,I}$
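Distance between two column profiles:
$d^2_{\chi^2}(j, j') = \sum_{i=1}^{I} \frac{1}{f_{i.}} \left(\frac{f_{ij}}{f_{.j}} - \frac{f_{ij'}}{f_{.j'}}\right)^2$
Distance to the average profile $G_J$:
$d^2_{\chi^2}(j, G_J) = \sum_{i=1}^{I} \frac{1}{f_{i.}} \left(\frac{f_{ij}}{f_{.j}} - f_{i.}\right)^2$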
Inertia - fitting
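Total inertia $= \sum_{j=1}^{J} f_{.j} \times d^2(j, G_J) = \frac{\chi^2}{n} = \phi^2$
CA: PCA of $\frac{f_{ij}}{f_{.j}}$ with metric $\frac{1}{f_{i.}}$ and weights $f_{.j}$
[Figure: column point $j$ projected onto the axes $v_1$ and $v_2$, with coordinates $G_{j1}$ and $G_{j2}$.]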
Graphical output
[Figure: CA factor map of the columns, Dim 2 (21.12%) on the vertical axis, showing the descriptors vanilla, sugary, agressive, fruity, acid, spicy, soft, strong, light, old, discreet, wooded, fresh, floral, soap.]
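$F_{iq} = \frac{1}{\sqrt{\lambda_q}} \sum_{j=1}^{J} \frac{f_{ij}}{f_{i.}}\, G_{jq}$
$G_{jq} = \frac{1}{\sqrt{\lambda_q}} \sum_{i=1}^{I} \frac{f_{ij}}{f_{.j}}\, F_{iq}$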
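These two transition formulas can be verified numerically. The sketch below computes the row and column coordinates from an SVD of the standardized residuals (the approach used in the hand-written R code near the end of these slides) for the taste-recognition table, then rebuilds each set of coordinates from the other. Variable names are illustrative.

```r
tab <- matrix(c(10, 0, 0,
                 0, 9, 3,
                 0, 1, 7),
              nrow = 3, byrow = TRUE)

P   <- tab / sum(tab)
f_i <- rowSums(P)
f_j <- colSums(P)

# SVD of the standardized residuals
S  <- diag(1 / sqrt(f_i)) %*% (P - outer(f_i, f_j)) %*% diag(1 / sqrt(f_j))
sv <- svd(S)

k      <- 1:2                # non-trivial dimensions: min(I - 1, J - 1) = 2
lambda <- sv$d[k]^2          # eigenvalues

# Principal coordinates of rows (F) and columns (G)
Fq <- (sv$u[, k] / sqrt(f_i)) %*% diag(sv$d[k])
Gq <- (sv$v[, k] / sqrt(f_j)) %*% diag(sv$d[k])

# Transition formulas: barycenters of the other cloud, dilated by 1 / sqrt(lambda_q)
F_from_G <- (P / f_i)    %*% Gq %*% diag(1 / sqrt(lambda))
G_from_F <- (t(P) / f_j) %*% Fq %*% diag(1 / sqrt(lambda))
```

Each row sits at the barycenter of the columns weighted by its profile, stretched by 1 / sqrt(lambda_q), and symmetrically for the columns.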
Superimposed representation
[Figure: CA factor map with the rows (perfumes, including Lolita Lempika and Angel) and the columns (descriptors) superimposed; Dim 2 (21.12%) on the vertical axis.]
Inertia (= eigenvalues)
ex: recognizing 3 tastes (sweet, acidic, bitter), true taste (rows) vs perceived taste (columns)

Table 1:
          Perc sweet   Perc acid   Perc bitter
sweet         10           0            0
acid           0           9            3
bitter         0           1            7

Table 2:
          Perc sweet   Perc acid   Perc bitter
sweet         10           0            0
acid           0           7            5
bitter         0           3            5

[Figure: CA factor maps of the two tables; Dim 2 accounts for 27.27% of the inertia for Table 1 and 4.00% for Table 2.]
$\Longrightarrow \Phi^2 = \sum_{q=1}^{\min(I-1,\,J-1)} \lambda_q \le \min(I-1,\, J-1)$
ex: recognizing 3 tastes (sweet, acidic, bitter), true/perceived:
$\sqrt{\frac{1.375}{2}} = 0.83$ and $\sqrt{\frac{1.042}{2}} = 0.72$
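The two figures in this example can be reproduced from the taste-recognition tables given earlier in these slides. The sketch computes the total inertia of each table and divides it by its upper bound min(I - 1, J - 1) before taking the square root (this normalized quantity is commonly known as Cramér's V). The function name `phi2` and the object names are illustrative.

```r
# Total inertia (chi-square divided by n) of a contingency table
phi2 <- function(tab) {
  P <- tab / sum(tab)
  indep <- outer(rowSums(P), colSums(P))   # f_i. * f_.j under independence
  sum((P - indep)^2 / indep)
}

tab1 <- matrix(c(10, 0, 0,
                  0, 9, 3,
                  0, 1, 7), nrow = 3, byrow = TRUE)
tab2 <- matrix(c(10, 0, 0,
                  0, 7, 5,
                  0, 3, 5), nrow = 3, byrow = TRUE)

bound <- min(dim(tab1) - 1)                # min(I - 1, J - 1) = 2

phi2_1 <- phi2(tab1)                       # total inertia, table 1
phi2_2 <- phi2(tab2)                       # total inertia, table 2

v1 <- sqrt(phi2_1 / bound)                 # strength of association, table 1
v2 <- sqrt(phi2_2 / bound)                 # strength of association, table 2
```

In both tables the first eigenvalue equals 1 because "sweet" is always recognized; the difference lies in the second eigenvalue, which is much smaller when acid and bitter are often confused.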
Inertia perfume
Eigenvalues
          Inertia   Inertia (%)
dim 1      0.45       52.04
dim 2      0.15       17.85
.....
dim 11     0.01        0.74
Sum        0.86      100
[Figure: bar plot of the eigenvalues for dimensions 1 to 11.]
Interpretation tools
• Contribution to an axis q:
$\frac{\text{inertia of a point}}{\text{total inertia of the axis}} = \frac{f_{i.}F_{iq}^2}{\sum_{i} f_{i.}F_{iq}^2} = \frac{f_{i.}F_{iq}^2}{\lambda_q}$
Contribution: example

Data:
     X1   X2   X3   X4
a     1    1    0    0
b     5   10   10    0
c     0   10   10    5
d     0    0    1    1

Inertia per axis:
          Inertia     %
Axis 1     0.258    83.501
Axis 2     0.036    11.538
Axis 3     0.015     4.96

Contributions of the rows (%):
     Axis 1    Axis 2
a    18.879    46.296
b    31.121     3.704
c    31.121     3.704
d    18.879    46.296
Σ   100       100

[Figure: CA factor map, Dim 1 (83.50%) vs Dim 2 (11.54%), showing the rows a, b, c, d and the columns X1, X2, X3, X4.]
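The contributions in this example can be recomputed from the 4 x 4 table with base R only. The sketch below is one possible way to do it, through an SVD of the standardized residuals rather than through FactoMineR; object names are illustrative.

```r
tab <- matrix(c(1,  1,  0, 0,
                5, 10, 10, 0,
                0, 10, 10, 5,
                0,  0,  1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("a", "b", "c", "d"),
                              c("X1", "X2", "X3", "X4")))

P   <- tab / sum(tab)
f_i <- rowSums(P)
f_j <- colSums(P)

# SVD of the standardized residuals (f_ij - f_i. f_.j) / sqrt(f_i. f_.j)
sv <- svd((P - outer(f_i, f_j)) / sqrt(outer(f_i, f_j)))

k      <- 1:3                               # min(I - 1, J - 1) = 3 axes
lambda <- sv$d[k]^2                         # inertia of each axis
pct    <- 100 * lambda / sum(lambda)        # percentage of inertia

# Row principal coordinates F_iq
Fq <- sweep(sv$u[, k], 2, sv$d[k], "*") / sqrt(f_i)

# Contribution of row i to axis q: 100 * f_i. F_iq^2 / lambda_q
contrib <- 100 * sweep(f_i * Fq^2, 2, lambda, "/")
dimnames(contrib) <- list(rownames(tab), paste("Axis", k))

round(contrib[, 1:2], 3)
```

Rows a and d have small masses but sit far from the origin on the second axis, which is why they dominate its construction, whereas the heavier rows b and c contribute most to the first axis.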
Supplementary information
[Figure: CA factor map, Dim 1 (60.46%) vs Dim 2 (21.12%), showing the perfumes and the descriptors together with supplementary words such as hot, oriental, heavy, intense, peppery, young, heady, drugs, lemon, eau.de.cologne, vegetable, woman, male, alcohol, powerful, forest, rose, toilets, nature, shampoo, amber, musky, shower.gel.]
Table 1:
          Perc sweet   Perc acid   Perc bitter
sweet         10           0            0
acid           0           9            3
bitter         0           1            7
$1/\sqrt{\lambda_2} = 1/\sqrt{0.375} = 1.6$

Table 2:
          Perc sweet   Perc acid   Perc bitter
sweet         10           0            0
acid           0           7            5
bitter         0           3            5
$1/\sqrt{\lambda_2} = 1/\sqrt{0.042} = 4.9$
Distributional equivalence
CA - SVD - GSVD
$\Rightarrow$ CA = GSVD$\left(f_{ij} - f_{i.}f_{.j},\ M = \frac{1}{f_{i.}},\ D = \frac{1}{f_{.j}}\right)$
GSVD$\left(X/N - rc',\ D_r^{-1},\ D_c^{-1}\right)$
$X/N - rc' = U \Lambda V'$, with $U' D_r^{-1} U = V' D_c^{-1} V = I$
Reconstitution in CA
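⇒ with Q dimensions:
$\hat{X}/N \approx D_r^{1/2} P \Sigma^{1/2} Q' D_c^{1/2} + rc'$
$\approx D_r U \Lambda^{1/2} V' D_c + rc'$
$\hat{X}_{ij}/N \approx r_i c_j \left(1 + \sum_{q=1}^{Q} \sqrt{\lambda_q}\, u_{iq} v_{jq}\right)$
$\hat{X}_{ij}/N \approx r_i c_j \left(1 + \sum_{q=1}^{Q} \frac{1}{\sqrt{\lambda_q}}\, F_{iq} G_{jq}\right)$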
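A minimal sketch of the element-wise reconstitution formula, using the 4 x 4 table of the contribution example from these slides. Here u and v are taken to be the standard coordinates (singular vectors of the standardized residuals divided by the square roots of the masses), which is the convention under which the formula with the singular values holds. The function name `reconstruct` is hypothetical.

```r
# Rank-Q reconstitution of a contingency table from its CA
reconstruct <- function(tab, Q) {
  n  <- sum(tab)
  P  <- tab / n
  r  <- rowSums(P)                       # row masses r_i
  cc <- colSums(P)                       # column masses c_j
  sv <- svd((P - outer(r, cc)) / sqrt(outer(r, cc)))
  q  <- seq_len(Q)
  u  <- sv$u[, q, drop = FALSE] / sqrt(r)    # standard row coordinates
  v  <- sv$v[, q, drop = FALSE] / sqrt(cc)   # standard column coordinates
  # x_ij ~ n * r_i * c_j * (1 + sum_q sqrt(lambda_q) * u_iq * v_jq)
  n * outer(r, cc) * (1 + u %*% diag(sv$d[q], nrow = Q) %*% t(v))
}

tab <- matrix(c(1,  1,  0, 0,
                5, 10, 10, 0,
                0, 10, 10, 5,
                0,  0,  1, 1),
              nrow = 4, byrow = TRUE)

fit1 <- reconstruct(tab, 1)   # approximation with the first axis only
fit3 <- reconstruct(tab, 3)   # all min(I - 1, J - 1) = 3 axes
```

With all three axes the table is recovered exactly, and any truncated reconstitution keeps the row and column margins of the original table.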
$\log(\hat{x}_{ij}) \approx \log(N) + \log(r_i) + \log(c_j) + \sum_{q=1}^{Q} \lambda_q u_{iq} v_{jq}$
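# CA computed by hand with an SVD; data_set is any contingency table
data.P <- as.matrix(data_set) / sum(data_set)   # relative frequencies f_ij
data.r <- rowSums(data.P)                       # row masses
data.c <- colSums(data.P)                       # column masses
data.Dr <- diag(data.r)
data.Dc <- diag(data.c)
data.Drmh <- diag(1 / sqrt(data.r))             # Dr^(-1/2)
data.Dcmh <- diag(1 / sqrt(data.c))             # Dc^(-1/2)
# Standardized residuals and their SVD
data.S <- data.Drmh %*% (data.P - data.r %o% data.c) %*% data.Dcmh
data.svd <- svd(data.S)
# Standard coordinates of the rows and the columns
data.rsc <- data.Drmh %*% data.svd$u
data.csc <- data.Dcmh %*% data.svd$v
# Principal coordinates (standard coordinates scaled by the singular values)
data.rpc <- data.rsc %*% diag(data.svd$d)
data.cpc <- data.csc %*% diag(data.svd$d)
# Map of the rows on the first two axes; asp = 1 keeps the same scale on both axes
plot(data.rpc[, 1], data.rpc[, 2], type = "n", asp = 1)
text(data.rpc[, 1], data.rpc[, 2], labels = rownames(data_set))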
CA with FactoMineR
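library(FactoMineR)
perfume <- read.table("http://factominer.free.fr/docs/perfume.txt",
                      header = TRUE, sep = "\t", row.names = 1)
rownames(perfume)[4] <- "Cinema"
# The call creating res.ca is missing from the slide. Assumed here: the first
# 15 columns are active and the remaining ones are supplementary columns.
res.ca <- CA(perfume, col.sup = 16:ncol(perfume))
summary(res.ca)
# Eigenvalues and their bar plot
res.ca$eig
barplot(res.ca$eig[, 1], main = "Eigenvalues",
        names.arg = 1:nrow(res.ca$eig))
# Coordinates, quality of representation and contributions
res.ca$row$coord; res.ca$row$cos2; res.ca$row$contrib
res.ca$col$coord; res.ca$col$cos2; res.ca$col$contrib
# Row profiles and column profiles of the active table
round(prop.table(as.matrix(perfume[, 1:15]), 1), 4)
round(prop.table(as.matrix(perfume[, 1:15]), 2), 4)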
CA in the R packages
Bibliography
Benzécri J. P. (1992). Correspondence Analysis Handbook. (Transl : T.K. Gopalan)
Marcel Dekker, New York.
Murtagh F. (2005). Correspondence Analysis and Data Coding with R and Java.
Chapman & Hall CRC.