Exploratory Multivariate Data Analysis. Julie Josse 2015

Exploratory Multivariate Data Analysis
Julie Josse
Applied mathematics department, Agronomy University, Rennes, Brittany
Stanford 2015 - Stat 300
Correspondence Analysis
1 Data - Issues
2 Rows Study
3 Columns Study
4 Superimposed representation
5 Interpretation tools
6 To go further
Data, examples
⇒ Contingency table: two categorical variables with I and J categories. The rows and the columns play symmetric roles.
• ecology: abundance of species i in environment j
• job - candidate: number of people with job i voting for candidate j
• sex, race - salary
• hobbies - students' major
• open questions...
A χ2 test can be applied.
History
• CA was actively developed in 1965 ... in Rennes!
• J.-P. Benzécri (mathematician - linguist) - PhD: B. Escofier
⇒ "French school". Geometric: rows or columns of a data matrix are assumed to be points in a high-dimensional Euclidean space, and the method aims to redefine the dimensions of the space so that the principal dimensions capture the most variance possible, allowing for lower-dimensional descriptions of the data.
"The data are king, not the model one might want to propose for them." "The model should follow the data, not the inverse."
History
⇒ Authorship attribution (Dominique and Cyril Labbé)
• Corneille and Molière: controversy about the authorship of several plays signed by Molière
• text data: number of times word i appears in text j
⇒ Conclusion of this analysis: P. Corneille did write the verse plays signed by Molière and two of his prose plays (Dom Juan and L'Avare)
History
• Hirschfeld (Hartley) (1935), Fisher (1940), Rényi (1959)
• Netherlands and Japan: Jan de Leeuw and Chikio Hayashi
• Michael Greenacre: Theory and applications of CA (1984)
⇒ Several different ways of defining and thinking about CA
• PCA, Canonical analysis, Discriminant analysis
• Spectral clustering (Laplacian of a graph)
• Relation with log-linear models
⇒ Rediscovery of the method many times
The Geometric Interpretation of Correspondence Analysis (Greenacre, Hastie, JASA).
Horseshoes in Multidimensional Scaling and Kernel Methods. P. Diaconis, S. Goel & S. Holmes
Notations
Figure : Data tables in CA.

Notations
Figure : Row profile and column profile.
Aim
• Rows typology
• Columns typology
• Relationship between these two typologies
⇒ Study the relationship (the correspondence) between the two variables: the gap to independence
⇒ Visualize the association between levels
Example
12 perfumes described by 39 words:

                    floral  fruity  strong  soft  light  ...
Angel                    2      11      18     3      1  ...
Aromatics Elixir         2       3      29     2      0  ...
Chanel 5                 5       0      19     3      1  ...
Cinéma                  14      14       3    12      9  ...
Coco Mademoiselle       10      10       6    10      7  ...
...                    ...     ...     ...   ...    ...
Margins
apply(perfume,1,sum)
            Angel  Aromatics Elixir          Chanel 5            Cinema
               95               106                86                99
Coco Mademoiselle           J_adore        J_adore_et         L_instant
               89               106               110                85
   Lolita Lempika         Pleasures       Pure Poison          Shalimar
              106               112                92                89
Question asked: give one or a few words to describe each perfume ⇒ about the same number of words per perfume

apply(perfume,2,sum)
    floral         fruity    strong      soft     light
       134            128       113        97        74
    sugary          fresh  discreet     spicy      soap
        70             62        34        31        31
   vanilla           acid       old    wooded agressive
        27             24        23        19        17
     woman           male   toilets   alcohol     heavy
        17             17        17        16        16
     drugs            hot   peppery      rose     lemon
        16             15        15        15        13
  oriental          young     candy     heady     musky
        13             12        11        11        10
 vegetable eau.de.cologne    forest  powerful     amber
        10              9         9         9         8
shower.gel        intense    nature   shampoo
         8              8         8         8
Relationship
• Intensity of the relationship φ2:
$$\chi^2 = \sum_{i,j} \frac{(n_{ij} - n_{i.} n_{.j}/n)^2}{n_{i.} n_{.j}/n} = n \sum_{i,j} \frac{(f_{ij} - f_{i.} f_{.j})^2}{f_{i.} f_{.j}} = n \phi^2$$
• Significance of the relationship, χ2 test: under independence,
$$\chi^2_{obs} \sim \chi^2_{(I-1)(J-1)}$$
χ2obs = 615.8, p-value = 1.7e-56 ⇒ highly significant
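As a quick numerical check of the identity χ2 = nφ2, here is a minimal R sketch. It uses the first taste-recognition table that appears later in these slides; the object names (`tab`, `f`, `indep`) are illustrative choices, not part of the original material.

```r
# True taste (rows) x perceived taste (columns): first taste table of the slides
tab <- matrix(c(10, 0, 0,
                 0, 9, 3,
                 0, 1, 7), nrow = 3, byrow = TRUE,
              dimnames = list(c("sweet", "acid", "bitter"),
                              c("Perc sweet", "Perc acid", "Perc bitter")))
n <- sum(tab)
f <- tab / n                               # relative frequencies f_ij
indep <- outer(rowSums(f), colSums(f))     # independence model f_i. * f_.j
phi2 <- sum((f - indep)^2 / indep)         # intensity of the relationship
chi2 <- n * phi2                           # chi-square statistic
df <- (nrow(tab) - 1) * (ncol(tab) - 1)    # degrees of freedom (I-1)(J-1)
p_value <- pchisq(chi2, df, lower.tail = FALSE)
c(phi2 = phi2, chi2 = chi2, df = df, p_value = p_value)
```

The intensity φ2 does not depend on n, whereas χ2 (and hence the p-value) does: multiplying all counts by 10 leaves φ2 unchanged and multiplies χ2 by 10.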
Relationship
• Nature of the relationship: association between categories
• contribution to Φ2 (of a cell, row, column?):
$$\frac{(f_{ij} - f_{i.} f_{.j})^2}{f_{i.} f_{.j}}$$
• residuals (positive or negative association):
$$x_{ij} = \frac{f_{ij} - f_{i.} f_{.j}}{\sqrt{f_{i.} f_{.j}}}$$
⇒ CA visualizes the residuals matrix X: the gap to independence
⇒ CA tells nothing about the significance, since it works with the frequencies fij (the sample size n is not used)
⇒ As usual, the association structure of X is revealed using the SVD
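The link between the residuals matrix and the SVD can be sketched in a few lines of base R. This is only an illustration on the first taste table of these slides; the names `X`, `sv` and `eig` are assumptions made for the example.

```r
# First taste table of the slides (true taste x perceived taste)
tab <- matrix(c(10, 0, 0,
                 0, 9, 3,
                 0, 1, 7), nrow = 3, byrow = TRUE)
f  <- tab / sum(tab)                   # relative frequencies f_ij
r  <- rowSums(f)                       # row masses f_i.
cc <- colSums(f)                       # column masses f_.j
indep <- outer(r, cc)                  # f_i. * f_.j
X <- (f - indep) / sqrt(indep)         # residuals x_ij
sv <- svd(X)
eig <- sv$d^2                          # eigenvalues lambda_q
rbind(eigenvalues = eig)
c(phi2 = sum(X^2), sum_eig = sum(eig)) # both equal the total inertia
```

The sign of each residual gives the direction of the association (positive: the cell is more frequent than under independence), and the squared singular values add up to Φ2.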
Rows weight and metric
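• Weight of the row profile i: its mass $f_{i.}$
• Metric to compute distances between rows: $M = \mathrm{diag}(1/f_{.j})_{j=1,\dots,J}$
• CA: PCA(X, M, D), i.e.
$$\mathrm{PCA}\left(\frac{f_{ij}}{f_{i.}},\ \frac{1}{f_{.j}},\ f_{i.}\right)$$
Center of gravity: average row profile
$$\sum_{i} f_{i.} \frac{f_{ij}}{f_{i.}} = f_{.j}$$
Figure: table of row profiles, with the associated weight of the rows and the weight for the columns.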
Multidimensional gap to independence
Independence:
$$\frac{f_{ij}}{f_{i.}} = f_{.j}$$
• Row profile i = conditional distribution $(f_{ij}/f_{i.})_{j=1,\dots,J}$
• Average row profile $G_I$ = marginal distribution $(f_{.j})_{j=1,\dots,J}$
⇒ CA compares the row profiles to the average profile
Rows cloud of points
Distance between rows i and i':
$$d^2_{\chi^2}(i, i') = \sum_{j=1}^{J} \frac{1}{f_{.j}} \left( \frac{f_{ij}}{f_{i.}} - \frac{f_{i'j}}{f_{i'.}} \right)^2$$
Distance between row i and $G_I$:
$$d^2_{\chi^2}(i, G_I) = \sum_{j=1}^{J} \frac{1}{f_{.j}} \left( \frac{f_{ij}}{f_{i.}} - f_{.j} \right)^2$$
Total inertia
⇒ Total inertia: weighted sum of the squared distances of the row profiles to the average profile
$$\mathrm{Inertia}(N_I/G_I) = \sum_{i=1}^{I} \mathrm{Inertia}(i/G_I) = \sum_{i=1}^{I} f_{i.}\, d^2_{\chi^2}(i, G_I)$$
$$= \sum_{i=1}^{I} f_{i.} \sum_{j=1}^{J} \frac{1}{f_{.j}} \left( \frac{f_{ij}}{f_{i.}} - f_{.j} \right)^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(f_{ij} - f_{i.} f_{.j})^2}{f_{i.} f_{.j}} = \frac{\chi^2}{n} = \phi^2$$
Studying the inertia of $N_I$ implies studying the gap to independence: row profiles far from the center of gravity.
$$\text{Total inertia} = \|X\|^2_{M,D} = \mathrm{trace}(X' D X M) = \mathrm{trace}(SM) = \sum_q \lambda_q$$
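The χ2 distance between row profiles and the total inertia can be computed directly from their definitions. The sketch below does so in base R on the first taste table of these slides; the helper `chi2_dist2` and the other object names are hypothetical, introduced only for this illustration.

```r
tab <- matrix(c(10, 0, 0,
                 0, 9, 3,
                 0, 1, 7), nrow = 3, byrow = TRUE,
              dimnames = list(c("sweet", "acid", "bitter"),
                              c("Perc sweet", "Perc acid", "Perc bitter")))
f  <- tab / sum(tab)
r  <- rowSums(f)               # row masses f_i.
cc <- colSums(f)               # column masses f_.j = average row profile G_I
profiles <- f / r              # row profiles f_ij / f_i. (each row sums to 1)

# Squared chi-square distance between two profiles p and q
chi2_dist2 <- function(p, q, col_mass) sum((p - q)^2 / col_mass)

d2_acid_bitter <- chi2_dist2(profiles["acid", ], profiles["bitter", ], cc)
d2_to_G <- apply(profiles, 1, chi2_dist2, q = cc, col_mass = cc)
inertia <- sum(r * d2_to_G)    # weighted sum of squared distances to G_I

d2_acid_bitter
d2_to_G
inertia                        # equals phi^2 = chi^2 / n
```

Dividing by the column masses gives more importance to differences on rare columns, which is what distinguishes the χ2 distance from the usual Euclidean distance between profiles.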
Fitting the cloud of points
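Decomposition of the inertia of $N_I$: projection of $N_I$ onto orthogonal axes maximizing the inertia
Find $u_1$ maximizing $\sum_{i=1}^{I} f_{i.} F_{i1}^2$, then $u_2 \perp u_1$, etc., where $F_{iq}$ is the coordinate of row i on axis q (the distance from the origin O to its projection)
$$\mathrm{CA} = \mathrm{PCA}\left(\frac{f_{ij}}{f_{i.}},\ \frac{1}{f_{.j}},\ f_{i.}\right): \quad u_q \text{ eigenvectors of } SM, \quad F_q = X M u_q$$
Inertia of axis q:
$$\sum_{i=1}^{I} f_{i.} F_{iq}^2 = \lambda_q$$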
Graphical outputs
Figure: CA rows representation (CA factor map of the 12 perfumes; axes: Dim 1 (60.46%), Dim 2 (21.12%)).
Columns weight and metric
• Weight of the column profile j: its mass $f_{.j}$
• Metric to compute distances between columns: $\mathrm{diag}(1/f_{i.})_{i=1,\dots,I}$
• CA:
$$\mathrm{PCA}\left(\frac{f_{ij}}{f_{.j}},\ \frac{1}{f_{i.}},\ f_{.j}\right)$$
Center of gravity: average column profile
$$\sum_{j=1}^{J} \frac{f_{ij}}{f_{.j}} \times f_{.j} = (f_{i.})_{i=1,\dots,I}$$
Multidimensional gap to independence
Independence:
$$\frac{f_{ij}}{f_{.j}} = f_{i.}$$
Columns cloud of points
Distance between two profiles:
$$d^2_{\chi^2}(j, j') = \sum_{i=1}^{I} \frac{1}{f_{i.}} \left( \frac{f_{ij}}{f_{.j}} - \frac{f_{ij'}}{f_{.j'}} \right)^2$$
Distance to the average profile $G_J$:
$$d^2_{\chi^2}(j, G_J) = \sum_{i=1}^{I} \frac{1}{f_{i.}} \left( \frac{f_{ij}}{f_{.j}} - f_{i.} \right)^2$$
Inertia - fitting
$$\text{Total inertia} = \sum_{j=1}^{J} f_{.j} \times d^2_{\chi^2}(j, G_J) = \frac{\chi^2}{n} = \phi^2$$
$$\mathrm{CA}: \ \mathrm{PCA}\left(\frac{f_{ij}}{f_{.j}},\ \frac{1}{f_{i.}},\ f_{.j}\right)$$
$v_1, \dots, v_q, \dots, v_{I-1}$: orthogonal axes maximizing the inertia
Graphical output
Figure: CA columns representation (CA factor map of the words; axes: Dim 1 (60.46%), Dim 2 (21.12%)).
Two clouds of points
⇒ CA: 2 weighted PCA on the row and column profiles

Link between representations: transition formulae
$$F_{iq} = \frac{1}{\sqrt{\lambda_q}} \sum_{j=1}^{J} \frac{f_{ij}}{f_{i.}} G_{jq}$$
⇒ Row i is at the weighted barycenter of the columns (with coefficient $1/\sqrt{\lambda_q}$)
$$G_{jq} = \frac{1}{\sqrt{\lambda_q}} \sum_{i=1}^{I} \frac{f_{ij}}{f_{.j}} F_{iq}$$
⇒ Column j is at the weighted barycenter of the rows (with coefficient $1/\sqrt{\lambda_q}$)
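The transition formulae can be checked numerically. The following base R sketch computes the row and column coordinates from the SVD of the standardized residuals (one standard way to obtain them, assumed here) and then recovers each set of coordinates from the other; the data are the first taste table of these slides and all object names are illustrative.

```r
tab <- matrix(c(10, 0, 0,
                 0, 9, 3,
                 0, 1, 7), nrow = 3, byrow = TRUE)
f  <- tab / sum(tab)
r  <- rowSums(f)                                  # f_i.
cc <- colSums(f)                                  # f_.j
S  <- (f - outer(r, cc)) / sqrt(outer(r, cc))     # standardized residuals
sv <- svd(S)
k  <- 1:2                                         # axes with non-zero inertia
lambda <- sv$d[k]^2

# Principal coordinates of rows (F_iq) and columns (G_jq)
row_coord <- (sv$u[, k] / sqrt(r))  %*% diag(sv$d[k])
col_coord <- (sv$v[, k] / sqrt(cc)) %*% diag(sv$d[k])

row_prof <- f / r          # I x J matrix of f_ij / f_i.
col_prof <- t(f) / cc      # J x I matrix of f_ij / f_.j

# Barycenters, dilated by 1 / sqrt(lambda_q)
row_from_col <- (row_prof %*% col_coord) %*% diag(1 / sqrt(lambda))
col_from_row <- (col_prof %*% row_coord) %*% diag(1 / sqrt(lambda))

max(abs(row_coord - row_from_col))   # numerically zero
max(abs(col_coord - col_from_row))   # numerically zero
```

The smaller the eigenvalue, the larger the dilation 1/√λq applied to the barycenters on that axis.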
Superimposed representation
Figure: CA factor map, superimposed representation of the perfumes (rows) and the words (columns); axes: Dim 1 (60.46%), Dim 2 (21.12%).
Graphical representation in CA: remarks
• The barycenter represents independence
• The distance between levels of the same variable can be interpreted
• The representations provided are pseudo-barycentric (dilation by $1/\sqrt{\lambda_q}$): see the transition formulae
• It is not possible to interpret the distance between a level of one variable and a level of the other variable, but ...
• ... each level is, up to the dilation, at the weighted barycenter of all the levels of the other variable
Superimposed representation
Figure: CA factor map, superimposed representation of the perfumes (rows) and the words (columns); axes: Dim 1 (60.46%), Dim 2 (21.12%).
Why are J'adore eau de toilette and J'adore eau de parfum close?
Does Lolita Lempika have a sugary odor or a vanilla one?
Inertia (= eigenvalues)
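CA: 0 ≤ λq ≤ 1. What is a structure with λq = 1?
⇒ Partition of the rows and columns into 2 clusters: exclusive association
library(FactoMineR)
# Identity table: each row is associated with exactly one column
don <- diag(5)
ca.don <- CA(don) ; ca.don$eig
# Constant table: the rows and the columns are independent
don <- matrix(1,5,5)
ca.don <- CA(don) ; ca.don$eig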
Inertia (= eigenvalues)
ex: recognize 3 tastes (sweet, acidic, bitter), true/perceived

Table 1:
         Perc sweet  Perc acid  Perc bitter
sweet            10          0            0
acid              0          9            3
bitter            0          1            7

         eigenvalue  % inertia  cumulative %
dim 1          1.00      72.73         72.73
dim 2          0.37      27.27        100

Table 2:
         Perc sweet  Perc acid  Perc bitter
sweet            10          0            0
acid              0          7            5
bitter            0          3            5

         eigenvalue  % inertia  cumulative %
dim 1          1.00         96            96
dim 2          0.04          4           100

Figure: CA factor maps of the two tables (Table 1: Dim 1 (72.73%), Dim 2 (27.27%); Table 2: Dim 1 (96.00%), Dim 2 (4.00%)).
Maximum number of dimensions - Cramer V
Rows cloud: I points in J dimensions
• J dimensions but 1 constraint (each profile sums to 1) ⇒ Q ≤ J − 1
• I points lie in at most I − 1 dimensions ⇒ Q ≤ I − 1
⇒ Q ≤ min(I − 1, J − 1)
Since each λq ≤ 1:
$$\Phi^2 = \sum_{q=1}^{\min(I-1,\,J-1)} \lambda_q \leq \min(I-1,\ J-1)$$
Indicator of the relationship between 2 variables:
$$\text{Cramér's } V = \sqrt{\frac{\Phi^2}{\min(I-1,\ J-1)}} \in [0; 1]$$
ex: recognize 3 tastes (sweet, acidic, bitter), true/perceived
$$\sqrt{\frac{1.375}{2}} = 0.83 \qquad \sqrt{\frac{1.042}{2}} = 0.72$$
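Cramér's V for the two taste tables can be recomputed from the definition of Φ2. The function `cramer_v` below is a hypothetical helper written for this illustration, not a function from FactoMineR.

```r
# Cramer's V of a contingency table, from the definition of phi^2
cramer_v <- function(tab) {
  f <- tab / sum(tab)
  indep <- outer(rowSums(f), colSums(f))     # f_i. * f_.j
  phi2 <- sum((f - indep)^2 / indep)
  sqrt(phi2 / min(nrow(tab) - 1, ncol(tab) - 1))
}

# The two true/perceived taste tables of the slides
taste1 <- matrix(c(10, 0, 0,
                    0, 9, 3,
                    0, 1, 7), nrow = 3, byrow = TRUE)
taste2 <- matrix(c(10, 0, 0,
                    0, 7, 5,
                    0, 3, 5), nrow = 3, byrow = TRUE)

v1 <- cramer_v(taste1)
v2 <- cramer_v(taste2)
round(c(table1 = v1, table2 = v2), 3)

# Extreme cases: exclusive association (V = 1) and independence (V = 0)
cramer_v(diag(5))
cramer_v(matrix(1, 5, 5))
```

In both taste tables "sweet" is perfectly recognized, which yields λ1 = 1; the difference in V comes only from the confusion between acid and bitter.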
Inertia perfume
Figure: barplot of the eigenvalues (dimensions 1 to 11).

          Inertia  Inertia (%)
dim 1        0.45        52.04
dim 2        0.15        17.85
...
dim 11       0.01         0.74
Sum          0.86       100

λ1 = 0.45 < 1 ⇒ far from an exclusive association between one row and one column
Φ2 = 0.86 ≪ 11 ⇒ far from a perfect association, i.e. an exclusive association between the categories of the 2 variables
Interpretation tools
• Quality of the representation of a point on an axis q:
$$\frac{\text{projected inertia of the point on axis } q}{\text{total inertia of the point}} = \frac{f_{i.} F_{iq}^2}{f_{i.}\, d^2(i, G_I)} = \cos^2(\theta)$$
• Contribution to an axis q:
$$\frac{\text{inertia of the point on axis } q}{\text{total inertia of the axis}} = \frac{f_{i.} F_{iq}^2}{\sum_i f_{i.} F_{iq}^2} = \frac{f_{i.} F_{iq}^2}{\lambda_q}$$
Compromise between the weight and the distance to the origin
Useful for large data to select the important rows/columns
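Both indicators follow directly from the coordinates and the row masses. The sketch below computes them in base R for the small table (rows a to d, columns X1 to X4) used in the contribution example of these slides; the coordinates are obtained from the SVD of the standardized residuals, and every object name is an assumption of the example.

```r
tab <- matrix(c(1,  1,  0, 0,
                5, 10, 10, 0,
                0, 10, 10, 5,
                0,  0,  1, 1), nrow = 4, byrow = TRUE,
              dimnames = list(c("a", "b", "c", "d"), paste0("X", 1:4)))
f  <- tab / sum(tab)
r  <- rowSums(f)                                   # row masses f_i.
cc <- colSums(f)                                   # column masses f_.j
S  <- (f - outer(r, cc)) / sqrt(outer(r, cc))      # standardized residuals
sv <- svd(S)
k  <- 1:3                                          # min(I-1, J-1) axes
lambda <- sv$d[k]^2

# Row coordinates F_iq
row_coord <- sweep(sv$u[, k], 2, sv$d[k], "*") / sqrt(r)
dimnames(row_coord) <- list(rownames(tab), paste("Dim", k))

d2   <- rowSums(row_coord^2)                       # d^2(i, G_I)
cos2 <- row_coord^2 / d2                           # quality of representation
contrib <- 100 * sweep(r * row_coord^2, 2, lambda, "/")   # contributions (%)

round(lambda, 3)
round(cos2, 3)
round(contrib, 3)
```

Rows a and d have small masses, so their contribution to an axis can stay moderate even when they lie far from the origin: the contribution weighs the squared coordinate by the mass.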
Contributions: example

     X1  X2  X3  X4
a     1   1   0   0
b     5  10  10   0
c     0  10  10   5
d     0   0   1   1

          Inertia        %
Axis 1      0.258   83.501
Axis 2      0.036   11.538
Axis 3      0.015    4.96

Contributions (%):
     Axis 1   Axis 2
a    18.879   46.296
b    31.121    3.704
c    31.121    3.704
d    18.879   46.296
Σ   100      100

Figure: CA factor map of the example; axes: Dim 1 (83.50%), Dim 2 (11.54%).
⇒ Be careful: the extreme points are not necessarily those that contribute the most to the dimensions
The myth of the influential outlier (Greenacre)
Supplementary information
Figure: Supplementary columns (CA factor map with the supplementary words added; axes: Dim 1 (60.46%), Dim 2 (21.12%)).
Superimposed representation and inertia
Table 1:
         Perc sweet  Perc acid  Perc bitter
sweet            10          0            0
acid              0          9            3
bitter            0          1            7

         eigenvalue  % inertia  cumulative %
dim 1          1.00      72.73         72.73
dim 2          0.37      27.27        100

$1/\sqrt{\lambda_2} = 1/\sqrt{0.375} = 1.6$

Table 2:
         Perc sweet  Perc acid  Perc bitter
sweet            10          0            0
acid              0          7            5
bitter            0          3            5

         eigenvalue  % inertia  cumulative %
dim 1          1.00         96            96
dim 2          0.04          4           100

$1/\sqrt{\lambda_2} = 1/\sqrt{0.042} = 4.9$
Distributional equivalence
Distributional equivalence: merging two rows with the same profile gives the same CA results
If rows i and i' are proportional: $n_{ij}/n_{i'j} = \alpha$, j = 1, ..., J
Pool these rows: $n_{i^{*}j} = n_{ij} + n_{i'j}$
⇒ The new distance between columns j and j' equals the old distance
Application: categories "sweet" and "sweet taste"; jobs "unqualified employee" and "employee".
⇒ Useful for text data (singular, plural, verb forms, etc.)
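Distributional equivalence can be illustrated numerically: pooling two proportional rows leaves the eigenvalues unchanged. In the sketch below the four-row table is a made-up variant of the first taste table in which the "acid" row has been split into two proportional rows; `ca_eig` is a hypothetical helper, not a FactoMineR function.

```r
# Eigenvalues of the CA of a contingency table (base R, via the SVD)
ca_eig <- function(tab) {
  f <- tab / sum(tab)
  indep <- outer(rowSums(f), colSums(f))
  svd((f - indep) / sqrt(indep))$d^2
}

# Rows 2 and 3 are proportional (alpha = 2): same profile (0, 0.75, 0.25)
split_tab <- rbind(sweet  = c(10, 0, 0),
                   acid_1 = c( 0, 6, 2),
                   acid_2 = c( 0, 3, 1),
                   bitter = c( 0, 1, 7))

# Pool the two proportional rows
pooled_tab <- rbind(split_tab["sweet", ],
                    split_tab["acid_1", ] + split_tab["acid_2", ],
                    split_tab["bitter", ])

eig_split  <- ca_eig(split_tab)
eig_pooled <- ca_eig(pooled_tab)
round(rbind(split = eig_split, pooled = eig_pooled), 4)
```

The same property holds for columns, which is why near-synonymous words with the same profile can be merged in text data without changing the analysis.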
CA - SVD - GSVD
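See Appendix, p. 35 of the Advanced PCA slides. Notation: $r = (f_{i.})$, $c = (f_{.j})$, $(f_{ij}) = X/N$
⇒ CA = GSVD$(f_{ij} - f_{i.} f_{.j},\ M = 1/f_{i.},\ D = 1/f_{.j})$, i.e. GSVD$(X/N - rc',\ D_r^{-1},\ D_c^{-1})$:
$$X/N - rc' = \tilde{U} \Lambda \tilde{V}', \qquad \tilde{U}' D_r^{-1} \tilde{U} = \tilde{V}' D_c^{-1} \tilde{V} = I$$
⇒ Equivalent SVD (centered and scaled):
$$D_r^{-1/2} (X/N - rc') D_c^{-1/2} = \left( \frac{f_{ij} - f_{i.} f_{.j}}{\sqrt{f_{i.} f_{.j}}} \right)_{ij} = P \Sigma^{1/2} Q'$$
so that $\tilde{U} = D_r^{1/2} P$, $\tilde{V} = D_c^{1/2} Q$ and $\Lambda = \Sigma^{1/2} = \mathrm{diag}(\sqrt{\lambda_q})$
$U = D_r^{-1/2} P$; row coordinates $F = D_r^{-1/2} P \Sigma^{1/2}$
$V = D_c^{-1/2} Q$; column coordinates $G = D_c^{-1/2} Q \Sigma^{1/2}$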
Reconstitution in CA
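⇒ with Q dimensions (with $U = D_r^{-1/2} P$, $V = D_c^{-1/2} Q$ and $\Lambda = \Sigma^{1/2} = \mathrm{diag}(\sqrt{\lambda_q})$, all truncated to the first Q dimensions):
$$\hat{X}/N \approx D_r^{1/2} P \Sigma^{1/2} Q' D_c^{1/2} + rc' = D_r U \Lambda V' D_c + rc'$$
$$\hat{X}_{ij}/N \approx r_i c_j \left(1 + \sum_{q=1}^{Q} \sqrt{\lambda_q}\, u_{iq} v_{jq}\right)$$
$$\hat{X}_{ij}/N \approx r_i c_j \left(1 + \sum_{q=1}^{Q} \frac{1}{\sqrt{\lambda_q}} F_{iq} G_{jq}\right)$$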
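The reconstitution formula can be verified in base R: with all the non-trivial dimensions the table is recovered exactly, and with fewer dimensions the margins are still preserved. This is a sketch on the small 4 x 4 table of the contribution example; the function `reconstruct` and the other names are hypothetical.

```r
tab <- matrix(c(1,  1,  0, 0,
                5, 10, 10, 0,
                0, 10, 10, 5,
                0,  0,  1, 1), nrow = 4, byrow = TRUE)
n  <- sum(tab)
f  <- tab / n
r  <- rowSums(f)
cc <- colSums(f)
sv <- svd((f - outer(r, cc)) / sqrt(outer(r, cc)))
u  <- sv$u / sqrt(r)      # standard row coordinates u_iq
v  <- sv$v / sqrt(cc)     # standard column coordinates v_jq

# Rank-Q reconstitution: n * r_i * c_j * (1 + sum_q sqrt(lambda_q) u_iq v_jq)
reconstruct <- function(Q) {
  k <- seq_len(Q)
  inter <- u[, k, drop = FALSE] %*% diag(sv$d[k], nrow = Q) %*%
    t(v[, k, drop = FALSE])
  n * outer(r, cc) * (1 + inter)
}

round(reconstruct(1), 2)            # approximation with one dimension
max(abs(reconstruct(3) - tab))      # exact with min(I-1, J-1) = 3 dimensions
```

With Q = 0 the formula reduces to the independence table $n\, r_i c_j$; each added dimension brings the reconstitution closer to the observed counts.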
Log linear model and CA
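⇒ The saturated log-linear model for $X_{I \times J}$ is $\log \mu_{ij} = \alpha_i + \beta_j + \Gamma_{ij}$
⇒ The Row-Column RC(Q) association model (Goodman, 1985):
$$\log \mu_{ij} = \alpha_i + \beta_j + \sum_{q=1}^{Q} \lambda_q u_{iq} v_{jq}$$
Estimation is difficult: non-convexity of the rank constraint
⇒ CA reconstruction:
$$\hat{X}_{ij}/N \approx r_i c_j \left(1 + \sum_{q=1}^{Q} \sqrt{\lambda_q}\, u_{iq} v_{jq}\right)$$
⇒ Approximation (de Falguerolles, Gower, van der Heijden, de Leeuw), using $\log(1 + x) \approx x$ when the association is weak:
$$\log(\hat{x}_{ij}) \approx \log(N) + \log(r_i) + \log(c_j) + \sum_{q=1}^{Q} \sqrt{\lambda_q}\, u_{iq} v_{jq}$$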
Code to compute CA (Greenacre)
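# data_set: contingency table (matrix or data frame of counts), defined beforehand
data.P<-as.matrix(data_set)/sum(data_set)   # correspondence matrix (f_ij)
data.r<-apply(data.P,1,sum)                 # row masses f_i.
data.c<-apply(data.P,2,sum)                 # column masses f_.j
data.Dr<-diag(data.r)
data.Dc<-diag(data.c)
data.Drmh<-diag(1/sqrt(data.r))             # Dr^(-1/2)
data.Dcmh<-diag(1/sqrt(data.c))             # Dc^(-1/2)
# standardized residuals (f_ij - f_i. f_.j) / sqrt(f_i. f_.j)
data.S<-data.Drmh%*%(data.P-data.r%o%data.c)%*%data.Dcmh
data.svd<-svd(data.S)
data.rsc<-data.Drmh%*%data.svd$u            # standard row coordinates
data.csc<-data.Dcmh%*%data.svd$v            # standard column coordinates
data.rpc<-data.rsc%*%diag(data.svd$d)       # principal row coordinates F
data.cpc<-data.csc%*%diag(data.svd$d)       # principal column coordinates G
# map of the rows on the first two axes, same scale on both axes
plot(data.rpc[,1],data.rpc[,2],type="n",asp=1)
text(data.rpc[,1],data.rpc[,2],labels=rownames(data_set))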
CA with FactoMineR
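library(FactoMineR)
# Perfume data: 12 perfumes (rows) described by 39 words (columns)
perfume <- read.table("http://factominer.free.fr/docs/perfume.txt",
header=TRUE,sep="\t",row.names = 1)
rownames(perfume)[4] <- "Cinema"
# CA on the first 15 words; words 16 to 39 are supplementary columns
res.ca <- CA(perfume, col.sup = 16:39)
plot(res.ca, invisible = "row")                 # columns only
plot(res.ca, invisible = c("col", "col.sup"))   # rows only
plot(res.ca, cex = 0.6, selectCol = "contrib 20")
summary(res.ca)
# Eigenvalues
res.ca$eig
barplot(res.ca$eig[,1], main = "Eigenvalues",
names.arg = 1:nrow(res.ca$eig))
# Coordinates, quality of representation (cos2) and contributions
res.ca$row$coord; res.ca$row$cos2; res.ca$row$contrib
res.ca$col$coord; res.ca$col$cos2; res.ca$col$contrib
# Chi-square test on the active columns: expected counts and residuals
chit <- chisq.test(perfume[,1:15]); chit$expected; chit$residuals
# Row profiles and column profiles
round(prop.table(as.matrix(perfume[,1:15]),1),4)
round(prop.table(as.matrix(perfume[,1:15]),2),4)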
CA in the R packages
• ade4 (Chessel et al.)
• anacor (de Leeuw and Mair)
• ca (Nenadic and Greenacre)
• FactoMineR (Husson et al.)
• homals (de Leeuw)
• vegan (Dixon)
Bibliography
Benzécri J. P. (1992). Correspondence Analysis Handbook (transl. T. K. Gopalan). Marcel Dekker, New York.
Blasius J. & Greenacre M. J. (2014). Visualization and Verbalization of Data. CRC.
Greenacre M. J. (2007). Correspondence Analysis in Practice. Chapman & Hall/CRC.
Greenacre M. J. & Blasius J. (2006). Multiple Correspondence Analysis and Related Methods. Chapman & Hall/CRC.
Greenacre M. J. (1984). Theory and Applications of Correspondence Analysis. Academic Press.
Husson F., Lê S. & Pagès J. (2010). Exploratory Multivariate Analysis by Example Using R. Chapman & Hall. YouTube playlist.
Le Roux B. & Rouanet H. (2004). Geometric Data Analysis: From Correspondence Analysis to Structured Data Analysis.
Murtagh F. (2005). Correspondence Analysis and Data Coding with R and Java. Chapman & Hall/CRC.
