Clustering Course Slides



Hierarchical clustering

François Husson

Applied Mathematics Department - Rennes Agrocampus

husson@agrocampus-ouest.fr


Hierarchical clustering

1 Introduction
2 Principles of hierarchical clustering
3 Example
4 K-means: a partitioning algorithm
5 Extras
• Making partitions more robust
• Clustering in high dimensions
• Qualitative variables and clustering
• Combining factor analysis and clustering
6 Describing classes of individuals



Introduction

• Definitions:
  • Clustering: making or building classes
  • Class: a set of individuals (or objects) with similar, shared characteristics
• Examples:
  • of clusterings: the animal kingdom, a computer hard disk, the geographic division of France, etc.
  • of classes: social classes, political classes, etc.
• Two types of clustering:
  • hierarchical: a tree
  • partitioning methods


Hierarchical example: the animal kingdom

[Figure: the animal kingdom organized as a hierarchical tree]



What data? What goals?

Clustering applies to data tables: rows of individuals, columns of quantitative variables.

Goals: build a tree structure that:
• shows hierarchical links between individuals or groups of individuals
• detects a "natural" number of classes in the population

[Figure: a small example dendrogram, heights 0 to 4]

Criteria

Measuring the similarity between individuals:
• Euclidean distance
• similarity indices
• etc.

Measuring the similarity between groups of individuals:
• minimum jump, or single linkage (smallest distance between the two groups)
• complete linkage (largest distance between the two groups)
• Ward criterion


Algorithm
Worked example: agglomerate 8 individuals A–H from their distance matrix, using the minimum jump (single linkage): the distance between two groups is the smallest distance between their members.

Initial distances, one class per individual {A},{B},...,{H}:

      A     B     C     D     E     F     G
B  0.50
C  0.25  0.56
D  5.00  4.72  4.80
E  5.78  5.55  5.57  1.00
F  4.32  4.23  4.07  2.01  2.06
G  4.92  4.84  4.68  2.06  1.81  0.61
H  5.00  5.02  4.75  3.16  2.90  1.28  1.12

1st grouping: merge A and C at height 0.25 → {AC},{B},{D},{E},{F},{G},{H}
2nd grouping: merge {AC} and B at height 0.50 → {ABC},{D},{E},{F},{G},{H}
3rd grouping: merge F and G at height 0.61 → {ABC},{D},{E},{FG},{H}
4th grouping: merge D and E at height 1.00 → {ABC},{DE},{FG},{H}
5th grouping: merge {FG} and H at height 1.12 → {ABC},{DE},{FGH}
6th grouping: merge {DE} and {FGH} at height 1.81 → {ABC},{DEFGH}
7th grouping: merge {ABC} and {DEFGH} at height 4.07 → {ABCDEFGH}

[Figure: the resulting dendrogram on A–H, with the merge heights on the vertical axis]

Trees and partitions

Trees always end up being... cut!

Choosing a height at which to cut the tree gives a partition.

[Figure: Ward dendrogram of decathlon athletes with the bar chart of inertia gains, and the interactive prompt "Click to cut the tree"]

Remark: given how it was built, the partition obtained this way is interesting but not optimal.


Partition quality

When is a partition a good one?

• if individuals placed in the same class are close to each other
• if individuals in different classes are far from each other

Mathematically speaking:
• small within-class variability
• large between-class variability

⇒ Two criteria. Which one should we use?


Partition quality

Let $\bar{x}_k$ be the overall mean of variable $k$ and $\bar{x}_{qk}$ its mean in class $q$. Then

$$\underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}\left(x_{iqk}-\bar{x}_{k}\right)^{2}}_{\text{total inertia}} \;=\; \underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}\left(x_{iqk}-\bar{x}_{qk}\right)^{2}}_{\text{within-class inertia}} \;+\; \underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}\left(\bar{x}_{qk}-\bar{x}_{k}\right)^{2}}_{\text{between-class inertia}}$$

The total inertia does not depend on the partition, so minimizing within-class inertia and maximizing between-class inertia are one and the same thing:

⇒ one criterion only!

Partition quality
Partition quality is measured by:

$$0 \;\le\; \frac{\text{between-class inertia}}{\text{total inertia}} \;\le\; 1$$

• between / total = 0 ⇒ ∀k, ∀q: x̄_qk = x̄_k — for every variable, all classes have the same mean: useless for classifying.
• between / total = 1 ⇒ ∀k, ∀q, ∀i: x_iqk = x̄_qk — individuals in the same class are identical: ideal for classifying.

Warning: don't take this criterion at face value: it depends on the number of individuals and on the number of classes.
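The ratio is easy to compute by hand; a minimal sketch, assuming X is a numeric matrix and part a factor giving the class of each row:

total  <- sum(scale(X, scale = FALSE)^2)      # total inertia (sum of squared deviations)
within <- sum(sapply(split(as.data.frame(X), part),
                     function(g) sum(scale(g, scale = FALSE)^2)))
(total - within) / total                      # between-class inertia / total inertia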

Ward’s method
• Initialization: 1 class = 1 individual ⇒ between-class inertia = total inertia
• At each step: merge the two classes a and b that minimize the loss in between-class inertia, i.e. the last term of

$$\text{Inertia}(a \cup b) \;=\; \text{Inertia}(a) + \text{Inertia}(b) \;+\; \underbrace{\frac{m_a\, m_b}{m_a + m_b}\; d^2(a,b)}_{\text{to be minimized}}$$

where m_a and m_b are the weights of the two classes and d(a,b) the distance between their centers of gravity.

Consequences: Ward tends to merge classes with small weights and classes with similar centers of gravity, and it avoids the chain effect of single linkage, so its tree can be used directly for clustering.

[Figure: dendrograms from single linkage ("saut minimum") and from Ward on two simulated two-group datasets; single linkage chains through intermediate points, while Ward recovers the two compact groups]
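In R, Ward's criterion is available through hclust; a short sketch, assuming X is a data frame of quantitative variables (standardized so each variable counts equally):

d    <- dist(scale(X))                  # Euclidean distances on standardized data
tree <- hclust(d, method = "ward.D2")   # Ward criterion (squares the distances internally)
plot(tree)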


Temperature data
• 23 individuals: European capitals
• 12 variables: mean monthly temperatures over 30 years
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Area
Amsterdam 2.9 2.5 5.7 8.2 12.5 14.8 17.1 17.1 14.5 11.4 7.0 4.4 West
Athens 9.1 9.7 11.7 15.4 20.1 24.5 27.4 27.2 23.8 19.2 14.6 11.0 South
Berlin -0.2 0.1 4.4 8.2 13.8 16.0 18.3 18.0 14.4 10.0 4.2 1.2 West
Brussels 3.3 3.3 6.7 8.9 12.8 15.6 17.8 17.8 15.0 11.1 6.7 4.4 West
Budapest -1.1 0.8 5.5 11.6 17.0 20.2 22.0 21.3 16.9 11.3 5.1 0.7 East
Copenhagen -0.4 -0.4 1.3 5.8 11.1 15.4 17.1 16.6 13.3 8.8 4.1 1.3 North
Dublin 4.8 5.0 5.9 7.8 10.4 13.3 15.0 14.6 12.7 9.7 6.7 5.4 North
Elsinki -5.8 -6.2 -2.7 3.1 10.2 14.0 17.2 14.9 9.7 5.2 0.1 -2.3 North
Kiev -5.9 -5.0 -0.3 7.4 14.3 17.8 19.4 18.5 13.7 7.5 1.2 -3.6 East
Krakow -3.7 -2.0 1.9 7.9 13.2 16.9 18.4 17.6 13.7 8.6 2.6 -1.7 East
Lisbon 10.5 11.3 12.8 14.5 16.7 19.4 21.5 21.9 20.4 17.4 13.7 11.1 South
London 3.4 4.2 5.5 8.3 11.9 15.1 16.9 16.5 14.0 10.2 6.3 4.4 North
Madrid 5.0 6.6 9.4 12.2 16.0 20.8 24.7 24.3 19.8 13.9 8.7 5.4 South
Minsk -6.9 -6.2 -1.9 5.4 12.4 15.9 17.4 16.3 11.6 5.8 0.1 -4.2 East
Moscow -9.3 -7.6 -2.0 6.0 13.0 16.6 18.3 16.7 11.2 5.1 -1.1 -6.0 East
Oslo -4.3 -3.8 -0.6 4.4 10.3 14.9 16.9 15.4 11.1 5.7 0.5 -2.9 North
Paris 3.7 3.7 7.3 9.7 13.7 16.5 19.0 18.7 16.1 12.5 7.3 5.2 West
Prague -1.3 0.2 3.6 8.8 14.3 17.6 19.3 18.7 14.9 9.4 3.8 0.3 East
Reykjavik -0.3 0.1 0.8 2.9 6.5 9.3 11.1 10.6 7.9 4.5 1.7 0.2 North
Rome 7.1 8.2 10.5 13.7 17.8 21.7 24.4 24.1 20.9 16.5 11.7 8.3 South
Sarajevo -1.4 0.8 4.9 9.3 13.8 17.0 18.9 18.7 15.2 10.5 5.1 0.8 South
Sofia -1.7 0.2 4.3 9.7 14.3 17.7 20.0 19.5 15.8 10.7 5.0 0.6 East
Stockholm -3.5 -3.5 -1.3 3.5 9.2 14.6 17.2 16.0 11.7 6.5 1.7 -1.6 North

Which cities have similar weather patterns?
How can we characterize groups of cities?

Temperature data: hierarchical tree

[Figure: hierarchical tree (dendrogram) of the 23 capitals, heights 0 to 6, with the bar chart of inertia gains]
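The tree can be reproduced with the FactoMineR package cited at the end of these slides; a hedged sketch, assuming the table above is loaded as temperature with Area as its 13th (qualitative) column:

library(FactoMineR)
res.pca  <- PCA(temperature, quali.sup = 13, graph = FALSE)  # Area as supplementary
res.hcpc <- HCPC(res.pca, graph = TRUE)   # Ward tree, inertia gains, suggested cut
res.hcpc$data.clust                       # the data with the class variable appended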


Temperature data

Loss in between-class inertia when going from:

23 clusters to 22 clusters: 0.01
22 clusters to 21 clusters: 0.01
21 clusters to 20 clusters: 0.01
…
9 clusters to 8 clusters: 0.15
8 clusters to 7 clusters: 0.16
7 clusters to 6 clusters: 0.27
6 clusters to 5 clusters: 0.29
5 clusters to 4 clusters: 0.60
4 clusters to 3 clusters: 0.76
3 clusters to 2 clusters: 2.36
2 clusters to 1 cluster: 6.76

Sum of the losses of inertia = 12 (the total inertia: with standardized variables, one unit per variable)

The loss is large when going from 2 clusters to a single cluster, so we prefer to keep 2 clusters.

[Figure: bar chart of these inertia gains]
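The same reading can be made from the merge heights of an hclust tree, since for Ward each height grows with the inertia lost by the merge; a sketch, assuming tree is such an hclust object for these data (built, for instance, as in the Ward sketch earlier):

gains <- rev(tree$height)        # heights of the merges 2->1, 3->2, 4->3, ...
barplot(gains[1:10], names.arg = 2:11,
        xlab = "number of clusters kept", ylab = "merge height")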



Using the tree to build a partition


Should we make 2 groups? 3? 4?

Cutting into 2 groups:

between-class inertia / total inertia = 6.76 / 12 = 56 %

What can we compare this percentage with?

[Figure: the dendrogram of the 23 capitals cut into 2 classes]

Using the tree to build a partition

56 % of the information is contained in this 2-class cut.

What can we compare this percentage with?

[Figure: the 23 capitals on the first factor plane (Dim 1: 82.90 %, Dim 2: 15.40 %), colored according to the 2-class cut]


Using the tree to build a partition


Separating the cold cities into 2 groups:

between-class inertia / total inertia = 2.36 / 12 = 20 %

[Figure: the dendrogram of the 23 capitals cut into 3 classes]


Using the tree to build a partition

Moving from 23 cities to 3 classes keeps 56 % + 20 % = 76 % of the variability in the data.

[Figure: the 23 capitals on the first factor plane (Dim 1: 82.90 %, Dim 2: 15.40 %), colored according to the 3-class cut]


Determining the number of classes


• from the bar plot of inertia gains
• from the tree itself
• depending on the intended use (survey, etc.)
• ultimate criterion: interpretability of the classes

[Figure: dendrogram of the 23 capitals with the bar chart of inertia gains]



Partitioning algorithm: K-means

Algorithm for aggregation around moving centers (K-means):

1. Choose Q centers of gravity at random
2. Assign each point to its closest center
3. Recompute the Q centers of gravity
4. Repeat steps 2 and 3 until the partition no longer changes

[Figure: the iterations on the temperature data, plotted on the first factor plane (Dim 1: 82.9 %, Dim 2: 15.4 %): random starting centers, assignment of each city to the closest center, updated centers of gravity, and so on until convergence]
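A minimal base-R sketch on the temperature data; kmeans depends on its random start, hence set.seed and several restarts (temperature[, 1:12] assumes the monthly columns of the table above):

set.seed(42)
km <- kmeans(scale(temperature[, 1:12]), centers = 3, nstart = 25)
km$cluster                  # the class of each city
km$betweenss / km$totss     # between-class / total inertia of the partition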



Robustifying a partition obtained using hierarchical clustering

The partition obtained by hierarchical clustering is not optimal; it can be improved and made more robust using K-means.

Algorithm:
• use the hierarchical partition to initialize K-means
• run a few iterations of K-means

⇒ a potentially improved partition

Advantage: a more robust partition
Disadvantage: loss of the hierarchical structure (the final partition may no longer be nested in the tree)
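FactoMineR's HCPC performs this consolidation when called with consol = TRUE; in base R it amounts to starting kmeans from the centers of gravity of the hierarchical classes. A sketch, assuming X holds the (standardized) data and cl the classes obtained from cutree:

centers <- apply(X, 2, function(v) tapply(v, cl, mean))  # one row of means per class
km <- kmeans(X, centers = centers, iter.max = 10)        # a few consolidation steps
table(cl, km$cluster)                                    # which individuals changed class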


Hierarchical clustering in high dimension

• If there are many variables: run PCA and keep only the first axes ⇒ back to the classical case
• If there are many individuals, the hierarchical algorithm takes too long:
  • use K-means to partition the data into around 100 classes
  • build the tree using these classes (weighted by the number of individuals in each class)
  • this gives the "top" of the tree

[Figure: tree built from the original data (about 300 individuals) versus tree built from 50 K-means classes; the class-based tree reproduces the "top" of the full tree]
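A sketch of this two-step strategy, assuming a large numeric matrix X; members is hclust's argument for starting from weighted classes rather than single individuals:

km   <- kmeans(scale(X), centers = 100, nstart = 5, iter.max = 50)
tree <- hclust(dist(km$centers), method = "ward.D2", members = km$size)
plot(tree)    # the "top" of the tree, built on the 100 class centers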

Hierarchical clustering on qualitative data

Two strategies:
• transform the data into quantitative variables: run MCA, keep only the first dimensions, and do the hierarchical clustering on the principal axes of the MCA
• use measures/indices suited to qualitative variables: similarity indices, the Jaccard index, etc.
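For the second strategy, a minimal sketch using a Jaccard-type distance on the indicator coding; quali is assumed to be a data frame of factors, and tab.disjonctif is FactoMineR's complete disjunctive coding:

library(FactoMineR)
X    <- tab.disjonctif(quali)          # one indicator column per category
d    <- dist(X, method = "binary")     # Jaccard-type dissimilarity on the indicators
tree <- hclust(d, method = "average")
plot(tree)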


Doing factor analysis followed by clustering

• Qualitative data: MCA outputs quantitative principal components
• Factor analysis eliminates the last components, which are mostly noise ⇒ a more stable clustering

[Diagram: PCA turns the data table (x.1, …, x.K) into components F1, …, FK; the first Q components carry the structure, the remaining ones the noise]
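A sketch of this route with FactoMineR; keeping the first 5 components (ncp = 5) is an illustrative choice, not a rule:

library(FactoMineR)
res.mca  <- MCA(quali, ncp = 5, graph = FALSE)   # keep only the first 5 components
res.hcpc <- HCPC(res.mca, graph = FALSE)         # tree built on the kept components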


Doing factor analysis followed by clustering

• Represent the tree and the classes on the factor axes
⇒ factor analysis gives continuous information, the tree gives discontinuous information; the tree hints at information hidden in further axes

[Figure: the hierarchical tree drawn in 3D above the factor map (Dim 1: 82.9 %), with clusters 1, 2 and 3 colored]


The class make-up: using "model individuals"

Model individuals: the ones closest to each class center (with their distances to that center):

Cluster 1: Oslo 0.339, Helsinki 0.884, Stockholm 0.9224, Minsk 0.9654, Moscow 1.7664
Cluster 2: Berlin 0.5764, Sarajevo 0.7164, Brussels 1.038, Prague 1.0556, Amsterdam 1.124
Cluster 3: Rome 0.360, Lisbon 1.737, Madrid 1.835, Athens 2.167

[Figure: the 23 capitals on the first factor plane (Dim 1: 82.90 %, Dim 2: 15.40 %) with the three clusters and their centers]
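With HCPC these model individuals (the "paragons") are returned directly; assuming the res.hcpc object from the earlier sketch:

res.hcpc$desc.ind$para   # for each class, the individuals closest to its center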

Characterizing/describing classes

• Goals:
  • find the variables that are most important for the partition
  • characterize a class (or a group of individuals) in terms of quantitative variables
  • sort the variables that best describe the classes
• Questions:
  • Which variables best characterize the partition?
  • How can we characterize the individuals of the 1st class?
  • Which variables describe them best?


Characterizing/describing classes
Which variables best represent the partition?
• For each quantitative variable:
  • build an analysis-of-variance model of the quantitative variable against the class variable
  • use Fisher's F-test to detect a class effect
• Sort the variables by increasing p-value

Eta2 P-value
October 0.8990 1.108e-10
March 0.8865 3.556e-10
November 0.8707 1.301e-09
September 0.8560 3.842e-09
April 0.8353 1.466e-08
February 0.8246 2.754e-08
December 0.7730 3.631e-07
January 0.7477 1.047e-06
August 0.7160 3.415e-06
July 0.6309 4.690e-05
May 0.5860 1.479e-04
June 0.5753 1.911e-04
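The table above is the kind of output produced by FactoMineR's catdes (also returned as desc.var by HCPC); a sketch, assuming dc is the temperature data with the class factor appended as its last column:

library(FactoMineR)
res.cat <- catdes(dc, num.var = ncol(dc))  # the class variable is the last column
res.cat$quanti.var                         # Eta2 and p-value for each variable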

Characterizing classes using quantitative variables

[Figure: for each month (January to December), the temperatures of the 23 cities plotted as points, with the cities of each class highlighted: cluster 1 (Elsinki, Kiev, Minsk, Moscow, Oslo, Reykjavik, Stockholm), cluster 2 (Amsterdam, Berlin, Brussels, Budapest, Copenhagen, Dublin, Krakow, London, Paris, Prague, Sarajevo, Sofia), cluster 3 (Athens, Lisbon, Madrid, Rome)]

Characterizing classes using quantitative variables

1st idea: if the values of X for class q look like a random draw from all the values of X, then X doesn't characterize class q.

[Figure: the January temperatures of class q compared with a random draw of the same size from all the January values]

2nd idea: the more unlikely a random draw appears, the more X characterizes class q.


Characterizing classes using quantitative variables


Idea: use as reference a random draw of n_q values among the N values of X.

What values can x̄_q take? (i.e., what is the distribution of X̄_q?)

$$\mathbb{E}(\bar{X}_q) = \bar{x} \qquad \mathbb{V}(\bar{X}_q) = \frac{s^2}{n_q}\left(\frac{N - n_q}{N - 1}\right)$$

$\mathcal{L}(\bar{X}_q) = \mathcal{N}$ because $\bar{X}_q$ is a mean, hence

$$\text{test statistic} = \frac{\bar{x}_q - \bar{x}}{\sqrt{\frac{s^2}{n_q}\,\frac{N - n_q}{N - 1}}} \;\sim\; \mathcal{N}(0, 1)$$

• If |test statistic| ≥ 1.96, then X characterizes class q
• The larger the |test statistic|, the better X characterizes class q

Idea: rank the variables by decreasing |test statistic| (the v.test of the outputs below).
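A hand computation of this statistic; x is assumed to hold all N values of the variable and xq the n_q values of class q, with s² taken as the population variance of x (one common convention):

v.test <- function(x, xq) {
  N  <- length(x); nq <- length(xq)
  s2 <- var(x) * (N - 1) / N                            # population variance
  (mean(xq) - mean(x)) / sqrt(s2 / nq * (N - nq) / (N - 1))
}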

Characterizing classes using quantitative variables

$quanti$`1`
v.test Mean in Overall sd in Overall p.value
category mean category sd
July -1.99 16.80 18.90 2.450 3.33 0.046100
June -2.06 14.70 16.80 2.520 3.07 0.039600
August -2.48 15.50 18.30 2.260 3.53 0.013100
May -2.55 10.80 13.30 2.430 2.96 0.010800
September -3.14 11.00 14.70 1.670 3.68 0.001710
January -3.26 -5.14 0.17 2.630 5.07 0.001130
December -3.27 -2.91 1.84 1.830 4.52 0.001080
November -3.36 0.60 5.08 0.940 4.14 0.000781
April -3.39 4.67 8.38 1.550 3.40 0.000706
February -3.44 -4.60 0.96 2.340 5.01 0.000577
October -3.45 5.76 10.10 0.919 3.87 0.000553
March -3.68 -1.14 4.06 1.100 4.39 0.000238


Characterizing classes using quantitative variables

$`2`
NULL
$`3`
v.test Mean in Overall sd in Overall p.value
category mean category sd
September 3.81 21.20 14.70 1.54 3.68 0.000140
October 3.72 16.80 10.10 1.91 3.87 0.000201
August 3.71 24.40 18.30 1.88 3.53 0.000211
November 3.69 12.20 5.08 2.26 4.14 0.000222
July 3.60 24.50 18.90 2.09 3.33 0.000314
April 3.53 14.00 8.38 1.18 3.40 0.000413
March 3.45 11.10 4.06 1.27 4.39 0.000564
February 3.43 8.95 0.96 1.74 5.01 0.000593
June 3.39 21.60 16.80 1.86 3.07 0.000700
December 3.39 8.95 1.84 2.34 4.52 0.000706
January 3.29 7.92 0.17 2.08 5.07 0.000993
May 3.18 17.60 13.30 1.55 2.96 0.001460


Characterizing classes using qualitative variables

Which variables best characterize the partition?

• For each qualitative variable, do a χ² test between it and the class variable
• Sort the variables by increasing p-value

$test.chi2
p.value df
Area 0.001195843 6


Characterizing classes using qualitative variables


Does the South category characterize the 3rd class?

            Cluster 3   Other clusters   Total
South       n_mc = 4          1          n_m = 5
Not South      0             18            18
Total       n_c = 4          19          n = 23

Test: H0: n_mc / n_c = n_m / n versus H1: South abnormally over-represented in cluster 3

Under H0: L(N_mc) = H(n_c, n_m / n, n), and the p-value is P_H0(N_mc ≥ n_mc)

Cluster 3:
            Cla/Mod   Mod/Cla   Global   p.value    v.test
Area=South     80       100     21.74    0.000564   3.448

Cla/Mod = 4/5 × 100 = 80 ; Mod/Cla = 4/4 × 100 = 100 ; Global = 5/23 × 100 = 21.74 ;
P_{H(4, 5/23, 23)}[N_mc ≥ 4] = 0.000564

⇒ H0 rejected: South is over-represented in the 3rd class

Sort the categories in terms of p-values.
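This p-value is a hypergeometric tail probability and can be checked in base R:

# draw n_c = 4 cities (cluster 3) from n = 23, of which n_m = 5 are "South";
# P(N_mc >= 4):
phyper(4 - 1, m = 5, n = 23 - 5, k = 4, lower.tail = FALSE)   # 0.000564...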

Characterizing classes using factor axes

These are also quantitative variables

$‘1‘
v.test Mean in Overall sd in Overall p.value
category mean category sd
Dim.1 -3.32 -3.37 0 0.85 3.15 0.000908
$‘2‘
v.test Mean in Overall sd in Overall p.value
category mean category sd
Dim.3 -2.41 -0.18 0 0.22 0.36 0.015776
$‘3‘
v.test Mean in Overall sd in Overall p.value
category mean category sd
Dim.1 3.86 5.66 0 1.26 3.15 0.000112


Conclusions

• Clustering can be done on tables of individuals × quantitative variables (MCA transforms qualitative variables into quantitative ones)
• Hierarchical clustering gives a hierarchical tree ⇒ a number of classes
• K-means can be used to make the classes more robust
• Classes are characterized by active and supplementary variables, quantitative or qualitative


More

Husson F., Lê S. & Pagès J. (2017). Exploratory Multivariate Analysis by Example Using R. 2nd edition, 230 p., Chapman & Hall/CRC.

The FactoMineR package for performing clustering:
http://factominer.free.fr/index_fr.html

Movies on YouTube:
• a YouTube channel: youtube.com/HussonFrancois
• a playlist with 11 movies in English
• a playlist with 17 movies in French
