Clustering Course Slides



Hierarchical clustering

François Husson

Applied Mathematics Department - Rennes Agrocampus

husson@agrocampus-ouest.fr


Hierarchical clustering

1 Introduction
2 Principles of hierarchical clustering
3 Example
4 K-means: a partitioning algorithm
5 Extras
• Making partitions more robust
• Clustering in high dimensions
• Qualitative variables and clustering
• Combining factor analysis and clustering
6 Describing classes of individuals



Introduction

• Definitions:
  • Clustering: making or building classes
  • Class: a set of individuals (or objects) with similar, shared characteristics
• Examples:
  • of clusterings: the animal kingdom, a computer hard disk, the geographic division of France, etc.
  • of classes: social classes, political classes, etc.
• Two types of clustering:
  • hierarchical: a tree
  • partitioning methods


Hierarchical example: the animal kingdom

[Figure: the animal kingdom organized as a hierarchical tree]



What data? What goals?

Clustering applies to data tables: rows of individuals, columns of quantitative variables.

Goals: build a tree structure that:
• shows hierarchical links between individuals or groups of individuals
• detects a "natural" number of classes in the population

[Figure: a small example dendrogram, heights 0 to 4]

Criteria

Measuring the similarity between individuals:
• Euclidean distance
• similarity indices
• etc.

Measuring the similarity between groups of individuals:
• minimum jump, or single linkage (smallest distance between the two groups)
• complete linkage (largest distance between the two groups)
• Ward criterion


Algorithm
Worked example: agglomerate 8 individuals A–H from their distance matrix, using the minimum jump (single linkage): the distance between two groups is the smallest distance between their members.

Initial distances, one class per individual {A},{B},...,{H}:

      A     B     C     D     E     F     G
B  0.50
C  0.25  0.56
D  5.00  4.72  4.80
E  5.78  5.55  5.57  1.00
F  4.32  4.23  4.07  2.01  2.06
G  4.92  4.84  4.68  2.06  1.81  0.61
H  5.00  5.02  4.75  3.16  2.90  1.28  1.12

1st grouping: merge A and C at height 0.25 → {AC},{B},{D},{E},{F},{G},{H}
2nd grouping: merge {AC} and B at height 0.50 → {ABC},{D},{E},{F},{G},{H}
3rd grouping: merge F and G at height 0.61 → {ABC},{D},{E},{FG},{H}
4th grouping: merge D and E at height 1.00 → {ABC},{DE},{FG},{H}
5th grouping: merge {FG} and H at height 1.12 → {ABC},{DE},{FGH}
6th grouping: merge {DE} and {FGH} at height 1.81 → {ABC},{DEFGH}
7th grouping: merge {ABC} and {DEFGH} at height 4.07 → {ABCDEFGH}

[Figure: the resulting dendrogram on A–H, with the merge heights on the vertical axis]

Trees and partitions

Trees always end up being... cut!

Choosing a height at which to cut the tree gives a partition.

[Figure: Ward dendrogram of decathlon athletes with the bar chart of inertia gains, and the interactive prompt "Click to cut the tree"]

Remark: given how it was built, the partition obtained this way is interesting but not optimal.


Partition quality

When is a partition a good one?

• if individuals placed in the same class are close to each other
• if individuals in different classes are far from each other

Mathematically speaking:
• small within-class variability
• large between-class variability

⇒ Two criteria. Which one should we use?


Partition quality

Let $\bar{x}_k$ be the overall mean of variable $k$ and $\bar{x}_{qk}$ its mean in class $q$. Then

$$\underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}\left(x_{iqk}-\bar{x}_{k}\right)^{2}}_{\text{total inertia}} \;=\; \underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}\left(x_{iqk}-\bar{x}_{qk}\right)^{2}}_{\text{within-class inertia}} \;+\; \underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}\left(\bar{x}_{qk}-\bar{x}_{k}\right)^{2}}_{\text{between-class inertia}}$$

The total inertia does not depend on the partition, so minimizing within-class inertia and maximizing between-class inertia are one and the same thing:

⇒ one criterion only!

Partition quality
Partition quality is measured by:

$$0 \;\le\; \frac{\text{between-class inertia}}{\text{total inertia}} \;\le\; 1$$

• between / total = 0 ⇒ ∀k, ∀q: x̄_qk = x̄_k — for every variable, all classes have the same mean: useless for classifying.
• between / total = 1 ⇒ ∀k, ∀q, ∀i: x_iqk = x̄_qk — individuals in the same class are identical: ideal for classifying.

Warning: don't take this criterion at face value: it depends on the number of individuals and on the number of classes.
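The ratio is easy to compute by hand; a minimal sketch, assuming X is a numeric matrix and part a factor giving the class of each row:

total  <- sum(scale(X, scale = FALSE)^2)      # total inertia (sum of squared deviations)
within <- sum(sapply(split(as.data.frame(X), part),
                     function(g) sum(scale(g, scale = FALSE)^2)))
(total - within) / total                      # between-class inertia / total inertia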

Ward’s method
• Initialization: 1 class = 1 individual ⇒ between-class inertia = total inertia
• At each step: merge the two classes a and b that minimize the loss in between-class inertia, i.e. the last term of

$$\text{Inertia}(a \cup b) \;=\; \text{Inertia}(a) + \text{Inertia}(b) \;+\; \underbrace{\frac{m_a\, m_b}{m_a + m_b}\; d^2(a,b)}_{\text{to be minimized}}$$

where m_a and m_b are the weights of the two classes and d(a,b) the distance between their centers of gravity.

Consequences: Ward tends to merge classes with small weights and classes with similar centers of gravity, and it avoids the chain effect of single linkage, so its tree can be used directly for clustering.

[Figure: dendrograms from single linkage ("saut minimum") and from Ward on two simulated two-group datasets; single linkage chains through intermediate points, while Ward recovers the two compact groups]
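In R, Ward's criterion is available through hclust; a short sketch, assuming X is a data frame of quantitative variables (standardized so each variable counts equally):

d    <- dist(scale(X))                  # Euclidean distances on standardized data
tree <- hclust(d, method = "ward.D2")   # Ward criterion (squares the distances internally)
plot(tree)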


Temperature data
• 23 individuals: European capitals
• 12 variables: mean monthly temperatures over 30 years
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Area
Amsterdam 2.9 2.5 5.7 8.2 12.5 14.8 17.1 17.1 14.5 11.4 7.0 4.4 West
Athens 9.1 9.7 11.7 15.4 20.1 24.5 27.4 27.2 23.8 19.2 14.6 11.0 South
Berlin -0.2 0.1 4.4 8.2 13.8 16.0 18.3 18.0 14.4 10.0 4.2 1.2 West
Brussels 3.3 3.3 6.7 8.9 12.8 15.6 17.8 17.8 15.0 11.1 6.7 4.4 West
Budapest -1.1 0.8 5.5 11.6 17.0 20.2 22.0 21.3 16.9 11.3 5.1 0.7 East
Copenhagen -0.4 -0.4 1.3 5.8 11.1 15.4 17.1 16.6 13.3 8.8 4.1 1.3 North
Dublin 4.8 5.0 5.9 7.8 10.4 13.3 15.0 14.6 12.7 9.7 6.7 5.4 North
Elsinki -5.8 -6.2 -2.7 3.1 10.2 14.0 17.2 14.9 9.7 5.2 0.1 -2.3 North
Kiev -5.9 -5.0 -0.3 7.4 14.3 17.8 19.4 18.5 13.7 7.5 1.2 -3.6 East
Krakow -3.7 -2.0 1.9 7.9 13.2 16.9 18.4 17.6 13.7 8.6 2.6 -1.7 East
Lisbon 10.5 11.3 12.8 14.5 16.7 19.4 21.5 21.9 20.4 17.4 13.7 11.1 South
London 3.4 4.2 5.5 8.3 11.9 15.1 16.9 16.5 14.0 10.2 6.3 4.4 North
Madrid 5.0 6.6 9.4 12.2 16.0 20.8 24.7 24.3 19.8 13.9 8.7 5.4 South
Minsk -6.9 -6.2 -1.9 5.4 12.4 15.9 17.4 16.3 11.6 5.8 0.1 -4.2 East
Moscow -9.3 -7.6 -2.0 6.0 13.0 16.6 18.3 16.7 11.2 5.1 -1.1 -6.0 East
Oslo -4.3 -3.8 -0.6 4.4 10.3 14.9 16.9 15.4 11.1 5.7 0.5 -2.9 North
Paris 3.7 3.7 7.3 9.7 13.7 16.5 19.0 18.7 16.1 12.5 7.3 5.2 West
Prague -1.3 0.2 3.6 8.8 14.3 17.6 19.3 18.7 14.9 9.4 3.8 0.3 East
Reykjavik -0.3 0.1 0.8 2.9 6.5 9.3 11.1 10.6 7.9 4.5 1.7 0.2 North
Rome 7.1 8.2 10.5 13.7 17.8 21.7 24.4 24.1 20.9 16.5 11.7 8.3 South
Sarajevo -1.4 0.8 4.9 9.3 13.8 17.0 18.9 18.7 15.2 10.5 5.1 0.8 South
Sofia -1.7 0.2 4.3 9.7 14.3 17.7 20.0 19.5 15.8 10.7 5.0 0.6 East
Stockholm -3.5 -3.5 -1.3 3.5 9.2 14.6 17.2 16.0 11.7 6.5 1.7 -1.6 North

Which cities have similar weather patterns?
How can we characterize groups of cities?

Temperature data: hierarchical tree

[Figure: hierarchical tree (dendrogram) of the 23 capitals, heights 0 to 6, with the bar chart of inertia gains]
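The tree can be reproduced with the FactoMineR package cited at the end of these slides; a hedged sketch, assuming the table above is loaded as temperature with Area as its 13th (qualitative) column:

library(FactoMineR)
res.pca  <- PCA(temperature, quali.sup = 13, graph = FALSE)  # Area as supplementary
res.hcpc <- HCPC(res.pca, graph = TRUE)   # Ward tree, inertia gains, suggested cut
res.hcpc$data.clust                       # the data with the class variable appended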


Temperature data

Loss in between-class inertia when going from:

23 clusters to 22 clusters: 0.01
22 clusters to 21 clusters: 0.01
21 clusters to 20 clusters: 0.01
…
9 clusters to 8 clusters: 0.15
8 clusters to 7 clusters: 0.16
7 clusters to 6 clusters: 0.27
6 clusters to 5 clusters: 0.29
5 clusters to 4 clusters: 0.60
4 clusters to 3 clusters: 0.76
3 clusters to 2 clusters: 2.36
2 clusters to 1 cluster: 6.76

Sum of the losses of inertia = 12 (the total inertia: with standardized variables, one unit per variable)

The loss is large when going from 2 clusters to a single cluster, so we prefer to keep 2 clusters.

[Figure: bar chart of these inertia gains]
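The same reading can be made from the merge heights of an hclust tree, since for Ward each height grows with the inertia lost by the merge; a sketch, assuming tree is such an hclust object for these data (built, for instance, as in the Ward sketch earlier):

gains <- rev(tree$height)        # heights of the merges 2->1, 3->2, 4->3, ...
barplot(gains[1:10], names.arg = 2:11,
        xlab = "number of clusters kept", ylab = "merge height")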



Using the tree to build a partition


Should we make 2 groups? 3? 4?

Cutting into 2 groups:

between-class inertia / total inertia = 6.76 / 12 = 56 %

What can we compare this percentage with?

[Figure: the dendrogram of the 23 capitals cut into 2 classes]

Using the tree to build a partition

56 % of the information is contained in this 2-class cut.

What can we compare this percentage with?

[Figure: the 23 capitals on the first factor plane (Dim 1: 82.90 %, Dim 2: 15.40 %), colored according to the 2-class cut]


Using the tree to build a partition


Separating the cold cities into 2 groups:

between-class inertia / total inertia = 2.36 / 12 = 20 %

[Figure: the dendrogram of the 23 capitals cut into 3 classes]


Using the tree to build a partition

Moving from 23 cities to 3 classes keeps 56 % + 20 % = 76 % of the variability in the data.

[Figure: the 23 capitals on the first factor plane (Dim 1: 82.90 %, Dim 2: 15.40 %), colored according to the 3-class cut]


Determining the number of classes


• from the bar plot of inertia gains
• from the tree itself
• depending on the intended use (survey, etc.)
• ultimate criterion: interpretability of the classes

[Figure: dendrogram of the 23 capitals with the bar chart of inertia gains]



Partitioning algorithm: K-means

Algorithm for aggregation around moving centers (K-means):

1. Choose Q centers of gravity at random
2. Assign each point to its closest center
3. Recompute the Q centers of gravity
4. Repeat steps 2 and 3 until the partition no longer changes

[Figure: the iterations on the temperature data, plotted on the first factor plane (Dim 1: 82.9 %, Dim 2: 15.4 %): random starting centers, assignment of each city to the closest center, updated centers of gravity, and so on until convergence]
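A minimal base-R sketch on the temperature data; kmeans depends on its random start, hence set.seed and several restarts (temperature[, 1:12] assumes the monthly columns of the table above):

set.seed(42)
km <- kmeans(scale(temperature[, 1:12]), centers = 3, nstart = 25)
km$cluster                  # the class of each city
km$betweenss / km$totss     # between-class / total inertia of the partition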



Robustifying a partition obtained using hierarchical clustering

The partition obtained by hierarchical clustering is not optimal; it can be improved and made more robust using K-means.

Algorithm:
• use the hierarchical partition to initialize K-means
• run a few iterations of K-means

⇒ a potentially improved partition

Advantage: a more robust partition
Disadvantage: loss of the hierarchical structure (the final partition may no longer be nested in the tree)
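FactoMineR's HCPC performs this consolidation when called with consol = TRUE; in base R it amounts to starting kmeans from the centers of gravity of the hierarchical classes. A sketch, assuming X holds the (standardized) data and cl the classes obtained from cutree:

centers <- apply(X, 2, function(v) tapply(v, cl, mean))  # one row of means per class
km <- kmeans(X, centers = centers, iter.max = 10)        # a few consolidation steps
table(cl, km$cluster)                                    # which individuals changed class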


Hierarchical clustering in high dimension

• If there are many variables: run PCA and keep only the first axes ⇒ back to the classical case
• If there are many individuals, the hierarchical algorithm takes too long:
  • use K-means to partition the data into around 100 classes
  • build the tree using these classes (weighted by the number of individuals in each class)
  • this gives the "top" of the tree

[Figure: tree built from the original data (about 300 individuals) versus tree built from 50 K-means classes; the class-based tree reproduces the "top" of the full tree]
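A sketch of this two-step strategy, assuming a large numeric matrix X; members is hclust's argument for starting from weighted classes rather than single individuals:

km   <- kmeans(scale(X), centers = 100, nstart = 5, iter.max = 50)
tree <- hclust(dist(km$centers), method = "ward.D2", members = km$size)
plot(tree)    # the "top" of the tree, built on the 100 class centers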

Hierarchical clustering on qualitative data

Two strategies:
• transform the data into quantitative variables: run MCA, keep only the first dimensions, and do the hierarchical clustering on the principal axes of the MCA
• use measures/indices suited to qualitative variables: similarity indices, the Jaccard index, etc.
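For the second strategy, a minimal sketch using a Jaccard-type distance on the indicator coding; quali is assumed to be a data frame of factors, and tab.disjonctif is FactoMineR's complete disjunctive coding:

library(FactoMineR)
X    <- tab.disjonctif(quali)          # one indicator column per category
d    <- dist(X, method = "binary")     # Jaccard-type dissimilarity on the indicators
tree <- hclust(d, method = "average")
plot(tree)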


Doing factor analysis followed by clustering

• Qualitative data: MCA outputs quantitative principal components
• Factor analysis eliminates the last components, which are mostly noise ⇒ a more stable clustering

[Diagram: PCA turns the data table (x.1, …, x.K) into components F1, …, FK; the first Q components carry the structure, the remaining ones the noise]
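A sketch of this route with FactoMineR; keeping the first 5 components (ncp = 5) is an illustrative choice, not a rule:

library(FactoMineR)
res.mca  <- MCA(quali, ncp = 5, graph = FALSE)   # keep only the first 5 components
res.hcpc <- HCPC(res.mca, graph = FALSE)         # tree built on the kept components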


Doing factor analysis followed by clustering

• Represent the tree and the classes on the factor axes
⇒ factor analysis gives continuous information, the tree gives discontinuous information; the tree hints at information hidden in further axes

[Figure: the hierarchical tree drawn in 3D above the factor map (Dim 1: 82.9 %), with clusters 1, 2 and 3 colored]


The class make-up: using "model individuals"

Model individuals: the ones closest to each class center (with their distances to that center):

Cluster 1: Oslo 0.339, Helsinki 0.884, Stockholm 0.9224, Minsk 0.9654, Moscow 1.7664
Cluster 2: Berlin 0.5764, Sarajevo 0.7164, Brussels 1.038, Prague 1.0556, Amsterdam 1.124
Cluster 3: Rome 0.360, Lisbon 1.737, Madrid 1.835, Athens 2.167

[Figure: the 23 capitals on the first factor plane (Dim 1: 82.90 %, Dim 2: 15.40 %) with the three clusters and their centers]
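With HCPC these model individuals (the "paragons") are returned directly; assuming the res.hcpc object from the earlier sketch:

res.hcpc$desc.ind$para   # for each class, the individuals closest to its center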

Characterizing/describing classes

• Goals:
  • find the variables that are most important for the partition
  • characterize a class (or a group of individuals) in terms of quantitative variables
  • sort the variables that best describe the classes
• Questions:
  • Which variables best characterize the partition?
  • How can we characterize the individuals of the 1st class?
  • Which variables describe them best?


Characterizing/describing classes
Which variables best represent the partition?
• For each quantitative variable:
  • build an analysis-of-variance model of the quantitative variable against the class variable
  • use Fisher's F-test to detect a class effect
• Sort the variables by increasing p-value

Eta2 P-value
October 0.8990 1.108e-10
March 0.8865 3.556e-10
November 0.8707 1.301e-09
September 0.8560 3.842e-09
April 0.8353 1.466e-08
February 0.8246 2.754e-08
December 0.7730 3.631e-07
January 0.7477 1.047e-06
August 0.7160 3.415e-06
July 0.6309 4.690e-05
May 0.5860 1.479e-04
June 0.5753 1.911e-04
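The table above is the kind of output produced by FactoMineR's catdes (also returned as desc.var by HCPC); a sketch, assuming dc is the temperature data with the class factor appended as its last column:

library(FactoMineR)
res.cat <- catdes(dc, num.var = ncol(dc))  # the class variable is the last column
res.cat$quanti.var                         # Eta2 and p-value for each variable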

Characterizing classes using quantitative variables

[Figure: for each month (January to December), the temperatures of the 23 cities plotted as points, with the cities of each class highlighted: cluster 1 (Elsinki, Kiev, Minsk, Moscow, Oslo, Reykjavik, Stockholm), cluster 2 (Amsterdam, Berlin, Brussels, Budapest, Copenhagen, Dublin, Krakow, London, Paris, Prague, Sarajevo, Sofia), cluster 3 (Athens, Lisbon, Madrid, Rome)]

Characterizing classes using quantitative variables

1st idea: if the values of X for class q look like a random draw from all the values of X, then X doesn't characterize class q.

[Figure: the January temperatures of class q compared with a random draw of the same size from all the January values]

2nd idea: the more unlikely a random draw appears, the more X characterizes class q.


Characterizing classes using quantitative variables


Idea: use as reference a random draw of n_q values among the N values of X.

What values can x̄_q take? (i.e., what is the distribution of X̄_q?)

$$\mathbb{E}(\bar{X}_q) = \bar{x} \qquad \mathbb{V}(\bar{X}_q) = \frac{s^2}{n_q}\left(\frac{N - n_q}{N - 1}\right)$$

$\mathcal{L}(\bar{X}_q) = \mathcal{N}$ because $\bar{X}_q$ is a mean, hence

$$\text{test statistic} = \frac{\bar{x}_q - \bar{x}}{\sqrt{\frac{s^2}{n_q}\,\frac{N - n_q}{N - 1}}} \;\sim\; \mathcal{N}(0, 1)$$

• If |test statistic| ≥ 1.96, then X characterizes class q
• The larger the |test statistic|, the better X characterizes class q

Idea: rank the variables by decreasing |test statistic| (the v.test of the outputs below).
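A hand computation of this statistic; x is assumed to hold all N values of the variable and xq the n_q values of class q, with s² taken as the population variance of x (one common convention):

v.test <- function(x, xq) {
  N  <- length(x); nq <- length(xq)
  s2 <- var(x) * (N - 1) / N                            # population variance
  (mean(xq) - mean(x)) / sqrt(s2 / nq * (N - nq) / (N - 1))
}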

Characterizing classes using quantitative variables

$quanti$`1`
v.test Mean in Overall sd in Overall p.value
category mean category sd
July -1.99 16.80 18.90 2.450 3.33 0.046100
June -2.06 14.70 16.80 2.520 3.07 0.039600
August -2.48 15.50 18.30 2.260 3.53 0.013100
May -2.55 10.80 13.30 2.430 2.96 0.010800
September -3.14 11.00 14.70 1.670 3.68 0.001710
January -3.26 -5.14 0.17 2.630 5.07 0.001130
December -3.27 -2.91 1.84 1.830 4.52 0.001080
November -3.36 0.60 5.08 0.940 4.14 0.000781
April -3.39 4.67 8.38 1.550 3.40 0.000706
February -3.44 -4.60 0.96 2.340 5.01 0.000577
October -3.45 5.76 10.10 0.919 3.87 0.000553
March -3.68 -1.14 4.06 1.100 4.39 0.000238


Characterizing classes using quantitative variables

$`2`
NULL
$`3`
v.test Mean in Overall sd in Overall p.value
category mean category sd
September 3.81 21.20 14.70 1.54 3.68 0.000140
October 3.72 16.80 10.10 1.91 3.87 0.000201
August 3.71 24.40 18.30 1.88 3.53 0.000211
November 3.69 12.20 5.08 2.26 4.14 0.000222
July 3.60 24.50 18.90 2.09 3.33 0.000314
April 3.53 14.00 8.38 1.18 3.40 0.000413
March 3.45 11.10 4.06 1.27 4.39 0.000564
February 3.43 8.95 0.96 1.74 5.01 0.000593
June 3.39 21.60 16.80 1.86 3.07 0.000700
December 3.39 8.95 1.84 2.34 4.52 0.000706
January 3.29 7.92 0.17 2.08 5.07 0.000993
May 3.18 17.60 13.30 1.55 2.96 0.001460


Characterizing classes using qualitative variables

Which variables best characterize the partition?

• For each qualitative variable, do a χ² test between it and the class variable
• Sort the variables by increasing p-value

$test.chi2
p.value df
Area 0.001195843 6


Characterizing classes using qualitative variables


Does the South category characterize the 3rd class?

            Cluster 3   Other clusters   Total
South       n_mc = 4          1          n_m = 5
Not South      0             18            18
Total       n_c = 4          19          n = 23

Test: H0: n_mc / n_c = n_m / n versus H1: South abnormally over-represented in cluster 3

Under H0: L(N_mc) = H(n_c, n_m / n, n), and the p-value is P_H0(N_mc ≥ n_mc)

Cluster 3:
            Cla/Mod   Mod/Cla   Global   p.value    v.test
Area=South     80       100     21.74    0.000564   3.448

Cla/Mod = 4/5 × 100 = 80 ; Mod/Cla = 4/4 × 100 = 100 ; Global = 5/23 × 100 = 21.74 ;
P_{H(4, 5/23, 23)}[N_mc ≥ 4] = 0.000564

⇒ H0 rejected: South is over-represented in the 3rd class

Sort the categories in terms of p-values.
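This p-value is a hypergeometric tail probability and can be checked in base R:

# draw n_c = 4 cities (cluster 3) from n = 23, of which n_m = 5 are "South";
# P(N_mc >= 4):
phyper(4 - 1, m = 5, n = 23 - 5, k = 4, lower.tail = FALSE)   # 0.000564...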

Characterizing classes using factor axes

These are also quantitative variables

$‘1‘
v.test Mean in Overall sd in Overall p.value
category mean category sd
Dim.1 -3.32 -3.37 0 0.85 3.15 0.000908
$‘2‘
v.test Mean in Overall sd in Overall p.value
category mean category sd
Dim.3 -2.41 -0.18 0 0.22 0.36 0.015776
$‘3‘
v.test Mean in Overall sd in Overall p.value
category mean category sd
Dim.1 3.86 5.66 0 1.26 3.15 0.000112


Conclusions

• Clustering can be done on tables of individuals × quantitative variables (MCA transforms qualitative variables into quantitative ones)
• Hierarchical clustering gives a hierarchical tree ⇒ a number of classes
• K-means can be used to make the classes more robust
• Classes are characterized by active and supplementary variables, quantitative or qualitative


More

Husson F., Lê S. & Pagès J. (2017). Exploratory Multivariate Analysis by Example Using R. 2nd edition, 230 p., Chapman & Hall/CRC.

The FactoMineR package for performing clustering:
http://factominer.free.fr/index_fr.html

Movies on YouTube:
• a YouTube channel: youtube.com/HussonFrancois
• a playlist with 11 movies in English
• a playlist with 17 movies in French
