
Cluster Analysis

Vanishree M

Outline
• Introduction
• Types of clusters
• Conducting cluster analysis
• Selecting clustering procedure
• Hierarchical Cluster Analysis
• Hierarchical Cluster Analysis: Example
• Hierarchical clustering – a case
• K means clustering
• K means clustering: Example

What is a cluster?
• Clustering refers to the grouping of records, observations, or cases into
classes of similar objects.
• A cluster is a collection of records that are similar to one another and
dissimilar to records in other clusters.
• Clustering differs from classification in that there is no target variable for
clustering.
• The clustering task does not try to classify, estimate, or predict the value of a
target variable. Instead, clustering algorithms seek to segment the entire
data set into relatively homogeneous subgroups or clusters, where the
similarity of the records within the cluster is maximized, and the similarity to
records outside this cluster is minimized.

Cluster Analysis - Concept


• All clustering methods have as their
goal the identification of groups of
records such that similarity within a
group is very high while the
similarity to records in other groups
is very low.
• As shown in Figure, clustering
algorithms seek to construct
clusters of records such that the
between-cluster variation is large
compared to the within-cluster
variation. This is somewhat
analogous to the concept behind
analysis of variance.

An ideal cluster

[Figure: scatter plot of Variable 1 against Variable 2 showing compact, well-separated clusters.]

A Practical clustering situation

[Figure: scatter plot of Variable 1 against Variable 2 with less clearly separated groups and a point marked X.]

Cluster Analysis…

Before clustering, the analyst must decide:
• How to measure similarity;
• How to recode categorical variables;
• How to standardize or normalize numerical variables;
• How many clusters we expect to uncover.

Cluster Analysis…
• For optimal performance, clustering algorithms, just like algorithms for classification, require the data to be normalized so that no particular variable or subset of variables dominates the analysis. Analysts may use either min–max normalization or Z-score standardization:
▫ Min–max normalization: X* = (X − min(X)) / range(X)
▫ Z-score standardization: X* = (X − mean(X)) / SD(X)
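As a rough illustration of the two formulas above, here is a minimal NumPy sketch; the small data matrix is made up for the example.

```python
# Minimal sketch of min-max normalization and Z-score standardization with NumPy.
# The data matrix X is hypothetical (rows = cases, columns = variables).
import numpy as np

X = np.array([[6.0, 4.0],
              [2.0, 3.0],
              [7.0, 2.0],
              [1.0, 6.0]])

# Min-max normalization: X* = (X - min(X)) / range(X), column by column
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Z-score standardization: X* = (X - mean(X)) / SD(X), column by column
X_z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```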

A Classification of Clustering Procedures

• Hierarchical procedures
  ▫ Agglomerative: linkage methods (single linkage, complete linkage, average linkage), variance methods (Ward’s method), and centroid methods
  ▫ Divisive
• Nonhierarchical procedures: sequential threshold, parallel threshold, and optimizing partitioning
• Other procedures: two-step clustering

Conducting Cluster Analysis

1. Formulate the Problem
2. Select a Distance Measure
3. Select a Clustering Procedure
4. Decide on the Number of Clusters
5. Interpret and Profile Clusters
6. Assess the Validity of Clustering

Conducting Cluster Analysis: Hierarchical clustering

• Hierarchical clustering is characterized by the development of a hierarchy or


tree-like structure. Hierarchical methods can be agglomerative or divisive.
▫ Agglomerative clustering starts with each object in a separate cluster. Clusters are
formed by grouping objects into bigger and bigger clusters. This process is
continued until all objects are members of a single cluster.
▫ Divisive clustering starts with all the objects grouped in a single cluster. Clusters
are divided or split until each object is in a separate cluster.
▫ Agglomerative methods are commonly used in marketing research. They consist of
linkage methods, error sums of squares or variance methods, and centroid
methods.

Agglomerative and divisive clustering

[Figure: agglomerative clustering merges objects into larger and larger clusters from the bottom up; divisive clustering splits a single cluster from the top down.]

Conducting Cluster Analysis: Select a Clustering Procedure – Linkage Methods

• Single linkage: Minimal intercluster dissimilarity. Based on the minimum distance, or the nearest-neighbor rule; at every stage, the distance between two clusters is the distance between their two closest points. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the smallest of these dissimilarities.
• Complete linkage: Maximal intercluster dissimilarity. Based on the maximum distance, or the furthest-neighbor approach; the distance between two clusters is calculated as the distance between their two furthest points. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the largest of these dissimilarities.
• Average linkage: Mean intercluster dissimilarity. The distance between two clusters is defined as the average of the distances between all pairs of objects. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the average of these dissimilarities.
• Centroid linkage: Dissimilarity between the centroid for cluster A (a mean vector of length p) and the centroid for cluster B. Centroid linkage can result in undesirable inversions.
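To make the linkage rules concrete, here is a small sketch that computes each cluster-to-cluster distance from the pairwise distances between two hypothetical clusters; the coordinates are invented for illustration.

```python
# Single, complete, average, and centroid linkage distances between two clusters.
import numpy as np
from scipy.spatial.distance import cdist

cluster_a = np.array([[1.0, 3.0], [1.0, 2.0]])              # hypothetical cluster A
cluster_b = np.array([[4.0, 3.0], [5.0, 3.0], [4.0, 2.0]])  # hypothetical cluster B

d = cdist(cluster_a, cluster_b)   # all pairwise Euclidean distances between A and B

single_linkage   = d.min()    # smallest pairwise dissimilarity
complete_linkage = d.max()    # largest pairwise dissimilarity
average_linkage  = d.mean()   # mean of all pairwise dissimilarities
centroid_linkage = np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0))
```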

Linkage Methods of Clustering

[Figure: single linkage joins Cluster 1 and Cluster 2 at their minimum distance, complete linkage at their maximum distance, and average linkage at the average distance between them.]

Conducting Cluster Analysis: Select a Clustering Procedure – Variance Method


• The variance methods attempt to generate clusters so as to minimize the within-cluster variance.
• A commonly used variance method is Ward's procedure. For each cluster, the means of all the variables are computed. Then, for each object, the squared Euclidean distance to its cluster mean is calculated, and these distances are summed over all objects. At each stage, the two clusters whose merger produces the smallest increase in the overall within-cluster sum of squared distances are combined.
• This approach does not combine the two most similar objects successively. Instead, it combines those objects whose merger increases the overall within-cluster variance to the smallest possible degree. If you expect somewhat equally sized clusters and the data set does not include outliers, you should use Ward's method.
• In the centroid methods, the distance between two clusters is the distance between their centroids (the means for all the variables). Every time objects are grouped, a new centroid is computed.
• Of the hierarchical methods, average linkage and Ward's method have been shown to perform better than the other procedures.

Other Agglomerative Clustering Methods

[Figure: illustrations of Ward's procedure and the centroid method.]

Conducting Cluster Analysis: Nonhierarchical clustering


• The nonhierarchical clustering methods are frequently referred to as k-means
clustering. These methods include sequential threshold, parallel threshold, and
optimizing partitioning.
• In the sequential threshold method, a cluster center is selected and all objects within
a prespecified threshold value from the center are grouped together. Then a new
cluster center or seed is selected, and the process is repeated for the unclustered
points. Once an object is clustered with a seed, it is no longer considered for
clustering with subsequent seeds.
• The parallel threshold method operates similarly, except that several cluster centers
are selected simultaneously and objects within the threshold level are grouped with
the nearest center.
• The optimizing partitioning method differs from the two threshold procedures in that objects can later be reassigned to clusters to optimize an overall criterion, such as the average within-cluster distance for a given number of clusters.

Conducting Cluster Analysis: Select a Clustering Procedure


• It has been suggested that the hierarchical and nonhierarchical methods be
used in tandem. First, an initial clustering solution is obtained using a
hierarchical procedure, such as average linkage or Ward's. The number of
clusters and cluster centroids so obtained are used as inputs to the optimizing
partitioning method.

• The choice of a clustering method and the choice of a distance measure are interrelated. For example, squared Euclidean distances should be used with Ward's and centroid methods. Several nonhierarchical procedures also use squared Euclidean distances.
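A minimal sketch of this tandem approach, assuming the data are already standardized in an array X (placeholder data below): Ward's method supplies the number of clusters and starting centroids for a subsequent k-means run.

```python
# Tandem clustering sketch: a Ward hierarchical solution seeds a k-means run.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(20, 2))   # placeholder data matrix

Z = linkage(X, method="ward")                        # agglomerative, Ward's method
labels = fcluster(Z, t=3, criterion="maxclust")      # cut the tree into 3 clusters

# Use the hierarchical cluster centroids as initial centers for k-means
init_centers = np.vstack([X[labels == k].mean(axis=0) for k in (1, 2, 3)])
km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(X)
```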

Hierarchical Cluster Analysis: Example


• The first step is to decide on the characteristics that you will use to segment
your customers.
• Decide which clustering variables will be included in the analysis.
• For example, you may want to segment a market based on customers’ price
consciousness (x) and brand loyalty (y).
• These two variables can be measured on a 7-point scale, with higher values denoting a higher degree of price consciousness and brand loyalty. The values for the seven respondents are shown in the table and scatter plot that follow.

Hierarchical Cluster Analysis: Example – Data and scatterplot

[Table and figure: the seven respondents' price consciousness (x) and brand loyalty (y) values and the corresponding scatter plot – not reproduced here.]

Clustering Algorithm – An example


• Calculate the Euclidean distance between each pair of objects and set up the distance matrix.
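A small sketch of this step, computing the full Euclidean distance matrix from (x, y) coordinates; the coordinates below are placeholders, not the respondent values from the table.

```python
# Build the pairwise Euclidean distance matrix for a set of objects.
import numpy as np
from scipy.spatial.distance import pdist, squareform

coords = np.array([[3.0, 7.0],   # hypothetical object A
                   [6.0, 7.0],   # hypothetical object B
                   [5.0, 6.0],   # hypothetical object C
                   [6.0, 5.0]])  # hypothetical object D

dist_matrix = squareform(pdist(coords, metric="euclidean"))
print(np.round(dist_matrix, 3))
```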

Clustering Algorithm – An example


• Merge the two objects exhibiting the smallest distance in the matrix.
• The pairs B–C and C–E have the same minimum distance; we select B and C here.

Clustering Algorithm – An example


• Then form a new distance matrix by considering the single linkage decision rule
• According to this rule, the distance from, for example, object A to the newly
formed cluster is the minimum of d(A, B) and d(A, C). As d(A, C)=2.236 is
smaller than d(A, B)=3, the distance from A to the newly formed cluster is
equal to d(A, C); that is, 2.236.
• Compute the distance with all other objects in the same manner
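In code, the single-linkage update described above is simply a minimum over the existing distances:

```python
# Single-linkage update: distance from A to the new cluster [B, C]
d_AB, d_AC = 3.0, 2.236          # values quoted on this slide
d_A_to_BC = min(d_AB, d_AC)      # = 2.236
```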

Clustering Algorithm – An example


• Continuing the clustering procedure, simply repeat the last step by merging
the objects in the new distance matrix that exhibit the smallest distance (in
this case, the newly formed cluster [B, C] and object E) and calculate the
distance from this new cluster to all other objects.

Clustering Algorithm – An example

[Figure: the updated distance matrix after this merger – not reproduced here.]

Dendrogram
• Read the dendrogram from
left to right.
• The vertical lines indicate the
distances at which objects
have been combined.
• For example, according to our
calculations above, objects B,
C, and E are merged at a
distance of 1.414.
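A sketch of how such a dendrogram can be produced with SciPy's single-linkage routine; the data array is a placeholder standing in for the seven respondents' scores.

```python
# Single-linkage hierarchical clustering and dendrogram with SciPy.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[3, 7], [6, 7], [5, 6], [6, 5], [6, 6], [2, 2], [3, 2]], dtype=float)  # placeholder scores

Z = linkage(X, method="single", metric="euclidean")
dendrogram(Z, labels=list("ABCDEFG"), orientation="right")  # read from left to right
plt.xlabel("Distance at which objects are combined")
plt.show()
```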

Attitudinal Data For Clustering


Case No. V1 V2 V3 V4 V5 V6
1 6 4 7 3 2 3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 3 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
8 7 3 7 4 1 4
9 2 4 3 3 6 3
10 3 5 3 6 4 6
11 1 3 2 3 5 3
12 5 4 5 4 2 4
13 2 2 1 5 4 4
14 4 6 4 6 4 7
15 6 5 4 2 1 4
16 3 5 4 6 4 7
17 4 4 7 2 2 5
18 3 7 2 6 4 3
19 4 6 3 7 2 7
20 2 3 2 4 7 2

Results of Hierarchical Clustering


Agglomeration Schedule Using Ward’s Procedure
Stage | Clusters combined (Cluster 1, Cluster 2) | Coefficient | Stage at which cluster first appears (Cluster 1, Cluster 2) | Next stage
1 14 16 1.000000 0 0 6
2 6 7 2.000000 0 0 7
3 2 13 3.500000 0 0 15
4 5 11 5.000000 0 0 11
5 3 8 6.500000 0 0 16
6 10 14 8.160000 0 1 9
7 6 12 10.166667 2 0 10
8 9 20 13.000000 0 0 11
9 4 10 15.583000 0 6 12
10 1 6 18.500000 6 7 13
11 5 9 23.000000 4 8 15
12 4 19 27.750000 9 0 17
13 1 17 33.100000 10 0 14
14 1 15 41.333000 13 0 16
15 2 5 51.833000 3 11 18
16 1 3 64.500000 14 5 19
17 4 18 79.667000 12 0 18
18 2 4 172.662000 15 17 19
19 1 2 328.600000 16 18 0

Results of Hierarchical Clustering


• The procedure followed by cluster analysis at Stage 1 is to cluster the two
cases that have the smallest squared Euclidean distance between them.
• Then the tool will recompute the distance measures between all single cases
and clusters (there is only one cluster of two cases after the first step).
• Next, the 2 cases (or clusters) with the smallest distance will be combined,
yielding either 2 clusters of 2 cases (with 17 cases unclustered) or one
cluster of 3 (with 18 cases unclustered). This process continues until all
cases are clustered into a single group.

Results of Hierarchical Clustering – Trying different numbers of clusters

Cluster Membership of Cases Using Ward’s Procedure
Case label | 4 clusters | 3 clusters | 2 clusters

1 1 1 1
2 2 2 2
3 1 1 1
4 3 3 2
5 2 2 2
6 1 1 1
7 1 1 1
8 1 1 1
9 2 2 2
10 3 3 2
11 2 2 2
12 1 1 1
13 2 2 2
14 3 3 2
15 1 1 1
16 3 3 2
17 1 1 1
18 4 3 2
19 3 3 2
20 2 2 2

Hierarchical clustering – a case


• Thaltegos (http://www.thaltegos.com) is a German management consulting company focusing on
analytical approaches for marketing, sales, and after sales in the automotive industry. A major US car
manufacturer commissioned Thaltegos to support the launch of an innovative electric car. To better
position the car in the market, the manufacturer asked Thaltegos to provide transparency
concerning the European car market. In cooperation with a market research firm, Thaltegos gathered
data from major automotive manufacturers to develop a segmentation concept. The database
consists of the following vehicle characteristics, all of which have been measured on a ratio scale
(variable names in parentheses):
• – Engine displacement (displacement)
• – Turning moment in Nm (moment)
• – Horsepower (horsepower)
• – Length in mm (length)
• – Width in mm (width)
• – Net weight in kg (weight)
• – Trunk volume in liters (trunk)
• – Maximum speed in km/h (speed)
• – Acceleration 0–100 km/h in seconds (acceleration)

Hierarchical clustering – a case

[Table: the vehicle-characteristics data gathered for the analysis – not reproduced here.]

Agglomeration schedule

• In the first stage, objects 5 and 6 are merged at a distance of 0.149.
• From here onward, the resulting
cluster is labeled as indicated by
the first object involved in this
merger, which is object 5.
• The last column on the very right
tells in which stage of the
algorithm this cluster will appear
next.
• In this case, this happens in the
second step, where it is merged
with object 7 at a distance of
0.184. The resulting cluster is still
labeled 5, and so on.

Icicle Diagram

• The diagram is read from the bottom to the top; the columns correspond to the objects being clustered, and the rows represent the number of clusters.

Scree Plot

[Figure: scree plot of the number of clusters against the distance at which clusters are combined – not reproduced here.]

Dendrogram

[Figure: dendrogram for the case-study solution – not reproduced here.]

Selection of clusters – Based on distance

[Figure: cluster selection based on the distances at which clusters are combined – not reproduced here.]

Cluster Membership
• When we view the results, a
three-segment solution appears
promising.
• The first segment comprises
compact cars, whereas the second
segment contains sports cars, and
the third limousines.
• Increasing the solution by one
segment would further split up
the sports cars segment into two
sub-segments. This does not
appear to be very helpful, as now
two of the four segments
comprise only one object.

Conducting Cluster Analysis: Decide on the Number of Clusters


• Theoretical, conceptual, or practical considerations may suggest a certain
number of clusters.
• In hierarchical clustering, the distances at which clusters are combined can be
used as criteria. This information can be obtained from the agglomeration
schedule or from the dendrogram.
▫ The solution just before the clusters are combined at a markedly larger distance can be treated as the optimal one.
• Scree Plot - Plot the number of clusters on the x-axis (starting with the one-
cluster solution at the very left) against the distance at which objects or
clusters are combined on the y-axis. Using this plot, we then search for the
distinctive break (elbow).
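A sketch of building such a scree plot from a SciPy linkage matrix (placeholder data); the third column of the linkage matrix holds the distances at which merges occur.

```python
# Scree (elbow) plot: number of clusters vs. merging distance.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage

X = np.random.default_rng(1).normal(size=(20, 2))   # placeholder data
Z = linkage(X, method="ward")

n = X.shape[0]
num_clusters = np.arange(n - 1, 0, -1)   # after merge i there are n - 1 - i clusters left
distances = Z[:, 2]                      # distance at which each merge happens

plt.plot(num_clusters[::-1], distances[::-1], marker="o")  # one-cluster solution at the left
plt.xlabel("Number of clusters")
plt.ylabel("Distance at which clusters are combined")
plt.show()
```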

K Means clustering
• k-means clustering is a straightforward and effective algorithm for finding clusters in data. The algorithm is as follows:
▫ Step 1: Ask the user how many clusters k the data set should be partitioned
into.
▫ Step 2: Randomly assign k records to be the initial cluster center locations.
▫ Step 3: For each record, find the nearest cluster center. Thus, each cluster center “owns” a subset of the records, thereby representing a partition of the data set. We therefore have k clusters, C1, C2, …, Ck.
▫ Step 4: For each of the k clusters, find the cluster centroid, and update the location of each cluster center to the new value of the centroid.
▫ Step 5: Repeat steps 3 and 4 until convergence or termination.
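A minimal from-scratch sketch of these five steps (a production analysis would more likely rely on a library implementation such as sklearn.cluster.KMeans):

```python
# Basic k-means following the steps above.
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: randomly pick k records as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each record to its nearest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each center to the centroid of the records it owns
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Step 5: stop when the centroids no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```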

K Means clustering
• Suppose that we have n data points (a1, b1, c1), (a2, b2, c2), …, (an, bn, cn). The centroid of these points is their center of gravity and is located at the point (∑ai/n, ∑bi/n, ∑ci/n).
• For example, the points (1,1,1), (1,2,1), (1,3,1), and (2,1,1) have centroid ((1+1+1+2)/4, (1+2+3+1)/4, (1+1+1+1)/4) = (1.25, 1.75, 1.00).
• The algorithm terminates when the centroids no longer change. In other words, the algorithm terminates when, for all clusters C1, C2, …, Ck, all the records “owned” by each cluster center remain in that cluster. Alternatively, the algorithm may terminate when some convergence criterion is met, such as no significant shrinkage in the mean squared error (MSE).

K Means clustering - Example

x y
a 1 3
b 3 3
c 4 3
d 5 3
e 1 2
f 4 2
g 1 1
h 2 1

K Means clustering - Example


• Step 1 : Decide how many clusters k the data set should be partitioned into. Lets
say we are interested in two clusters. K = 2.
• Step 2 : Randomly assign k records to be the initial cluster center locations. For
this example, we assign the cluster centers to be m 1 = (1,1) and m 2 = (2,1).
• Step 3: (first pass): For each record, find the nearest cluster center. Table
contains the (rounded) Euclidean distances between each point and each cluster
center m 1 = (1,1) and m 2 = (2,1), along with an indication of which cluster
center the point is nearest to. Therefore, cluster 1 contains points { a , e , g }, and
cluster 2 contains points { b , c , d , f , h }.
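The first pass of Step 3 can be reproduced directly from the data table and the initial centers m1 = (1, 1) and m2 = (2, 1):

```python
# Distances from each point to the initial centers and the resulting assignments.
import numpy as np

points = {"a": (1, 3), "b": (3, 3), "c": (4, 3), "d": (5, 3),
          "e": (1, 2), "f": (4, 2), "g": (1, 1), "h": (2, 1)}
m1, m2 = np.array([1.0, 1.0]), np.array([2.0, 1.0])

for name, p in points.items():
    p = np.array(p, dtype=float)
    d1, d2 = np.linalg.norm(p - m1), np.linalg.norm(p - m2)
    print(name, round(d1, 2), round(d2, 2), "-> cluster", 1 if d1 <= d2 else 2)
# Output: cluster 1 owns {a, e, g}; cluster 2 owns {b, c, d, f, h}, as stated above.
```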

K Means clustering - Example

[Table: (rounded) Euclidean distances from each point to m1 = (1, 1) and m2 = (2, 1), with the nearest cluster center indicated – not reproduced here.]

K Means clustering - Example


• Step 4 ( first pass ): For each of the k clusters find
the cluster centroid and update the location of
each cluster center to the new value of the
centroid.
• The centroid for cluster 1 is [(1 + 1 + 1)/3, (3 + 2 +
1)/3] = (1,2). The centroid for cluster 2 is [(3 + 4 +
5 + 4 + 2)/5, (3 + 3 + 3 + 2 + 1)/5] = (3.6, 2.4).
• The clusters and centroids (triangles) at the end of
the first pass are shown in Figure
• Note that m1 has moved up to the center of the
three points in cluster 1, while m2 has moved up
and to the right a considerable distance, to the
center of the five points in cluster 2.
• Step 5: Repeat steps 3 and 4 until convergence or
termination. The centroids have moved, so go
back to step 3 for our second pass through the
algorithm.

K Means clustering - Example


• Step 3 ( second pass ): For each record,
find the nearest cluster center.
• Table shows the distances between each
point and each updated cluster center m
1 = (1,2) and m 2 = (3.6, 2.4), together
with the resulting cluster membership.
• There has been a shift of a single record
( h ) from cluster 2 to cluster 1. The
relatively large change in m2 has left
record h now closer to m 1 than to m 2 ,
so that record h now belongs to cluster
1.
• All other records remain in the same
clusters as previously. Therefore, cluster
1 is { a , e , g , h }, and cluster 2 is { b , c ,
d , f }.

K Means clustering - Example


• Step 4 ( second pass ): For each of the k
clusters, find the cluster centroid and
update the location of each cluster center to
the new value of the centroid. The new
centroid for cluster 1 is [(1 + 1 + 1 + 2)/4, (3
+ 2 + 1 + 1)/4] = (1.25, 1.75). The new
centroid for cluster 2 is [(3 + 4 + 5 + 4)/4, (3
+ 3 + 3 + 2)/4] = (4, 2.75). The clusters and
centroids at the end of the second pass are
shown in Figure. Centroids m 1 and m 2
have both moved slightly.
• Step 5: Repeat steps 3 and 4 until
convergence or termination. As the
centroids have moved, we once again return
to step 3 for our third (and as it turns out,
final) pass through the algorithm.

K Means clustering - Example


• Step 3 ( third pass ): For each record, find
the nearest cluster center. Table shows the
distances between each point and each
newly updated cluster center m 1 = (1.25,
1.75) and m 2 = (4, 2.75), together with the
resulting cluster membership.
• Note that no records have shifted cluster
membership from the preceding pass.
• Step 4 ( third pass ): For each of the k
clusters, find the cluster centroid and update
the location of each cluster center to the
new value of the centroid. As no records
have shifted cluster membership, the cluster
centroids therefore also remain unchanged.
• Step 5: Repeat steps 3 and 4 until
convergence or termination. As the
centroids remain unchanged, the algorithm
terminates.
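As a cross-check of the worked example, scikit-learn's KMeans can be started from the same initial centers; it should settle on the same final clusters and centroids derived above.

```python
# Verify the worked example with scikit-learn, using the same initial centers.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 3], [3, 3], [4, 3], [5, 3],
              [1, 2], [4, 2], [1, 1], [2, 1]], dtype=float)   # points a through h

init = np.array([[1.0, 1.0], [2.0, 1.0]])                     # m1 and m2 from Step 2
km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)

print(km.cluster_centers_)   # expected: (1.25, 1.75) and (4.0, 2.75)
print(km.labels_)            # a, e, g, h in one cluster; b, c, d, f in the other
```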
