

CHAPTER 20 Cluster Analysis


If the variables are measured in vastly different units, the clustering solution will be influenced by the units of measurement. In a supermarket shopping study, attitudinal variables may be measured on a nine-point Likert-type scale; patronage, in terms of frequency of visits per month and the dollar amount spent; and brand loyalty, in terms of the percentage of grocery shopping expenditure allocated to the favorite supermarket. In these cases, before clustering respondents, we must standardize the data by rescaling each variable to have a mean of zero and a standard deviation of unity. Although standardization can remove the influence of the unit of measurement, it can also reduce the differences between groups on variables that may best discriminate groups or clusters. It is also desirable to eliminate outliers (cases with atypical values).10
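To make the rescaling concrete, here is a minimal sketch in Python (the data and variable choices are hypothetical) of standardizing each variable to a mean of zero and a standard deviation of unity:

    import numpy as np

    # Rows are respondents; columns are, say, attitude (1-9 Likert scale),
    # visits per month, dollars spent, and percentage spent at favorite store.
    X = np.array([
        [7.0, 4.0, 250.0, 60.0],
        [3.0, 1.0,  80.0, 20.0],
        [5.0, 2.0, 120.0, 35.0],
    ])

    X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # z-scores, column by column
    print(X_std.mean(axis=0).round(6))             # approximately 0 for every variable
    print(X_std.std(axis=0).round(6))              # 1 for every variable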


Use of different distance measures may lead to different clustering results. Hence, it is advisable to use different measures and compare the results. Having selected a distance or similarity measure, we can next select a clustering procedure.
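As a quick illustration of comparing distance measures, the following sketch (hypothetical data; SciPy is assumed to be available) clusters the same respondents under euclidean and city-block distances and prints the resulting memberships for comparison:

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 4))                   # 20 hypothetical respondents

    for metric in ("euclidean", "cityblock"):
        d = pdist(X, metric=metric)                # condensed distance matrix
        Z = linkage(d, method="average")
        labels = fcluster(Z, t=3, criterion="maxclust")
        print(metric, labels)                      # compare the two solutions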
Select a Clustering Procedure

hierarchical clustering
A clustering procedure characterized by the development of a hierarchy or tree-like structure.

agglomerative clustering
Hierarchical clustering procedure where each object starts out in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters.

divisive clustering
Hierarchical clustering procedure where all objects start out in one giant cluster. Clusters are formed by dividing this cluster into smaller and smaller clusters.

linkage methods
Agglomerative methods of hierarchical clustering that cluster objects based on a computation of the distance between them.

single linkage
Linkage method that is based on minimum distance or the nearest neighbor rule.

Figure 20.4 is a classification of clustering procedures. Clustering procedures can be hierarchical or nonhierarchical. Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be agglomerative or divisive. Agglomerative clustering starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters. This process is continued until all objects are members of a single cluster. Divisive clustering starts with all the objects grouped in a single cluster. Clusters are divided or split until each object is in a separate cluster.

Agglomerative methods are commonly used in marketing research. They consist of linkage methods, error sums of squares or variance methods, and centroid methods. Linkage methods include single linkage, complete linkage, and average linkage.

Figure 20.4
A Classification of Clustering Procedures
[Chart: Clustering procedures divide into hierarchical and nonhierarchical. Hierarchical procedures are agglomerative or divisive; agglomerative methods comprise linkage methods (single linkage, complete linkage, average linkage), variance methods (Ward's method), and centroid methods. Nonhierarchical procedures comprise sequential threshold, parallel threshold, and optimizing partitioning.]
Figure 20.5
Linkage Methods of Clustering
[Diagram: Under single linkage, two clusters are joined by the minimum distance between a point in one cluster and a point in the other; under complete linkage, by the maximum distance; under average linkage, by the average distance between all pairs of points, one from each cluster.]

complete linkage
Linkage method that is based on maximum distance or the furthest neighbor approach.

average linkage
A linkage method based on the average distance between all pairs of objects, where one member of the pair is from each of the clusters.
The single linkage method is based on minimum distance or the nearest neighbor rule. The first two objects clustered are those that have the smallest distance between them. The next shortest distance is identified, and either the third object is clustered with the first two, or a new two-object cluster is formed. At every stage, the distance between two clusters is the distance between their two closest points (see Figure 20.5). Two clusters are merged at any stage by the single shortest link between them. This process is continued until all objects are in one cluster. The single linkage method does not work well when the clusters are poorly defined. The complete linkage method is similar to single linkage, except that it is based on the maximum distance or the furthest neighbor approach. In complete linkage, the distance between two clusters is calculated as the distance between their two furthest points. The average linkage method works similarly. However, in this method, the distance between two clusters is defined as the average of the distances between all pairs of objects, where one member of the pair is from each of the clusters (Figure 20.5). As can be seen, the average linkage method uses information on all pairs of distances, not merely the minimum or maximum distances. For this reason, it is usually preferred to the single and complete linkage methods.

variance methods
An agglomerative method of hierarchical clustering in which clusters are generated to minimize the within-cluster variance.

Ward's procedure
Variance method in which the squared euclidean distance to the cluster means is minimized.

centroid methods
A variance method of hierarchical clustering in which the distance between two clusters is the distance between their centroids (means for all the variables).

The variance methods attempt to generate clusters to minimize the within-cluster variance. A commonly used variance method is the Ward's procedure. For each cluster, the means for all the variables are computed. Then, for each object, the squared euclidean distance to the cluster means is calculated (Figure 20.6). These distances are summed for all the objects. At each stage, the two clusters with the smallest increase in the overall sum of squares within cluster distances are combined. In the centroid methods, the distance between two clusters is the distance between their centroids (means for all the variables), as shown in Figure 20.6. Every time objects are grouped, a new centroid is computed. Of the hierarchical methods, average linkage and Ward's methods have been shown to perform better than the other procedures.11
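The agglomerative procedures described above are available in standard software; a minimal sketch (hypothetical data) using scipy.cluster.hierarchy:

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 2))

    for method in ("single", "complete", "average", "ward", "centroid"):
        # Ward's and centroid methods are defined on (squared) euclidean
        # distances, so the raw observations are passed rather than a
        # precomputed non-euclidean distance matrix.
        Z = linkage(X, method=method)
        print(method, "final merge distance:", round(Z[-1, 2], 3))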
Figure 20.6
Other Agglomerative Clustering Methods
[Diagram: Ward's method, in which objects' squared euclidean distances to the cluster means are minimized, and the centroid method, in which the distance between two clusters is the distance between their centroids.]

nonhierarchical clustering
A clustering procedure that first assigns or determines a cluster center and then groups all objects within a prespecified threshold value from the center.

sequential threshold method
A nonhierarchical clustering method in which a cluster center is selected and all objects within a prespecified threshold value from the center are grouped together.

parallel threshold method
Nonhierarchical clustering method that specifies several cluster centers at once. All objects that are within a prespecified threshold value from the center are grouped together.

optimizing partitioning method
Nonhierarchical clustering method that allows for later reassignment of objects to clusters to optimize an overall criterion.
The second type of clustering procedures, the nonhierarchical clustering methods, is frequently referred to as k-means clustering. These methods include sequential threshold, parallel threshold, and optimizing partitioning. In the sequential threshold method, a cluster center is selected and all objects within a prespecified threshold value from the center are grouped together. Then a new cluster center or seed is selected, and the process is repeated for the unclustered points. Once an object is clustered with a seed, it is no longer considered for clustering with subsequent seeds. The parallel threshold method operates similarly, except that several cluster centers are selected simultaneously and objects within the threshold level are grouped with the nearest center. The optimizing partitioning method differs from the two threshold procedures in that objects can later be reassigned to clusters to optimize an overall criterion, such as average within-cluster distance for a given number of clusters.
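A minimal k-means sketch (hypothetical data), using scikit-learn's KMeans as one common implementation of an optimizing partitioning approach, in which objects are reassigned across iterations to reduce within-cluster distances:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    X = rng.normal(size=(20, 4))

    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(km.labels_)           # cluster membership for each respondent
    print(km.cluster_centers_)  # final cluster centers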
Two major disadvantages of the nonhierarchical procedures are that the number of clusters must be prespecified and the selection of cluster centers is arbitrary. Furthermore, the clustering results may depend on how the centers are selected. Many nonhierarchical programs select the first k (k = number of clusters) cases without missing values as initial cluster centers. Thus, the clustering results may depend on the order of observations in the data. Yet nonhierarchical clustering is faster than hierarchical methods and has merit when the number of objects or observations is large. It has been suggested that the hierarchical and nonhierarchical methods be used in tandem. First, an initial clustering solution is obtained using a hierarchical procedure, such as average linkage or Ward's. The number of clusters and cluster centroids so obtained are used as inputs to the optimizing partitioning method.12
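The tandem approach can be sketched as follows (hypothetical data; in practice the choice of k would come from the dendrogram or agglomeration schedule). A Ward's solution supplies the number of clusters and the cluster centroids, which then seed the nonhierarchical run:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(3)
    X = rng.normal(size=(20, 4))

    Z = linkage(X, method="ward")                    # hierarchical stage
    k = 3                                            # chosen from the hierarchical output
    labels = fcluster(Z, t=k, criterion="maxclust")
    centroids = np.vstack([X[labels == c].mean(axis=0) for c in range(1, k + 1)])

    km = KMeans(n_clusters=k, init=centroids, n_init=1).fit(X)  # seeded k-means
    print(km.labels_)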
Choice of a clustering method and choice of a distance measure are interrelated. For example, squared euclidean distances should be used with the Ward's and centroid methods. Several nonhierarchical procedures also use squared euclidean distances.
We will use the Ward's procedure to illustrate hierarchical clustering. The output obtained by clustering the data of Table 20.1 is given in Table 20.2. Useful information is contained in the agglomeration schedule, which shows the number of cases or clusters being combined at each stage. The first line represents stage 1, with 19 clusters. Respondents 14 and 16 are combined at this stage, as shown in the columns labeled "Clusters Combined." The squared euclidean distance between these two respondents is given under the column labeled "Coefficients." The column entitled "Stage Cluster First Appears" indicates the stage at which a cluster is first formed. To illustrate, an entry of 1 at stage 6 indicates that respondent 14 was first grouped at stage 1. The last column, "Next Stage," indicates the stage at which another case (respondent) or cluster is combined with this one. Because the number in the first line of the last column is 6, we see that at stage 6, respondent 10 is combined with 14 and 16 to form a single cluster. Similarly, the second line represents stage 2 with 18 clusters. In stage 2, respondents 6 and 7 are grouped together.

Another important part of the output is contained in the icicle plot given in Figure 20.7. The columns correspond to the objects being clustered, in this case respondents labeled 1 through 20. The rows correspond to the number of clusters. This figure is read from bottom to top. At first, all cases are considered as individual clusters. Because there are 20 respondents, there are 20 initial clusters. At the first step, the two closest objects are combined, resulting in 19 clusters.
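For readers replicating this analysis in software, the SciPy linkage matrix carries the same information as the agglomeration schedule shown in Table 20.2; a sketch (hypothetical data) of printing it stage by stage:

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    rng = np.random.default_rng(4)
    X = rng.normal(size=(20, 4))

    Z = linkage(X, method="ward")
    print("stage  cluster1  cluster2  coefficient")
    for stage, (c1, c2, dist, _size) in enumerate(Z, start=1):
        # Indices below 20 are original respondents; indices of 20 and above
        # are clusters formed at earlier stages (index - 20 + 1 gives that stage).
        print(f"{stage:>5}  {int(c1):>8}  {int(c2):>8}  {dist:>11.3f}")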
TABLE 20.2
Results of Hierarchical Clustering

CASE PROCESSING SUMMARY(a,b)

  Valid           Missing         Total
  N    Percent    N    Percent    N    Percent
  20   100.0      0    0.0        20   100.0

a. Squared Euclidean Distance used
b. Ward Linkage

WARD LINKAGE: AGGLOMERATION SCHEDULE

                                                Stage Cluster
            Clusters Combined                   First Appears
Stage    Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage
   1        14          16            1.000         0           0            6
   2         6           7            2.000         0           0            7
   3         2          13            3.500         0           0           15
   4         5          11            5.000         0           0           11
   5         3           8            6.500         0           0           16
   6        10          14            8.167         0           1            9
   7         6          12           10.500         2           0           10
   8         9          20           13.000         0           0           11
   9         4          10           15.583         0           6           12
  10         1           6           18.500         0           7           13
  11         5           9           23.000         4           8           15
  12         4          19           27.750         9           0           17
  13         1          17           33.100        10           0           14
  14         1          15           41.333        13           0           16
  15         2           5           51.833         3          11           18
  16         1           3           64.500        14           5           19
  17         4          18           79.667        12           0           18
  18         2           4          172.667        15          17           19
  19         1           2          328.600        16          18            0

CLUSTER MEMBERSHIP

Case   4 Clusters   3 Clusters   2 Clusters
   1       1            1            1
   2       2            2            2
   3       1            1            1
   4       3            3            2
   5       2            2            2
   6       1            1            1
   7       1            1            1
   8       1            1            1
   9       2            2            2
  10       3            3            2
  11       2            2            2
  12       1            1            1
  13       2            2            2
  14       3            3            2
  15       1            1            1
  16       3            3            2
  17       1            1            1
  18       4            3            2
  19       3            3            2
  20       2            2            2
Figure 20.7
Vertical Icicle Plot Using Ward's Procedure
[Icicle plot: columns correspond to cases 1 through 20; rows correspond to the number of clusters; Xs mark cases joined in the same cluster at each level.]
Figure 20.8
Dendrogram Using Ward's Procedure
[Dendrogram: cases are listed by label and number on the vertical axis; the horizontal axis shows the rescaled distance at which clusters combine, from 0 to 25.]
The last line of Figure 20.7 shows these 19 clusters. The two cases, respondents 14 and 16, that have been combined at this stage have between them all Xs in rows 1 through 19. Row number 18 corresponds to the next stage, with 18 clusters. At this stage, respondents 6 and 7 are grouped together. The column of Xs between respondents 6 and 7 has a blank in row 19. Thus, at this stage there are 18 clusters; 16 of them consist of individual respondents, and two contain two respondents each. Each subsequent step leads to the formation of a new cluster in one of three ways: (1) two individual cases are grouped together, (2) a case is joined to an already existing cluster, or (3) two clusters are grouped together.
Another graphic device that is useful in displaying clustering results is the
dendrogram (see Figure 20.8). The dendrogram is read from left to right. Vertical lines
represent clusters that are joined together. The position of the line on the scale indicates
the distances at which clusters were joined. Because many of the distances in the early
stages are of similar magnitude, it is difficult to tell the sequence in which some of the
early clusters are formed. However, it is clear that in the last two stages, the distances at
which the clusters are being combined are large. This information is useful in deciding
on the number of clusters.
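A dendrogram like Figure 20.8 can be produced with SciPy and matplotlib; a minimal sketch with hypothetical data:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(5)
    X = rng.normal(size=(20, 4))

    Z = linkage(X, method="ward")
    dendrogram(Z, orientation="right", labels=[str(i + 1) for i in range(20)])
    plt.xlabel("Rescaled distance: cluster combine")
    plt.show()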
It is also possible to obtain information on cluster membership of cases if the number of
clusters is specified. Although this information can be discerned from the icicle plot, a
tabular display is helpful. Table 20.2 contains the cluster membership for the cases,
depending on whether the final solution contains two, three, or four clusters. Information of
this type can be obtained for any number of clusters and is useful for deciding on the number
of clusters.
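A cluster-membership table like the one in Table 20.2 can be tabulated from a single hierarchical solution; a sketch (hypothetical data):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(6)
    X = rng.normal(size=(20, 4))
    Z = linkage(X, method="ward")

    # Cut the same tree at four, three, and two clusters.
    memberships = {k: fcluster(Z, t=k, criterion="maxclust") for k in (4, 3, 2)}
    print("case  4 clusters  3 clusters  2 clusters")
    for case in range(20):
        print(f"{case + 1:>4}  {memberships[4][case]:>10}  "
              f"{memberships[3][case]:>10}  {memberships[2][case]:>10}")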

Decide on the Number of Clusters


A major issue in cluster analysis is deciding on the number of clusters. Although there are no hard and fast rules, some guidelines are available:

1. Theoretical, conceptual, or practical considerations may suggest a certain number of clusters. For example, if the purpose of clustering is to identify market segments, management may want a particular number of clusters.
2. In hierarchical clustering, the distances at which clusters are combined can be used as criteria. This information can be obtained from the agglomeration schedule or from the dendrogram. In our case, we see from the agglomeration schedule in Table 20.2 that the value in the "Coefficients" column suddenly more than doubles between stages 17 (3 clusters) and 18 (2 clusters). Likewise, at the last two stages of the dendrogram, the clusters are being combined at large distances. A sketch of this heuristic appears below.
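As noted in guideline 2, a sudden jump in the agglomeration coefficients suggests stopping the merging just before that jump; a sketch (hypothetical data) of computing this from the linkage matrix:

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    rng = np.random.default_rng(7)
    X = rng.normal(size=(20, 4))
    Z = linkage(X, method="ward")

    coeffs = Z[:, 2]                      # merge distance (coefficient) at each stage
    jumps = np.diff(coeffs)               # increase from one stage to the next
    stage = int(np.argmax(jumps)) + 1     # last stage before the largest jump
    n_clusters = len(X) - stage           # clusters remaining at that point
    print("suggested number of clusters:", n_clusters)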
