PAM Clustering Technique

PAM CLUSTERING TECHNIQUE
SAMAKSH TANDON
CSE 7TH SEM, 13000116066
CS-704C
What is cluster analysis?

Cluster: a collection of data objects
 Similar to one another within the same cluster
 Dissimilar to the objects in other clusters
Cluster analysis: grouping a set of data objects into clusters
Clustering is unsupervised classification: no predefined classes
Typical applications:
 As a stand-alone tool to get insight into data distribution
 As a preprocessing step for other algorithms
General applications of clustering

 Pattern recognition
 Spatial data analysis
  create thematic maps in GIS by clustering feature spaces
  detect spatial clusters and explain them in spatial data mining
 Image processing
 Economic science (especially market research)
 WWW
  Document classification
  Cluster Weblog data to discover groups of similar access patterns
Examples of clustering analysis

Marketing: Help Land use: Identification Insurance: Identifying City-planning: Earth-quake studies:
marketers discover of areas of similar land groups of motor Identifying groups of Observed earth quake
distinct groups in their use in an earth insurance policy holders houses according to epicenters should be
customer bases, and observation database with a high average their house type, value, clustered along
then use this knowledge claim cost and geographical continent faults
to develop targeted location
marketing programs
5
Partitioning Algorithms: Basic Concept

 Partitioning method: construct a partition of a database D of n objects into a set of k clusters.
 Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion.
1. Global optimal: exhaustively enumerate all partitions
2. Heuristic methods: k-means and k-medoids algorithms
3. k-means (MacQueen '67): each cluster is represented by the center of the cluster
4. k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw '87): each cluster is represented by one of the objects in the cluster
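The contrast between the last two methods can be seen in a small sketch (Python; the data and function names are illustrative, not from the slides): the k-means representative is the coordinate-wise mean, which need not be a data object, while the k-medoids representative is always an actual member of the cluster.

```python
# Sketch: k-means representative (centroid) vs. k-medoids representative
# (medoid) for a single cluster containing one outlier.

def centroid(points):
    """Coordinate-wise mean: the k-means cluster representative."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def medoid(points):
    """Cluster member with minimal total distance to the others: k-medoids."""
    return min(points, key=lambda p: sum(euclidean(p, q) for q in points))

cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (10.0, 10.0)]  # outlier at (10, 10)
print(centroid(cluster))  # (3.5, 3.5) -- dragged toward the outlier
print(medoid(cluster))    # an actual object from the tight group
```

This is why the slides later call PAM "more robust" than k-means: the mean moves toward extreme values, while the medoid stays on a real object.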
The K-Medoids Clustering Method

 Find representative objects, called medoids, in clusters
 PAM (Partitioning Around Medoids, 1987)
1. starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if it improves the total distance of the resulting clustering
2. PAM works effectively for small data sets, but does not scale well for large data sets
 CLARA (Kaufmann & Rousseeuw, 1990)
 CLARANS (Ng & Han, 1994): randomized sampling
 Focusing + spatial data structure (Ester et al., 1995)

Typical k-medoids algorithm (PAM)
[figure: worked example of the PAM iteration]

PAM (Partitioning Around Medoids) (1987)

 PAM (Kaufman and Rousseeuw, 1987), built in S-Plus
 Uses a real object to represent the cluster
1. Select k representative objects arbitrarily
2. For each pair of a non-selected object h and a selected object i, calculate the total swapping cost TC_ih
3. For each pair of i and h:
  If TC_ih < 0, i is replaced by h
  Then assign each non-selected object to the most similar representative object
4. Repeat steps 2-3 until there is no change
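Steps 1-4 can be sketched in Python (a minimal illustration of the swap-based search, not the original S-Plus code; restarting the scan after each accepted swap is one simple way to implement "repeat until no change"):

```python
import random

def pam(objects, k, dist, seed=0):
    """Minimal PAM sketch: swap medoids with non-medoids while it helps."""
    rng = random.Random(seed)
    medoids = rng.sample(objects, k)               # step 1: arbitrary medoids

    def total_cost(meds):
        return sum(min(dist(o, m) for m in meds) for o in objects)

    improved = True
    while improved:                                # step 4: until no change
        improved = False
        # step 2: consider each pair of selected i and non-selected h
        for i, h in [(i, h) for i in medoids for h in objects if h not in medoids]:
            candidate = [m for m in medoids if m != i] + [h]
            # step 3: accept the swap when TC_ih < 0
            if total_cost(candidate) < total_cost(medoids):
                medoids = candidate
                improved = True
                break                              # rescan after a swap

    # finally, assign each object to its most similar representative
    clusters = {m: [] for m in medoids}
    for o in objects:
        clusters[min(medoids, key=lambda m: dist(o, m))].append(o)
    return clusters

data = [1, 2, 3, 20, 21, 22]
result = pam(data, k=2, dist=lambda a, b: abs(a - b))
print(sorted(sorted(v) for v in result.values()))  # [[1, 2, 3], [20, 21, 22]]
```

On this toy data the search settles on medoids 2 and 21, the objects minimizing the total distance within each group.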
9
PAM
Clustering:
Total
swapping
cost TCih
=∑jCjih
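In code, C_jih is the change in object j's assignment cost (its distance to the nearest medoid) when medoid i is swapped out for non-medoid h, and TC_ih is the sum over all j. A minimal sketch (Python; the data and names are illustrative):

```python
# Sketch: TC_ih = sum over j of C_jih, where each object's cost is its
# distance to the nearest medoid. A swap is beneficial exactly when TC_ih < 0.

def total_swap_cost(objects, medoids, i, h, dist):
    """TC_ih: change in total cost if medoid i is replaced by non-medoid h."""
    new_medoids = [m for m in medoids if m != i] + [h]
    tc = 0.0
    for j in objects:
        cost_before = min(dist(j, m) for m in medoids)
        cost_after = min(dist(j, m) for m in new_medoids)
        tc += cost_after - cost_before             # this term is C_jih
    return tc

data = [0, 1, 2, 3, 9]
d = lambda a, b: abs(a - b)
# Swapping medoid 0 for candidate 1 moves the representative toward the
# middle of the left group, so the total cost drops by 2.
print(total_swap_cost(data, [0, 9], 0, 1, d))  # -2.0
```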
What is the problem with PAM?

 PAM is more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean
 PAM works efficiently for small data sets but does not scale well for large data sets:
1. O(k(n-k)²) for each iteration, where n is the number of data objects and k is the number of clusters
 Sampling-based method: CLARA (Clustering LARge Applications)
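CLARA's sampling idea can be sketched on top of a PAM-style swap search (Python; hedged sketch — the sample size and number of samples here are illustrative parameters, not the published defaults): run PAM on each random sample, then keep the medoid set that scores best on the full data set.

```python
import random

def clara(objects, k, dist, n_samples=5, sample_size=None, seed=0):
    """CLARA sketch: PAM on random samples, scored on the whole data set."""
    rng = random.Random(seed)
    sample_size = sample_size or min(len(objects), 40 + 2 * k)

    def cost(meds, data):
        return sum(min(dist(o, m) for m in meds) for o in data)

    def pam_medoids(sample):
        # greedy swap search restricted to the sample, as in PAM
        medoids = sample[:k]
        improved = True
        while improved:
            improved = False
            for i, h in [(i, h) for i in medoids for h in sample if h not in medoids]:
                cand = [m for m in medoids if m != i] + [h]
                if cost(cand, sample) < cost(medoids, sample):
                    medoids = cand
                    improved = True
                    break
        return medoids

    best = None
    for _ in range(n_samples):
        meds = pam_medoids(rng.sample(objects, sample_size))
        if best is None or cost(meds, objects) < cost(best, objects):
            best = meds
    return best

data = list(range(10)) + list(range(100, 110))   # two well-separated groups
meds = clara(data, k=2, dist=lambda a, b: abs(a - b))
print(sorted(meds))  # one medoid from each group
```

Because each PAM run only touches a sample, the per-iteration cost depends on the sample size rather than on n, which is what makes CLARA viable for large data sets.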
