Professional Documents
Culture Documents
Intelligent System: Lecture Notes For Chapter 7
Intelligent System: Lecture Notes For Chapter 7
Intelligent System: Lecture Notes For Chapter 7
4 Louisiana-Land-UP,Phillips-Petro-UP,Unocal-UP,
Schlumberger-UP
Oil-UP
Data Summarization
– Reduce the size of large
data sets
Clustering precipitation
in Australia
p1
p3 p4
p2
p1 p2 p3 p4
p1
p3 p4
p2
p1 p2 p3 p4
Well-separated clusters
Center-based (Prototype-based) clusters
Contiguous-based (Graph-based) clusters
Density-based clusters
Property or Conceptual
Described by an Objective Function
Well-Separated Clusters:
– A cluster is a set of points such that any point in a cluster is closer (or more
similar) to every other point in the cluster than to any point not in the cluster.
– Sometimes a threshold is used to specify that all the objects in a cluster must
be sufficiently close (or similar) to one another.
– This idealistic definition of a cluster is satisfied only when the data contains
natural clusters that are quite far from each other.
3 well-separated clusters :
Each point is closer to all of the
points in its cluster than to any
point in another cluster.
– The distance between any two points in different groups is larger than the
distance between any two points within a group.
– Well-separated clusters do not need to be globular, but can have any shape
Center-based (Prototype-based)
– A cluster is a set of objects such that an object in a cluster is
closer (more similar) to the “center” of a cluster, than to the
center of any other cluster.
– The center of a cluster is often a centroid, the average of all
the points in the cluster, or a medoid, the most “representative”
point of a cluster.
8 contiguous clusters
Density-based
– A cluster is a dense region of points, which is separated by
low-density regions, from other regions of high density.
– Used when the clusters are irregular or intertwined, and when
noise and outliers are present.
– The two circular clusters are not merged, in previous fig
because the bridge between them fades into the noise.
6 density-based clusters
2 Overlapping Circles
02/14/2018 Introduction to Data Mining, 2nd Edition 17
Clustering Algorithms
2.5
1.5
y
0.5
2 2 2
y
1 1 1
0 0 0
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
x x x
2 2 2
y
1 1 1
0 0 0
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
x x x
First of all, compute the sum of the square error (SSE) for some values
of k (for example 2, 4, 6, 8, etc.).
The SSE is defined as the sum of the square distance between each
member of the cluster and its centroid.
K
SSE dist 2 ( mi , x )
i 1 xCi