Kmeans Algorithm
K-Means Clustering
\[
\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - m_i \rVert^2
\]
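To make the objective concrete, here is a minimal sketch that evaluates the within-cluster sum of squared distances for a given grouping. The function name `wcss` and the toy data are illustrative assumptions, not part of the slides; `centroids[i]` plays the role of m_i and `labels[j]` says which set S_i the point x_j belongs to.

```python
def wcss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Toy data (hypothetical): two pairs of points, each pair with its own centroid.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 1.0)]

print(wcss(points, labels, centroids))  # each point is 1 unit from its centroid: 4.0
```

K-means searches for the grouping S that makes this value as small as possible.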
Basic idea
• proposed by Hugo Steinhaus in 1956
Standard Algorithm
• proposed by Stuart Lloyd in 1957
• for a pulse-code modulation technique
The term “K-means”
• proposed by James MacQueen in 1967
Update:
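In the standard (Lloyd) algorithm, the update step is usually written as the mean of the points currently assigned to each cluster. A sketch in the notation of the objective, assuming S_i is the set of points assigned to cluster i and m_i is its centroid:

```latex
\[
m_i = \frac{1}{\lvert S_i \rvert} \sum_{x_j \in S_i} x_j
\]
```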
[Figure: scatter plots of the example data set (n=20, k=3); first panel: "Initial positions & 1st step grouping"]
Operation flow
1. Select initial centroids (random)
2. Calculate Euclidean distance
3. Assign group (find minimum distance)
4. Calculate position of new centroid
5. Calculate stop condition
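The operation flow above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation behind the slides: the stop condition is taken to be "no point changed group", initial centroids are sampled from the data points, an empty cluster keeps its previous centroid, and the names `kmeans`, `squared_distance`, and `mean_point` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance (same nearest centroid as the plain distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points, k, max_iter=100, seed=None):
    rng = random.Random(seed)
    # Step 1: select k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: compute distances and assign each point to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Step 5 (assumed stop condition): no assignment changed since the last pass.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[i] = mean_point(members)
    return centroids, labels


# Hypothetical toy data: two visibly separate groups.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, labels = kmeans(data, k=2, seed=0)
print(centroids, labels)
```

Squared distances are used for the assignment because the square root does not change which centroid is nearest.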
K-Means clustering
• is a fast and simple algorithm
• for solving clustering problems
But the algorithm
• does not necessarily find the optimal configuration
• because the result depends on the initial centroids
• which are chosen randomly or heuristically
And so the k-means algorithm
• can be run multiple times
• to reduce this effect.
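One common way to run the algorithm multiple times is to repeat it from different random initial centroids and keep the run with the smallest sum of squared distances. The sketch below assumes that selection rule; the names `kmeans_once` and `kmeans_restarts`, the `n_runs` parameter, and the toy data are hypothetical.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run from random initial centroids; returns (cost, centroids, labels)."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda i: sqdist(p, centroids[i])) for p in points
        ]
        if new_labels == labels:  # assumed stop condition: assignments are stable
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    cost = sum(sqdist(p, centroids[label]) for p, label in zip(points, labels))
    return cost, centroids, labels


def kmeans_restarts(points, k, n_runs=10, seed=None):
    """Repeat k-means n_runs times and keep the lowest-cost result."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[0])


# Hypothetical toy data with three loose groups.
data = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9), (1, 8), (2, 9), (1, 9)]
best_cost, best_centroids, best_labels = kmeans_restarts(data, k=3, n_runs=10, seed=0)
print(best_cost, best_centroids)
```

Restarts cannot guarantee the optimal configuration either; they only make a poor initialization less likely to decide the final result.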