CSC649 Lecture 3 Unsupervised ML - KMeansClustering
https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
Learning Outcome
By the end of this lecture, you should be able to understand,
explain and apply K-Means Clustering.
• “Similar” refers to a measure of similarity (typically a distance) between the “points” to be clustered.
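K-Means usually measures similarity with Euclidean (straight-line) distance, so "more similar" means "closer". The sketch below computes it in plain Python. The function name `euclidean_distance` and the sample points are hypothetical, chosen only for illustration.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of features."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical 2-D points: a smaller distance means "more similar"
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
print(euclidean_distance((0, 0), (1, 0)))  # 1.0
```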
K-Means Clustering is one of the simplest unsupervised machine learning algorithms, and it is fast and efficient in terms of computational cost. It partitions the data into K clusters, assigning each point to the cluster whose centroid is nearest.
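The following minimal sketch shows the standard K-Means loop (Lloyd's algorithm) in plain Python, to make the "assign, then update" idea concrete. It is not scikit-learn's implementation. The function name `kmeans`, its random initialisation from data points, and the toy data are assumptions for illustration only.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) on a list of equal-length tuples.

    Returns (centroids, labels). Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    # 1. Initialise: pick k distinct data points as the starting centroids
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # 2. Assignment step: attach each point to its nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # 3. Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # keep an empty cluster's centroid where it was
                new_centroids.append(centroids[j])
        # 4. Stop once the centroids no longer move
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The cluster numbers (0 or 1) depend on the random initialisation, but the two groups end up in separate clusters. This matches the way `KMeans` labels in the code below are arbitrary IDs rather than class names.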
K-Means Clustering in Python
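#K-Means Clustering on iris flower dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
#import the dataset (the first four columns of iris.csv are the features)
df = pd.read_csv('iris.csv')
#df.head(10)
#4 columns of features
x = df.iloc[:, [0, 1, 2, 3]].values
#first run with an arbitrary K; revise it after reading the elbow plot
#n_init and random_state make the result repeatable
kmeans_model = KMeans(n_clusters=5, n_init=10, random_state=0)
y_kmeans = kmeans_model.fit_predict(x)
print(y_kmeans)
print(kmeans_model.cluster_centers_)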
#to find the optimum number of clusters, record the SSE (inertia_) for K = 1..10
sse = []
for i in range(1, 11):
    #fit once per K; inertia_ is the SSE of the fitted model
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=0).fit(x)
    sse.append(kmeans.inertia_)
#the elbow indicates the optimal value of K
#edit n_clusters above and run again using the new K
plt.plot(range(1, 11), sse, marker='o')
plt.title('Elbow method')
plt.xlabel('Number of clusters (K)')
plt.ylabel('SSE (inertia)')
plt.show()
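The elbow method gives an idea of what a good number of clusters, k, would be, based on the sum of squared distances (SSE) between the data points and the centroids of their assigned clusters. SSE keeps falling as k grows, so the aim is the "elbow" where adding another cluster no longer reduces SSE substantially. For the iris data, the elbow plot gives an estimated K = 3.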
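The SSE plotted on the elbow curve is SSE = Σ_j Σ_{x in C_j} ||x − μ_j||², where μ_j is the centroid of cluster C_j. scikit-learn reports this quantity as `inertia_`. The small sketch below computes it directly. The `sse` function and the four toy points are hypothetical, used only to show how SSE drops when K goes from 1 to 2.

```python
def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for p, lab in zip(points, labels):
        c = centroids[lab]
        total += sum((a - b) ** 2 for a, b in zip(p, c))
    return total

# Hypothetical toy data: two pairs of points, 10 units apart
points = [(0, 0), (0, 2), (10, 0), (10, 2)]

# K = 2: each pair gets its own centroid
print(sse(points, [0, 0, 1, 1], [(0, 1), (10, 1)]))  # 4.0

# K = 1: a single centroid at the overall mean (5, 1)
print(sse(points, [0, 0, 0, 0], [(5, 1)]))  # 104.0
```

Going from one cluster to two cuts SSE from 104 to 4. A drop this sharp, followed by only small gains for larger K, is the kind of bend the elbow method looks for.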
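When the true class labels are known (as with the iris species), the clustering can be evaluated against them:
• Homogeneity score: how close the clustering is to the ideal in which each cluster contains only members of a single class.
• Completeness score: how close the clustering is to the ideal in which all members of a given class are assigned to the same cluster.
• V-measure: the harmonic mean of homogeneity and completeness.
• Adjusted Rand index (ARI): a similarity measure between two clusterings (e.g. predicted clusters vs. true classes), based on counting pairs of samples placed in the same or in different clusters, adjusted for chance.
• Adjusted mutual information (AMI): a mutual-information-based similarity measure between two clusterings, adjusted for chance.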
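To show what the first four measures compute, here is a plain-Python sketch built from their standard definitions. In practice scikit-learn provides them as `sklearn.metrics.homogeneity_score`, `completeness_score`, `v_measure_score`, `adjusted_rand_score` and `adjusted_mutual_info_score`. AMI is left out here because its chance correction (the expected mutual information) needs considerably more code. The helper names and the six-sample labellings are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())


def conditional_entropy(a, b):
    """H(A | B): remaining uncertainty about labelling a once labelling b is known."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((m / n) * math.log(m / b_counts[lb]) for (_, lb), m in joint.items())


def homogeneity_completeness_v(classes, clusters):
    """Return (homogeneity, completeness, V-measure) for true classes vs predicted clusters."""
    h_classes, h_clusters = entropy(classes), entropy(clusters)
    # homogeneity: 1 - H(classes | clusters) / H(classes)
    h = 1.0 if h_classes == 0 else 1.0 - conditional_entropy(classes, clusters) / h_classes
    # completeness: 1 - H(clusters | classes) / H(clusters)
    c = 1.0 if h_clusters == 0 else 1.0 - conditional_entropy(clusters, classes) / h_clusters
    # V-measure: harmonic mean of homogeneity and completeness
    v = 0.0 if h + c == 0 else 2 * h * c / (h + c)
    return h, c, v


def adjusted_rand_index(classes, clusters):
    """ARI from pair counts in the contingency table (assumes at least 2 samples)."""
    n = len(classes)
    index = sum(math.comb(m, 2) for m in Counter(zip(classes, clusters)).values())
    sum_a = sum(math.comb(m, 2) for m in Counter(classes).values())
    sum_b = sum(math.comb(m, 2) for m in Counter(clusters).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labellings use one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


classes = [0, 0, 0, 1, 1, 1]      # hypothetical ground-truth classes
perfect = [1, 1, 1, 0, 0, 0]      # same grouping, different label names
one_cluster = [0, 0, 0, 0, 0, 0]  # everything lumped together

print(homogeneity_completeness_v(classes, perfect), adjusted_rand_index(classes, perfect))
# (1.0, 1.0, 1.0) 1.0
print(homogeneity_completeness_v(classes, one_cluster), adjusted_rand_index(classes, one_cluster))
# (0.0, 1.0, 0.0) 0.0
```

The second case shows why V-measure combines the two scores: putting every sample in one cluster is trivially complete but not homogeneous at all. The relabelled-but-identical grouping scores perfectly, because these measures ignore the actual cluster numbers.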