
EXPERIMENT 9

Aim: Implementation of K-Means Clustering

COURSE OUTCOMES

CO4 Evaluate machine learning model’s performance and apply learning strategy to
improve the performance of supervised and unsupervised learning model.

CO5 Develop a suitable model for supervised and unsupervised learning algorithm and
optimize the model on the expected accuracy.

K Means Clustering
In k-means clustering, the data is divided into k clusters by assigning each point to the cluster whose mean (centroid) is nearest to it.
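To make the idea of "nearest mean" concrete, here is a minimal NumPy sketch of the usual two-step iteration: assign each point to its nearest center, then move each center to the mean of its points. The function name kmeans_sketch, its parameters, and the random choice of starting centers are assumptions made for illustration only. scikit-learn's KMeans, used in the exercises below, initializes more carefully and is not implemented this way.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Illustrative k-means loop (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: label each point with the index of its nearest center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so the labels are final
        centers = new_centers
    return labels, centers

# Two well-separated groups of values, stored as one feature per row.
points = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                   91, 92, 93, 94, 95, 96, 97, 98, 99, 100]).reshape(-1, 1)
labels, centers = kmeans_sketch(points, k=2)
print(labels)
print(centers.ravel())
```

On data with less obvious structure, a loop like this can stop at a poor local solution depending on the starting centers, which is one reason library implementations try several initializations.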
1. Identify 2 groups in 1D Array
from sklearn.cluster import KMeans
import numpy as np

data = np.array([1,2,3,4,5,6,7,8,9,10,91,92,93,94,95,96,97,98,99,100])

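# KMeans expects a 2D array of shape (n_samples, n_features),
# so the 1D data is reshaped into a single column.
kmeans = KMeans(n_clusters=2).fit(data.reshape(-1,1))
# Cluster label (0 or 1) assigned to each value
kmeans.predict(data.reshape(-1,1))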

1. Identify 5 groups in 1D Array


from sklearn.cluster import KMeans
import numpy as np
data = np.array([101, 107, 106, 199, 204, 205, 207, 306, 310, 312, 312, 314, 317, 318, 380, 377,
379, 382, 466, 469, 471, 472, 557, 559, 562, 566, 569])

kmeans = KMeans(n_clusters=5).fit(data.reshape(-1,1))
kmeans.predict(data.reshape(-1,1))
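The array returned by predict() is just a list of label numbers, which is hard to read for 27 values. The sketch below groups the values by their label so the clusters can be inspected directly. The n_init and random_state arguments are added here only to make the run repeatable; they are assumptions and not part of the original exercise.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans

data = np.array([101, 107, 106, 199, 204, 205, 207, 306, 310, 312, 312, 314, 317, 318,
                 380, 377, 379, 382, 466, 469, 471, 472, 557, 559, 562, 566, 569])

# n_init and random_state are fixed only for repeatability (assumption).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(data.reshape(-1, 1))

# Collect the values that received each label.
groups = defaultdict(list)
for value, label in zip(data.tolist(), kmeans.labels_.tolist()):
    groups[label].append(value)

# Print the groups ordered by their cluster center.
for label in np.argsort(kmeans.cluster_centers_.ravel()).tolist():
    print(f"center {kmeans.cluster_centers_[label, 0]:.1f}: {sorted(groups[label])}")
```

The values fall into six visually separate bands, so with n_clusters=5 at least two bands have to share a cluster; the printout shows which ones were merged.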

2. Identify 2 groups in 2 D Array


from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
kmeans.predict([[0, 0], [12, 3]])
kmeans.predict([[11,11], [8, 9]])
kmeans.predict([[2,20], [4, 4]])
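Explanation:
The training data X contains six points:
x    y
1    2
1    4
1    0
10   2
10   4
10   0
The three points with x = 1 form one cluster and the three points with x = 10 form the other.
The answer for the first call is [1, 0]:
[0, 0] is predicted to be in cluster 1 (the x = 1 group).
[12, 3] is predicted to be in cluster 0 (the x = 10 group).
The label numbers themselves are arbitrary and may be swapped in a different scikit-learn version; what matters is which points share a label.

Similarly, check [11, 11] and [8, 9]; both lie nearer to the x = 10 group, so the result must be [0, 0].

And check [2, 20] and [4, 4]; both lie nearer to the x = 1 group, so the result must be [1, 1].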
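The predictions in this exercise can be checked by hand, because predict() assigns each new point to the nearest cluster center. The sketch below assumes the fitted centers are the means of the x = 1 group and the x = 10 group, that is (1, 2) and (10, 2), and computes the distances directly with NumPy. The names left_center, right_center, and sides are illustrative only.

```python
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Assumed cluster centers: the mean of each visible group in X.
left_center = X[:3].mean(axis=0)    # mean of the x = 1 points
right_center = X[3:].mean(axis=0)   # mean of the x = 10 points
centers = np.vstack([left_center, right_center])

queries = np.array([[0, 0], [12, 3], [11, 11], [8, 9], [2, 20], [4, 4]])

# Euclidean distance from every query point to every center, shape (6, 2).
distances = np.linalg.norm(queries[:, None, :] - centers[None, :, :], axis=2)
nearest = distances.argmin(axis=1)  # 0 = left group, 1 = right group

sides = ["left" if i == 0 else "right" for i in nearest]
for q, side in zip(queries.tolist(), sides):
    print(q, "->", side, "group")
```

Points reported as "left" should receive the same KMeans label as [0, 0], and points reported as "right" the same label as [12, 3].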

3. Plotting K means cluster for 2D Group for 2 Clusters


from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
# The model is already fitted above, so predict() is enough to label the training points
y_predict = kmeans.predict(X)

import matplotlib.pyplot as mtp

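# Points of the first cluster
mtp.scatter(X[y_predict == 0, 0], X[y_predict == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
# Points of the second cluster
mtp.scatter(X[y_predict == 1, 0], X[y_predict == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
# Leave a margin around the data so points at x = 10 and y = 0 are not clipped at the axis edges
mtp.xlim(-1, 11)
mtp.ylim(-1, 11)
mtp.legend()  # show the 'Cluster 1' / 'Cluster 2' labels
mtp.show()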

4. Plot a scatter Chart for 300 random numbers


%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set() # for plot styling
import numpy as np
from sklearn.datasets import make_blobs
X, y_true = make_blobs(n_samples=300, centers=4,
                       cluster_std=0.60, random_state=0)
plt.scatter(X[:, 0], X[:, 1], s=50);
# scatter() plots one dot for each observation. It needs two arrays of the same length:
# one for the values on the x-axis and one for the values on the y-axis.
# The ":" in X[:, 0] takes all elements along that array dimension (here, every row).
# s sets the size of the marker.

Looking at this chart, we can identify four distinct clusters by eye.
The k-means algorithm finds them automatically, and in Scikit-Learn it follows the usual estimator
API:

from sklearn.cluster import KMeans


kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);
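Because make_blobs also returns the generating labels y_true, the clustering can be scored as well as plotted. The sketch below uses adjusted_rand_score from sklearn.metrics, which compares two labelings without caring how the clusters are numbered. Using this metric here, and fixing n_init and random_state, are assumptions added for illustration and are not part of the original exercise.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# n_init and random_state are fixed only for repeatability (assumption).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)

# 1.0 means the clusters match the generating labels exactly (up to renaming);
# values near 0 mean the agreement is no better than chance.
ari = adjusted_rand_score(y_true, y_kmeans)
print(f"Adjusted Rand index: {ari:.3f}")
```

A score like this is only available when true labels exist, as with synthetic data; on real unlabeled data other validation measures are needed.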

5. Plot a scatter Chart for 300 random numbers (for the same data, increase the number of clusters to 5)
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')


centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);

Figure 30: 5 Clusters


6. Plot a scatter Chart for 300 random numbers (for the same data, increase the number of clusters to 6)

from sklearn.cluster import KMeans


kmeans = KMeans(n_clusters=6)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')


centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);

Figure 31: 6 Clusters


Similarly, repeat the same steps for 7 clusters and 8 clusters.

Figure 32: 7 Clusters

Figure 33: 12 Clusters
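K-means returns a partition for any number of clusters, so the plots alone do not show which k is reasonable. One common way to compare the runs is to record the inertia_ attribute of each fitted model (the sum of squared distances from each point to its nearest center) for a range of k. The loop below is a minimal sketch of that comparison on the same make_blobs data; the range of k, n_init, and random_state are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

inertias = []
for k in range(1, 9):
    # n_init and random_state are fixed only for repeatability (assumption).
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)
    print(f"k={k}: inertia={model.inertia_:.1f}")
```

Plotting inertias against k and looking for the point where the curve stops dropping sharply is the idea behind the elbow method mentioned in the viva questions.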

Viva Questions
1. What is the main difference between k-Means and k-Nearest Neighbours?
2. How is Entropy used as a Clustering Validation Measure?
3. How to determine k using the Elbow Method?
4. What is the difference between Classical k-Means and Spherical k-Means?
5. What is the difference between k-Means and k-Medians and when would you use one
over another?
