Experiment 3.1 K-Means
COURSE OUTCOMES
CO4 Evaluate a machine learning model's performance and apply learning strategies to
improve the performance of supervised and unsupervised learning models.
CO5 Develop a suitable model for supervised and unsupervised learning algorithms and
optimize the model for the expected accuracy.
K Means Clustering
K-means divides the data into k clusters by assigning each point to the cluster whose mean (centroid) is nearest to it.
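To make the nearest-mean idea concrete, here is a minimal NumPy sketch of the usual assign-then-update iteration (often called Lloyd's algorithm). It is an illustration only, not the Scikit-Learn implementation: the function name `kmeans_sketch` and its parameters `n_iter` and `seed` are hypothetical, and the starting centers are simply k data points picked at random.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Hypothetical helper: plain assign/update iteration on an (n, d) array."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (an assumption of this sketch).
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: label each point with the index of its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of the points assigned to it
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so the assignment is stable
        centers = new_centers
    return labels, centers

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], dtype=float)
labels, centers = kmeans_sketch(data.reshape(-1, 1), k=2)
print(labels)
print(centers.ravel())
```

The sketch uses the same 1D values as the first exercise, so its two centers should settle on the means of the low group and the high group.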
1. Identify 2 groups in 1D Array
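from sklearn.cluster import KMeans
import numpy as np
data = np.array([1,2,3,4,5,6,7,8,9,10,91,92,93,94,95,96,97,98,99,100])
# scikit-learn expects a 2D array of shape (n_samples, n_features)
samples = data.reshape(-1, 1)
# Fit two clusters and show the cluster label assigned to each value
kmeans = KMeans(n_clusters=2).fit(samples)
print(kmeans.predict(samples))
# Refit the same data with five clusters for comparison
kmeans = KMeans(n_clusters=5).fit(samples)
print(kmeans.predict(samples))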
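Besides the predicted labels, a fitted `KMeans` object exposes the cluster means in `cluster_centers_` and the within-cluster sum of squared distances in `inertia_`. The sketch below refits the same 1D data for both values of k and prints these attributes. The `n_init=10` and `random_state=0` arguments are assumptions added here so that repeated runs give the same result; they are not part of the original exercise.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                 91, 92, 93, 94, 95, 96, 97, 98, 99, 100])
samples = data.reshape(-1, 1)  # one column, one row per value

models = {}
for k in (2, 5):
    # n_init and random_state are fixed only to make the run repeatable.
    models[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(samples)
    print("k =", k)
    print("  centers:", np.sort(models[k].cluster_centers_.ravel()))
    print("  inertia:", models[k].inertia_)
```

Comparing the two inertia values shows how much tighter the clusters become when k grows from 2 to 5, even though the data contains only two obvious groups.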
Looking at the scatter chart, we can identify four distinct clusters by eye.
The k-means algorithm finds these clusters automatically, and in Scikit-Learn it uses the typical estimator
API:
5. Plot a scatter chart for 300 random numbers (for the same data, increase the number of clusters to,
say, 5)
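from sklearn.cluster import KMeans
# X is the array of 300 points from the scatter chart, shape (n_samples, n_features)
kmeans = KMeans(n_clusters=5)
kmeans.fit(X)
# Cluster index (0 to 4) assigned to each row of X
y_kmeans = kmeans.predict(X)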
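The data behind `X` is not reproduced in this handout, so the snippet above cannot run on its own. As a rough, self-contained sketch, the version below builds stand-in data: 300 two-dimensional points scattered around four hypothetical centers. The center positions, the spread of 0.6, and the `n_init` and `random_state` values are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data (an assumption): 300 two-dimensional points drawn around
# four hypothetical centers, 75 points per center.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0], [6.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.6, size=(75, 2)) for c in true_centers])

# n_init and random_state are fixed only to make the run repeatable.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)  # shorthand for fit(X) followed by predict(X)

print(kmeans.cluster_centers_.shape)  # one row per cluster, one column per feature
print(np.bincount(y_kmeans))          # number of points assigned to each cluster
```

Because the stand-in data has four groups, asking for five clusters forces k-means to split one of them, which is a useful thing to look for in the per-cluster counts.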
Viva Questions
1. What is the main difference between k-Means and k-Nearest Neighbours?
2. How is Entropy used as a Clustering Validation Measure?
3. How is k determined using the Elbow Method?
4. What is the difference between Classical k-Means and Spherical k-Means?
5. What is the difference between k-Means and k-Medians and when would you use one
over another?