Sr. No: 39
Name: Vanraj Pardeshi
Experiment No. 7
Aim: Implement any Clustering algorithm using an Open-source tool.
LO: LO3:- Implement appropriate Data mining methods like classification, clustering, or association mining on
large datasets using open-source tools like WEKA.
Theory:
Clustering:
Clustering is a type of unsupervised learning in machine learning where the aim is to group similar data points
together into clusters or subgroups based on their features or characteristics. The objective is to find patterns and
structures within the data that may not be immediately apparent and to uncover hidden relationships or similarities
between data points.
Clustering has a wide range of applications in fields such as image processing, data mining, bioinformatics, and
marketing, among others. Some common use cases include customer segmentation, anomaly detection, and
pattern recognition.

K-means clustering:
K-means clustering is a type of partition-based clustering algorithm used to group similar data points together
into K clusters. The algorithm works by iteratively assigning data points to the nearest cluster centroid, and then
recomputing the centroids based on the new cluster assignments. The goal is to minimize the sum of squared
distances between each data point and its assigned cluster centroid.
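The assign-then-recompute loop described above can be sketched in plain Python. This is a minimal illustration, not Weka's implementation: the function names, the `seed` and `max_iter` parameters, and the toy points are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Sketch of basic k-means; assumes max_iter >= 1 and len(points) >= k."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Objective: sum of squared distances to the assigned centroids.
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, sse


# Toy data (hypothetical): two well-separated groups of three points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_centroids, toy_assignments, toy_sse = kmeans(toy_points, k=2)
print(toy_assignments, toy_sse)
```

The loop stops as soon as an assignment pass changes nothing, which is the usual convergence test for this algorithm.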
K-means clustering is popular and widely used because of its simplicity and efficiency. However, it has
some limitations: it is sensitive to the random initialization of the cluster centroids, and the number of
clusters must be fixed in advance. There are also variations and extensions of K-means, such as K-means++, which
uses a smarter initialization method to improve the quality of the resulting clusters.
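To make the "smarter initialization" concrete, here is a hedged sketch of the K-means++ seeding idea: the first centroid is picked uniformly at random, and each later one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The function name `kmeans_pp_init`, the `seed` parameter, and the toy points are assumptions for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Sketch of K-means++ seeding; assumes at least k distinct points."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform random pick
    while len(centroids) < k:
        # Weight of each point: squared distance to its nearest chosen centroid.
        # Points already chosen get weight 0 and cannot be drawn again.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


# Toy data (hypothetical): far-apart points are more likely to become seeds.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans_pp_init(toy_points, k=3))
```

The returned list could replace the purely random starting centroids of a basic K-means loop.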
Steps:
Step 1: Open the Weka tool and go to “Explorer”.

Step 2: Click on “Open file” and select your iris.arff file.

Click “Open” to load the dataset; its attributes are then listed in the Explorer.

Step 3: In the “Cluster” tab, choose the k-means clusterer (SimpleKMeans) and set the number of clusters to 3.

Step 4: Select “Classes to clusters evaluation” and then click “Start”.

Here we notice that:
Incorrectly clustered instances: 17.0 11.3333%
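As a rough illustration of what an "incorrectly clustered instances" count means, the sketch below maps each cluster to a distinct class so that the number of mismatches is as small as possible, and reports that minimum. It is an assumption-laden sketch, not Weka's code: the function name and the toy labels are hypothetical, and the brute-force search is only practical for a handful of clusters.

```python
from itertools import permutations


def incorrectly_clustered(cluster_ids, class_labels):
    """Minimum mismatches over all one-to-one cluster-to-class mappings.

    Assumes there are no more clusters than classes.
    """
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(class_labels))
    best = len(cluster_ids)  # worst case: every instance is mismatched
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))  # one candidate cluster -> class map
        errors = sum(1 for c, y in zip(cluster_ids, class_labels) if mapping[c] != y)
        best = min(best, errors)
    return best


# Toy labels (hypothetical, not the iris results).
toy_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
toy_classes = ["a", "a", "b", "b", "b", "c", "c", "a"]
toy_errors = incorrectly_clustered(toy_clusters, toy_classes)
print(toy_errors, 100.0 * toy_errors / len(toy_clusters))
```

The percentage in the Weka output is this kind of count divided by the number of instances; for example, 17 of the 150 iris instances is 11.3333%.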
The goal now is to reduce the percentage of incorrectly clustered instances. To do so, we ignore the attributes
one at a time and find out which one, when ignored, gives the lowest percentage of incorrectly clustered instances.
First we ignore sepal width and get the following results:

Result:
Incorrectly clustered instances: 14.0 9.3333%
Now let’s ignore sepal length and check the results:

Result:
Incorrectly clustered instances: 8.0 5.3333%
So, of the configurations tried, ignoring sepal length gives the fewest incorrectly clustered instances (8, or 5.3333%).
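The reported percentages can be checked with a few lines of arithmetic. The counts below are the ones from the results above; the total of 150 instances is the size of the standard iris dataset, and the dictionary keys are just descriptive labels chosen for this sketch.

```python
TOTAL_INSTANCES = 150  # size of the standard iris dataset

# Incorrectly clustered counts reported in the three runs above.
incorrect_counts = {
    "all attributes": 17,
    "sepal width ignored": 14,
    "sepal length ignored": 8,
}

# Convert each count to a percentage of all instances.
percentages = {
    name: round(100.0 * count / TOTAL_INSTANCES, 4)
    for name, count in incorrect_counts.items()
}

# The best configuration is the one with the fewest incorrect instances.
best_run = min(incorrect_counts, key=incorrect_counts.get)

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print("best:", best_run)
```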
Step 5: Click on Visualize

Conclusion: From the above experiment, I understood how to implement the k-means clustering algorithm using an
open-source tool.
