Evolutional Study On KNN and K-Means Algorithms (SP)
hritika16@gmail.com
² Associate Professor, Computer Science and Engineering, JIET, Jodhpur
² anamika.choudhary@jietjodhpur.ac.in
1. Introduction
Machine learning is a form of learning in which a computer acquires knowledge from large
amounts of data without being explicitly programmed. It is an application of AI (Artificial
Intelligence) that gives a system the capacity to learn automatically from experience and
improve itself. Machine learning can readily handle multi-dimensional, multi-variety data in a
dynamic environment, relying on accumulated experience rather than hand-written rules. Its
main emphasis is the creation of computer programs that can access data and use it for their
own learning. By starting from observations of data, such as direct experience or instruction, it
becomes straightforward to detect patterns in the data and to make better decisions in the
future. [1]
K-means clustering is also known as partitioning clustering. Suppose we have a database of n
objects and we partition the data into k parts. Each part represents a cluster, and the condition
k ≤ n ensures that every object belongs to some cluster and that each cluster contains at least
one object.
KNN (k-nearest neighbours) is a very simple machine learning algorithm based on supervised
learning. KNN classifies data: given any new input, it can tell which category that input belongs
to. It can also be used for regression, but it is mostly applied to classification problems. [7]
Using labelled examples, this kind of algorithm applies what it has learnt in the past to make
predictions about new data. The learning process infers a function from a known training
dataset that can predict output values for unseen inputs; given enough training, the system can
produce a target value for any new input. During training, the predicted output is compared
with the intended, correct output to identify errors, which can then be corrected in the model.
Regression:
Regression is used especially in areas such as finance and investing. With regression,
we can estimate how the value of one quantity changes with another.
Although there are many types of regression, we will discuss some of the main
types most commonly used in machine learning.
i. Linear Regression:
In linear regression, the value of a dependent variable is predicted with the
help of an independent variable.
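The idea of simple linear regression can be sketched in a few lines of Python; the data values below are invented for illustration, and the closed-form least-squares fit shown is one standard way (not the only way) to estimate the line.

```python
# Minimal sketch of simple linear regression (ordinary least squares)
# with one independent variable x and one dependent variable y.
# The data points are made up for illustration.

def fit_line(xs, ys):
    """Return (intercept, slope) minimising the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly y = 2x
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

Given a new value of the independent variable x, the prediction is simply `intercept + slope * x`.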
Classification:
Here the output variables are called labels or categories because the output is
categorical: it is divided into two or more classes, such as classifying an email in
one of two ways.
In other words, classification assigns class labels based on training data and known
labels. Whenever a problem requires creating categories or labels, we use a
classification algorithm.
A classification model examines observed values and tries to draw a conclusion from
them: given one or more inputs, it estimates the value of one or more discrete
outcomes. The inputs to a classification model may be discrete or real-valued.
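The email example above can be made concrete with a deliberately tiny two-class classifier; the keyword list and messages are invented, and a real spam filter would of course learn its rule from labelled training data rather than use a fixed word list.

```python
# Toy two-class classifier for the email example in the text:
# the output is one of exactly two labels, "spam" or "not spam".
# The indicative words and the messages are invented for illustration.

SPAM_WORDS = {"win", "free", "prize"}  # assumed indicative words

def classify(message):
    words = set(message.lower().split())
    # discrete output: one of the two class labels
    return "spam" if words & SPAM_WORDS else "not spam"

print(classify("Win a FREE prize now"))   # spam
print(classify("Meeting moved to 3 pm"))  # not spam
```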
Clustering
The most commonly used method within unsupervised learning is clustering.
Clustering is a technique in which similar items of data are collected into
groups, called clusters. Whenever a new object must be identified, the
algorithm checks which cluster the object's data values match most closely,
and the object is labelled accordingly.
In the third step, the KNN algorithm computes the distance from the new data
point to the stored points, selects its k nearest neighbours, and counts how
many of those neighbours belong to each category.
In the fourth step, the KNN algorithm designates the category of the new data
point: the new point is assigned to the category to which the majority of its
nearest neighbours belong.
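The KNN steps described above can be sketched as follows; the 2-D training points, labels, and the choice of Euclidean distance are illustrative assumptions.

```python
import math
from collections import Counter

# Sketch of the k-nearest-neighbour steps described above,
# on 2-D points with Euclidean distance. The data are illustrative.

def knn_predict(train, new_point, k=3):
    """train: list of ((x, y), label) pairs; returns the majority label
    among the k nearest neighbours of new_point."""
    # compute the distance from the new point to every stored point
    dists = sorted((math.dist(p, new_point), label) for p, label in train)
    # take the k nearest neighbours and count how many fall in each category
    votes = Counter(label for _, label in dists[:k])
    # assign the category held by the majority of the neighbours
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (1.5, 1.5), k=3))  # "A"
```

Note how the choice of k directly controls the vote: with k = 3 here, two "A" neighbours outvote one "B" neighbour.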
This distance, represented as D, guides the assignment of each data point to
the cluster whose center is nearest.
The process continues iteratively, with the cluster centers recalculated from
the data points within each cluster.
Each data point is assigned to exactly one cluster: the cluster whose center
is closest to that data point.
The center of each newly formed group is then recalculated as the mean of all
the data points contained in that cluster.
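The assignment and update steps above can be sketched as a short K-means loop; the 2-D points, the initial centers, and the fixed iteration count are illustrative assumptions (real implementations usually iterate until the assignments stop changing).

```python
import math

# Minimal sketch of the k-means loop described above, on 2-D points.
# The data points and the initial centers are invented for illustration.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # assignment step: each point joins the cluster with the nearest center
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)),
                    key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # update step: recompute each center as the mean of its points
        # (an empty cluster keeps its previous center)
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

points = [(1, 1), (1.5, 2), (5, 7), (8, 8), (1, 0.5), (9, 11)]
centers, clusters = kmeans(points, centers=[(1, 1), (5, 7)])
print(centers)
```

After a few iterations the centers settle on the means of the two visible groups of points, which is exactly the recalibration step the text describes.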
CONCLUSION
In conclusion, this research delved into the comparative analysis of the K-nearest neighbor
(KNN) and K-means clustering algorithms within the realm of machine learning. While both
algorithms share the commonality of the letter 'k,' they serve distinct purposes. K-means
clustering, an unsupervised learning algorithm, excels in grouping data into clusters, offering
valuable insights into patterns and structures. On the other hand, KNN, a supervised learning
algorithm, proves effective for classification tasks, leveraging labeled data to make predictions
for new, unlabeled instances.
The study highlighted the simplicity and versatility of the KNN algorithm, suitable for
classification, regression, and search operations. Despite its ease of implementation, challenges
such as determining the optimal 'k' value and computational costs were acknowledged.
Conversely, the K-means clustering algorithm demonstrated efficiency in grouping data based
on specified attributes. However, limitations in handling noisy data, outliers, and the
requirement to predefine the number of clusters (k) were identified.
In practical applications, the selection between these algorithms depends on the nature of the
problem and data characteristics. Understanding the nuances of KNN and K-means aids
practitioners in making informed choices for diverse machine learning tasks. Importantly, the
research emphasizes that, despite their similar nomenclature, these algorithms significantly
differ in functionality and applicability within the machine learning landscape.
REFERENCES
[1] Domingos, P. (2012). A Few Useful Things to Know About Machine Learning. Communications of
the ACM, 55(10), 78–87. [DOI: 10.1145/2347736.2347755]
[3] Vaishnav, H., & Choudhary, A. (Year of Publication). Evolutional Study on KNN and K-means
Algorithms. Title of the Journal or Conference, Volume(Issue), Page Range. [DOI or URL if
applicable]
[4] Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques. Morgan Kaufmann.
[5] Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys
(CSUR), 31(3), 264–323. [DOI: 10.1145/331499.331504]
[6] Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on
Information Theory, 13(1), 21–27. [DOI: 10.1109/TIT.1967.1053964]
[7] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining,
Inference, and Prediction (2nd ed.). Springer.
[8] MacQueen, J. (1967). Some Methods for Classification and Analysis of Multivariate Observations. In
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1:
Statistics (pp. 281–297). University of California Press.
[9] Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification (2nd ed.). Wiley.