Cluster Unsupervised
• Uses:
– Business Analytics
– Image Processing
– Web Search
Cluster
• Clustering is a process of partitioning a set of data
(or objects) into a set of meaningful sub-classes,
called clusters.
Euclidean distance between the points (X_1, Y_1) and (X_2, Y_2): $d = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}$
[Figure: cluster centers C1 and C2, with the coordinates (20, 20) and (8, 8) marked]
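The distance calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own code: the pairing of the labels C1 and C2 with the coordinates (8, 8) and (20, 20) is an assumption, and the point (10, 12) and the helper names `euclidean_distance` and `nearest_center` are made up for the example.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two 2-D points p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

def nearest_center(point, centers):
    """Return the label of the cluster center closest to `point`."""
    return min(centers, key=lambda label: euclidean_distance(point, centers[label]))

# Assumed pairing of labels and coordinates, for illustration only.
centers = {"C1": (8, 8), "C2": (20, 20)}
new_point = (10, 12)  # hypothetical data point, not taken from the slides

print(euclidean_distance(centers["C1"], centers["C2"]))  # distance between the two centers
print(nearest_center(new_point, centers))                # label of the closer center
```

Assigning each point to its nearest center in this way is the basic step that partitioning methods such as K-Means repeat.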
Clustering Algorithms
• Partitioning Methods
– K-Means
– K-Medoids
• Density-Based Methods
• Hierarchical Methods
– Agglomerative Approach
– The Divisive Approach
Random Forest Classification
• Boosting
• Bagging
Evaluating Classification Model Performance
Confusion Matrix
Precision
Precision is defined as the ratio of the True Positive count
to the total number of positive predictions made by the model.
Precision = TP/(TP+FP)
Recall
Recall is defined as the ratio of True Positives count to
the total Actual Positive count.
Recall = TP/(TP+FN)
E.g., use case: out of all the non-Covid patients who visited the doctor, how many
were diagnosed as non-Covid?
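As a small illustration of the two formulas above, the sketch below counts the confusion-matrix cells from a list of actual and predicted labels and then applies Precision = TP/(TP+FP) and Recall = TP/(TP+FN). The ten patient labels are invented for the example, "non-covid" is taken as the positive class to match the use case, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN for one class treated as 'positive'."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision(tp, fp):
    """Share of positive predictions that were correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)

# Hypothetical data: 6 non-Covid and 4 Covid patients.
y_true = ["non-covid"] * 6 + ["covid"] * 4
y_pred = ["non-covid"] * 5 + ["covid"] + ["covid"] * 2 + ["non-covid"] * 2

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="non-covid")
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("Precision:", precision(tp, fp))
print("Recall:", recall(tp, fn))
```

The sketch does not guard against a zero denominator, which occurs when the model makes no positive predictions or the data holds no actual positives.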
Er. GOURAV
Fuzzy C-Means
This algorithm assigns each data point a membership in every cluster, based on the
distance between the data point and the cluster center.
The nearer a data point is to a cluster center, the higher its
membership in that cluster. The memberships of each data point
must sum to one. After each iteration, the memberships and cluster
centers are updated according to the standard fuzzy c-means update formulas:
$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}}, \qquad v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} \, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}}$$
where 'xi' is the ith data point, and
• 'n' is the number of data points.
• 'vj' represents the jth cluster center.
• 'm' is the fuzziness index, m ∈ [1, ∞).
• 'c' represents the number of cluster centers.
• 'µij' represents the membership of the ith data point in the jth cluster.
• 'dij' represents the Euclidean distance between the ith data point and the jth cluster center.
• The main objective of the fuzzy c-means algorithm is to minimize:
$$J = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{m} \, d_{ij}^{2}$$
Where:
• c is the total number of clusters.
• m is the fuzziness parameter.
• dij is the distance between data point xi and cluster center vj.
• μij is the membership of data point xi in cluster j.
• The parameter m controls the degree of fuzziness.
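The iteration described above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not a reference implementation: it uses the standard membership and center updates for m > 1, and the function name `fuzzy_c_means`, the parameters `tol`, `max_iter` and `seed`, and the eight sample points are all hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Return (U, V): memberships of shape (n, c) and cluster centers of shape (c, d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Center update: weighted mean of the points, with weights mu_ij^m.
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # d_ij: Euclidean distance from point i to center j (floored to avoid division by zero).
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)
        # Membership update: mu_ij = d_ij^(-2/(m-1)) / sum_k d_ik^(-2/(m-1)).
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, V

# Hypothetical data: two loose groups of points around (8, 8) and (20, 20).
X = np.array([[7, 8], [8, 7], [9, 9], [8, 8],
              [19, 20], [20, 21], [21, 19], [20, 20]], dtype=float)
U, V = fuzzy_c_means(X, c=2)
print(np.round(U, 3))  # each row sums to 1
print(np.round(V, 2))  # one center per cluster
```

Here `tol` is the stopping threshold on the largest change in any membership value. A smaller threshold generally means more iterations before the loop stops.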
Advantages
1) Gives the best results for overlapping data sets, and comparatively better results than the k-means algorithm.
2) Unlike k-means, where a data point must belong exclusively to one cluster center, here each data
point is assigned a membership in every cluster center, so a data point may belong to more than one
cluster center.
Disadvantages
1) A priori specification of the number of clusters.
2) With a lower value of β (the termination criterion) we get a better result, but at the expense of more iterations.
3) Euclidean distance measures can unequally weight underlying factors.
Classifications (Predicting Classes)
The k-nearest neighbors (KNN) algorithm is a non-parametric, supervised
learning classifier that assigns a data point the majority class among its k nearest training points.
Now predict the genre of movie “E” with IMDb rating 7.4 and duration 144 minutes
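A minimal KNN sketch for this kind of question is shown below. The table of training movies is not reproduced in this text, so the four rows for movies A to D are hypothetical stand-ins, as is the function name `knn_predict`. Only the query (IMDb rating 7.4, duration 144 minutes) comes from the slide.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training rows.

    train: list of ((imdb_rating, duration_minutes), genre) pairs.
    query: (imdb_rating, duration_minutes) of the item to classify.
    """
    # Sort the training rows by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(genre for _, genre in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data (the slide's actual table is not available).
movies = [
    ((8.0, 160), "Action"),  # movie A
    ((6.2, 170), "Action"),  # movie B
    ((7.2, 125), "Comedy"),  # movie C
    ((8.2, 130), "Comedy"),  # movie D
]

movie_e = (7.4, 144)  # query from the slide: rating 7.4, duration 144 minutes
print(knn_predict(movies, movie_e, k=3))
```

The two features are left unscaled here, so duration (in minutes) dominates the distance. In practice the features would usually be normalized before computing distances.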