
Name: Esam Ashfaq Date: 21-04-2024

PRN: 21070122049

Practical No: 8
___________________________________________________________________________
Title:
Implement the DBSCAN data mining algorithm using both Python and a DM tool
(RapidMiner)
___________________________________________________________________________
Objective:
Students will learn and implement:

• The DBSCAN data mining algorithm


___________________________________________________________________________
Description:
Clustering:

Clustering algorithms are a core component of machine learning, grouping similar data points based on their
proximity or similarity within a dataset without needing pre-existing labels or guided instruction. These
algorithms uncover inherent patterns, structures, or relationships within data across various applications like
image recognition, customer segmentation, anomaly detection, and recommendation systems.

DBSCAN:

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) specifically identifies clusters as dense
regions in the data space, distinct from areas of lower density which represent noise. The core principle of
DBSCAN revolves around defining clusters and noise within the dataset. Each point in a cluster should be in
close proximity to a minimum number of neighboring points, encapsulated by a specified neighborhood radius.
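The neighbor-counting idea above can be sketched on a toy dataset. This is a minimal illustration, not the full algorithm: the data, eps, and MinPts values below are made up for demonstration, and a point counts as its own neighbor (matching scikit-learn's convention for min_samples).

```python
import numpy as np

# Toy 1-D dataset; eps and min_pts are illustrative values only.
points = np.array([1.0, 1.2, 1.4, 5.0, 5.1, 9.0])
eps, min_pts = 0.5, 3

labels = []
for p in points:
    # Count neighbors within the eps radius (including the point itself).
    n_neighbors = np.sum(np.abs(points - p) <= eps)
    labels.append("core" if n_neighbors >= min_pts else "non-core")

print(list(zip(points.tolist(), labels)))
```

The first three points form a dense region and qualify as core points; the remaining points have too few neighbors within eps and would become border or noise points in a full DBSCAN run.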

To effectively implement DBSCAN, two crucial parameters must be considered:

• Eps (ε): This parameter defines the radius around a data point within which other points are
considered its neighbors. Points within this radius are classified as neighbors if the distance between
them is less than or equal to ε. Selecting an appropriate ε is critical; a small value might classify too
much data as noise, whereas a large value could merge distinct clusters, consolidating a majority of
data points into a single cluster. Determining ε can be facilitated by methods such as analyzing the k-
distance graph.

• MinPts: This parameter specifies the minimum number of neighbors (data points) within the ε radius
required to define a core point. The choice of MinPts is influenced by the dataset's size, with larger
datasets necessitating higher values of MinPts. As a general guideline, MinPts should be at least 3, and
for larger datasets, it should be greater than or equal to the number of dimensions (D) in the dataset
plus one.
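The k-distance graph mentioned above can be sketched as follows, using scikit-learn's NearestNeighbors on the same kind of blob data as in the practical. Setting k equal to MinPts is a common (assumed) choice; the "elbow" in the sorted curve suggests a reasonable eps.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Synthetic data similar to the dataset used later in this practical.
X, _ = make_blobs(n_samples=500, centers=3, n_features=2, random_state=20)
k = 5  # assumed k = MinPts

# Distance from each point to its k-th nearest neighbor, sorted ascending.
nbrs = NearestNeighbors(n_neighbors=k).fit(X)
distances, _ = nbrs.kneighbors(X)
k_distances = np.sort(distances[:, -1])

# The elbow of this curve is a candidate value for eps.
plt.plot(k_distances)
plt.xlabel("Points sorted by k-distance")
plt.ylabel(f"Distance to {k}th nearest neighbor")
plt.title("k-distance graph for choosing eps")
plt.show()
```

Points in dense regions have small k-distances; the sharp rise at the right end of the curve corresponds to outliers, and the bend between the two regimes is the usual eps candidate.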

Unlike centroid-based methods such as K-means, DBSCAN does not require the number of clusters to be specified in advance. It can discover clusters of arbitrary shape and explicitly labels low-density points as noise, making it well suited to exploratory data analysis on datasets containing outliers or non-spherical cluster structures.
___________________________________________________________________________
Program code (Python):
Dataset-

Code-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN

# Generate a synthetic dataset of 500 points around 3 centers
X, _ = make_blobs(n_samples=500, centers=3, n_features=2, random_state=20)

# Visualize the raw data
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], s=50)
plt.title("Generated Data Points")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

# Fit DBSCAN with eps = 1 and MinPts = 5; noise points are labelled -1
epsilon = 1
min_samples = 5
dbscan = DBSCAN(eps=epsilon, min_samples=min_samples)
clusters = dbscan.fit_predict(X)

# Plot the clustering result, marking noise points separately
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis', s=50)
plt.title("DBSCAN Clustering Result")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.scatter(X[clusters == -1, 0], X[clusters == -1, 1], c='red', marker='x',
            s=100, label='Noise points')
plt.colorbar(label='Cluster')
plt.legend()
plt.show()
Input and Output:
___________________________________________________________________________
Model Design (RapidMiner):
Dataset-

(1000 Data Points)

Design-

Input and Output:


___________________________________________________________________________
Conclusion:
Thus, we have implemented the DBSCAN clustering algorithm using both Python and RapidMiner.
___________________________________________________________________________
