
International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS-2017)

Enhancing DBSCAN Algorithm for Data Mining


Surbhi Sharma
M. Tech. Scholar, Dept. of CSE
Rajasthan Technical University
Kota, India
sharma.surbhi009@gmail.com

Dr. Arvind K Sharma
Dept. of CSE
University of Kota
Kota, India
drarvindkumarsharma@gmail.com

Dinesh Soni
Asst. Professor, Dept. of CSE
Rajasthan Technical University
Kota, India

Abstract—Data mining is widely used today by companies with a strong consumer focus, such as retail, financial, communication, and marketing organizations. Technically, data mining is the process of extracting required information from huge databases. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. The goal of this paper is to propose a methodology for improving the DBSCAN algorithm so as to improve clustering accuracy. The proposed improvement uses the back propagation algorithm to calculate the Euclidean distance dynamically. The paper also presents the results obtained from implementations of the proposed and existing methods and compares them in terms of execution time and accuracy.

Keywords—Data Mining; DBSCAN; I-DBSCAN Clustering; MATLAB

I. INTRODUCTION
Data mining is one of the promising technologies in the field of computer science [1]. It is used to extract information from large collections of data and deals mainly with large databases [2]. Data mining analyzes data and converts it into useful information or knowledge for decision making [3]. It takes data as its input and gives knowledge as its output.
Data mining can be carried out through various approaches and algorithms, among which clustering is one of the most important. Clustering simply means collecting and presenting similar data items [4]. The process of finding similarities between data items and grouping the similar items into clusters is called clustering [1]. Several clustering algorithms are based on density and are called density based clustering algorithms. DBSCAN is one such algorithm, and it is the one used in this paper for the process of data mining.

II. LITERATURE REVIEW
Some earlier research works in the domain of data mining are discussed in this section.
In 2015, Ahmad M. Bakr, et al. [10] proposed an algorithm that enhances the incremental clustering process, resulting in a significant improvement in performance. In 2016, Michael Hahsler and Matthew Bolaños [11] presented a paper on clustering data streams based on shared density between micro-clusters, in which space and time complexities are discussed. In 2013, Aastha Joshi and Rajneet Kaur [12] presented a review paper with a comparative study of various clustering techniques in data mining. In 2013, Nagpal, et al. [14] presented a review paper on data clustering algorithms, in which it is observed that there is no optimal solution for handling problems with large data sets of mixed and categorical attributes. In 2012, Glory H. Shah [17] presented a paper in which a new approach to density based clustering is discussed. In 2011, Pooja Batra, et al. [15] presented a comparative study of density based clustering algorithms based on several parameters. In 2006, Zeng Donghai [18] presented a study of a clustering algorithm based on grid density and a spatial partition tree. In 2005, Moreira, et al. [16] presented a paper in which density based clustering is performed with DBSCAN and SNN. There, the role of the clustering algorithms is to identify clusters of POIs and then use the clusters to automatically characterize geographic regions. In 2004, El-Sonbaty, et al. [13] proposed a density based clustering algorithm for large datasets. Synthetic datasets are used for experimental evaluation, which shows that the new clustering algorithm is faster and more scalable than the original DBSCAN.

III. PROPOSED METHODOLOGY
The density based technique is a type of algorithm in which the density of the whole dataset is calculated and the densest region is identified in order to find similarity between the elements of the dataset. The complete implementation of the work follows the flow shown in the figure below.
In the existing work, density based clustering is applied: the density of the whole dataset is calculated and the dense region is identified. On the dense region, the EPS value is calculated, and the Euclidean distance is applied, to analyze similarity between the elements. The EPS value is calculated dynamically to achieve maximum accuracy, but the Euclidean distance is calculated statically, so maximum accuracy is not achieved. In this work, an improvement to the DBSCAN algorithm is proposed that calculates the Euclidean distance iteratively to increase the accuracy of clustering.

978-1-5386-1887-5/17/$31.00 ©2017 IEEE


In the DBSCAN algorithm, the densest region of the dataset is calculated. The central point is calculated from the densest region and is called the EPS value of the dataset. To measure similarity between the data points, the Euclidean distance is calculated from the central point to all other points. Elements that are similar are placed in one cluster and the others in a second cluster. In the base paper, the EPS value is calculated dynamically to improve clustering accuracy, which leads to the clustering of points that had remained unclustered. To achieve higher accuracy, the back propagation technique is applied to calculate the Euclidean distance dynamically, which increases the accuracy and reduces the execution time of the improved DBSCAN algorithm.
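For reference, the classic DBSCAN procedure that the improvement builds on can be sketched as follows. This is a minimal Python sketch of standard DBSCAN, in which eps is the neighbourhood radius and min_pts is the minimum number of neighbours for a core point. It is not the authors' MATLAB implementation and does not include the back propagation step. The function names (euclidean, region_query, dbscan) are illustrative assumptions.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may later be claimed as a border point
            continue
        cluster += 1                       # points[i] is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # noise reachable from a core point is a border point
            if labels[j] is not UNVISITED:
                continue                   # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels
```

Calling dbscan(points, eps=1.5, min_pts=3) on two well separated groups of four points plus one isolated point assigns the groups the labels 0 and 1 and marks the isolated point as noise (-1).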
A. Proposed Work
The flow diagram of our proposed work is shown in Fig. 1 below.

Fig. 1. Flow of Proposed Work

IV. EXPERIMENTS & IMPLEMENTATION
The improvement to the DBSCAN algorithm is demonstrated using MATLAB version 2012. The main steps of the implementation and the experimental setup are shown below.

Fig. 2. Default Interface of Model

This is the user interface of our experimental setup implemented in MATLAB. The interface has four modules: base paper algorithm, proposed algorithm, performance analysis, and exit, as shown in Fig. 2 above.

Fig. 3. Calculation of Dense Region

As shown in Fig. 3, the incremental DBSCAN algorithm is implemented. The densest region is calculated and shown in the snapshot.

According to the densest region, the EPS value, which defines the radius of the cluster, is calculated. As shown in Fig. 4, the EPS value is defined according to the input dataset. The EPS value defines the class of the dataset, and the Euclidean distance from the central point is calculated. According to that distance, similar and dissimilar values are clustered.
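The source does not give the formula used to derive the EPS value from the input dataset. One common heuristic, shown here only as an assumption and not as the authors' method, is to look at each point's distance to its k-th nearest neighbour and take a quantile of those values as the radius. The names kth_neighbor_distances and estimate_eps and the quantile parameter are hypothetical.

```python
import math

def kth_neighbor_distances(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted ascending.

    Requires 1 <= k <= len(points) - 1.
    """
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

def estimate_eps(points, k, quantile=0.5):
    """Pick eps as a quantile of the k-distance values (a heuristic choice)."""
    kd = kth_neighbor_distances(points, k)
    index = min(len(kd) - 1, int(quantile * len(kd)))
    return kd[index]
```

A low quantile keeps the radius tight around the dense regions, while a quantile close to 1 lets isolated points inflate the radius, so the value has to be chosen per dataset.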


Fig. 4. Generation of Clusters

Fig. 5. Applying Back Propagation Algorithm

As shown in Fig. 5, the back propagation algorithm is applied to calculate the Euclidean distance dynamically in order to cluster similar and dissimilar values.

Fig. 6. Euclidean distance value calculation

As shown in Fig. 6, the Euclidean distance is calculated using the back propagation algorithm, and the average of the distances is taken to cluster similar and dissimilar values.

Fig. 7. Generation of Final Clusters

As shown in Fig. 7, the final clusters are generated according to the Euclidean distance value, and the results show that the generated clusters differ from the existing clusters.

Fig. 8. Closest comparison of distance values

The experimental evaluation in Fig. 8 shows the closest comparison of the distance values of the base paper algorithm and the proposed algorithm. The Improved DBSCAN algorithm gives better results.

As shown in Fig. 9, the EPS comparison values clearly show the increased accuracy of clustering.


Fig. 9. EPS comparison of DBSCAN and I-DBSCAN

V. RESULTS DISCUSSION
In this section, the DBSCAN algorithm and its improvement, the I-DBSCAN algorithm, are compared in terms of accuracy, time, distance, and EPS value, as shown in Table 1.

Fig. 10. Accuracy of Clustering

As shown in Fig. 10, the accuracy of the proposed and existing algorithms is compared to check the reliability of the algorithms. The analysis shows that the accuracy of the proposed algorithm is higher than that of the existing algorithm.

Fig. 11. Execution time comparison

As shown in the above figure, the execution times of the proposed and existing algorithms are compared. The analysis shows that dynamic calculation of the Euclidean distance can reduce the execution time of the DBSCAN algorithm.

Fig. 12. Comparison of DBSCAN and I-DBSCAN

Finally, the DBSCAN algorithm and the I-DBSCAN algorithm are compared in terms of accuracy, time, distance, and EPS value, as shown above.

VI. CONCLUSION
Clustering is a technique in which similar and dissimilar types of data are grouped in order to analyze complex data. Density based clustering groups similar and dissimilar types of data according to the data density in the input dataset. In density based clustering, the densest region is calculated, and from it similar and dissimilar data are determined using a similarity technique. In the DBSCAN algorithm applied in this work, the EPS value is calculated, which is the central point of the dataset. The EPS value is calculated dynamically to achieve maximum accuracy. The Euclidean distance is applied to calculate the similarity between the data points. To increase the accuracy of clustering, a neural network technique will be applied in future to calculate the Euclidean distance dynamically.

References
[1] Dharni, Chetan, and Meenakshi Bansal. "An improvement of DBSCAN algorithm to analyze cluster for large datasets." Innovation and Technology in Education (MITE), 2013 IEEE
[2] Aparna, U. R., and Shaiju Paul. "Feature selection and extraction in data mining." Green Engineering and Technologies (IC-GET), IEEE, 2016.
[3] Tsai, Cheng-Fa, and Yao Chiang. "Enhancement of data clustering using TSS-DBSCAN approach for data mining." Machine Learning and Cybernetics (ICMLC), 2016 International Conference on. Vol. 2. IEEE, 2016.
[4] Dutt, Ashish, Maizatul Akmar Ismail, and Tutut Herawan. "A Systematic Review on Educational Data Mining." IEEE Access (2017).
[5] Ngai, Eric WT, Li Xiu, and Dorothy CK Chau. "Application of data mining techniques in customer relationship management: A literature review and classification." Expert Systems with Applications 36.2 (2009): 2592-2602.
[6] Patil, Pritam H., et al. "Analysis of Different Data Mining Tools using Classification, Clustering and Association Rule Mining." International Journal of Computer Applications 93.8 (2014).


[7] Gupta, Swati. "A Regression Modeling Technique on Data Mining." International Journal of Computer Applications 116.9 (2015).
[8] Singh, Yashpal, and Alok Singh Chauhan. "Neural networks in data mining." Journal of Theoretical and Applied Information Technology 5.6 (2009): 36-42.
[9] Maheshwari, Aayushi, Garima Kharbanda, and Harsh Patel. "Association Rules in Data Mining."
[10] Bakr, Ahmad M., Nagia M. Ghanem, and Mohamed A. Ismail. "Efficient incremental density-based algorithm for clustering large datasets." Alexandria Engineering Journal 54.4 (2015): 1147-115
[11] Hahsler, Michael, and Matthew Bolaños. "Clustering data streams based on shared density between micro-clusters." IEEE Transactions on Knowledge and Data Engineering 28.6 (2016): 1449-1461.
[12] Joshi, Aastha, and Rajneet Kaur. "A review: Comparative study of various clustering techniques in data mining." International Journal of Advanced Research in Computer Science and Software Engineering 3.3 (2013).
[13] El-Sonbaty, Yasser, M. A. Ismail, and Mohamed Farouk. "An efficient density based clustering algorithm for large databases." Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on. IEEE, 2004.
[14] Nagpal, Arpita, Arnan Jatain, and Deepti Gaur. "Review based on data clustering algorithms." Information & Communication Technologies (ICT), 2013 IEEE Conference on. IEEE, 2013.
[15] Nagpal, Pooja Batra, and Priyanka Ahlawat Mann. "Comparative study of density based clustering algorithms." International Journal of Computer Applications 27.11 (2011): 421-435.
[16] Moreira, Adriano, Maribel Y. Santos, and Sofia Carneiro. "Density-based clustering algorithms–DBSCAN and SNN." University of Minho-Portugal (2005).
[17] Shah, Glory H. "An improved DBSCAN, a density based clustering algorithm with parameter selection for high dimensional data sets." Engineering (NUiCONE), 2012 Nirma University International Conference on. IEEE, 2012.
[18] Donghai, Zeng. "The Study of Clustering Algorithm Based on Grid-Density and Spatial Partition Tree." XiaMen University, PRC (2006).
