Canopy With K-Means Clustering Algorithm For Big Data Analytics - AIP Conference Proceedings - AIP Publishing
Big Data is now gathered from many sources and in many different formats, which
makes it difficult to analyze with traditional methods. Apache Hadoop is a robust
solution to the problems of storing and processing large datasets: it provides HDFS
(Hadoop Distributed File System) for storage and MapReduce for processing.
Clustering algorithms are among the essential methods for analyzing big data to
discover new patterns. In this paper, we used the canopy clustering algorithm
provided by Apache Mahout for distributed machine learning as a preprocessing step
for the k-means clustering algorithm. The results showed that using canopy as a
preprocessing step speeds up the processing of the large-scale healthcare
insurance dataset, and that it also reduces the execution time of k-means by
providing initial centroids for the given dataset.
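The canopy-then-k-means pipeline described above can be sketched in plain Python. This is a minimal, single-machine illustration under stated assumptions, not the authors' Mahout/Hadoop implementation and not Mahout's API: the helper names (`canopy`, `map_assign`, `reduce_average`, `kmeans`), the thresholds `t1`/`t2`, Euclidean distance, and the toy data are all hypothetical. Canopy clustering uses a loose threshold T1 and a tight threshold T2 (T1 > T2); each canopy center is used here as an initial k-means centroid, and each k-means iteration is written as a map step (assign each point to its nearest centroid) followed by a reduce step (average the points per centroid), mirroring how k-means is commonly expressed in MapReduce.

```python
import math
from collections import defaultdict


def canopy(points, t1, t2):
    """Greedy canopy clustering (hypothetical helper, Euclidean distance).

    Returns a list of (center, members). Points within the loose threshold t1
    join a canopy; points within the tight threshold t2 can no longer start
    a canopy of their own. Requires t1 > t2.
    """
    if t1 <= t2:
        raise ValueError("canopy clustering requires t1 > t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Simplification: the picked point itself is used as the canopy center.
        center = remaining.pop(0)
        members = [center] + [p for p in remaining if math.dist(p, center) <= t1]
        canopies.append((center, members))
        # Points strongly bound to this canopy cannot become new centers.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return canopies


def map_assign(points, centroids):
    """Map step: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        yield nearest, p


def reduce_average(pairs, centroids):
    """Reduce step: each new centroid is the mean of the points assigned to it."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = []
    for i, old in enumerate(centroids):
        members = groups.get(i)
        if members:
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            updated.append(old)  # keep an empty cluster's centroid unchanged
    return updated


def kmeans(points, initial_centroids, max_iter=100, tol=1e-9):
    """Lloyd's k-means seeded with the given centroids; returns (centroids, iterations)."""
    centroids = [tuple(c) for c in initial_centroids]
    iteration = 0
    for iteration in range(1, max_iter + 1):
        updated = reduce_average(map_assign(points, centroids), centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:  # converged: no centroid moved more than tol
            break
    return centroids, iteration


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    seeds = [center for center, _ in canopy(data, t1=3.0, t2=1.5)]
    centroids, iterations = kmeans(data, seeds)
    print(len(seeds), iterations, centroids)
```

On this toy data the canopy pass yields two seeds, (1.0, 1.0) and (8.0, 8.0), so it fixes both the number of clusters and their starting positions; k-means then stops after two iterations, the first moving the centroids to the cluster means and the second confirming that nothing moved.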
Topics
Machine learning, Data visualization, Health care
© 2021 Author(s).