Welcome to Scribd!

Experiment No 7

Uploaded by

0% found this document useful (0 votes)

10 views9 pages

The document describes an experiment to implement the k-means clustering algorithm using MapReduce. It provides background on k-means clustering and how it is well-suited for parallelization using MapReduce due to its ability to handle large datasets. The algorithm initializes random centroids, calculates distances between points and centroids, assigns points to centroids, recalculates centroids, and repeats until convergence is reached. The experiment implements this process in Java using Hadoop MapReduce to cluster 1-dimensional points into 3 clusters, reading initial centroids and data, mapping points to closest centroids, reducing to new centroids, and iterating until centroids change less than 0.1.

Original Description:

Copyright

Available Formats

DOCX, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as docx, pdf, or txt

0% found this document useful (0 votes)

10 views9 pages

Experiment No 7

Uploaded by

Aman Jain

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as docx, pdf, or txt

Jump to Page

You are on page 1of 9

Search inside document

Experiment No.

Aim : Write a program to implement Clustering algorithm using Map reduce.

Lab Outcome No. : 8.ITL 801.4
Lab Outcome : Construct algorithms for Clustering and finding association in Big Data.
Date of Performance: 8/4/22
Date of Submission: 13/4/22

Program Documentation Timely Viva Experiment Teacher

formation/ (02) Submission Answer Marks (15) Signature
Execution / (03) (03) with date
Ethical
practices (07 )
EXPERIMENT NO : 7

AIM : Write a program to implement Clustering algorithm using Map reduce.

THEORY :
K-means is one of the simplest unsupervised learning algorithms that solve the well known
clustering problem. The procedure follows a simple and easy way to classify a given data set
through a certain number of clusters (assume k clusters) fixed apriori.
The main idea is to define k centers, one for each cluster. These centers should be placed in a
cunning way because different locations cause different results. So, the better choice is to
place them as much as possible far away from each other. The next step is to take each point
belonging to a given data set and associate it to the nearest center. When no point is
pending, the first step is completed and an early group age is done. At this point we need to
re-calculate k new centroids as barycenters of the clusters resulting from the previous step.
After we have these k new centroids, a new binding has to be done between the same data set
points and the nearest new center. A loop has been generated.
As a result of this loop we may notice that the k centers change their location step by step
until no more changes are done or in other words centers do not move any more. Finally, this
algorithm aims at minimizing an objective function know as squared error function given by:

where,
‘||x i - vj||’ is the Euclidean distance between x i and v j.
‘c i’ is the number of data points in ith cluster.
‘c’ is the number of cluster centers.

K-means clustering is a classical clustering algorithm that uses an expectation maximization

like technique to partition a number of data points into k clusters.
K-means clustering is commonly used for a number of classification applications. Because k-
means is run on such large data sets, and because of certain characteristics of the algorithm, it
is a good candidate for parallelization.
The goal is to implement a framework in java for performing k-means clustering using
Hadoop MapReduce. In this problem, we have considered inputs a set of n 1-dimensional
points and desired clusters of size 3. Once the k initial centers are chosen, the distance is
calculated (Euclidean distance) from every point in the set to each of the 3 centers & points
with the corresponding center emitted by the mapper. Reducer collects all of the points of a
particular centroid and calculates a new centroid and emits.
Termination Condition : When difference between old and new centroid is less than or equal
to 0.1
ALGORITHM :
Step 1 : Initially randomly centroid is selected based on data. In our implementation we used
3 centroids.
Step 2 : The Input file contains initial centroid and data.
Step 3 : In the Mapper class the "configure" function is used to first open the file and read the
centroids and store them in the data structure(we used ArrayList).
Step 4 : Mapper read the data file and emit the nearest centroid with the point to the reducer.
Step 5 : Reducer collects all this data and calculates the new corresponding centroids and
emit.
Step 6 : In the job configuration, we are reading both files and checking if the difference
between old and new centroid is less than 0.1 then Convergence is reached else Repeat step 2
with new centroids.

CONCLUSION :
K-means Clustering algorithm is used using map reduce to reduce runtime for processing
large data sets (petabytes of data). By using map reduce in k-means algorithm it uses
commodity hardware and storage for large data.

An Efficient Incremental Clustering Algorithm
Document3 pages
An Efficient Incremental Clustering Algorithm
World of Computer Science and Information Technology Journal
No ratings yet
Assignment Clustering
Document22 pages
Assignment Clustering
Netra Raina
No ratings yet
Designer Milk
Document30 pages
Designer Milk
Parashuram Shanigaram
67% (6)
Crazy
Document6 pages
Crazy
Flexjob career
No ratings yet
Storage Technologies: Digital Assignment 1
Document16 pages
Storage Technologies: Digital Assignment 1
Yash Pawar
No ratings yet
A Novel Approach of Implementing An Optimal K-Means Plus Plus Algorithm For Scalar Data
Document6 pages
A Novel Approach of Implementing An Optimal K-Means Plus Plus Algorithm For Scalar Data
sinigersky
No ratings yet
SQLDM - Implementing K-Means Clustering Using SQL: Jay B.Simha
Document5 pages
SQLDM - Implementing K-Means Clustering Using SQL: Jay B.Simha
Moh Ali M
No ratings yet
ML_UNIT -2
Document13 pages
ML_UNIT -2
Dr D S Naga Malleswara Rao
No ratings yet
Exp 7
Document3 pages
Exp 7
Soban Maruf
No ratings yet
Parallel Mapreduce: K - Means Clustering Based On
Document6 pages
Parallel Mapreduce: K - Means Clustering Based On
Ashish
No ratings yet
Text Analytics Unit-3
Document11 pages
Text Analytics Unit-3
aathyukthas.ai20001
No ratings yet
Untitled
Document2 pages
Untitled
Anurag Singh
No ratings yet
DMBI5
Document9 pages
DMBI5
Shubham Jha
No ratings yet
Unsupervised Learning - Clustering Cheatsheet - Codecademy
Document5 pages
Unsupervised Learning - Clustering Cheatsheet - Codecademy
Imane Loukili
No ratings yet
A Novel Approach For Data Clustering Using Improved K-Means Algorithm PDF
Document6 pages
A Novel Approach For Data Clustering Using Improved K-Means Algorithm PDF
Ninad Samel
No ratings yet
Ijcet: International Journal of Computer Engineering & Technology (Ijcet)
Document11 pages
Ijcet: International Journal of Computer Engineering & Technology (Ijcet)
IAEME Publication
No ratings yet
ML Unit 5
Document50 pages
ML Unit 5
SUJATA SONWANE
No ratings yet
An Improved K-Means Clustering Algorithm
Document16 pages
An Improved K-Means Clustering Algorithm
David Moreno
No ratings yet
Data Mining and Machine Learning PDF
Document10 pages
Data Mining and Machine Learning PDF
Bidof Vic
No ratings yet
A Tutorial On Clustering Algorithms
Document4 pages
A Tutorial On Clustering Algorithms
jczerna
No ratings yet
Cui 2014
Document11 pages
Cui 2014
ARI TUBIL
No ratings yet
Project 2 Clustering Algorithms: Team Members Chaitanya Vedurupaka (50205782) Anirudh Yellapragada (50206970)
Document15 pages
Project 2 Clustering Algorithms: Team Members Chaitanya Vedurupaka (50205782) Anirudh Yellapragada (50206970)
vedurupaka
No ratings yet
Pam Clustering Technique
Document13 pages
Pam Clustering Technique
Anoy
No ratings yet
Performance Evaluation of K-Means Clustering Algorithm With Various Distance Metrics
Document5 pages
Performance Evaluation of K-Means Clustering Algorithm With Various Distance Metrics
Nkiru Edith
No ratings yet
Ijettcs 2014 04 25 123
Document5 pages
Ijettcs 2014 04 25 123
International Journal of Application or Innovation in Engineering & Management
No ratings yet
A Fast K-Means Implementation Using Coresets
Document10 pages
A Fast K-Means Implementation Using Coresets
Hiino
No ratings yet
Assignment 5
Document3 pages
Assignment 5
Pujan Patel
No ratings yet
Unit-5 Unit-5: Case Studies of Big Data Analytics Using Map-Reduce Programming
Document11 pages
Unit-5 Unit-5: Case Studies of Big Data Analytics Using Map-Reduce Programming
Chitra Madhuri Yashoda
No ratings yet
Big Data: An Optimized Approach For Cluster Initialization: Open Access Research
Document19 pages
Big Data: An Optimized Approach For Cluster Initialization: Open Access Research
Edward
No ratings yet
Lecture14 Notes
Document9 pages
Lecture14 Notes
chelsea
No ratings yet
Analysis&Comparisonof Efficient Techniquesof
Document5 pages
Analysis&Comparisonof Efficient Techniquesof
astha
No ratings yet
K-Means Clustering
Document8 pages
K-Means Clustering
Abeer Pareek
No ratings yet
Dynamic Approach To K-Means Clustering Algorithm-2
Document16 pages
Dynamic Approach To K-Means Clustering Algorithm-2
IAEME Publication
No ratings yet
Hierarchical Clustering: Required Data
Document6 pages
Hierarchical Clustering: Required Data
Hritik Agrawal
No ratings yet
Pam Clustering Technique: Bachelor of Technology Computer Science and Engineering
Document12 pages
Pam Clustering Technique: Bachelor of Technology Computer Science and Engineering
Anoy
No ratings yet
Jaipur National University: Project Design With Seminar
Document26 pages
Jaipur National University: Project Design With Seminar
Faizan Shaikh
100% (1)
Map/Reduce Affinity Propagation Clustering Algorithm: Wei-Chih Hung, Chun-Yen Chu, and Yi-Leh Wu
Document7 pages
Map/Reduce Affinity Propagation Clustering Algorithm: Wei-Chih Hung, Chun-Yen Chu, and Yi-Leh Wu
jefferyleclerc
No ratings yet
Project Report 2
Document11 pages
Project Report 2
seethamrajumukund
No ratings yet
K Mean
Document12 pages
K Mean
Shivram Dwivedi
No ratings yet
Unit4 Datascience
Document43 pages
Unit4 Datascience
drsaranyarcw
No ratings yet
Comparative Analysis of K-Means and Fuzzy C-Means Algorithms
Document5 pages
Comparative Analysis of K-Means and Fuzzy C-Means Algorithms
Format Seorang Legenda
No ratings yet
Dynamicclustering
Document6 pages
Dynamicclustering
kasun prabhath
No ratings yet
Clustering in AI
Document16 pages
Clustering in AI
Ram Kushwaha
No ratings yet
KNN VS Kmeans
Document3 pages
KNN VS Kmeans
Soubhagya Kumar Sahoo
No ratings yet
DM Lecture 06
Document32 pages
DM Lecture 06
Sameer Ahmad
No ratings yet
Unsupervisd Learning Algorithm
Document6 pages
Unsupervisd Learning Algorithm
Shrey Dixit
No ratings yet
Sine Cosine Based Algorithm For Data Clustering
Document5 pages
Sine Cosine Based Algorithm For Data Clustering
Anonymous lPvvgiQjR
No ratings yet
Unit 3 & 4 (p18)
Document18 pages
Unit 3 & 4 (p18)
Kashif Baig
No ratings yet
Assignment No. A6: 1 Title
Document5 pages
Assignment No. A6: 1 Title
Pallavi Vetal
No ratings yet
Create List Using Range
Document6 pages
Create List Using Range
YUKTA JOSHI
No ratings yet
An Initial Seed Selection Algorithm
Document11 pages
An Initial Seed Selection Algorithm
hamzarash090
No ratings yet
DSE Lab Assignment - Writeup - 7
Document4 pages
DSE Lab Assignment - Writeup - 7
1032212420
No ratings yet
KMEANS
Document9 pages
KMEANS
johnzenbano120
No ratings yet
Ijert Ijert: Enhanced Clustering Algorithm For Classification of Datasets
Document8 pages
Ijert Ijert: Enhanced Clustering Algorithm For Classification of Datasets
Prianca Kayastha
No ratings yet
K-Means Clustering Clustering Algorithms Implementation and Comparison
Document4 pages
K-Means Clustering Clustering Algorithms Implementation and Comparison
FrankySaputra
No ratings yet
Efficient K-Means Clustering Algorithm Using Feature Weight and Min-Max Normalization
Document4 pages
Efficient K-Means Clustering Algorithm Using Feature Weight and Min-Max Normalization
Roopam
No ratings yet
An Effective Evolutionary Clustering Algorithm: Hepatitis C Case Study
Document6 pages
An Effective Evolutionary Clustering Algorithm: Hepatitis C Case Study
Ahmed Ibrahim Taloba
No ratings yet
K-Means in Python - Solution
Document6 pages
K-Means in Python - Solution
Rodrigo Violante
No ratings yet
K Means Clustering
Document6 pages
K Means Clustering
Alina Corina Bala
No ratings yet
Partitioning Methods
Document26 pages
Partitioning Methods
Ahmed hussain
No ratings yet
ML Unit-Iv
Document20 pages
ML Unit-Iv
21u41a05f5
No ratings yet
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
From Everand
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
César Pérez López
No ratings yet
IRS Assignment 2
Document3 pages
IRS Assignment 2
Aman Jain
No ratings yet
RPL Assignment - 1
Document6 pages
RPL Assignment - 1
Aman Jain
No ratings yet
IRS Assignment 1
Document4 pages
IRS Assignment 1
Aman Jain
No ratings yet
Devops Assignment - 1
Document4 pages
Devops Assignment - 1
Aman Jain
No ratings yet
Devops Assignment 2
Document2 pages
Devops Assignment 2
Aman Jain
No ratings yet
A Neork of Atoms O/Alaintd With A Ncaloork O4 Als: Aman Dain B E - 5 - 6
Document5 pages
A Neork of Atoms O/Alaintd With A Ncaloork O4 Als: Aman Dain B E - 5 - 6
Aman Jain
No ratings yet
Anclalorakrlaam Jtie Alanatha: Bda Atqnment-2
Document5 pages
Anclalorakrlaam Jtie Alanatha: Bda Atqnment-2
Aman Jain
No ratings yet
Experiment No 5
Document6 pages
Experiment No 5
Aman Jain
No ratings yet
Toe Assignmunr: Eoder Dnce
Document5 pages
Toe Assignmunr: Eoder Dnce
Aman Jain
No ratings yet
Experiment No 8
Document7 pages
Experiment No 8
Aman Jain
No ratings yet
Experiment No 6
Document11 pages
Experiment No 6
Aman Jain
No ratings yet
BDA Assignment - 1
Document6 pages
BDA Assignment - 1
Aman Jain
No ratings yet
Assignment 2: Date of Assignment: 30/03/2022 Date of Submission: 12/04/2022
Document6 pages
Assignment 2: Date of Assignment: 30/03/2022 Date of Submission: 12/04/2022
Aman Jain
No ratings yet
Experiment No 4
Document9 pages
Experiment No 4
Aman Jain
No ratings yet
Experiment No 9
Document7 pages
Experiment No 9
Aman Jain
No ratings yet
Experiment No 3
Document9 pages
Experiment No 3
Aman Jain
No ratings yet
Experiment No 1
Document13 pages
Experiment No 1
Aman Jain
No ratings yet
Experiment No 2
Document9 pages
Experiment No 2
Aman Jain
No ratings yet
Prediction On Rate of Heart Attack: Name Class Roll No
Document8 pages
Prediction On Rate of Heart Attack: Name Class Roll No
Aman Jain
No ratings yet
10 Roles and Responsibilities of HR Manager in Organization
Document9 pages
10 Roles and Responsibilities of HR Manager in Organization
Sweety Aghi
No ratings yet
How DeepFakes Work
Document2 pages
How DeepFakes Work
jack sparrow
No ratings yet
Product and Design Standards For UHPFRC in France
Document9 pages
Product and Design Standards For UHPFRC in France
Sajib Das
No ratings yet
Dip Lab
Document5 pages
Dip Lab
Amber Sarwar
No ratings yet
BENLAC
Document14 pages
BENLAC
balanivalshca
No ratings yet
WGU Catalog-Sept2019 PDF
Document197 pages
WGU Catalog-Sept2019 PDF
tt99_1
No ratings yet
Quiz 2: COMP1011: Programming Fundamentals
Document4 pages
Quiz 2: COMP1011: Programming Fundamentals
Али Жакежанов
No ratings yet
IIT Guwahati: Syllabus Lectures Downloads Ask A Question
Document3 pages
IIT Guwahati: Syllabus Lectures Downloads Ask A Question
Senthil Kumar P
No ratings yet
The Morning Calm Korea Weekly - Dec. 22, 2006
Document32 pages
The Morning Calm Korea Weekly - Dec. 22, 2006
Morning Calm Weekly Newspaper
No ratings yet
Manju Light
Document2 pages
Manju Light
Solanki Ashok
No ratings yet
Mg (OH) 2 + MgSO4 + H2O体系的相关系和热力学建模及其对氢氧化镁硫酸盐水泥的影响
Document14 pages
Mg (OH) 2 + MgSO4 + H2O体系的相关系和热力学建模及其对氢氧化镁硫酸盐水泥的影响
jordan jack
No ratings yet
YMCA Annual Report 2013
Document8 pages
YMCA Annual Report 2013
cn_cadillacmi
No ratings yet
OrthoToolKit SF36 Score Report
Document3 pages
OrthoToolKit SF36 Score Report
bayu
No ratings yet
Fire Safety: Building Regulations
Document19 pages
Fire Safety: Building Regulations
Muhammad.Saim
No ratings yet
Excavation and Timbering
Document22 pages
Excavation and Timbering
Ganga Dahal
No ratings yet
Patterson Fan Manual 2014
Document15 pages
Patterson Fan Manual 2014
Raviele
No ratings yet
A Rich Life With Less Stuff by The Minimalists: Before You Watch
Document5 pages
A Rich Life With Less Stuff by The Minimalists: Before You Watch
Felipe Andrade
No ratings yet
Accessibility in Creative Spaces Tool Kit
Document37 pages
Accessibility in Creative Spaces Tool Kit
Alexandra Adda
No ratings yet
United States Court of Appeals, Third Circuit
Document8 pages
United States Court of Appeals, Third Circuit
Scribd Government Docs
No ratings yet
Estafa 315 2a
Document9 pages
Estafa 315 2a
JAYAR MENDZ
No ratings yet
Brass Gate Valve PN16: Technical Catalogue
Document10 pages
Brass Gate Valve PN16: Technical Catalogue
kikokiko Karim
No ratings yet
Corporate Policy & Strategy: Dr. Nguyễn Gia Ninh
Document37 pages
Corporate Policy & Strategy: Dr. Nguyễn Gia Ninh
Sơn Vũ
No ratings yet
Shaft Inspection Using Phased-Array Compared To Other Techniques CINDE
Document21 pages
Shaft Inspection Using Phased-Array Compared To Other Techniques CINDE
IvanRomanović
No ratings yet
Supply Chain Management of Wal-Mart
Document17 pages
Supply Chain Management of Wal-Mart
uma6677
No ratings yet
HoneywellControLinksS7999ConfigurationDisplay 732
Document32 pages
HoneywellControLinksS7999ConfigurationDisplay 732
Cristobal Guzman
No ratings yet
Auxiliary Fire Service Act: Laws of Trinidad and Tobago
Document11 pages
Auxiliary Fire Service Act: Laws of Trinidad and Tobago
Marlon Forde
No ratings yet
CSS326 24G 2S QG
Document2 pages
CSS326 24G 2S QG
Carrizo David
No ratings yet
Tolgazer DC-DCBoostConverter
Document9 pages
Tolgazer DC-DCBoostConverter
Fernando Vidal
No ratings yet