Parallel K-Means Using Map Reduce On Big Data Cluster Analysis

Big Data Computing Vu Pham Machine Learning Classification Algorithm


MapReducing one iteration of k-means
Classify: Assign observations to closest cluster center

Map: For each data point x_i, given the centers {μ_j}, emit (z_i, x_i), where z_i is the index of the closest center

Recenter: Revise cluster centers as mean of assigned observations

Reduce: Average over all points in cluster j (z_i = j)
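
Written out as formulas, the two steps of one iteration look as follows. This is a sketch in the slide's notation (x_i, μ_j, z_i); the squared Euclidean distance and the symbol n_j for the cluster size are assumptions, since the slide does not name a distance or a count.

```latex
% Classify (Map): assign each observation to its closest center
z_i \leftarrow \arg\min_{j} \lVert \mu_j - x_i \rVert_2^2

% Recenter (Reduce): each center becomes the mean of its assigned observations
\mu_j \leftarrow \frac{1}{n_j} \sum_{i \,:\, z_i = j} x_i ,
\qquad n_j = \bigl\lvert \{\, i : z_i = j \,\} \bigr\rvert
```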


Classification step as Map
Classify: Assign observations to closest cluster center
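
A minimal sketch of what such a map function might look like in Python. The names `nearest_center` and `kmeans_map` are hypothetical, points are assumed to be tuples of floats, and squared Euclidean distance is assumed as the closeness measure; a real MapReduce framework would supply its own emit mechanism instead of `yield`.

```python
import math


def nearest_center(point, centers):
    """Return the index j of the center closest to `point` (squared Euclidean distance)."""
    best_j, best_dist = 0, math.inf
    for j, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if dist < best_dist:
            best_j, best_dist = j, dist
    return best_j


def kmeans_map(centers, point):
    """Map step: emit (z_i, x_i), i.e. (index of the nearest center, the point itself)."""
    yield nearest_center(point, centers), point


# Example: two centers, one data point per call.
centers = [(0.0, 0.0), (10.0, 10.0)]
emitted = list(kmeans_map(centers, (1.0, 2.0)))
```

Each call depends only on one data point and the current centers, which is why the step can run in parallel over data points.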



Recenter step as Reduce
Recenter: Revise cluster centers as mean of assigned observations

reduce(j, x_in_cluster_j : [x1, x3, …])
    sum = 0
    count = 0
    for x in x_in_cluster_j
        sum += x
        count += 1
    emit(j, sum/count)
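
The pseudocode treats x as a value that supports `+=` and division. A runnable sketch for d-dimensional points might look like this; the name `kmeans_reduce` is hypothetical, points are assumed to be tuples of floats, and skipping an empty cluster is an assumption, since the slide does not say what to do in that case.

```python
def kmeans_reduce(j, points_in_cluster_j):
    """Reduce step: emit (j, mean of all points assigned to cluster j)."""
    total = None
    count = 0
    for x in points_in_cluster_j:
        if total is None:
            total = list(x)  # first point initialises the running sum
        else:
            total = [t + xi for t, xi in zip(total, x)]  # component-wise sum
        count += 1
    if count == 0:
        return  # assumption: an empty cluster emits nothing
    yield j, tuple(t / count for t in total)


# Example: three 2-dimensional points assigned to cluster 0.
result = list(kmeans_reduce(0, [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]))
```

The reducer for cluster j never looks at points of any other cluster, which is why this step can run in parallel over centers.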



Distributed KMeans Iterative Clustering

Map: Find nearest center; key is center, value is movie

Reduce: Average ratings
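
The iterative loop can be simulated on a single machine to see how the pieces fit together. In this sketch each movie is assumed to be a fixed-length tuple of ratings, a dictionary stands in for the shuffle that groups values by key, and the function names, the toy data, and the stopping rule (centers no longer change) are all illustrative assumptions.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two rating vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_iteration(centers, movies):
    # Map: key is the index of the nearest center, value is the movie's ratings.
    groups = defaultdict(list)  # stands in for the shuffle (group by key)
    for ratings in movies.values():
        j = min(range(len(centers)), key=lambda c: sq_dist(ratings, centers[c]))
        groups[j].append(ratings)
    # Reduce: average the ratings of all movies that share a center.
    new_centers = list(centers)  # a center with no movies keeps its old position
    for j, vectors in groups.items():
        new_centers[j] = tuple(sum(col) / len(vectors) for col in zip(*vectors))
    return new_centers


def kmeans(centers, movies, max_iters=20):
    """Driver: rerun the map/reduce pass until the centers stop changing."""
    for _ in range(max_iters):
        updated = one_iteration(centers, movies)
        if updated == centers:
            break
        centers = updated
    return centers


# Toy data: four movies, each rated by the same two (hypothetical) users.
movies = {"A": (5.0, 1.0), "B": (4.0, 2.0), "C": (1.0, 5.0), "D": (2.0, 4.0)}
final_centers = kmeans([(5.0, 1.0), (1.0, 5.0)], movies)
```

On a real cluster each pass through `one_iteration` would be a separate MapReduce job, with the updated centers handed to the mappers of the next job.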



Summary of Parallel k-means using MapReduce

Map: classification step; data parallel over data points

Reduce: recompute means; data parallel over centers



Some practical considerations
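k-means needs an iterative version of MapReduce
    This is not the standard MapReduce formulation, which runs a single map and reduce pass

Each mapper needs a data point and all of the centers
    That is a lot of data to move if it is repeated for every point
    Better implementation: each mapper gets many data points, so the centers are sent once per batch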
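
A sketch of the batched mapper idea: the centers are passed to the mapper once and reused for every point in its batch. Emitting one partial (sum, count) pair per cluster, rather than one record per point, goes beyond what the slide states; it is a common in-mapper combining pattern and is used here as an assumption. The names `batch_map` and `merge_reduce` are hypothetical.

```python
from collections import defaultdict


def batch_map(centers, points):
    """Mapper over a whole batch: emits (j, (partial_sum, count)) per cluster."""
    sums = {}
    counts = defaultdict(int)
    for x in points:
        # Nearest center by squared Euclidean distance (assumed metric).
        j = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
        sums[j] = [s + a for s, a in zip(sums.get(j, [0.0] * len(x)), x)]
        counts[j] += 1
    for j in sums:
        yield j, (tuple(sums[j]), counts[j])


def merge_reduce(j, partials):
    """Reducer: merge the partial sums from all mappers into the new center j."""
    total, n = None, 0
    for part_sum, part_count in partials:
        total = part_sum if total is None else tuple(
            t + p for t, p in zip(total, part_sum))
        n += part_count
    return j, tuple(t / n for t in total)


centers = [(0.0, 0.0), (10.0, 10.0)]
batch_1 = [(1.0, 1.0), (9.0, 9.0)]    # handled by one mapper
batch_2 = [(3.0, 1.0), (11.0, 9.0)]   # handled by another mapper

grouped = defaultdict(list)  # stands in for the shuffle (group by key)
for batch in (batch_1, batch_2):
    for j, partial in batch_map(centers, batch):
        grouped[j].append(partial)
new_centers = dict(merge_reduce(j, parts) for j, parts in grouped.items())
```

With this layout the data shuffled between map and reduce grows with the number of mappers times the number of centers, not with the number of data points.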



Conclusion

In this lecture, we gave an overview of cluster analysis and discussed how the machine learning clustering algorithm k-means can be parallelized with MapReduce for big data analytics.
