Professional Documents
Culture Documents
Kmeans Clustering On GPU: Chinmai Cchinmai@ur - Rocheter.edu
Kmeans Clustering On GPU: Chinmai Cchinmai@ur - Rocheter.edu
dataset and clusters. The master thread iterates through every atomic functions, however they do not guarantee all of the
data-set to calculate euclidean distance,thus assign label of the blocks are running simultaneously.
closest centroid for every pixel. In the next stage centroids
are updated by iterating over all labels partially calculating V. E XPERIMENTAL R ESULTS
the new centroids. A k dimensional vector n is updated in
The sequential code and parallel code was executed for
each iteration where each component nk holds the number
images lena.bmp and caterpiller.bmp. In order to observe the
of data points assigned to cluster Ck . Next another loop
speed up and utilize GPU threads it is necessary to perform
over all centroids is performed scaling each centroid Ck by
clustering on a huge file. Hence Caterpiller.bmp is used to
nk giving the final centroids. Convergence is also determined
observed the speedup. Fig 1 shows the experimental images
by checking whether the last labeling stage introduces any
used i.e lena.bmp and caterpiller.bmp. Fig 2 and 3 are the
changes in the clustering.
images obtained by CPU for K=2 and K=4. Fig 4 and 5 are the
images obtained by GPU for K=2 and K=4. Table 1 depicts
B. Parallel Implementation on GPU the execution time received by CPU and GPU for K=2,4,6.
The CPU takes the role of the master thread. First seeds Although the speedup depends on initial seeding and number
of centroids chosen by the CPU. Centroids and data points of iterations ran,but the execution time of GPU is way lesser
are uploaded to the GPU. Since the data points do not on CPU.
change over the course of the algorithm hence they are TABLE I. E XECUTION TIME FOR C ATERPILLER . BMP ON CPU AND
transferred only once. Master thread launches the GPU ker- GPU.
nel called kernel euclidean distance.GPU threads act as the
N=2 N=4 N=6
slave threads of the CPU, each GPU thread execute this CPU Execu. time 25934.9ms 19013.68ms 37407.47ms
kernel euclidean distance. The iterative process of calculating GPU Execu. time 2968.87ms 9629.31ms 6213.29ms
euclidean distance and labeling the nearest centroid to the
centroid is performed by the GPU threads in parallel. The
results from the labeling stage, namely the membership of each Fig. 1. Following images were used to test a)lena b)caterpiller
data point to a cluster in form of an index, are transferred back
to the CPU. Finally the CPU calculates the new centroid of
each cluster based on these labels and performs a convergence
check. Convergence is achieved in case no label has changed
compared to the last iteration.