Professional Documents
Culture Documents
Balanced K-Means Revisited-6
Balanced K-Means Revisited-6
The results of BKM+ are visualized for selected datasets in Figures 3, 4 and 5. Our first observation
is that the use of variable balance constraint actually improves results also compared to standard k-means.
The datasets A3 and B1 cannot be successfully clustered by k-means even if repeated 100 times or using
better initialization (Maxmin or k-means++) [11]. However, BKM+ can reach the correct clustering
because the balance constraint makes the algorithm a stochastic variant of k-means, which helps to
overcome local minima.
The two most demanding datasets for BKM+ are Santa [44] and Unbalanced; both are characterized
by variable-size clusters. The Santa dataset (see Figure 4 and Table 2) contains 1.4 million points
representing house coordinates in Finland. BKM+ algorithm finds approximately the correct clustering
but there are minor degradation at the cluster borders (SSE 3.745e+15). These errors are fixed by using
the post-processing algorithm BKM++ (SSE 3.653e+15) which yields also slightly better results than
RKM (SSE 3.683e+15).
For the Santa dataset (Figure 4), the proposed method has better time and memory efficiency
compared to RKM. It took only 19% of the processing time required by RKM, and the memory footprint
was only 8% compared to RKM (125MB vs. 1,403MB).
BKM++ could significantly (>0.5%) improve the SSE results of BKM+ in case of 8 of the 19
datasets. BKM++ results were almost identical to RKM results. There were only three cases where
SSE result of RKM differed more than 0.1% compared to BKM++. For two of these cases BKM++
had a smaller SSE (santa -0.8%, libra -1.2%, S2 +0.1%).
Figure 4. Results for the santa dataset. Best SSE out of 100 repeats.