In the MATLAB version of the K-means algorithm, there is a very useful flag that controls what happens if a cluster loses all of its member observations during the optimization. MATLAB offers three possibilities: 1/ treat an empty cluster as an error, 2/ remove any clusters that become empty, 3/ create a new cluster consisting of the one point furthest from its centroid.
Does anyone know what happens in DAAL K-means in that case? I could not find anything in the documentation about this.
Thanks a lot!
It seems that the DAAL documentation was updated (or I missed it the first time). Anyway, here is what it says about this case:
In some cases, if no vectors are assigned to some clusters on a particular iteration, the iteration produces an empty cluster. It may occur due to bad initialization of centroids or the dataset structure. In this case, the algorithm uses the following strategy to replace the empty cluster centers and decrease the value of the overall goal function.
Feature vectors, most distant from their assigned centroids, are selected as the new cluster centers. Information about these vectors is gathered automatically during the algorithm execution.
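To make the quoted strategy concrete, here is a minimal pure-Python sketch of one K-means iteration that replaces empty cluster centers with the points farthest from their assigned centroids. This is not DAAL's implementation; the function names `nearest` and `kmeans_step` are made up for illustration, and details such as tie-breaking or whether the reused point is removed from its old cluster are assumptions.

```python
def nearest(point, centroids):
    """Return (index, squared distance) of the centroid closest to point."""
    best_idx, best_d = 0, float("inf")
    for k, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best_idx, best_d = k, d
    return best_idx, best_d


def kmeans_step(points, centroids):
    """One Lloyd iteration with empty-cluster replacement (illustrative only)."""
    # Assignment step: record the closest centroid and the distance to it.
    assigned = [nearest(p, centroids) for p in points]
    members = [[] for _ in centroids]
    for i, (k, _) in enumerate(assigned):
        members[k].append(i)

    # Point indices ordered by distance to their assigned centroid, farthest first.
    farthest = sorted(range(len(points)), key=lambda i: assigned[i][1], reverse=True)

    dim = len(points[0])
    new_centroids = []
    used = 0  # how many "farthest" points have been consumed so far
    for idxs in members:
        if idxs:
            # Usual update: mean of the cluster's members.
            mean = tuple(sum(points[i][d] for i in idxs) / len(idxs) for d in range(dim))
            new_centroids.append(mean)
        else:
            # Empty cluster: reuse the next most distant point as its center.
            # Assumption: the point is not removed from its old cluster's mean here.
            new_centroids.append(tuple(points[farthest[used]]))
            used += 1
    return new_centroids
```

In MATLAB terms, this is closest to option 3/ from the question (a new cluster built from the point furthest from its centroid), rather than raising an error or dropping the cluster.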
The answer is on this page: