I am using the kmeans algorithm from DAAL and noticed what I think is a bug.
My data consists of 150 observations with 2 features. I want to partition them into 3 clusters.
When I use the deterministicDense initialisation, the algorithm takes the first 3 observations as initial centroids. However, in my particular case, observations #2 and #3 are identical, which yields two identical initial centroids. In that case, kmeans fails to converge to three clusters: one of the clusters is empty, and its centroid falls far outside the range of the input data. The algorithm behaves as if I had configured it for 2 clusters.
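To illustrate what I think is happening, here is a minimal sketch in plain Python (not DAAL code, and not my real data). It runs only the first assignment step of a generic k-means, seeded with the first k observations. The helper names, the toy points, and the tie-breaking rule (ties go to the lowest centroid index) are my own assumptions; I do not know how DAAL breaks ties or handles an empty cluster internally.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_to_nearest(points, centroids):
    """Return how many points are assigned to each centroid."""
    sizes = [0] * len(centroids)
    for p in points:
        dists = [squared_distance(p, c) for c in centroids]
        # list.index returns the first match, so ties go to the lowest index
        # (an assumption about tie-breaking, not DAAL's documented behaviour).
        sizes[dists.index(min(dists))] += 1
    return sizes


# Made-up data: observations #2 and #3 (counting from 1) are identical.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (0.0, 2.0),
          (5.0, 5.0), (6.0, 5.0), (9.0, 0.0)]
k = 3
seeds = points[:k]  # "first k observations" seeding, as described above

sizes = assign_to_nearest(points, seeds)
print(sizes)  # the third entry is 0: the duplicated seed attracts no points
```

Because the second and third seeds are at the same position, every point is exactly as far from one as from the other, so the tie always goes to the same centroid and the other cluster starts out empty. What happens to that empty cluster's centroid afterwards depends on the implementation.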
The problem does not appear with the other initialisation methods, because they are very unlikely to yield repeated initial centroids.
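A quick way to check whether a data set is affected by this is to test whether the first k observations contain duplicates before running the algorithm. The sketch below is plain Python with a hypothetical helper name; it is not part of DAAL and assumes the observations are available as a sequence of rows.

```python
def first_k_have_duplicates(points, k):
    """Return True if any two of the first k observations are identical."""
    seeds = [tuple(p) for p in points[:k]]
    # A set keeps only distinct rows, so a smaller set means duplicates.
    return len(set(seeds)) < len(seeds)


# Made-up example: rows 2 and 3 (counting from 1) are identical.
example = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (0.0, 2.0)]
affected = first_k_have_duplicates(example, 3)
print(affected)
```

This compares rows for exact equality only; nearly identical observations would not be flagged.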
I do not know whether this is a bug or a feature, but I think it should at least be documented.