Timothé_B_
Beginner

Possible bug with deterministicDense initialisation

Hi all,

I am using the kmeans algorithm from DAAL and noticed what I think is a bug.

My data consists of 150 observations with 2 features. I want to group them into 3 clusters.

When I use the deterministicDense initialisation, the algorithm takes the first 3 observations as initial centroids. However, in my particular case, observations #2 and #3 are identical, which yields two identical initial centroids. In that case, kmeans fails to converge to three clusters: one of the clusters is empty, and its centroid falls far outside the range of the input data. The algorithm behaves as if I had configured it for 2 clusters.
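
To illustrate the mechanism, here is a minimal sketch in plain Python. It is not DAAL code: the data points, the `assign` helper and the rule that ties go to the lowest centroid index are all assumptions made for the illustration. It only shows the first assignment step when the initial centroids are the first 3 observations and two of them coincide.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid.

    Assumption: when two centroids are equally close, the lowest index wins.
    """
    labels = []
    for p in points:
        dists = [squared_distance(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


# Made-up 2-feature data; observations #2 and #3 are identical.
points = [
    (0.0, 0.0),
    (1.0, 1.0),
    (1.0, 1.0),
    (5.0, 5.0),
    (5.0, 6.0),
    (9.0, 0.0),
    (9.0, 1.0),
]

k = 3
initial_centroids = points[:k]  # "first k observations" initialisation
labels = assign(points, initial_centroids)
cluster_sizes = [labels.count(j) for j in range(k)]
print(cluster_sizes)  # [1, 6, 0]: the third cluster receives no points
```

Every point is exactly as far from the second centroid as from the third, so one of the two never wins a point. What happens to the empty cluster's centroid in the following update step depends on how the implementation handles empty clusters, which this sketch does not try to reproduce.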

The problem does not appear with the other initialisation methods because they are very unlikely to produce repeated initial centroids.
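
One possible workaround is to keep a deterministic choice but skip duplicates, and then supply the result as user-defined initial centroids, assuming the library accepts them. The sketch below is plain Python; `first_k_distinct` is a hypothetical helper, not part of DAAL, and it compares observations by exact equality.

```python
def first_k_distinct(points, k):
    """Return the first k pairwise-distinct observations, in input order.

    Observations are compared by exact equality, so nearly identical
    floating-point rows still count as distinct.
    """
    seen = set()
    chosen = []
    for p in points:
        key = tuple(p)
        if key in seen:
            continue  # skip an observation that was already selected
        seen.add(key)
        chosen.append(key)
        if len(chosen) == k:
            return chosen
    raise ValueError("fewer than k distinct observations in the data")


# Made-up data; observations #2 and #3 are identical.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
initial_centroids = first_k_distinct(points, 3)
print(initial_centroids)  # [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
```

This only guarantees that the starting centroids differ. It does not guarantee that every cluster stays non-empty in later iterations.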

I do not know whether this is a bug or a feature, but I think it should at least be documented.

Best regards,
 

Tim

2 Replies
Gennady_F_Intel
Moderator

Do you see the same behavior with the latest version of DAAL, 2018 Update 1?

Timothé_B_
Beginner

I think so: I am using the version of DAAL that comes with compilers_and_libraries_2018.1.156.

Tim
