Intel® oneAPI Data Analytics Library

Possible bug with deterministicDense initialisation

Timothé_B_
Beginner

Hi all,

I am using the kmeans algorithm from DAAL and noticed what I think is a bug.

My data consists of 150 observations with 2 features. I want to group them into 3 clusters.

When I use the deterministicDense initialisation, the algorithm takes the first 3 observations as initial centroids. However, in my particular case, observations #2 and #3 are identical, which yields two identical centroids. In that case, k-means fails to converge to three clusters: one of the clusters is empty, and its centroid falls far outside the range of the input data. The algorithm behaves as if I had configured it for 2 clusters.
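
To illustrate the mechanism, here is a minimal sketch of a single k-means assignment step in plain Python. It is not DAAL code: the toy data, the helper names (sq_dist, assign) and the rule that ties go to the lower centroid index are all assumptions made for illustration. It only shows that when two initial centroids coincide, one of them receives no points in the first assignment; what an implementation then does with the empty cluster is not modelled here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(data, centroids):
    """Label each point with the index of its nearest centroid.

    min() returns the first minimal element, so a tie between two
    identical centroids always goes to the lower index (assumption
    of this sketch, not a documented DAAL rule).
    """
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in data
    ]

# Toy data (hypothetical): rows 2 and 3 are identical, as in the report.
data = [
    (0.0, 0.0), (1.0, 1.0), (1.0, 1.0),
    (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 9.0), (9.2, 8.8),
]
k = 3

# "Deterministic" initialisation as described above: the first k rows.
centroids = data[:k]

labels = assign(data, centroids)
sizes = [labels.count(j) for j in range(k)]
print(sizes)  # [2, 6, 0] -> the third cluster is empty
```

Because the second and third centroids are the same point, every observation is at the same distance from both, so the third one can never win an assignment on its own.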

The problem does not appear with the other initialisation methods because they are very unlikely to yield repeated initial centroids.
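
A possible workaround, sketched in plain Python and assuming the k-means call accepts initial centroids supplied by the caller, is to pick the first k distinct observations instead of the first k. The helper name first_k_distinct and the toy data are hypothetical, and only exact duplicates are detected (near-duplicates would still pass).

```python
def first_k_distinct(data, k):
    """Return the first k pairwise-distinct rows of data, in input order.

    Rows are compared by exact equality; raises ValueError if the data
    contains fewer than k distinct rows.
    """
    seen = set()
    picked = []
    for row in data:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            picked.append(list(row))
            if len(picked) == k:
                return picked
    raise ValueError(f"data has fewer than {k} distinct rows")

# Toy data (hypothetical): rows 2 and 3 are identical.
data = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (0.2, 0.1), (5.0, 5.0)]

init = first_k_distinct(data, 3)
print(init)  # [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1]]
```

This keeps the initialisation deterministic while skipping the repeated row.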

I do not know if this is a bug or a feature, but I think it should at least be documented.

Best regards,

Tim

2 Replies
Gennady_F_Intel
Moderator

Do you see the same behavior with the latest version of DAAL, 2018 Update 1?

Timothé_B_
Beginner

I think so: I am using the version of DAAL that ships with compilers_and_libraries_2018.1.156.

Tim
