sklearn dbscan acceleration

soman__kritik · ‎01-09-2018

I would like to know whether the dbscan clustering in sklearn from IDP (Intel Distribution for Python) is accelerated. I ran the same clustering script on 4,000,000 samples x 3 dimensions with both regular Python and IDP.

normal python:

(new) Kritiks-MacBook-Air:clustering kritiksoman$ python clustering.py 
The job took  21.877681970596313  seconds to complete
(new) Kritiks-MacBook-Air:clustering kritiksoman$ source deactivate new

idp:
Kritiks-MacBook-Air:clustering kritiksoman$ source activate gdal_env
(gdal_env) Kritiks-MacBook-Air:clustering kritiksoman$ python clustering.py 
The job took  23.75513982772827  seconds to complete

I am using a 2013 MacBook Air (Intel i5, 4 GB RAM).

Please help.

Thanks
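
For readers who want to reproduce the comparison: the original clustering.py is not shown in the thread, so the following is only a minimal sketch of what such a timing script might look like. The function name time_dbscan, the synthetic random data, and the eps and min_samples values are all assumptions chosen for illustration, not the poster's actual settings.

```python
import time

import numpy as np
from sklearn.cluster import DBSCAN


def time_dbscan(n_samples=20000, eps=0.05, min_samples=10, seed=0):
    """Cluster random 3-D points and return (elapsed_seconds, labels).

    Hypothetical helper: the data and parameters are placeholders,
    not the values used in the original post.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_samples, 3))  # synthetic stand-in for real data

    start = time.time()
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    elapsed = time.time() - start
    return elapsed, labels


if __name__ == "__main__":
    elapsed, labels = time_dbscan()
    print("The job took ", elapsed, " seconds to complete")
```

Running the same file once in each environment (regular Python and IDP) gives a like-for-like timing, provided the sample count and parameters are kept identical.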

Sergey_M_Intel2 · ‎01-10-2018

Hello,

No, dbscan clustering is not accelerated. We can discuss your request with the Intel DAAL team to prioritize this functionality for the Intel DAAL product. Please help us to understand the priority. So far this is the only request that we have for dbscan.

Thank you,

Sergey Maidanov

soman__kritik · ‎01-10-2018

Hi

I am interested in clustering Digital Elevation Model (DEM) data, which has 3 dimensions. DBSCAN is very efficient for this application and runs in 10-20 seconds for 10^6 samples. But as soon as I move from prototyping my algorithm to large-scale deployment on elevation data of large areas (10^8 to 10^12 samples, 3 dimensions), clustering becomes so slow that even on a server with 8 cores it takes many hours, which defeats the purpose of my algorithm. I was hoping IDP would help, as its performance benchmarks for sklearn are stated to be far better than regular Python.

In terms of priority, this clustering algorithm has many applications, including use cases involving depth sensing, 3D point clouds, etc. I cannot use k-means because the number of clusters in the data is not known a priori. Please advise whether any workaround for this issue exists, or whether any other accelerated algorithm is available for clustering 3D spatial data.

Thanks

Kritik
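
To illustrate the point about k-means needing the cluster count up front, here is a small sketch on synthetic 3-D data. The blob centers, eps, and min_samples are assumptions picked so the toy example is well separated. They are not tuned for DEM data, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def make_demo_points(seed=0):
    """Return 300 synthetic 3-D points drawn from three well-separated blobs."""
    centers = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [-10.0, 10.0, 0.0]])
    points, _ = make_blobs(
        n_samples=300, centers=centers, cluster_std=0.5, random_state=seed
    )
    return points


def count_dbscan_clusters(points, eps=2.0, min_samples=5):
    """Run DBSCAN and count the clusters found, ignoring the noise label -1."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return len(set(labels)) - (1 if -1 in labels else 0)


if __name__ == "__main__":
    points = make_demo_points()
    # DBSCAN is never told how many clusters to expect; it infers the
    # count from point density. KMeans would need n_clusters given explicitly.
    print("clusters found:", count_dbscan_clusters(points))
```

The trade-off is that DBSCAN replaces the cluster count with eps and min_samples, which still have to be chosen to match the density of the data.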

Sergey_M_Intel2 · ‎01-10-2018

Hi Kritik,

Thank you for your input. This is useful! I will bring it to the DAAL team for discussion. We will see if we can prioritize this work soon.

Thank you,

Sergey