Intel® oneAPI Data Analytics Library

Serialization of KMeans Model

Harvey_S_
Beginner
603 Views
Hi
After success with SVM in batch mode, I'm now looking at KMeans. In SVM I could get the model out with something like this:
services::SharedPtr<svm::training::Result> trainingResults = algorithm.getResult();
auto model = trainingResults->get(classifier::training::model);
I could then serialize/deserialize the model and do predictions by setting the model back into the algorithm.
KMeans seems to be different. I can get the results with
services::SharedPtr<kmeans::Result> trainingResults = algorithm.getResult();
I can then get the centroids, but I can't see how to do predictions from them.
Do you have to load them into the KMeans algorithm along with your new data and kick it off with a special number of iterations, for example?
How can I persist the model? Do I just persist the centroids numeric table?
Many thanks
4 Replies
Zhang_Z_Intel
Employee

KMeans in DAAL doesn't follow a "model training" --> "prediction" usage model, and there is no opaque "model" object in the KMeans result. You can, however, mimic a model object by extracting the centroids from the result and serializing the centroids numeric table (together with other information, such as the number of clusters). You then use these as input when clustering your new data.
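To illustrate the idea of persisting the centroids together with the number of clusters, here is a minimal sketch in plain C++. It deliberately does not use any DAAL types or DAAL's own serialization facilities: the `Centroids` struct, `saveCentroids`, and `loadCentroids` are hypothetical names introduced for this example, and the text layout (a header line with the two dimensions, then one centroid per line) is an assumption, not a DAAL format.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical container (not a DAAL type): row-major nClusters x nFeatures.
struct Centroids {
    std::size_t nClusters = 0;
    std::size_t nFeatures = 0;
    std::vector<double> values;
};

// Write the dimensions first, then one centroid per line.
void saveCentroids(std::ostream& out, const Centroids& c) {
    assert(c.values.size() == c.nClusters * c.nFeatures);
    out << c.nClusters << ' ' << c.nFeatures << '\n';
    // max_digits10 significant digits let a double survive a text round trip.
    out << std::setprecision(std::numeric_limits<double>::max_digits10);
    for (std::size_t i = 0; i < c.nClusters; ++i) {
        for (std::size_t j = 0; j < c.nFeatures; ++j) {
            out << c.values[i * c.nFeatures + j]
                << (j + 1 < c.nFeatures ? ' ' : '\n');
        }
    }
}

// Read back what saveCentroids wrote; throws if the stream is malformed.
Centroids loadCentroids(std::istream& in) {
    Centroids c;
    if (!(in >> c.nClusters >> c.nFeatures)) {
        throw std::runtime_error("missing centroid header");
    }
    c.values.resize(c.nClusters * c.nFeatures);
    for (double& v : c.values) {
        if (!(in >> v)) {
            throw std::runtime_error("truncated centroid data");
        }
    }
    return c;
}
```

Writing to a `std::stringstream` (or a file stream) with `saveCentroids` and reading it back with `loadCentroids` should give the same dimensions and values. Getting the values out of, and back into, a DAAL numeric table is left out here.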

 

 

Harvey_S_
Beginner

OK, but if I initialized KMeans with the previously calculated centroids (as well as the number of clusters) and recomputed, the centroids would probably move. Is there a way to call compute on the KMeans algorithm without it recalculating the centroids? Would iterations=0 do this? I guess it's essentially nearest neighbour on the centroids.

Surely there has to be a way. For this particular application I may well have several billion rows, so I have to segment on a sample, but I still have to come out with a segmentation for all of them.
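The "nearest neighbour on the centroids" step described above can be sketched in plain C++, independent of DAAL. This is only an illustration under stated assumptions: squared Euclidean distance is assumed as the metric, both arrays are assumed to be row-major, ties go to the lowest cluster index, and `assignToNearestCentroid` is a hypothetical helper rather than a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// For each row, return the index of the closest centroid.
// rows:      row-major, nRows x nFeatures
// centroids: row-major, nClusters x nFeatures (fixed, never updated here)
std::vector<std::size_t> assignToNearestCentroid(
        const std::vector<double>& rows,
        const std::vector<double>& centroids,
        std::size_t nFeatures) {
    assert(nFeatures > 0);
    assert(rows.size() % nFeatures == 0);
    assert(!centroids.empty() && centroids.size() % nFeatures == 0);

    const std::size_t nRows = rows.size() / nFeatures;
    const std::size_t nClusters = centroids.size() / nFeatures;
    std::vector<std::size_t> assignments(nRows);

    for (std::size_t r = 0; r < nRows; ++r) {
        double bestDist = std::numeric_limits<double>::infinity();
        std::size_t bestCluster = 0;
        for (std::size_t k = 0; k < nClusters; ++k) {
            // Squared Euclidean distance (assumed metric).
            double dist = 0.0;
            for (std::size_t j = 0; j < nFeatures; ++j) {
                const double diff =
                    rows[r * nFeatures + j] - centroids[k * nFeatures + j];
                dist += diff * diff;
            }
            // Strict '<' keeps the lowest index on ties.
            if (dist < bestDist) {
                bestDist = dist;
                bestCluster = k;
            }
        }
        assignments[r] = bestCluster;
    }
    return assignments;
}
```

Because each row is assigned independently and the centroids are never modified, a very large data set could be fed through such a function one chunk at a time, with the centroids computed once on a sample.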

 

 

Ilya_B_Intel
Employee

Starting with the upcoming release, we've updated the algorithm logic a bit, improved the documentation, and added specific examples for your case. The only thing you'll need to do is set iterations=0.

The simplest way to do it with Intel DAAL 2016 is:

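// The second constructor argument (true) requests per-row cluster assignments.
kmeans::Distributed<step1Local> alg(nClusters, true);
alg.input.set(kmeans::data, data);
// Previously computed (for example, deserialized) centroids.
alg.input.set(kmeans::inputCentroids, centroids);
alg.compute();
alg.finalizeCompute();
assignments = alg.getResult()->get(kmeans::assignments);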

The key here is to use the distributed version of the algorithm even if the data is all local. The centroids will not move in this case.

 

 

 

Harvey_S_
Beginner

Excellent, I'll try that - thanks very much.
