Intel® oneAPI Data Analytics Library

Example of setting result tables?

Alvin_S_
Beginner
869 Views

In kmeans_types.h, kmeans::Result allocates instances of HomogenNumericTable to hold the results.

Is there an example showing how to get the algorithm to store its result in SOANumericTable instances?

Thanks.

ACS

8 Replies
Alvin_S_
Beginner

Seems to be working now.

Just needed to add some extra code to populate the SOANumericTable properly.

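// Allocate an SOANumericTable with nColumns columns of type T and nRows rows.
template<typename T>
SOANumericTablePtr
allocateSoaNumericTable(size_t nColumns,size_t nRows)
{
  SOANumericTablePtr t(new SOANumericTable(nColumns,nRows));
  NumericTableDictionary *d(t->getDictionary());
  // Describe every column as type T in the table's dictionary.
  for(size_t i=0;i<nColumns;++i){  // size_t avoids a signed/unsigned comparison
    d->addFeature<T>(i);
  }
  t->allocateDataMemory();
  return t;
}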

Let me know if I'm missing anything.

Thanks!

ACS

Andrey_N_Intel
Employee

Hi Alvin,

In Intel(R) DAAL, by default, the compute() method of the algorithm classes (including k-means) allocates memory for results using the allocate() method of the respective Result class. You can also have the results written into memory that you allocate yourself, which can be more memory-efficient in specific cases. To do that, follow these steps:

- Construct an object of the respective Result class (for example, the k-means one).

- Allocate memory/Numeric Tables for the results. For k-means, those are the centroids, the value of the goal function, the number of iterations, and, optionally, the assignments.

- Register the memory (more precisely, a shared pointer to it) in the Result using the set() method. Make sure your Numeric Tables derive from the Intel DAAL NumericTable type and implement all necessary methods.

- Register the Result object in the algorithm object using the setResult() method.

- Run the computations.
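
The flow in the steps above can be sketched without the library. This is only an illustration of the "caller supplies the result storage" pattern; MiniTable, MiniResult and MiniAlgorithm are hypothetical stand-ins invented for this sketch and are not DAAL classes, and the toy compute() just fills the table with a constant.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT DAAL classes.
struct MiniTable {
    std::size_t rows, cols;
    std::vector<double> values;  // row-major storage
    MiniTable(std::size_t r, std::size_t c) : rows(r), cols(c), values(r * c, 0.0) {}
};
typedef std::shared_ptr<MiniTable> MiniTablePtr;

enum ResultId { centroids, assignments };

// Holds shared pointers to result tables, keyed by result id.
class MiniResult {
public:
    void set(ResultId id, const MiniTablePtr &t) { tables_[id] = t; }
    MiniTablePtr get(ResultId id) const {
        std::map<ResultId, MiniTablePtr>::const_iterator it = tables_.find(id);
        return it == tables_.end() ? MiniTablePtr() : it->second;
    }
private:
    std::map<ResultId, MiniTablePtr> tables_;
};

// Toy "algorithm": fills the centroids table with 1.0 so the flow is visible.
class MiniAlgorithm {
public:
    MiniAlgorithm(std::size_t nClusters, std::size_t nFeatures)
        : nClusters_(nClusters), nFeatures_(nFeatures) {}

    void setResult(const std::shared_ptr<MiniResult> &r) { result_ = r; }
    std::shared_ptr<MiniResult> getResult() const { return result_; }

    void compute() {
        // Default path: allocate result storage only if the caller registered none.
        if (!result_) result_ = std::make_shared<MiniResult>();
        if (!result_->get(centroids))
            result_->set(centroids, std::make_shared<MiniTable>(nClusters_, nFeatures_));

        MiniTablePtr c = result_->get(centroids);
        // A registered table must have the shape the algorithm expects.
        assert(c->rows == nClusters_ && c->cols == nFeatures_);
        for (std::size_t i = 0; i < c->values.size(); ++i) c->values[i] = 1.0;
    }
private:
    std::size_t nClusters_, nFeatures_;
    std::shared_ptr<MiniResult> result_;
};
```

With this sketch, a caller that registers its own table through set() and setResult() gets the output written into that same table, while a caller that skips both calls gets a table allocated inside compute().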

We plan to provide examples that demonstrate this use of the library in future releases.

You might also want to look at the examples in the folder examples\cpp\source\datasource of the library installation directory, which demonstrate the use of AOS/SOA/Homogeneous Numeric Tables.
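
For readers new to the layout names, here is a library-free sketch of the difference between a homogeneous table (one contiguous row-major block) and an SOA table (one array per column). HomogenLayout, SoaLayout and toSoa are hypothetical names made up for this illustration, not DAAL types, and the sketch assumes every column is a double, whereas a real SOA table can give each column its own type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Homogeneous layout: all values share one type and sit in one contiguous,
// row-major block (row r, column c lives at index r * cols + c).
struct HomogenLayout {
    std::size_t rows, cols;
    std::vector<double> block;
    HomogenLayout(std::size_t r, std::size_t c) : rows(r), cols(c), block(r * c, 0.0) {}
    double &at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return block[r * cols + c];
    }
};

// SOA (structure of arrays) layout: one separate array per column (feature),
// each holding that feature's value for every row.
struct SoaLayout {
    std::size_t rows, cols;
    std::vector<std::vector<double> > columns;
    SoaLayout(std::size_t r, std::size_t c)
        : rows(r), cols(c), columns(c, std::vector<double>(r, 0.0)) {}
    double &at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return columns[c][r];
    }
};

// Copy a homogeneous table into SOA form; the logical contents are unchanged.
inline SoaLayout toSoa(HomogenLayout &h) {
    SoaLayout s(h.rows, h.cols);
    for (std::size_t r = 0; r < h.rows; ++r)
        for (std::size_t c = 0; c < h.cols; ++c)
            s.at(r, c) = h.at(r, c);
    return s;
}
```

Both layouts answer at(r, c) with the same logical value; only the memory arrangement differs, which is why code that fills a result table has to go through the table's own access methods rather than assume one contiguous block.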

Please let us know if you have more questions.

Thanks,

Andrey

 

Alvin_S_
Beginner

Turns out it's not working after all.  Here's basically what I have:

  kmeans::Batch<> algorithm(nClusters, nIterations);
  algorithm.input.set(kmeans::data, numericTablePtr);

  size_t nColumns = numericTablePtr->getNumberOfColumns();
  size_t nRows = numericTablePtr->getNumberOfRows();

  SOANumericTablePtr assignmentsResult(allocateSoaNumericTable<double>(1,nRows));
  SOANumericTablePtr centroidsResult(allocateSoaNumericTable<double>(nColumns,nClusters));

  // Create our own Result object and allocate SOANumericTables to hold results.
  SharedPtr<kmeans::Result> result(new kmeans::Result());
  result->set(kmeans::assignments,assignmentsResult);
  result->set(kmeans::centroids,centroidsResult);

  algorithm.setResult(result);  // Use my Result object.

  algorithm.compute();

  NumericTablePtr assignments1(algorithm.getResult()->get(kmeans::assignments));
  NumericTablePtr centroids1(algorithm.getResult()->get(kmeans::centroids));

Unfortunately, this gives incorrect results when used with the same parameters as kmeans_batch.cpp and using kmeans.csv example data.

kMeans(...) assignments:
0.000     
4.000     
2.000     
4.000     
1.000     
1.000     
1.000     
0.000     
1.000     
0.000     
0.000     
5.000     
0.000     
1.000     
2.000     
2.000     
0.000     
3.000     
2.000     
0.000     

kMeans(...) centroids:
4.118     26.727    -14.120   57.874    -9.685    28.675    30.317    -6.586    14.098    10.541    
20.676    0.801     47.162    1.496     -18.233   -43.514   4.475     5.384     51.285    54.874    
-2.336    -3.688    -3.691    -25.112   -17.490   -6.370    -45.389   -50.949   -37.732   -25.020   
0.780     70.679    -52.177   7.757     7.493     -91.890   28.577    49.136    -53.561   16.029    
18.887    -32.584   39.780    15.388    18.220    52.567    -35.363   15.762    8.102     7.649     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     
0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     

The assignments don't match at all and the rows of zeroes are unexpected.

If I comment out the algorithm.setResult(result) line, it's basically the same as the example program and gets a matching result.
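
When comparing the two runs, a pair of small checks can make the difference concrete. This is a minimal sketch, assuming the table contents have already been copied into plain row-major std::vector<double> buffers; countZeroRows and firstMismatch are hypothetical helper names, not part of DAAL.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Count rows of a row-major matrix whose entries are all exactly zero.
inline std::size_t countZeroRows(const std::vector<double> &m, std::size_t cols) {
    assert(cols > 0 && m.size() % cols == 0);
    std::size_t zeroRows = 0;
    for (std::size_t r = 0; r < m.size() / cols; ++r) {
        bool allZero = true;
        for (std::size_t c = 0; c < cols; ++c)
            if (m[r * cols + c] != 0.0) { allZero = false; break; }
        if (allZero) ++zeroRows;
    }
    return zeroRows;
}

// Index of the first element where a and b differ by more than tol,
// or a.size() if the two arrays agree everywhere.
inline std::size_t firstMismatch(const std::vector<double> &a,
                                 const std::vector<double> &b, double tol) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > tol) return i;
    return a.size();
}
```

Running countZeroRows on the centroids from the setResult() run and firstMismatch against the centroids from the default run would show how many rows were never written and where the two results first diverge.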

Using intel-daal-common-056-2016.0-056.noarch if that makes any difference.

Thanks.

ACS

Alvin_S_
Beginner

I figured I'd add a call to check my hand-rolled result before calling algorithm.setResult(result):

  result->check(&algorithm.input,&algorithm.parameter,kmeans::lloydDense);
  // result->check(&algorithm.input, &algorithm.parameter, algorithm.getMethod());

I can't use the second form since algorithm.getMethod() is protected.  Is check(...) not supposed to be called from the outside?

Thanks.

ACS

 

Andrey_N_Intel
Employee

Hi Alvin,

I was not able to reproduce your k-means outputs when SOA tables are used to store the results of the computations. On my side, the results matched those of the default example, so it makes sense to sync up on the details. I used your function allocateSoaNumericTable(size_t nColumns,size_t nRows) to allocate memory; in my case the function returns a raw pointer, SOANumericTable*, which is used to initialize the respective shared pointer:

SharedPtr<data_management::SerializationIface> centroids(allocateSoaNumericTable<double>( nColumns, nClusters ) );
...
SharedPtr<kmeans::Result> result(new kmeans::Result()); 
...
result->set(kmeans::centroids,centroids); 

Did you do the same initialization?

Also, which OS and which version of the library (32- or 64-bit) do you use, with what type of linking (static or dynamic), and with the threaded or the sequential library?

To answer your second question: the check() method of the Result object is intended to be called from outside. We plan to make the getMethod() method public in the next release. Thank you for this feedback.

Andrey

Alvin_S_
Beginner

Here's the hacked version of the example program:

/* file: kmeans_batch.cpp */

/*******************************************************************************
!  Copyright(C) 2014-2015 Intel Corporation. All Rights Reserved.
!*******************************************************************************
!  Content:
!    K-means clustering example program text.
!******************************************************************************/

/**
 * <a name="DAAL-EXAMPLE-CPP-KMEANS_BATCH"></a>
 * \example kmeans_batch.cpp
 */

#include "daal.h"
#include "service.h"

using namespace std;
using namespace daal;
using namespace daal::algorithms;

/* Input data set parameters */
string datasetFileName     = "rpm/intel-daal-common-056-2016.0-056.noarch/opt/intel/compilers_and_libraries_2016.0.056/linux/daal/examples/data/batch/kmeans.csv";
const size_t nObservations = 10000;

/* KMeans algorithm parameters */
const size_t nClusters   = 20;
const size_t nIterations = 5;

typedef SharedPtr<NumericTable> NumericTablePtr;
typedef SharedPtr<SOANumericTable> SOANumericTablePtr;


namespace{
  template<typename T>
    SOANumericTablePtr
    allocateSoaNumericTable(size_t nColumns,size_t nRows)
  {
    SOANumericTablePtr t(new SOANumericTable(nColumns,nRows));
    NumericTableDictionary *d(t->getDictionary());
    for(size_t i=0;i<nColumns;++i){
      d->addFeature<T>(i);
    }
    t->allocateDataMemory();
    return t;
  }
}

int main(int argc, char *argv[])
{
    checkArguments(argc, argv, 1, &datasetFileName);

    /* Initialize FileDataSource to retrieve input data from .csv file */
    FileDataSource<CSVFeatureManager> dataSource(datasetFileName, DataSource::doAllocateNumericTable,
                                                 DataSource::doDictionaryFromContext);

    /* Retrieve the data from input file */
    dataSource.loadDataBlock(nObservations);

    /* Create algorithm object for KMeans algorithm */
    kmeans::Batch<> algorithm(nClusters, nIterations);

    NumericTablePtr numericTablePtr(dataSource.getNumericTable());
    algorithm.input.set(kmeans::data, numericTablePtr);

    size_t nColumns = numericTablePtr->getNumberOfColumns();
    size_t nRows = numericTablePtr->getNumberOfRows();
    std::cerr<<"nColumns="<<nColumns<<std::endl;
    std::cerr<<"nRows="<<nRows<<std::endl;
    
    NumericTablePtr assignmentsResult(allocateSoaNumericTable<int>(1,nRows));

    // works
    NumericTablePtr centroidsResult(new data_management::HomogenNumericTable<double>(nColumns, nClusters, data_management::NumericTable::doAllocate));

    // doesn't work
    // NumericTablePtr centroidsResult(allocateSoaNumericTable<double>(nColumns,nClusters));
    // Create our own Result object and allocate SOANumericTables to hold results.
    SharedPtr<kmeans::Result> result(new kmeans::Result());
    result->set(kmeans::assignments,assignmentsResult);
    result->set(kmeans::centroids,centroidsResult);

    // result->check(&algorithm.input, &algorithm.parameter, algorithm.getMethod());
    result->check(&algorithm.input,&algorithm.parameter,kmeans::lloydDense);

    algorithm.setResult(result);

    algorithm.compute();

    /* Print clusterization results */
    printNumericTable(algorithm.getResult()->get(kmeans::assignments), "First 20 cluster assignments:", 20);
    printNumericTable(algorithm.getResult()->get(kmeans::centroids  ), "First 10 dimensions of centroids:", 20, 10);

    return 0;
}

ACS

Alvin_S_
Beginner

Host is an IBM dx360 M3.  Uname info:

Linux myhost 2.6.18-308.13.1.el5 #1 SMP Thu Jul 26 05:45:09 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

Here's how I compile the thing:

gcc/4.4.6/bin/g++ \
  kmeans_batch.cpp \
  -o kmeans_batch.o \
  -c \
  -g \
  -m64 \
  -I rpm/intel-daal-common-056-2016.0-056.noarch/opt/intel/compilers_and_libraries_2016.0.056/linux/daal/include \
  -I rpm/intel-daal-common-056-2016.0-056.noarch/opt/intel/compilers_and_libraries_2016.0.056/linux/daal/examples/cpp/source/utils \
  -fPIC

gcc/4.4.6/bin/g++ \
  kmeans_batch.o \
  -o kmeans_batch \
  -g \
  -m64 \
  -L rpm/intel-tbb-libs-056-4.3.4-056.noarch/opt/intel/compilers_and_libraries_2016.0.056/linux/tbb/lib/intel64_lin/gcc4.4 \
  -L rpm/intel-openmp-l-all-056-16.0.0-056.x86_64/opt/intel/compilers_and_libraries_2016.0.056/linux/compiler/lib/intel64_lin \
  rpm/intel-daal-056-2016.0-056.x86_64/opt/intel/compilers_and_libraries_2016.0.056/linux/daal/lib/intel64_lin/libdaal_core.a \
  rpm/intel-daal-056-2016.0-056.x86_64/opt/intel/compilers_and_libraries_2016.0.056/linux/daal/lib/intel64_lin/libdaal_thread.a \
  -ltbb \
  -liomp5

Let me know if you need any further details to reproduce.

Thanks.

ACS

Zhang_Z_Intel
Employee

I can reproduce the problem. However, if I use the latest release (DAAL Beta update 3), then it works fine. This might only be an issue for update 2 or earlier releases.
