Intel® oneAPI Data Analytics Library

How to serialize/deserialize gradient boosting trees

Do__Sandra
Beginner
2,233 Views

Hello,

I am trying to train a gradient boosted trees regression model, serialize and deserialize the training result, and then run prediction with the deserialized model.

When I skip serialization between training and prediction, it works. But when I serialize and deserialize the result, I get this error:

unknown file: error: C++ exception with description "Number of columns in numeric table is incorrect
Details:
Argument name: data
" thrown in the test body

Here is my code. Can you help me find where the problem is, please?

 const NumericTablePtr featureSamplesTable(new HomogenNumericTable<double>(trainingFeatureSamples, nTrainingFeatures, nTrainingSamples));
  const NumericTablePtr targetValuesTable(new HomogenNumericTable<double>(trainingTargetValues, 1, nTrainingSamples));

  algorithms::gbt::regression::training::Batch<> algorithm;

  algorithm.input.set(algorithms::gbt::regression::training::data, featureSamplesTable);
  algorithm.input.set(algorithms::gbt::regression::training::dependentVariable, targetValuesTable);

  // Gradient Boosted Trees model config
  algorithm.parameter().maxIterations = 66;

  // Train Gradient Boosted Trees model
  const bool status = algorithm.computeNoThrow().ok();

  if (status)
  {
    const SharedPtr<algorithms::gbt::regression::training::Result> result = algorithm.getResult();

    // Serialize result 
    InputDataArchive dataArch;
    result->serialize(dataArch);
    auto length = dataArch.getSizeOfArchive();
    auto buffer = new byte[length];
    dataArch.copyArchiveToArray(buffer, length);

    // Deserialize and Evaluate model

    const NumericTablePtr testFeatureSamplesTable(new HomogenNumericTable<double>(testFeatureSamples, nTestFeatures, nTestSamples));

    algorithms::gbt::regression::prediction::Batch<> algorithmEval;

    algorithmEval.input.set(algorithms::gbt::regression::prediction::data, testFeatureSamplesTable);

    // Deserialize result
    SharedPtr<algorithms::gbt::regression::training::Result> trainingResult(new algorithms::gbt::regression::training::Result());
    OutputDataArchive dataArch2(buffer, length);
    trainingResult->deserialize(dataArch2);
    auto trainingModel = trainingResult->get(algorithms::gbt::regression::training::model);

    algorithmEval.input.set(algorithms::gbt::regression::prediction::model, trainingModel);

    // Predict values of gradient boosted trees regression
    algorithmEval.compute();
    const bool statusEval = algorithmEval.computeNoThrow().ok();

    // Retrieve the algorithm results
    if (statusEval)
    {
      const SharedPtr<algorithms::gbt::regression::prediction::Result> resultEval = algorithmEval.getResult();
      NumericTablePtr predictionResult = resultEval->get(algorithms::gbt::regression::prediction::prediction);

 

Thanks a lot,

Sandra
9 Replies
Egor_S_Intel
Employee

Hi Sandra,

Which version of DAAL are you using? We tested your code on the DAAL 2019 Update 2 release, and it works correctly in both cases, with and without serialization.

One possible reason for this exception is a mismatch in the number of features between the training and test data sets, i.e., the trainingFeatureSamples and testFeatureSamples variables. Both must have the same number of features (columns). What dimensions are you using?
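The feature-count advice can be made concrete with a small guard. The sketch below is plain standard C++ and uses a hypothetical FeatureMatrix struct and checkSameFeatureCount function (neither is part of DAAL). FeatureMatrix mirrors how the test code lays out its data: one flat row-major array with nRows rows and nCols columns, which is what the posted code hands to HomogenNumericTable<double>(ptr, nColumns, nRows). Comparing column counts before building the prediction tables turns a mismatch into an explicit error message of your own.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper type (not part of DAAL): a dense feature matrix stored
// row-major in one flat array, the same layout the test fills with
// features[i * nCols + j] and then wraps in HomogenNumericTable<double>.
struct FeatureMatrix
{
  std::size_t nRows;
  std::size_t nCols;
  std::vector<double> values;  // nRows * nCols entries

  FeatureMatrix(std::size_t rows, std::size_t cols)
    : nRows(rows), nCols(cols), values(rows * cols, 0.0) {}

  // Element (row, col) in row-major order.
  double &at(std::size_t row, std::size_t col)
  {
    assert(row < nRows && col < nCols);
    return values[row * nCols + col];
  }
};

// Hypothetical guard: throws if the test set does not have the same number of
// features (columns) as the training set the model was built from.
void checkSameFeatureCount(const FeatureMatrix &train, const FeatureMatrix &test)
{
  if (train.nCols != test.nCols)
  {
    throw std::invalid_argument("test data has " + std::to_string(test.nCols) +
                                " features, but the model was trained on " +
                                std::to_string(train.nCols));
  }
}
```

In the posted code the same check reduces to comparing nTrainingFeatures with nTestFeatures before the prediction step.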

Also, sending all your code should be helpful (if it is possible, of course).

Do__Sandra
Beginner

Thanks for your response. I am actually using version 2018 Update 3.

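Here is my simple test case and the corresponding code:

#include <cstring>        // memcpy
#include <gtest/gtest.h>
#include "daal.h"

using namespace daal;
using namespace daal::data_management;
using namespace daal::services;

TEST(DaalTests, GradientBoostedTreesTest)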
{
  int nRows = 1000;
  int nCols = 1;
  double * features = new double[nRows * nCols];
  double * labels = new double[nRows];
  for (int i = 0; i < nRows; ++i)
  {
    for (int j = 0; j < nCols; ++j)
    {
      features[i * nCols + j] = i + 1.0;
    }

    if (i < 500)
    {
      labels[i] = 33.0;
    }
    else
    {
      labels[i] = 55.0;
    }
    
  }

  double * predictedValues = new double[nRows];
  int status = DataAnaltyicslFunction::test(features, nCols, labels, nRows, predictedValues, features, nCols, nRows);

  ASSERT_EQ(0, status);

  ASSERT_NEAR(33.0, predictedValues[100], 1e-5);
  ASSERT_NEAR(55.0, predictedValues[600], 1e-5);

  delete[] features;
  delete[] labels;
  delete[] predictedValues;
}

 

int DataAnaltyicslFunction::test(double* const trainingFeatureSamples, int nTrainingFeatures, double* const trainingTargetValues, int nTrainingSamples, double* predictedValues, double* const testFeatureSamples, int nTestFeatures, int nTestSamples)
{
  const NumericTablePtr featureSamplesTable(new HomogenNumericTable<double>(trainingFeatureSamples, nTrainingFeatures, nTrainingSamples));
  const NumericTablePtr targetValuesTable(new HomogenNumericTable<double>(trainingTargetValues, 1, nTrainingSamples));

  algorithms::gbt::regression::training::Batch<> algorithm;

  algorithm.input.set(algorithms::gbt::regression::training::data, featureSamplesTable);
  algorithm.input.set(algorithms::gbt::regression::training::dependentVariable, targetValuesTable);

  // Gradient Boosted Trees model config
  algorithm.parameter().maxIterations = 66;

  // Train Gradient Boosted Trees model
  const bool status = algorithm.computeNoThrow().ok();

  if (status)
  {
    const SharedPtr<algorithms::gbt::regression::training::Result> result = algorithm.getResult();

    // Serialize result 
    InputDataArchive dataArch;
    result->serialize(dataArch);
    auto length = dataArch.getSizeOfArchive();
    auto buffer = new byte[length];
    dataArch.copyArchiveToArray(buffer, length);

    // Deserialize and Evaluate model

    const NumericTablePtr testFeatureSamplesTable(new HomogenNumericTable<double>(testFeatureSamples, nTestFeatures, nTestSamples));

    algorithms::gbt::regression::prediction::Batch<> algorithmEval;

    algorithmEval.input.set(algorithms::gbt::regression::prediction::data, testFeatureSamplesTable);

    // Deserialize result
    SharedPtr<algorithms::gbt::regression::training::Result> trainingResult(new algorithms::gbt::regression::training::Result());
    OutputDataArchive dataArch2(buffer, length);
    trainingResult->deserialize(dataArch2);
    delete[] buffer;  // release the serialized copy allocated above
    auto trainingModel = trainingResult->get(algorithms::gbt::regression::training::model);

    algorithmEval.input.set(algorithms::gbt::regression::prediction::model, trainingModel);

    // Predict values of gradient boosted trees regression.
    // computeNoThrow() reports errors through the returned Status; calling compute() as well would run prediction twice.
    const bool statusEval = algorithmEval.computeNoThrow().ok();

    // Retrieve the algorithm results
    if (statusEval)
    {
      const SharedPtr<algorithms::gbt::regression::prediction::Result> resultEval = algorithmEval.getResult();
      NumericTablePtr predictionResult = resultEval->get(algorithms::gbt::regression::prediction::prediction);

      BlockDescriptor<double> block;
      predictionResult->getBlockOfRows(0, predictionResult->getNumberOfRows(), readOnly, block);
      memcpy(predictedValues, block.getBlockPtr(), block.getNumberOfRows() * sizeof(double));
      predictionResult->releaseBlockOfRows(block);  // return the block to the table
    }
    return statusEval ? 0 : 1;
  }
  return status ? 0 : 1;
}

Thanks a lot for your help,

Sandra
Alexey_P_Intel
Employee

Hi Sandra,

We reproduced the issue with your code on DAAL 2018 Update 3. It has been fixed in the latest versions of DAAL.

In your case, I suggest downloading the prebuilt DAAL 2019 Update 1.1 release from GitHub: https://github.com/intel/daal/releases/tag/2019_u1.1

This version fixes your issue and also contains special performance optimizations for the training stage of Gradient Boosted Trees, which should be useful for you.

DAAL 2019 Update 2 also fixes your problem but does not include these optimizations. If that is acceptable for you, you can get it from the official site or from GitHub.

DAAL 2019 Update 3 is also coming; it will contain the optimizations and will be available through all our distribution channels.

 

Thanks,
Alexey

Do__Sandra
Beginner

Hi Alexey,

Thanks a lot for your response. It is very helpful.

OK, we will use Update 1.1 while waiting for Update 3.
Do__Sandra
Beginner

Do you have an idea of when Update 3 will be released, please?

Thanks again

Alexey_P_Intel
Employee

Intel DAAL Update 3 will be released at the end of Q2. We will inform you about it in this thread.

Let us know if you have additional questions.

 

Best regards,
Alexey

Egor_S_Intel
Employee

Hi Sandra,

Intel DAAL 2019 Update 3 has been released on GitHub.

Let us know if you need help with Intel DAAL.

Do__Sandra
Beginner

Thanks a lot!

Gennady_F_Intel
Moderator

Hi Sandra,

Intel DAAL 2019 Update 3 is now available and ready for download. The fix for this issue is included in this update. Could you check it and let us know the results?

thanks, Gennady
