Hello,
I am trying to train a gradient boosted trees regression model, serialize and deserialize the result, and then use it for prediction.
When I skip the serialize/deserialize step between training and prediction, it works. Unfortunately, when I do (de)serialize the result, I get this error:
unknown file: error: C++ exception with description "Number of columns in numeric table is incorrect
Details:
Argument name: data
" thrown in the test body
Here is the code I wrote. Could you help me find the problem, please?
const NumericTablePtr featureSamplesTable(new HomogenNumericTable<double>(trainingFeatureSamples, nTrainingFeatures, nTrainingSamples));
const NumericTablePtr targetValuesTable(new HomogenNumericTable<double>(trainingTargetValues, 1, nTrainingSamples));
algorithms::gbt::regression::training::Batch<> algorithm;
algorithm.input.set(algorithms::gbt::regression::training::data, featureSamplesTable);
algorithm.input.set(algorithms::gbt::regression::training::dependentVariable, targetValuesTable);
// Gradient Boosted Trees model config
algorithm.parameter().maxIterations = 66;
// Train Gradient Boosted Trees model
const bool status = algorithm.computeNoThrow().ok();
if (status)
{
    const SharedPtr<algorithms::gbt::regression::training::Result> result = algorithm.getResult();
    // Serialize result
    InputDataArchive dataArch;
    result->serialize(dataArch);
    auto length = dataArch.getSizeOfArchive();
    auto buffer = new byte[length];
    dataArch.copyArchiveToArray(buffer, length);
    // Deserialize and Evaluate model
    const NumericTablePtr testFeatureSamplesTable(new HomogenNumericTable<double>(testFeatureSamples, nTestFeatures, nTestSamples));
    algorithms::gbt::regression::prediction::Batch<> algorithmEval;
    algorithmEval.input.set(algorithms::gbt::regression::prediction::data, testFeatureSamplesTable);
    // Deserialize result
    SharedPtr<algorithms::gbt::regression::training::Result> trainingResult(new algorithms::gbt::regression::training::Result());
    OutputDataArchive dataArch2(buffer, length);
    trainingResult->deserialize(dataArch2);
    auto trainingModel = trainingResult->get(algorithms::gbt::regression::training::model);
    algorithmEval.input.set(algorithms::gbt::regression::prediction::model, trainingModel);
    // Predict values of gradient boosted trees regression
    algorithmEval.compute();
    const bool statusEval = algorithmEval.computeNoThrow().ok();
    // Retrieve the algorithm results
    if (statusEval)
    {
        const SharedPtr<algorithms::gbt::regression::prediction::Result> resultEval = algorithmEval.getResult();
        NumericTablePtr predictionResult = resultEval->get(algorithms::gbt::regression::prediction::prediction);
Thanks a lot,
Sandra
Hi Sandra,
Which version of DAAL are you using? We tested your code on the DAAL 2019 Update 2 release and it works correctly in both cases, with and without serialization.
One possible cause of this exception is a mismatch between the number of features in the training and test data sets, that is, in the tables built from the trainingFeatureSamples and testFeatureSamples variables. Both tables must have the same number of columns. What dimensions are you using?
Posting your complete code would also help, if that is possible.
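To make the dimension check above concrete, here is a minimal sketch in plain C++. It does not use DAAL: `TableShape` and `describeFeatureMismatch` are hypothetical names introduced only for illustration. In real code the column counts would presumably be read from the numeric tables (for example via `getNumberOfColumns()`) before calling prediction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the shape of a numeric table; not a DAAL type.
struct TableShape
{
    std::size_t nColumns; // number of features
    std::size_t nRows;    // number of observations
};

// Returns an empty string when the test table has the same number of
// features as the training table, otherwise a human-readable description.
std::string describeFeatureMismatch(const TableShape & train, const TableShape & test)
{
    if (train.nColumns == test.nColumns)
    {
        return "";
    }
    return "training data has " + std::to_string(train.nColumns)
         + " feature(s) but test data has " + std::to_string(test.nColumns)
         + " feature(s)";
}
```

Running such a check right before prediction turns a generic "Number of columns in numeric table is incorrect" failure into a message that names both sizes. The row counts are allowed to differ, so the sketch compares columns only.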
Thanks for your response. Actually I am using version 2018 Update 3.
Here is my simple test case and the corresponding code:
TEST(DaalTests, GradientBoostedTreesTest)
{
    int nRows = 1000;
    int nCols = 1;
    double * features = new double[nRows * nCols];
    double * labels = new double[nRows];
    for (int i = 0; i < nRows; ++i)
    {
        for (int j = 0; j < nCols; ++j)
        {
            features[i * nCols + j] = i + 1.0;
        }
        // The first 500 samples are labeled 33.0, the rest 55.0
        if (i < 500)
        {
            labels[i] = 33.0;
        }
        else
        {
            labels[i] = 55.0;
        }
    }
    double * predictedValues = new double[nRows];
    int status = DataAnaltyicslFunction::test(features, nCols, labels, nRows, predictedValues, features, nCols, nRows);
    ASSERT_EQ(0, status);
    ASSERT_NEAR(33.0, predictedValues[100], 1e-5);
    ASSERT_NEAR(55.0, predictedValues[600], 1e-5);
    delete[] features;
    delete[] labels;
    delete[] predictedValues;
}
int DataAnaltyicslFunction::test(double* const trainingFeatureSamples, int nTrainingFeatures, double* const trainingTargetValues, int nTrainingSamples, double* predictedValues, double* const testFeatureSamples, int nTestFeatures, int nTestSamples)
{
    const NumericTablePtr featureSamplesTable(new HomogenNumericTable<double>(trainingFeatureSamples, nTrainingFeatures, nTrainingSamples));
    const NumericTablePtr targetValuesTable(new HomogenNumericTable<double>(trainingTargetValues, 1, nTrainingSamples));
    algorithms::gbt::regression::training::Batch<> algorithm;
    algorithm.input.set(algorithms::gbt::regression::training::data, featureSamplesTable);
    algorithm.input.set(algorithms::gbt::regression::training::dependentVariable, targetValuesTable);
    // Gradient Boosted Trees model config
    algorithm.parameter().maxIterations = 66;
    // Train Gradient Boosted Trees model
    const bool status = algorithm.computeNoThrow().ok();
    if (status)
    {
        const SharedPtr<algorithms::gbt::regression::training::Result> result = algorithm.getResult();
        // Serialize result
        InputDataArchive dataArch;
        result->serialize(dataArch);
        auto length = dataArch.getSizeOfArchive();
        auto buffer = new byte[length];
        dataArch.copyArchiveToArray(buffer, length);
        // Deserialize and Evaluate model
        const NumericTablePtr testFeatureSamplesTable(new HomogenNumericTable<double>(testFeatureSamples, nTestFeatures, nTestSamples));
        algorithms::gbt::regression::prediction::Batch<> algorithmEval;
        algorithmEval.input.set(algorithms::gbt::regression::prediction::data, testFeatureSamplesTable);
        // Deserialize result
        SharedPtr<algorithms::gbt::regression::training::Result> trainingResult(new algorithms::gbt::regression::training::Result());
        OutputDataArchive dataArch2(buffer, length);
        trainingResult->deserialize(dataArch2);
        auto trainingModel = trainingResult->get(algorithms::gbt::regression::training::model);
        algorithmEval.input.set(algorithms::gbt::regression::prediction::model, trainingModel);
        // Predict values of gradient boosted trees regression
        algorithmEval.compute();
        const bool statusEval = algorithmEval.computeNoThrow().ok();
        // Retrieve the algorithm results
        if (statusEval)
        {
            const SharedPtr<algorithms::gbt::regression::prediction::Result> resultEval = algorithmEval.getResult();
            NumericTablePtr predictionResult = resultEval->get(algorithms::gbt::regression::prediction::prediction);
            BlockDescriptor<double> block;
            predictionResult->getBlockOfRows(0, predictionResult->getNumberOfRows(), readOnly, block);
            memcpy(predictedValues, block.getBlockPtr(), block.getNumberOfRows() * sizeof(double));
        }
        return statusEval ? 0 : 1;
    }
    return status ? 0 : 1;
}
Thanks a lot for your help,
Sandra
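A side note on the serialization step in the code above: the archive bytes are copied into a buffer allocated with `new byte[length]` that is never freed. Here is a minimal sketch, in plain C++ without DAAL, of holding those bytes in a `std::vector` instead. The helper name `toOwnedBuffer` is hypothetical, and the sketch assumes the library's `byte` type is an alias for `unsigned char`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: allocates a vector of `length` bytes and lets the
// caller-supplied `copy` callable fill it. The vector releases its memory
// automatically when it goes out of scope.
template <typename CopyFn>
std::vector<unsigned char> toOwnedBuffer(std::size_t length, CopyFn copy)
{
    std::vector<unsigned char> buffer(length);
    if (length > 0)
    {
        copy(buffer.data(), buffer.size());
    }
    return buffer;
}
```

With the posted code, the callable would wrap the existing `dataArch.copyArchiveToArray(ptr, size)` call, and `buffer.data()` together with `buffer.size()` would be passed on to the `OutputDataArchive` constructor. This only changes who owns the memory. It is unrelated to the column-count exception.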
Hi Sandra,
We reproduced the issue with your code on DAAL 2018 Update 3. It has been fixed in later versions of DAAL.
In your case, I suggest downloading the prebuilt DAAL 2019 Update 1.1 release from GitHub: https://github.com/intel/daal/releases/tag/2019_u1.1
This version fixes your issue and also contains performance optimizations for the training stage of Gradient Boosted Trees, which I think will be useful for you.
DAAL 2019 Update 2 also fixes your problem but does not include these optimizations. If that is acceptable, you can get it from the official site or from GitHub.
DAAL 2019 Update 3 is also on the way. It will contain the optimizations and will be available through all our distribution channels.
Thanks,
Alexey
Hi Alexey,
Thanks a lot for your response. It is very helpful.
OK, we will use Update 1.1 while waiting for Update 3.
Also, do you have an idea of when Update 3 will be released, please?
Thanks again
Intel DAAL Update 3 is expected at the end of Q2. We will let you know in this thread when it is released.
Let us know if you have additional questions.
Best regards,
Alexey
Hi Sandra,
Intel DAAL 2019 Update 3 has been released on GitHub.
Let us know if you need help with Intel DAAL.
Thanks a lot!
Hi Sandra,
Intel DAAL 2019 Update 3 is now available and ready for download. The fix for this issue is included in this update. Could you check and let us know the results?
Thanks, Gennady