Hello,
I have a couple of questions regarding the results of the DAAL logistic regression algorithm in the training stage.
1. For the data set below the DAAL algorithm generates the coefficients -1.389 0.235, while the expected results are -4.3578 0.6622.
Could someone please clarify why the beta coefficients for logistic regression in the training stage are not even close to the expected results?
Here is the data set:
1,0
2,0
3,0
4,0
5,1
6,0
7,1
8,0
9,1
10,1
Here is the corresponding DAAL code:
training::Batch<> trainAlgorithm(2);
trainAlgorithm.input.set(classifier::training::data, data);
trainAlgorithm.input.set(classifier::training::labels, dependentVariables);
trainAlgorithm.parameter().penaltyL1 = 0.0f;
trainAlgorithm.parameter().penaltyL2 = 0.0f;
trainAlgorithm.parameter().interceptFlag = true;
trainAlgorithm.parameter().nClasses = 2;
trainAlgorithm.compute();
training::ResultPtr trainingResult = trainAlgorithm.getResult();
logistic_regression::ModelPtr modelptr = trainingResult->get(classifier::training::model);
NumericTablePtr beta = modelptr->getBeta();
2. How do I specify the optimizationSolver parameter to switch from the default SGD solver to the L-BFGS solver?
Thank you, your help is highly appreciated.
Regards,
Dmitry
Hello Dmitry,
Thanks for reporting your question, I'm glad to help you.
The inaccurate solution is due to the accuracyThreshold, which is 1e-4 by default: in your case the SGD momentum algorithm reaches the maximum number of iterations, so the result is not the exact minimum point.
If we keep the same accuracyThreshold, we should increase the number of iterations:
auto solver = optimization_solver::sgd::Batch<float, optimization_solver::sgd::momentum>::create();
const size_t nIterations = 40000;
const float learningRate = 1e-3;
const float accuracyThreshold = 1e-4; // to get closer solution we could even reduce it
solver->parameter.learningRateSequence = HomogenNumericTable<float>::create(1, 1, NumericTable::doAllocate, learningRate);
solver->parameter.accuracyThreshold = accuracyThreshold;
solver->parameter.nIterations = nIterations;
solver->parameter.batchSize = nRows; // 10
trainAlgorithm.parameter().optimizationSolver = solver; // set optimization solver
...
algorithm.compute();
printNumericTable(solver->getResult()->get(optimization_solver::iterative_solver::nIterations), "Number of iterations performed:");
printNumericTable(modelptr->getBeta(), "Logistic Regression coefficients:");
This gives the following results:
Number of iterations performed:
30024.000
Logistic Regression coefficients:
-4.327 0.658
To get a more exact minimum point we can use the 'double' FPType for the input table, the solver, and the algorithm.
Here is an example of how to use the L-BFGS solver:
auto solver = optimization_solver::lbfgs::Batch<double>::create();
const size_t nIterations = 20;
const double accuracyThreshold = 1e-9;
solver->parameter.accuracyThreshold = accuracyThreshold;
solver->parameter.nIterations = nIterations;
solver->parameter.batchSize = nRows; // 10
solver->parameter.L = 1;
solver->parameter.correctionPairBatchSize = nRows; // 10
/* Create an algorithm object to train the logistic regression model */
training::Batch<double> algorithm(nClasses);
/* Pass a training data set and dependent values to the algorithm */
algorithm.input.set(classifier::training::data, trainData);
algorithm.input.set(classifier::training::labels, trainDependentVariable);
/* set optimization solver*/
algorithm.parameter().optimizationSolver = solver;
We obtain the following results:
Number of iterations performed:
15
Logistic Regression coefficients:
-4.35787 0.66223
Since we set batchSize and correctionPairBatchSize equal to the number of rows and the L parameter to 1, the optimization is deterministic rather than stochastic, uses automatic step selection, and needs fewer iterations.
Best regards,
Kirill
Hi Kirill,
Thanks for the clarification. I have some more cases which I would like to clarify:
1) I am getting the wrong result below with the toy data set from the previous post when trainAlgorithm.parameter().penaltyL2 = 0.000125;
Logistic Regression coefficients column vector:
-73838.070
10535.673
lbfgs optimal value: 29.492
However, when trainAlgorithm.parameter().penaltyL2 = 0.0005, the result is correct.
My code is below:
auto solver = optimization_solver::lbfgs::Batch<float>::create();
solver->parameter.accuracyThreshold = accuracyThreshold;
solver->parameter.nIterations = nIterations;
solver->parameter.batchSize = 10;
solver->parameter.L = 1;
solver->parameter.correctionPairBatchSize = 100;
//const float learningRate = 1e-3;
//solver->parameter.stepLengthSequence = HomogenNumericTable<float>::create(1, 1, NumericTable::doAllocate, learningRate);
training::Batch<> trainAlgorithm(2);
trainAlgorithm.input.set(classifier::training::data, data);
trainAlgorithm.input.set(classifier::training::labels, dependentVariables);
trainAlgorithm.parameter().optimizationSolver = solver;
trainAlgorithm.parameter().penaltyL1 = 0.0f;
trainAlgorithm.parameter().penaltyL2 = 0.000125;
trainAlgorithm.parameter().interceptFlag = true;
trainAlgorithm.compute();
2) I am also getting a wrong result with a more realistic data set (the 1372x5 CSV data file is attached) for both zero and positive L2 penalties. Specifically, for a zero L2 penalty the result is below; the penalty does not help to stabilize the coefficients:
Logistic Regression coefficients column vector:
2269896543625417850880.000
-993017962874825342976.000
-1080473856619942248448.000
1924941782054602801152.000
-2224122660700265906176.000
lbfgs optimal value: 32.745 //wrong
while correct Optimal Value = 0.01823630
Please advise, I appreciate your help,
Thanks,
Dmitry.
Hi Dmitry,
It seems I have found the reason why the coefficients are not stabilized.
As noted in the previous code: it is important to set batchSize and correctionPairBatchSize to the number of rows (1372) of the training data set (and L = 1) to get the deterministic L-BFGS.
With the correct parameters I have managed to get the right results for your data:
Number of iterations performed:
35.00000
Logistic Regression coefficients:
7.32236 -7.86140 -4.19118 -5.28824 -0.60444
Log Loss value:
0.01818
Please just align the parameters:
const size_t nRows = trainData->getNumberOfRows();
solver->parameter.batchSize = nRows;
solver->parameter.L = 1;
solver->parameter.correctionPairBatchSize = nRows;
Best regards,
Kirill
Hi Kirill,
Thanks for the information. I have a couple more questions:
1) Does your recommendation to use deterministic L-BFGS mean that the lbfgs solver works only in full-batch mode (not in a stochastic or mini-batch manner)?
2) Does it mean that the default batchSize = 10, correctionPairBatchSize = 100 will most likely generate wrong result? ( see https://software.intel.com/en-us/daal-programming-guide-computation-8)
3) What optimization solver would you recommend to solve the logistic loss optimization problem with the data set attached? The deterministic lbfgs generates an unreasonable result (optimal value = 1892504043520.00000000); the saga solver does better (optimal value = 0.58294517), but the correct optimal value should be 0.4783. (penaltyL2 = 0, interceptFlag = false)
Thank you for your help,
Regards
Dmitry.
Hi Dmitry,
1) The recommendation was given because the deterministic L-BFGS performs far fewer iterations (the same applies to the sgd and adagrad solvers). If you want to use the stochastic or mini-batch mode, please set more iterations and a more carefully chosen step length.
2) It depends on the data set and on the provided step length, accuracy threshold, and number of iterations. The default values can also produce good results.
3) As far as I can see, if we use the saga solver with a tighter accuracy threshold and a bigger number of iterations, we obtain a closer solution:
Logistic Regression coefficients:
0.00000 0.00105 0.00264 0.00099 0.00409 0.01316 0.00001 -0.00668 -0.00507 0.02397 0.00058 -0.00006
Log Loss value:
0.48798
And with the L-BFGS solver we can obtain an even more exact minimum point (with the provided small step length):
const double learningRate = 1e-7;
const double accuracyThreshold = 1e-10;
const size_t nRows = trainData->getNumberOfRows();
auto solver = optimization_solver::lbfgs::Batch<double>::create();
solver->parameter.stepLengthSequence = HomogenNumericTable<double>::create(1, 1, NumericTable::doAllocate, learningRate);
solver->parameter.accuracyThreshold = accuracyThreshold;
solver->parameter.nIterations = nIterations;
solver->parameter.batchSize = nRows;
solver->parameter.L = 2; // to disable automatic step selection
solver->parameter.correctionPairBatchSize = nRows;
Number of iterations performed:
59269.00000
Logistic Regression coefficients:
0.00000 -0.21268 0.00183 -0.00042 0.15487 0.00198 0.00000 -0.17484 -0.07494 0.03599 0.00022 -0.61153
Log Loss value:
0.47835
Best regards,
Kirill
Dear Dmitry,
Could you please confirm whether your issue is fixed or not?
I have received a very professional explanation of why the lbfgs algorithm generated wrong results for the data set. I expected, however, that the lbfgs execution time would be better and that the algorithm implementation would be more robust in the stochastic mode.
If there is any idea on how to make the lbfgs (or any other DAAL solver) computation faster for the data set from the previous post (the data is not normalized/scaled), please share this information with me.
Your help is really appreciated.
Thanks,
Dmitry
Thanks Dmitry for the detailed explanation.
Hi Dmitry, our DAAL implementation is aligned with the original article, but we can check and investigate the possibility of improvements. Thank you!