Intel® oneAPI Data Analytics Library

DAAL neural network training is very slow

Alexandr_S_1
Beginner

Hi,

I tried to train LeNet from the official example with the MNIST input data (http://yann.lecun.com/exdb/mnist/).

My problem is that training takes a very long time: about 20 minutes for just 1 iteration.

/* LeNet training */
void train()
{
	const size_t _batchSize = 10;
	double learningRate = 0.01;

	/* Create an SGD optimization solver and set its learning rate */
	SharedPtr<optimization_solver::sgd::Batch<float> > sgdAlgorithm(new optimization_solver::sgd::Batch<float>());
	(*(HomogenNumericTable<double>::cast(sgdAlgorithm->parameter.learningRateSequence)))[0][0] = learningRate;

	/* Build the LeNet topology */
	training::TopologyPtr topology = configureNet();

	/* Create the training algorithm and attach the solver */
	training::Batch<> net;

	net.parameter.batchSize = _batchSize;
	net.parameter.optimizationSolver = sgdAlgorithm;
	net.parameter.optimizationSolver->parameter->nIterations = 1;

	/* Initialize the network for the training data dimensions */
	net.initialize(_trainingData->getDimensions(), *topology);

	/* Set the training data and ground truth, then train */
	net.input.set(training::data, _trainingData);
	net.input.set(training::groundTruth, _trainingGroundTruth);

	net.compute();

	/* Extract the trained model for prediction */
	_predictionModel = net.getResult()->get(training::model)->getPredictionModel<double>();
}

I tried increasing _batchSize, but the computation still takes a long time.

My hardware characteristics:

Intel(R) Core(TM) i5-3450 CPU @ 3.10GHz, 4 cores. 16 GB RAM.

Is this normal for my hardware, or did something go wrong?

Ying_H_Intel
Employee

Hi Alex, 

Are you building a 32-bit or a 64-bit application, and are you linking with the sequential DAAL library or the threaded one?
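
A quick way to check which threading runtime your application ends up with is to query the DAAL Environment (a minimal sketch, assuming the standard daal.h entry point and the Environment thread-control API):

/* Reports the thread count DAAL is configured with: the threaded library
   dispatches work through TBB, while the sequential library runs the
   computations single-threaded regardless of this setting. */
#include "daal.h"
#include <iostream>

int main()
{
    daal::services::Environment *env = daal::services::Environment::getInstance();
    std::cout << "DAAL threads: " << env->getNumberOfThreads() << std::endl;
    /* env->setNumberOfThreads(4); */ /* e.g., pin to the 4 cores of the i5-3450 */
    return 0;
}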

If possible, could you provide a small test case, including the input files, so we can try it on our side (as Daria did in https://software.intel.com/en-us/forums/intel-data-analytics-acceleration-library/topic/706491)?

Best Regards,

Ying 

 

Olga_R_Intel
Employee

Hi Alexander,

I reproduced the behavior you described by running the daal_lenet sample on an Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, using the Linux version of Intel DAAL 2017 Update 1 in 64-bit dynamic mode. In that configuration it took ~15 minutes to run one epoch, i.e. to process the full dataset of 50000 samples with batch size 10.

As discussed in another forum thread, https://software.intel.com/en-us/forums/intel-data-analytics-acceleration-library/topic/705133, the present version of the library does not use the number of iterations provided to the optimization solver; it derives the number of iterations from the dataset size and the batch size. Thus, a call to net.compute() passes through the whole dataset, training the model on batches of the requested size. To execute only one iteration, which processes a single batch of samples, decrease the number of elements in the dataset: set the TrainDataCount variable in daal_lenet.cpp:26 equal to your batchSize so that there is data for exactly one iteration.
My experiments show that one iteration takes 0.23 seconds (batchSize = 10) and 2.28 seconds (batchSize = 100) on the server described above.
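
A small standalone sketch of the arithmetic above (trainDatasetSize, iterationsPerEpoch, and epochSeconds are illustrative names, not DAAL identifiers):

/* How the effective iteration count falls out of dataset size and batch size,
   and the per-iteration cost implied by the ~15-minute epoch. */
#include <cstddef>
#include <cstdio>

int main()
{
    const std::size_t trainDatasetSize = 50000;  /* full training set in the sample */
    const std::size_t batchSize        = 10;     /* net.parameter.batchSize */

    /* nIterations set on the solver is ignored; the library effectively runs: */
    const std::size_t iterationsPerEpoch = trainDatasetSize / batchSize;  /* 5000 */

    const double epochSeconds = 15.0 * 60.0;     /* observed epoch time */
    std::printf("iterations per epoch: %zu\n", iterationsPerEpoch);
    std::printf("seconds per iteration: %.2f\n", epochSeconds / iterationsPerEpoch);  /* ~0.18 s */
    return 0;
}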

We are considering support for the number of iterations and an accuracy threshold in the neural network training algorithm for future releases of the library.
Please let us know whether this answers your question.
 

Alexandr_S_1
Beginner

Thanks, I understand now

Alfredo_L_
Beginner

For what it's worth, if you want to train a neural net of appreciable size with DAAL, you'll either want to do it on a larger server (e.g., 32+ cores) or on a machine with Intel Xeon Phi coprocessors. The performance afforded by massively parallel devices (e.g., GPUs or accelerators like Xeon Phi) is quite significant for network training.
