Community
Alexandr_S_1
Beginner

Daal neural network training


Hi,

I decided to try to train a neural network to recognize hand-written symbols using the sample from https://software.intel.com/en-us/node/682103#DAAL-EXAMPLE-CPP-NEURAL_NETWORK_BATCH, and I have several questions.

1. Can you give me an example of how to set the input data for this call:

net.input.set(training::data, trainingData);

from images in standard formats like ".jpg", ".png", etc.? I want to use my own image collection, not MNIST.
2. How can I set labels (answers) for the testing and training image collections?
3. Is it possible to send images for training one by one instead of the whole batch at once?
4. Is it possible to decrease the learning rate at every training iteration?

Best regards,
Alexander Smirnov

1 Solution
Ying_H_Intel
Employee

Hi Alexander,

The DAAL examples provide sample code and data sets under the DAAL install folder; you can refer to them directly.

In the original example, the training data comes from *.csv files:

/* Read training data set from a .csv file and create a tensor to store input data */
TensorPtr trainingData = readTensorFromCSV(trainDatasetFile);
TensorPtr trainingGroundTruth = readTensorFromCSV(trainGroundTruthFile);

readTensorFromCSV() is defined in service.h.

So you actually need a readTensorFromJPG(trainDatasetFile), right? Alternatively, convert your JPG files to CSV files and then call the existing code.

What tools do you use to decode the JPG or PNG images? For example, in the webinar https://www.codeproject.com/Articles/1151612/A-Performance-Library-for-Data-Analytics-and-Machi we use a Python package to read and save PNG images. To get the image data, you could use a script to convert all JPG files to CSV files and then feed them into the net.
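For illustration, the image-to-CSV conversion could be sketched like this (a minimal sketch; the helper name `flatten_to_csv_row` is ours, and in real code the pixel grid would come from an image decoder, e.g. Pillow's `Image.open(...).convert('L')`):

```python
def flatten_to_csv_row(pixels):
    """Flatten a 2-D grid of grayscale pixel values into one CSV row.

    In real code the pixel grid would come from an image decoder,
    e.g. Pillow: list(Image.open("digit.png").convert("L").getdata()).
    """
    flat = [value for row in pixels for value in row]
    return ",".join(str(v) for v in flat)

# A tiny 2x3 "image"; each training image becomes one line of the CSV file.
image = [[0, 255, 128],
         [64, 0, 32]]
print(flatten_to_csv_row(image))  # -> 0,255,128,64,0,32
```

Writing one such row per image produces exactly the kind of CSV file that readTensorFromCSV() consumes.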

Or rewrite the function as daal::services::SharedPtr&lt;Tensor&gt; readTensorFromJPG(const std::string &amp;datasetFileName) and add the JPG decoding code there. For example, if you are using OpenCV, imread() should get you the image data; then decide what size and how many channels go into the tensor array.

Here is an example of one image's feature data:

27,51,0,-73,21,47,64,-88,84,70,-43,96,75,-34,97,-37,-22,-49,67,79 

2. How can I set labels (answers) for testing and training image collections?

You can set the labels (answers) as integers, as shown below; for example, use a Python script to write them as you read the images.

The example labels look like:

0,
0,
1,
0,
0,
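Writing such a label file can be sketched in a few lines (the helper name `write_labels_csv` is ours; it writes one integer label per line, with the trailing comma matching the format above):

```python
def write_labels_csv(labels, path):
    """Write one integer class label per line, in the same
    order as the rows of the corresponding data CSV file."""
    with open(path, "w") as f:
        for label in labels:
            f.write(str(label) + ",\n")

# One label per training image, in the same order as the data CSV rows.
write_labels_csv([0, 0, 1, 0, 0], "train_labels.csv")
```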


3. Is it possible to send images for training one by one instead of the whole batch at once?

It is possible; you can write the loop outside of the net and process images one by one, which is almost the same as the distributed (streaming) model. But for good performance, if you have all the images ready, it is better to process them as a batch:

for (size_t i = 0; i &lt; nImages; i++)
{
    /* set the data and ground truth for the i-th image */
    net.input.set(training::data, trainingData);
    net.input.set(training::groundTruth, trainingGroundTruth);

    /* Run the neural network training */
    net.compute();
}

4. Is it possible to decrease the learning rate at every training iteration?

We will check on this and follow up later.

Best Regards,

Ying

import wx
import numpy as np
from PIL import Image

def processImage(self):
    ## save the doodle as a PNG file, then binarize it to 0/1
    self.doodle.buffer.SaveFile('./test_digit.png', wx.BITMAP_TYPE_PNG)
    self.im = np.array(Image.open('./test_digit.png').convert('L'))
    self.im = 1 * np.logical_not(self.im)

 

daal::services::SharedPtr<Tensor> readTensorFromCSV(const std::string &datasetFileName)
{
    FileDataSource<CSVFeatureManager> dataSource(datasetFileName, DataSource::doAllocateNumericTable, DataSource::doDictionaryFromContext);
    dataSource.loadDataBlock();

    daal::services::SharedPtr<HomogenNumericTable<double> > ntPtr =
        daal::services::staticPointerCast<HomogenNumericTable<double>, NumericTable>(dataSource.getNumericTable());

    daal::services::Collection<size_t> dims;
    dims.push_back(ntPtr->getNumberOfRows());
    size_t size = dims[0];
    if (ntPtr->getNumberOfColumns() > 1)
    {
        dims.push_back(ntPtr->getNumberOfColumns());
        size *= dims[1];
    }

    HomogenTensor<float> *tensor = new HomogenTensor<float>( dims, Tensor::doAllocate );
    float *tensorData = tensor->getArray();
    double *ntData = ntPtr->getArray();

    for(size_t i = 0; i < size; i++)
    {
        tensorData[i] = (float)ntData[i];
    }

    daal::services::SharedPtr<Tensor> tensorPtr(tensor);

    return tensorPtr;
}

 


4 Replies

Alexandr_S_1
Beginner

Hi Ying,

Thanks for your reply,

I tried to set the input training and testing data from CSV using this function:

daal::services::SharedPtr<Tensor> readTensorFromCSV(const std::string &datasetFileName)
{
    FileDataSource<CSVFeatureManager> dataSource(datasetFileName, DataSource::doAllocateNumericTable, DataSource::doDictionaryFromContext);
    dataSource.loadDataBlock();

    daal::services::SharedPtr<HomogenNumericTable<double> > ntPtr =
        daal::services::staticPointerCast<HomogenNumericTable<double>, NumericTable>(dataSource.getNumericTable());

    daal::services::Collection<size_t> dims;
    dims.push_back(ntPtr->getNumberOfRows());
    size_t size = dims[0];
    if (ntPtr->getNumberOfColumns() > 1)
    {
        dims.push_back(ntPtr->getNumberOfColumns());
        size *= dims[1];
    }

    HomogenTensor<float> *tensor = new HomogenTensor<float>( dims, Tensor::doAllocate );
    float *tensorData = tensor->getArray();
    double *ntData = ntPtr->getArray();

    for(size_t i = 0; i < size; i++)
    {
        tensorData[i] = (float)ntData[i];
    }

    daal::services::SharedPtr<Tensor> tensorPtr(tensor);

    return tensorPtr;
}

I created a training CSV data file for some 28x28, 1-channel images. For example, the training data looks like this:

1 row: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,97,255,164,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,185,254,189,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,254,254,189,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,254,254,107,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100,254,254,28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,110,254,254,28,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,110,254,254,80,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,110,254,254,64,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,197,254,254,189,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,218,254,254,210,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,254,254,189,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,254,254,189,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,29,254,254,189,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19,236,254,189,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,202,254,189,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,202,254,238,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,188,254,254,16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,191,254,254,16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,202,254,254,16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,61,215,166,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

2 row: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,226,153,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,59,249,223,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,254,254,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,243,255,217,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,254,254,106,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,118,254,242,37,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,44,244,254,124,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,169,254,254,85,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,83,254,254,228,42,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,106,254,246,37,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,23,206,254,198,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,150,254,240,68,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,89,237,254,149,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,212,254,223,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,178,255,250,96,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,114,250,254,173,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,179,254,254,190,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,111,254,254,240,39,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,220,254,239,59,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,153,231,37,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

...

and training labels like this:

1 row: 0,

2 row: 2,

...

The number of rows in both files is equal.

For testing I did the same thing.

Python script for generating the CSV files:

import os
from random import shuffle
from PIL import Image

# Load from and save to
Names = [['./training-images','train'], ['./test-images','test']]

for name in Names:
	
	FileList = []
	for dirname in os.listdir(name[0]):
		path = os.path.join(name[0],dirname)
		for filename in os.listdir(path):
			if filename.endswith(".png"):
				FileList.append(os.path.join(name[0],dirname,filename))

	shuffle(FileList)

	for filename in FileList:
		label = int(filename.split('\\')[1])
		Im = Image.open(filename)
		pixel = Im.load()
		width, height = Im.size 

		pixels = []

		for x in range(0,width):
			for y in range(0,height):
				pixels.append(pixel[y,x])
		with open(name[1] + '.csv', 'a+') as outfile:
			outfile.write(','.join([str(i) for i in pixels]) + "\n")

		with open(name[1] + '_labels.csv', 'a+') as outfile:
					outfile.write(str(label) + ',' + "\n")

So after that I set input data:

_trainingData = readTensorFromCSV(datasetFileNamesCSV[0]);
_testingData = readTensorFromCSV(datasetFileNamesCSV[2]);
_trainingGroundTruth = readTensorFromCSV(datasetFileNamesCSV[1]);
_testingGroundTruth = readTensorFromCSV(datasetFileNamesCSV[3]);

My problem is an exception: daal::services::interface1::Exception at memory location 0x00F3FAC0

thrown from net.compute();

void train()
{
	const size_t _batchSize = 1;
	double learningRate = 0.01;

	SharedPtr<optimization_solver::sgd::Batch<float> > sgdAlgorithm(new optimization_solver::sgd::Batch<float>());
	(*(HomogenNumericTable<double>::cast(sgdAlgorithm->parameter.learningRateSequence)))[0][0] = learningRate;

	training::TopologyPtr topology = configureNet();

	training::Batch<> net;

	net.parameter.batchSize = _batchSize;
	net.parameter.optimizationSolver = sgdAlgorithm;
	//net.parameter.optimizationSolver->parameter->nIterations = 1;

	net.initialize(_trainingData->getDimensions(), *topology);

	net.input.set(training::data, _trainingData);
	net.input.set(training::groundTruth, _trainingGroundTruth);
	
	net.compute();

	_predictionModel = net.getResult()->get(training::model)->getPredictionModel<double>();
}

What am I doing wrong?

Ruslan_I_Intel
Employee

Hi Alexandr,

Can you please clarify NN topology you use in the example?

In your code snippet for reading the CSV file, the output tensor will have two dimensions, with shape (rows, columns). If the first layer of your topology is a convolution, the input to that layer should have four dimensions, with shape (batch_size, channels, rows, columns), so you need to reshape the tensor you got from the CSV.
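The dimension bookkeeping can be illustrated in plain Python (this is not DAAL API; `csv_shape_to_tensor_dims` is a hypothetical helper). A CSV of N rows by 784 columns corresponds to a 4-D tensor of shape (N, 1, 28, 28) for 28x28 grayscale images, and the total element count must match:

```python
def csv_shape_to_tensor_dims(n_rows, n_cols, channels, height, width):
    """Map a (rows, columns) CSV tensor shape to the 4-D
    (batch, channels, rows, cols) shape a convolution layer expects.

    Raises ValueError if the element counts do not match, which is
    a typical cause of shape-related exceptions during compute().
    """
    if n_cols != channels * height * width:
        raise ValueError("row length %d != %d*%d*%d"
                         % (n_cols, channels, height, width))
    return (n_rows, channels, height, width)

# Two 28x28 single-channel images, each flattened to a CSV row of 784 values:
print(csv_shape_to_tensor_dims(2, 784, 1, 28, 28))  # -> (2, 1, 28, 28)
```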

Please let me know if you have more questions on this topic, and we will gladly help.

Ruslan

Olga_R_Intel
Employee

Hi Alexander,

This is the answer for:
4. Is it possible to decrease the learning rate at every training iteration?

With the present version of Intel DAAL you can modify the learning rate at each iteration of the solver by running a loop over the set of batches in the dataset, as demonstrated in the code snippet below:

…
SharedPtr<optimization_solver::sgd::Batch<float> > sgdAlgorithm(new optimization_solver::sgd::Batch<float>());
net.parameter.optimizationSolver = sgdAlgorithm;
…
for (size_t i = 0; i < nBatches; i++)
{
       // fill in trainingDataArray with the next batch of data and trainingGroundTruthArray with respective values of the ground truth for that batch
       …  
       net.input.set(training::data, trainingDataArray);
       net.input.set(training::groundTruth, trainingGroundTruthArray); 
       (*(HomogenNumericTable<double>::cast(sgdAlgorithm->parameter.learningRateSequence)))[0][0] = nextLearningRateValue;
       net.compute();
}

In the longer term we are considering options to extend the library so that this logic is hidden inside the compute() method of the neural network object.

Please let us know if this answers your question.
