Batch size > 1 is not working with target GPU device using OpenVINO C++ API.

Shelke__Sagar · ‎04-30-2019

When "GPU" is used as the target device, a batch size greater than 1 does not work. Loading the network into the plugin fails with the following error:

inputDims=300 300 3 4
outputDims=4 1 100 7
SSD Mode
terminate called after throwing an instance of 'InferenceEngine::details::InferenceEngineException'
what(): network allocation failed: /teamcity/work/scoring_engine_build/releases_2018_R5/thirdparty/clDNN/src/detection_output.cpp at line: 128
Error has occured for: DetectionOutput_cldnn_output_postprocess
Prior box batch size(=4) is not equal to: expected value
/opt/intel/computer_vision_sdk_2018.5.455/deployment_tools/inference_engine/include/details/ie_exception_conversion.hpp:71
Aborted (core dumped)
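
The "terminate called after throwing an instance of" line means the exception escaped uncaught, so the process aborted. The following is a minimal sketch of reporting the failure instead. It assumes the Inference Engine exception type derives from std::exception. `try_load` and its `load` parameter are hypothetical names used only for illustration; `load` stands in for whatever wraps the plugin.LoadNetwork(...) call.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Runs a loading step and reports a failure instead of letting the
// exception escape and abort the process.
// `load` is a hypothetical stand-in for a callable wrapping
// plugin.LoadNetwork(...); it is not part of the OpenVINO API.
bool try_load(const std::function<void()> &load, std::string &error) {
    assert(load && "load must be a callable target");
    try {
        load();
        return true;
    } catch (const std::exception &e) {
        // Assumption: the exception thrown by the plugin derives from
        // std::exception, so what() carries the message seen in the log.
        error = e.what();
        std::cerr << "Network load failed: " << error << std::endl;
        return false;
    }
}
```

With a wrapper like this, the caller could fall back to batch size 1 or to the CPU device when loading on GPU fails, rather than crashing.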

I am using the following function to load the IR model.
NOTE: Batch inference works perfectly fine on CPU.
On GPU, single-image inference works.

 
RetValNNRead read_nn_model(int &infer_height, int &infer_width, int &num_channels)
{
	// 1. Load the GPU plugin
	InferenceEnginePluginPtr _plugin = PluginDispatcher({""}).getPluginByDevice("GPU");
	InferencePlugin plugin(_plugin);
	//TODO: Write shell to make in 'ie_cpu_extension'
	// make generates 'libcpu_extension.so'
	//string s_ext_plugin = "./ie_cpu_extension/libcpu_extension.so";
    //auto extension_ptr = make_so_pointer<InferenceEngine::IExtension>(s_ext_plugin);
	//plugin.AddExtension(extension_ptr);
	
	// 2. Create an IR reader and read network files
	CNNNetReader network_reader;
	network_reader.ReadNetwork("./openvino_189_fp16/frozen_inference_graph.xml");
	network_reader.ReadWeights("./openvino_189_fp16/frozen_inference_graph.bin");
	
	CNNNetwork network = network_reader.getNetwork();
	/** Set network batch size to 4 **/
	network.setBatchSize(4);
	size_t batchSize = network.getBatchSize();
	
	//3. Configure input and output
	
	/**Get NN input information **/
	InferenceEngine::InputsDataMap input_info(network.getInputsInfo());
	InferenceEngine::SizeVector inputDims;
	
	for (auto &item : input_info) {
            auto input_data = item.second;
            input_data->setPrecision(Precision::U8);
            input_data->setLayout(Layout::NCHW);
            inputDims=input_data->getDims();
	}
	cout << "inputDims=";
        for (size_t i=0; i<inputDims.size(); i++) {
            cout << (int)inputDims[i] << " ";
        }
	cout << endl;
	infer_width=inputDims[0];
    infer_height=inputDims[1];
    num_channels=inputDims[2];
	
	/**Get NN output information **/
	
	InferenceEngine::OutputsDataMap output_info(network.getOutputsInfo());
	InferenceEngine::SizeVector outputDims;
	for (auto &item : output_info) {
		auto output_data = item.second;
		output_data->setPrecision(Precision::FP32);
		output_data->setLayout(Layout::NCHW);
		outputDims=output_data->getDims();
	}
	cout << "outputDims=";
	for (size_t i=0; i<outputDims.size(); i++) {
		cout << (int)outputDims[i] << " ";
	}
	cout << endl;
	if (outputDims[3]>1)
	{
		cout << "SSD Mode" << endl;
	}
	else
	{
		cout << "Single Classification Mode" << endl;
	}
	
	// 4. Load model to plugin
	ExecutableNetwork executable_network = plugin.LoadNetwork(network, {});
	
	// 5. Create infer request
	InferRequest infer_request = executable_network.CreateInferRequest();
	
	return {infer_request, input_info, output_info, outputDims};
}
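
A minimal sketch of printing and interpreting the dimension vectors, independent of the Inference Engine headers. It assumes InferenceEngine::SizeVector is a std::vector<size_t> and that the dimensions come back in the reversed (width, height, channels, batch) order suggested by the log line "inputDims=300 300 3 4". `format_dims`, `InputShape`, and `interpret_input_dims` are hypothetical helper names, not part of OpenVINO.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for InferenceEngine::SizeVector (assumed to be std::vector<size_t>).
using SizeVector = std::vector<std::size_t>;

// Joins the dimensions with single spaces, indexing each element explicitly.
std::string format_dims(const SizeVector &dims) {
    std::ostringstream out;
    for (std::size_t i = 0; i < dims.size(); ++i) {
        if (i > 0) out << ' ';
        out << dims[i];
    }
    return out.str();
}

// Hypothetical named view of a 4-D input shape.
struct InputShape {
    std::size_t width;
    std::size_t height;
    std::size_t channels;
    std::size_t batch;
};

// Assumption: dims are ordered (W, H, C, N), as the posted log suggests.
InputShape interpret_input_dims(const SizeVector &dims) {
    assert(dims.size() == 4);
    return {dims[0], dims[1], dims[2], dims[3]};
}
```

Calling format_dims on the input dimensions from the log would give the same "300 300 3 4" text, and interpret_input_dims makes explicit that the last entry is the batch size that was set with setBatchSize.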

Thanks for your time.

Shubha_R_Intel · ‎05-01-2019

Dearest Shelke, Sagar,

Please try the benchmark_app with your model on a GPU, using a batch size > 1. Does it work properly?

Also, I noticed that you are using an old version of OpenVINO. Please upgrade to the latest release, 2019 R1.0.1. Many bugs have been fixed in it.

Thanks,

Shubha

Shelke__Sagar · ‎05-01-2019

Hi Shubha,

I have tested batch inference with the code from the Intel Smart Video Workshop Git repository, and that works fine.

The code I am using is almost identical, but GPU batch inference is still not working. I don't think it's a problem with the OpenVINO version, because I used 2018 R5 to test the standard code from the Smart Video Workshop.

Thanks,

Sagar

Shubha_R_Intel · ‎05-01-2019

Dear Sagar,

I understand. But I need you to install the latest OpenVINO, 2019 R1.0.1, and try the benchmark_app (which is very easy to use), because these are basic troubleshooting steps. If benchmark_app succeeds, then there is something wrong with your code. It would then be easy for you to check how benchmark_app does GPU batching and compare your code with it.

It looks like the Intel Smart Video Workshop uses the Caffe mobilenet-ssd/FP32 model. The benchmark_app is an easy, straightforward way to test GPU batching without complicated code: you pass in your model through -m, your batch size through -b, and your device through -d.

Thanks,

Shubha

Shelke__Sagar · ‎05-06-2019

Hi Shubha,

Installing the latest OpenVINO, 2019 R1.0.1, solved the problem.

-Sagar