Hill__Aaron
Novice

How to batch inference on NCS2 with the Python API

I am running into some trouble when trying to run batched inference on the NCS2 through the Python API (Python 3.6.5, host: Windows 10).

I have made an MNIST model in Keras, converted it to a TensorFlow graph, and used mo_tf.py to generate the IR. I am not sure at which step in the process the batch size is defined, so here are the things I tried and the results I got.

[Method 1]

Here I build the IR as follows

>python c:\Intel\computer_vision_sdk_2018.5.456\deployment_tools\model_optimizer\mo_tf.py --input_meta_graph ../TF_Model/tf_model.meta --input_shape [1,28,28,1] --data_type FP16 --log_level=ERROR

In python

from keras.datasets import mnist
from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="MYRIAD")
net = IENetwork(model=mf, weights=wf)  # mf/wf: paths to the IR .xml and .bin files
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
net.batch_size = 32
exec_net = plugin.load(network=net)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_test = x_test.reshape(-1, 1, 28, 28)  # IR expects NCHW layout: (batch, channels, H, W)
start, end = 0, net.batch_size          # slice one full batch at a time
res = exec_net.infer(inputs={input_blob: x_test[start:end]})

This works, but as the batch size increases the classification accuracy decreases: at a batch size of 4096 I have 0 correct classifications, which seems weird. Further, when I pass 4096 samples to the NCS2, the results array comes back with 4098 results. I have no explanation for this, nor do I understand why the accuracy goes down as the batch size increases. I would think the accuracy should be identical regardless of batch size.
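For reference, here is a minimal sketch of how I slice the test set into full batches. It assumes any partial final batch is simply dropped (which is why only a multiple of the batch size gets classified); the `full_batches` helper is my own, not part of the OpenVINO API:

```python
import numpy as np

def full_batches(x, batch_size):
    """Yield consecutive full batches of x; a partial final batch is dropped,
    so only (len(x) // batch_size) * batch_size samples get classified."""
    n_full = (len(x) // batch_size) * batch_size
    for start in range(0, n_full, batch_size):
        yield x[start:start + batch_size]

# Stand-in for the reshaped MNIST test set (10,000 images, NCHW layout)
x = np.zeros((10000, 1, 28, 28), dtype=np.float32)
batches = list(full_batches(x, 512))
print(len(batches), len(batches) * 512)  # 19 batches -> 9728 samples classified
```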

[Method 2]

Here I build the IR as follows

>python c:\Intel\computer_vision_sdk_2018.5.456\deployment_tools\model_optimizer\mo_tf.py --input_meta_graph ../TF_Model/tf_model.meta --data_type FP16 --log_level=ERROR --batch 32

In python

from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="MYRIAD")
net = IENetwork(model=mf, weights=wf)
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
exec_net = plugin.load(network=net)  # raises the RuntimeError below

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "ie_api.pyx", line 389, in openvino.inference_engine.ie_api.IEPlugin.load
  File "ie_api.pyx", line 400, in openvino.inference_engine.ie_api.IEPlugin.load
RuntimeError: Failed to infer shapes for Reshape layer with error: Invalid reshape mask (dim attribute): number of elements in input: [1,64,2,2] and output: [32,256] mismatch

I also verified that net.batch_size equals 32 without setting it explicitly, so the batch size is being read correctly from the IR files.

It seems to be complaining about this layer in the XML file:

		<layer id="10" name="flatten_1/Reshape" precision="FP16" type="Reshape">
			<data dim="32,256"/>
			<input>
				<port id="0">
					<dim>32</dim>
					<dim>64</dim>
					<dim>2</dim>
					<dim>2</dim>
				</port>
			</input>
			<output>
				<port id="1">
					<dim>32</dim>
					<dim>256</dim>
				</port>
			</output>
		</layer>
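A quick element count shows why the load fails. The error message reports the Reshape input as [1,64,2,2], so the MYRIAD plugin appears to propagate a batch of 1 through the network at load time, while the layer's dim attribute hard-codes the batch of 32 (this is my reading of the error, not something I have confirmed in the plugin source):

```python
from functools import reduce
from operator import mul

def n_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    return reduce(mul, shape, 1)

propagated_input = [1, 64, 2, 2]   # shape reported in the error message
hardcoded_output = [32, 256]       # dim attribute in the XML above

print(n_elements(propagated_input))  # 256
print(n_elements(hardcoded_output))  # 8192 -> mismatch, hence the RuntimeError
print(n_elements([32, 64, 2, 2]))    # 8192 -> would match if the batch survived
```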

 

What is going wrong in these two methods, and what is the right way to set up batched inference?

 

Hill__Aaron
Novice

Well, I found a bug in my analysis code that computed the inference accuracy. After fixing said bug, Method 1 above works flawlessly. Results below for the interested.

I would still like to know whether this is the right way to do it, and why Method 2 produces such an error. What is the --batch parameter in mo_tf.py for?

 

Batch Size   Run Time (s)   Accuracy (%)   Total Samples   Normalized to 10k Samples (s)
1            22.273         98.5            9984           22.309
32            7.108         98.5            9984            7.120
64            7.350         98.5            9984            7.362
128           6.603         98.5            9984            6.614
256           6.505         98.5            9984            6.516
512           6.299         98.53           9728            6.475
1024          5.939         98.49           9216            6.444
2048          5.288         98.35           8192            6.456
4096          5.281         98.35           8192            6.446
9984          6.426         98.5            9984            6.436
10000         6.444         98.5           10000            6.444
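The last column is just the run time scaled to a nominal 10,000 samples, i.e. run_time * 10000 / total_samples:

```python
def norm_to_10k(run_time_s, total_samples):
    """Scale a measured run time to a nominal 10,000 samples."""
    return round(run_time_s * 10000 / total_samples, 3)

print(norm_to_10k(22.273, 9984))  # 22.309 (batch size 1)
print(norm_to_10k(6.299, 9728))   # 6.475  (batch size 512)
print(norm_to_10k(5.939, 9216))   # 6.444  (batch size 1024)
```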

 

nikos1
Valued Contributor I

Could you try to validate FP32 on the CPU first, then move to FP16 on the NCS with batch 2, and finally to higher batch sizes?

I believe this is a memory layout issue, either in the input or the output.

Hill__Aaron
Novice

nikos,

Thanks for the response. I want to reiterate that Method 1 above works fine (CPU and MYRIAD), so I am assuming your comment was in regard to Method 2.

When I generate the IR with FP32 and --batch set to "whatever", I am able to execute the network fine, with no errors, in my Python code on the CPU. It is, however, interesting that in the Python code I can set the batch size to whatever I want and the network just takes it. For example, I generated the IR with --batch set to 32, and in Python I can set the batch size to 1000, feed it a full set of 1000 samples, and it takes it without error. It's as if the --batch argument means nothing in CPU land.

When I move back over to MYRIAD land, generating the IR with FP16 and --batch set to "whatever" still produces the same errors. Here are a few examples of the errors and the respective batch size used when generating the IR. The error is raised by this line:

exec_net = plugin.load(network=net)

(batch of 2)
RuntimeError: Failed to infer shapes for Reshape layer with error: Invalid reshape mask (dim attribute): number of elements in input: [1,64,2,2] and output: [2,256] mismatch

(batch of 4)
RuntimeError: Failed to infer shapes for Reshape layer with error: Invalid reshape mask (dim attribute): number of elements in input: [1,64,2,2] and output: [4,256] mismatch

(batch of 32)
RuntimeError: Failed to infer shapes for Reshape layer with error: Invalid reshape mask (dim attribute): number of elements in input: [1,64,2,2] and output: [32,256] mismatch

 

I am trying to understand how batching works on the MYRIAD, what the purpose of the --batch parameter is in the IR generation, and whether Method 1 is a correct way to batch inference for performance gains.

Ruslan__Ruslan
Beginner

The same issue here.

When a batch size is specified while optimizing the model, it is not possible to run inference on either NCS device (1 or 2).

The problem is that batch processing (using a model compiled for a batch size of 1) does not speed up inference on NCS devices at all; the time it takes to run inference scales linearly with the batch size. I was hoping that a model optimized for a specific batch size would work better, but I am having no luck even getting it to run on the NCS.

Is batch processing even supposed to speed anything up on the NCS (1 and/or 2)? I would think it must, if only because the data-transfer overhead is amortized to some extent.
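The amortization intuition can be sketched with a toy cost model (all numbers here are hypothetical, not measurements from the NCS):

```python
# Toy cost model: each transfer to the device costs a fixed overhead,
# each sample a fixed compute time; batching amortizes the overhead
# across the batch. Numbers are made up for illustration.
T_OVERHEAD = 2.0  # ms per transfer (hypothetical)
T_COMPUTE = 0.5   # ms per sample (hypothetical)

def total_time(n_samples, batch_size):
    """Total time to infer n_samples using full batches of batch_size."""
    n_transfers = n_samples // batch_size
    return n_transfers * (T_OVERHEAD + batch_size * T_COMPUTE)

one_by_one = total_time(1024, 1)   # 1024 transfers: 1024 * 2.5 = 2560.0 ms
batched = total_time(1024, 32)     # 32 transfers: 32 * 18.0 = 576.0 ms
print(one_by_one, batched)
```

If the observed time instead scales linearly with batch size, the device is effectively paying the per-sample cost with no amortization.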
