I am running into some trouble trying to run batch inference on the NCS2 through the Python API (Python 3.6.5, host Windows 10).
I made an MNIST model in Keras, converted it to TensorFlow, and used mo_tf.py to generate the IR. I am not sure at which step in the process the batch size is defined, so here are the things I tried and the results I got.
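For context, the Keras-to-TF step looked roughly like this (a minimal sketch from memory, assuming TF 1.x with the Keras backend session; the layer stack here is just a stand-in for my actual model):

```python
import tensorflow as tf
from tensorflow import keras

# Stand-in architecture, not the actual model used in this thread.
model = keras.Sequential([
    keras.layers.Conv2D(64, 3, activation="relu", input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

# Saving the TF 1.x backend session produces tf_model.meta / .index / .data-*,
# which is what mo_tf.py --input_meta_graph consumes.
saver = tf.train.Saver()
saver.save(keras.backend.get_session(), "../TF_Model/tf_model")
```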
[Method 1]
Here I build the IR as follows
>python c:\Intel\computer_vision_sdk_2018.5.456\deployment_tools\model_optimizer\mo_tf.py --input_meta_graph ../TF_Model/tf_model.meta --input_shape [1,28,28,1] --data_type FP16 --log_level=ERROR
In python
```python
plugin = IEPlugin(device="MYRIAD")
net = IENetwork(model=mf, weights=wf)
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
net.batch_size = 32
exec_net = plugin.load(network=net)

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_test = x_test.reshape(-1, 1, 28, 28)  # why it needs this, I am unsure
res = exec_net.infer(inputs={input_blob: x_test[srt:end]})
```
This works, but as the batch size increases the classification accuracy decreases; at a batch size of 4096 I get 0 correct classifications, which seems weird. Further, when I pass 4096 samples to the NCS2, the results array comes back with 4098 results. I have no explanation for this, nor do I understand why the accuracy drops as the batch size increases. I would expect the accuracy to be identical regardless of batch size.
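For reference, the kind of per-batch accuracy check I am doing looks roughly like this (a minimal sketch, not my exact analysis code; it assumes the output blob is [batch_size, 10] class scores and that srt/end index into y_test as in the snippet above):

```python
import numpy as np

# res[out_blob] is assumed to hold [batch_size, 10] class scores for this batch
scores = res[out_blob]
predictions = np.argmax(scores, axis=1)
correct = np.sum(predictions == y_test[srt:end])
accuracy = correct / float(end - srt)
```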
[Method 2]
Here I build the IR as follows
>python c:\Intel\computer_vision_sdk_2018.5.456\deployment_tools\model_optimizer\mo_tf.py --input_meta_graph ../TF_Model/tf_model.meta --data_type FP16 --log_level=ERROR --batch 32
In python
```python
plugin = IEPlugin(device="MYRIAD")
net = IENetwork(model=mf, weights=wf)
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
exec_net = plugin.load(network=net)
```
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "ie_api.pyx", line 389, in openvino.inference_engine.ie_api.IEPlugin.load
File "ie_api.pyx", line 400, in openvino.inference_engine.ie_api.IEPlugin.load
RuntimeError: Failed to infer shapes for Reshape layer with error: Invalid reshape mask (dim attribute): number of elements in input: [1,64,2,2] and output: [32,256] mismatch
I also verified that net.batch_size is equal to 32 without setting it explicitly, so it is picking up the batch size properly from the IR files.
It seems to be complaining about this in the XML file
<layer id="10" name="flatten_1/Reshape" precision="FP16" type="Reshape"> <data dim="32,256"/> <input> <port id="0"> <dim>32</dim> <dim>64</dim> <dim>2</dim> <dim>2</dim> </port> </input> <output> <port id="1"> <dim>32</dim> <dim>256</dim> </port> </output> </layer>
What is going wrong in these two methods and what is the right way to setup batch inferencing?
Well, I found a bug in my analysis code that determined the accuracy of the inference. After fixing it, Method 1 above works flawlessly. Results below for the interested.
I would still like to know whether this is the right way to do it, why Method 2 produces such an error, and what the --batch parameter in mo_tf.py is for.
| Batch Size | Run Time (s) | Accuracy (%) | Total Samples | Norm. Time to 10k Samples (s) |
|---|---|---|---|---|
| 1 | 22.273 | 98.5 | 9984 | 22.309 |
| 32 | 7.108 | 98.5 | 9984 | 7.120 |
| 64 | 7.350 | 98.5 | 9984 | 7.362 |
| 128 | 6.603 | 98.5 | 9984 | 6.614 |
| 256 | 6.505 | 98.5 | 9984 | 6.516 |
| 512 | 6.299 | 98.53 | 9728 | 6.475 |
| 1024 | 5.939 | 98.49 | 9216 | 6.444 |
| 2048 | 5.288 | 98.35 | 8192 | 6.456 |
| 4096 | 5.281 | 98.35 | 8192 | 6.446 |
| 9984 | 6.426 | 98.5 | 9984 | 6.436 |
| 10000 | 6.444 | 98.5 | 10000 | 6.444 |
Could you try to validate FP32 on the CPU first, then move to FP16 on the NCS with batch size 2, and finally to higher batch sizes?
I believe this is a memory layout issue either in input or output.
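Something along these lines might help compare the two paths directly (just a sketch on my side; the file names are placeholders for FP32 and FP16 IRs generated from the same model):

```python
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

def run(device, xml, bin_file, batch, samples):
    # Load the IR on the given device with the requested batch size and run one inference
    plugin = IEPlugin(device=device)
    net = IENetwork(model=xml, weights=bin_file)
    net.batch_size = batch
    input_blob = next(iter(net.inputs))
    out_blob = next(iter(net.outputs))
    exec_net = plugin.load(network=net)
    return exec_net.infer(inputs={input_blob: samples})[out_blob]

# "model_fp32.*" / "model_fp16.*" are placeholder IR paths; x stands in for two MNIST samples.
x = np.random.rand(2, 1, 28, 28).astype(np.float32)
ref = run("CPU", "model_fp32.xml", "model_fp32.bin", 2, x)
out = run("MYRIAD", "model_fp16.xml", "model_fp16.bin", 2, x)
print("max abs diff:", np.max(np.abs(ref.astype(np.float32) - out.astype(np.float32))))
```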
nikos,
Thanks for the response. I want to reiterate that Method 1 above works fine (CPU and MYRIAD), so I am assuming your comment was in regards to Method 2.
When I generate the IR with FP32 and --batch set to whatever, I am able to execute the network fine, with no errors, in my Python code on the CPU. It is, however, interesting that in the Python code I can set the batch size to whatever I want and the network just takes it. For example, I generated the IR with --batch set to 32, and in the Python code I can set the batch size to 1000, feed it a full set of 1000 samples, and it takes them without error. It's as if the --batch argument means nothing in CPU land.
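Concretely, this is the kind of thing the CPU plugin happily accepts even though the IR was generated with --batch 32 (paths are placeholders for my FP32 IR; x_test is the MNIST test set loaded as in Method 1):

```python
from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")
net = IENetwork(model="model_fp32.xml", weights="model_fp32.bin")
print(net.batch_size)        # reports 32, picked up from the IR
net.batch_size = 1000        # override in the API anyway
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
exec_net = plugin.load(network=net)
res = exec_net.infer(inputs={input_blob: x_test[:1000]})  # runs without error on CPU
```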
When I move back over to MYRIAD land, generating the IR with FP16 and --batch "whatever" still produces the same errors. Here are a few examples of the errors and the respective batch size used when generating the IR and creating the plugin. The error is raised from this line:
exec_net = plugin.load(network=net)
(batch of 2)
RuntimeError: Failed to infer shapes for Reshape layer with error: Invalid reshape mask (dim attribute): number of elements in input: [1,64,2,2] and output: [2,256] mismatch
(batch of 4)
RuntimeError: Failed to infer shapes for Reshape layer with error: Invalid reshape mask (dim attribute): number of elements in input: [1,64,2,2] and output: [4,256] mismatch
(batch of 32)
RuntimeError: Failed to infer shapes for Reshape layer with error: Invalid reshape mask (dim attribute): number of elements in input: [1,64,2,2] and output: [32,256] mismatch
I am trying to understand how batching works on the MYRIAD, what the purpose of the --batch parameter is in IR generation, and whether Method 1 is a correct way to batch inference for performance improvements.
The same issue here.
When the batch size is specified while optimizing the model, it is not possible to run inference on either NCS device (1 or 2).
The problem is that batch processing (using a model compiled with a batch size of 1) does not speed up inference on NCS devices at all; the time it takes to run inference scales linearly with the batch size. I was hoping that a model optimized for a specific batch size would work better, but I am having no luck even getting it to run on the NCS.
Is batch processing even supposed to speed anything up on the NCS (1 and/or 2)? I would think it must, if only because the data transfer overhead would be mitigated to some extent.
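For what it's worth, the other way I have seen to keep the stick busy (just an assumption on my part, not something confirmed in this thread) is to keep batch 1 and overlap transfers with several asynchronous infer requests, roughly like this:

```python
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

# Paths are placeholders for a batch-1 FP16 IR; images stands in for preprocessed inputs.
images = [np.random.rand(1, 1, 28, 28).astype(np.float32) for _ in range(16)]

plugin = IEPlugin(device="MYRIAD")
net = IENetwork(model="model_fp16.xml", weights="model_fp16.bin")
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
exec_net = plugin.load(network=net, num_requests=4)

results = [None] * len(images)
submitted = {}  # request slot -> index of the image currently in flight on that slot
for i, image in enumerate(images):
    slot = i % 4
    if slot in submitted:
        exec_net.requests[slot].wait(-1)     # collect the previous result on this slot
        results[submitted[slot]] = exec_net.requests[slot].outputs[out_blob]
    exec_net.start_async(request_id=slot, inputs={input_blob: image})
    submitted[slot] = i

# Drain the requests that are still in flight
for slot, i in submitted.items():
    exec_net.requests[slot].wait(-1)
    results[i] = exec_net.requests[slot].outputs[out_blob]
```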