I'm trying to port the TensorFlow SSD-Mobilenet and SSDLite-Mobilenet models through OpenVINO to run them on a Movidius NCS. I was able to successfully port the models and run them. However, the FPS is very low, at around 1-2 FPS. The bottleneck is in postprocessing: an operation named 'do_reshape_conf' takes up around 90% of the inference time. This is a Reshape operation over the confidence tensor. How can I fix this? I can upload the entire runtime stats here if required. Is this a version issue? Would downgrading the OpenVINO toolkit or TensorFlow fix it?
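To put the numbers in perspective: when a single layer accounts for ~90% of the per-frame time, nothing else matters for FPS until it is fixed. A quick back-of-the-envelope sketch (the timings below are illustrative, chosen to match the reported ~1.3 FPS, not measured values):

```python
def fps(total_ms):
    """Frames per second for a given per-frame inference time in ms."""
    return 1000.0 / total_ms

# Illustrative numbers: ~1.3 FPS overall, with the reshape taking ~90%.
total_ms = 770.0                 # per-frame inference time
reshape_ms = 0.9 * total_ms      # 'do_reshape_conf' share (~693 ms)
rest_ms = total_ms - reshape_ms  # everything else (~77 ms)

print(f"current:              {fps(total_ms):.1f} FPS")
print(f"if reshape were free: {fps(rest_ms):.1f} FPS")
```

With the reshape removed from the critical path, the same model would run an order of magnitude faster, which is consistent with the ~10 FPS later reported on the CPU.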
Anand C U
Could you try running it on the CPU to see the performance?
Normally, heterogeneous mode has a distribution overhead: if an operation is not supported by the NCS, it is offloaded to the CPU, which costs time.
I am just wondering whether running it on a single device would help.
I have currently exported the model as FP16, so it fails to run on the CPU. However, I will try that experiment and update the results.
As for the operation being offloaded to the CPU: none of the other 'Reshape' operations take long to run.
Also, I'm running the Movidius NCS on an Intel i7 laptop (for development purposes), so my guess is that CPU-only mode will also give a higher FPS.
I've attached the runtime statistics here. Please take a look; it might help to isolate the issue.
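For anyone wanting to collect the same per-layer statistics: the Inference Engine Python API of that era exposes them via `get_perf_counts()` on an inference request, returning a dict of layer name to counters (with keys such as `status`, `layer_type`, and `real_time` in microseconds). A minimal sketch that summarizes such a report and ranks the bottleneck layers; the sample entries are made up for illustration, only their shape mirrors the real API output:

```python
def top_layers(perf_counts, n=3):
    """Rank executed layers by 'real_time' (microseconds) and report
    each layer's share of the total measured inference time."""
    executed = {name: c for name, c in perf_counts.items()
                if c.get("status") == "EXECUTED"}
    total_us = sum(c["real_time"] for c in executed.values())
    ranked = sorted(executed.items(),
                    key=lambda kv: kv[1]["real_time"], reverse=True)
    return [(name, c["layer_type"], 100.0 * c["real_time"] / total_us)
            for name, c in ranked[:n]]

# Made-up entries mimicking the shape of get_perf_counts() output.
sample = {
    "do_reshape_conf": {"status": "EXECUTED", "layer_type": "Reshape",
                        "real_time": 693000},
    "conv1":           {"status": "EXECUTED", "layer_type": "Convolution",
                        "real_time": 40000},
    "detection_out":   {"status": "NOT_RUN", "layer_type": "DetectionOutput",
                        "real_time": 0},
}
for name, ltype, share in top_layers(sample):
    print(f"{name:20s} {ltype:12s} {share:5.1f}%")
```

Note that layers with status "NOT_RUN" are excluded from the total, which matters when comparing CPU and MYRIAD reports where the same layer may or may not execute.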
Anand C U
Did you try the CPU yet? Please post your results if you did.
I have submitted a support request to the dev team and will keep you updated.
To reproduce the issue, could you attach or point to the model you are using? The steps to reproduce the issue would also speed up the debugging process.
I did try the CPU mode. The inference ran at around 10 FPS. I've attached the stats output for the same. One point to note is that the operation 'do_reshape_conf' is "NOT_RUN" when running on the CPU. This can be verified in the attached file.
As for the model, I've tried SSD_Mobilenet v1, SSD_Mobilenet v2, and SSDLite Mobilenet, all available in TensorFlow's Object Detection Model Zoo on GitHub. All three models have the same issue: the operation 'do_reshape_conf' takes ~90% of the total inference time, and all three run at around 1.2-1.5 FPS on the NCS.
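For completeness, here is a sketch of the Model Optimizer conversion typically used for those Model Zoo SSD models. The exact paths and the support-config filename depend on the installed OpenVINO release (older releases used `--tensorflow_use_custom_operations_config` instead of `--transformations_config`), so treat this as an assumption about the setup, not the exact command I ran:

```shell
# Convert a frozen SSD graph from the TF Object Detection Model Zoo
# to FP16 IR for the Movidius NCS (MYRIAD plugin).
python3 mo_tf.py \
    --input_model frozen_inference_graph.pb \
    --tensorflow_object_detection_api_pipeline_config pipeline.config \
    --transformations_config extensions/front/tf/ssd_v2_support.json \
    --data_type FP16 \
    --reverse_input_channels
```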
Anand C U
Sorry for the late response, do you still have this issue?
Here is the update from dev team:
We tried to reproduce the issue, but the performance was good.
The `do_reshape_conf` layer is optimized in the plugin.
Results are for ssd_mobilenet_v1_coco.
Please find the perf report in the attachment.