I am using the NCS2 for inference with a relatively simple CNN (just 7 layers) on grayscale images of 192x192 pixels. Unfortunately, I am forced to use USB2.
The benchmark_app reports a total inference time of 14.2 ms, while the actual inference computation on the NCS2 takes about 6 ms. My conclusion is that roughly 8 ms are "lost" transferring input data to the NCS2. BTW: I see this ~8 ms offset with other (more complex) models as well.
Looking at the detailed report generated by benchmark_app, I see as the first executed step:
Does this really mean that each 8-bit pixel of my grayscale image gets expanded to a 32-bit float by the inference_engine (i.e. in software on the host), which is then converted to a 16-bit float by the NCS2? If so, about 6 ms could be saved by sending 8 bpp instead of 32 bpp and performing the 8-bit-to-16-bit conversion as a first inference step on the NCS2.
1. Is my interpretation of what's going on correct?
2. If yes, is there any way to send 8 bpp instead of 32 bpp and do the 8-to-16-bit conversion on the NCS2?
* I am using the OpenVINO API and have already added inputInfo->setPrecision(Precision::U8). Unfortunately, this doesn't make any difference.
* I am using a Keras/TensorFlow model, converted from .pb to .xml/.bin by the Model Optimizer.
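For context, this is roughly where my setPrecision call sits. A minimal sketch using the classic Inference Engine API (OpenVINO ~2020/2021); "model.xml" and the device name "MYRIAD" are placeholders for my actual values. My understanding (an assumption, not something the docs state explicitly for my version) is that the precision must be set on the CNNNetwork before LoadNetwork, and that the blob filled at inference time must itself be U8:

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // placeholder path

    // Request a U8 input blob. This is done on the CNNNetwork *before*
    // LoadNetwork; changing InputInfo on an already loaded
    // ExecutableNetwork would have no effect.
    auto inputName = network.getInputsInfo().begin()->first;
    auto inputInfo = network.getInputsInfo().begin()->second;
    inputInfo->setPrecision(InferenceEngine::Precision::U8);

    auto execNetwork = core.LoadNetwork(network, "MYRIAD");
    auto request = execNetwork.CreateInferRequest();

    // The blob filled here must be the U8 blob the plugin allocated;
    // wrapping the 8-bit pixels in an FP32 TBlob would reintroduce
    // the host-side U8->FP32 convert I am trying to avoid.
    auto blob = request.GetBlob(inputName);
    // ... fill blob->buffer().as<uint8_t*>() with 192x192 pixel values ...
    request.Infer();
    return 0;
}
```

If this is the correct usage, then I would expect the convert step to disappear from the execution report, which is not what I observe.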