I am using the NCS2 for inference of a relatively simple (just 7 layers) CNN for grayscale images of size 192x192 pixels. Unfortunately, I am forced to use USB2.
The benchmark_app tells me that the total inference time is 14.2 ms, while the actual inference computation on the NCS2 takes about 6 ms. My conclusion is that 8 ms are "lost" downloading input data to the NCS2. BTW: I see this 8 ms offset with other (more complex) models as well.
When looking at the detailed report generated by the benchmark app I see as the first executed step:
Does this really mean that each 8 bit pixel of my grayscale image gets expanded to a 32 bit float by the inference_engine (= SW), which is then converted to 16 bit float by the NCS2? If this is correct, about 6 ms could be saved by downloading 8 bpp instead of 32 bpp and performing an 8bit-to-16bit conversion as a first inference step on the NCS2.
1. Is my interpretation of what's going on correct?
2. If yes, is there any chance to download 8bpp instead of 32 and to do the 8->16 conversion on the NCS2?
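To make the numbers above concrete, here is a back-of-envelope sketch of the transfer sizes involved. The effective USB 2.0 throughput used below is an assumption (roughly 20 MB/s, chosen to be consistent with the observed ~8 ms overhead), not a measured value; real throughput depends on the host controller.

```python
# Back-of-envelope: input transfer sizes/times for a 192x192 grayscale image.
# USB2_BYTES_PER_S is an ASSUMED effective throughput (~20 MB/s), picked to
# roughly match the observed ~8 ms overhead; actual throughput varies by host.

WIDTH, HEIGHT = 192, 192
pixels = WIDTH * HEIGHT            # 36864 pixels

bytes_u8 = pixels * 1              # payload if uploaded as 8 bpp (U8)
bytes_fp32 = pixels * 4            # payload after U8 -> FP32 expansion on the host

USB2_BYTES_PER_S = 20e6            # assumed effective USB 2.0 throughput

ms_u8 = bytes_u8 / USB2_BYTES_PER_S * 1e3
ms_fp32 = bytes_fp32 / USB2_BYTES_PER_S * 1e3

print(f"U8 payload:   {bytes_u8} bytes  ~ {ms_u8:.2f} ms")
print(f"FP32 payload: {bytes_fp32} bytes ~ {ms_fp32:.2f} ms")
print(f"Potential saving: {ms_fp32 - ms_u8:.2f} ms (factor {bytes_fp32 // bytes_u8})")
```

With these assumed numbers, the FP32 upload costs ~7.4 ms and a U8 upload would cost ~1.8 ms, so the potential saving of roughly 6 ms is consistent with the factor-of-4 argument in the question.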
* I am using the OpenVINO API and have already added inputInfo->setPrecision(Precision::U8). Unfortunately, this doesn't make any difference.
* I am using a Keras/TensorFlow model which is converted from .pb to .xml/.bin by the Model Optimizer.
The NCS2 is meant to be used in a USB 3.0 port as per the hardware requirements.
I did, however, run a quick test using the benchmark demo on a USB 3.0 port vs. a USB 2.0 port and saw about an 8 ms loss on the 2.0 port - the USB 2.0 port doesn't supply enough power to the NCS2, resulting in higher latency.
If you're able to run on a USB 3.0 port and try again, you should be able to get a lower inference time.
Many thanks for your response (and apologies for opening the thread twice - there was some delay and I thought my contribution went to /dev/null...)
I fully understand that the stick is meant for USB3 and, of course, I see much better performance when using a USB3 port...
Let me sharpen the point I want to make: assuming I am interpreting correctly what is going on, 8 bit integers are blown up in software (= inference engine) to 32 bit floats. These are then downloaded to the stick and converted to 16 bit floats by the NCS2 hardware. If this is true, it makes no sense to me, as it wastes data download time (by a factor of 4). And this factor applies to USB2 just as much as to USB3.
Yes, you might argue that for USB3 the relative amount of time used for data download is less than for the USB2 case and it is less annoying when using USB3.
However, I am forced to use USB2 and would like to get rid of these (useless) data conversions if there is any chance to do this.
So, my question is still:
Is there any chance to upload 8 bit integers as 8 bit integers (or, if needed, even as 16 bit floats, which would still give me a factor of 2)? Or is 32 bit float the only accepted format, so that any input data (even at 1 bpp) has to be converted to 32 bit by the inference engine?
Many thanks in advance
Thank you for clarifying!
This document about performing 8-bit integer inference might be of use to you, but this preview feature is only available on the CPU: https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Int8Inference.html
Please let me know if this answers your question.
Thank you for the link!
Of course, 8 bit quantization could be a workaround for my problem. However, it does not really solve it, because I am forced to use the NCS2 via USB2 and I am not able to perform inference on the CPU. Are there any plans to support 8 bit quantization on the NCS2?
If not (or not in the near future), I would like to come back to my initial question: Is there any chance to upload 8 bit per pixel (instead of 32 bpp) and do the conversion from 8bit integer to 16 bit float on the NCS2?
Thanks in advance & best regards
Yes, it is possible to use 8 bit data as input on the NCS2. There is an example of this here: https://github.com/opencv/dldt/blob/2019/inference-engine/samples/object_detection_sample_ssd/main.c...
I'm not sure if this is doing 8 bit computation or if there is some conversion happening internally.
Please let me know if you have any further questions.
Many thanks for your suggestion.
This is what I already do (see my remarks in the first post). Like you, I would expect an 8 bit per pixel data upload. As mentioned earlier, this doesn't seem to happen. So my questions are still the same:
1. Is it correct that 8 bit data (e.g. grayscale images) gets uploaded as 32 bit floats?
2. If the answer is yes, is there any chance to circumvent the data conversion that is done by the inference engine (in software)?
BTW: I am pretty fine with the 16 bit float inference - it's not my intention to switch to 8 bit quantization for inference.
I think that by default the precision for input data is FP32, but you can set it similarly to the line of code I linked above if you're using C++, or use this code if you're using Python:
network_object.inputs[input_layer_name].precision = "U8"
I hope this is helpful.
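Putting the pieces from this thread together, a minimal Python configuration sketch might look like the following. Note this is a sketch, not a verified solution to the U8 upload question: the model paths and the MYRIAD device name are placeholders, it assumes the 2019-era Inference Engine Python API (IECore with the `inputs` attribute used in the snippet above), and it requires the OpenVINO Python bindings to be installed.

```python
# Sketch only: requires the OpenVINO Inference Engine Python bindings.
# "model.xml"/"model.bin" and the MYRIAD device name are placeholders.
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")

# Take the first (and, for this CNN, only) input layer.
input_layer_name = next(iter(net.inputs))

# Request U8 input precision, as discussed above, so the host-side
# expansion to FP32 is (ideally) avoided.
net.inputs[input_layer_name].precision = "U8"

# Load onto the NCS2.
exec_net = ie.load_network(network=net, device_name="MYRIAD")
```

Whether setting the precision this way actually changes what travels over the USB bus is exactly the open question in this thread; the original poster reports that the C++ equivalent (setPrecision(Precision::U8)) made no measurable difference.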