
Why is processing faster when input/output is float16?

WilsonChen0723
Beginner

Could you please tell me why processing is faster when the input/output precision is float16, or how to configure U8 for faster speed?
We are currently working on a number of models, and all of them run faster when benchmarked with the input/output precision (-ip/-op) set to FP16.

Attached is a simple model that performs only a single conv2d, as an example.
This non-quantized float32 model is more than 4 times slower with U8 input/output than with FP16.

example cmd: benchmark_app.exe -m model_conv2d_1080x1920_pad_fp32.xml -nireq 1 -niter 100 -d NPU -ip U8 -op U8

-ip/-op     U8      F16     F32
Median:     78.38   17.00   27.25  ms
Average:    78.24   17.23   27.27  ms
Min:        73.73   15.66   24.08  ms
Max:        82.73   32.77   44.18  ms

F16 is also the fastest for the int8-quantized model.

example cmd: benchmark_app.exe -m model_conv2d_1080x1920_pad_int8.xml -nireq 1 -niter 100 -d NPU -ip U8 -op U8

-ip/-op     U8      F16     F32
Median:     22.25   11.80   16.91  ms
Average:    22.60   12.01   17.22  ms
Min:        20.73   10.12   15.62  ms
Max:        39.51   20.04   28.87  ms

This second result uses the model quantized from float32 to int8, yet float16 input/output is still the fastest.
The same trend holds for other models, such as Add.
As an example of a model with multiple layers, comparing profiles with the -report_type detailed_counters option showed differences, especially for the first and last layers (e.g., FakeQuantize).
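
For reference, the detailed counters come from appending the profiling flag to the earlier example command (reconstructed here for the fp32 model; the int8 run is analogous):

example cmd: benchmark_app.exe -m model_conv2d_1080x1920_pad_fp32.xml -nireq 1 -niter 100 -d NPU -ip U8 -op U8 -report_type detailed_counters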


Is the NPU internally optimized for float16?
Or is it possible to change the optimal input/output precision through configuration?
Since uint8 is used for NV12 and other image formats, I would like to know whether there is a setting that achieves the same speed with uint8.
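
As far as I understand, benchmark_app's -ip/-op switches set the same tensor element types that OpenVINO's pre/post-processing API exposes, so the experiment can also be set up in code. The following is a minimal C++ sketch (OpenVINO 2.0 API; the model filename is reused from the example command above, and everything else is illustrative rather than a confirmed fix). It keeps u8 tensors at the application boundary while converting to f16 in front of the graph:

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model_conv2d_1080x1920_pad_fp32.xml");

    // Keep u8 tensors at the application boundary, but insert an explicit
    // convert step so the compiled graph sees f16 data internally.
    ov::preprocess::PrePostProcessor ppp(model);
    ppp.input().tensor().set_element_type(ov::element::u8);
    ppp.input().preprocess().convert_element_type(ov::element::f16);
    // Note: a u8 output rounds/clamps the float results at the boundary.
    ppp.output().tensor().set_element_type(ov::element::u8);
    model = ppp.build();

    auto compiled = core.compile_model(model, "NPU");
    auto request  = compiled.create_infer_request();
    // ... fill the u8 input tensor and call request.infer() as usual.
    return 0;
}

The same PrePostProcessor can also declare an NV12 color format on the input, which is why a fast u8 path matters for camera pipelines.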

3 Replies
Wan_Intel
Moderator

Hi WilsonChen0723,

Thanks for reaching out to us.

For your information, I've run the Benchmark C++ Tool with the face-detection-adas-0001 model in FP16 and INT8 on the NPU plugin. I also observed that the FPS of the FP16 model is higher than that of the INT8 model, as shown in the attachments below:

FP16 model: 82 FPS (attachment: fp16 npu.jpg)

INT8 model: 62 FPS (attachment: int8 npu.jpg)
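
The exact benchmark commands are not shown in the screenshots; runs of the following form, with placeholder paths to the Open Model Zoo FP16 and FP16-INT8 variants of the model, would reproduce the comparison:

example cmd: benchmark_app -m <omz_models>/intel/face-detection-adas-0001/FP16/face-detection-adas-0001.xml -d NPU
example cmd: benchmark_app -m <omz_models>/intel/face-detection-adas-0001/FP16-INT8/face-detection-adas-0001.xml -d NPU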

Let me check with the relevant team, and we'll update you as soon as possible.

Regards,

Wan

Wan_Intel
Moderator

Hi WilsonChen0723,

Thanks for your patience.

For your information, I've run the Benchmark C++ Tool to infer your FP32 and INT8 models with the NPU plugin on an Ubuntu machine using the latest version of the OpenVINO toolkit. The FPS when inferring with the INT8 model is higher than with the FP32 model. Could you please run inference with the latest version of the OpenVINO toolkit and check whether the issue is resolved?
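
If OpenVINO was installed through pip (an assumption; archive installs are instead updated by downloading the new package), upgrading is a one-liner:

example cmd: pip install --upgrade openvino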

Regards,

Wan

Wan_Intel
Moderator

Hello WilsonChen0723,

Thanks for your question.

If you need any additional information from Intel, please submit a new question as this thread will no longer be monitored.

Regards,

Wan
