Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

What is the right way to run inference with an int8 quantized IR model?

OronG13
Beginner

Hello,

I have the following device:

Intel TigerLake-LP GT2 [Iris Xe Graphics] driver: i915 v: kernel.

 

I am using my own Docker image, which is based on this:

openvino/ubuntu22_dev - Docker Image | Docker Hub

 

Recently I implemented a test application that runs the OpenVINO 2023.0.0 inference sequence, based on the SDK sample code, using both the Python and C++ interfaces.

 

I want to verify that I am using all precision modes correctly: f32, f16 & int8.

 

First, I query the OPTIMIZATION_CAPABILITIES of the CPU & iGPU devices.

I saw that both devices support FP32, FP16 & INT8.

But regarding INFERENCE_PRECISION_HINT, the iGPU supports f16 and the CPU supports f32.

 

I used the mo (Model Optimizer) tool to convert my ONNX model to FP32 and FP16 IR format.

Additionally, I wrote two Python scripts that use the nncf package to produce an int8 IR:

  1. Take the ONNX model as input, generate a new ONNX model with fake quantization nodes, and then generate the int8 IR.
  2. Take an already generated FP32 IR as input and generate a new int8 IR.

Then, I executed these two int8 IR models using the OpenVINO C++ API on both devices.

When I tried to set ov::hint::inference_precision to ov::element::i8, I got a runtime error telling me that this value is not supported.

 

So, I am trying to verify:

  1. What is the relation between OPTIMIZATION_CAPABILITIES & INFERENCE_PRECISION_HINT?
  2. Can I run inference on an int8 IR with a device whose INFERENCE_PRECISION_HINT doesn't include i8?
  3. Can I run inference on an int8 IR with a device whose OPTIMIZATION_CAPABILITIES doesn't include i8?
  4. I set INFERENCE_PRECISION_HINT to f16 for my iGPU and successfully loaded and inferred the int8 IR. Is this the right way to do it?
  5. Can you give me an example of a device that supports an INFERENCE_PRECISION_HINT of i8?

 

Regards,

2 Replies
Megat_Intel
Moderator

Hi OronG13,

Thank you for reaching out to us.

 

Yes, you are correct. As mentioned in the Supported Model Format page, both the CPU and GPU devices support the FP32, FP16, and INT8 model formats. You can run inference with INT8 models on both CPU and GPU devices because both support the INT8 model format.

 

OPTIMIZATION_CAPABILITIES is a metric that lists the optimization options available for a device, while INFERENCE_PRECISION_HINT is a hint telling the device to use the specified inference precision.

 

Regarding inference precision, the selected precision depends on the operation precision in the IR model, the quantization primitives, and the available hardware capabilities. The i8 data type is used for quantized operations only, so it is not selected automatically for non-quantized operations. What you can configure is the floating-point precision of the GPU, which is either f32 or f16.

 

On the other hand, as an example, the floating-point precisions of a CPU are f32 and bf16 only. To support an f16 OpenVINO IR model, the plugin internally converts all f16 values to f32, and all calculations are performed using the native f32 precision.

 

For your information, I ran the OpenVINO™ Benchmark Tool with an INT8 model on the GPU using both f32 and f16 inference precision. I share my results below.

 

GPU (f32):

 (screenshot: gpu f32.png)

 

GPU (f16):

 (screenshot: gpu f16.png)

 

 

Regards,

Megat

 

Megat_Intel
Moderator

Hi OronG13,

This thread will no longer be monitored since we have provided a suggestion. If you need any additional information from Intel, please submit a new question.

 

 

Regards,

Megat

