Hello,
I have the following device:
Intel TigerLake-LP GT2 [Iris Xe Graphics] driver: i915 v: kernel.
I am using my own Docker image, which is based on this one:
openvino/ubuntu22_dev - Docker Image | Docker Hub
Recently I implemented a test application that runs an OpenVINO 2023.0.0 inference sequence based on the SDK sample code, using both the Python and C++ interfaces.
I want to verify that I am using all the precision modes correctly: f32, f16, and int8.
First, I queried the OPTIMIZATION_CAPABILITIES property of the CPU and iGPU devices.
I saw that both devices support FP32, FP16, and INT8.
But regarding INFERENCE_PRECISION_HINT, the iGPU reports f16 and the CPU reports f32.
I used the mo tool to convert my ONNX model to FP32 and FP16 IR formats.
Additionally, I wrote two Python scripts that produce an int8 IR using the NNCF package:
- One takes the ONNX model as input, generates a new ONNX model with fake-quantization nodes, and then generates the int8 IR.
- The other takes an already generated FP32 IR as input and generates a new int8 IR.
Then, I executed these two int8 IR models using the OpenVINO C++ API on both devices.
When I tried to set ov::hint::inference_precision to ov::element::i8, I got a runtime error telling me that this value is not supported.
So, I am trying to verify:
- What is the relation between OPTIMIZATION_CAPABILITIES and INFERENCE_PRECISION_HINT?
- Can I run inference on an int8 IR with a device whose INFERENCE_PRECISION_HINT does not support i8?
- Can I run inference on an int8 IR with a device whose OPTIMIZATION_CAPABILITIES does not include INT8?
- I set INFERENCE_PRECISION_HINT to f16 for my iGPU and successfully loaded and ran inference on the int8 IR. Is that the right way to do it?
- Can you give me an example of a device that supports an INFERENCE_PRECISION_HINT of i8?
Regards,
Hi OronG13,
Thank you for reaching out to us.
Yes, you are correct. As mentioned in the Supported Model Format page, both the CPU and GPU devices support the FP32, FP16, and INT-8 model formats, so you can run inference with INT-8 models on both devices.
OPTIMIZATION_CAPABILITIES is a metric that lists the optimization options a device supports, while INFERENCE_PRECISION_HINT is a hint telling the device to use a specified inference precision.
Regarding inference precision, the selected precision depends on the operation precision in the IR model, the quantization primitives, and the available hardware capabilities. The i8 data types are used for quantized operations only, so they are not selected automatically for non-quantized operations; this is why i8 is not an accepted value for the hint. You can configure the supported floating-point precisions of a GPU, which are f32 and f16.
On the other hand, as an example, the floating-point precisions of a CPU are f32 and bf16 only. To support an f16 OpenVINO IR model, the CPU plugin internally converts all f16 values to f32, and all calculations are performed in its native f32 precision.
For your information, I ran the OpenVINO™ Benchmark Tool with an INT-8 model on the GPU at both f32 and f16 inference precision. I share my results below.
[Benchmark result screenshots: GPU (f32) and GPU (f16)]
Regards,
Megat
Hi OronG13,
This thread will no longer be monitored since we have provided a suggestion. If you need any additional information from Intel, please submit a new question.
Regards,
Megat