Intel® Distribution of OpenVINO™ Toolkit
Community support and discussions about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all things computer vision-related on Intel® platforms.

Combination of fp32 and int8 inference using OpenVINO

Sengodan__Nathiyaa

I would like to run CNN inference for a network using OpenVINO, with some layers executing in FP32 and others in INT8.

I can use the calibration tool to generate an IR and run INT8 inference. But is there a way to run some layers in FP32 after calibrating the model, e.g., by modifying the generated .xml (IR) file manually?

1 Solution
Shubha_R_Intel
Employee

Dear Nathiyaa:

Currently only two layer types can be controlled this way: convolution and fully connected layers. For these layers, the calibration tool writes a special attribute into the IR file: quantization_level="I8" or quantization_level="FP32". This attribute is a request for quantization, not a guarantee. If it is FP32, the layer will be executed in floating-point precision. If it is I8, the layer can be executed in I8 when the quantization scheme allows it; some fusions and activation layers can prevent quantization even when quantization_level is set to "I8".
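To illustrate, here is a minimal sketch of editing that attribute in the generated .xml after calibration. It assumes the older IR format where each <layer> element carries a quantization_level attribute, as described above; the layer name "conv1" and file names are hypothetical placeholders for your own model.

```python
# Sketch: force selected layers of a calibrated IR back to FP32 by
# rewriting their quantization_level attribute. Assumes an IR format
# where each <layer> element carries quantization_level="I8"/"FP32".
import xml.etree.ElementTree as ET

def set_fp32_layers(ir_xml_in, ir_xml_out, fp32_layer_names):
    """Set quantization_level="FP32" on the named layers; leave others unchanged."""
    tree = ET.parse(ir_xml_in)
    for layer in tree.getroot().iter("layer"):
        if layer.get("name") in fp32_layer_names:
            layer.set("quantization_level", "FP32")
    tree.write(ir_xml_out)

# Hypothetical usage: keep the layer named "conv1" in floating point.
# set_fp32_layers("model_i8.xml", "model_mixed.xml", {"conv1"})
```

Layers whose names are not listed keep whatever quantization_level the calibration tool assigned, so only the layers you pick fall back to FP32.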

Hope this helps, and thank you for using OpenVINO!

Sincerely,

Shubha


2 Replies
(Accepted solution shown above.)

Sengodan__Nathiyaa

Thanks Shubha, that was very helpful. On a different note, how much of a performance speedup can we expect from I8 inference compared with FP32?
