Intel® Distribution of OpenVINO™ Toolkit
Community support and discussions about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all things computer vision-related on Intel® platforms.

Combination of fp32 and int8 inference using OpenVINO

Sengodan__Nathiyaa

I would like to run CNN inference for a network using OpenVINO, with some layers executing in FP32 and others in INT8.

I can use the calibration tool to generate an IR and run INT8 inference. But is there a way to run some layers in FP32 after calibrating the model, e.g., by modifying the generated .xml (IR) file manually?

1 Solution
Shubha_R_Intel
Employee

Dear Nathiyaa:

Currently only two layer types can be controlled this way: convolution and fully connected layers. For these layers, the calibration tool writes a special attribute into the IR file: quantization_level="I8" or quantization_level="FP32". This attribute is a request for quantization, not a guarantee. If it is FP32, the layer will be executed in floating-point precision. If it is I8, the layer can be executed in I8 when the quantization scheme allows it; some fusions and activation layers can prevent quantization even when quantization_level is set to "I8".
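To illustrate, here is a minimal sketch of editing that attribute in the generated .xml after calibration. It assumes the older IR format where each <layer> element carries a quantization_level attribute, as described above; the layer name "conv1" and file names are hypothetical placeholders for your own model.

```python
# Sketch: force selected layers of a calibrated IR back to FP32 by
# rewriting their quantization_level attribute. Assumes an IR format
# where each <layer> element carries quantization_level="I8"/"FP32".
import xml.etree.ElementTree as ET

def set_fp32_layers(ir_xml_in, ir_xml_out, fp32_layer_names):
    """Set quantization_level="FP32" on the named layers; leave others unchanged."""
    tree = ET.parse(ir_xml_in)
    for layer in tree.getroot().iter("layer"):
        if layer.get("name") in fp32_layer_names:
            layer.set("quantization_level", "FP32")
    tree.write(ir_xml_out)

# Hypothetical usage: keep the layer named "conv1" in floating point.
# set_fp32_layers("model_i8.xml", "model_mixed.xml", {"conv1"})
```

Layers whose names are not listed keep whatever quantization_level the calibration tool assigned, so only the layers you pick fall back to FP32.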

Hope this helps, and thank you for using OpenVINO!

Sincerely,

Shubha


2 Replies
(Accepted solution shown above.)

Sengodan__Nathiyaa

Thanks Shubha, that was very helpful. On a different note, how much of a performance speedup can we expect from I8 inference compared with FP32?
