I would like to run CNN inference for a network using OpenVINO, with some layers executing in FP32 and others in INT8.
I can use the calibration tool to generate an IR and run INT8 inference. But is there a way to keep some layers in FP32 after calibrating the model? For example, by modifying the generated .xml (IR) file manually?
Dear Nathiyaa:
Currently, two layer types can be controlled: convolution and fully connected layers. The calibration tool marks such layers in the IR file with the attribute quantization_level="I8" or quantization_level="FP32". This attribute is a request for quantization, not a guarantee. If it is FP32, the layer will be executed in floating-point precision. If it is I8, the layer can be executed in INT8 when the quantization scheme allows it; some fusions and activation layers may prohibit quantization even when quantization_level is set to "I8".
Hope it helps, and thank you for using OpenVINO!
Sincerely,
Shubha
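Based on the description above, forcing a layer back to FP32 amounts to flipping its quantization_level attribute in the calibrated IR .xml. A minimal sketch of that edit using Python's standard XML library is shown below; the file paths and the layer name ("conv2_1") are hypothetical placeholders, not part of any official OpenVINO workflow.

```python
# Sketch: toggle quantization_level on a named layer in a calibrated
# OpenVINO IR .xml. Paths and layer names are hypothetical examples.
import xml.etree.ElementTree as ET

def force_fp32(ir_xml_in, ir_xml_out, layer_name):
    """Set quantization_level="FP32" on the layer with the given name."""
    tree = ET.parse(ir_xml_in)
    for layer in tree.getroot().iter("layer"):
        # Only touch layers the calibration tool already annotated.
        if layer.get("name") == layer_name and "quantization_level" in layer.attrib:
            layer.set("quantization_level", "FP32")
    tree.write(ir_xml_out)

# Example usage (hypothetical file and layer names):
# force_fp32("model_i8.xml", "model_mixed.xml", "conv2_1")
```

Note that, per the answer above, this only requests FP32 execution for that layer; whether the remaining layers actually run in INT8 still depends on the quantization scheme and any fusions.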
Thanks Shubha, that was very helpful. On a different note, how much of a performance speedup can we expect from INT8 inference compared with FP32?