I saw this document:
https://docs.openvinotoolkit.org/latest/_inference_engine_tools_calibration_tool_README.html
I used Simplified Mode to convert my own FP32 IR model to INT8, and got an INT8 IR model for each target device, CPU and GPU. When I run inference with the INT8 CPU IR model on the CPU, the inference time decreases. When I run inference with the INT8 GPU IR model on the GPU, the inference time does not change.
I see in https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Int8Inference.html that the GPU does not support INT8 IR models. So, does the GPU really support INT8 inference?
In addition, since I used Simplified Mode to generate the INT8 IR model: does Simplified Mode only affect inference accuracy? Does the inference time of an IR model generated in Simplified Mode differ greatly from that of an IR model generated by steps 1-4?
Thanks.
Hi Wang,
Low-Precision 8-bit Integer Inference is a "preview feature" and optimized for CPU.
Hemanth Kumar G. (Intel) wrote:
Hi Wang,
Low-Precision 8-bit Integer Inference is a "preview feature" and optimized for CPU.
Thank you.
Does the INT8 IR model generated by Simplified Mode only affect inference accuracy, without affecting inference time?
Dear rongrong, wang,
The calibration tool lets you convert to INT8 with a loss of accuracy that you can live with. It's really up to you, though of course there are recommended guidelines. The idea behind INT8 is that the model may detect perfectly well even with this loss of accuracy. And yes, INT8 is supposed to improve performance: there is no reason to run an FP32 model if INT8 does the job, since INT8 will likely run faster. Keep in mind, though, that INT8 is still somewhat restrictive - not all layers can be converted to INT8. The INT8 reference documentation provides detailed info.
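One way to see how much of the network actually ran in INT8 is to look at the per-layer performance counters and group execution time by kernel precision. The dictionary below is a hypothetical sample in the shape returned by `InferRequest.get_perf_counts()` in the Inference Engine Python API, where the `exec_type` kernel name ends in the precision it ran at (e.g. `_I8`, `_FP32`); the layer names and timings are made up for illustration:

```python
from collections import defaultdict

# Hypothetical sample shaped like InferRequest.get_perf_counts():
# layer name -> {"exec_type": kernel name, "real_time": microseconds}
perf_counts = {
    "conv1":   {"exec_type": "jit_avx512_I8",   "real_time": 120},
    "conv2":   {"exec_type": "jit_avx512_I8",   "real_time": 300},
    "softmax": {"exec_type": "jit_avx512_FP32", "real_time": 40},
}

def time_by_precision(counts):
    """Sum per-layer execution time, grouped by the kernel precision suffix."""
    totals = defaultdict(int)
    for stats in counts.values():
        precision = stats["exec_type"].rsplit("_", 1)[-1]  # "I8", "FP32", ...
        totals[precision] += stats["real_time"]
    return dict(totals)

print(time_by_precision(perf_counts))
```

If most of the reported time still lands in FP32 kernels, that would explain why an "INT8" model shows little or no speedup on a given device.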
Thanks,
Shubha
Shubha R. (Intel) wrote:
Dear rongrong, wang,
The calibration tool lets you convert to INT8 with a loss of accuracy that you can live with. It's really up to you, though of course there are recommended guidelines. The idea behind INT8 is that the model may detect perfectly well even with this loss of accuracy. And yes, INT8 is supposed to improve performance: there is no reason to run an FP32 model if INT8 does the job, since INT8 will likely run faster. Keep in mind, though, that INT8 is still somewhat restrictive - not all layers can be converted to INT8. The INT8 reference documentation provides detailed info.
Thanks,
Shubha
Thank you very much! I understand.
