I saw this document:
https://docs.openvinotoolkit.org/latest/_inference_engine_tools_calibration_tool_README.html
I used Simplified Mode to convert my own FP32 IR model to INT8, and got an INT8 IR model for each target device, CPU and GPU. When I run inference with the INT8 CPU IR model on the CPU, the inference time decreases. When I run inference with the INT8 GPU IR model on the GPU, the inference time does not change.
I see in https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Int8Inference.html that the GPU does not support INT8 IR models. So, does the GPU really support INT8 inference?
In addition, since I used Simplified Mode to generate the INT8 IR model: does Simplified Mode only affect inference accuracy? Does the inference time of an IR model generated in Simplified Mode differ greatly from that of an IR model generated by steps 1-4?
Thanks.
Hi Wang,
Low-Precision 8-bit Integer Inference is a "preview feature" and optimized for CPU.
Hemanth Kumar G. (Intel) wrote:
Hi Wang,
Low-Precision 8-bit Integer Inference is a "preview feature" and optimized for CPU.
Thank you.
Does the INT8 IR model generated by Simplified Mode only affect inference accuracy, without affecting inference time?
Dear rongrong, wang,
The calibration tool lets you convert to INT8 with a loss of accuracy that you can live with. It's really up to you, though of course there are recommended guidelines. The idea behind INT8 is that the model may detect perfectly well even with this loss of accuracy. And yes, INT8 is supposed to improve performance: there is no reason to run an FP32 model if INT8 does the job, since INT8 will likely run faster. Keep in mind, though, that INT8 is still somewhat restrictive - not all layers can be converted to INT8. The INT8 reference documentation provides detailed info.
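One way to see how much of the network actually ran in INT8 is to look at the per-layer performance counters and group execution time by kernel precision. The dictionary below is a hypothetical sample in the shape returned by `InferRequest.get_perf_counts()` in the Inference Engine Python API, where the `exec_type` kernel name ends in the precision it ran at (e.g. `_I8`, `_FP32`); the layer names and timings are made up for illustration:

```python
from collections import defaultdict

# Hypothetical sample shaped like InferRequest.get_perf_counts():
# layer name -> {"exec_type": kernel name, "real_time": microseconds}
perf_counts = {
    "conv1":   {"exec_type": "jit_avx512_I8",   "real_time": 120},
    "conv2":   {"exec_type": "jit_avx512_I8",   "real_time": 300},
    "softmax": {"exec_type": "jit_avx512_FP32", "real_time": 40},
}

def time_by_precision(counts):
    """Sum per-layer execution time, grouped by the kernel precision suffix."""
    totals = defaultdict(int)
    for stats in counts.values():
        precision = stats["exec_type"].rsplit("_", 1)[-1]  # "I8", "FP32", ...
        totals[precision] += stats["real_time"]
    return dict(totals)

print(time_by_precision(perf_counts))
```

If most of the reported time still lands in FP32 kernels, that would explain why an "INT8" model shows little or no speedup on a given device.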
Thanks,
Shubha
Shubha R. (Intel) wrote:
Dear rongrong, wang,
The calibration tool lets you convert to INT8 with a loss of accuracy that you can live with. It's really up to you, though of course there are recommended guidelines. The idea behind INT8 is that the model may detect perfectly well even with this loss of accuracy. And yes, INT8 is supposed to improve performance: there is no reason to run an FP32 model if INT8 does the job, since INT8 will likely run faster. Keep in mind, though, that INT8 is still somewhat restrictive - not all layers can be converted to INT8. The INT8 reference documentation provides detailed info.
Thanks,
Shubha
Thank you very much! I understand.
