Thank you for answering my question.
Since I couldn't find a way to reply to your answer directly, I am posting again to describe how I quantized the model to INT8.
My device is an Intel Core i7-8700 @ 3.20GHz. I converted my ONNX model to FP32 format using OpenVINO's mo.py and obtained the model's XML and BIN files. For quantizing models to INT8, OpenVINO's official documentation gives two quantization methods. I have tried both; here is one of them. Before using OpenVINO's quantization tool, I first configured the environment so that the pot command could be used directly in the terminal. Then I used my own dataset to generate the annotation.txt file, and configured the parameters in the JSON and YAML files required by this method.
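For reference, a minimal POT configuration along these lines might look as follows (a sketch only: the model paths and the accuracy-checker YAML filename are placeholders for my actual files, and stat_subset_size is just an example value):

```json
{
    "model": {
        "model_name": "model",
        "model": "model.xml",
        "weights": "model.bin"
    },
    "engine": {
        "config": "accuracy_checker.yaml"
    },
    "compression": {
        "target_device": "CPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300
                }
            }
        ]
    }
}
```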
After the relevant files are configured, the pot tool is run with the configured JSON file to perform quantization, producing the quantized XML and BIN files in INT8 format. With this method, inference on one image takes 25 ms in FP32 format, 50 ms in INT8 format, and 20 ms in FP16 format.
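For context, the per-image timings above come from a simple wall-clock measurement loop; a minimal sketch is below, where the `infer` callable stands in for a call to the compiled OpenVINO model (the workload shown at the bottom is a dummy placeholder, not the actual model):

```python
import time

def average_latency_ms(infer, n_runs=100):
    """Average wall-clock latency of one call to infer(), in milliseconds."""
    # Warm-up run so one-time initialization cost is not counted.
    infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    return (time.perf_counter() - start) / n_runs * 1000.0

# Dummy workload standing in for e.g. compiled_model([input_image])
latency = average_latency_ms(lambda: sum(range(10000)))
print(f"{latency:.3f} ms per run")
```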
So on my model the ordering is FP16 > FP32 > INT8, which is inconsistent with the CPU acceleration performance OpenVINO officially reports, INT8 > FP32 > FP16.
Thanks for reaching out to us.
For your information, as shown in the Intel® Distribution of OpenVINO™ toolkit Benchmark Results, the throughput of the INT8 model format is higher than that of the FP32 model format.
On another note, I've validated that the throughput of the INT8 model format is higher than that of the FP32 model format, as follows:
Throughput = higher is better (faster)
FP32 -> Throughput: 25.33 FPS
INT8 -> Throughput: 37.16 FPS
On the other hand, layers might be the issue as mentioned in this thread.
Thanks for your question.
This thread will no longer be monitored since we have provided information.
If you need any additional information from Intel, please submit a new question.