I've encountered an issue with inference in OpenVINO.
I trained a custom YOLOv8 object detection model and converted it using OpenVINO's Model Optimizer.
When I tested the converted model, I obtained perfect results.
However, the problem arises when I take the Model Optimizer output and quantize it with the Post-Training Optimization Tool (POT): the quantized models return no detections on the same images.
I'm unsure of the exact cause. Is the problem in my inference script, or in how I generate the POT models?
Note: I run inference on both the Model Optimizer model and the POT model with the same Python script.
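For context, my script follows this general pattern (a simplified sketch, not my exact code; the file names are placeholders and the YOLOv8 output decoding is omitted):

import cv2
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("custom_model.xml")  # the same script also loads the POT model's XML
compiled = core.compile_model(model, "CPU")
output_layer = compiled.output(0)

# Preprocess: resize to the model's 640x640 input, HWC -> NCHW, normalize to [0, 1]
image = cv2.imread("test.jpg")
resized = cv2.resize(image, (640, 640))
blob = resized.transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255.0

predictions = compiled([blob])[output_layer]
print(predictions.shape)  # raw YOLOv8 output; decoding and NMS happen afterwards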
Thanks in advance.
Hi Aouatif,
Thanks for reaching out.
Which OpenVINO version are you using? For your information, the Post-training Optimization Tool (POT) has been deprecated since OpenVINO 2023.0; the Neural Network Compression Framework (NNCF) is recommended for post-training quantization instead.
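For reference, a minimal post-training quantization flow with NNCF looks roughly like this (a sketch only; the model path is a placeholder, and the random calibration data must be replaced with real preprocessed images):

import numpy as np
import nncf
from openvino.runtime import Core, serialize

core = Core()
model = core.read_model("custom_model.xml")

# Placeholder calibration set: in practice, pass ~300 real preprocessed NCHW float32 images
calibration_data = [np.random.rand(1, 3, 640, 640).astype(np.float32) for _ in range(10)]
calibration_dataset = nncf.Dataset(calibration_data)

quantized_model = nncf.quantize(model, calibration_dataset)
serialize(quantized_model, "custom_model_int8.xml")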
You may also share your model files (IR and INT8) so we can check further on our end.
Regards,
Aznie
Hello Aznie_Intel,
First and foremost, I'd like to express my gratitude for your prompt response.
The OpenVINO version I am using is:
## 2022.3.0-9038-b84161848ea-releases/2022/3 ##
I converted my custom PyTorch model to ONNX, then to OpenVINO IR with the Model Optimizer using data_type FP32, and finally quantized it with POT. When I inspected these models in Netron, they all showed an FP32 input tensor of shape [1, 3, 640, 640].
I repeated the same process using data_type FP16, and all the models performed well except the POT models.
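For reference, the Model Optimizer commands followed roughly this pattern (the output directories are illustrative):
## mo --input_model custom_model.onnx --data_type FP32 --output_dir fp32_model/ ##
## mo --input_model custom_model.onnx --data_type FP16 --output_dir fp16_model/ ##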
The command used for generating the POT models is:
## pot -q default -m custom_model.xml -w custom_model.bin --engine simplified --data-source path_dataset/ --output-dir output_folder/ ##
Is there a specific issue with generating the POT models?
Best regards,
Aouatif
Hi Aouatif,
Have you tested your INT8 model and your FP32 IR model with the Benchmark App? When you convert a model to low precision, the trade-off is between accuracy and performance: a low-precision model usually runs faster than the same model in full precision, but its accuracy may drop. You can find some benchmarking results in INT8 vs FP32 Comparison on Select Networks and Platforms. Another benefit of a low-precision model is its smaller size.
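For example, you can compare the two with commands along these lines (model paths are placeholders):
## benchmark_app -m custom_model_fp32.xml -d CPU ##
## benchmark_app -m custom_model_int8.xml -d CPU ##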
The possible steps you may try to diagnose and improve the results are as follows:
- Make sure the accuracy of the original uncompressed model has the value you expect. Run your POT pipeline with an empty compression config, evaluate the resulting model's accuracy metric, and compare it with your reference.
- Run your compression pipeline with a single compression algorithm (Default Quantization or Accuracy-aware Quantization) without specifying any parameter values in the config except preset and stat_subset_size, and make sure you get an acceptable accuracy drop for the performance gain; a minimal config sketch follows this list.
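For reference, the single-algorithm compression section of a POT JSON config looks roughly like this (the values shown are illustrative):

"compression": {
    "algorithms": [
        {
            "name": "DefaultQuantization",
            "params": {
                "preset": "performance",
                "stat_subset_size": 300
            }
        }
    ]
}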
Finally, if the problem persists after the steps above, you could try compressing your model with the Neural Network Compression Framework (NNCF). Note that NNCF Quantization-aware Training requires a PyTorch or TensorFlow 2 based training pipeline for your model. See the Model Optimization Guide for more details.
Hope this helps.
Regards,
Aznie
Hi Aouatif,
This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question.
Regards,
Aznie
Hi,
Firstly, I apologize for the delayed response.
Thank you for your advice.
I have worked with the Neural Network Compression Framework and achieved positive results. Thanks again.
Regards,
Aouatif