Hello,
I downloaded the YOLO11 model from Ultralytics HUB in both ONNX and OpenVINO IR (.xml/.bin) formats.
Using the OpenVINO C++ API, these are the inference speeds I am getting on a 512x512 image:
Inference using ONNX: 60 ms
Inference using .xml/.bin: 230 ms
This is the first time I am trying OpenVINO, and I was expecting the opposite.
Can you please help?
Thanks
Hi Ziri,
Thanks for reaching out. It’s unusual for the OpenVINO IR model (.xml/.bin) to run significantly slower than the ONNX model. Here are some possible reasons and optimizations to try.
First, check if you're using the correct inference device. If you're running inference on a CPU, try setting CPU_THROUGHPUT_AUTO or CPU_BIND_THREAD as performance hints. If you're using a GPU, ensure that OpenVINO uses the correct GPU plugin (AUTO or GPU). For VPU/NPU users, ensure your accelerator is compatible with OpenVINO.
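For reference, here is a minimal C++ sketch of checking which devices OpenVINO can see and compiling the model with an explicit performance hint, using the current 2.0 property API (the newer equivalent of the legacy CPU_THROUGHPUT_AUTO / CPU_BIND_THREAD config keys). The "yolov11.xml" path is only a placeholder for your converted model:
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;

    // List the devices OpenVINO detects on this machine (CPU, GPU, NPU, ...).
    for (const auto& device : core.get_available_devices())
        std::cout << device << std::endl;

    // Let OpenVINO pick the best available device with a latency-oriented hint.
    ov::CompiledModel compiled_model = core.compile_model(
        "yolov11.xml", "AUTO",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));

    return 0;
}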
Next, verify that the model conversion was done correctly. If you're using the latest OpenVINO, it's best to use the ovc (OpenVINO Converter) instead of the older mo (Model Optimizer). You can convert your ONNX model to OpenVINO IR format with:
ovc yolov11.onnx
Alternatively, if you're using Python, you can do the same with:
import openvino as ov
ir_model = ov.convert_model("yolov11.onnx")
ov.save_model(ir_model, "yolov11.xml")
Incorrect conversion may add unnecessary operations that slow down inference.
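If you prefer to stay in C++, a rough equivalent of the Python snippet above is to let OpenVINO's ONNX frontend read the model and then serialize it to IR. This sketch assumes OpenVINO 2023.1 or newer, where ov::save_model is available, and uses placeholder file names:
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // Read the ONNX model directly; the ONNX frontend converts it in memory.
    std::shared_ptr<ov::Model> model = core.read_model("yolov11.onnx");

    // Serialize the converted model to IR (.xml/.bin).
    ov::save_model(model, "yolov11.xml");

    return 0;
}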
You can also test both the ONNX and IR models using OpenVINO’s built-in benchmark_app tool, which reports detailed performance metrics so you can compare their speeds directly.
benchmark_app -m yolov11.onnx -d CPU
benchmark_app -m yolov11.xml -d CPU
To further optimize performance, try asynchronous inference instead of synchronous inference, and increase the number of in-flight inference requests. With the C++ API, this can be done by compiling the model with a throughput-oriented performance hint:
ov::CompiledModel compiled_model = core.compile_model(model_path, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
Since OpenVINO supports parallel execution, making these adjustments can significantly boost inference speed.
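As a rough illustration of the asynchronous approach, the sketch below compiles with a throughput hint, asks the plugin for its optimal number of infer requests, and runs them in parallel. The model path and the preprocessing step are placeholders you would replace with your own pipeline:
#include <openvino/openvino.hpp>
#include <cstdint>
#include <vector>

int main() {
    ov::Core core;

    // Compile with a throughput hint so the plugin sizes its execution
    // streams for several concurrent requests.
    ov::CompiledModel compiled_model = core.compile_model(
        "yolov11.xml", "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    // Ask the plugin how many requests it wants in flight for best throughput.
    uint32_t n_requests = compiled_model.get_property(ov::optimal_number_of_infer_requests);

    std::vector<ov::InferRequest> requests;
    for (uint32_t i = 0; i < n_requests; ++i)
        requests.push_back(compiled_model.create_infer_request());

    // Fill each request's input tensor with a preprocessed 512x512 image here,
    // then launch all requests asynchronously and wait for them to finish.
    for (auto& request : requests)
        request.start_async();
    for (auto& request : requests)
        request.wait();

    return 0;
}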
Another key factor is the precision of the model. By default, OpenVINO models run in FP32, which may be slower. Converting the model to FP16 can improve performance:
ovc yolov11.onnx --compress_to_fp16 True
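Separately from compressing the stored weights, you can also request reduced-precision execution at compile time through the inference_precision hint. This is a minimal sketch assuming a recent OpenVINO release and a device that supports FP16 execution (typically GPU):
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // Ask the plugin to execute in FP16 where the hardware supports it.
    ov::CompiledModel compiled_model = core.compile_model(
        "yolov11.xml", "GPU",
        ov::hint::inference_precision(ov::element::f16));

    return 0;
}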
If the issue persists after trying these steps, you can share the model for validation.
Regards,
Aznie
Hi Ziri,
This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question.
Regards,
Aznie
