Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision-related on Intel® platforms.

Running xml model is slower than onnx (same code)

Ziri
Beginner

Hello,

 

I downloaded the YOLO11 model from Ultralytics HUB in both ONNX and OpenVINO IR (.xml/.bin) formats.

Using the OpenVINO C++ API, these are the inference times I am getting (512×512 image):

 

    inference using ONNX: 60 ms

    inference using .xml/.bin: 230 ms

 

This is the first time I am trying OpenVINO and I was expecting the opposite.

Can you please help?

 

Thanks

Aznie_Intel
Moderator

Hi Ziri,

 

Thanks for reaching out. It’s unusual for the OpenVINO IR model (.xml/.bin) to run significantly slower than the ONNX model. Here are some possible reasons and optimizations to try.

 

First, check that you're using the correct inference device. If you're running inference on a CPU, try setting a performance hint such as THROUGHPUT or LATENCY (CPU_THROUGHPUT_AUTO and CPU_BIND_THREAD are the older configuration-key equivalents). If you're using a GPU, ensure that OpenVINO uses the correct GPU plugin (AUTO or GPU). For VPU/NPU users, ensure your accelerator is compatible with OpenVINO.
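
If it helps, here is a minimal C++ sketch (not taken from your code; the device names and output depend on your machine) that lists the devices the OpenVINO runtime actually detects, so you can confirm which plugin you are compiling for:

#include <iostream>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // Print every device the OpenVINO runtime detects (e.g. "CPU", "GPU")
    // so you can confirm which plugin your inference is actually running on.
    for (const auto& device : core.get_available_devices()) {
        std::cout << device << ": "
                  << core.get_property(device, ov::device::full_name)
                  << std::endl;
    }
    return 0;
}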

 

Next, verify that the model conversion was done correctly. If you're using the latest OpenVINO, it's best to use the ovc (OpenVINO Converter) instead of the older mo (Model Optimizer). You can convert your ONNX model to OpenVINO IR format with:

 

ovc yolov11.onnx

 

Alternatively, if you're using Python, you can do the same with:

 

import openvino as ov

ir_model = ov.convert_model("yolov11.onnx")

ov.save_model(ir_model, "yolov11.xml")

 

Incorrect conversion may add unnecessary operations that slow down inference.

You can also test both the ONNX and IR models with OpenVINO’s built-in benchmark_app tool, which reports detailed performance metrics so you can compare their speeds directly:

 

benchmark_app -m yolov11.onnx -d CPU 

benchmark_app -m yolov11.xml -d CPU 

 

To further optimize performance, try using asynchronous inference instead of synchronous inference, and increase the number of parallel inference requests. You can enable this by compiling the model with a throughput-oriented performance hint:

compiled_model = core.compile_model(model_path, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

 

Since OpenVINO supports parallel execution, making these adjustments can significantly boost inference speed.
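
For reference, a rough C++ sketch of that approach might look like the following (it assumes the IR is saved as yolov11.xml and omits the YOLO pre/post-processing, so treat it as a starting point rather than a drop-in solution):

#include <openvino/openvino.hpp>
#include <vector>

int main() {
    ov::Core core;

    // Compile with a throughput hint so the plugin can use parallel streams.
    ov::CompiledModel compiled = core.compile_model(
        "yolov11.xml", "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    // Ask the plugin how many requests it can usefully keep in flight.
    const uint32_t n_requests =
        compiled.get_property(ov::optimal_number_of_infer_requests);

    std::vector<ov::InferRequest> requests;
    for (uint32_t i = 0; i < n_requests; ++i) {
        requests.push_back(compiled.create_infer_request());
    }

    // Fill each request's input tensor with a preprocessed image here, then
    // launch all of them without blocking on any single one.
    for (auto& request : requests) {
        request.start_async();
    }

    // Wait for completion and read the results.
    for (auto& request : requests) {
        request.wait();
        ov::Tensor output = request.get_output_tensor();
        // ... parse YOLO detections from `output` ...
    }
    return 0;
}

With the THROUGHPUT hint the plugin chooses the number of streams for you; if you only care about the fastest single image, use the LATENCY hint with a single request instead.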

Another key factor is the precision of the model. By default, OpenVINO models run in FP32, which may be slower. Converting the model to FP16 can improve performance:

ovc yolov11.onnx --compress_to_fp16 True

 

If the issue persists after trying these steps, you can share the model for validation.

 

 

Regards,

Aznie

 

Aznie_Intel
Moderator

Hi Ziri,


This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question. 



Regards,

Aznie

