Hello,
I downloaded the YOLO11 model from Ultralytics HUB in both ONNX and OpenVINO IR (.xml/.bin) formats.
Using the OpenVINO C++ API, these are the inference speeds I am getting on a 512x512 image:
Inference using ONNX: 60 ms
Inference using .xml/.bin: 230 ms
This is the first time I am trying OpenVINO, and I was expecting the opposite.
Can you please help?
Thanks
Hi Ziri,
Thanks for reaching out. It’s unusual for the OpenVINO IR model (.xml/.bin) to run significantly slower than the ONNX model. Here are some possible reasons and optimizations to try.
First, check if you're using the correct inference device. If you're running inference on a CPU, try setting CPU_THROUGHPUT_AUTO or CPU_BIND_THREAD as performance hints. If you're using a GPU, ensure that OpenVINO uses the correct GPU plugin (AUTO or GPU). For VPU/NPU users, ensure your accelerator is compatible with OpenVINO.
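For reference, here is a minimal C++ sketch of checking which devices OpenVINO can see and compiling the model with an explicit performance hint, using the current 2.0 property API (the newer equivalent of the legacy CPU_THROUGHPUT_AUTO / CPU_BIND_THREAD config keys). The "yolov11.xml" path is only a placeholder for your converted model:
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;

    // List the devices OpenVINO detects on this machine (CPU, GPU, NPU, ...).
    for (const auto& device : core.get_available_devices())
        std::cout << device << std::endl;

    // Let OpenVINO pick the best available device with a latency-oriented hint.
    ov::CompiledModel compiled_model = core.compile_model(
        "yolov11.xml", "AUTO",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));

    return 0;
}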
Next, verify that the model conversion was done correctly. If you're using the latest OpenVINO, it's best to use the ovc (OpenVINO Converter) instead of the older mo (Model Optimizer). You can convert your ONNX model to OpenVINO IR format with:
ovc yolov11.onnx
Alternatively, if you're using Python, you can do the same with:
import openvino as ov
ir_model = ov.convert_model("yolov11.onnx")
ov.save_model(ir_model, "yolov11.xml")
Incorrect conversion may add unnecessary operations that slow down inference.
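If you prefer to stay in C++, a rough equivalent of the Python snippet above is to let OpenVINO's ONNX frontend read the model and then serialize it to IR. This sketch assumes OpenVINO 2023.1 or newer, where ov::save_model is available, and uses placeholder file names:
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // Read the ONNX model directly; the ONNX frontend converts it in memory.
    std::shared_ptr<ov::Model> model = core.read_model("yolov11.onnx");

    // Serialize the converted model to IR (.xml/.bin).
    ov::save_model(model, "yolov11.xml");

    return 0;
}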
You can also test both the ONNX and IR models using OpenVINO’s built-in benchmark_app tool, which reports detailed performance metrics so you can compare their speeds directly.
benchmark_app -m yolov11.onnx -d CPU
benchmark_app -m yolov11.xml -d CPU
To further optimize performance, try asynchronous inference instead of synchronous inference, and increase the number of in-flight inference requests. With the C++ API, this can be done by compiling the model with a throughput-oriented performance hint:
ov::CompiledModel compiled_model = core.compile_model(model_path, "CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
Since OpenVINO supports parallel execution, making these adjustments can significantly boost inference speed.
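As a rough illustration of the asynchronous approach, the sketch below compiles with a throughput hint, asks the plugin for its optimal number of infer requests, and runs them in parallel. The model path and the preprocessing step are placeholders you would replace with your own pipeline:
#include <openvino/openvino.hpp>
#include <cstdint>
#include <vector>

int main() {
    ov::Core core;

    // Compile with a throughput hint so the plugin sizes its execution
    // streams for several concurrent requests.
    ov::CompiledModel compiled_model = core.compile_model(
        "yolov11.xml", "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    // Ask the plugin how many requests it wants in flight for best throughput.
    uint32_t n_requests = compiled_model.get_property(ov::optimal_number_of_infer_requests);

    std::vector<ov::InferRequest> requests;
    for (uint32_t i = 0; i < n_requests; ++i)
        requests.push_back(compiled_model.create_infer_request());

    // Fill each request's input tensor with a preprocessed 512x512 image here,
    // then launch all requests asynchronously and wait for them to finish.
    for (auto& request : requests)
        request.start_async();
    for (auto& request : requests)
        request.wait();

    return 0;
}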
Another key factor is the precision of the model. By default, OpenVINO models run in FP32, which may be slower. Converting the model to FP16 can improve performance:
ovc yolov11.onnx --compress_to_fp16 True
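Separately from compressing the stored weights, you can also request reduced-precision execution at compile time through the inference_precision hint. This is a minimal sketch assuming a recent OpenVINO release and a device that supports FP16 execution (typically GPU):
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // Ask the plugin to execute in FP16 where the hardware supports it.
    ov::CompiledModel compiled_model = core.compile_model(
        "yolov11.xml", "GPU",
        ov::hint::inference_precision(ov::element::f16));

    return 0;
}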
If the issue persists after trying these steps, you can share the model for validation.
Regards,
Aznie
Hi Ziri,
This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question.
Regards,
Aznie
