Hello,
I am trying to use benchmark_app on a BiSeNetV2 model on the NPU device, but the compilation process never ends.
If I choose CPU or GPU, everything works fine. I also tried with a MobileNet model, and it works on all devices.
It seems related to this model on this device...
How can I get more information about why the compilation fails?
Here is my configuration:
System: Intel Core Ultra 7 155H
Operating system: Windows 11 (I also tried on Ubuntu 22.04)
OpenVINO: 2024.5.0
NPU driver: 2024.5.0
Thanks for your answer.
Hi Maick,
My colleague was able to get the compilation to pass on a 155H with the latest nightly NPU driver: https://af01p-ir.devtools.intel.com/artifactory/vpu_ci-ir-local/Unified/nightly/integration/windows_driver/master/ci_tag_driver_rc_config1_20250316_1902/npu-driver-ci-master-6177-RelWithDebInfo.zip
The workaround is to set NPU_COMPILATION_MODE_PARAMS to enable-se-ptrs-operations=true.
Please check on your side and let us know if it works for you.
Hi,
Thanks for your feedback !
I don't have access to the link you provided; could you please share it with me?
Do we have access to this compilation parameter through the Python API, or only through the C++ API?
Thanks for your support.
Maïck
I realized that the nightly NPU driver that resolved your issue is only available internally at the moment. I asked the developer when this functionality will be released publicly and will get back to you asap.
Hello,
Thanks for your feedback. I tried with driver 2025.0.0-17942 and the compilation parameter, and both compilation and inference now work on the NPU. However, the inference time is about 2 times slower than on the GPU.
Did you expect better performance with the new driver version? Do you have an idea why it is slower on the NPU? I also benchmarked a MobileNet model, and for that model the NPU is about 10 times faster.
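For reference, the comparison I am doing is roughly equivalent to this Python sketch (simplified; the model path, FP32 input and loop count are placeholders):

```python
import time
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("bisenetv2.xml")  # placeholder path to the IR

for device in ("GPU", "NPU"):
    config = {}
    if device == "NPU":
        # Same compilation parameter as used for the NPU workaround.
        config["NPU_COMPILATION_MODE_PARAMS"] = "enable-se-ptrs-operations=true"
    compiled = core.compile_model(model, device, config)
    request = compiled.create_infer_request()

    # Random input with the model's static input shape.
    data = np.random.rand(*compiled.input(0).shape).astype(np.float32)
    request.infer({0: data})  # warm-up

    start = time.perf_counter()
    for _ in range(50):
        request.infer({0: data})
    avg_ms = (time.perf_counter() - start) / 50 * 1000
    print(f"{device}: {avg_ms:.1f} ms per inference")
```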
Thanks for your support.
Maïck
Thank you for checking, it's sad to hear that the performance got worse. My colleague achieved a benchmark result of 5.62 fps with the nightly build. I'm still waiting for information on when this functionality can move to a publicly available driver version.
Hi Maick,
I don't have an update about the nightly driver build from the developers yet but I can share with you further explanation about the previous fix. Maybe it can help you for the time being:
The enable-se-ptrs-operations option is quite sensitive for this ISV model; it is false by default on MTL and true on LNL/PTL.
For Interpolate on MTL there are currently several possible solutions, for example a SHAVE kernel, mapping to the DPU, or SEP-based interpolation. On MTL, with the SEP feature disabled, such an Interpolate is mapped to the DPU and tiled into many ops. That is why the compilation hangs (Interpolate at a high resolution).
I also compared the DPU and SEP solutions: for a single small Interpolate op (1x19x128x256 -> 1x19x512x1024), the SEP solution latency is ~5.5 ms and the DPU solution latency is ~14 ms.
So, from the perspective of both compilation time and performance, it is more reasonable to keep this configuration on MTL.
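As a side note, if you want to confirm which Interpolate nodes in your IR hit this path, a short Python sketch that lists them and their output shapes (the model path is a placeholder):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("bisenetv2.xml")  # placeholder path to your IR

# Print every Interpolate node and its output shape; the large
# upsampling ops are the ones that get tiled on MTL without SEP.
for node in model.get_ops():
    if node.get_type_name() == "Interpolate":
        print(node.get_friendly_name(), node.get_output_partial_shape(0))
```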
Hi Maick,
The developers explained that nightly drivers cannot be shared with external customers. For now, you can either wait for an update of the publicly available driver or use the parameter workaround that I shared previously. Can I support you further?
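Regarding your earlier question about the Python API: the parameter can also be passed as a device property at compile time. A minimal sketch (the model path here is just a placeholder):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("bisenetv2.xml")  # placeholder path to your IR

# Forward the compiler workaround to the NPU plugin at compile time.
compiled_model = core.compile_model(
    model,
    "NPU",
    {"NPU_COMPILATION_MODE_PARAMS": "enable-se-ptrs-operations=true"},
)
```

If you prefer to keep using benchmark_app, it can load the same property from a JSON file through its -load_config option.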
Hi,
Thanks for your feedback and the technical explanation.
If I understood correctly, even if the new driver solves the compilation hang, the SEP solution will still be the best option and the latency will not improve?
Thanks for your feedback.
Hi Maick,
Actually, with the combination of the nightly driver and "NPU_COMPILATION_MODE_PARAMS enable-se-ptrs-operations=true", the benchmark result is 5.62 fps with 100% SIT cosim similarity and without the previous issues. Still, I can't provide a timeframe for the public release of this driver; the latest official release is currently UD18. Can we de-escalate this issue for the time being?
