Hello,
I am trying to run a model on the NPU by converting the model to ONNX and then compiling it with OpenVINO in Python. I was able to compile it for the CPU, GPU, and NPU; however, when running it on the NPU, I get the following error after processing a few prompts:
Exception from src\plugins\intel_npu\src\utils\src\zero\zero_wrappers.cpp:159:
L0 zeCommandQueueExecuteCommandLists result: ZE_RESULT_ERROR_DEVICE_LOST, code 0x70000001 - device hung, reset, was removed, or driver update occurred
I am using:
- OpenVINO 2025.0.0
- NPU driver 32.0.100.3717
- Intel Arc driver 32.0.101.6314
on a Core Ultra 9 288V (devkit/box?).
I had to downgrade the graphics drivers because, after the update, running models on the NPU would cause Windows to either bluescreen or freeze permanently.
What could be causing the issue?
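For reference, the compile-and-run flow is roughly the following (the paths and input shapes here are placeholders rather than my actual ones):
```
import numpy as np
import openvino as ov

core = ov.Core()

# Read the ONNX export and compile it for the NPU ("CPU" and "GPU" work the same way)
model = core.read_model("model.onnx")  # placeholder path
compiled = core.compile_model(model, "NPU")

# Run several inferences in a row; the DEVICE_LOST error shows up after a few of these
for _ in range(10):
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
    results = compiled([data])
```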
Hi maciej_procero,
Please try upgrading the NPU driver to the latest version, 32.0.200.3717.
As for the graphics driver, you can try the latest WHQL-certified graphics driver, 32.0.101.6559.
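After updating, you can quickly confirm that OpenVINO still sees the NPU, for example with the standard Core API:
```
import openvino as ov

core = ov.Core()
print(core.available_devices)                        # should include "NPU"
print(core.get_property("NPU", "FULL_DEVICE_NAME"))  # confirms the plugin can reach the device
```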
Regards,
Peh
Thank you for the speedy reply, Peh.
I already installed the latest version of the NPU driver, though note that there is a version discrepancy on the page you provided: the selected latest version is 32.0.200.3717, but the text, as well as the file you end up downloading, says 32.0.100.3717.
I did notice that the release notes (NPU Release_Notes_v3717.pdf) mention the driver is compatible with OpenVINO 2024.5. Does that mean I should be using that version of OpenVINO?
Thank you for the graphics driver suggestion, I will give that a shot today.
Thank you,
Maciej
Hi Maciej,
How was your graphics driver update?
The latest NPU driver should be compatible with OpenVINO 2025.0 as well. I was able to run the OpenVINO 2025.0 Benchmark App on the NPU with a dynamic model.
Regards,
Peh
Hello Peh,
I have some updates:
- I updated the graphics drivers (a new release came out at the end of the week) and confirmed that the NPU driver is still up to date
- I reinstalled the oneAPI package and all of its components
Right now, running on the GPU is more stable: yesterday I ran the model without any crashes, but today I tried to run it once and the computer bluescreened again.
My issue with the NPU persists, however. When running a for loop that processes images one by one, there is a high chance that the process will end with the ZE_RESULT_ERROR_DEVICE_LOST error.
Do you have any further suggestions?
I filed an issue on GitHub:
Hi Maciej,
Have you tried reducing the image size to 256 and observing whether inference becomes more stable?
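For example, a minimal sketch using OpenCV (the file name is illustrative):
```
import cv2

img = cv2.imread("input.jpg")      # illustrative input file
img = cv2.resize(img, (256, 256))  # downscale to 256x256 before the usual preprocessing
```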
Regards,
Peh
I tried reducing the image size, and the NPU is much more stable, as is the GPU. What does this mean?
Hi Maciej,
This is because smaller images require less data to process, which reduces the computational load on the NPU as well as the GPU; for example, a 512x512 image has four times as many pixels as a 256x256 one, so roughly four times the work per inference. If you increase the image size, inference will crash more often or fail outright.
Regards,
Peh
Thank you for the explanation,
Now I am curious: how big a model can be run on the NPU?
How does the number of operations cause the NPU to crash?
Why does the NPU only crash some of the time, or after a couple of executions, as opposed to crashing immediately?
Is there a way to get logs of the NPU to see what exactly happens during the execution?
Also, are there any other settings that I could change for compilation that would make the model more stable?
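By settings I mean compile-time properties along these lines; these are generic OpenVINO properties, and I do not know which, if any, affect NPU stability:
```
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder path

# Passing compile-time configuration via the generic properties API
compiled = core.compile_model(
    model,
    "NPU",
    {ov.properties.hint.performance_mode: ov.properties.hint.PerformanceMode.LATENCY},
)
```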
Hi Maciej,
Let me check with the development team and get back to you with the precise information.
Regards,
Peh
Hi Maciej,
Thanks for the updates in the GitHub thread.
Since the GitHub thread has been closed, this thread will no longer be monitored. If you need any additional information from Intel, please submit a new question.
Regards,
Peh
Hello!
Others on the Discord for my project OpenArc (linked in the repo) are also interested in getting the NPU working with OpenVINO. I have converted a model following the documentation, and I have some test code that uses OpenVINO GenAI:
https://huggingface.co/Echo9Zulu/Hermes-3-Llama-3.2-3B-int4-awq-se-ns-NPU-ov
```
import openvino_genai as ov_genai

model_dir = ""  # path to the converted OpenVINO model directory

# Build the generation pipeline directly on the NPU device
pipe = ov_genai.LLMPipeline(
    model_dir,
    device="NPU",
)

generation_config = ov_genai.GenerationConfig(
    max_new_tokens=128
)

prompt = "it's proompt time"
result = pipe.generate([prompt], generation_config=generation_config)

# Report the performance metrics collected during generation
perf_metrics = result.perf_metrics
print(f'Load time: {perf_metrics.get_load_time() / 1000:.2f} s')
print(f'TTFT: {perf_metrics.get_ttft().mean / 1000:.2f} seconds')
print(f'TPOT: {perf_metrics.get_tpot().mean:.2f} ms/token')
print(f'Throughput: {perf_metrics.get_throughput().mean:.2f} tokens/s')
print(f'Generate duration: {perf_metrics.get_generate_duration().mean / 1000:.2f} seconds')
print(f"Result: {result}")
```
I think you may need to add the pipeline_config lines from the docs on cache usage to address the issue you are having (see the sketch after the link below). Note the requirement that stateful models be disabled; this might mean that NPU devices handle KV cache quantization differently, as with multi-GPU, the other use case for disabling stateful models.
https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai/inference-with-genai-on-npu.html#cache-compiled-models
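Roughly like this, adapted from that page (the cache directory name is arbitrary, and I have not verified this exact call against your model):
```
import openvino_genai as ov_genai

model_dir = ""  # path to the converted model directory

# Enable the compiled-blob cache described in the docs above
pipe = ov_genai.LLMPipeline(model_dir, "NPU", CACHE_DIR=".npucache")
```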
Also, converting to ONNX as an intermediate step before converting to OpenVINO IR is generally not necessary, and in this case it might be a source of your issue. There was a PR for this last year (I think). Either way, you can go directly from torch to the IR format; see the sketch below. Hope this helps.
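For example, a direct torch-to-IR conversion looks roughly like this, with a toy module standing in for your model:
```
import torch
import openvino as ov

# A tiny stand-in module; any torch.nn.Module converts the same way
torch_model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())

# Convert directly to OpenVINO IR, skipping the ONNX intermediate step
ov_model = ov.convert_model(torch_model, example_input=torch.randn(1, 3, 224, 224))
ov.save_model(ov_model, "model.xml")  # writes model.xml + model.bin
```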
Thank you! I will check it out!
I think I tried converting the model directly before, but there were some issues; I will try again.
