Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Ultra 9 288V NPU error when running a model

maciej_procero
Beginner

Hello,

I am trying to run a model on the NPU by converting the model to ONNX and then compiling it with OpenVINO in Python. I was able to compile it for the CPU, GPU, and NPU; however, when running it on the NPU, after processing a few prompts I get the following error:

Exception from src\plugins\intel_npu\src\utils\src\zero\zero_wrappers.cpp:159:
L0 zeCommandQueueExecuteCommandLists result: ZE_RESULT_ERROR_DEVICE_LOST, code 0x70000001 - device hung, reset, was removed, or driver update occurred
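
For reference, this is roughly the flow I am using (the model path and input shape below are illustrative):

```
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.onnx")        # the exported ONNX model
compiled = core.compile_model(model, "NPU")  # "CPU" and "GPU" also work

# Dummy input standing in for a real preprocessed prompt/image
# (the [1, 3, 224, 224] shape is illustrative)
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled(data)
```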

 

I am using: 

OpenVINO 2025.0.0

NPU driver 32.0.100.3717

Intel Arc driver 32.0.101.6314

on 

Core Ultra 9 288V - devkit/box(?)

 

I had to downgrade the graphics drivers because, after the update, running models on the NPU would cause Windows to either bluescreen or freeze permanently.

What could be causing the issue?

Peh_Intel
Moderator

Hi maciej_procero,


Please try upgrading the NPU driver to the latest version, 32.0.200.3717.


Regarding the graphics driver, you can try the latest version of the WHQL Certified Graphics Driver, 32.0.101.6559.



Regards,

Peh


maciej_procero
Beginner

Thank you for the speedy reply, Peh.

I already installed the latest version of the NPU driver, though note that there is a version discrepancy on the page you provided: the latest version is listed as 32.0.200.3717, but all the text, as well as the file you end up downloading, says 32.0.100.3717.

I did notice that the release notes (NPU Release_Notes_v3717.pdf) mention that the driver is compatible with OpenVINO 2024.5. Does that mean I should be using that version of OpenVINO?

Thank you for the graphics driver suggestion; I will give that a shot today.

Thank you,

Maciej

Peh_Intel
Moderator

Hi Maciej,


How did the graphics driver update go?


The latest NPU driver should be compatible with OpenVINO 2025.0 as well. I was able to run the OpenVINO 2025.0 Benchmark App on the NPU with a dynamic model.



Regards,

Peh


maciej_procero
Beginner

Hello Peh, 

I have some updates:

  1. I updated the graphics drivers (a new version came out at the end of the week) and confirmed that the NPU driver is still up to date.
  2. I reinstalled the oneAPI package and all of its components.

Right now, running on the GPU is more stable. Yesterday I was running the model without any crashes; today I tried to run it once and the computer bluescreened again.

My issue with the NPU persists, however. When running a for loop in which I process images one by one, there is a high chance that the process will end with the ZE_RESULT_ERROR_DEVICE_LOST error.
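
A minimal sketch of that loop (the names and shapes are illustrative; the model is compiled as in my first post):

```
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model(core.read_model("model.onnx"), "NPU")
infer_request = compiled.create_infer_request()

# Stand-in for my preprocessed images (shape is illustrative)
images = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(100)]

for image in images:
    infer_request.infer({0: image})  # intermittently fails here with DEVICE_LOST
    output = infer_request.get_output_tensor(0).data
```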

 

Do you have any further suggestions?

I filed an issue on github:

[Bug]: ZE_RESULT_ERROR_DEVICE_LOST when running inference on the NPU · Issue #29386 · openvinotoolkit/openvino

Peh_Intel
Moderator

Hi Maciej,


Have you tried reducing the image size to 256 and observing whether inference becomes more stable?



Regards,

Peh


maciej_procero
Beginner

I tried reducing the image size, and the NPU is much more stable, as is the GPU. What does this mean?

Peh_Intel
Moderator

Hi Maciej,


This is because smaller images require less data to process, which reduces the computational load on the NPU as well as the GPU. If you increase the image size, inference will crash more often or fail entirely.
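
If your model accepts a dynamic input shape, you can also pin it to the smaller size before compiling, so the compiled model is planned for that size. A minimal sketch for a single-input model (the [1, 3, 256, 256] NCHW layout is an assumption; adjust it to your model):

```
import openvino as ov

core = ov.Core()
model = core.read_model("model.onnx")

# Fix the input to a static 256x256 shape before compiling
# (assumed NCHW layout; adjust to your model)
model.reshape([1, 3, 256, 256])
compiled = core.compile_model(model, "NPU")
```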



Regards,

Peh


maciej_procero
Beginner

Thank you for the explanation,

Now I am curious to know how big a model can be run on the NPU?
How does the number of operations cause the NPU to crash?
Why does the NPU only crash some of the time, or after a couple of executions (as opposed to crashing immediately)?
Is there a way to get logs from the NPU to see what exactly happens during execution?

Also, are there any other compilation settings I could change that would make the model more stable?

Peh_Intel
Moderator

Hi Maciej,


Let me check with the development team and get back to you with the precise information.



Regards,

Peh


Peh_Intel
Moderator

Hi Maciej,


Thanks for the updates in the GitHub thread.


Since the GitHub thread has been closed, this thread will no longer be monitored. If you need any additional information from Intel, please submit a new question.



Regards,

Peh


Echo9Zulu
Beginner

Hello!

My project OpenArc has others on our Discord (linked in the repo) who are also interested in getting the NPU working with OpenVINO. I have converted a model following the documentation and have some test code that uses OpenVINO GenAI.

https://huggingface.co/Echo9Zulu/Hermes-3-Llama-3.2-3B-int4-awq-se-ns-NPU-ov

 

 

```
import openvino_genai as ov_genai

model_dir = ""  # path to the converted model directory

pipe = ov_genai.LLMPipeline(
    model_dir,
    device="NPU",
)

generation_config = ov_genai.GenerationConfig(
    max_new_tokens=128
)

prompt = "it's proompt time"

result = pipe.generate([prompt], generation_config=generation_config)
perf_metrics = result.perf_metrics

# Report the load/latency/throughput metrics collected by the pipeline
print(f'Load time: {perf_metrics.get_load_time() / 1000:.2f} s')
print(f'TTFT: {perf_metrics.get_ttft().mean / 1000:.2f} seconds')
print(f'TPOT: {perf_metrics.get_tpot().mean:.2f} ms/token')
print(f'Throughput: {perf_metrics.get_throughput().mean:.2f} tokens/s')
print(f'Generate duration: {perf_metrics.get_generate_duration().mean / 1000:.2f} seconds')

print(f"Result: {result}")
```

 

I think you may need to add the pipeline_config lines from the docs on cache usage to address the issue you are having. Note the requirement to disable stateful models; this might mean that NPU devices handle KV cache quantization differently, as with multi-GPU, the other use case for disabling stateful.

https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai/inference-with-genai-on-npu.html#cache-compiled-models
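
Based on that page, it would look roughly like this (the cache directory name is just an example):

```
import openvino_genai as ov_genai

model_dir = ""  # path to the converted model directory

# Cache the compiled blob so later loads can skip NPU compilation;
# ".npucache" is only an example directory name.
pipeline_config = {"CACHE_DIR": ".npucache"}
pipe = ov_genai.LLMPipeline(model_dir, "NPU", **pipeline_config)
```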

Also, converting to ONNX as an intermediate step before converting to OpenVINO IR is generally not necessary, and in this case it might be a source of your issue. There was a PR for this last year (I think). Either way, you can go directly from torch to the IR format. Hope this helps.
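
A minimal sketch of the direct torch-to-IR route (the tiny module and shapes are placeholders):

```
import torch
import openvino as ov

# Placeholder module just to illustrate; use your own model in eval mode
torch_model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3)).eval()

ov_model = ov.convert_model(torch_model, example_input=torch.randn(1, 3, 256, 256))
ov.save_model(ov_model, "model.xml")  # writes model.xml + model.bin (OpenVINO IR)
```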

maciej_procero
Beginner

Thank you! I will check it out! 

I think I tried converting the model directly before, but there were some issues; I will try again.
