Hello Intel Experts!
I am currently testing the chat_sample from `openvino_genai_windows_2025.0.0.0_x86_64` on the NPU, following the instructions at https://github.com/openvinotoolkit/openvino.genai/tree/master?tab=readme-ov-file.
I downloaded the model from:
It works on both CPU and GPU; however, when I tried it on NPU by changing the following line to NPU:
I got the following error:
(base) PS C:\Users\yanny\Documents\Intel\OpenVINO\openvino_cpp_samples_build\intel64\Release> .\chat_sample.exe C:\llama_data\TinyLlama-1.1B-Chat-v1.0-int4-ov\
question:
1+1
Check 'stop_token_ids_it == stop_token_ids.end()' failed at src\cpp\src\generation_config.cpp:200:
'stop_token_ids' must be non-negative, but it contains a value -1
[ERROR] 16:39:41.925 [NPUZeroInitStructsHolder] zeContextDestroy failed 0X78000001
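For reference, the check that fails looks like a simple non-negativity validation on the stop token ids. Here is a small Python sketch of what src\cpp\src\generation_config.cpp:200 appears to assert (this is an illustration, not the actual OpenVINO code):

```python
def validate_stop_token_ids(stop_token_ids):
    """Mimic the GenerationConfig check reported in the error:
    every stop token id must be a non-negative integer."""
    for token_id in stop_token_ids:
        if token_id < 0:
            raise ValueError(
                f"'stop_token_ids' must be non-negative, "
                f"but it contains a value {token_id}"
            )
    return True

validate_stop_token_ids([2])        # passes: 2 is a plausible EOS token id
# validate_stop_token_ids([2, -1])  # raises ValueError, matching the reported error
```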
Here are my computer specs:

| Component | Value |
| --- | --- |
| OS Name | Microsoft Windows 11 Pro |
| Processor | Intel(R) Core(TM) Ultra 7 155H, 3800 MHz, 16 Core(s), 22 Logical Processor(s) |
| CPU RAM | 32 GB |
| GPU | Intel(R) Arc(TM) Graphics |
| GPU RAM | 16 GB |
| NPU | Intel(R) AI Boost |
| oneAPI | 2025.0 |
| OpenVINO | openvino_genai_windows_2025.0.0.0_x86_64 |
If you have any suggestion on what I can try, please let me know. Thank you very much!
Regards,
-yanny
Hi Yanny,
I also encountered the same error when directly using the pre-converted model downloaded from the OpenVINO LLMs collection on Hugging Face.
To run the chat_sample.exe application on NPU, please use Optimum Intel to convert the model:
optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group-size 128 TinyLlama-1.1B-Chat-v1.0
chat_sample TinyLlama-1.1B-Chat-v1.0
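If you want to sanity-check a converted model folder before running it on NPU, a small script like the following can scan its generation_config.json for negative token ids (the field names here are assumed from the usual Hugging Face generation_config.json layout; adjust if your export differs):

```python
import json
import tempfile
from pathlib import Path

def find_negative_token_ids(model_dir):
    """Return (field, value) pairs for any negative token ids found in
    the model folder's generation_config.json, e.g. the -1 that trips
    the NPU pipeline."""
    cfg = json.loads((Path(model_dir) / "generation_config.json").read_text())
    bad = []
    for key in ("eos_token_id", "pad_token_id", "bos_token_id"):
        value = cfg.get(key)
        ids = value if isinstance(value, list) else [value]
        bad += [(key, i) for i in ids if isinstance(i, int) and i < 0]
    return bad

# Example with a synthetic config written to a temporary folder:
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "generation_config.json").write_text(
        json.dumps({"eos_token_id": [2, -1], "pad_token_id": 0})
    )
    print(find_negative_token_ids(d))  # [('eos_token_id', -1)]
```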
Regards,
Peh
Hi Peh,
Thank you so much for your help. I downloaded and converted the model using your command, and it works on NPU now.
FYI, I also tested the command from https://github.com/openvinotoolkit/openvino.genai/tree/master?tab=readme-ov-file:
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format int4 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"
That results in the following error:
Exception from src\inference\src\cpp\infer_request.cpp:223:
Exception from src\plugins\intel_npu\src\plugin\npuw\just_sync_infer_request.cpp:659:
Failed to compile. No more devices are left!
[ERROR] 10:37:37.876 [NPUZeroInitStructsHolder] zeContextDestroy failed 0X78000001
I would like to recommend adding your command to the README for knowledge sharing.
Once again, thank you very much!
Regards,
-yanny
