Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all things computer-vision related on Intel® platforms.

OpenVINO GenAI chat_sample on NPU

yanny
Novice

Hello Intel Experts!

I am currently testing the chat_sample from `openvino_genai_windows_2025.0.0.0_x86_64` on the NPU, following the instructions from https://github.com/openvinotoolkit/openvino.genai/tree/master?tab=readme-ov-file.

I downloaded the model from:

 

It works on both CPU and GPU; however, when I tried the NPU by changing the following line to NPU:

 

I got the following error:

 

(base) PS C:\Users\yanny\Documents\Intel\OpenVINO\openvino_cpp_samples_build\intel64\Release> .\chat_sample.exe C:\llama_data\TinyLlama-1.1B-Chat-v1.0-int4-ov\
question:
1+1
Check 'stop_token_ids_it == stop_token_ids.end()' failed at src\cpp\src\generation_config.cpp:200:
'stop_token_ids' must be non-negative, but it contains a value -1

[ERROR] 16:39:41.925 [NPUZeroInitStructsHolder] zeContextDestroy failed 0X78000001
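The first check failure points at the model's generation config, which apparently ships a token id of -1. As a quick sanity check, a small helper like the one below can scan a Hugging Face-style generation_config.json for negative token ids (this is my own sketch; `find_negative_stop_ids` is a hypothetical name, and I'm assuming the downloaded model folder contains such a JSON file):

```python
import json
from pathlib import Path

def find_negative_stop_ids(config: dict) -> dict:
    """Return any token-id fields in a generation config that hold a negative value."""
    bad = {}
    for key in ("bos_token_id", "eos_token_id", "pad_token_id", "stop_token_ids"):
        value = config.get(key)
        ids = value if isinstance(value, list) else [value]
        negatives = [v for v in ids if isinstance(v, int) and v < 0]
        if negatives:
            bad[key] = negatives
    return bad

# To check a downloaded model folder, something like:
# config = json.loads(Path(r"C:\llama_data\TinyLlama-1.1B-Chat-v1.0-int4-ov"
#                          r"\generation_config.json").read_text())
# print(find_negative_stop_ids(config))

# Illustrative example of a config with a negative id:
sample = {"eos_token_id": 2, "pad_token_id": -1, "max_length": 2048}
print(find_negative_stop_ids(sample))  # {'pad_token_id': [-1]}
```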

 

 
Here is my computer specs:

OS Name: Microsoft Windows 11 Pro
Processor: Intel(R) Core(TM) Ultra 7 155H, 3800 MHz, 16 Core(s), 22 Logical Processor(s)
CPU RAM: 32 GB
GPU: Intel(R) Arc(TM) Graphics
GPU RAM: 16 GB
NPU: Intel(R) AI Boost

 

oneAPI: 2025.0
OpenVINO: openvino_genai_windows_2025.0.0.0_x86_64


If you have any suggestions on what I can try, please let me know. Thank you very much!

Regards,

-yanny

1 Solution
Peh_Intel
Moderator

Hi Yanny,


I also encountered the same error when I directly used the pre-converted model downloaded from the OpenVINO LLMs collection on Hugging Face.


To run the chat_sample.exe application on the NPU, please use Optimum Intel to convert the model:


optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group-size 128 TinyLlama-1.1B-Chat-v1.0

 

chat_sample TinyLlama-1.1B-Chat-v1.0
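For anyone scripting this, the NPU-friendly export flags from the command above (symmetric INT4 weights, 100% quantization ratio, group size 128) can be captured in a small wrapper. This is just a convenience sketch; `npu_export_command` is a hypothetical name, and the flags are exactly those from the command in this reply:

```python
def npu_export_command(model_id: str, output_dir: str) -> list[str]:
    """Assemble the optimum-cli export invocation used for the NPU in this thread."""
    return [
        "optimum-cli", "export", "openvino",
        "-m", model_id,
        "--weight-format", "int4",
        "--sym",             # symmetric quantization
        "--ratio", "1.0",    # quantize 100% of the weights to int4
        "--group-size", "128",
        output_dir,
    ]

# Reproduces the command line from this reply:
print(" ".join(npu_export_command("TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                                  "TinyLlama-1.1B-Chat-v1.0")))
```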

 

 

Regards,

Peh



2 Replies

yanny
Novice

Hi Peh,

Thank you so much for your help. I downloaded and converted the model with your command, and it works on the NPU now.

FYI, I also tested the command from https://github.com/openvinotoolkit/openvino.genai/tree/master?tab=readme-ov-file

optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format int4 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"

That command (which omits the --sym, --ratio, and --group-size options) results in the following error:

Exception from src\inference\src\cpp\infer_request.cpp:223:
Exception from src\plugins\intel_npu\src\plugin\npuw\just_sync_infer_request.cpp:659:
Failed to compile. No more devices are left!


[ERROR] 10:37:37.876 [NPUZeroInitStructsHolder] zeContextDestroy failed 0X78000001

I would recommend adding your command to the README for knowledge sharing.

Once again, thank you very much!

Regards,

-yanny
