Hello Intel Experts!
I am currently testing the chat_sample from `openvino_genai_windows_2025.0.0.0_x86_64` on the NPU, following the instructions at https://github.com/openvinotoolkit/openvino.genai/tree/master?tab=readme-ov-file.
I downloaded the model from:
It works on both CPU and GPU; however, when I tried it on NPU by changing the following line to NPU:
I got the following error:
(base) PS C:\Users\yanny\Documents\Intel\OpenVINO\openvino_cpp_samples_build\intel64\Release> .\chat_sample.exe C:\llama_data\TinyLlama-1.1B-Chat-v1.0-int4-ov\
question:
1+1
Check 'stop_token_ids_it == stop_token_ids.end()' failed at src\cpp\src\generation_config.cpp:200:
'stop_token_ids' must be non-negative, but it contains a value -1
[ERROR] 16:39:41.925 [NPUZeroInitStructsHolder] zeContextDestroy failed 0X78000001
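For reference, the check that fails looks like a simple non-negativity validation on the stop token ids. Here is a small Python sketch of what src\cpp\src\generation_config.cpp:200 appears to assert (this is an illustration, not the actual OpenVINO code):

```python
def validate_stop_token_ids(stop_token_ids):
    """Mimic the GenerationConfig check reported in the error:
    every stop token id must be a non-negative integer."""
    for token_id in stop_token_ids:
        if token_id < 0:
            raise ValueError(
                f"'stop_token_ids' must be non-negative, "
                f"but it contains a value {token_id}"
            )
    return True

validate_stop_token_ids([2])        # passes: 2 is a plausible EOS token id
# validate_stop_token_ids([2, -1])  # raises ValueError, matching the reported error
```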
Here are my computer specs:

| Component | Value |
| --- | --- |
| OS Name | Microsoft Windows 11 Pro |
| Processor | Intel(R) Core(TM) Ultra 7 155H, 3800 MHz, 16 Core(s), 22 Logical Processor(s) |
| CPU RAM | 32 GB |
| GPU | Intel(R) Arc(TM) Graphics |
| GPU RAM | 16 GB |
| NPU | Intel(R) AI Boost |
| oneAPI | 2025.0 |
| OpenVINO | openvino_genai_windows_2025.0.0.0_x86_64 |
If you have any suggestion on what I can try, please let me know. Thank you very much!
Regards,
-yanny
Hi Yanny,
I also encountered the same error when directly using the pre-converted model downloaded from the OpenVINO LLMs collection on Hugging Face.
To run the chat_sample.exe application on NPU, please use Optimum Intel to convert the model:
optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --ratio 1.0 --group-size 128 TinyLlama-1.1B-Chat-v1.0
chat_sample TinyLlama-1.1B-Chat-v1.0
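If you want to sanity-check a converted model folder before running it on NPU, a small script like the following can scan its generation_config.json for negative token ids (the field names here are assumed from the usual Hugging Face generation_config.json layout; adjust if your export differs):

```python
import json
import tempfile
from pathlib import Path

def find_negative_token_ids(model_dir):
    """Return (field, value) pairs for any negative token ids found in
    the model folder's generation_config.json, e.g. the -1 that trips
    the NPU pipeline."""
    cfg = json.loads((Path(model_dir) / "generation_config.json").read_text())
    bad = []
    for key in ("eos_token_id", "pad_token_id", "bos_token_id"):
        value = cfg.get(key)
        ids = value if isinstance(value, list) else [value]
        bad += [(key, i) for i in ids if isinstance(i, int) and i < 0]
    return bad

# Example with a synthetic config written to a temporary folder:
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "generation_config.json").write_text(
        json.dumps({"eos_token_id": [2, -1], "pad_token_id": 0})
    )
    print(find_negative_token_ids(d))  # [('eos_token_id', -1)]
```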
Regards,
Peh
Hi Peh,
Thank you so much for your help. I downloaded and converted the model using your command, and it works on NPU now.
FYI, I also tested the command from https://github.com/openvinotoolkit/openvino.genai/tree/master?tab=readme-ov-file:
optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --weight-format int4 --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"
That results in the following error:
Exception from src\inference\src\cpp\infer_request.cpp:223:
Exception from src\plugins\intel_npu\src\plugin\npuw\just_sync_infer_request.cpp:659:
Failed to compile. No more devices are left!
[ERROR] 10:37:37.876 [NPUZeroInitStructsHolder] zeContextDestroy failed 0X78000001
I would like to recommend adding your command to the README for knowledge sharing.
Once again, thank you very much!
Regards,
-yanny
