Meteor Lake NPU Token Limitations

wrangleGrit — Wed, 18 Mar 2026 08:53:23 GMT

Hi All,

Hope this is the right place to post this question. I am working on a generative AI project that runs on the Intel NPU in the Core Ultra 9 288V CPU.

I am using the latest OpenVINO toolkit, 2026.0, on Python 3.14.3:

$ pip list | grep -Ei openvino
openvino 2026.0.0 20965
openvino-genai 2026.0.0.0 2050
openvino-telemetry 2025.2.0
openvino-tokenizers 2026.0.0.0
$ python --version
Python 3.14.3

I create the GenAI pipeline with far more context than is supported, just to be sure the defaults are not what limits the output. However, these settings do not seem to be honored.

def pipe_config():
    """Configure the OpenVINO GenAI pipeline."""
    model_path = hfsd(
        repo_id="OpenVINO/Qwen2.5-Coder-3B-Instruct-fp16-ov", local_files_only=True
    )
    # Use NPU by default
    return openvino_genai.LLMPipeline(
        model_path,
        device="NPU",
        max_length=32768,
        min_new_tokens=16386,
        ignore_eos=True,
    )

I estimate token usage as follows:

# Attempt to get tokenizer and rough token count of recent history
tokenizer = pipe.get_tokenizer()
last_prompt_tokens = tokenizer.encode(prompt).input_ids.size
last_response_tokens = tokenizer.encode(response).input_ids.size
tokens_used += last_prompt_tokens + last_response_tokens
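For anyone who wants to check the accounting without loading the model, here is a minimal, self-contained sketch of the same running total. The names WhitespaceTokenizer and TokenUsage are hypothetical and not part of OpenVINO GenAI, and splitting on whitespace is only a stand-in, so the counts will not match what the real tokenizer reports.

```python
from dataclasses import dataclass


class WhitespaceTokenizer:
    """Hypothetical stand-in for pipe.get_tokenizer(); splits on whitespace."""

    def encode(self, text: str) -> list[str]:
        return text.split()


@dataclass
class TokenUsage:
    """Running token estimate across chat turns (hypothetical helper)."""

    tokens_used: int = 0

    def record_turn(self, tokenizer, prompt: str, response: str) -> int:
        """Add one prompt/response pair and return the tokens for that turn."""
        turn_tokens = len(tokenizer.encode(prompt)) + len(tokenizer.encode(response))
        self.tokens_used += turn_tokens
        return turn_tokens


usage = TokenUsage()
usage.record_turn(
    WhitespaceTokenizer(),
    "add a startup message",
    "Done, see the updated script",
)
print(f"[approx. {usage.tokens_used} tokens used in recent context]")
```

With the real pipeline, the stand-in would be replaced by the tokenizer from pipe.get_tokenizer() and the length taken from input_ids.size, as in my snippet above.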

By this estimate, usage always tops out at around 1124 tokens. The output is visibly cut off at that point, regardless of how I configure openvino_genai.LLMPipeline.

[≈ 1124 tokens used in recent context]

If I switch devices from NPU to GPU while taking the defaults, the context limitations disappear and full output is generated.

return openvino_genai.LLMPipeline(model_path, device="GPU")

All I'm asking the model to do is take its own chat.py script and add a helpful startup message. On GPU, this task completes with around 2100 tokens used.

/load chat.py please add a helpful startup message to this script

### Key Changes:
- **Startup Message**: A welcome message is added at the beginning of the script to guide users on how to interact with the chatbot.
- **Helpful Prompts**: Additional prompts are provided to help users understand how to use the chatbot effectively.

[≈ 2091 tokens used in recent context]

My question is, what am I doing incorrectly with the NPU?

Re: Meteor Lake NPU Token Limitations

wrangleGrit — Wed, 18 Mar 2026 09:10:54 GMT

Sorry, I meant Lunar Lake.
