Re: Meteor Lake NPU Token Limitations

wrangleGrit · 03-18-2026

Hi All,

Hope this is the right place to post this question. I am working on a generative AI project that runs on the Intel NPU in the Core Ultra 9 288V CPU.

I am using the latest OpenVINO toolkit, 2026.0, on Python 3.14.3:

$ pip list | grep -Ei openvino
openvino            2026.0.0     20965
openvino-genai      2026.0.0.0   2050
openvino-telemetry  2025.2.0
openvino-tokenizers 2026.0.0.0
$ python --version
Python 3.14.3

I create the GenAI pipeline with far more context than is supported, just to be sure the defaults are not what limits the output. However, these settings do not seem to be honored:

def pipe_config():
    """Configure the OpenVINO GenAI pipeline."""
    model_path = hfsd(
        repo_id="OpenVINO/Qwen2.5-Coder-3B-Instruct-fp16-ov",
        local_files_only=True
    )
    # Use NPU by default
    return openvino_genai.LLMPipeline(model_path, device="NPU", max_length=32768, min_new_tokens=16386, ignore_eos=True)
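A possible variation to try, sketched under assumptions that are not confirmed in this thread: as far as I understand the OpenVINO GenAI NPU documentation, the NPU pipeline is compiled for a fixed prompt and response size controlled by the `MAX_PROMPT_LEN` and `MIN_RESPONSE_LEN` properties, while `max_length`, `min_new_tokens`, and `ignore_eos` are generation options normally passed to `generate()`. The helper names `split_settings` and `run_on_npu` and the numeric values are hypothetical, and the values the NPU actually accepts should be checked against the documentation.

```python
def split_settings(max_prompt_len: int, min_response_len: int, max_new_tokens: int):
    """Separate compile-time NPU properties from per-call generation options."""
    # Assumed NPU pipeline properties; verify the names and the supported
    # ranges against the OpenVINO GenAI NPU documentation.
    pipeline_props = {
        "MAX_PROMPT_LEN": max_prompt_len,
        "MIN_RESPONSE_LEN": min_response_len,
    }
    # Generation options go to generate(), not to the pipeline constructor.
    generation_opts = {"max_new_tokens": max_new_tokens}
    return pipeline_props, generation_opts


def run_on_npu(model_path: str, prompt: str, pipeline_props: dict, generation_opts: dict) -> str:
    """Build the pipeline and generate once. Requires openvino_genai and an NPU."""
    import openvino_genai  # imported lazily so split_settings() runs anywhere

    pipe = openvino_genai.LLMPipeline(model_path, "NPU", **pipeline_props)
    return str(pipe.generate(prompt, **generation_opts))


# Illustrative values only, not known-good limits for this device.
props, opts = split_settings(max_prompt_len=2048, min_response_len=1024, max_new_tokens=1024)
```

If the cutoff moves when these two properties change, that would suggest the limit comes from the compiled NPU shapes and not from the generation settings.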

I estimate token usage as follows:

# Attempt to get tokenizer and rough token count of recent history
tokenizer = pipe.get_tokenizer()
last_prompt_tokens = tokenizer.encode(prompt).input_ids.size
last_response_tokens = tokenizer.encode(response).input_ids.size
tokens_used += last_prompt_tokens + last_response_tokens
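For anyone who wants to reproduce the estimate without the full chat script, the bookkeeping above amounts to roughly the following self-contained sketch. `TokenUsage` and `WhitespaceTokenizer` are hypothetical names introduced for illustration; the whitespace tokenizer is only a stand-in so the snippet runs without OpenVINO, and its counts will not match the real tokenizer.

```python
from dataclasses import dataclass, field


class WhitespaceTokenizer:
    """Hypothetical stand-in: one token per whitespace-separated word."""

    def count(self, text: str) -> int:
        return len(text.split())


@dataclass
class TokenUsage:
    """Running, approximate token count across chat turns."""

    total: int = 0
    turns: list = field(default_factory=list)

    def add_turn(self, count_tokens, prompt: str, response: str) -> int:
        # Count prompt and response separately, then accumulate.
        used = count_tokens(prompt) + count_tokens(response)
        self.turns.append(used)
        self.total += used
        return used


usage = TokenUsage()
usage.add_turn(WhitespaceTokenizer().count, "add a startup message", "done, see below")
print(f"[≈ {usage.total} tokens used in recent context]")
```

With the real pipeline, the counting function would be something like `lambda text: tokenizer.encode(text).input_ids.size`, matching the snippet above.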

By this estimate, the response is always limited to around 1124 tokens, and the output is visibly cut off, regardless of how I configure openvino_genai.LLMPipeline:

[≈ 1124 tokens used in recent context]

If I switch the device from NPU to GPU and take the defaults, the limit disappears and the full output is generated:

return openvino_genai.LLMPipeline(model_path, device="GPU")

All I'm asking the model to do is take its own chat.py script and add a helpful startup message. On GPU, this task completes with about 2100 tokens used.

/load chat.py please add a helpful startup message to this script

### Key Changes:
- **Startup Message**: A welcome message is added at the beginning of the script to guide users on how to interact with the chatbot.
- **Helpful Prompts**: Additional prompts are provided to help users understand how to use the chatbot effectively.
  [≈ 2091 tokens used in recent context]

My question is, what am I doing incorrectly with the NPU?

wrangleGrit · 03-18-2026

Sorry, I meant Lunar Lake, not Meteor Lake.