<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Meteor Lake NPU Token Limitations in AI Tools from Intel</title>
    <link>https://community.intel.com/t5/AI-Tools-from-Intel/Meteor-Lake-NPU-Token-Limitations/m-p/1741276#M1112</link>
    <description>&lt;P&gt;Sorry, I meant Lunar Lake, not Meteor Lake.&lt;/P&gt;</description>
    <pubDate>Wed, 18 Mar 2026 09:10:54 GMT</pubDate>
    <dc:creator>wrangleGrit</dc:creator>
    <dc:date>2026-03-18T09:10:54Z</dc:date>
    <item>
      <title>Meteor Lake NPU Token Limitations</title>
      <link>https://community.intel.com/t5/AI-Tools-from-Intel/Meteor-Lake-NPU-Token-Limitations/m-p/1741274#M1111</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I hope this is the right place to post this question. I am working on a generative AI project that runs on the Intel NPU in the Core Ultra 9 288V CPU.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I am using the latest OpenVINO toolkit, 2026.0, on Python 3.14.3:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;$ pip list | grep -Ei openvino
openvino            2026.0.0     20965
openvino-genai      2026.0.0.0   2050
openvino-telemetry  2025.2.0
openvino-tokenizers 2026.0.0.0
$ python --version
Python 3.14.3&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;BR /&gt;I create the GenAI pipeline with far more context than is supported, just to be sure the defaults are not what limits the output. However, these limits do not seem to be honored.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import openvino_genai
# Assumption: hfsd is an alias for huggingface_hub.snapshot_download
from huggingface_hub import snapshot_download as hfsd

def pipe_config():
    """Configure the OpenVINO GenAI pipeline."""
    model_path = hfsd(
        repo_id="OpenVINO/Qwen2.5-Coder-3B-Instruct-fp16-ov",
        local_files_only=True
    )
    # Use NPU by default
    return openvino_genai.LLMPipeline(model_path, device="NPU", max_length=32768, min_new_tokens=16386, ignore_eos=True)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I estimate token usage as follows:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# Get the tokenizer and a rough token count for the most recent exchange
tokenizer = pipe.get_tokenizer()
last_prompt_tokens = tokenizer.encode(prompt).input_ids.size
last_response_tokens = tokenizer.encode(response).input_ids.size
tokens_used += last_prompt_tokens + last_response_tokens&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;According to this estimate, the response is always cut off at around 1124 tokens used. The truncated output corroborates this, regardless of how I configure &lt;STRONG&gt;&lt;EM&gt;openvino_genai.LLMPipeline&lt;/EM&gt;&lt;/STRONG&gt;.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[≈ 1124 tokens used in recent context]&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;If I switch the device from NPU to GPU and keep the defaults, the context limitation disappears and the full output is generated.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;return openvino_genai.LLMPipeline(model_path, device="GPU")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;All I am asking the model to do is take its own chat.py script and add a helpful startup message. On GPU, this task completes with about 2100 tokens used.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;/load chat.py please add a helpful startup message to this script

### Key Changes:
- **Startup Message**: A welcome message is added at the beginning of the script to guide users on how to interact with the chatbot.
- **Helpful Prompts**: Additional prompts are provided to help users understand how to use the chatbot effectively.
  [≈ 2091 tokens used in recent context]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;BR /&gt;My question is: what am I doing wrong with the NPU?&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2026 08:53:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/AI-Tools-from-Intel/Meteor-Lake-NPU-Token-Limitations/m-p/1741274#M1111</guid>
      <dc:creator>wrangleGrit</dc:creator>
      <dc:date>2026-03-18T08:53:23Z</dc:date>
    </item>
    <item>
      <title>Re: Meteor Lake NPU Token Limitations</title>
      <link>https://community.intel.com/t5/AI-Tools-from-Intel/Meteor-Lake-NPU-Token-Limitations/m-p/1741276#M1112</link>
      <description>&lt;P&gt;Sorry, I meant Lunar Lake, not Meteor Lake.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2026 09:10:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/AI-Tools-from-Intel/Meteor-Lake-NPU-Token-Limitations/m-p/1741276#M1112</guid>
      <dc:creator>wrangleGrit</dc:creator>
      <dc:date>2026-03-18T09:10:54Z</dc:date>
    </item>
  </channel>
</rss>

