LLMPipeline Support for Non-Optimum Models

pranav_k · 02-10-2026

I was attempting to run LLM decoding on the NPU for models that aren't on the Hugging Face Hub, specifically LLMs that have been structurally pruned. I can't find much support for loading them into LLMPipeline, and compiling them through Optimum is difficult when they don't have uniform widths.

I was wondering whether any forks or fixes already attempt to deal with this and, if so, what stopped them from being merged into main.

Wan_Intel · 02-10-2026

Hi pranav_k,

Thank you for reaching out to us.

For your information, models with architectures similar to those listed under Supported Models in OpenVINO™ GenAI may also work, even if they have not been explicitly validated. Please consider testing any unlisted model to verify compatibility with your specific use case.
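As a rough illustration of such a test, the sketch below tries to load a model directory into LLMPipeline and generate a few tokens, reporting any failure instead of raising. It assumes the model has already been converted to OpenVINO IR along with its tokenizer files; the helper name `smoke_test` and the directory `./pruned_model_ov` in the usage note are hypothetical, and this is not an officially validated procedure.

```python
from pathlib import Path


def smoke_test(model_dir, device="NPU", prompt="Hello", max_new_tokens=16):
    """Try to load and run a converted model directory; return (ok, detail).

    smoke_test is a hypothetical helper, not part of OpenVINO GenAI.
    """
    path = Path(model_dir)
    if not path.is_dir():
        return False, f"model directory not found: {path}"
    try:
        # Imported lazily so the directory check above works without the package.
        import openvino_genai as ov_genai

        pipe = ov_genai.LLMPipeline(str(path), device)
        result = pipe.generate(prompt, max_new_tokens=max_new_tokens)
    except Exception as exc:
        # Import, load, compile, and generation failures all end up here.
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(result)
```

Calling `smoke_test("./pruned_model_ov", device="CPU")` before trying `device="NPU"` may help separate a general model incompatibility from an NPU-specific one.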

For more information, please refer to the OpenVINO™ GenAI GitHub repository.

Best regards,

Wan

Wan_Intel · 02-22-2026

Hi pranav_k,

If you need any additional information, please submit a new question, as this thread will no longer be monitored.

Regards,

Wan_Intel