I was attempting to run LLM decoding on the NPU for models that aren't available on the Hugging Face Hub, specifically LLMs that have been structurally pruned. I can't find much support for piping them into the LLMPipeline, and compiling them through Optimum is difficult when they don't have uniform widths.
I was wondering whether any forks or fixes already exist that attempt to deal with this and, if so, what stopped them from being incorporated into main.
Hi pranav_k,
Thank you for reaching out to us.
For your information, models with architectures similar to those listed under Supported Models in OpenVINO™ GenAI may also work, even if they have not been explicitly validated. Please consider testing any unlisted model to verify compatibility with your specific use case.
For more information, please refer to the OpenVINO™ GenAI GitHub repository.
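As a rough illustration of what testing an unlisted model could look like, here is a minimal smoke-test sketch in Python. It assumes the pruned model has already been exported to OpenVINO IR in a local directory, and that the directory uses the default file names produced by an `optimum-cli export openvino` run (the `EXPECTED_FILES` list is that assumption, not something confirmed in this thread). The helper names `missing_ir_files` and `smoke_test` are hypothetical. A successful run only shows that the model loads and generates on the chosen device; it does not validate output quality.

```python
from pathlib import Path

# Assumption: default file names of an `optimum-cli export openvino` output directory.
EXPECTED_FILES = (
    "openvino_model.xml",
    "openvino_model.bin",
    "openvino_tokenizer.xml",
    "openvino_detokenizer.xml",
)


def missing_ir_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def smoke_test(model_dir, device="NPU", prompt="The capital of France is"):
    """Try one short generation; return (ok, generated_text_or_error_message)."""
    missing = missing_ir_files(model_dir)
    if missing:
        return False, "missing files: " + ", ".join(missing)
    try:
        # Imported lazily so the file check above works without OpenVINO installed.
        import openvino_genai as ov_genai

        # Loading the pipeline compiles the model for the target device.
        pipe = ov_genai.LLMPipeline(str(model_dir), device)
        return True, str(pipe.generate(prompt, max_new_tokens=16))
    except Exception as exc:  # load, compilation, or runtime failure
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `smoke_test("path/to/exported_model")` and then repeating the call with `device="CPU"` may help separate NPU-specific compilation failures from problems with the exported model itself.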
Best regards,
Wan
Hi pranav_k,
If you need any additional information, please submit a new question as this thread will no longer be monitored.
Regards,
Wan_Intel