Hello,
I want to run large LLMs across multiple Intel GPUs.
Is there a way to split a large model and run it on multiple GPUs?
Ref: https://huggingface.co/docs/transformers/v4.13.0/parallelism
Thank you
Hi jic5760,
Thank you for reaching out.
To distribute inference across multiple GPUs, you can use the heterogeneous (HETERO) plugin in OpenVINO, which lets a single model's execution be split across multiple inference devices (e.g., CPU, GPU, NPU). You can also refer to OpenVINO's pipeline-parallelism documentation for multi-device execution; a sketch of the HETERO approach is shown below.
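As a minimal sketch of the HETERO approach above (assuming the `openvino` Python package is installed, that two Intel GPUs enumerate as `GPU.0` and `GPU.1` on your system, and that `llm_model.xml` is a hypothetical placeholder path to a model already converted to OpenVINO IR):

```python
import openvino as ov

core = ov.Core()

# List the devices OpenVINO can see on this machine,
# e.g. ['CPU', 'GPU.0', 'GPU.1']; names vary by system.
print(core.available_devices)

# "llm_model.xml" is a placeholder for a model already
# converted to OpenVINO IR format.
model = core.read_model("llm_model.xml")

# The HETERO plugin splits one graph across the listed devices in
# priority order: each operation runs on the first listed device
# that supports it, falling back to the next device otherwise.
compiled = core.compile_model(model, "HETERO:GPU.0,GPU.1")

# Standard inference entry point; input shapes depend on the model.
infer_request = compiled.create_infer_request()
```

Note that HETERO assigns operations by device priority rather than balancing memory evenly; recent OpenVINO releases also expose a pipeline-parallel model distribution hint for splitting very large models, for which the OpenVINO pipeline-parallelism documentation gives the exact configuration.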
Regards,
Zul
This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question.
