Hello,
I want to run large LLMs on multiple Intel GPUs.
Is there a way to split a large model and run it on multiple GPUs?
Ref: https://huggingface.co/docs/transformers/v4.13.0/parallelism
Thank you
Hi jic5760,
Thank you for reaching out.
To distribute inference across multiple GPUs, you can use OpenVINO's heterogeneous (HETERO) plugin, which lets a single model execute across several inference devices (e.g., CPU, GPU, NPU) simultaneously. For splitting one large model across devices, refer to the pipeline-parallelism section of OpenVINO's heterogeneous execution documentation.
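As a minimal sketch of what this looks like in the OpenVINO Python API (assumptions: a model already converted to OpenVINO IR and saved as model.xml, and two Intel GPUs enumerated as GPU.0 and GPU.1; device names vary per system, so check core.available_devices first):

```python
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU.0', 'GPU.1'] (varies per system)

# "model.xml" is a placeholder for a model already converted to OpenVINO IR.
model = core.read_model("model.xml")

# HETERO assigns each operation to the first device in the priority list
# that supports it, falling back to the next device otherwise.
compiled = core.compile_model(model, "HETERO:GPU.0,GPU.1")

# Assumption: in recent OpenVINO releases (2024.2+), HETERO can also split
# one large model across the listed devices via pipeline parallelism;
# check the heterogeneous-execution docs for your version before relying on it:
# compiled = core.compile_model(
#     model,
#     "HETERO:GPU.0,GPU.1",
#     {"MODEL_DISTRIBUTION_POLICY": "PIPELINE_PARALLEL"},
# )

infer_request = compiled.create_infer_request()
```

You can also try device combinations quickly with the bundled benchmark tool, e.g. benchmark_app -m model.xml -d HETERO:GPU.0,GPU.1, to see how the split performs on your hardware.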
Regards,
Zul
