Hello,
I want to run large LLMs on multiple Intel GPUs.
Is there a way to split a large model and run it on multiple GPUs?
Ref: https://huggingface.co/docs/transformers/v4.13.0/parallelism
Thank you
Hi jic5760,
Thank you for reaching out.
To distribute inference across multiple GPUs, you can use OpenVINO's heterogeneous (HETERO) plugin, which lets a single model execute across several inference devices (e.g., CPU, GPU, NPU) simultaneously. For splitting one large model across devices, refer to the pipeline-parallelism section of OpenVINO's heterogeneous execution documentation.
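As a minimal sketch of what this looks like in the OpenVINO Python API (assumptions: a model already converted to OpenVINO IR and saved as model.xml, and two Intel GPUs enumerated as GPU.0 and GPU.1; device names vary per system, so check core.available_devices first):

```python
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU.0', 'GPU.1'] (varies per system)

# "model.xml" is a placeholder for a model already converted to OpenVINO IR.
model = core.read_model("model.xml")

# HETERO assigns each operation to the first device in the priority list
# that supports it, falling back to the next device otherwise.
compiled = core.compile_model(model, "HETERO:GPU.0,GPU.1")

# Assumption: in recent OpenVINO releases (2024.2+), HETERO can also split
# one large model across the listed devices via pipeline parallelism;
# check the heterogeneous-execution docs for your version before relying on it:
# compiled = core.compile_model(
#     model,
#     "HETERO:GPU.0,GPU.1",
#     {"MODEL_DISTRIBUTION_POLICY": "PIPELINE_PARALLEL"},
# )

infer_request = compiled.create_infer_request()
```

You can also try device combinations quickly with the bundled benchmark tool, e.g. benchmark_app -m model.xml -d HETERO:GPU.0,GPU.1, to see how the split performs on your hardware.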
Regards,
Zul
