Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.
6573 Discussions

OpenVino GenAI LLM benchmark not able to use multiple B60 GPU for inference

J-N-ch
Novice
6,263 Views

Hi, when I try to run a benchmark of OpenVINO GenAI with multiple Intel Arc B60 GPUs, I get the following RuntimeError:

(B60_LLM_env) PS C:\Users\andre\B60_LLM_env\openvino.genai\tools\llm_bench> python ./benchmark.py -m models/Llama-3.2-3B-Instruct/openvino -n 3 -d "MULTI:GPU.2,GPU.1"
Multiple distributions found for package optimum. Picked distribution: optimum
[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: llama
[ INFO ] OV Config={'CACHE_DIR': '', 'ATTENTION_BACKEND': 'PA'}
[ INFO ] Model path=models\Llama-3.2-3B-Instruct\openvino, openvino runtime version: 2025.2.0-19140-c01cd93e24d-releases/2025/2
[ INFO ] Selected OpenVINO GenAI for benchmarking
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
File "C:\Users\andre\B60_LLM_env\openvino.genai\tools\llm_bench\benchmark.py", line 270, in main
iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case']](
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\andre\B60_LLM_env\openvino.genai\tools\llm_bench\task\text_generation.py", line 528, in run_text_generation_benchmark
model, tokenizer, pretrain_time, bench_hook, use_genai = FW_UTILS[framework].create_text_gen_model(model_path, device, mem_consumption, **args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\andre\B60_LLM_env\openvino.genai\tools\llm_bench\llm_bench_utils\ov_utils.py", line 132, in create_text_gen_model
return create_genai_text_gen_model(model_path, device, ov_config, memory_monitor, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\andre\B60_LLM_env\openvino.genai\tools\llm_bench\llm_bench_utils\ov_utils.py", line 223, in create_genai_text_gen_model
llm_pipe = openvino_genai.LLMPipeline(model_path, device.upper(), **ov_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Check 'execution_devices.size() == 1' failed at C:\Jenkins\workspace\private-ci\ie\build-windows-vs2022\b\repos\openvino.genai\src\cpp\src\continuous_batching\pipeline_impl.cpp:107:
Contituous batching: execution device is expected to be CPU or GPU, but got 2 devices


(B60_LLM_env) PS C:\Users\andre\B60_LLM_env\openvino.genai\tools\llm_bench>

1 Solution
Zulkifli_Intel
Moderator
6,207 Views

Hi J-N-ch,

Thank you for reaching out.

 

Based on the error, continuous batching in openvino_genai.LLMPipeline(…) does not currently support multi-device execution; it supports only a single execution device, either CPU or a single GPU (for example, GPU.1).
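In practice this means dropping the MULTI: composite and passing one device string instead. As a minimal illustrative sketch (not part of the llm_bench tool; the helper name is hypothetical), a pre-flight check that mirrors the constraint the error message enforces could look like:

```python
def is_supported_device(device: str) -> bool:
    """Return True if the device string names a single CPU or GPU device,
    e.g. "CPU", "GPU", or "GPU.1". Composite strings such as
    "MULTI:GPU.2,GPU.1" or "AUTO" are rejected, matching the
    single-execution-device requirement of continuous batching."""
    d = device.upper()
    if d in ("CPU", "GPU"):
        return True
    # "GPU.<n>" selects one enumerated GPU; the index must be numeric.
    if d.startswith("GPU."):
        return d[4:].isdigit()
    return False

print(is_supported_device("GPU.1"))              # True
print(is_supported_device("MULTI:GPU.2,GPU.1"))  # False
```

For the benchmark command itself, that would mean running with a single device, e.g. -d GPU.1 instead of -d "MULTI:GPU.2,GPU.1".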

 

 

Regards,

Zul


2 Replies
Zulkifli_Intel
Moderator
5,738 Views

This thread will no longer be monitored since this issue has been resolved. If you need any additional information from Intel, please submit a new question.

