Hi,
I'm trying to run a model on the OpenVINO HETERO plugin, manually assigning node affinities to force some layers to execute on the CPU.
The model in question is a transformer model from HuggingFace (Qwen3-8B). I have tried both the pre-converted model from OpenVINO/Qwen3-8B-int4-ov and a model exported with `optimum-cli`; the issue persists with both.
My code is adapted from the "Using Manual and Automatic Modes in Combination" section of the official documentation:
import openvino
from optimum.intel.openvino import OVModelForCausalLM

DEVICE = "HETERO:GPU,CPU"
MODEL = "./Qwen3-8B-ov"
core = openvino.Core()

def set_affinities(model):
    # Query which device HETERO would assign each operation to by default
    supported_ops = core.query_model(model, DEVICE)
    for node in model.get_ops():
        fname = node.get_friendly_name()
        # target_ops is a set of node names that should be on the CPU
        if fname in target_ops:
            print(f"Pinning to CPU: {node}")
            affinity = "CPU"
        else:
            affinity = supported_ops[fname]
        node.get_rt_info()["affinity"] = affinity

model = OVModelForCausalLM.from_pretrained(MODEL)
set_affinities(model.model)  # operate on the underlying ov.Model
model.to(DEVICE)
model.compile()
At runtime, I get this error:
Traceback (most recent call last):
  File "/home/vedanth/Desktop/ov-inference.py", line 42, in <module>
    model.compile()
  File "/home/vedanth/Desktop/models/venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_decoder.py", line 428, in compile
    super().compile()
  File "/home/vedanth/Desktop/models/venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 922, in compile
    self.request = self._compile_model(self.model, self._device, ov_config, self.model_save_dir)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vedanth/Desktop/models/venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 418, in _compile_model
    compiled_model = core.compile_model(model, device.upper() if device is not None else device, config=ov_config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vedanth/Desktop/models/venv/lib/python3.12/site-packages/openvino/_ov_api.py", line 646, in compile_model
    super().compile_model(model, device_name, {} if config is None else config),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:110:
Exception from src/inference/src/dev/plugin.cpp:53:
Check 'unregistered_parameters.str().empty()' failed at src/core/src/model.cpp:58:
Model references undeclared parameters: opset1::Parameter input_ids () -> (i64[?,?])
I'm not sure where to start debugging this; any leads would be appreciated. Alternatively, if there is a better way of pinning an operation to a device, I would be happy to try it.
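For reference, here is the quick sanity check I can run before compile() to see how the nodes end up distributed (a small sketch; it assumes set_affinities() has already run on model.model as above):

from collections import Counter

def summarize_affinities(ov_model):
    # Tally how many nodes were assigned to each device by set_affinities()
    counts = Counter(str(node.get_rt_info()["affinity"]) for node in ov_model.get_ops())
    for device, n in sorted(counts.items()):
        print(f"{device}: {n} nodes")

summarize_affinities(model.model)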
OpenVINO version: 2026.0.0
Hi GlowingScrewdriver,
Below is a method for pinning operations to a device, loading the model by its model ID.
from optimum.intel.openvino import OVModelForCausalLM
import openvino
from transformers import AutoTokenizer

model_id = "OpenVINO/qwen3-8b-int4-ov"
DEVICE = "HETERO:GPU,CPU"

def set_affinities(model, target_ops):
    core = openvino.Core()
    ov_model = model.model  # Get the underlying OpenVINO model
    # Query what operations are supported on each device
    supported_ops = core.query_model(ov_model, DEVICE)
    for node in ov_model.get_ops():
        fname = node.get_friendly_name()
        # Force specific operations to CPU
        if fname in target_ops:
            print(f"Pinning to CPU: {node}")
            node.get_rt_info()["affinity"] = "CPU"
        else:
            affinity = supported_ops[fname]
            node.get_rt_info()["affinity"] = affinity
            # print(f"Assigning to GPU: {node}")

# Define operations you want to force to CPU
target_ops = {
    # Example operation names
    "Constant_100568",
    "Constant_100569",
}

model = OVModelForCausalLM.from_pretrained(model_id)
set_affinities(model, target_ops)
model.to(DEVICE)
model.compile()
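Once the model compiles, you can validate the HETERO setup with a quick generation run, for example:

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
# Run a short generation as a smoke test of the compiled model
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))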
Regards,
Peh
Hello Peh,
Thank you for your quick response. Apologies for the typos in my code; they seem to have crept in when I adapted the snippet from a larger piece of code.
What you suggested is indeed what I had tried: getting default affinities with `core.query_model()`, selecting nodes by name, and overriding affinities per node by assigning to `node.get_rt_info()["affinity"]`.
My issue is this: when I pin some nodes to the CPU and try to compile the model, I run into the error above about undeclared parameters in the model. The issue does not arise if I assign `node.get_rt_info()["affinity"] = "GPU"` (or `"CPU"`) uniformly to all nodes; it only occurs when I selectively assign some nodes to CPU and the rest to GPU.
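In case it helps narrow things down, I can also dump where the device boundary falls around the pinned nodes, roughly like this (a rough sketch; `target_ops` is the same set as in my snippet, and affinities have already been assigned to every node):

ov_model = model.model
for node in ov_model.get_ops():
    if node.get_friendly_name() not in target_ops:
        continue
    # Show the affinity of each node feeding a pinned node, to see
    # whether the CPU/GPU boundary cuts through at this point
    for value in node.input_values():
        producer = value.get_node()
        affinity = producer.get_rt_info()["affinity"]
        print(f"{node.get_friendly_name()} <- {producer.get_friendly_name()} [{affinity}]")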
Thank you,
Vedanth