Hi,
I'm trying to run a model on the OpenVINO HETERO plugin, manually assigning node affinities to force some layers to execute on the CPU.
The model in question is a transformer model from HuggingFace (Qwen3-8B). I have tried both the pre-converted model from OpenVINO/Qwen3-8B-int4-ov and a model exported with `optimum-cli`; the issue persists with both.
My code is adapted from the "Using Manual and Automatic Modes in Combination" section of the official documentation:
import openvino
from optimum.intel.openvino import OVModelForCausalLM

DEVICE = "HETERO:GPU,CPU"
MODEL = "./Qwen3-8B-ov"
core = openvino.Core()

def set_affinities(model):
    # Query which device HETERO would assign each operation to by default
    supported_ops = core.query_model(model, DEVICE)
    for node in model.get_ops():
        fname = node.get_friendly_name()
        # target_ops is a set of node names that should be on the CPU
        if fname in target_ops:
            print(f"Pinning to CPU: {node}")
            affinity = "CPU"
        else:
            affinity = supported_ops[fname]
        node.get_rt_info()["affinity"] = affinity

model = OVModelForCausalLM.from_pretrained(MODEL)
set_affinities(model.model)  # operate on the underlying ov.Model
model.to(DEVICE)
model.compile()
At runtime, I get this error:
Traceback (most recent call last):
  File "/home/vedanth/Desktop/ov-inference.py", line 42, in <module>
    model.compile()
  File "/home/vedanth/Desktop/models/venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_decoder.py", line 428, in compile
    super().compile()
  File "/home/vedanth/Desktop/models/venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 922, in compile
    self.request = self._compile_model(self.model, self._device, ov_config, self.model_save_dir)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vedanth/Desktop/models/venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 418, in _compile_model
    compiled_model = core.compile_model(model, device.upper() if device is not None else device, config=ov_config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vedanth/Desktop/models/venv/lib/python3.12/site-packages/openvino/_ov_api.py", line 646, in compile_model
    super().compile_model(model, device_name, {} if config is None else config),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:110:
Exception from src/inference/src/dev/plugin.cpp:53:
Check 'unregistered_parameters.str().empty()' failed at src/core/src/model.cpp:58:
Model references undeclared parameters: opset1::Parameter input_ids () -> (i64[?,?])
I'm not sure where to start debugging this; any leads would be appreciated. Alternatively, if there is a better way of pinning an operation to a device, I would be happy to try it.
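For reference, here is the quick sanity check I can run before compile() to see how the nodes end up distributed (a small sketch; it assumes set_affinities() has already run on model.model as above):

from collections import Counter

def summarize_affinities(ov_model):
    # Tally how many nodes were assigned to each device by set_affinities()
    counts = Counter(str(node.get_rt_info()["affinity"]) for node in ov_model.get_ops())
    for device, n in sorted(counts.items()):
        print(f"{device}: {n} nodes")

summarize_affinities(model.model)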
OpenVINO version: 2026.0.0
Hi GlowingScrewdriver,
Below is a method for pinning operations to a device, loading the model by its model ID.
from optimum.intel.openvino import OVModelForCausalLM
import openvino
from transformers import AutoTokenizer

model_id = "OpenVINO/qwen3-8b-int4-ov"
DEVICE = "HETERO:GPU,CPU"

def set_affinities(model, target_ops):
    core = openvino.Core()
    ov_model = model.model  # Get the underlying OpenVINO model
    # Query what operations are supported on each device
    supported_ops = core.query_model(ov_model, DEVICE)
    for node in ov_model.get_ops():
        fname = node.get_friendly_name()
        # Force specific operations to CPU
        if fname in target_ops:
            print(f"Pinning to CPU: {node}")
            node.get_rt_info()["affinity"] = "CPU"
        else:
            affinity = supported_ops[fname]
            node.get_rt_info()["affinity"] = affinity
            # print(f"Assigning to GPU: {node}")

# Define operations you want to force to CPU
target_ops = {
    # Example operation names
    "Constant_100568",
    "Constant_100569",
}

model = OVModelForCausalLM.from_pretrained(model_id)
set_affinities(model, target_ops)
model.to(DEVICE)
model.compile()
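Once the model compiles, you can validate the HETERO setup with a quick generation run, for example:

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
# Run a short generation as a smoke test of the compiled model
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))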
Regards,
Peh
Hello Peh,
Thank you for your quick response. Apologies for the typos in my code; they seem to have crept in when I adapted the snippet from a larger piece of code.
What you suggested is indeed what I had tried: getting default affinities with `core.query_model()`, selecting nodes by name, and overriding affinities per node by assigning to `node.get_rt_info()["affinity"]`.
My issue is this: when I pin some nodes to the CPU and try to compile the model, I run into the error above about undeclared parameters in the model. The issue does not arise if I assign `node.get_rt_info()["affinity"] = "GPU"` (or `"CPU"`) uniformly to all nodes; it only occurs when I selectively assign some nodes to CPU and the rest to GPU.
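In case it helps narrow things down, I can also dump where the device boundary falls around the pinned nodes, roughly like this (a rough sketch; `target_ops` is the same set as in my snippet, and affinities have already been assigned to every node):

ov_model = model.model
for node in ov_model.get_ops():
    if node.get_friendly_name() not in target_ops:
        continue
    # Show the affinity of each node feeding a pinned node, to see
    # whether the CPU/GPU boundary cuts through at this point
    for value in node.input_values():
        producer = value.get_node()
        affinity = producer.get_rt_info()["affinity"]
        print(f"{node.get_friendly_name()} <- {producer.get_friendly_name()} [{affinity}]")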
Thank you,
Vedanth