Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Hugging Face model can't run on Ultra 9 NPU

MansonHua
Beginner

When I try to run a text classification model on the Ultra 9 NPU, it reports: get_shape was called on a descriptor::Tensor with dynamic shape

My code:

from optimum.intel.openvino import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
hf_model = OVModelForSequenceClassification.from_pretrained(
    model_id, from_transformers=True)
hf_model.to('npu')
tokenizer = AutoTokenizer.from_pretrained(model_id)

hf_pipe_cls = pipeline("text-classification",
                       model=hf_model, tokenizer=tokenizer)
text = "He's a dreadful magician."
fp32_outputs = hf_pipe_cls(text)
print("FP32 model outputs: ", fp32_outputs)

The full error output:

---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[4], line 13
10 hf_pipe_cls = pipeline("text-classification",
11 model=hf_model, tokenizer=tokenizer)
12 text = "He's a dreadful magician."
---> 13 fp32_outputs = hf_pipe_cls(text)
14 print("FP32 model outputs: ", fp32_outputs)

File ~\miniconda3\envs\npu-infer\Lib\site-packages\transformers\pipelines\text_classification.py:156, in TextClassificationPipeline.__call__(self, inputs, **kwargs)
122 """
123 Classify the text(s) given as inputs.
124
(...)
153 If `top_k` is used, one such dictionary is returned per label.
154 """
155 inputs = (inputs,)
--> 156 result = super().__call__(*inputs, **kwargs)
157 # TODO try and retrieve it in a nicer way from _sanitize_parameters.
158 _legacy = "top_k" not in kwargs

File ~\miniconda3\envs\npu-infer\Lib\site-packages\transformers\pipelines\base.py:1242, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs)
1234 return next(
1235 iter(
1236 self.get_iterator(
(...)
1239 )
1240 )
1241 else:
-> 1242 return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)

File ~\miniconda3\envs\npu-infer\Lib\site-packages\transformers\pipelines\base.py:1249, in Pipeline.run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
1247 def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
1248 model_inputs = self.preprocess(inputs, **preprocess_params)
-> 1249 model_outputs = self.forward(model_inputs, **forward_params)
1250 outputs = self.postprocess(model_outputs, **postprocess_params)
1251 return outputs

File ~\miniconda3\envs\npu-infer\Lib\site-packages\transformers\pipelines\base.py:1149, in Pipeline.forward(self, model_inputs, **forward_params)
1147 with inference_context():
1148 model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
-> 1149 model_outputs = self._forward(model_inputs, **forward_params)
1150 model_outputs = self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))
1151 else:

File ~\miniconda3\envs\npu-infer\Lib\site-packages\transformers\pipelines\text_classification.py:187, in TextClassificationPipeline._forward(self, model_inputs)
185 if "use_cache" in inspect.signature(model_forward).parameters.keys():
186 model_inputs["use_cache"] = False
--> 187 return self.model(**model_inputs)

File ~\miniconda3\envs\npu-infer\Lib\site-packages\optimum\modeling_base.py:92, in OptimizedModel.__call__(self, *args, **kwargs)
91 def __call__(self, *args, **kwargs):
---> 92 return self.forward(*args, **kwargs)

File ~\miniconda3\envs\npu-infer\Lib\site-packages\optimum\intel\openvino\modeling.py:190, in OVModelForSequenceClassification.forward(self, input_ids, attention_mask, token_type_ids, **kwargs)
175 @add_start_docstrings_to_model_forward(
176 INPUTS_DOCSTRING.format("batch_size, sequence_length")
177 + SEQUENCE_CLASSIFICATION_EXAMPLE.format(
(...)
188 **kwargs,
189
--> 190 self.compile()
192 np_inputs = isinstance(input_ids, np.ndarray)
193 if not np_inputs:

File ~\miniconda3\envs\npu-infer\Lib\site-packages\optimum\intel\openvino\modeling_base.py:400, in OVBaseModel.compile(self)
398 ov_config["CACHE_DIR"] = str(cache_dir)
399 logger.info(f"Setting OpenVINO CACHE_DIR to {str(cache_dir)}")
--> 400 self.request = core.compile_model(self.model, self._device, ov_config)
401 # OPENVINO_LOG_LEVEL can be found in https://docs.openvino.ai/2023.2/openvino_docs_OV_UG_supported_plugins_AUTO_debugging.html
402 if "OPENVINO_LOG_LEVEL" in os.environ and int(os.environ["OPENVINO_LOG_LEVEL"]) > 2:

File ~\miniconda3\envs\npu-infer\Lib\site-packages\openvino\runtime\ie_api.py:521, in Core.compile_model(self, model, device_name, config, weights)
516 if device_name is None:
517 return CompiledModel(
518 super().compile_model(model, {} if config is None else config),
519 )
520 return CompiledModel(
--> 521 super().compile_model(model, device_name, {} if config is None else config),
522 )
523 else:
524 if device_name is None:

RuntimeError: Exception from src\inference\src\cpp\core.cpp:109:
Exception from src\inference\src\dev\plugin.cpp:54:
Exception from src\plugins\intel_npu\src\plugin\src\plugin.cpp:513:
get_shape was called on a descriptor::Tensor with dynamic shape

My OpenVINO version is 2024.1.

My torch version is 2.3.0+cpu.

I have tried torch 2.3.0+cu121; it reported the same error.

What can I do to run Hugging Face models on the Intel NPU?

Wan_Intel
Moderator

Hi MansonHua,

Thanks for reaching out to us.

The error you encountered, "RuntimeError: get_shape was called on a descriptor::Tensor with dynamic shape", is expected because only models with static shapes are supported by the NPU plugin. For more information, please refer to the Limitations section in the NPU Device documentation.
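
One possible workaround is to make every input dimension static before compiling for the NPU. Below is a minimal sketch, assuming optimum-intel's reshape(batch_size, sequence_length) helper and a tokenizer padded to the same fixed length; the sequence length of 128 is an arbitrary example value, not verified on NPU here:

from optimum.intel.openvino import OVModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)

# Fix every dimension before compiling: batch size 1, sequence length 128.
model.reshape(1, 128)
model.to("npu")
model.compile()

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Pad/truncate the input to exactly the static length chosen above.
inputs = tokenizer("He's a dreadful magician.",
                   padding="max_length", max_length=128,
                   truncation=True, return_tensors="np")
outputs = model(**inputs)
print(model.config.id2label[outputs.logits.argmax(-1).item()])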

Regards,
Wan
MansonHua
Beginner

Thank you very much for your reply.

I have tried many Hugging Face models and always end up with the same error.

I also tried to convert the models to static shapes, but I failed; it may be that all HF models are exported with dynamic shapes.

Does this mean OpenVINO cannot run Hugging Face models on the Ultra 9 NPU device? Or is there a way, or a tutorial, to convert HF models to static shapes?

Wan_Intel
Moderator

Hi MansonHua,

Thanks for the information.

Unfortunately, there is no tutorial for converting Hugging Face models to static shapes at the moment. However, here is the documentation on setting input shapes and on converting a Hugging Face model:
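
As a rough illustration of what those documents describe, here is a sketch that reshapes an already-exported IR to static shapes using the core OpenVINO API. It assumes the model was exported to IR beforehand (for example with optimum-cli export openvino) and that its inputs are named input_ids and attention_mask:

import openvino as ov

core = ov.Core()
# Path to the exported IR; "openvino_model.xml" is used here as an
# example file name.
model = core.read_model("openvino_model.xml")

# Replace the dynamic batch/sequence dimensions with fixed values.
model.reshape({"input_ids": [1, 128], "attention_mask": [1, 128]})

# With all shapes static, the NPU plugin can compile the model.
compiled_model = core.compile_model(model, "NPU")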

Regards,
Wan
MansonHua
Beginner

Thank you very much for your reply; it helps me a lot. I look forward to continued upgrades of OpenVINO.

Wan_Intel
Moderator

Hi MansonHua,

Glad to know that the reply above helped you. If you need additional information from Intel, please submit a new question, as this thread will no longer be monitored.

Regards,
Wan