Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

OpenVINO Model Server

ashish017
Beginner
2,518 Views

I am serving a model with OVMS and calling the chat/completions endpoint. The input is a list of dictionaries:

[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hi! How are you?"}, {"role": "assistant", "content": "."} ]

The model is hosted correctly; I tested the completions endpoint and it works fine. I get the following error in the chat/completions case:

```
"Mediapipe execution failed. MP status - INVALID_ARGUMENT: CalculatorGraph::Run() failed:
Calculator::Process() for node "LLMExecutor" failed: Error: Chat template not loaded correctly, so it cannot be applied"

```
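For reference, here is a sketch of how the request body above can be built and serialized with the standard OpenAI-style message schema (each message carries separate "role" and "content" keys); the endpoint URL and model name are assumptions to adjust for your deployment:

```python
import json

# Assumed endpoint and model name -- change to match your OVMS deployment.
url = "http://localhost:8000/v3/chat/completions"

payload = {
    "model": "my-llm",  # the name the model is served under
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi! How are you?"},
    ],
}

# Serialize to the JSON body that would be POSTed to the endpoint.
body = json.dumps(payload)
print(body)

# To actually send it against a running OVMS instance:
# import requests
# resp = requests.post(url, data=body,
#                      headers={"Content-Type": "application/json"})
```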

I have tried an OpenVINO model from Hugging Face as well as locally converted models. All required files were present (including tokenizer_config.json).

These are the versions I am using:

transformers 4.51.3
openvino 2025.1.0
openvino-telemetry 2025.2.0
openvino-tokenizers 2025.1.0.0

The same code works properly on another device with the same configuration. A resolution would be greatly appreciated.

Am I missing something? Please direct me to a relevant resource.

Thanks. 

0 Kudos
1 Reply
Zulkifli_Intel
Moderator
2,489 Views

Hi ashish017,

Thank you for reaching out to us.

 

For OVMS, support for Hugging Face models is included in OpenVINO 2025.2.0. Please upgrade to OpenVINO 2025.2.0 and see if this resolves the issue.

If you still face the same issue, please share the related files, your environment details, and the steps to reproduce it.
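The error says the chat template could not be loaded, so one quick check before upgrading is whether the model's tokenizer_config.json actually contains a chat template entry. A minimal sketch, assuming the template is stored under the "chat_template" key (the helper name and file path are hypothetical):

```python
import json

def has_chat_template(cfg: dict) -> bool:
    # A usable config carries a non-empty "chat_template" entry;
    # without it, OVMS cannot apply chat formatting.
    return "chat_template" in cfg and bool(cfg["chat_template"])

# Check your own model directory, e.g.:
# with open("model/tokenizer_config.json") as f:
#     print(has_chat_template(json.load(f)))

# Minimal illustration with inline configs:
print(has_chat_template({"chat_template": "{% for m in messages %}...{% endfor %}"}))  # True
print(has_chat_template({"bos_token": "<s>"}))  # False
```

If the key is missing, the model was likely exported without a chat template, which would explain why chat/completions fails while plain completions works.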

 

 

Regards,

Zul


0 Kudos
Reply