Hi everyone,
I'm currently comparing different embedding models and their compatibility and performance in OpenVINO. As I'm looking for top-performing multilingual models, I tried Alibaba-NLP/gte-Qwen2-1.5B-instruct and BAAI/bge-multilingual-gemma2. I attempted the conversion to OpenVINO format with export_model.py and with optimum-cli.
With export_model.py the conversion aborts partway through. I used these commands:
python export_model.py embeddings --source_model BAAI/bge-multilingual-gemma2 --weight-format fp16 --config_file_path models/config_all.json
python export_model.py embeddings --source_model Alibaba-NLP/gte-Qwen2-1.5B-instruct --weight-format int8 --config_file_path models/config_all.json
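For completeness, the optimum-cli export I ran was along these lines (the task flag and output directory here are from memory, so they may not match exactly what I used):
optimum-cli export openvino --model Alibaba-NLP/gte-Qwen2-1.5B-instruct --task feature-extraction --weight-format int8 models/gte-Qwen2-1.5B-instruct-ov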
With optimum-cli the conversion works, but the models cannot be loaded. This is the error I get when sending a request to the model:
APIStatusError: {"error": "Mediapipe graph precondition failed - FAILED_PRECONDITION: CalculatorGraph::Run() failed in Run:
Calculator::Open() for node "OpenVINOModelServerSessionCalculator_1" failed: ; OpenVINOModelServerSessionCalculator failed to load the model
Calculator::Open() for node "OpenVINOModelServerSessionCalculator_2" failed: ; OpenVINOModelServerSessionCalculator failed to load the model"}
Request:
from openai import OpenAI
import numpy as np

# Client pointing at the local OpenVINO Model Server OpenAI-compatible endpoint
client = OpenAI(
    base_url="http://localhost:8000/v3",
    api_key="unused"
)

model = "Alibaba-NLP/gte-Qwen2-1.5B-instruct"

embedding_responses = client.embeddings.create(
    input=[
        "That is a happy person",
        "That is a very happy person"
    ],
    model=model,
)

embedding_from_string1 = np.array(embedding_responses.data[0].embedding)
embedding_from_string2 = np.array(embedding_responses.data[1].embedding)

# Cosine similarity between the two embedding vectors
cos_sim = np.dot(embedding_from_string1, embedding_from_string2) / (
    np.linalg.norm(embedding_from_string1) * np.linalg.norm(embedding_from_string2)
)
print("Similarity score as cos_sim", cos_sim)
Folder: (screenshot of the model directory layout)
And the graph file:
input_stream: "REQUEST_PAYLOAD:input"
output_stream: "RESPONSE_PAYLOAD:output"
node {
calculator: "OpenVINOModelServerSessionCalculator"
output_side_packet: "SESSION:tokenizer"
node_options: {
[type.googleapis.com / mediapipe.OpenVINOModelServerSessionCalculatorOptions]: {
servable_name: "Alibaba-NLP/gte-Qwen2-1.5B-instruct_tokenizer_model"
}
}
}
node {
calculator: "OpenVINOModelServerSessionCalculator"
output_side_packet: "SESSION:embeddings"
node_options: {
[type.googleapis.com / mediapipe.OpenVINOModelServerSessionCalculatorOptions]: {
servable_name: "Alibaba-NLP/gte-Qwen2-1.5B-instruct_embeddings_model"
}
}
}
node {
input_side_packet: "TOKENIZER_SESSION:tokenizer"
input_side_packet: "EMBEDDINGS_SESSION:embeddings"
calculator: "EmbeddingsCalculator"
input_stream: "REQUEST_PAYLOAD:input"
output_stream: "RESPONSE_PAYLOAD:output"
node_options: {
[type.googleapis.com / mediapipe.EmbeddingsCalculatorOptions]: {
normalize_embeddings: true,
}
}
}
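For context: as far as I understand, the servable names referenced in the graph have to resolve to models declared next to it. Based on the Model Server docs, I would expect a subconfig.json along these lines in the same folder (the relative base_path values are my assumption):
{
    "model_config_list": [
        {
            "config": {
                "name": "Alibaba-NLP/gte-Qwen2-1.5B-instruct_tokenizer_model",
                "base_path": "tokenizer"
            }
        },
        {
            "config": {
                "name": "Alibaba-NLP/gte-Qwen2-1.5B-instruct_embeddings_model",
                "base_path": "embeddings"
            }
        }
    ]
}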
I think there is a problem with the mediapipe graph, which I created manually. Is there any documentation on how to create the folder structure and the mediapipe graph? During the export with optimum, a lot of additional files get exported, unlike with export_model.py. Are there any hints on how to get these models running with OpenVINO?
Thanks!
Hi Florianoli,
Thanks for reaching out to us.
For your information, Alibaba-NLP/gte-Qwen2-1.5B-instruct and BAAI/bge-multilingual-gemma2 are not supported at the moment. Models supported by optimum-intel should be compatible. The following Hugging Face models are included in serving validation:
- nomic-ai/nomic-embed-text-v1.5
- Alibaba-NLP/gte-large-en-v1.5
- BAAI/bge-large-en-v1.5
- BAAI/bge-large-zh-v1.5
- thenlper/gte-small
On the other hand, did you encounter OSError: The paging file is too small for this operation to complete. (os error 1455) while exporting Alibaba-NLP/gte-Qwen2-1.5B-instruct and BAAI/bge-multilingual-gemma2?
Regards,
Wan
Hello Wan,
thank you for your answer. No, I didn't encounter that error.
I just looked up the supported models and found that Qwen2 (1.5B) and Gemma2 are listed as supported. Or does this only apply to the generative models and not the embedding models?
https://huggingface.co/docs/optimum/main/en/intel/openvino/models
Hi Florianoli,
Thanks for the information.
Let me check with the relevant team, and I'll provide an update here as soon as possible.
Regards,
Wan
Hi Florianoli,
Sorry for the delay. Please note that I had no issues loading both the Alibaba-NLP/gte-Qwen2-1.5B-instruct and BAAI/bge-multilingual-gemma2 models with the Model Server, as described in the How to serve Embeddings models via OpenAI API guide. I used the 2024.5 version of the Model Server docker images; I'd expect the newer 2024.6 version to also work fine. Please give it a try, and I hope it helps. If the issue persists on your end, kindly share additional information to help us reproduce it (e.g. how you are loading the model, the OpenVINO version, the optimum version, whether docker is used, etc.).
Exported models:
$ python export_model.py embeddings --source_model Alibaba-NLP/gte-Qwen2-1.5B-instruct --weight-format int8 --config_file_path models/config.json
$ python export_model.py embeddings --source_model BAAI/bge-multilingual-gemma2 --weight-format int8 --config_file_path models/config.json
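For reference, after running export_model.py the models directory should look roughly like this (the exact file names and layout may differ slightly between versions):
models/
├── config.json
└── Alibaba-NLP/
    └── gte-Qwen2-1.5B-instruct/
        ├── graph.pbtxt
        ├── subconfig.json
        ├── tokenizer/
        │   └── 1/
        │       ├── openvino_tokenizer.xml
        │       └── openvino_tokenizer.bin
        └── embeddings/
            └── 1/
                ├── openvino_model.xml
                └── openvino_model.bin
The export script also registers the model in config.json, so the graph and folder structure should not need to be created by hand.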
Testing over serving API:
$ python openai_client.py
Similarity score as cos_sim 0.965433590102649
$ python openai_client-qwen.py
Similarity score as cos_sim 1.0
$ python openai_client-gemma2.py
Similarity score as cos_sim 0.26743674766226727
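In case it helps, the server itself was started roughly like this (the image tag, port mapping, and mount path are examples, not the exact values from my run):
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:2024.5 --rest_port 8000 --config_path /workspace/config.json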
Dear Florianoli,
We will proceed with closing this case since we have provided a solution. If you need further assistance, please open a new ticket.
Best regards,
Wan
