Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

The performance of using OpenVINO Model Server is much lower than deploying locally

WT_Jacko
Beginner

Hi sir,

When I run models using OVMS, I find that CPU or GPU utilization is much lower than when deployed locally.

 

Here is my configuration:

CPU model name: Intel(R) Core(TM) Ultra 7 165H

Driver Version: i915 | 24.13.29138.7

OpenVINO version: 2024.2

 

Please refer to the link in the attachment; a video of the test results is included.

Thanks

Best Regards

Jacko

Zulkifli_Intel
Moderator

Hi WT_Jacko,

Thank you for reporting the issue.

 

We are checking this out and will get back to you soon.

 


Regards,

Zul


WT_Jacko
Beginner

Hi Zul,

 

May I know if there are any updates?

 

Thanks

Best Regards

Jacko

WT_Jacko
Beginner

Hi Zul,

 

May I know if there are any updates?

 

Thanks

Best Regards

Jacko

Zulkifli_Intel
Moderator

Hi WT_Jacko,

 

Thank you for your patience. This issue could be related to the parameters that were used. Could you share the parameters you used to train your model?

 

 

Regards

Zul


WT_Jacko
Beginner

Hi Zul,

 

Could you please let me know which training model parameters you require?

 

This link documents my test results. You can clearly see that performance differs significantly when running the same model on the same hardware locally versus with OVMS. Is this difference normal (locally: 50 ms, OVMS: 550 ms)?

https://breezy-dill-4d0.notion.site/2024-07-17-Benchmark-monitor-usage-c8900510e52345138195836050c43bd5
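As a side note, one way to narrow down whether a gap like 50 ms vs 550 ms comes from the model itself or from the serving path (serialization, network hops, request handling) is to time each request separately with a small harness. The sketch below is illustrative only; the `infer_fn` placeholder stands in for the actual call, e.g. a local OpenVINO `infer()` or a gRPC/REST request to OVMS:

```python
import statistics
import time

def measure_latency(infer_fn, warmup=5, runs=50):
    """Time repeated calls to infer_fn; return latency stats in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the stats
        infer_fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer_fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "mean_ms": statistics.fmean(samples),
    }

# Stand-in call: replace the lambda with your real local inference call
# or the OVMS client request to compare the two deployment paths.
stats = measure_latency(lambda: time.sleep(0.001))
print(stats)
```

Running the same harness against both paths separates model execution time from serving overhead: if the per-request gap persists with identical inputs, the difference is in the serving path rather than the model.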

 

Thanks for your help

Best Regards

Jacko

Witold_Intel
Employee

Hi WT_Jacko,

 

I'd like to remind you that the source files would help us understand this issue and conduct tests. Could you please send us the code?

 

Kind regards,

WT_Jacko
Beginner

Hi Witold,

The command currently executed by the customer in OVMS is as follows:

sudo docker run -it --rm --device=/dev/dri \
  --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
  -v ${PWD}/models:/models \
  -p 8000:8000 -p 9000:9000 \
  openvino/model_server:2024.2-gpu \
  --model_name medicine --model_path /models/medicine \
  --port 9000 --rest_port 8000 --target_device GPU

We have confirmed with the customer that both locally and on OVMS they are running the same demo.py and using the same model. Note: demo.py is the customer's own script.

 

We are asking the customer if there's a chance we could get the source files for demo.py and the AI model.

 

Thanks

Best Regards

Jacko

 

 

Zulkifli_Intel
Moderator

Hi WT_Jacko,

 

You mentioned that you are running the same model both locally and on OVMS, but we noticed that the file names and command-line prompts are different. The first model uses ObjectDetectionV2IR.Inference, and the second one seems to use a different, apparently more time-consuming inference method. Can you share the source files with us? From the videos alone, we are unable to tell the difference.

 

 

Regards,

Zul


Witold_Intel
Employee

Hello @WT_Jacko, thank you for the detailed explanation. In this case we are waiting for the "medicine" model and "demo.py" script.


Witold_Intel
Employee

Hello Jacko, can we receive the files for testing? Is further support required from our side? Thank you.


Witold_Intel
Employee

Hello Jacko, can we receive the files for testing? Is further support required from our side?


I will have to close the support ticket if there is no reply from your side for 7 business days. Thanks for understanding.


WT_Jacko
Beginner

Hi Sir,

 

Due to the customer's company policies, our client is currently unable to share their internal software. We've asked them to update the Compute Runtime libraries from https://github.com/intel/compute-runtime/releases to see whether that improves performance. If any further issues arise, they will create a new ticket. Thanks!
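For reference, checking which compute-runtime (NEO) version is currently active, and installing a newer release, looks roughly like the commands below. Package file names vary from release to release, so treat this as a sketch rather than exact instructions:

```shell
# Report the OpenCL driver version currently in use
# (the clinfo tool comes from the "clinfo" package).
clinfo | grep -i "driver version"

# After downloading the .deb packages for a newer release from
# https://github.com/intel/compute-runtime/releases, install them, e.g.:
#   sudo dpkg -i intel-opencl-icd_*.deb intel-level-zero-gpu_*.deb \
#               libigc*.deb libigdgmm*.deb
```

A reboot (or at least restarting the OVMS container) is generally needed before the new runtime is picked up.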

 

Thanks

Best Regards

Jacko

Witold_Intel
Employee

Hi Jacko,


Many thanks for the update. I'm de-escalating the case then.


Kind regards,

Witold


Zulkifli_Intel
Moderator

Thank you for your question. If you need any additional information from Intel, please submit a new question as this thread is no longer being monitored.

