Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Best way to profile communication overhead in OpenVINO

lostkingdom4
Beginner

I was trying to profile my compiled model during inference on GPU using

infos = infer_request.profiling_info

I noticed that infer_request.latency differs by an order of magnitude from the sum of real_time over all the nodes in the computation graph. I suspect the cause might be the overhead of loading data into the GPU. Is there a way to profile the latency caused by such overhead between nodes?
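
To make the comparison concrete, here is a minimal sketch of what is being measured (the model path and input are placeholders, and the input is assumed to be static-shaped; per-node profiling must be enabled at compile time, otherwise profiling_info stays empty):

import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder model path

# Per-node profiling must be enabled when the model is compiled.
compiled = core.compile_model(model, "GPU", {"PERF_COUNT": "YES"})
infer_request = compiled.create_infer_request()

# Placeholder input; assumes a single input with a static shape.
data = np.zeros(tuple(compiled.input(0).shape), dtype=np.float32)
infer_request.infer({0: data})

total_ms = infer_request.latency  # wall-clock latency of the request, in ms

# real_time is a datetime.timedelta for each node.
node_ms = sum(p.real_time.total_seconds() * 1e3
              for p in infer_request.profiling_info)

print(f"request latency: {total_ms:.3f} ms")
print(f"sum of node real_time: {node_ms:.3f} ms")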
 
Thanks
Wan_Intel
Moderator

Hi Lostkingdom4,

Thanks for reaching out to us.

Are you using dynamic shapes during inference on GPU? Due to the dominant runtime overhead on the host device, dynamic shapes may perform worse than static shapes on a discrete GPU.

To improve performance, use static shapes whenever possible, use bounded dynamic shapes when you cannot, and use a permanent model cache to reduce the runtime re-compilation overhead; a sketch of these options follows below.

For more information, please refer to Recommendations for performance improvement in GPU Device.
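
For illustration, a minimal sketch of these three options (the model path, shapes, and cache directory are placeholders, and the input is assumed to be at index 0):

import openvino as ov

core = ov.Core()

# Permanent model cache: compiled blobs are stored on disk and reused,
# which avoids GPU kernel re-compilation on subsequent runs.
core.set_property({"CACHE_DIR": "./ov_cache"})  # placeholder directory

model = core.read_model("model.xml")  # placeholder model path

# Option 1: static shape -- fix the input to a known size.
model.reshape({0: [1, 3, 224, 224]})  # placeholder shape

# Option 2: bounded dynamic shape -- keep a dimension dynamic but give
# the plugin an upper bound instead of leaving it fully unbounded:
# model.reshape({0: [ov.Dimension(1, 8), 3, 224, 224]})

compiled = core.compile_model(model, "GPU")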

Regards,
Wan
lostkingdom4
Beginner

Thanks for the information. Dynamic shapes are definitely a problem.

Meanwhile, I'm also interested in profiling model inference for OpenVINO code written in Python. As I mentioned in the first post, infer_request.latency differs by an order of magnitude from the sum of real_time over all the nodes from infer_request.profiling_info on GPU. It would be a great help to us to better understand what actually causes this time difference, both when inference starts and between the nodes of the XML computation graph.

I have tried Intel VTune and Advisor, but they do not seem to produce precise results for Python code. It would be great if you could give us some advice on profiling the entire inference.
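
For illustration, one simple fallback is to bracket each stage of the Python pipeline with time.perf_counter and compare the wall-clock figures against the runtime's own numbers; a minimal sketch, assuming compiled and infer_request from the snippet in the first post and a placeholder input tensor:

import time
import numpy as np
import openvino as ov

# Placeholder input tensor; replace with your model's actual input.
input_tensor = ov.Tensor(np.zeros(tuple(compiled.input(0).shape),
                                  dtype=np.float32))

t0 = time.perf_counter()
infer_request.set_input_tensor(input_tensor)
t1 = time.perf_counter()
infer_request.infer()
t2 = time.perf_counter()
output = infer_request.get_output_tensor().data
t3 = time.perf_counter()

# Note: the plugin may defer the actual host-to-device copy until
# infer() runs, so this split is indicative rather than exact.
print(f"set input:  {(t1 - t0) * 1e3:.3f} ms")
print(f"infer:      {(t2 - t1) * 1e3:.3f} ms")
print(f"get output: {(t3 - t2) * 1e3:.3f} ms")

# Comparing (t2 - t1) against the summed per-node real_time from
# profiling_info approximates the host-side overhead between nodes.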

I've attached my Python code in txt format for your reference.
Wan_Intel
Moderator

Hi Lostkingdom4,

I've run your Python script with OpenVINO™ Development Tools 2023.2.0. However, I encountered the following error:

[General Error] Model file /home/devcloud/Latency_prediction/xml/GCNConv_cora_small.xml cannot be opened!

Could you please share the necessary files and the steps to reproduce the issue so that we can investigate further?

Regards,
Wan
lostkingdom4
Beginner

Hi Wan,

Thank you so much for your help. I've put all the required files in a zip file. After you extract it, please start with the readme; it will guide you through the contents.

If you have any questions, please let me know.

Thank you again for your help.

Wan_Intel
Moderator

Hi Lostkingdom4,

Thanks for sharing the information with us.

Let me check with the relevant team and I'll update you as soon as possible.

Regards,
Wan
Wan_Intel
Moderator

Hi Lostkingdom4,

I've extracted the ZIP file and run the Python file with the command: python node_prediction.py

However, I'm not able to see the total latency from infer_request or the layer-by-layer sum from profiling_info in the output, as shown in the image below.

[Attached image: GNN_profilling.jpg]

Are you able to reproduce the issue on your end? Could you please share the result of running the Python file from your end?

Regards,
Wan
lostkingdom4
Beginner

Hi Wan,

I double-checked my code; the attached zip prints a clearer output.

Running the Python script, I got the following results:

[Attached image: Screenshot 2024-02-01 000248.png]

One shows 55.606 ms; the other shows 779.580 ms.

Thanks.

Wan_Intel
Moderator

Hi Lostkingdom4,

Thanks for sharing the information with us.

I've run the Python script from the latest ZIP file. I also observed that the total latency from infer_request is an order of magnitude larger than the layer-by-layer sum from profiling_info.

[Attached image: same issue.jpg]

Let me check with the relevant team and I'll update you as soon as possible.

Regards,
Wan
lostkingdom4
Beginner

Hi Wan,

Thanks for the help. Could you please provide me with any updates?

Best regards.
Wan_Intel
Moderator

Hi Lostkingdom4,

Thanks for your patience. We've received feedback from the relevant team.

After deep analysis, we are sorry to tell you that a way to profile the latency caused by this overhead, and an improvement to the current perf_counter behavior, are not available at the moment. We will fix this in a future OpenVINO release. Sorry for the inconvenience, and thank you for your support.

Regards,
Wan
Wan_Intel
Moderator

Hi Lostkingdom4,

If you need additional information from Intel, please submit a new question, as this thread will no longer be monitored.

Regards,
Wan