Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Best way to profile communication overhead in OpenVINO

lostkingdom4
Beginner

I was trying to profile my compiled model during inference on GPU using

infos = infer_request.profiling_info

I noticed that infer_request.latency differs by an order of magnitude from the sum of real_time over all the nodes in the computation graph. I suspect the cause might be the overhead of loading data into the GPU. Is there a way to profile the latency caused by such overhead between nodes?
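
To make the comparison concrete, here is a minimal sketch of what is being measured (the model path and input are placeholders, and the input is assumed to be static-shaped; per-node profiling must be enabled at compile time, otherwise profiling_info stays empty):

import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder model path

# Per-node profiling must be enabled when the model is compiled.
compiled = core.compile_model(model, "GPU", {"PERF_COUNT": "YES"})
infer_request = compiled.create_infer_request()

# Placeholder input; assumes a single input with a static shape.
data = np.zeros(tuple(compiled.input(0).shape), dtype=np.float32)
infer_request.infer({0: data})

total_ms = infer_request.latency  # wall-clock latency of the request, in ms

# real_time is a datetime.timedelta for each node.
node_ms = sum(p.real_time.total_seconds() * 1e3
              for p in infer_request.profiling_info)

print(f"request latency: {total_ms:.3f} ms")
print(f"sum of node real_time: {node_ms:.3f} ms")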
 
Thanks
Wan_Intel
Moderator

Hi Lostkingdom4,

Thanks for reaching out to us.

Are you using dynamic shapes during inference on GPU? Due to the dominant runtime overhead on the host device, dynamic shapes may perform worse than static shapes on a discrete GPU.

To improve performance, use static shapes whenever possible, use bounded dynamic shapes when you cannot, and use a permanent model cache to reduce the runtime re-compilation overhead; a sketch of these options follows below.

For more information, please refer to Recommendations for performance improvement in GPU Device.
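
For illustration, a minimal sketch of these three options (the model path, shapes, and cache directory are placeholders, and the input is assumed to be at index 0):

import openvino as ov

core = ov.Core()

# Permanent model cache: compiled blobs are stored on disk and reused,
# which avoids GPU kernel re-compilation on subsequent runs.
core.set_property({"CACHE_DIR": "./ov_cache"})  # placeholder directory

model = core.read_model("model.xml")  # placeholder model path

# Option 1: static shape -- fix the input to a known size.
model.reshape({0: [1, 3, 224, 224]})  # placeholder shape

# Option 2: bounded dynamic shape -- keep a dimension dynamic but give
# the plugin an upper bound instead of leaving it fully unbounded:
# model.reshape({0: [ov.Dimension(1, 8), 3, 224, 224]})

compiled = core.compile_model(model, "GPU")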

Regards,
Wan
lostkingdom4
Beginner

Thanks for the information. Dynamic shapes are definitely a problem.

Meanwhile, I'm also interested in profiling model inference for OpenVINO code written in Python. As I mentioned in the first post, infer_request.latency differs by an order of magnitude from the sum of real_time over all the nodes from infer_request.profiling_info on GPU. It would be a great help to us to better understand what actually causes this time difference, both when inference starts and between the nodes of the XML computation graph.

I have tried Intel VTune and Advisor, but they do not seem to produce precise results for Python code. It would be great if you could give us some advice on profiling the entire inference.
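
For illustration, one simple fallback is to bracket each stage of the Python pipeline with time.perf_counter and compare the wall-clock figures against the runtime's own numbers; a minimal sketch, assuming compiled and infer_request from the snippet in the first post and a placeholder input tensor:

import time
import numpy as np
import openvino as ov

# Placeholder input tensor; replace with your model's actual input.
input_tensor = ov.Tensor(np.zeros(tuple(compiled.input(0).shape),
                                  dtype=np.float32))

t0 = time.perf_counter()
infer_request.set_input_tensor(input_tensor)
t1 = time.perf_counter()
infer_request.infer()
t2 = time.perf_counter()
output = infer_request.get_output_tensor().data
t3 = time.perf_counter()

# Note: the plugin may defer the actual host-to-device copy until
# infer() runs, so this split is indicative rather than exact.
print(f"set input:  {(t1 - t0) * 1e3:.3f} ms")
print(f"infer:      {(t2 - t1) * 1e3:.3f} ms")
print(f"get output: {(t3 - t2) * 1e3:.3f} ms")

# Comparing (t2 - t1) against the summed per-node real_time from
# profiling_info approximates the host-side overhead between nodes.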

I've attached my Python code in txt format for your reference.
Wan_Intel
Moderator

Hi Lostkingdom4,

I've run your Python script with OpenVINO™ Development Tools 2023.2.0. However, I encountered the following error:

[General Error] Model file /home/devcloud/Latency_prediction/xml/GCNConv_cora_small.xml cannot be opened!

Could you please share the necessary files and the steps to reproduce the issue so that we can investigate further?

Regards,
Wan
lostkingdom4
Beginner

Hi Wan,

Thank you so much for your help. I've put all the required files in a zip file. After you extract it, please start with the readme; it will guide you through the contents.

If you have any questions, please let me know.

Thank you again for your help.

Wan_Intel
Moderator

Hi Lostkingdom4,

Thanks for sharing the information with us.

Let me check with the relevant team and I'll update you as soon as possible.

Regards,
Wan
Wan_Intel
Moderator

Hi Lostkingdom4,

I've extracted the ZIP file and run the Python file with the command: python node_prediction.py

However, I'm not able to see the total latency from infer_request or the layer-by-layer sum from profiling_info in the output, as shown in the image below.

[Attached image: GNN_profilling.jpg]

Are you able to reproduce the issue on your end? Could you please share the result of running the Python file from your end?

Regards,
Wan
lostkingdom4
Beginner

Hi Wan,

I double-checked my code; the attached zip prints a clearer output.

Running the Python script, I got the following results:

[Attached image: Screenshot 2024-02-01 000248.png]

One shows 55.606 ms; the other shows 779.580 ms.

Thanks.

Wan_Intel
Moderator

Hi Lostkingdom4,

Thanks for sharing the information with us.

I've run the Python script from the latest ZIP file. I also observed that the total latency from infer_request is an order of magnitude larger than the layer-by-layer sum from profiling_info.

[Attached image: same issue.jpg]

Let me check with the relevant team and I'll update you as soon as possible.

Regards,
Wan
lostkingdom4
Beginner

Hi Wan,

Thanks for the help. Could you please provide me with any updates?

Best regards.
Wan_Intel
Moderator

Hi Lostkingdom4,

Thanks for your patience. We've received feedback from the relevant team.

After deep analysis, we are sorry to tell you that a way to profile the latency caused by this overhead, and an improvement to the current perf_counter behavior, are not available at the moment. We will fix this in a future OpenVINO release. Sorry for the inconvenience, and thank you for your support.

Regards,
Wan
Wan_Intel
Moderator

Hi Lostkingdom4,

If you need additional information from Intel, please submit a new question, as this thread will no longer be monitored.

Regards,
Wan