
Intel VTune - Estimate data offload to GPU

HPCAnalisys
Beginner

Hi, I'm interested in estimating the data transfer, in bytes, of an algorithm or function that would be executed on a GPU, using Intel VTune. For example, if my algorithm multiplies two vectors of 10 float elements each, the result of the offload would be: 10+10 float elements sent to the GPU and 1 element, the result, sent back, so 84 bytes in total (21*4). Keep in mind that I'm interested in an estimate, not an actual measurement on a GPU, since I don't have one available.
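
The arithmetic in this example can be sketched as follows. This is only a minimal illustration, not something produced by VTune or Advisor: the function name `estimated_transfer_bytes` is hypothetical, and it assumes 4-byte floats and that every element is copied exactly once in each direction.

```python
def estimated_transfer_bytes(elements_in, elements_out, element_size=4):
    """Bytes sent to the device plus bytes sent back.

    Assumption: each element crosses the host/device boundary exactly once,
    and element_size is the size of one element in bytes (4 for a float).
    """
    return (elements_in + elements_out) * element_size


# Two input vectors of 10 floats each, one float result sent back.
total = estimated_transfer_bytes(elements_in=10 + 10, elements_out=1)
print(total)  # 84
```

Passing `element_size=8` would give the corresponding figure for doubles.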

With Intel Advisor this is possible; the metric is called "Estimated data transfer with reuse", as shown in the following screenshot:

Screenshot from 2022-10-24 16-49-17.png

In Intel VTune, the only way I found is the "Memory Access" analysis, but it expresses the result as a number of loads and stores, probably obtained from hardware counters. So if huge data structures cause multiple reads from main memory, those are counted too, and it does not return the number of bytes.

Screenshot from 2022-10-24 16-56-17.png

Is there a way to perform a similar analysis with Intel VTune? Thanks

6 Replies
JaideepK_Intel
Moderator

Hi,

Good day to you.

Thank you for posting in the Intel communities.

If your system has an Intel Core CPU, it will typically include integrated Intel UHD Graphics. When you run the GPU Offload analysis in Intel VTune Profiler, you can see GPU memory access (read and write) metrics in GB/sec. I have attached a screenshot for your reference. To try the GPU Offload analysis, you need a sample that runs on the GPU as well as a system with an Intel GPU.
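
Since the metric is a rate in GB/sec, a byte count would have to be derived by multiplying it by the time interval it covers. The sketch below only illustrates that conversion. The function name `bytes_from_bandwidth` is hypothetical, and it assumes that the rate is an average over the interval and that 1 GB means 10^9 bytes, which this thread does not confirm.

```python
def bytes_from_bandwidth(gb_per_sec, seconds, bytes_per_gb=1e9):
    """Approximate bytes moved, given an average rate and a duration.

    Assumptions: gb_per_sec is an average over `seconds`, and one GB is
    10**9 bytes (pass bytes_per_gb=2**30 if the tool uses binary units).
    """
    return gb_per_sec * bytes_per_gb * seconds


# Hypothetical reading: 2.0 GB/sec of GPU memory reads sustained for 0.5 s.
print(bytes_from_bandwidth(2.0, 0.5))  # 1000000000.0
```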

 

JaideepK_Intel_0-1666790595751.png

If this resolves your issue, make sure to accept this as a solution. This would help others with a similar issue. Thank you!

Regards,

Jaideep

HPCAnalisys
Beginner

Thanks for your answer! So the analysis you are proposing is not an estimate of the amount of data to be offloaded to a GPU, but a real measurement that requires a graphics card to collect the data? Is the type of the variable taken into account (float, double...), or only the number of reads and writes?

JaideepK_Intel
Moderator

Hi,

Good day to you.

>>So, the analysis you are proposing is not an estimation of the amount of data to be offloaded to a GPU but a real measurement that requires a graphics card to collect the data?

Yes, the data we get is a real measurement, not an estimate.

>>Is the type of variable taken into account (float, double...) or only the number of reads and writes?

It is only the number of reads and writes.

If this resolves your issue, make sure to accept this as a solution. This would help others with a similar issue. Thank you!

Regards,

Jaideep

HPCAnalisys
Beginner

Thanks for the clarification! Before marking the thread as solved, I would like to be sure that Intel VTune has no way to estimate data transfer with reuse for GPU offloading, like Intel Advisor does. It only performs real measurements, is that correct?

I saw that the "Memory Access" analysis also reports the number of accesses expressed as loads and stores, so if I know that they are floats, I need to multiply by 4 bytes to get the number of bytes read and written, correct?

Thanks again.

JaideepK_Intel
Moderator

Hi,

Good day to you.

Sorry for the delay.

>>I would like to be sure that there is no way to estimate data transfer with reuse, like Intel Advisor does, for GPU offloading analysis with Intel VTune. It only performs real measurements, is that correct?

Yes, VTune only gives real measurements.

>>I saw that the "Memory Access" analysis also reports the number of accesses expressed as loads and stores, so if I know that they are floats, I need to multiply by 4 bytes to get the number of bytes read and written, correct?

If the values are floats, you can multiply the count by 4 bytes.
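
As a rough sketch of that conversion, the snippet below turns load and store counts into byte totals. The names `ELEMENT_SIZES` and `accesses_to_bytes` are hypothetical, and the calculation assumes that each counted access moves exactly one element of the given type, which may not hold if the compiler emits vector loads or the counters include other memory traffic.

```python
# Typical sizes in bytes of C floating-point types on common platforms.
ELEMENT_SIZES = {"float": 4, "double": 8}


def accesses_to_bytes(loads, stores, dtype="float"):
    """Convert load/store counts into bytes read and written.

    Assumption: every counted access touches exactly one element of `dtype`.
    """
    size = ELEMENT_SIZES[dtype]
    return {"read_bytes": loads * size, "written_bytes": stores * size}


# Hypothetical counts: 20 loads and 1 store of float data.
print(accesses_to_bytes(loads=20, stores=1))
# {'read_bytes': 80, 'written_bytes': 4}
```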

 

If this resolves your issue, make sure to accept this as a solution. This would help others with a similar issue. Thank you!

Regards,

Jaideep

JaideepK_Intel
Moderator

Hi,


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks,

Jaideep

