Support for Analyzers (Intel VTune™ Profiler, Intel Advisor, Intel Inspector)

Intel VTune - Estimate data offload to GPU

HPCAnalisys
Beginner

Hi, I'm interested in estimating the data transfer, in bytes, of an algorithm or function that would be executed on a GPU, using Intel VTune. For example, if my algorithm computes the dot product of two vectors of 10 float elements each, offloading it would send 10 + 10 float elements to the GPU and send 1 element, the result, back, so 84 bytes in total (21 * 4). Keep in mind that I'm interested in an estimate, not an actual measurement on a GPU, since I don't have one available.
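The arithmetic in this example can be written as a small Python sketch. It assumes each input element is copied to the GPU exactly once and each result element is copied back once, with no reuse, padding, or transfer overhead. The function name `estimate_offload_bytes` is hypothetical and is not part of Intel Advisor or Intel VTune.

```python
import struct

# Size of a C float in bytes (4 on common platforms).
FLOAT_BYTES = struct.calcsize("f")


def estimate_offload_bytes(elements_in: int, elements_out: int,
                           elem_bytes: int = FLOAT_BYTES) -> int:
    """Hypothetical helper: naive host<->GPU traffic estimate.

    Assumes every input element is sent to the GPU once and every output
    element is sent back once (no reuse, padding, or transfer overhead).
    """
    return (elements_in + elements_out) * elem_bytes


# Dot product of two 10-element float vectors: 10 + 10 elements in, 1 out.
print(estimate_offload_bytes(elements_in=10 + 10, elements_out=1))  # 84
```

Passing `elem_bytes=8` gives the same estimate for `double` data (168 bytes).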

With Intel Advisor this is possible; the metric is called "Estimated data transfer with reuse", as shown in the following screenshot:

[Screenshot: Intel Advisor, "Estimated data transfer with reuse"]

In Intel VTune the only way I found is the "Memory Access" analysis, but it expresses the result as a number of loads and stores, probably collected with hardware counters. So if there are repeated reads from main memory caused by huge data structures, they are all counted, and it does not return the number of bytes.

[Screenshot: Intel VTune "Memory Access" analysis results, showing loads and stores]

Is there a way to perform a similar analysis with Intel VTune? Thanks

JaideepK_Intel
Moderator

Hi,

Good day to you.

Thank you for posting in the Intel communities.

If your system has an Intel Core CPU with integrated graphics (for example, Intel UHD Graphics), you already have an Intel GPU. When you run the GPU Offload analysis in Intel VTune Profiler, you can see GPU memory access (read and write) metrics in GB/sec. I have attached a screenshot for your reference. To run the GPU Offload analysis, you need a sample that runs on the GPU as well as a system with an Intel GPU.

[Screenshot: Intel VTune GPU Offload analysis, GPU memory read and write bandwidth in GB/sec]

If this resolves your issue, make sure to accept this as a solution. This would help others with a similar issue. Thank you!

Regards,

Jaideep

HPCAnalisys
Beginner

Thanks for your answer! So the analysis you are proposing is not an estimate of the amount of data to be offloaded to a GPU, but a real measurement that requires a graphics card to perform and collect the data? Is the type of the variable taken into account (float, double, ...) or only the number of reads and writes?

JaideepK_Intel
Moderator

Hi,

Good day to you.

>>So the analysis you are proposing is not an estimate of the amount of data to be offloaded to a GPU, but a real measurement that requires a graphics card to perform and collect the data?

Yes. The data we get is not an estimate; it is a real measurement.

>>Is the type of the variable taken into account (float, double, ...) or only the number of reads and writes?

It is only the number of reads and writes.

If this resolves your issue, make sure to accept this as a solution. This would help others with a similar issue. Thank you!

Regards,

Jaideep

HPCAnalisys
Beginner

Thanks for the clarification! Before marking the thread as solved, I would like to be sure that there is no way to estimate data transfer with reuse, as Intel Advisor does, for GPU offload analysis with Intel VTune. It only performs real measurements, is that correct?

I saw that the "Memory Access" analysis also counts the number of accesses, expressed as loads and stores, so if I know that they are floats I need to multiply by 4 bytes to get the number of bytes read and written, correct?

Thanks again.

JaideepK_Intel
Moderator

Hi,

Good day to you.

Sorry for the delay.

>>I would like to be sure that there is no way to estimate data transfer with reuse, as Intel Advisor does, for GPU offload analysis with Intel VTune. It only performs real measurements, is that correct?

Yes, VTune only provides real measurements.

>>I saw that the "Memory Access" analysis also counts the number of accesses, expressed as loads and stores, so if I know that they are floats I need to multiply by 4 bytes to get the number of bytes read and written, correct?

Yes, if the elements are floats, you can multiply by 4 bytes.
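A minimal Python sketch of this conversion follows. It assumes each counted load or store moves exactly one scalar element; vectorized (SIMD) loads and stores move 16, 32, or 64 bytes per instruction, so for vectorized code this underestimates the bytes moved. The function name `accesses_to_bytes` and the counts in the example are hypothetical, not values taken from a VTune result.

```python
def accesses_to_bytes(loads: int, stores: int,
                      bytes_per_access: int = 4) -> tuple[int, int]:
    """Hypothetical helper: convert load/store counts to (bytes_read, bytes_written).

    Assumes each counted load/store accesses exactly one scalar element of
    `bytes_per_access` bytes (4 for float, 8 for double). Vectorized (SIMD)
    accesses move more bytes per instruction, so this underestimates them.
    """
    return loads * bytes_per_access, stores * bytes_per_access


# Hypothetical counts, e.g. copied by hand from a Memory Access result.
bytes_read, bytes_written = accesses_to_bytes(loads=1_000_000, stores=250_000)
print(bytes_read, bytes_written)  # 4000000 1000000
```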

 

If this resolves your issue, make sure to accept this as a solution. This would help others with a similar issue. Thank you!

Regards,

Jaideep

JaideepK_Intel
Moderator

Hi,


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks,

Jaideep

