Intel VTune - Estimate data offload to GPU

HPCAnalisys — Mon, 24 Oct 2022 16:29:32 GMT

Hi, I'm interested in using Intel VTune to estimate the data transfer, in bytes, that an algorithm or function would incur if it were executed on a GPU. For example, if my algorithm multiplies two vectors of 10 float elements each into a single result, the transfer after offloading would be 10 + 10 float elements sent to the GPU and 1 result element sent back, so 84 bytes in total (21 * 4). Keep in mind that I'm interested in an estimate, not an actual measurement on a GPU, since I don't have one available.
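The arithmetic in this example can be written out as a short Python sketch. It is only a back-of-the-envelope calculation, not something VTune or Advisor produces: the function name `estimate_transfer_bytes` is hypothetical, and it assumes every input element is copied to the device once and every output element is copied back once, with no reuse or padding.

```python
import struct

# Element sizes in bytes, taken from the platform's native C type sizes.
ELEMENT_SIZE = {
    "float": struct.calcsize("f"),
    "double": struct.calcsize("d"),
}


def estimate_transfer_bytes(inputs, outputs, dtype="float"):
    """Hypothetical helper: estimate host<->device traffic in bytes.

    inputs  -- element counts of the buffers sent to the device
    outputs -- element counts of the buffers copied back to the host
    Assumes each element crosses the bus exactly once.
    """
    size = ELEMENT_SIZE[dtype]
    to_device = sum(inputs) * size
    from_device = sum(outputs) * size
    return to_device, from_device, to_device + from_device


# Two input vectors of 10 floats, one float result.
to_dev, from_dev, total = estimate_transfer_bytes(inputs=[10, 10], outputs=[1])
print(to_dev, from_dev, total)  # 80 4 84
```

Switching `dtype` to `"double"` doubles each figure, which is why the element type matters for this kind of estimate.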

With Intel Advisor this is possible: the metric is called "Estimated data transfer with reuse", as shown in the screenshot I attached.

In Intel VTune, the only way I found is the "Memory Access" analysis, but it expresses the result as a number of loads and stores, probably obtained from hardware counters. So if huge data structures cause multiple reads from main memory, all of them are counted, and the analysis does not return a number of bytes.

Is there a way to perform a similar analysis with Intel VTune? Thanks

Re: Intel VTune - Estimate data offload to GPU

JaideepK_Intel — Wed, 26 Oct 2022 13:24:09 GMT

Hi,

Good day to you.

Thank you for posting in the Intel communities.

If you have an Intel Core CPU in your system, it will most likely include integrated Intel UHD Graphics. When you run the GPU Offload analysis in Intel VTune Profiler, you can see GPU memory access (read and write) metrics in GB/sec. I have attached a screenshot for your reference. To try the GPU Offload analysis, you need a sample that runs on the GPU as well as a system with an Intel GPU.
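Since these metrics are rates rather than byte counts, one rough way to get back to a byte figure is to multiply an average bandwidth by the time it was sustained. The sketch below is only an illustration: `bandwidth_to_bytes` is a hypothetical helper, the input values are made up, and it assumes the reported GB means 10^9 bytes, which should be checked against the tool's documentation.

```python
def bandwidth_to_bytes(gb_per_sec, seconds, bytes_per_gb=1_000_000_000):
    """Hypothetical helper: approximate bytes moved from an average rate.

    gb_per_sec   -- average read or write bandwidth over the interval
    seconds      -- duration of the interval
    bytes_per_gb -- assumption: decimal gigabytes (10**9 bytes)
    """
    return round(gb_per_sec * seconds * bytes_per_gb)


# Made-up example: 2.0 GB/sec sustained for half a second.
print(bandwidth_to_bytes(2.0, 0.5))  # 1000000000
```

The result is measured memory traffic on a real device, so it is not comparable to an offload estimate made without a GPU.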

If this resolves your issue, make sure to accept this as a solution. This would help others with similar issues. Thank you!

Regards,

Jaideep

Re: Intel VTune - Estimate data offload to GPU

HPCAnalisys — Wed, 26 Oct 2022 15:18:18 GMT

Thanks for your answer! So the analysis you are proposing is not an estimate of the amount of data to be offloaded to a GPU, but a real measurement that requires a graphics card to run and collect the data? Is the type of the variable taken into account (float, double...), or only the number of reads and writes?

Re: Intel VTune - Estimate data offload to GPU

JaideepK_Intel — Fri, 28 Oct 2022 07:30:38 GMT

Hi,

Good day to you.

>>So the analysis you are proposing is not an estimate of the amount of data to be offloaded to a GPU, but a real measurement that requires a graphics card to run and collect the data?

Yes, the data we get is a real measurement, not an estimate.

>>Is the type of the variable taken into account (float, double...), or only the number of reads and writes?

It is only the number of reads and writes.

If this resolves your issue, make sure to accept this as a solution. This would help others with similar issues. Thank you!

Regards,

Jaideep

Re: Intel VTune - Estimate data offload to GPU

HPCAnalisys — Fri, 28 Oct 2022 12:45:20 GMT

Thanks for the clarification! Before marking the thread as solved, I would like to be sure that Intel VTune has no way to estimate data transfer with reuse for GPU offloading analysis, as Intel Advisor does. It only performs real measurements, is that correct?

I also saw that the "Memory Access" analysis reports the number of accesses as loads and stores, so if I know the elements are floats, I need to multiply by 4 bytes to get the number of bytes read and written, correct?

Thanks again.

Re: Intel VTune - Estimate data offload to GPU

JaideepK_Intel — Fri, 25 Nov 2022 05:03:15 GMT

Hi,

Good day to you.

Sorry for the delay.

>>I would like to be sure that Intel VTune has no way to estimate data transfer with reuse for GPU offloading analysis, as Intel Advisor does. It only performs real measurements, is that correct?

Yes, VTune only gives real measurements.

>>I also saw that the "Memory Access" analysis reports the number of accesses as loads and stores, so if I know the elements are floats, I need to multiply by 4 bytes to get the number of bytes read and written, correct?

If the elements are floats, you can multiply the number of loads and stores by 4 bytes.
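As a minimal sketch of that conversion, the snippet below multiplies load and store counts by the element size. The name `accesses_to_bytes` and the counts are hypothetical, and the result is only an approximation: it assumes each counted access moves exactly one element of the given type, which does not hold when, for instance, a single vectorized load moves several elements at once.

```python
import struct


def accesses_to_bytes(loads, stores, type_code="f"):
    """Hypothetical helper: convert access counts to approximate bytes.

    type_code is a struct format character: "f" for float, "d" for double.
    Assumes one element of that type per counted load or store.
    """
    size = struct.calcsize(type_code)
    return loads * size, stores * size


# Made-up counts matching the 10 + 10 element example from the first post.
bytes_read, bytes_written = accesses_to_bytes(loads=20, stores=1)
print(bytes_read, bytes_written)  # 80 4
```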

If this resolves your issue, make sure to accept this as a solution. This would help others with similar issues. Thank you!

Regards,

Jaideep

Re: Intel VTune - Estimate data offload to GPU

JaideepK_Intel — Mon, 12 Dec 2022 06:11:56 GMT

Hi,

We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.

Thanks,

Jaideep