
About GPU profile log from command line

k_higashi
Beginner

Hi.

I have two questions about what I see in the profile log.
I'm using the DPC++ compiler with Intel oneAPI.

Development environment:
OS: Windows 10 Home (64-bit)
CPU: Intel Core i7-1065G7, 1.3 GHz
GPU: Intel Iris Plus Graphics
Intel oneAPI Base Toolkit beta Update 8 is installed.

I ran the following command.
test_matrix.exe is my executable, built with the DPC++ compiler.

advixe-cl --collect=roofline --profile-gpu --project-dir=C:\Test\Release --search-dir src:r=C:\Test\src -- C:\Test\Release\test_matrix.exe

 

<Question 1>

The following warning is displayed during the survey analysis.

advixe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: advixe-cl -r C:\Test\Release\e000\hs008 -command stop.
advixe: Warning: [Instrumentation Engine]: GTPin: GTPin didn't find any kernels... Exiting without doing anything.
advixe: Collection stopped.

・What causes this warning?
・In my executable, the matrix operation runs on the GPU as DPC++ parallel processing.
    Is it not being profiled correctly?

<Question 2>

The GFLOPS value displayed in the log is 0 for both the survey analysis and the trip counts analysis.

Output log example:

Elapsed Time: 5.23s
Total CPU time: 3.83101
Time in 1 vectorized loop: 0.298428
GFLOPS: 0

・Is it not being profiled correctly?
Is there a way to confirm that the result is correct?

Best regards.

AntonT
Employee

Hi,

Regarding your first question: this is fine for the first 'survey' step of the collection.

For the second one: what is the size of the matrices being multiplied? Please note that the kernel has to run for at least 10ms (longer is better).

BR, Anton

k_higashi
Beginner
Thank you for your answer.
 
>first question
OK.
 
>second question
The size of the multiplied matrices is 1024.
I also tried a matrix size of 4096, but the GFLOPS value was still 0.
 
Output log (excerpt):
Elapsed Time: 14.74s
Total CPU time: 12.5268
Time in 2 vectorized loops: 12.02
GFLOPS: 0
I think the process takes more than 10ms.
 
・Another question (about the reason for the restriction):
>that the kernel has to run at least 10ms (longer is better).

I want to know why more than 10ms is needed.
 
If the matrix size is small (e.g. N = 256, a 256x256 matrix) and the processing time is short,
is it impossible to run the Advisor roofline analysis?
 
Best regards.
AntonT
Employee

Hi,

Can you share your source code?

Advisor needs 10ms so that at least a couple of time-sampling hits land inside the kernel. In other words, a longer-running kernel gives more reliable results.

BR, Anton

 

k_higashi
Beginner

Hi.

I have attached the source code as a zip file (TestCodeDCP_IntelAdvisor.zip).

Development environment: Microsoft Visual Studio Professional 2019 Version 16.5.5

I want to measure the performance of the following GPU parallel processing part:
src\multiply.cpp, lines 26-53.

Best regards.

Mariya_P_Intel
Moderator

Hi @k_higashi, could you please try Advisor 2021.2.0 and let us know the result?

https://software.intel.com/content/www/us/en/develop/articles/oneapi-standalone-components.html#advisor

Thanks, Mariya


Gopika_Intel
Moderator

Hi,

We have not heard from you in a while. If you need any additional information, please submit a new question as this thread will no longer be monitored.

Regards

Gopika

