About GPU profile log from command line

k_higashi
Beginner

Hi.

I have two questions about what I see in the profile log.
I'm using the DPC++ compiler from Intel oneAPI.

Development environment:
OS: Windows 10 Home (64-bit)
CPU: Intel Core i7-1065G7 1.3GHz
GPU: Intel Iris Plus Graphics
I have installed the Intel oneAPI Base Toolkit beta Update 8.

I ran the following command.
test_matrix.exe is my executable, built with the DPC++ compiler.

advixe-cl --collect=roofline --profile-gpu --project-dir=C:\Test\Release --search-dir src:r=C:\Test\src -- C:\Test\Release\test_matrix.exe

 

<Question 1>

The following warning is displayed during the survey analysis.

advixe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: advixe-cl -r C:\Test\Release\e000\hs008 -command stop.
advixe: Warning: [Instrumentation Engine]: GTPin: GTPin didn't find any kernels... Exiting without doing anything.
advixe: Collection stopped.

・What causes this warning?
・My executable performs the matrix operation on the GPU using DPC++ parallel processing.
    Does this warning mean the kernel is not being profiled correctly?

<Question 2>

The GFLOPS value shown in the log is 0 for both the survey analysis and the trip counts analysis.

Output log example:

Elapsed Time: 5.23s
Total CPU time: 3.83101
Time in 1 vectorized loop: 0.298428
GFLOPS: 0

・Does this mean the program is not being profiled correctly?
・Is there a way to verify that the result is correct?

Best regards.
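
One way to cross-check a reported GFLOPS value by hand is to time the computation and divide the operation count by the elapsed time. The sketch below is not from the original post. It assumes a dense N x N multiply costs 2*N^3 floating-point operations, and it uses a plain CPU triple loop as a stand-in for the DPC++ kernel. The helper names `matmul_flops`, `gflops`, `time_seconds`, and `reference_matmul` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Floating-point operations in a dense N x N matrix multiply:
// N^3 multiplications plus N^3 additions (assumed cost model).
double matmul_flops(std::size_t n) {
    const double d = static_cast<double>(n);
    return 2.0 * d * d * d;
}

// Convert an operation count and a wall-clock time into GFLOPS.
double gflops(double flops, double seconds) {
    assert(seconds > 0.0);
    return flops / seconds / 1e9;
}

// Run `work` once and return the elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F&& work) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Plain CPU reference multiply (row-major), standing in for the GPU kernel.
// It accumulates into c, so c must be zero-initialized by the caller.
void reference_matmul(const std::vector<float>& a, const std::vector<float>& b,
                      std::vector<float>& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

For N = 1024 the operation count is 2,147,483,648, so a run that takes 1 second corresponds to roughly 2.1 GFLOPS. To check the real kernel, the timed region would wrap the kernel submission and the wait for its completion instead of `reference_matmul`.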

6 Replies
AntonT
Employee

Hi,

Regarding your first question: this warning is expected during the first ('survey') step of the collection, so it is fine.

For the second one: what is the size of the matrices being multiplied? Please note that the kernel has to run for at least 10 ms (longer is better).

BR, Anton

k_higashi
Beginner
Thank you for your answer.
 
>first question
OK.
 
>second question
The size of the matrices being multiplied is 1024.
I also tried a matrix size of 4096, but the GFLOPS value was still 0.
 
Output log (excerpt):
Elapsed Time: 14.74s
Total CPU time: 12.5268
Time in 2 vectorized loops: 12.02
GFLOPS: 0
I think the processing takes more than 10 ms.
 
・Another question (about the reason for the restriction):
>that the kernel has to run at least 10ms (longer is better).

I would like to know why at least 10 ms is needed.
 
If the matrix size is small (e.g. N=256x256) and the processing time is short,
is it impossible to run the Advisor roofline analysis?
 
Best regards.
AntonT
Employee

Hi,

Can you share your source code?

Advisor needs at least 10 ms so that at least a couple of time-sampling hits land inside the kernel. In other words, a longer kernel gives more reliable results.

BR, Anton
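
The sampling argument above can be illustrated with a little arithmetic. The sketch below is not from Anton's reply. It assumes samples are taken at a fixed interval, so the expected number of hits inside a kernel is the kernel duration divided by that interval. The thread does not state the actual interval Advisor uses, so `interval_ms` is a free parameter and both function names are hypothetical.

```cpp
#include <cassert>

// Expected number of timer samples that land inside a kernel, given the
// kernel duration and the sampling interval (both in milliseconds).
// Assumes fixed-interval sampling; the real interval is not given in the thread.
double expected_sample_hits(double kernel_ms, double interval_ms) {
    assert(interval_ms > 0.0);
    return kernel_ms / interval_ms;
}

// Shortest kernel duration (ms) that yields at least `hits` expected samples.
double min_kernel_ms(double hits, double interval_ms) {
    assert(interval_ms > 0.0);
    return hits * interval_ms;
}
```

With a purely hypothetical 5 ms interval, `expected_sample_hits(10.0, 5.0)` is 2, matching "a couple of hits" for a 10 ms kernel, while a 1 ms kernel gives 0.2, meaning most runs would record no sample inside it at all.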

 

k_higashi
Beginner

Hi.

I have attached the source code as a zip file (TestCodeDCP_IntelAdvisor.zip).

Development environment: Microsoft Visual Studio Professional 2019 Version 16.5.5

I want to measure the performance of the following GPU parallel processing part:
src\multiply.cpp, lines 26-53.

Best regards.

Mariya_P_Intel
Moderator

Hi @k_higashi, could you please try Advisor 2021.2.0 and let us know the result?

https://software.intel.com/content/www/us/en/develop/articles/oneapi-standalone-components.html#advisor

Thanks, Mariya


Gopika_Intel
Moderator

Hi,

We have not heard from you in a while. If you need any additional information, please submit a new question, as this thread will no longer be monitored.

Regards

Gopika

