Hi,
I have two questions about what I see in the profiling log.
I'm using the DPC++ compiler with Intel oneAPI.
Development environment:
OS: Windows 10 Home (64bit)
CPU: Intel Core i7-1065G7 1.3 GHz
GPU: Intel Iris Plus Graphics
I have installed Intel oneAPI Base Toolkit beta Update 8.
I ran the following command, where test_matrix.exe is my executable built with the DPC++ compiler:
advixe-cl --collect=roofline --profile-gpu --project-dir=C:\Test\Release --search-dir src:r=C:\Test\src -- C:\Test\Release\test_matrix.exe
<Question 1>
The following warning is displayed during the survey analysis:
advixe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: advixe-cl -r C:\Test\Release\e000\hs008 -command stop.
advixe: Warning: [Instrumentation Engine]: GTPin: GTPin didn't find any kernels... Exiting without doing anything.
advixe: Collection stopped.
・What is the cause of this?
・In my executable, the matrix operation runs in parallel on the GPU (DPC++).
Is it not being profiled correctly?
<Question 2>
The GFLOPS value displayed in the log is 0 for both the survey analysis and the tripcounts analysis.
Output log example:
Elapsed Time: 5.23s
Total CPU time: 3.83101
Time in 1 vectorized loop: 0.298428
GFLOPS: 0
・Is it not being profiled correctly?
Is there a way to verify that the result is correct?
Best regards.
Hi,
Regarding your first question: this warning is fine for the first 'survey' step of the collection.
For the second one: what is the size of the multiplied matrices? Please note that the kernel has to run for at least 10 ms (longer is better).
BR, Anton
I also tried a matrix size of 4096, but the GFLOPS value was still 0.
Elapsed Time: 14.74s
Total CPU time: 12.5268
Time in 2 vectorized loops: 12.02
GFLOPS: 0
Why does the kernel need to run for more than 10 ms?
Is it impossible to perform the Advisor roofline analysis otherwise?
Hi,
Can you share your source code?
Advisor needs about 10 ms so that at least a couple of time-sampling hits land inside the kernel; in other words, to get more reliable results.
BR, Anton
Hi,
I have attached the source code as a zip file (TestCodeDCP_IntelAdvisor.zip).
Development environment: Microsoft Visual Studio Professional 2019 Version 16.5.5
I want to measure the performance of the following GPU parallel-processing part:
(src\multiply.cpp, lines 26-53)
Best regards.
Hi @k_higashi, could you please try to use Advisor 2021.2.0 and let us know the result?
Thanks, Mariya
Hi,
We have not heard from you in a while. If you need any additional information, please submit a new question as this thread will no longer be monitored.
Regards
Gopika