Issue with VTune Source analysis for gpu hotspot

SampathRachumallu

Hi,

I get the warning below when running a GPU Hotspots analysis with VTune on a SYCL application:

vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions

 
I tried enabling debug information as mentioned in the post, using the -gline-tables-only and -fdebug-info-for-profiling flags. But with -gline-tables-only I got the error below:

llvm-foreach: Segmentation fault (core dumped)
icpx: error: gen compiler command failed with exit code 254 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2025.1.1 (2025.1.1.20250418)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2025.1/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2025.1/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s)


Please help me resolve this issue.
My configuration:
OS: Ubuntu 22.04.4 LTS
GPU: Intel Data Center GPU Max 1550 (PVC)
VTune: Intel(R) VTune(TM) Profiler 2025.3.0 (build 630104)
Compiler: Intel(R) oneAPI DPC++/C++ Compiler 2025.1.1 (2025.1.1.20250418)

Output of sycl-ls:

[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Data Center GPU Max 1550 12.60.7 [1.6.31294+9]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Xeon(R) Platinum 8352Y CPU @ 2.20GHz OpenCL 3.0 (Build 0) [2025.19.4.0.18_160000.xmain-hotfix]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1550 OpenCL 3.0 NEO  [24.39.31294]

 

yuzhang3_intel
Moderator

This warning only means that VTune cannot locate debugging information for the Linux kernel. Do you need to trace code in the Linux kernel itself? I suppose you want to profile the SYCL application, so you only need to build your application with debug information.

 

You can refer to the CMake file in the oneAPI sample code below:

oneAPI-samples/Tools/VTuneProfiler/matrix_multiply_vtune/CMakeLists.txt

set(CMAKE_CXX_COMPILER icpx)
cmake_minimum_required(VERSION 3.4)
project(matrix_multiply)
set(CMAKE_CXX_FLAGS "-g -O3 -fsycl -Wno-write-strings -w -D_Linux")
add_executable(matrix.dpcpp src/matrix.cpp src/multiply.cpp)
add_custom_target(run ./matrix.dpcpp)

 

https://github.com/oneapi-src/oneAPI-samples
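For a project that does not use CMake, the same flags can be passed directly on the command line. The following is only a minimal sketch: the file names app.cpp and app are hypothetical placeholders, and the flags are the ones from the sample CMakeLists.txt above (-fsycl, -g, -O3).

```shell
#!/bin/sh
# Hypothetical names: app.cpp and app are placeholders, not files from this thread.
SRC="app.cpp"
OUT="app"

# -fsycl compiles SYCL code, -g emits debug information for source-level
# mapping, and -O3 matches the optimization level used in the sample.
BUILD_CMD="icpx -fsycl -g -O3 $SRC -o $OUT"
echo "$BUILD_CMD"

# Build only when the compiler and the source file are actually present.
if command -v icpx >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $BUILD_CMD
fi
```

The script prints the command first, so it can be reviewed before it is run on a machine where the oneAPI environment has been set up.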

SampathRachumallu

Hi,
Thanks for the reply.

I need to trace the code in the kernel and find out which part of the source code is creating the bottleneck.
Does -g help with source analysis in VTune?

 

I have just run the VTune self-checker to verify whether there is any issue with the installation.

It gave the output below:

 

The system is ready for the following analyses:
* Performance Snapshot
* Hotspots and Threading with user-mode sampling
* Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
* Microarchitecture Exploration
* Memory Access
* Hotspots with HW event-based sampling and call stacks
* Threading with HW event-based sampling

The following analyses have failed on the system:
* GPU Compute/Media Hotspots (characterization mode)
* GPU Compute/Media Hotspots (source analysis mode)

I am attaching the complete log.
 

yuzhang3_intel
Moderator

If you need to profile kernel code, you need to build the kernel with debug information using the '-g' option.

 

https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2025-1/enabling-linux-kernel-analysis.html

 

To check whether GPU profiling is ready, you can profile a GPU workload, for example:

vtune -collect gpu-hotspots -- {your application}
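Since the goal in this thread is source-level mapping, the same collection can also be requested in source analysis mode. This is a hedged sketch rather than a verified recipe for this system: ./app and the result directory name r_gpu_src are hypothetical, and the knob name profiling-mode=source-analysis is taken from the VTune documentation as I recall it, so it is worth confirming with vtune -help collect gpu-hotspots on the installed version.

```shell
#!/bin/sh
# Hypothetical names: ./app and r_gpu_src are placeholders.
APP="./app"
RESULT_DIR="r_gpu_src"

# Assumption: the profiling-mode knob selects source analysis instead of the
# default characterization mode; check the knob list of your VTune version.
COLLECT_CMD="vtune -collect gpu-hotspots -knob profiling-mode=source-analysis -result-dir $RESULT_DIR -- $APP"
echo "$COLLECT_CMD"

# Collect only when vtune is on PATH and the application binary exists.
if command -v vtune >/dev/null 2>&1 && [ -x "$APP" ]; then
    $COLLECT_CMD
fi
```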

 

SampathRachumallu

Hi @yuzhang3_intel ,

 

Thanks for the clarification.

Source-level analysis now works with the -g flag, but the source code mapping window points to incorrect lines.

I have multiple variants of a kernel in a .hpp file, and the source analysis maps the metrics to a variant that is not even executed. Also, I run the VTune analysis on a remote Linux machine, where the result files are written. I then download the result folder and view the results on my Windows machine.

 

Will this create any problem with the mapping?

 

I am also attaching the source code files for your reference (reduction_sum_1d_header.c is the actual kernel file).

Note: I have renamed the .hpp file to _header.c because the portal reported an issue with the content of the .hpp file.

yuzhang3_intel
Moderator

The binary with debug information must match the source. For example, if you rebuild the kernel, you need to re-map the binary to the corresponding source code.

SampathRachumallu

I just rechecked.
The binary and source code match correctly.

Any suggestions on what else could cause this?

yuzhang3_intel
Moderator

Are the binary/symbol and source search paths set correctly?

SampathRachumallu

Yes.
I downloaded the binary and the sources from the remote machine and added them to the correct search paths in the VTune GUI on my Windows machine.
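For readers who prefer the command line over the GUI, search paths can also be supplied when re-finalizing a downloaded result. The sketch below assumes a POSIX shell and uses hypothetical directory names (r_gpu_src, ./downloaded/bin, ./downloaded/src); on Windows the paths and shell syntax would differ.

```shell
#!/bin/sh
# Hypothetical locations of the downloaded result, binaries, and sources.
RESULT_DIR="r_gpu_src"
BIN_DIR="./downloaded/bin"
SRC_DIR="./downloaded/src"

# -search-dir points VTune at binaries/symbols, -source-search-dir at sources.
FINALIZE_CMD="vtune -finalize -result-dir $RESULT_DIR -search-dir $BIN_DIR -source-search-dir $SRC_DIR"
echo "$FINALIZE_CMD"

# Re-finalize only when vtune is on PATH and the result directory exists.
if command -v vtune >/dev/null 2>&1 && [ -d "$RESULT_DIR" ]; then
    $FINALIZE_CMD
fi
```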

SampathRachumallu

I added the -O2 flag to the compilation this time, and this seems to have resolved the issue.
Now I am able to see the correct mapping. Thanks!

yuzhang3_intel
Moderator

Great! Was the original option you used -O3?

SampathRachumallu

I had not used any optimization flag previously.
I have actually added the -O3 flag now, not -O2. There was a typo in my previous answer. Kindly note.
