Re: Issue with VTune source analysis for GPU Hotspots

SampathRachumallu · 05-06-2025

Hi,

I get the warning below when running the GPU Hotspots analysis in VTune on a SYCL application:

vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions

I tried enabling debug information, as described in the post, using the -gline-tables-only and -fdebug-info-for-profiling flags. However, with -gline-tables-only I got the error below:

llvm-foreach: Segmentation fault (core dumped)
icpx: error: gen compiler command failed with exit code 254 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2025.1.1 (2025.1.1.20250418)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2025.1/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2025.1/bin/compiler/../icpx.cfg
icpx: note: diagnostic msg: Error generating preprocessed source(s)

Please help me resolve this issue.
My configuration is below:
OS: Ubuntu 22.04.4 LTS
GPU: Intel Data Center GPU Max 1550 (PVC)
VTune: Intel(R) VTune(TM) Profiler 2025.3.0 (build 630104)
Compiler: Intel(R) oneAPI DPC++/C++ Compiler 2025.1.1 (2025.1.1.20250418)

Output of sycl-ls:

[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Data Center GPU Max 1550 12.60.7 [1.6.31294+9]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Xeon(R) Platinum 8352Y CPU @ 2.20GHz OpenCL 3.0 (Build 0) [2025.19.4.0.18_160000.xmain-hotfix]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1550 OpenCL 3.0 NEO  [24.39.31294]

yuzhang3_intel · 05-06-2025

This warning only means that VTune cannot locate debugging information for the Linux kernel. Do you need to trace code in the Linux kernel? I assume you want to profile the SYCL application, in which case you only need to build your application with debug information.

You can refer to the CMake file from the oneAPI sample code below:

oneAPI-samples/Tools/VTuneProfiler/matrix_multiply_vtune/CMakeLists.txt

set(CMAKE_CXX_COMPILER icpx)
cmake_minimum_required(VERSION 3.4)
project(matrix_multiply)
set(CMAKE_CXX_FLAGS "-g -O3 -fsycl -Wno-write-strings -w -D_Linux")
add_executable(matrix.dpcpp src/matrix.cpp src/multiply.cpp)
add_custom_target(run ./matrix.dpcpp)

https://github.com/oneapi-src/oneAPI-samples

SampathRachumallu · 05-06-2025

Hi,
Thanks for the reply.

I need to trace the code in the SYCL kernel and find out which part of the source code is causing the bottleneck.
Does -g help with source analysis in VTune?

I just ran the VTune self-checker to verify whether there is any issue with the installation.

It gave the output below:

The system is ready for the following analyses:
* Performance Snapshot
* Hotspots and Threading with user-mode sampling
* Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
* Microarchitecture Exploration
* Memory Access
* Hotspots with HW event-based sampling and call stacks
* Threading with HW event-based sampling

The following analyses have failed on the system:
* GPU Compute/Media Hotspots (characterization mode)
* GPU Compute/Media Hotspots (source analysis mode)

I am attaching the complete log.

yuzhang3_intel · 05-06-2025

If you need to profile kernel code, you need to build the kernel with debug information using the '-g' option.

https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2025-1/enabling-linux-kernel-analysis.html

You can profile a GPU workload to check whether GPU profiling works, for example:

vtune -collect gpu-hotspots -- {your application}
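The command above can be wrapped in a small script when the collection runs on a remote machine and the result is viewed elsewhere. This is a minimal sketch, assuming bash on the remote Linux machine: the function name collect_gpu_hotspots, the DRY_RUN switch and the directory names are hypothetical; only vtune -collect gpu-hotspots, -r (result directory) and the -- separator before the application are standard VTune command-line syntax.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): run the GPU Hotspots analysis and write the
# result to a named directory, so the whole folder can be copied to another
# machine and opened in the VTune GUI.
set -euo pipefail

collect_gpu_hotspots() {
  local app="$1"         # SYCL executable, built with -g
  local result_dir="$2"  # VTune result directory to create
  shift 2                # any remaining arguments are passed to the application
  local cmd=(vtune -collect gpu-hotspots -r "$result_dir" -- "$app" "$@")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example usage (hypothetical names):
#   collect_gpu_hotspots ./my_sycl_app gpu_hotspots_result
#   tar czf gpu_hotspots_result.tar.gz gpu_hotspots_result
```

Naming the result directory explicitly makes it easy to archive and download exactly the folder that belongs to one run.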

SampathRachumallu · 05-07-2025

Hi @yuzhang3_intel,

Thanks for the clarification.

Source code analysis is now enabled with the -g flag, but it points to incorrect lines in the source code mapping window.

I have multiple variants of a kernel in a .hpp file, and the source code analysis maps the metrics to a variant that is not even executed. Also, I am running the VTune analysis on a remote Linux machine, which writes the VTune output there. I download this output folder and view the results on my Windows machine.

Could this cause any problem with the mapping?

I am also attaching the source code files for your reference (reduction_sum_1d_header.c is the actual kernel file).

Note: I renamed the .hpp file to _header.c because the portal reported an issue with the content of the .hpp file.

yuzhang3_intel · 05-07-2025

The binary with debug information must match the source. For example, if you rebuild the kernel, you need to re-map the binary to the corresponding source code.

SampathRachumallu · 05-07-2025

I just rechecked.
The binary and source code match correctly.

Any suggestion on what else could cause this?

yuzhang3_intel · 05-07-2025

Are the binary/symbol and source paths set correctly?

SampathRachumallu · 05-07-2025

Yes.
I downloaded the binary and source files from the remote machine and added them to the correct search paths in the VTune GUI on my Windows machine.
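One way to double-check that the downloaded copies are byte-identical to the files that were built and profiled on the remote machine is to compare checksums. The sketch below assumes a bash environment with GNU coreutils (sha256sum); digest and same_content are hypothetical helper names. On Windows, PowerShell's Get-FileHash computes SHA-256 by default; its digest is printed in uppercase hex, so compare it case-insensitively with sha256sum's lowercase output.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): compare files by SHA-256 digest, e.g. the
# profiled binary on the remote machine and the copy given to the VTune GUI.
set -euo pipefail

# Print only the lowercase hex SHA-256 digest of a file.
# Reading from stdin avoids sha256sum's escaping of unusual file names.
digest() {
  sha256sum < "$1" | cut -d ' ' -f 1
}

# Succeed (exit status 0) if both files have identical contents.
same_content() {
  [[ "$(digest "$1")" == "$(digest "$2")" ]]
}

# Example usage (hypothetical paths):
#   digest ./my_sycl_app        # run on each machine and compare the digests
#   same_content build/my_sycl_app copy/my_sycl_app && echo "identical"
```

Running digest on both the binary and the kernel header on each machine covers both halves of the binary/source match discussed above.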

SampathRachumallu · 05-07-2025

I added the -O2 flag to the compilation this time, and this seems to have resolved the issue.
Now I am able to see the correct mapping. Thanks!

yuzhang3_intel · 05-07-2025

Great! Was the original option you used -O3?

SampathRachumallu · 05-07-2025

I had not used any optimization flag previously.
I actually added the -O3 flag, not -O2; there was a typo in my previous answer. Kindly note.
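For reference, a minimal sketch of the compile step that resolved the mapping for the poster: debug information (-g) together with an explicit optimization level (-O3), the same pair of flags used in the sample CMakeLists.txt quoted earlier. The helper name build_for_vtune, the file names and the DRY_RUN switch are hypothetical; -fsycl, -g, -O3 and -o are standard icpx options.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): compile a SYCL source with debug information
# and an explicit optimization level, the combination that fixed the source
# mapping in this thread.
set -euo pipefail

build_for_vtune() {
  local src="$1"  # SYCL source file that includes the kernel header
  local out="$2"  # name of the executable to produce
  local cmd=(icpx -fsycl -g -O3 "$src" -o "$out")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example usage (hypothetical file names):
#   build_for_vtune main.cpp my_sycl_app
```

After each rebuild, the binary and sources given to the GUI should come from the same build, since the binary must match the source for the mapping to be correct.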
