Re: VTune "context already exists" on Source Analysis of a custom SYCL plugin in PyTorch

martinmcgg · ‎01-19-2024

Hello,

I am optimizing SYCL kernels after porting them from CUDA (stylegan3) and can't get the Source Analysis of "GPU Compute/Media Hotspots (preview)" to work for one of the kernels.

When the "filtered_lrelu" kernel runs (either as a part of a network inference via test_inference_simple.py, or separately via test_kernels.py), I get the following error (vtune-gui stdout after profiling the bench3.sh script):

+ python test_inference_simple.py
[...]
ZET_ENABLE_API_TRACING_EXP is deprecated. Use ZE_ENABLE_TRACING_LAYER instead
ZET_ENABLE_API_TRACING_EXP is deprecated. Use ZE_ENABLE_TRACING_LAYER instead
[...]
ZET_ENABLE_API_TRACING_EXP is deprecated. Use ZE_ENABLE_TRACING_LAYER instead
WARNING: SYCL_CACHE_DISABLE_PERSISTENT environment variable is deprecated and has no effect. By default, persistent device code caching is disabled. Use SYCL_CACHE_PERSISTENT=1/0 to enable/disable.
GTPin ERROR: Create Context failed - context already exists
at: CreateContext : 898

When I switch the mode to Characterization, or execute only the other two kernels ("bias_act" and "upfirdn2d"), profiling works fine, but I would like more granularity to see which part of the "filtered_lrelu" kernel is the bottleneck.

Have you encountered this or a similar issue? Do you have any idea what the cause may be, or any details about this error?

Details: I run the Python scripts through a script, bench3.sh, which sets up the needed Conda environment. The environment can be created using the instructions in the README, in case anyone wants to replicate it (I can help if necessary).

Note: since the kernel build takes a long time (or even gets stuck?) when running through VTune, I run my program once without profiling to JIT-build the kernel, then set allow_module_rebuild = False in custom_ops.py, and then run it through VTune so that it only loads the previously built kernel.
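A minimal sketch of the two-pass workflow described in the note, written as a small Python helper that only assembles the command lines. The VTune options are the ones quoted later in this thread ("-collect gpu-hotspots -knob profiling-mode=source-analysis"); the helper names, the default "./bench3.sh" path, and the optional result directory argument are assumptions made for illustration. Setting allow_module_rebuild = False in custom_ops.py remains a manual step between the two passes.

```python
import shlex


def vtune_command(script, result_dir=None):
    """Build a VTune command line for GPU hotspots in Source Analysis mode."""
    cmd = ["vtune", "-collect", "gpu-hotspots",
           "-knob", "profiling-mode=source-analysis"]
    if result_dir:
        # Optional: choose where VTune writes the result directory.
        cmd += ["-result-dir", result_dir]
    # Everything after "--" is the profiled application.
    cmd += ["--", script]
    return cmd


def two_pass_plan(script="./bench3.sh"):
    """Return the two commands: a plain warm-up run, then the profiled run."""
    # Pass 1: run natively so the kernels are JIT-built.
    # (Manual step in between: set allow_module_rebuild = False in custom_ops.py.)
    # Pass 2: run the same script under VTune, loading the already built kernels.
    return [[script], vtune_command(script)]


if __name__ == "__main__":
    for step in two_pass_plan():
        print(shlex.join(step))
```

Printing the commands instead of executing them keeps the sketch safe to run on a machine without VTune installed; each list could be passed to subprocess.run as-is.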

My HW+SW: i5-13500 + A770, Ubuntu 22.04, VTune Profiler 2024.0.0 (build 626834), Intel oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)

I am happy to answer any additional questions.

Thank you in advance for any tips.

Jennifer_D_Intel · ‎01-23-2024

Thank you for reporting this. Unfortunately I don't have a quick answer for you, but I've passed the info to the developers so they can take a look.

martinmcgg · ‎02-07-2024

Thank you for getting back to me. Here is the requested VTune output directory, generated when the error happened. I also attached a log showing how the directory was generated: a first run of the script to build the modules; then disabling the module build; an optional second run to ensure everything works; and finally a run through VTune showing the messages on output.

martinmcgg · ‎02-16-2024

I tried updating to VTune 2024.0.1 (Ubuntu package version 2024.0.1-11; previously I used 2024.0.0-49490), but the behavior is the same.

I have also noticed that running the inference with the stylegan2 architecture* hits the same error, even though it uses different kernels (bias_act and upfirdn2d, but not filtered_lrelu, which causes the same crash from test_kernels.py). Therefore I think this issue is unlikely to be caused by one specific kernel; more likely it lies in GTPin itself or in how its API is used, possibly when the profiled app has multiple threads. Speculating: perhaps a GTPin context is being created simultaneously from multiple threads?
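To make the speculation above concrete, here is a toy Python model of a component that allows only one context per process. It is purely illustrative: ContextRegistry, create, get_or_create and count_failures are hypothetical names, and nothing here reflects how GTPin or VTune is actually implemented. It only shows how a strict "create" call would reject every caller after the first when several threads each try to create the context, while an idempotent variant would not.

```python
import threading


class ContextRegistry:
    """Hypothetical stand-in for a tool that permits one context per process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._context = None

    def create(self):
        # Strict creation: any attempt after the first is reported as an error.
        with self._lock:
            if self._context is not None:
                raise RuntimeError("Create Context failed - context already exists")
            self._context = object()
            return self._context

    def get_or_create(self):
        # Idempotent variant: later callers simply receive the existing context.
        with self._lock:
            if self._context is None:
                self._context = object()
            return self._context


def count_failures(method_name, n_threads=4):
    """Call the given registry method from n_threads threads; count the errors."""
    registry = ContextRegistry()
    failures = []

    def worker():
        try:
            getattr(registry, method_name)()
        except RuntimeError as exc:
            failures.append(str(exc))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(failures)


if __name__ == "__main__":
    print("strict create failures:", count_failures("create"))
    print("get_or_create failures:", count_failures("get_or_create"))
```

With the strict variant, exactly one thread succeeds and the others fail regardless of scheduling, because creation is serialized by the lock. Whether anything comparable happens inside libgtpin_core.so is not known from this thread.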

I briefly looked at https://www.intel.com/content/www/us/en/developer/articles/tool/gtpin.html and downloaded GTPin (4.0) separately to see where the message comes from. 'Create Context failed - context already exists' is present in libgtpin_core.so, whose source I didn't find. If I manage to dig deeper, I will post an update, but if anyone else has an idea what could be the cause, I would be glad for a pointer.

* by uncommenting the last line in bench3.sh and commenting/removing the preceding Python executions and exit

yonyon · ‎02-27-2024

I have the same issue on an Alder Lake iGPU on Ubuntu 22.04 with VTune 2024.0.1.

martinmcgg · ‎04-18-2024

I reproduced the issue with VTune Profiler 2024.1.0 (build 627630). Attaching up-to-date logs+outputs, in case it is useful for debugging:

bench3_native.log: running the bench3.sh script (see the source/repository link above) directly, without a profiler. All three kernels run successfully.
r052gh_2024-04-18.log: running the same script inside VTune (with "-collect gpu-hotspots -knob profiling-mode=source-analysis"). Two of the three kernels work, but on the third one ("filtered_lrelu"), VTune stops with the same error (just the line(?) number increased):

GTPin ERROR: Create Context failed - context already exists
at: CreateContext : 980

r052gh_2024-04-18.tar.xz: the directory produced by VTune before the crash.
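The two error reports in this thread differ only in the number after "CreateContext" (898 with VTune 2024.0.0, 980 with 2024.1.0). A small sketch for pulling those records out of a captured log so that runs can be compared; the function name find_gtpin_errors and the assumed two-line layout ("GTPin ERROR: ..." followed by "at: <function> : <number>", possibly separated by blank lines) are based only on the excerpts quoted here.

```python
import re

ERROR_RE = re.compile(r"^GTPin ERROR:\s*(?P<message>.+)$")
LOCATION_RE = re.compile(r"^at:\s*(?P<function>\S+)\s*:\s*(?P<line>\d+)$")


def find_gtpin_errors(log_text):
    """Return a list of dicts with the message, function and number of each error."""
    errors = []
    pending = None  # the error record still waiting for its "at:" line
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines may separate the error from its location
        match = ERROR_RE.match(line)
        if match:
            pending = {"message": match.group("message"),
                       "function": None, "line": None}
            errors.append(pending)
            continue
        location = LOCATION_RE.match(line)
        if location and pending is not None:
            pending["function"] = location.group("function")
            pending["line"] = int(location.group("line"))
        # Only the first non-blank line after an error may carry its location.
        pending = None
    return errors


if __name__ == "__main__":
    sample = (
        "GTPin ERROR: Create Context failed - context already exists\n"
        "\n"
        "at: CreateContext : 980\n"
    )
    for record in find_gtpin_errors(sample):
        print(record)
```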

I tried running my program/script directly through GTPin version 4.0 (5916640e) (with just "-t funtime"), rather than through the full VTune, but the program (Python / Intel Extension for PyTorch) crashes while initializing the GPU when the first custom kernel runs:

bench3_in_gtpin_crash+backtrace.log: "Segmentation fault (core dumped) python test_kernels.py". Inspecting the core dump shows the crash happens in the function xpu::dpcpp::initGlobalDevicePoolState() in libintel-ext-pt-gpu.so. A backtrace is included; the core dump was too large to attach (almost 200 MB compressed by xz), but if you are interested, I can upload it elsewhere.
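As a possible complement to the native backtrace, CPython's built-in faulthandler can print the Python-level stack when the interpreter receives a fatal signal such as SIGSEGV. The wrapper below is a sketch under assumptions: the helper names are made up for illustration, and it launches the script with a plain Python interpreter, so how it would be combined with the GTPin launcher is left open.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_with_faulthandler(script_args):
    """Run a Python script with -X faulthandler; return (returncode, stderr)."""
    # -X faulthandler makes CPython dump the Python traceback on fatal signals.
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", *script_args],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, stderr = run_with_faulthandler(["test_kernels.py"])
    print(describe_exit(code))
    print(stderr)
```

The Python traceback only shows which Python call led into the native code; the native frames, such as xpu::dpcpp::initGlobalDevicePoolState(), still have to come from the core dump.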

If there is any detail I missed or if you want me to test anything else, please let me know.

Jennifer_D_Intel · ‎04-22-2024

Thank you for the additional logs. The development team is looking into them.
