Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)
5253 Discussions

What is __opencl_emutls_get_address ?

TomClabault
New Contributor I
2,269 Views

I'm writing a ray tracer using SYCL and I've found using Intel VTune that `__opencl_emutls_get_address` takes a significant amount of CPU/GPU time (around 25%).

 

[Attached image: Screenshot from 2023-10-29 18-29-47.png]

The function is called (under the hood) by various functions in my code. All the functions listed above are hand-written and not part of any library.

In this profiling session, `__opencl_emutls_get_address` took 156s of CPU time out of 570s total. It is the biggest bottleneck of the application at the moment (4 times as costly as triangle intersection...).

What is it and what does it do exactly?

My application was compiled in "Release with debug information" mode using ICPX from the Intel oneAPI Base Toolkit 2023.2.1.

0 Kudos
6 Replies
SeshaP_Intel
Moderator
2,228 Views

Hi,


Thank you for posting in Intel Communities.


Could you please provide the following details so that we can investigate the issue on our end?

1. Complete reproducer code with steps

2. Hardware details


Thanks and Regards,

Pendyala Sesha Srinivas


0 Kudos
TomClabault
New Contributor I
2,217 Views

Hi @SeshaP_Intel,

Inlining the functions that were causing `__opencl_emutls_get_address` to be called seems to have solved the issue. `__opencl_emutls_get_address` isn't called by inlined functions.

The behavior I observed didn't seem to be an issue with the SYCL implementation but rather a lack of understanding on my end, so I was mainly asking for technical details about how this works and what `__opencl_emutls_get_address` does for the application.

0 Kudos
AthiraM_Intel
Moderator
2,166 Views

Hi,


Thank you for the update.


To assist you further, please share the following details:


  1. OS and Hardware details
  2. VTune version
  3. Sample reproducer code and the exact steps to reproduce the issue on our end



Thanks


0 Kudos
AthiraM_Intel
Moderator
2,058 Views

Hi,


We have not heard back from you. Could you please give us an update?



Thanks


0 Kudos
TomClabault
New Contributor I
2,020 Views

Hi,

 

I managed to reduce the overhead of `__opencl_emutls_get_address` by removing the `SYCL_EXTERNAL` macro from the declarations of the functions that were showing up in the VTune report.

In the end, I'm not sure what `SYCL_EXTERNAL` is used for, considering my code compiles and executes fine without it, but removing it solved my issue.

0 Kudos
AthiraM_Intel
Moderator
1,945 Views


Hi,


Glad to know that your issue is resolved. 


SYCL_EXTERNAL is an optional macro that gives a function external linkage for SYCL device code, so that the function can be called from inside a SYCL kernel. To call, from a kernel, a function that is defined in a different translation unit, that function must be declared with the SYCL_EXTERNAL macro.


Please refer to the following documentation for detailed information:

https://www.intel.com/content/www/us/en/developer/articles/technical/use-the-sycl-external-macro.html
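
The two situations discussed in this thread can be sketched roughly as follows. This is only a minimal illustration, not the original poster's code: `Vec3`, `dot_external`, `dot_inline` and `run_dots` are hypothetical names, and the sketch assumes that a SYCL compiler predefines `SYCL_LANGUAGE_VERSION` (otherwise it falls back to a plain host loop so it still builds with an ordinary C++ compiler). Variant A is a function declared with SYCL_EXTERNAL, as it would be when its definition lives in another translation unit. Variant B has its definition visible at the call site, so the macro is not needed and the compiler is free to inline the call.

```cpp
// Sketch only. Vec3, dot_external, dot_inline and run_dots are hypothetical
// names used for illustration; none of them come from the thread.
#include <cstddef>
#include <vector>

#if defined(SYCL_LANGUAGE_VERSION)   // assumption: predefined by the SYCL compiler
#include <sycl/sycl.hpp>
#else
#define SYCL_EXTERNAL                // plain C++ fallback so the sketch builds anywhere
#endif

struct Vec3 { float x, y, z; };

// Variant A: external linkage. In a real project this declaration would sit in
// a header and the definition in a different .cpp file, so the device compiler
// sees only the declaration at the call site.
SYCL_EXTERNAL float dot_external(Vec3 a, Vec3 b);

// Variant B: the definition is visible wherever it is used, so SYCL_EXTERNAL
// is not required and the call can be inlined into the kernel.
inline float dot_inline(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Definition for variant A (kept in this file only to stay self-contained).
SYCL_EXTERNAL float dot_external(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Computes dot(v, v) for n small vectors, calling variant A or B per element.
std::vector<float> run_dots(std::size_t n, bool use_external) {
    std::vector<float> result(n);
#if defined(SYCL_LANGUAGE_VERSION)
    sycl::queue q;
    float* out = sycl::malloc_shared<float>(n, q);
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        Vec3 v{static_cast<float>(i[0]), 1.0f, 2.0f};
        out[i[0]] = use_external ? dot_external(v, v) : dot_inline(v, v);
    }).wait();
    for (std::size_t i = 0; i < n; ++i) result[i] = out[i];
    sycl::free(out, q);
#else
    for (std::size_t i = 0; i < n; ++i) {
        Vec3 v{static_cast<float>(i), 1.0f, 2.0f};
        result[i] = use_external ? dot_external(v, v) : dot_inline(v, v);
    }
#endif
    return result;
}
```

Both variants compute the same values; what differs is only how the function is made available to the kernel. Whether a particular call is actually inlined, and how that affects the time attributed to `__opencl_emutls_get_address`, depends on the compiler and runtime, so it is worth checking in a fresh VTune profile.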


Since your issue is resolved, please post any additional questions in a new thread. This thread will no longer be monitored by Intel.



Thanks




0 Kudos