Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

What is __opencl_emutls_get_address ?

TomClabault
New Contributor I

I'm writing a ray tracer using SYCL and, using Intel VTune, I've found that `__opencl_emutls_get_address` takes a significant amount of CPU/GPU time (around 25%).

 

[Screenshot of the VTune report: Screenshot from 2023-10-29 18-29-47.png]

The function is called (under the hood) by various functions of my code. All the functions listed above are handwritten and not part of any library.

In this profiling session, `__opencl_emutls_get_address` took 156s of CPU time out of 570s total. It is the biggest bottleneck of the application at the moment (4 times as costly as triangle intersection...).

What is it and what does it do exactly?

My application was compiled in "Release with debug information" using ICPX from the Intel oneAPI Base kit 2023.2.1.

6 Replies
SeshaP_Intel
Moderator

Hi,


Thank you for posting in Intel Communities.


Could you please provide the following details so we can investigate the issue on our end?

1. Complete reproducer code with steps

2. Hardware details


Thanks and Regards,

Pendyala Sesha Srinivas


TomClabault
New Contributor I

Hi @SeshaP_Intel ,

 

Inlining the functions that were causing `__opencl_emutls_get_address` to be called seems to have solved the issue. `__opencl_emutls_get_address` isn't called by inlined functions.

The behavior I observed doesn't seem to be an issue with the SYCL implementation but rather a lack of understanding on my end, so I was mainly asking for technical details about how this works and what `__opencl_emutls_get_address` does for the application.

AthiraM_Intel
Moderator

Hi,


Thank you for the update.


To assist you further, please share the following details:


  1. OS and Hardware details
  2. VTune version
  3. Sample reproducer code and exact steps to reproduce the issue from our end



Thanks


AthiraM_Intel
Moderator

Hi,


We have not heard back from you. Could you please give us an update?



Thanks


TomClabault
New Contributor I

Hi,

 

I managed to reduce the overhead of `__opencl_emutls_get_address` by removing the `SYCL_EXTERNAL` attribute from the declarations of the functions that were showing up in the VTune report.

In the end, I'm not sure what `SYCL_EXTERNAL` is used for, considering my code compiles and executes fine without it, but removing it solved my issue.

AthiraM_Intel
Moderator


Hi,


Glad to know that your issue is resolved. 


`SYCL_EXTERNAL` is an optional macro that gives a function external linkage for device code, so that a function defined in one translation unit can be called from a SYCL kernel in another translation unit. It is not required when the function is defined in the same translation unit as the kernel that calls it, which is why your code still compiles and runs without it.


Please refer to the following documentation for detailed information:

https://www.intel.com/content/www/us/en/developer/articles/technical/use-the-sycl-external-macro.html
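For readers landing on this thread, the two declaration styles discussed here can be sketched roughly as below. This is only an illustration under stated assumptions: `Vec3`, `dot_external` and `dot_inline` are hypothetical names, not taken from the poster's ray tracer, and the fallback `#define` exists only so the snippet also builds with a plain C++ compiler. The snippet does not show or measure the `__opencl_emutls_get_address` overhead; that effect is what the poster reported from VTune.

```cpp
// Sketch only: all names below are hypothetical.
#if defined(__has_include)
#  if __has_include(<sycl/sycl.hpp>)
#    include <sycl/sycl.hpp>  // provides SYCL_EXTERNAL when the implementation supports it
#  endif
#endif
#ifndef SYCL_EXTERNAL
#  define SYCL_EXTERNAL       // fallback: expands to nothing without a SYCL compiler
#endif

struct Vec3 {
    float x, y, z;
};

// Style A: what a header would contain when the definition lives in another
// source file. SYCL_EXTERNAL makes the function callable from kernels that
// are compiled in other translation units.
SYCL_EXTERNAL float dot_external(Vec3 a, Vec3 b);

// Style B: the function is defined where it is declared (for example in a
// header), so every translation unit that calls it sees the body and no
// SYCL_EXTERNAL is needed.
inline float dot_inline(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Definition for style A. In a real project this would sit in its own .cpp
// file; it is placed here only to keep the sketch self-contained.
SYCL_EXTERNAL float dot_external(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

With a SYCL compiler, either function could be called from inside a kernel. Style A is only needed when the definition is in a different source file from the kernel; style B keeps the body visible to the compiler at the call site, which is the situation the poster ended up with after removing `SYCL_EXTERNAL`.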


Since your issue is resolved, please post any additional questions in a new thread. This thread will no longer be monitored by Intel.



Thanks



