I'm writing a ray tracer using SYCL, and with Intel VTune I've found that `__opencl_emutls_get_address` takes a significant amount of CPU/GPU time (around 25%).
The function is called under the hood by various functions in my code, all of which are hand-written and not part of any library.
In this profiling session, `__opencl_emutls_get_address` took 156 s of CPU time out of 570 s total (about 27%). It is currently the biggest bottleneck of the application (4 times as costly as triangle intersection...).
What is it and what does it do exactly?
My application was compiled in "Release with debug information" mode using `icpx` from the Intel oneAPI Base Toolkit 2023.2.1.
Hi,
Thank you for posting in Intel Communities.
Could you please provide the following details so we can investigate the issue on our end?
1. Complete reproducer code with steps
2. Hardware details
Thanks and Regards,
Pendyala Sesha Srinivas
Hi @SeshaP_Intel,
Inlining the functions that were causing `__opencl_emutls_get_address` to be called seems to have solved the issue: `__opencl_emutls_get_address` isn't called by inlined functions.
The behavior I observed didn't seem to be an issue with the SYCL implementation but rather a lack of understanding on my end, so I was mainly asking for technical details about how this works and what `__opencl_emutls_get_address` does for the application.
Hi,
Thank you for the update.
To assist you further, please share the following details:
- OS and hardware details
- VTune version
- Sample reproducer code and exact steps to reproduce the issue on our end
Thanks
Hi,
We have not heard back from you. Could you please give us an update?
Thanks
Hi,
I managed to reduce the overhead of `__opencl_emutls_get_address` by removing the `SYCL_EXTERNAL` attribute from the declarations of the functions that were showing up in the VTune report.
In the end, I'm not sure what `SYCL_EXTERNAL` is used for, since my code compiles and runs fine without it, but removing it solved my issue.
Hi,
Glad to know that your issue is resolved.
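`SYCL_EXTERNAL` is an optional macro that enables external linkage of SYCL functions and methods so that they can be called from a SYCL kernel. It is needed when device code calls a function that is defined in a different translation unit (source file) from the calling kernel; a function defined in the same translation unit as the kernel can be called from device code without it. It does not make arbitrary host-only functions callable from the device.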
Please refer to the following documentation for detailed information:
Since your issue is resolved, please post any additional questions in a new thread. This thread will no longer be monitored by Intel.
Thanks