When I run the DPC++ Compatibility Tool on CUDA code, I see that every one of my kernel launches in the generated C++ code is wrapped in a try-catch construct. In my experience, and in the experience of others (e.g. https://stackoverflow.com/questions/52312/what-is-the-real-overhead-of-try-catch-in-c), 'try' can be expensive in execution time. Is there a way to disable the try-catch, or to consolidate it so that it occurs once after a block of launches rather than at each kernel launch? We might lose the ability to isolate errors to a particular launch, but that might be a reasonable tradeoff for performance in situations where large numbers of kernels must be launched.
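To make the tradeoff concrete, here is a minimal sketch in plain C++ of the two patterns. It does not use SYCL or the tool's actual generated code: `Launch`, `run_each_guarded`, and `run_block_guarded` are hypothetical names, and each `Launch` is only a stand-in for one migrated kernel launch.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one migrated kernel launch.
using Launch = std::function<void()>;

// Per-launch pattern: one try-catch around every launch, so a failure can be
// attributed to a specific launch. Returns the index of the first failing
// launch, or -1 if every launch succeeded.
int run_each_guarded(const std::vector<Launch>& launches) {
    for (std::size_t i = 0; i < launches.size(); ++i) {
        try {
            launches[i]();
        } catch (const std::exception& e) {
            std::cerr << "launch " << i << " failed: " << e.what() << '\n';
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Consolidated pattern: a single try-catch around the whole block. We learn
// that something in the block failed, but not which launch it was.
bool run_block_guarded(const std::vector<Launch>& launches) {
    try {
        for (const Launch& launch : launches) {
            launch();
        }
    } catch (const std::exception& e) {
        std::cerr << "a launch in the block failed: " << e.what() << '\n';
        return false;
    }
    return true;
}
```

With the consolidated version, the launches after the failing one are skipped and the error message can no longer name the offending launch, which is the loss of isolation mentioned above. Whether either form is measurably faster would need to be benchmarked on the real workload.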
As we haven't heard back from you, we are considering that your issue has been resolved.
We will no longer monitor this thread. If you require any additional assistance from Intel, please start a new thread.
Any further interaction in this thread will be considered community only.