Intel® oneAPI DPC++/C++ Compiler

SYCL kernel hangs on long workloads

TomClabault
New Contributor I

This post is essentially a condensed duplicate of everything discussed in this post on the Codeplay forum.

 

I have an issue where my SYCL kernel hangs and never finishes after it has been computing for some time (~10 s). This only happens when sycl::gpu_selector_v is selected for the sycl::queue command queue. Using sycl::cpu_selector_v shows no issues.
The kernel only hangs when there is a lot of computation to be done (see the details about LOOP_ITERATION and N below). One detail that might be relevant: just after launching the program, my laptop is barely usable because the integrated GPU is being used at its maximum. After a few seconds, though, my laptop becomes usable again (as if it weren't computing anything anymore), but the program is still running (and will never stop). At that point, my CPU shows a usage of ~50% (when using sycl::gpu_selector_v) until I manually stop the program.

I managed to reproduce this issue on a simple example:

#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

#define LOOP_ITERATION 10000000
#define N 1000000

int main()
{
    std::vector<float> v(N);

    sycl::queue q{sycl::gpu_selector_v};
    sycl::buffer buf{v};

    q.submit([&](sycl::handler& cgh)
    {
        auto acc {buf.get_access(cgh,sycl::read_write)};

        cgh.parallel_for(N, [=](sycl::id<1> id)
        {
            float x = 0.0f;
            for (int i = 0; i < LOOP_ITERATION; i++)
                x += i / 2;

            acc[id] = x;
        });
    }).wait();

    std::cout << "Done!" << std::endl;

    return 0;
}

With the code posted above, I can only get the kernel to hang when N * LOOP_ITERATION is > 1 000 000 * 10 000 000. However, the kernel can still hang with lower LOOP_ITERATION (or N) values if we increase the complexity of the code inside the for (int i = 0; i < LOOP_ITERATION; i++) loop:

#define LOOP_ITERATION (10000000 / 10) // 10 times fewer iterations
#define N 1000000

cgh.parallel_for(N, [=](sycl::id<1> id)
{
    float x = 0.0f;
    for (int i = 0; i < LOOP_ITERATION; i++)
    {
        x += i / 2;

        float cosine = sycl::cos(sycl::sqrt(x));
        float sine = sycl::sin(x);
        float length = sycl::sqrt(cosine * cosine + sine * sine);

        x /= sycl::cos(length) * sycl::sin(length);
    }

    acc[id] = x;
});

With LOOP_ITERATION divided by 10, the kernel never hangs unless the LOOP_ITERATION loop becomes more computationally demanding.
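For a sense of scale, the first reproducer amounts to N × LOOP_ITERATION = 10^13 inner-loop iterations in a single kernel launch, and the second to 10^12 iterations with a much heavier loop body. A small host-side check of that arithmetic could look like the sketch below; the function and constant names are made up for illustration and are not part of SYCL.

```cpp
#include <cstdint>
#include <limits>

// Total number of inner-loop iterations executed by one kernel launch:
// a loop of `loop_iterations` steps for each of the `n` work-items.
// (Hypothetical helper, used only to illustrate the arithmetic.)
constexpr std::uint64_t total_loop_iterations(std::uint64_t n,
                                              std::uint64_t loop_iterations)
{
    return n * loop_iterations;
}

// First reproducer: N = 1 000 000, LOOP_ITERATION = 10 000 000.
constexpr std::uint64_t kFirstReproducer =
    total_loop_iterations(1000000, 10000000);

// Second reproducer: same N, LOOP_ITERATION divided by 10.
constexpr std::uint64_t kSecondReproducer =
    total_loop_iterations(1000000, 10000000 / 10);

static_assert(kFirstReproducer == 10000000000000ULL, "10^13 iterations");
static_assert(kSecondReproducer == 1000000000000ULL, "10^12 iterations");

// The product does not fit in 32 bits, hence the 64-bit type.
static_assert(kFirstReproducer > std::numeric_limits<std::uint32_t>::max(),
              "a 32-bit counter would overflow");
```

These numbers only describe how much work is queued in one launch; they say nothing about where the threshold for the hang lies on a given GPU.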

Sometimes but not always, this runtime error is thrown:

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -14 (PI_ERROR_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST) -14 (PI_ERROR_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST)

I tried disabling the GPU hangcheck using this guide but to no avail. I also tried writing N to /sys/module/i915/parameters/enable_hangcheck and the solutions proposed here but none of this changed anything either…
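One way to confirm that the write to the hangcheck parameter actually took effect is to read the file back before launching the kernel. The sketch below does that from C++; the helper names are hypothetical, and the expectation that a disabled hangcheck reads back as "N" (or "0") is an assumption about how the i915 driver reports this parameter.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Returns the first line of a sysfs-style parameter file, or std::nullopt if
// the file cannot be opened (module not loaded, different driver, ...).
std::optional<std::string> read_module_parameter(const std::string& path)
{
    std::ifstream file(path);
    if (!file.is_open())
        return std::nullopt;

    std::string value;
    std::getline(file, value);
    return value;
}

// The default path is the one mentioned in the post. Treating "N" or "0" as
// "disabled" is an assumption, not something confirmed in this thread.
bool hangcheck_disabled(
    const std::string& path = "/sys/module/i915/parameters/enable_hangcheck")
{
    const std::optional<std::string> value = read_module_parameter(path);
    return value.has_value() && (*value == "N" || *value == "0");
}
```

If hangcheck_disabled() returns false right after the write, the setting did not stick (for example because the write needed root privileges), which would explain why it appeared to change nothing.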

For my immediate use case (a ray tracing application where rendering an image with too many samples takes too long and hangs the kernel), I can get around this issue by launching multiple smaller kernels (fewer samples each), but this is a workaround rather than a satisfactory solution.
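The splitting itself can be sketched on the host as follows. This is only an illustration of the idea, not the code from my renderer: Batch, split_into_batches and both parameter names are hypothetical, and the kernel submission is left out so that the snippet compiles without a SYCL toolchain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One kernel launch covers the samples [first, first + count).
struct Batch {
    std::size_t first;
    std::size_t count;
};

// Splits `total_samples` into consecutive batches of at most
// `max_samples_per_launch` samples each.
std::vector<Batch> split_into_batches(std::size_t total_samples,
                                      std::size_t max_samples_per_launch)
{
    assert(max_samples_per_launch > 0 && "batch size must be positive");

    std::vector<Batch> batches;
    std::size_t first = 0;
    while (first < total_samples) {
        // The last batch may be smaller than the others.
        const std::size_t count =
            std::min(max_samples_per_launch, total_samples - first);
        batches.push_back({first, count});
        first += count;
    }
    return batches;
}
```

Each batch would then be submitted as its own kernel that accumulates into the same buffer, waiting for it to finish before submitting the next one, so that no single launch runs long enough to trigger the hang. The right batch size has to be found experimentally.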

In case it is relevant, I have attached the output of the `clinfo` command on my system.

13 Replies
VaishnaviV_Intel
Employee

Hi,

 

Thanks for posting on Intel communities.

We are working on it internally. We will get back to you soon.

 

Thanks & Regards,

Vankudothu Vaishnavi.


VaishnaviV_Intel
Employee

Hi,

 

Thanks for your patience and understanding.

Could you please try upgrading the GPU driver? A new version of oneAPI, 2024.0, will be available by the end of November. It would be great if you could test the new version and let us know whether the issue persists.

 

Thanks & Regards,

Vankudothu Vaishnavi.


TomClabault
New Contributor I

Hi,

 

How can I upgrade my drivers? I'm not sure how to do it on Ubuntu 20.04. I tried adding a PPA following this post but it didn't change anything regarding my kernel execution.

 

Tom Clabault.

VaishnaviV_Intel
Employee

Hi,

 

Could you please try following the steps described at the link below:

https://dgpu-docs.intel.com/driver/installation.html

Please also try the new oneAPI 2024.0.0 version, which is due to be released this month.

 

Thanks & Regards,

Vankudothu Vaishnavi.

 

 

TomClabault
New Contributor I

Hi,

I followed these installation instructions (for Intel client GPUs) on Ubuntu 20.04, which is what I am running, but my kernel still hangs.

So I just have to wait for the oneAPI version 2024.0.0 to be released then?

VaishnaviV_Intel
Employee

Hi,

 

The latest version, oneAPI 2024.0.0, is now available. Please download the Intel oneAPI Base Toolkit from the following link:

https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html

After installing it, please test with the latest version and let us know whether you are still facing the same issue.

 

Thanks & Regards,

Vankudothu Vaishnavi.

 

TomClabault
New Contributor I

Hi,

 

I upgraded to oneAPI 2024.0.0 and recompiled an example that was hanging before the upgrade, but it still hangs.

I'm afraid the upgrade to 2024.0.0 didn't solve my issue.

VaishnaviV_Intel
Employee

Hi,

 

Thanks for testing it on the latest oneAPI version (2024.0.0).

Could you please provide us with the following details?

  1. Output of sycl-ls.
  2. Output of clinfo after upgrading the graphics driver.
  3. A screenshot of the output you get after executing the code.

 

Thanks & Regards,

Vankudothu Vaishnavi.


TomClabault
New Contributor I

Hi,

Here's a screenshot of the output of sycl-ls:

Screenshot from 2023-11-20 22-11-20.png

And a pastebin of the output of clinfo.

VaishnaviV_Intel
Employee

 

Hi,

 

Could you please share additional information with us?

Please set SYCL_PI_TRACE=2 and capture the output logs. You can find details about SYCL_PI_TRACE options at https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_pi_trace-options

This would greatly help us.

 

Thanks & Regards,

Vankudothu Vaishnavi.

 

TomClabault
New Contributor I

Hi,

Here is a pastebin of the output I get when running

SYCL_PI_TRACE=2 ./test_hang_2024.0

The last line of the Pastebin:

UR <--- UrQueue->executeAllOpenCommandLists()(UR_RESULT_SUCCESS)

is the last line printed before the program hangs. No more output is generated afterwards, but the program keeps running without making progress.

 

Tom

VaishnaviV_Intel
Employee

Hi,


We are working on your issue internally. We'll get back to you soon.


Thanks & Regards,

Vankudothu Vaishnavi.


TomClabault
New Contributor I

Hi,

I have not heard back from you. Could you please give me an update?

Tom
