
GPU freezes until rebooting (USM)



When I use a single huge USM buffer (3.5 GiB) for the output of a computation, the program either segfaults (if I print the values) or freezes (if I don't print them).

Backtrace when not printing the values:

#0  0x00007ffff6e7550b in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007ffff1247251 in NEO::Drm::ioctl(unsigned long, void*) ()
   from /lib/x86_64-linux-gnu/
#2  0x00007ffff123e627 in NEO::BufferObject::wait(long) ()
   from /lib/x86_64-linux-gnu/
#3  0x00007ffff122fd92 in NEO::MemoryManager::freeGraphicsMemory(NEO::GraphicsAllocation*) ()
   from /lib/x86_64-linux-gnu/
#4  0x00007ffff101f2b2 in L0::FenceImp::~FenceImp() () from /lib/x86_64-linux-gnu/
#5  0x00007ffff101f2cd in L0::FenceImp::~FenceImp() () from /lib/x86_64-linux-gnu/
#6  0x00007ffff108e2ee in zeFenceDestroy () from /lib/x86_64-linux-gnu/
#7  0x00007ffff4b601dd in piQueueRelease ()
   from /opt/intel/oneapi/compiler/2021.1-beta10/linux/lib/
#8  0x00007ffff6fbfae3 in std::_Sp_counted_ptr_inplace<cl::sycl::detail::queue_impl, std::allocator<cl::sycl::detail::queue_impl>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
   from /opt/intel/oneapi/compiler/2021.1-beta10/linux/lib/
#9  0x000000000040b0c9 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x82afb0)
    at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:155
#10 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffffffc7a8)
    at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:730
#11 std::__shared_ptr<cl::sycl::detail::queue_impl, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (
    at /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:1169
#12 cl::sycl::queue::~queue (this=0x7fffffffc7a0)
    at /opt/intel/oneapi/compiler/2021.1-beta10/linux/include/sycl/CL/sycl/queue.hpp:46
#13 Options::~Options (this=0x7fffffffc6c0)
    at /home/user/pone.cpp:54
#14 main (argc=<optimized out>, argv=<optimized out>)
    at /home/user/pone.cpp:676

The segfault happens, for example, after the GPU computation, when I try to read the values on the host (a simple std::cout of a few specific values of the huge buffer).

The real problem comes afterwards: the GPU cannot be used again (even a simple `intel_gpu_frequency` blocks forever). I am using an i5 with a 630 GPU, running at a slightly reduced frequency and with /sys/module/i915/parameters/enable_hangcheck set to N.

I waited up to 7 hours. According to `intel_gpu_top`, the Render/3D/0 engine bar stayed at 99-100% the whole time (2100 MiB/s IMC reads, around 2.2 W).

If I run the program again, it freezes here:

Thread 1 "pone_auto" received signal SIGINT, Interrupt.
0x00007ffff6e6389b in sched_yield () at ../sysdeps/unix/syscall-template.S:78
78      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007ffff6e6389b in sched_yield () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007ffff101e08e in L0::EventImp::hostSynchronize(unsigned long) ()
   from /lib/x86_64-linux-gnu/
#2  0x00007ffff4b641e7 in piEventsWait ()
   from /opt/intel/oneapi/compiler/2021.1-beta10/linux/lib/
#3  0x00007ffff70b4ec8 in cl::sycl::detail::event_impl::waitInternal() const ()
   from /opt/intel/oneapi/compiler/2021.1-beta10/linux/lib/
#4  0x00007ffff70b5d80 in cl::sycl::detail::event_impl::wait(std::shared_ptr<cl::sycl::detail::event_impl>) const () from /opt/intel/oneapi/compiler/2021.1-beta10/linux/lib/
#5  0x00007ffff71523cd in cl::sycl::event::wait() ()
   from /opt/intel/oneapi/compiler/2021.1-beta10/linux/lib/
#6  0x0000000000405f1b in process_pone (cpu=false, opts=opts@entry=0x7fffffffc6f0)
    at /home/user/pone.cpp:150
#8  main (argc=<optimized out>, argv=<optimized out>)
    at /home/user/pone.cpp:573

The only solution is to reboot.

Any idea how I can reuse the GPU without rebooting? I can accept the segfault, but not the GPU staying frozen forever.

I tried `intel_gpu_abrt` (in case it helps, I don't know), but it fails with `bad substitution`.

Of course, there are no zombie or live processes left after it segfaults, so there is nothing I can force-kill.

One important detail: the max allocation is 3.06 GiB, so I don't understand why the computation is allowed to run at all. On the other hand, with OpenCL I can run it and print the values (no idea why).


Global memory size 6577778688 (6.126GiB)
Max memory allocation 3288889344 (3.063GiB)


To summarize: in theory I can allocate a single buffer of up to 3.06 GiB, but I allocate one of 3.5 GiB.

  • USM: freezes (when destroying the queue without printing the values) or segfaults (when printing values computed on the GPU).
  • No USM: finishes correctly, whether or not the values are printed.
  • OpenCL: finishes correctly, whether or not the values are printed.
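
To make the size mismatch concrete, here is a minimal sketch in plain C++ (no SYCL needed) that compares the requested size against the reported limit. The two device values are copied from the output above. The helper names `fits_single_allocation` and `excess_bytes` are hypothetical, and "3.5 GiB" is assumed to mean exactly 3.5 × 2^30 bytes. In a real SYCL program the limit would presumably be queried from the device (the `max_mem_alloc_size` device info query) rather than hard-coded.

```cpp
#include <cstdint>

// Values reported by the device in the post above.
constexpr std::uint64_t kGlobalMemBytes = 6577778688ULL;  // "Global memory size"
constexpr std::uint64_t kMaxAllocBytes  = 3288889344ULL;  // "Max memory allocation"

constexpr std::uint64_t kGiB = 1ULL << 30;

// Assumption: the 3.5 GiB buffer is exactly 3.5 * 2^30 bytes.
constexpr std::uint64_t kRequestedBytes = 7 * kGiB / 2;

// Hypothetical helper: true if a single allocation of `bytes` is within the limit.
constexpr bool fits_single_allocation(std::uint64_t bytes, std::uint64_t max_alloc) {
    return bytes <= max_alloc;
}

// Hypothetical helper: how many bytes the request exceeds the limit by (0 if it fits).
constexpr std::uint64_t excess_bytes(std::uint64_t bytes, std::uint64_t max_alloc) {
    return bytes > max_alloc ? bytes - max_alloc : 0;
}

// On this device the per-allocation limit is exactly half of global memory.
static_assert(kMaxAllocBytes * 2 == kGlobalMemBytes, "limit is half of global memory");

// The 3.5 GiB request does not fit into a single allocation.
static_assert(!fits_single_allocation(kRequestedBytes, kMaxAllocBytes),
              "3.5 GiB exceeds the per-allocation limit");
```

Calling such a check before allocating would let the program refuse the 3.5 GiB request up front instead of relying on the runtime to reject it. Whether the runtime is supposed to reject it on its own is exactly the open question here.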
1 Reply



Please let us know your oneAPI Base Toolkit version and OS details.

Max memory allocation is, by definition, the largest amount of memory that can be allocated for a single data structure.


With the buffer/accessor model, allocating more than this limit for a single buffer should ideally fail, but in your case it didn't. Could you please share reproducer code for this case?


Also, for the USM case, could you add exception handling to your code and let me know whether it helps? Please share reproducer code for USM as well.