Intel® oneAPI DPC++/C++ Compiler

What USM access is possible without usm_atomic_host/shared_allocations

MikeDB
Novice

Hi,

I'm working on the 11th gen integrated Iris Xe GPU. This GPU does *not* advertise the aspects aspect::usm_atomic_host_allocations or aspect::usm_atomic_shared_allocations.

Does this mean that any concurrent access at all from CPU and GPU to USM shared or host memory is unsafe, unportable or could crash?

Is it even safe to do concurrent atomic access between different SYCL kernel work-items to USM host or shared memory?

Or does it mean, more narrowly, that the specific atomic and fence guarantees in the SYCL 2020 spec are not supported? See https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_interaction_with_host_code

One possible way for a workgroup to signal that its data is complete and ready to be read by the CPU would be to synchronize all of the workgroup's work-items with a barrier and then, before the workgroup finishes, write a flag in the output data to show that the output is complete.

output_data = sycl::malloc_shared();   // output data for kernel with separate ranges for each workgroup
// zero-initialise output_data before the kernel runs

const auto e = q.parallel_for(sycl::nd_range{global_range, local_range},
[=](sycl::nd_item<2> idx)
{
  // write N results in output_data starting at workgroup_offset

  // wait until every work-item in this workgroup has written its results
  sycl::group_barrier(idx.get_group());

  if (idx.get_group().leader())
  {
    // flag to the CPU that this workgroup has finished writing its results
    output_data[workgroup_offset + N] = 1;
  }
});

Is this likely to crash, or to cause race conditions because the output_data results are not guaranteed to be visible to the CPU once it sees the flag set?
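
For comparison, here is a host-only sketch of the same "write results, then set a flag" handoff in standard C++, using two CPU threads instead of a GPU workgroup and the CPU. It is an analogy, not SYCL code: the names GroupSlot, produce, consume and run_handoff are made up for illustration. It shows the ordering the pattern depends on: the flag is stored with release semantics and read with acquire semantics, so the results are visible once the flag is seen. Whether a device gives the equivalent guarantee between host and device for USM allocations is what the usm_atomic_* aspects report, and the plain store in the kernel above carries no such ordering.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical host-only stand-in for one workgroup's output range:
// N results followed by a completion flag.
struct GroupSlot {
    std::vector<int> results;
    std::atomic<int> done{0};
};

// Plays the role of the workgroup leader: write the results, then
// publish the flag with release ordering so the writes above it
// become visible to whoever acquires the flag.
void produce(GroupSlot& slot, int base) {
    for (std::size_t i = 0; i < slot.results.size(); ++i)
        slot.results[i] = base + static_cast<int>(i);
    slot.done.store(1, std::memory_order_release);
}

// Plays the role of the CPU reader: spin until the flag is set
// (acquire ordering), then read the results and return their sum.
long consume(const GroupSlot& slot) {
    while (slot.done.load(std::memory_order_acquire) == 0)
        std::this_thread::yield();
    long sum = 0;
    for (int v : slot.results) sum += v;
    return sum;
}

// Runs one producer and one consumer concurrently and returns the sum
// the consumer observed after it saw the flag.
long run_handoff(std::size_t n, int base) {
    assert(n > 0);
    GroupSlot slot;
    slot.results.assign(n, 0);  // zero-initialise before either thread starts
    long observed = 0;
    std::thread consumer([&] { observed = consume(slot); });
    std::thread producer([&] { produce(slot, base); });
    producer.join();
    consumer.join();
    return observed;
}
```

Calling run_handoff(4, 10) writes 10, 11, 12, 13 and the consumer sums them only after it has observed the flag. If the flag were a plain int, the same code would be a data race even between two CPU threads.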

 

Alex_Y_Intel
Moderator

It's true that the 11th generation integrated Iris Xe GPU does not support the SYCL aspects "aspect::usm_atomic_host_allocations" and "aspect::usm_atomic_shared_allocations", so there are indeed potential limitations or incompatibilities when relying on the functionality these aspects describe. Please check https://github.khronos.org/SYCL_Reference/iface/interaction-with-host-code.html and https://github.com/IntelPython/dpctl/issues/309
