Hi,
I am trying OpenCL 2.0 atomics on an HD5500, following the article at https://software.intel.com/en-us/articles/using-opencl-20-atomics.
The driver reports CL_DRIVER_VERSION: 10.18.14.4029.
However, the atomic operations do not give the result I expect. Here is a simplified version of the test kernel:
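kernel void atomics_test(global int *output, volatile global atomic_int *atomicBuffer, uint iterations, uint offset)
{
    // MY_INNER_LOOP and MY_ADD_VALUE are compile-time constants
    // (assumed to be supplied as -D build options).
    // output, iterations and offset are unused in this simplified version.
    for (int j = 0; j < MY_INNER_LOOP; j++)
        atomic_fetch_add_explicit(&atomicBuffer[0], MY_ADD_VALUE, memory_order_relaxed, memory_scope_device);
}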
I only run the kernel on GPU with 1 thread.
Before running the kernel, atomicBuffer[0] is initialized to 1.
Results:
MY_ADD_VALUE=1, MY_INNER_LOOP=1-->atomicBuffer[0]=7 (it seems to be 1+1*6)
MY_ADD_VALUE=1, MY_INNER_LOOP=2-->atomicBuffer[0]=13 (it seems to be 7+1*6)
MY_ADD_VALUE=1, MY_INNER_LOOP=3-->atomicBuffer[0]=19 (it seems to be 13+1*6)
MY_ADD_VALUE=2, MY_INNER_LOOP=1-->atomicBuffer[0]=13 (it seems to be 1+2*6)
MY_ADD_VALUE=2, MY_INNER_LOOP=2-->atomicBuffer[0]=25 (it seems to be 13+2*6)
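It seems that each atomic_fetch_add_explicit call adds MY_ADD_VALUE*6 instead of MY_ADD_VALUE: in every run the final value is 1 + 6*MY_INNER_LOOP*MY_ADD_VALUE rather than the expected 1 + MY_INNER_LOOP*MY_ADD_VALUE.
Is this a known issue, or is something wrong in my test?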
Hi Laura,
1. Could you please provide a small reproducer program where you see this behavior?
2. Could you please also update to driver version 10.18.10.4170 and try again?
Hi Laura,
You said: "I only run the kernel on GPU with 1 thread." Do you mean that you enqueue that kernel with a global NDRange of 1?
Thanks!
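size_t global_threads = 1;  // one work-item in total
size_t local_threads = 1;   // work-group size of 1
clEnqueueNDRangeKernel(
    commandQueue,
    atomic_test_kernel,
    1,                // work_dim
    NULL,             // global_work_offset
    &global_threads,  // global_work_size
    &local_threads,   // local_work_size
    0,                // num_events_in_wait_list
    NULL,             // event_wait_list
    &ndrEvt);         // event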
By "I only run the kernel on GPU with 1 thread." I mean work_dim=1, global_threads=1 and local_threads=1.
Hi Laura,
Could you please attach a complete buildable sample, so I can try it out and reproduce the issue?
Thanks!
