Need help: unexpected results using OpenCL 2.0 atomics on HD5500

laura_t_ · 04-27-2015

Hi,

I am trying out OpenCL 2.0 atomics on HD5500, following the article https://software.intel.com/en-us/articles/using-opencl-20-atomics.

My driver version (CL_DRIVER_VERSION) is 10.18.14.4029.

However, the results of the atomic operations are not what I expect. Here is a simplified version of the test:

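kernel void atomics_test(global int *output, volatile global atomic_int *atomicBuffer, uint iterations, uint offset)
{
    // output, iterations and offset are unused in this simplified version.
    // MY_INNER_LOOP and MY_ADD_VALUE are macros, presumably supplied as -D build
    // options; OpenCL 2.0 atomics also require building with -cl-std=CL2.0.
    for (int j = 0; j < MY_INNER_LOOP; j++)
    {
        atomic_fetch_add_explicit(&atomicBuffer[0], MY_ADD_VALUE,
                                  memory_order_relaxed, memory_scope_device);
    }
}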

I only run the kernel on GPU with 1 thread.

Before running the kernel, atomicBuffer[0] is initialized to 1.

Result:

MY_ADD_VALUE=1, MY_INNER_LOOP=1-->atomicBuffer[0]=7 (it seems to be 1+1*6)

MY_ADD_VALUE=1, MY_INNER_LOOP=2-->atomicBuffer[0]=13 (it seems to be 7+1*6)

MY_ADD_VALUE=1, MY_INNER_LOOP=3-->atomicBuffer[0]=19 (it seems to be 13+1*6)

MY_ADD_VALUE=2, MY_INNER_LOOP=1-->atomicBuffer[0]=13 (it seems to be 1+2*6)

MY_ADD_VALUE=2, MY_INNER_LOOP=2-->atomicBuffer[0]=25 (it seems to be 13+2*6)

It seems that each atomic_fetch_add call adds MY_ADD_VALUE*6 to the atomic variable instead of MY_ADD_VALUE.

Is this a known issue, or is something wrong with my test?

Robert_I_Intel · 04-28-2015

Hi Laura,

1. Could you please provide a small reproducer program where you see this behavior?

2. Could you please also update to driver 10.18.10.4170?

Robert_I_Intel · 04-28-2015

Hi Laura,

You said: "I only run the kernel on GPU with 1 thread." Do you mean that you enqueue the kernel with a global NDRange of 1?

Thanks!

laura_t_ · 04-28-2015

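size_t global_threads = 1;   // global NDRange size
size_t local_threads = 1;    // work-group size
cl_event ndrEvt;

clEnqueueNDRangeKernel(
           commandQueue,
           atomic_test_kernel,
           1,                  // work_dim
           NULL,               // global_work_offset
           &global_threads,    // global_work_size
           &local_threads,     // local_work_size
           0,                  // num_events_in_wait_list
           NULL,               // event_wait_list
           &ndrEvt);           // event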

By "I only run the kernel on GPU with 1 thread," I mean work_dim=1, global_threads=1, and local_threads=1.

Robert_I_Intel · 05-01-2015

Hi Laura,

Could you please attach a complete buildable sample, so I can try it out and reproduce the issue?

Thanks!