OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.

Need help: unexpected results using OpenCL 2.0 atomics on HD5500

laura_t_
Beginner

Hi, 

I am trying out OpenCL 2.0 atomics on an HD5500, following the article at https://software.intel.com/en-us/articles/using-opencl-20-atomics.

I use CL_DRIVER_VERSION: 10.18.14.4029.

However, the results of the atomic operations are not what I expect. A simplified version of the test is:

kernel void atomics_test(global int *output, volatile global atomic_int *atomicBuffer, uint iterations, uint offset)
{
    for (int j = 0; j < MY_INNER_LOOP; j++)
        atomic_fetch_add_explicit(&atomicBuffer[0], MY_ADD_VALUE, memory_order_relaxed, memory_scope_device);
}

 

I only run the kernel on GPU with 1 thread.    

Before running the kernel, atomicBuffer[0] is initialized to 1.

Result:

MY_ADD_VALUE=1, MY_INNER_LOOP=1-->atomicBuffer[0]=7  (it seems to be 1+1*6)

MY_ADD_VALUE=1, MY_INNER_LOOP=2-->atomicBuffer[0]=13 (it seems to be 7+1*6)

MY_ADD_VALUE=1, MY_INNER_LOOP=3-->atomicBuffer[0]=19  (it seems to be 13+1*6)

 

MY_ADD_VALUE=2, MY_INNER_LOOP=1-->atomicBuffer[0]=13  (it seems to be 1+2*6)

MY_ADD_VALUE=2, MY_INNER_LOOP=2-->atomicBuffer[0]=25  (it seems to be 13+2*6)

 

It seems that atomic_fetch_add does (atom_variable+MY_ADD_VALUE*6), not (atom_variable+MY_ADD_VALUE).
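The factor of 6 can be checked against every reported run. The sketch below is plain C, not OpenCL, and `implied_multiplier` is a hypothetical helper that does not appear in the original test. It assumes, as described above, that atomicBuffer[0] is reset to 1 before each run.

```c
#include <assert.h>

/* Hypothetical helper (not part of the original test).
 * Given the initial buffer value, the observed final value, MY_ADD_VALUE and
 * MY_INNER_LOOP, return how many times larger the total increment was than
 * the expected add_value * inner_loop.
 * Returns -1 if the observed increment is not a whole multiple. */
int implied_multiplier(int initial, int observed, int add_value, int inner_loop)
{
    assert(add_value != 0 && inner_loop != 0);

    int expected_delta = add_value * inner_loop; /* one work-item, one add per iteration */
    int observed_delta = observed - initial;

    if (observed_delta % expected_delta != 0)
        return -1;
    return observed_delta / expected_delta;
}
```

Calling it with the five reported results, for example `implied_multiplier(1, 19, 1, 3)` or `implied_multiplier(1, 25, 2, 2)`, gives the same multiplier each time, so the extra factor does not depend on MY_ADD_VALUE or MY_INNER_LOOP.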

 

Is this a known issue, or is something wrong in my test?

 

 

4 Replies
Robert_I_Intel
Employee

Hi Laura,

1. Could you please provide a small reproducer program where you see this behavior?

2. Could you please also update the driver to 10.18.10.4170?

Robert_I_Intel
Employee

Hi Laura,

You said: "I only run the kernel on GPU with 1 thread. " Do you mean that you enqueue the kernel with a global NDRange of 1?

 

Thanks!

laura_t_
Beginner

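clEnqueueNDRangeKernel(
    commandQueue,
    atomic_test_kernel,
    1,                 /* work_dim */
    NULL,              /* global_work_offset */
    &global_threads,   /* global_work_size */
    &local_threads,    /* local_work_size */
    0,                 /* num_events_in_wait_list */
    NULL,              /* event_wait_list */
    &ndrEvt);          /* event */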

 

By "I only run the kernel on GPU with 1 thread. ", I mean work_dim=1, global_threads=1, local_threads=1.    
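For comparison, here is a minimal host-side reference model of what this launch should produce. It is a sketch in plain C11 using `<stdatomic.h>`, not OpenCL code, and `reference_result` is a hypothetical name. It runs the kernel body once per work-item, sequentially, so it models only the arithmetic and not any GPU behavior.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reference model (plain C11, not OpenCL).
 * Each of the global_size work-items performs inner_loop relaxed atomic adds
 * of add_value on a single counter that starts at `initial`. */
int reference_result(int initial, int add_value, int inner_loop, size_t global_size)
{
    assert(inner_loop >= 0);

    atomic_int counter;
    atomic_init(&counter, initial);

    for (size_t gid = 0; gid < global_size; gid++) {
        for (int j = 0; j < inner_loop; j++) {
            atomic_fetch_add_explicit(&counter, add_value, memory_order_relaxed);
        }
    }
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

With global_threads=1, `reference_result(1, 1, 1, 1)` returns 2 and `reference_result(1, 2, 2, 1)` returns 5, which is what I expected from the kernel. Purely as arithmetic, a global size of 6 in this model would give the 7 I observed, so I will also double-check the value actually passed as global_threads; that is an observation, not a diagnosis.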

 

Robert_I_Intel
Employee

Hi Laura,

Could you please attach a complete buildable sample, so I can try it out and reproduce the issue?

Thanks!
