Re: Need for atomic functions?

Altera_Forum · 02-24-2015

Hi folks,

I'd like to know whether an assumption I'm making is correct. This is strictly for FPGA, so no GPU or CPU or anything else. Here it is.

Say I have some OpenCL kernel. In it, I increment a __local variable in only one place, and that place is not inside an unrolled loop. Since the generated pipeline will have only one work-item going through the __local write instruction at a time, is it safe to assume I don't need to use atomic_inc()?

Similarly, if my kernel has __attribute__((num_compute_units(1))), is it safe to assume I don't need atomic_inc() for a __global variable if that increment is done at only one place in the pipeline and not in an unrolled loop?

Thanks!

Altera_Forum · 02-25-2015

An increment consists of three operations: read the current value, add one to it, and write the result back.

If you have two work-items in the pipeline, W0 and W1, and let's say W0 is ahead in the pipeline, it is possible that when W1 does its read, W0 has not yet performed its write. Both W0 and W1 would then have read the same "current" value from memory, and both will write the same value back, so one increment is lost. This is a classic scenario for a race condition and requires atomic_inc.
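To make that interleaving concrete, here is a minimal sketch in plain C rather than OpenCL. It is only an illustration of one possible schedule, not what the compiler actually generates, and the function name interleaved_increment is made up for the example.

```c
/* Two work-items, W0 and W1, each perform counter++ as three separate
   steps: read, add, write. The ordering below is one possible schedule,
   chosen for illustration only. */
static int interleaved_increment(void)
{
    int counter = 0;

    int w0_val = counter;   /* W0 reads 0 */
    int w1_val = counter;   /* W1 also reads 0: W0 has not written yet */

    w0_val = w0_val + 1;    /* W0 computes 1 */
    w1_val = w1_val + 1;    /* W1 computes 1 */

    counter = w0_val;       /* W0 writes 1 */
    counter = w1_val;       /* W1 writes 1 again, W0's update is lost */

    return counter;         /* 1, not the intended 2 */
}
```

Calling interleaved_increment() returns 1 even though two increments ran. atomic_inc prevents this by making the read, add, and write a single indivisible step.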

If we could assume that the read+increment+write sequence takes N cycles, and that any two work-items are always more than N cycles apart in the pipeline, then you would not need atomic_inc. But we cannot make this assumption.
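The spacing argument can be sketched with a toy cycle-by-cycle model in plain C. Everything here is an assumption made for illustration: the function name run_spaced, the parameters gap and latency, and the rule that a read in a given cycle sees the value from before that cycle's write. It does not describe the real hardware pipeline.

```c
/* Toy model with hypothetical parameters: a work-item enters every `gap`
   cycles (gap >= 1), reads the counter on entry, and writes value + 1
   back `latency` cycles later. Within one cycle, the read is sampled
   before the write lands. Supports up to 64 work-items. */
static int run_spaced(int items, int gap, int latency)
{
    int counter = 0;
    int seen[64];   /* value read by each work-item */
    int last_cycle = (items - 1) * gap + latency;

    for (int cycle = 0; cycle <= last_cycle; ++cycle) {
        /* Read stage: a work-item enters on every multiple of gap. */
        if (cycle % gap == 0 && cycle / gap < items)
            seen[cycle / gap] = counter;

        /* Write stage: the work-item that entered `latency` cycles ago. */
        int t = cycle - latency;
        if (t >= 0 && t % gap == 0 && t / gap < items)
            counter = seen[t / gap] + 1;
    }
    return counter;
}
```

In this model, run_spaced(4, 4, 3) returns 4 because each work-item reads only after the previous write has landed, while run_spaced(4, 1, 3) returns 1 because all four work-items read 0 before any write happens. Since the gap between work-items is not something the kernel author can rely on, the safe choice remains atomic_inc.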

Or, if instead of doing an increment you were just writing to the local/global variable, we could say that the final value would be the one written by the last work-item (although the OpenCL spec does not allow this kind of speculation).

Altera_Forum · 02-25-2015

That makes sense.

Thanks!