I am developing an n-element linear sorter that uses n work-items, so I use two barrier(CLK_LOCAL_MEM_FENCE) calls to ensure that the compares and shifts are done properly. My code works on FPGAs and GPUs. 


I am a bit confused by this AOCL warning:  


"Compiler Warning: Threads may reach barrier out of order - allowing at most 2 concurrent workgroups" 



I thought that threads (work-items) reaching a barrier out of order was default OpenCL behaviour. Or can we assume that on an FPGA the work-items within a work-group are always executed in lockstep? Further, is the compiler trying to implement concurrent work-groups? 



I have run into this warning as well. The work-items within a work-group are not executed in lockstep but in a pipelined manner (one after another). The out-of-order arrival is therefore mainly caused by "if-else" statements: a work-item that takes a branch with complex logic (e.g. the if) can be overtaken by a later work-item that takes a branch that finishes quickly (e.g. the else).
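To make the pipelining argument concrete, here is a small plain C toy model, not AOCL code. It assumes that work-item i enters the pipeline at cycle i and reaches the barrier after a fixed per-branch latency. The function names and the latency numbers are made up for illustration, and the real hardware schedule will differ.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an assumption, not AOCL's real scheduler): work-item i is
   issued at cycle i and needs latency[i] cycles to reach the barrier. */
static long arrival_cycle(size_t i, const int *latency) {
    return (long)i + latency[i];
}

/* Returns 1 if no work-item reaches the barrier before the work-item
   issued just ahead of it, 0 if at least one overtakes its predecessor. */
int reaches_barrier_in_order(const int *latency, size_t n) {
    assert(latency != NULL && n > 0);
    for (size_t i = 1; i < n; ++i) {
        if (arrival_cycle(i, latency) < arrival_cycle(i - 1, latency))
            return 0;
    }
    return 1;
}

/* Hypothetical kernel shape: even work-items take the slow branch ("if"),
   odd work-items take the fast branch ("else"). */
void fill_branch_latencies(int *latency, size_t n, int slow, int fast) {
    for (size_t i = 0; i < n; ++i)
        latency[i] = (i % 2 == 0) ? slow : fast;
}
```

With a slow branch of 10 cycles and a fast branch of 2 cycles, work-item 1 arrives at cycle 3 while work-item 0 arrives at cycle 10, so the model reports out-of-order arrival. With equal latencies on both branches the arrival order matches the issue order. This suggests that balancing the two branches, or moving the divergent work after the barrier, might avoid the warning, though that has to be checked against the actual compiler report.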

