OpenCL equivalent of CUDA warp vote functions

ABoxe · 06-12-2014

Are there OpenCL equivalents of the CUDA __all() and __any() warp vote functions?

http://docs.nvidia.com/cuda/cuda-c-programming-guide/#warp-vote-functions

allanmac1 · 06-14-2014

As far as I can tell, for OpenCL 1.x on the CPU you can use an explicit vector-type coding style (see the Optimization Guide) and then apply the any(vecn) and all(vecn) relational functions to the result of a vector comparison. I see no equivalent of CUDA's __ballot() function.

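For OpenCL 2.x there are work-group built-ins (work_group_all, work_group_any, work_group_broadcast, work_group_reduce_<op>, work_group_scan_inclusive_<op> and work_group_scan_exclusive_<op>) as well as matching sub_group_* functions, which in OpenCL 2.0 come from the cl_khr_subgroups extension. Sub-groups don't appear to be supported yet in the CPU driver.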

For the HD Graphics IGP I don't see a high-performance way to do this without support for a vector-type coding style. Using shared local memory to implement ANY and ALL is probably the approach everyone is taking. I haven't tested it, but it might be safe to implement a fast ANY/ALL within a SIMD8/16/32 group of work-items, since they're probably executed in lock-step. For anything wider (or if relying on lock-step execution turns out to be illegal), you can implement it the old-fashioned way, with a flag in local memory and barriers.

I'd be interested if anyone else has suggestions, tips, or tricks for writing fast HD Graphics IGP code!

ABoxe · 06-15-2014

Thanks, allanmac. I didn't know about the OpenCL 2.0 functions.

I think I will wait for the 2.0 release to try this feature. AMD is planning to release 2.0 support later this year.

Aaron