Are there OpenCL equivalents of the CUDA warp vote functions __all() and __any()?
http://docs.nvidia.com/cuda/cuda-c-programming-guide/#warp-vote-functions
As far as I can tell, for OpenCL 1.x on the CPU you can write kernels in an explicit vector-type style (see the Optimization Guide) and then use the any(vecn) and all(vecn) relational functions. I see no equivalent to CUDA's ballot() function.
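To make the vector-style idea concrete, here is a rough plain-C emulation (not OpenCL code) of what any() and all() see. In OpenCL C, a vector comparison such as `v > t` on an int4 yields -1 (all bits set) in each true lane and 0 in each false lane, and any()/all() test the most significant bit of each component. The `int4_t` struct and the helper names below are hypothetical stand-ins; in a real kernel you would just write `any(v > t)` on a native `int4`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-C stand-in for an OpenCL int4: each lane plays
   the role of one logical "thread" in an explicit-vector kernel. */
typedef struct { int32_t s[4]; } int4_t;

/* Mimics OpenCL's vector '>' operator: -1 where true, 0 where false. */
static int4_t greater_than(int4_t a, int4_t b) {
    int4_t r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = (a.s[i] > b.s[i]) ? -1 : 0;
    return r;
}

/* Mimics any(): 1 if the most significant bit of ANY lane is set. */
static int any4(int4_t x) {
    for (int i = 0; i < 4; ++i)
        if ((uint32_t)x.s[i] >> 31) return 1;
    return 0;
}

/* Mimics all(): 1 if the most significant bit of EVERY lane is set. */
static int all4(int4_t x) {
    for (int i = 0; i < 4; ++i)
        if (!((uint32_t)x.s[i] >> 31)) return 0;
    return 1;
}
```

For example, comparing {3, 7, 1, 9} against {5, 5, 5, 5} gives the mask {0, -1, 0, -1}, so any4() returns 1 and all4() returns 0, which is the per-vector analogue of __any()/__all() across a warp.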
For OpenCL 2.x there are built-in work-group and sub-group any/all/broadcast/reduce/scan functions (e.g. work_group_any(), work_group_all(), sub_group_any(), sub_group_all()). Sub-groups don't appear to be supported yet in the CPU driver.
For the HD Graphics IGP, I don't see a high-performance way to do this without a vector-type coding style. Implementing ANY and ALL through shared local memory is probably the approach everyone is taking. I haven't tested it, but it might be safe to implement a fast ANY/ALL within a SIMD8/16/32 group of work-items, since they are probably executed in lock-step. For anything wider (or if relying on lock-step execution turns out to be illegal), you can implement it the old-fashioned way: write each work-item's predicate to local memory and reduce it, with a barrier between steps.
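A rough sketch of the "old-fashioned" local-memory approach, written as a plain-C emulation rather than an actual OpenCL kernel. Each inner loop over `lid` stands in for all work-items executing one step, and the end of each loop plays the role of `barrier(CLK_LOCAL_MEM_FENCE)`; `scratch` stands in for a `__local int[]` array. The power-of-two work-group size and the `MAX_LOCAL_SIZE` limit are assumptions for illustration, and `group_vote` is a hypothetical name. On OpenCL 2.x the built-ins work_group_any()/work_group_all() should compute the same vote without this boilerplate.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCAL_SIZE 256  /* hypothetical work-group size limit */

/* Emulates a work-group ANY (want_all == 0) or ALL (want_all != 0)
   vote done as a tree reduction in shared local memory. */
static int group_vote(const int *pred, size_t local_size, int want_all) {
    int scratch[MAX_LOCAL_SIZE];  /* stands in for a __local array */
    assert(local_size > 0 && local_size <= MAX_LOCAL_SIZE);
    assert((local_size & (local_size - 1)) == 0);  /* power of two */

    /* Step 1: every work-item publishes its predicate as 0/1. */
    for (size_t lid = 0; lid < local_size; ++lid)
        scratch[lid] = pred[lid] != 0;
    /* barrier(CLK_LOCAL_MEM_FENCE) would go here */

    /* Tree reduction: work-items with lid < stride combine two slots.
       Reads (lid + stride) never overlap writes (lid) within a step,
       so the sequential emulation matches the parallel result. */
    for (size_t stride = local_size / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid) {
            int a = scratch[lid];
            int b = scratch[lid + stride];
            scratch[lid] = want_all ? (a & b) : (a | b);
        }
        /* barrier(CLK_LOCAL_MEM_FENCE) would go here */
    }

    /* Every work-item would now read the vote from scratch[0]. */
    return scratch[0];
}
```

The reduction takes log2(local_size) barrier steps; a simpler variant (one flag initialized by work-item 0, then set by every work-item whose predicate is true) needs only two barriers, but it relies on racing non-atomic writes of the same value, so atomic_or() on a local int would be the safer way to express it.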
I would be interested if anyone else has suggestions, tips, or tricks for writing fast HD Graphics IGP code!
