For example, suppose we have a local memory array:
And a much larger global memory array. Would we copy it like this:
int memStart = 50;
for (int i = 0; i < 10; ++i)
    local[i] = globalMem[memStart + i];
Or should we put #pragma unroll on this loop, so that the copy does not take one clock cycle per element? Or is there some other recommended way to move array data between local and global memory?
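For reference, the unrolled variant being asked about might look roughly like the sketch below. It is written as plain C so it can be compiled and checked on a host. In an actual kernel the two arrays would carry the local and global address-space qualifiers. The names copy_window, local_buf, global_mem, mem_start and LOCAL_SIZE are made up for illustration. Whether unrolling actually improves the copy depends on the compiler and the memory system, so treat this as a starting point for experiments rather than a recommendation.

```c
#include <stddef.h>

#define LOCAL_SIZE 10  /* hypothetical: number of elements in the local array */

/* Copy LOCAL_SIZE consecutive elements, starting at global_mem[mem_start],
 * into local_buf. The caller must make sure mem_start + LOCAL_SIZE does not
 * run past the end of global_mem. */
static void copy_window(int *local_buf, const int *global_mem, size_t mem_start)
{
    /* The trip count is a compile-time constant, so a compiler that honours
     * the pragma can unroll the loop completely. The guard only keeps host
     * compilers that do not know "#pragma unroll" from warning about it. */
#if defined(__OPENCL_VERSION__) || defined(__clang__)
#pragma unroll
#endif
    for (int i = 0; i < LOCAL_SIZE; ++i) {
        local_buf[i] = global_mem[mem_start + i];
    }
}
```

Called as copy_window(local_buf, global_mem, 50), this performs the same ten-element copy as the loop in the question.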
Transferring Loop-Carried Dependency to Local Memory
For further questions, could you please post in the correct category? This one falls under OpenCL: https://forums.intel.com/s/topic/0TO0P0000001AUUWA2/intel-high-level-design