Hi, I have a specific question regarding memory access optimization on FPGAs. As we know, developers need to make sure memory accesses in their code are coalesced. On a GPU, that means all threads in a warp access sequential memory indexes. I browsed the Intel FPGA best practices on this issue, but found no specific detail on how memory access coalescing should be done. In single work-item (single-thread) mode, does that mean the memory indexes need to be sequential temporally, as opposed to spatially as on a GPU? What about NDRange mode? In that mode we seem to have the opportunity to optimize memory access both spatially and temporally. Can anyone elaborate on how the memory manager module handles memory accesses? Thanks
GPUs have complex and efficient memory controllers and mostly rely on run-time coalescing of consecutive accesses by the threads in a warp. On FPGAs, there is little (or likely no) support for run-time coalescing, so accesses must be coalesced at compile time instead. This can be achieved by unrolling the memory access loop (#pragma unroll) in single work-item kernels, or by using SIMD (the num_simd_work_items attribute) in NDRange kernels. In both cases the unrolled iterations or SIMD lanes need to access consecutive addresses for the compiler to merge them into one wider access. If you check the system viewer section of the area report, you will see that loop unrolling/SIMD increases the width of the ports going from the kernel to memory.
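To make the single work-item case concrete, here is a minimal sketch of loop unrolling used for compile-time coalescing. The kernel name `scale`, the `UNROLL` width of 16, and the assumption that `n` is a multiple of `UNROLL` are all hypothetical choices for illustration, not something taken from the Intel documentation. The small `#ifndef` guard at the top is only there so the same code can also be compiled as plain C for a quick host-side check.

```c
/* When compiled as plain C (e.g. for a quick host-side check), strip the
   OpenCL kernel and address-space qualifiers. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical unroll width: 16 floats x 32 bits = 512 bits per access. */
#define UNROLL 16

/* Single work-item kernel (no get_global_id call). The inner loop is fully
   unrolled, so its UNROLL loads from `in` and UNROLL stores to `out` touch
   consecutive addresses and can be merged at compile time into one wide
   access each per outer iteration.
   Assumption: n is a multiple of UNROLL. */
__kernel void scale(__global const float *restrict in,
                    __global float *restrict out,
                    const float factor,
                    const int n)
{
    for (int i = 0; i < n; i += UNROLL) {
        #pragma unroll
        for (int j = 0; j < UNROLL; j++) {
            out[i + j] = factor * in[i + j];
        }
    }
}
```

After compiling something like this for the FPGA, the area report should show whether the load and store ports actually became wider. If the index inside the unrolled loop were strided or data-dependent instead of `i + j`, the accesses would no longer be consecutive and you should not expect them to be merged.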