Are DRAM accesses in OpenCL code for FPGAs executed in a non-blocking manner?

NSriv2 · 02-09-2019

When we access memory in an OpenCL kernel like this:

for (int i = 0; i < N; i++)
  ... = A[i]

Are they executed in a non-blocking manner? That is, does the generated FSM wait for each memory load to complete before sending the next load request, or does it send out multiple load requests one after another and then handle the responses in order as they come back?

HRZ · 02-09-2019

In Single Work-item kernels, loops are pipelined, and this also applies to the memory accesses inside them. Load requests are therefore issued back to back without waiting for earlier ones to complete, and after a certain delay the data is returned in the same order. If the buffer between the kernel and memory becomes empty, the kernel stalls until new data arrives.
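
To make this concrete, here is a minimal sketch of a Single Work-item kernel containing the kind of loop asked about. The kernel name sum_array, the result argument, and the summation are illustrative assumptions, not taken from the original question. The sketch also assumes a toolchain that treats a kernel with no work-item ID queries as Single Work-item and pipelines its loop; whether a particular loop is actually pipelined should be confirmed in the compiler report.

```c
/* An OpenCL compiler defines __OPENCL_VERSION__ and keeps the qualifiers.
   When compiled as plain C on the host (functional check only), they are
   defined away so the same source still builds. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical Single Work-item kernel: it never queries a work-item ID,
   and the whole array is walked by one loop. */
__kernel void sum_array(__global const int *A,
                        __global int *result,
                        const int N)
{
    int acc = 0;

    /* If this loop is pipelined, the load for A[i + 1] can be issued
       while the load for A[i] is still in flight; the values are
       consumed in program order. */
    for (int i = 0; i < N; i++) {
        acc += A[i];
    }

    *result = acc;
}
```

The host-side fallback only checks that the kernel computes the right value; it says nothing about timing. The back-to-back issue of load requests is a property of the hardware generated by the FPGA compiler, not of the C semantics.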