I am working on a Cyclone V SoC project that involves transferring a large amount of data from memory to the FPGA. Because the DMA needs physically contiguous memory, I first copy the data into shared memory (allocated by clEnqueueMapBuffer), and the FPGA then consumes it from there.
The problem is that copying the data into shared memory from user space is very time-consuming. I think this is because the Intel OpenCL library disables caching for CPU accesses to the shared memory (as shown in the following picture). I understand that cache coherence is hard to manage in this case, but it is not impossible!
Since the Intel OpenCL library is not open source, it seems hard for us to change it and enable the cache ourselves. Can anyone suggest a way around this problem, please?
Enabling the CPU cache would not guarantee better performance when moving your data. Instead, you could check the best practices in the optimizing memory access section to see whether they improve performance.
Thanks for your reply!
However, I am already satisfied with the performance of the kernel. What I really want to optimize is the transfer of data to and from the shared memory, which takes much longer than the kernel itself takes to run.
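To put numbers on that transfer cost, one option is to time the copy on its own. The following is only a minimal sketch, assuming a POSIX system with clock_gettime (such as Linux on the ARM side). The helper names `now_sec` and `copy_bandwidth_mbs` are made up for illustration and are not part of any Intel or OpenCL API. In a real measurement, `dst` would be the pointer returned by clEnqueueMapBuffer; running the same call against an ordinary malloc'd buffer gives a cached baseline to compare with.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Current time in seconds, read from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/*
 * Hypothetical helper: copy `bytes` from src to dst `reps` times and
 * return the average throughput in MB/s (1 MB = 1e6 bytes).
 * Assumption: dst is the buffer under test (for example the mapped
 * shared memory) and src is ordinary cached memory.
 * Note: an aggressive optimizer may merge repeated identical copies,
 * so compare results at a modest optimization level.
 */
static double copy_bandwidth_mbs(void *dst, const void *src,
                                 size_t bytes, int reps)
{
    assert(dst != NULL && src != NULL && bytes > 0 && reps > 0);

    double t0 = now_sec();
    for (int i = 0; i < reps; ++i) {
        memcpy(dst, src, bytes);
    }
    double dt = now_sec() - t0;

    /* Guard against a zero interval on very small copies. */
    if (dt <= 0.0) {
        dt = 1e-9;
    }
    return ((double)bytes * (double)reps) / (dt * 1e6);
}
```

Calling `copy_bandwidth_mbs(mapped_ptr, host_data, size, 10)` and then the same call with a malloc'd destination would show how much of the time is due to the uncached mapping rather than to the copy itself.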