Hi,
I have a rather simple OpenCL kernel, written as an ND-range kernel, for a graph processing application. It takes 4-5 global arguments as inputs/outputs. The code structure is short: two nested for loops (the outer over vertices, the inner over edges, with a floating-point multiply and add in its body) and no explicit local variables (as far as I know, OpenCL local memory maps to FPGA BRAMs).
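Roughly, the structure is like the following simplified sketch (argument names and loop bounds are placeholders, not my exact code):

```c
// Simplified sketch of the kernel structure (placeholder names, not the real code).
// Each work-item handles a strided set of vertices; each vertex walks its edge list
// and accumulates a weighted sum (one float multiply + add per edge).
__kernel void process_graph(__global const int   *row_offsets,   // CSR row pointers
                            __global const int   *col_indices,   // edge destinations
                            __global const float *edge_weights,
                            __global const float *values_in,
                            __global float       *values_out,
                            const int num_vertices)
{
    int gid = get_global_id(0);
    int gsz = get_global_size(0);

    for (int v = gid; v < num_vertices; v += gsz) {                   // outer loop: vertices
        float acc = 0.0f;
        for (int e = row_offsets[v]; e < row_offsets[v + 1]; ++e) {   // inner loop: edges
            acc += edge_weights[e] * values_in[col_indices[e]];       // FP multiply + add
        }
        values_out[v] = acc;
    }
}
```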
The code seems very simple, but when I implement it on the FPGA (Arria 10, on a HARP machine) only 2 compute units (CUs) fit. Each CU takes around 25% of the BRAMs, on top of the default so-called blue bitstream (the partial reconfiguration shell), which constantly consumes around 30% of the resources.
Is there any way to reduce this hidden BRAM usage? I already removed the 'restrict' keyword from the input arguments, since I do not need a cache for them. That reduced the BRAM usage, but it is still high.
Thanks
Hi
Have you consulted the Best Practices Guide?
https://www.intel.com/content/www/us/en/programmable/documentation/mwh1391807516407.html
Essentially, examine the detailed HTML area report to determine where the block RAM usage is coming from.
Also, try using the “volatile” keyword on your pointer arguments to avoid caching LSUs.
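For example, marking the read-only global pointers as volatile tells the offline compiler not to build BRAM-backed caches for their loads. A sketch based on the kernel shape described above (the names are just placeholders):

```c
// volatile on global pointer arguments disables caching LSUs for their loads.
__kernel void process_graph(__global volatile const int   *row_offsets,
                            __global volatile const int   *col_indices,
                            __global volatile const float *edge_weights,
                            __global volatile const float *values_in,
                            __global float                *values_out,
                            const int num_vertices)
{
    /* ... same loop nest as before ... */
}
```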
Thanks