Hi,
I have a rather simple OpenCL kernel, written as an ND-range kernel, for a graph processing application. It takes 4-5 global arguments as inputs/outputs. The code structure is short: two nested for loops (the outer over vertices, the inner over edges, with a floating-point multiply and add in its body) and no explicit local variables (as far as I know, OpenCL local memory maps to FPGA BRAMs).
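Roughly, the structure is like the following simplified sketch (argument names and loop bounds are placeholders, not my exact code):

```c
// Simplified sketch of the kernel structure (placeholder names, not the real code).
// Each work-item handles a strided set of vertices; each vertex walks its edge list
// and accumulates a weighted sum (one float multiply + add per edge).
__kernel void process_graph(__global const int   *row_offsets,   // CSR row pointers
                            __global const int   *col_indices,   // edge destinations
                            __global const float *edge_weights,
                            __global const float *values_in,
                            __global float       *values_out,
                            const int num_vertices)
{
    int gid = get_global_id(0);
    int gsz = get_global_size(0);

    for (int v = gid; v < num_vertices; v += gsz) {                   // outer loop: vertices
        float acc = 0.0f;
        for (int e = row_offsets[v]; e < row_offsets[v + 1]; ++e) {   // inner loop: edges
            acc += edge_weights[e] * values_in[col_indices[e]];       // FP multiply + add
        }
        values_out[v] = acc;
    }
}
```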
The code seems very simple, but when I implement it on the FPGA (Arria 10, on a HARP machine) only 2 compute units (CUs) fit. Each CU takes around 25% of the BRAMs, on top of the default so-called blue bitstream (the partial reconfiguration shell), which constantly consumes around 30% of the resources.
Is there any way to reduce this hidden BRAM usage? I already removed the 'restrict' keyword from the input arguments, since I do not need a cache for them. That reduced the BRAM usage, but it is still high.
Thanks
Hi
Have you consulted the Best Practices Guide?
https://www.intel.com/content/www/us/en/programmable/documentation/mwh1391807516407.html
Essentially, examine the detailed HTML area report to determine where the block RAM usage is coming from.
Also, try using the “volatile” keyword on your pointer arguments to avoid caching LSUs.
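For example, marking the read-only global pointers as volatile tells the offline compiler not to build BRAM-backed caches for their loads. A sketch based on the kernel shape described above (the names are just placeholders):

```c
// volatile on global pointer arguments disables caching LSUs for their loads.
__kernel void process_graph(__global volatile const int   *row_offsets,
                            __global volatile const int   *col_indices,
                            __global volatile const float *edge_weights,
                            __global volatile const float *values_in,
                            __global float                *values_out,
                            const int num_vertices)
{
    /* ... same loop nest as before ... */
}
```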
Thanks