Hi all! I'm new to OpenCL and the FPGA world in general, and I'm just starting to grasp how to properly use M20K blocks for parallel accesses. Since I need many such accesses across several arrays that are actually shallow in depth, RAM usage explodes while each individual block is heavily underused. I came across the __attribute__((merge(...))) directive in the documentation, which should allow different variables to be implemented in the same memory subsystem. However, when I insert it, "aoc -c ..." complains that it doesn't know such an attribute. Is there a workaround or a solution to this problem? Edit: I'm using OpenCL 16.1, which is the most recent version supported by my board.
I had never seen this attribute before; it seems to have been added to the documentation recently (and silently), without a proper description or examples. Whatever it actually does, it is certainly not supported in Quartus v16.1, and likely not in 17.0 either. I would recommend seeking alternatives to reduce your Block RAM utilization. Make sure to minimize the number of accesses to your local arrays. If you are unrolling loops, make sure the accesses inside the loop are consecutive so that they can be coalesced into one larger access. You can also consider reducing the number of access ports to your local buffers using the numreadports or numwriteports attributes, which force the compiler to share ports. Of course, this comes at the cost of lower performance. Since each Block RAM has a limited number of ports, merging different arrays into the same Block RAM will very likely require port sharing, which will also lower performance.
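To make the coalescing advice above a bit more concrete, here is a small host-side sketch in plain C (not kernel code) of the two index patterns involved. The unroll factor UNROLL, the function names and the strided layout are assumptions made up for illustration. The sketch only shows which indices the unrolled copies of one iteration touch; whether the compiler actually coalesces a given pattern should be checked in the compilation report.

```c
#include <assert.h>
#include <stddef.h>

#define UNROLL 4  /* hypothetical unroll factor */

/* Index touched by unrolled copy u of iteration i when the copies
   access adjacent elements (candidate for one wider access). */
size_t consecutive_index(size_t i, size_t u)
{
    assert(u < UNROLL);
    return i * UNROLL + u;
}

/* Index touched when each unrolled copy walks its own region of
   n_iters elements, so the copies are n_iters elements apart. */
size_t strided_index(size_t i, size_t u, size_t n_iters)
{
    assert(u < UNROLL && i < n_iters);
    return u * n_iters + i;
}

/* Returns 1 if the n indices form one contiguous ascending run. */
int is_contiguous(const size_t *idx, size_t n)
{
    for (size_t k = 1; k < n; ++k) {
        if (idx[k] != idx[k - 1] + 1)
            return 0;
    }
    return 1;
}
```

With UNROLL = 4, iteration 3 touches indices 12 to 15 in the consecutive layout, while in the strided layout with 16 iterations it touches 3, 19, 35 and 51, which cannot be served by a single wider access.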
Thank you very much! Since every Block RAM has two physical ports, one for reads and one for writes, I am in the situation where one of the variables needs to be read and the other one updated, so that the total number of accesses is two. Merging the variables into the same block would have provided a good deal of savings. I'm disappointed that they introduced it in the documentation without actually implementing it.
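For toolchain versions that do not recognize the merge attribute, one manual stand-in is to pack the shallow arrays into a single deeper array and address each of them through an offset. The sketch below is plain C so that the indexing can be checked on a host. The names DEPTH_A, DEPTH_B, merged_buf and the accessor functions are hypothetical, and this is not what the attribute itself does internally. In a kernel the same indexing would apply to a local array, but whether the compiler then maps the combined array onto fewer M20Ks, and how it handles the ports, is not guaranteed and needs to be verified in the report.

```c
#include <assert.h>
#include <stddef.h>

#define DEPTH_A 64  /* hypothetical depth of the first shallow array  */
#define DEPTH_B 96  /* hypothetical depth of the second shallow array */

/* One backing array: A occupies [0, DEPTH_A),
   B occupies [DEPTH_A, DEPTH_A + DEPTH_B). */
typedef struct {
    int data[DEPTH_A + DEPTH_B];
} merged_buf;

int read_a(const merged_buf *m, size_t i)
{
    assert(i < DEPTH_A);
    return m->data[i];
}

void write_a(merged_buf *m, size_t i, int v)
{
    assert(i < DEPTH_A);
    m->data[i] = v;
}

int read_b(const merged_buf *m, size_t i)
{
    assert(i < DEPTH_B);
    return m->data[DEPTH_A + i];  /* B starts right after A */
}

void write_b(merged_buf *m, size_t i, int v)
{
    assert(i < DEPTH_B);
    m->data[DEPTH_A + i] = v;
}
```

The bounds checks keep an out-of-range index on one logical array from silently landing in the other, which is the main hazard of merging by hand.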
--- Quote Start --- Since every Block Ram has two physical ports, respectively for R/W, I am in the situation where one of the variables needs to be read and the other one updated, so that the total number of accesses is two. Merging variables into the same block would have provided a good deal of saving. --- Quote End ---
Actually, you can effectively double the number of ports by double-pumping the Block RAMs; however, the compiler usually does that automatically. I am not sure what you mean by "one of the variables needs to be read and the other one updated". Every variable will have at least one read point and one write point in every kernel; hence, without double-pumping, there is no way two variables can share one Block RAM without port sharing.
--- Quote Start --- I'm disappointed they actually introduced that in the doc. without implementing --- Quote End ---
They have definitely implemented it, but likely in Quartus v17.1. It is not going to work with earlier versions.
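The port argument above can be written out as a small piece of arithmetic. This is only a simplified model in plain C, assuming two physical ports per Block RAM (as stated in the thread) and that double-pumping doubles the usable ports. The function names are made up for illustration, and the model ignores replication and any other optimization the compiler may apply.

```c
#include <assert.h>

#define PHYSICAL_PORTS 2  /* ports per Block RAM, as stated in the thread */

/* Usable ports on one Block RAM: double-pumping doubles the physical count. */
int effective_ports(int double_pumped)
{
    return double_pumped ? 2 * PHYSICAL_PORTS : PHYSICAL_PORTS;
}

/* Returns 1 if n_vars variables placed in one Block RAM need more
   access points than there are ports, i.e. ports must be shared. */
int needs_port_sharing(int n_vars, int reads_per_var, int writes_per_var,
                       int double_pumped)
{
    assert(n_vars >= 0 && reads_per_var >= 0 && writes_per_var >= 0);
    int accesses = n_vars * (reads_per_var + writes_per_var);
    return accesses > effective_ports(double_pumped);
}
```

With two variables that each have one read and one write, there are four access points: more than the two ports of a single-pumped block, but exactly the four available when the block is double-pumped.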