Controlling NDRange kernel M20K RAM replication

Altera_Forum · 11-07-2017

Hello all,

Is there a way to limit the number of work groups executing simultaneously in a compute unit (NDRange kernel)? I have an issue where the compiler replicates RAM so heavily (30x!) that my RAM usage is way over 100%, purely because of the replication. Each work item has its own private memory, so the replication is not for banking purposes, and the compiler reports that it is there to "efficiently support simultaneous workgroups". Does the compiler really prioritize performance over being able to build the kernel at all?

I saw an excellent post earlier about single work-item kernels and #pragma max_concurrency, which I didn't know about, and I am hoping there's something similar for NDRange kernels.

Thanks in advance.

Altera_Forum · 11-07-2017

Yeah, this is another one of the stupid things the compiler does and, as far as I know, it CANNOT be controlled by the user. In fact, one of the earlier versions of Altera's documents clearly stated that this replication factor cannot be controlled by the user, but that statement was later removed. You can find my reply to a user asking a similar question, along with the quote from Altera's older document, here (https://www.alteraforum.com/forum/showthread.php?t=54741). I remember seeing very high replication factors in one of the NDRange kernels I was working on, and I managed to work around it by using "max_work_group_size" instead of "reqd_work_group_size", which forced the compiler to reduce the replication factor; you can try it and see if it helps in your case.
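To make the attribute swap concrete, here is a rough sketch of what it could look like. The kernel name (scale_private), WG_SIZE, PRIV_DEPTH and the whole host-side shim are made up for illustration and are not from the original kernel. The three-argument form of max_work_group_size is what I believe recent SDK versions accept, but the accepted form may differ between versions, so check the programming guide for yours. Whether the replication factor actually drops is something only the compiler report can tell you.

```c
/* Sketch only, under the assumptions stated above.
   The shim lets this file also build as plain C for a quick functional check;
   under an OpenCL compiler the #else branch is used instead. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define KERNEL_ATTR(x)                 /* attributes are dropped on the host */
static size_t fake_gid;                /* hypothetical stand-in for the work-item id */
static size_t get_global_id(unsigned dim) { (void)dim; return fake_gid; }
#else
#define KERNEL_ATTR(x) __attribute__((x))
#endif

#define WG_SIZE    64   /* hypothetical work-group size */
#define PRIV_DEPTH 4    /* hypothetical size of the per-work-item private array */

/* Before (exact size required):
   KERNEL_ATTR(reqd_work_group_size(WG_SIZE, 1, 1))
   After (upper bound only), the workaround described above: */
KERNEL_ATTR(max_work_group_size(WG_SIZE, 1, 1))
__kernel void scale_private(__global const float *in, __global float *out)
{
    size_t gid = get_global_id(0);
    float scratch[PRIV_DEPTH];         /* private memory, one copy per work item */
    float acc = 0.0f;
    int i;

    /* Fill the private array, then reduce it, so the array is really used. */
    for (i = 0; i < PRIV_DEPTH; ++i)
        scratch[i] = in[gid] * (float)i;
    for (i = 0; i < PRIV_DEPTH; ++i)
        acc += scratch[i];

    out[gid] = acc;
}
```

Note that with max_work_group_size the host is allowed to launch smaller work groups, so the kernel must not rely on the local size being exactly WG_SIZE. After recompiling, compare the replication factor of the private memory in the area report against the reqd_work_group_size build.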

BTW, it is always a good idea to open a support ticket with Altera and "complain"; when multiple users complain about the same thing, they are more likely to fix it.