I am trying to compile a multi-kernel file in which one kernel, say A, has 16-bit arithmetic operations. What I see is that when I change various parameters in the other kernels, the DSP usage in kernel A sometimes changes to slice usage. I want to control this, or at least keep it consistent. I saw that there is a DSP Block Balancing setting in Quartus. How can I control it from the OpenCL SDK? Thanks
Are you sure about this? I have compiled over one thousand floating-point kernels and have never encountered such a thing. The rule for DSP usage in OpenCL is pretty simple: on both Stratix V and Arria 10, all integer additions are implemented in logic and integer multiplications are implemented in DSPs. For floating-point operations, on Stratix V only multiplication uses DSPs and addition uses logic, while on Arria 10 both use DSPs. Of course, this holds only if none of the operands are constant. OpenCL does NOT provide any documented way of forcing operations to use logic instead of DSPs; I specifically asked Altera about this once and they said such functionality is not provided. They might have a secret/undocumented way to do it, though. If you are masking bits out, I guess a multiplication narrower than 9 bits by 9 bits could be implemented in logic, but anything bigger than that will very likely use DSPs.
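To make the rule above concrete, here is a minimal sketch of a kernel with one variable-by-variable integer multiplication, one integer addition, and one masked multiplication. The kernel name, arguments, and the 8-bit mask are hypothetical and not taken from the original poster's code; the comments only restate the mapping described in the post, and the actual result depends on the compiler version and target device. The small macro guard is an assumption added so the same file also builds as plain C for a quick functional check.

```c
/* An OpenCL C compiler defines __OPENCL_VERSION__ and skips this guard;
   a plain C compiler gets empty qualifiers instead. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical kernel: names and widths are illustrative only. */
__kernel void mac16(__global const short *a,
                    __global const short *b,
                    __global int *out_full,
                    __global int *out_masked,
                    int n)
{
    for (int i = 0; i < n; i++) {
        /* 16-bit x 16-bit, neither operand constant:
           per the post, expected to be placed in a DSP block. */
        int prod = a[i] * b[i];

        /* Integer addition: per the post, expected to be built from logic. */
        out_full[i] = prod + i;

        /* Operands masked to 8 bits: the post guesses that a multiply this
           narrow could be built from logic instead of a DSP. */
        out_masked[i] = (a[i] & 0xFF) * (b[i] & 0xFF);
    }
}
```

Comparing the area report for the two multiplications is one way to check whether the narrow multiply really leaves the DSPs on a given toolchain.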
Thanks for your comment. I reran some synthesis and have some updates. I realized that the change from DSP to logic happened when I changed something in kernel A itself, which I thought should not have an effect. I put a loop unroll pragma before one of the loops but passed 1 as the parameter, which essentially shouldn't unroll the loop. I added it only for later experiments with kernel A and didn't think it would affect other experiments as long as I kept the unroll factor at 1. However, it did: the kernel used logic instead of DSPs, and reverted back to DSPs when I removed the pragma. The loop has some integer additions and one integer multiplication, and I am compiling for Stratix V. I am not sure why this is happening, but for now I will just remove the unroll pragma as it is not absolutely necessary.
That is an interesting observation. The only case I have seen in which having a #pragma unroll 1 before a loop has an effect is when the compiler decides to automatically unroll that loop; adding the pragma prevents it.
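As an illustration of that case, the sketch below shows a small fixed-trip-count loop with the pragma in place. The kernel name, arguments, and trip count of 8 are hypothetical; whether the compiler would have unrolled this particular loop on its own is an assumption and depends on the SDK version. The macro guard is only there so the file also builds as plain C, where the pragma is simply ignored.

```c
/* An OpenCL C compiler defines __OPENCL_VERSION__ and skips this guard. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical kernel: one integer multiply and one add per iteration. */
__kernel void scale_sum(__global const short *in,
                        __global int *out,
                        short k)
{
    int acc = 0;

    /* A factor of 1 keeps the loop rolled. If the compiler would otherwise
       unroll it automatically, this pragma stops that, so the hardware
       holds one multiplier instead of one per unrolled iteration. */
    #pragma unroll 1
    for (int i = 0; i < 8; i++) {
        acc += in[i] * k;
    }

    out[0] = acc;
}
```

With the pragma removed, an automatically unrolled version would replicate the multiplier, which would show up as higher DSP usage for the same source code.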
This is in fact a very useful comment. Indeed, DSP utilization and performance are both higher without the pragma, which suggests the loop might have been unrolled by the compiler. With the pragma, though, LE and FF utilization increases (and DSP utilization decreases) for kernel A, which made me think that it was now using logic instead of DSPs for the arithmetic computations. I do not have a breakdown of resource utilization for the different sections of the kernel to confirm any theory. I will try to explore this more.
If the compiler automatically unrolls a loop and you use the "-v" switch, you will see a message in the log about the automatic unrolling. You can see resource utilization per line of your code in the HTML report (if you are using Quartus v16 and above).
No, unfortunately I only have version 15 and it does not generate the HTML report. I looked at the log, but it only shows messages concerning pipelining and data dependencies. It does not show anything regarding unrolling.
You should be able to run aocl analyze-area to view the report with version 15. I'd be interested to see what you find.
Thanks for pointing me to analyze-area. I have been able to extract more resource information. The short answer is yes: the compiler was auto-unrolling the loop, and unroll 1 prevented that, which reduced the DSP usage. There are still some terms in the area report that I do not understand, e.g. block names such as for.condX.i.preheader, for.bodyY.i, and for.endZ.i, where X, Y, and Z are (for now) arbitrary-looking numbers. I think they come from the nested loops in the source code, but I have not been able to make complete sense of them so far. The report also lists resource usage for 'No Source Line' multiple times, and I am not sure what that represents.
This was with the -g option. Without it, analyze-area would not run for me. No, this is still on 15.1. I will try to see if my license works for 16.1. Thanks