I am accelerating my application on an Altera FPGA. When I use a SIMD size of 32, the resource usage drops instead of increasing. I read somewhere that this is performance saturation. My question is: how can I prove that? Where could I find the answer to this question? Could I find it somewhere in the reports?
When I look at the reports, only two memory banks are created. In the SIMD 16 case, I can see 16 memory banks in the reports.
Is there a memory-bound issue here? If so, what is it? Please guide me in this matter.
That is because the compiler does not support SIMD sizes above 16. If you choose such a SIMD size, it automatically reverts to a SIMD size of 1, and hence resource utilization decreases. There should be a warning about this in the compilation log, or at least there was one before. A lot of the important warnings have been removed in the newer versions of the compiler; hopefully this one is still there.
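As a sketch of what is being discussed (the kernel itself is hypothetical; only the attributes are from the SDK), an NDRange kernel requests vectorization with the num_simd_work_items attribute, which also requires reqd_work_group_size. Values above 16 are the case where the compiler silently falls back to a SIMD size of 1:

```c
// Hypothetical NDRange kernel illustrating SIMD vectorization.
// num_simd_work_items must evenly divide the required work-group size;
// requesting more than 16 makes the compiler revert to SIMD size 1.
__attribute__((num_simd_work_items(16)))        // 16 is the largest honored value
__attribute__((reqd_work_group_size(64, 1, 1)))
__kernel void vec_add(__global const float *restrict a,
                      __global const float *restrict b,
                      __global float *restrict c)
{
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}
```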
Of course, there is zero logical reason to have any restriction on SIMD size for FPGAs since, unlike GPUs, FPGAs do not have a fixed architecture; however, it has been this way since the very first version of the compiler and will probably never change.
Thanks for your answer. So can we say this is, more or less, a memory-bound issue? When SIMD gets bigger, resource usage increases, and normally when some resource usage exceeds one hundred percent, the offline compilation terminates with a Quartus error. But my question is: if this is a memory-bound issue in the SIMD case, why doesn't it terminate the offline compilation instead of just dropping the resource usage?
Thank you .
Not really; this has nothing to do with memory bandwidth. It is an artificial compiler limitation. The following compiler warning is generated when compiling your kernel:
Compiler Warning: Kernel Vectorization: requested number of SIMD work items is larger than ... cannot vectorize efficiently beyond OpenCL widest vector type.
If you write the kernel using the Single Work-item model and use an unroll factor of 32, which would have a similar effect to using a SIMD size of 32 in an NDRange kernel, the kernel will compile just fine and the area usage will keep increasing as you increase the unroll factor. Depending on your kernel and FPGA size, you might not be able to fit the design with literally any SIMD size (even 1), or you might still be able to fit it with a hypothetical SIMD size of 32 or more. The compiler cannot know whether your design will fit without placing and routing it; hence, it will not terminate the compilation even if some resource is expected to be overutilized. Note that the area utilization numbers you get from the "-report" switch are based on estimation, and the final area utilization could be more or less than that.
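A minimal sketch of the Single Work-item alternative described above (kernel name and signature are made up for illustration): the kernel is a plain loop with no work-item IDs, and the compiler replicates the loop body according to the unroll factor, which, unlike num_simd_work_items, is not capped at 16:

```c
// Hypothetical single work-item kernel: no get_global_id(), just one
// loop over all elements. "#pragma unroll 32" replicates the datapath
// 32x, roughly analogous to a SIMD size of 32 in an NDRange kernel.
__kernel void vec_add_swi(__global const float *restrict a,
                          __global const float *restrict b,
                          __global float *restrict c,
                          const int n)
{
    #pragma unroll 32
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Whether a given unroll factor actually fits is only known after place-and-route; the estimated area from "-report" can exceed 100% and compilation will still proceed.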
Memory bandwidth depends on a lot of factors, only one of which is SIMD/unroll size. You can find a comprehensive analysis of memory performance on Intel FPGAs in the following document:
For your information, according to https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/opencl-sdk/aocl-best-pra..., chapter 7.3.1 describes the limitations of the num_simd_work_items attribute.