I am trying to perform multiply-accumulate operations on 8-bit data:
for (int i = 0; i < 4; i++)
    sum[i] += inp1[i] * inp2[i];
Both inp1 and inp2 are 8-bit data, and sum is 20 bits wide. When I compile my code, it uses a DSP for the multiplication but LUTs for the addition. For 32-bit floating-point operations, however, I have seen it use a single DSP for one multiplication and one accumulation. Is there a way to use DSPs for the addition instead of LUTs?
In my experience, the OpenCL compiler generates math IP cores based on the size of the result, not the operands. If your result is 20 bits, then you will not get one FMA (Fused Multiply and Add) per DSP. I think the result should be less than 18 bits (or be exactly 32-bit float) to get FMA, since there are two 18x19 multipliers in each DSP. Also, if you are using anything below Quartus/AOC v18.1, do not expect good mapping of math operations to DSPs.
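To make the bit-width reasoning concrete, here is a minimal plain-C model of the loop from the question. It is only a sketch under stated assumptions: it assumes unsigned 8-bit inputs, the helper names (wrap_to_bits, mac_step, max_safe_accumulations) are hypothetical, and it is not an OpenCL kernel, so it says nothing about how AOC will actually map the operations. It only shows how many worst-case products an accumulator of a given width can hold before it wraps.

```c
#include <stdint.h>

#define N 4

/* Hypothetical helper: keep only the low `bits` bits of a value, modelling
   a result register narrower than the C type holding it (bits must be < 32). */
static uint32_t wrap_to_bits(uint32_t value, unsigned bits) {
    return value & ((1u << bits) - 1u);
}

/* One pass of the loop from the question: sum[i] += inp1[i] * inp2[i],
   with each accumulator wrapped to `acc_bits` bits.
   Assumption: inputs are unsigned 8-bit values. */
static void mac_step(uint32_t sum[N], const uint8_t inp1[N],
                     const uint8_t inp2[N], unsigned acc_bits) {
    for (int i = 0; i < N; i++) {
        /* 8-bit x 8-bit product: at most 255 * 255 = 65025, fits in 16 bits. */
        uint32_t product = (uint32_t)inp1[i] * (uint32_t)inp2[i];
        sum[i] = wrap_to_bits(sum[i] + product, acc_bits);
    }
}

/* Number of worst-case products (255 * 255) that fit in an unsigned
   accumulator of `bits` bits without wrapping. */
static unsigned max_safe_accumulations(unsigned bits) {
    return ((1u << bits) - 1u) / (255u * 255u);
}
```

Under these assumptions, a 20-bit accumulator has room for 16 worst-case products per element and a 17-bit one for only 2, so whether the result can be narrowed below 18 bits depends on how many times the loop body runs on the same sum[i].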
Sorry for the delay. Based on my understanding, Quartus automatically selects what it considers the optimal implementation for your operation, and I am not aware of any specific setting that tells Quartus not to use LUTs for addition.
Could you let me know which specific device and Quartus version you are using?
For testing purposes, you may try the multiply and add IPs under IP Catalog -> Basic Functions -> Arithmetic to see if that helps.
Please let me know if you have any further concerns. Thank you.