Solved: Doing low-bit-width fixed-point FMA on DSPs in OpenCL

SBioo · ‎05-03-2019

Hi,

I'm developing a design in which all variables are 8- or 16-bit fixed-point values; for example, data are stored as `char` or `short`. Each iteration performs multiple FMAs, but the design does not use any DSPs.
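For readers unfamiliar with this kind of design, the FMAs in question look roughly like the following plain C sketch. It assumes a Q8.8 format for the 16-bit data (a signed `short`-sized value with 8 fractional bits), which the post does not specify, and the function names are hypothetical. In an OpenCL kernel the same expressions would be written with `short` and `int`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed format: Q8.8, i.e. a signed 16-bit value (the size of a `short`)
   with 8 fractional bits. The original post does not say where the binary
   point is. */

/* One fixed-point FMA: acc + a * b. The product of two Q8.8 values is exact
   in Q16.16, so the accumulator stays in Q16.16 and nothing is rounded. */
static int32_t fma_q8_8(int32_t acc, int16_t a, int16_t b) {
    return acc + (int32_t)a * (int32_t)b;
}

/* Several FMAs per iteration, here written as a dot product. The caller must
   keep n small enough that the Q16.16 sum cannot overflow 32 bits. */
static int32_t dot_q8_8(const int16_t *a, const int16_t *b, size_t n) {
    assert(a != NULL && b != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc = fma_q8_8(acc, a[i], b[i]);
    }
    return acc; /* divide by 2^16 to get the real value */
}
```

For example, 1.5 × 2.0 + (-1.0) × 0.5 = 2.5 comes out of `dot_q8_8` as 163840, which is 2.5 × 2^16 in Q16.16.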

Something tells me there should be a way to offload these computations onto the DSPs and free up resources so that the design can scale further. Unfortunately, I have no idea how this could be done in OpenCL, and the OpenCL documentation does not provide any information on it either.

My first question is: is this possible at all? My assumption is that it could allow multiple FMAs to be done on a single DSP.

Second, if it is possible, is there any specific documentation on how to do this in OpenCL?

Thanks


HRZ · ‎05-04-2019

The most complex operation you can do with one Arria 10/Stratix 10 DSP block is an "18 × 18 sum of 2" fixed-point operation. You cannot do more than one FMA per DSP on these devices, regardless of bit width, since each DSP has only one adder and FP32 FMA is the only natively supported FMA operation. For more information, refer to the "Intel® Arria® 10 Native Fixed Point DSP IP Core User Guide" and the "Intel® Stratix® 10 Variable Precision DSP Blocks User Guide".
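To make the "sum of 2" wording concrete, here is a behavioural model in plain C of what that mode computes, assuming signed 18-bit operands: two 18 × 18 products that are added together once. It is only an arithmetic model with hypothetical names, not Intel code, and it says nothing about how the OpenCL compiler maps source code to DSP blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 18-bit operand range assumed for the multipliers in this mode. */
#define S18_MIN (-131072) /* -2^17     */
#define S18_MAX 131071    /*  2^17 - 1 */

/* Returns 1 if v fits in a signed 18-bit operand, 0 otherwise. */
static int fits_s18(int32_t v) {
    return v >= S18_MIN && v <= S18_MAX;
}

/* Behavioural model (not Intel code) of an "18 x 18 sum of 2" operation:
   two signed 18 x 18 multiplications whose 36-bit products go through a
   single addition. */
static int64_t sum_of_2_18x18(int32_t ax, int32_t ay, int32_t bx, int32_t by) {
    assert(fits_s18(ax) && fits_s18(ay) && fits_s18(bx) && fits_s18(by));
    return (int64_t)ax * ay + (int64_t)bx * by;
}
```

Since 8-bit operands fit in 18 bits, two 8-bit products that feed one addition (for example two taps of a dot product) match this pattern; it is still a single addition, which is why it does not amount to two independent FMAs.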

P.S. You might be able to manually pack multiple low-bit-width FMAs into one 32-bit FMA by inserting zero bits between the numbers and using bit masking, but there is no guarantee that you would get the intended DSP packing in the end.
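A minimal sketch of that packing idea in plain C, assuming unsigned 8-bit operands and two lanes that share the same multiplicand. Two values are placed 24 bits apart in one 32-bit word (the zero bits in between act as guard bits), one wide multiply-accumulate updates both lanes, and bit masking separates the two sums at the end. The packed operand fits in 32 bits, but the running sum needs up to 48 bits here, so the sketch accumulates in a `uint64_t`. The helper names, the 24-bit lane spacing and the 256-term limit are illustrative choices, not requirements of the compiler or the DSP IP; as noted above, nothing guarantees this maps onto DSP blocks the way you want.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative lane layout: low lane at bit 0, high lane at bit 24.
   16 bits hold one 8x8 product; the 8 zero bits above it are guard bits
   for the running sum. */
#define LANE_SHIFT 24u
#define LANE_MASK ((1ull << LANE_SHIFT) - 1u)
/* 256 * 255 * 255 = 16646400 < 2^24, so the low lane cannot carry into
   the high lane for up to 256 terms. */
#define MAX_TERMS 256u

/* Packs two unsigned 8-bit values into one 32-bit word with zeros in between. */
static uint32_t pack2_u8(uint8_t hi, uint8_t lo) {
    return ((uint32_t)hi << LANE_SHIFT) | lo;
}

/* Computes dot(a_hi, w) and dot(a_lo, w) with one multiply-accumulate per
   element; both lanes must share the multiplicand w[i]. Returns 0 on
   success, -1 if n is too large for the guard bits. */
static int packed_dot2_u8(const uint8_t *a_hi, const uint8_t *a_lo,
                          const uint8_t *w, size_t n,
                          uint32_t *dot_hi, uint32_t *dot_lo) {
    assert(dot_hi != NULL && dot_lo != NULL);
    if (n > MAX_TERMS) {
        return -1;
    }
    uint64_t acc = 0; /* at most 48 bits are used */
    for (size_t i = 0; i < n; ++i) {
        acc += (uint64_t)pack2_u8(a_hi[i], a_lo[i]) * w[i]; /* one MAC, two lanes */
    }
    *dot_lo = (uint32_t)(acc & LANE_MASK);   /* masking extracts the low lane */
    *dot_hi = (uint32_t)(acc >> LANE_SHIFT); /* the high lane is the rest */
    return 0;
}
```

With `a_hi = {1, 2, 3}`, `a_lo = {4, 5, 6}` and `w = {7, 8, 9}`, one pass yields 50 and 122, the two separate dot products. Signed operands need extra care, because a negative low lane borrows from the high lane and has to be corrected after unpacking.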

SBioo · ‎05-07-2019

Thanks very much for the reply.

I've taken a look at the documentation you mentioned and now understand the various IP cores that can be used for something like an 18x18 FMA. My question now is: how should I change my code so that the compiler adopts these specific IP cores for me? Should I just extend my variables from 8 bits to 18 bits, or are these FMA modes provided as built-in functions that I can call directly?

Thanks

HRZ · ‎05-09-2019

The appropriate IP core is either used directly by the OpenCL compiler or eventually employed by the mapper, depending on the width of your variables. I remember there were some threads on this subject in the forum before, and the compiler's behavior was somewhat buggy, though. You can also take a look at the variable-width integer extension and the instructions for correctly inferring fixed-point arithmetic in the Best Practices Guide.
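Plain C has no 18-bit type, so the sketch below emulates one with masking and sign extension, to show what an explicitly narrow value means numerically. It does not use Intel's variable-width integer extension (its exact types and headers are not quoted here), the helper names are hypothetical, and whether the offline compiler infers a narrower DSP configuration from such masking is for the Best Practices Guide, not this sketch, to say.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: wraps v to a signed 18-bit two's-complement value,
   mimicking what an 18-bit integer type would hold. The mask keeps the low
   18 bits; the xor/subtract pair sign-extends bit 17 portably. */
static int32_t wrap_s18(int32_t v) {
    int32_t low = (int32_t)((uint32_t)v & 0x3FFFFu);
    return (low ^ 0x20000) - 0x20000;
}

/* Hypothetical dot product of 8-bit data whose running sum is kept to
   18 bits, as an 18-bit accumulator variable would be. */
static int32_t dot_s8_acc18(const int8_t *a, const int8_t *b, size_t n) {
    assert(a != NULL && b != NULL);
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc = wrap_s18(acc + (int32_t)a[i] * b[i]);
    }
    return acc;
}
```

Every 8-bit value already fits in 18 bits, so widening `char` operands to 18 bits does not change any result. An 18-bit accumulator, on the other hand, wraps after eight products of -128 × -128 (8 × 16384 = 2^17), which is the kind of range reasoning that has to hold before narrowing a type.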