Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Vector FMA

wchan212
Beginner
I need to compute something like A = B + C * D, where A, B, C, and D are vectors and the operation is done element-wise.
Currently I am using vdMul followed by vdAdd. This is not entirely efficient (even though my processor has no FMA instruction set) because of cache behavior and how the instructions are issued: all my adders sit idle while I do the multiplies, and all my multipliers sit idle while I do the adds.
Is there a more efficient way to do this?
Side question: is dger the only way to compute the outer product of two vectors in MKL?
Chao_Y_Intel
Moderator

Hello,

vdMul and vdAdd are the functions that can be used for now. (You may want to block the data if it is too large to fit in cache.) I am checking with the function owner to see whether there is a more efficient approach. Generally, for such simple loops, the compiler is expected to generate high-performance code.
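
For illustration, the "simple loop" option could look something like the sketch below. It is plain C, not an MKL routine; the function name vec_mul_add and its argument order are made up for this example. It assumes the output vector does not overlap any of the inputs, and it leaves vectorization to the compiler's optimizer.

```c
#include <stddef.h>

/* Hypothetical helper (not part of MKL): a[i] = b[i] + c[i] * d[i].
   The multiply and the add happen in a single pass over the data, so
   no temporary product vector is written and read back.
   Assumption: a does not alias b, c, or d (hence restrict on a). */
void vec_mul_add(size_t n, const double *b, const double *c,
                 const double *d, double *restrict a)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = b[i] + c[i] * d[i];
    }
}
```

A call such as vec_mul_add(n, B, C, D, A) would then replace the vdMul/vdAdd pair. Whether it is actually faster depends on the compiler, the optimization flags, and the vector length, so it is worth measuring against the two-call version.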

Thanks,
Chao
