Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

How to use fused multiply–add with MKL?

Konovalov__Pavel
Beginner
I want to compute the basic operation a*x + b, where a, x and b are vectors (or matrices), using the processor's FMA capabilities. I think that if I use v?Mul + v?Add, I will get two separate operations. How can I use FMA with MKL and the Intel compiler? Must I use FMA intrinsics?
4 Replies
TimP
Honored Contributor III
Does ?axpy work for you?
Konovalov__Pavel
Beginner

No, Tim. In ?axpy, a is a scalar, not a vector, and there is no vector b.
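For comparison, a plain-C sketch of what the BLAS ?axpy operation computes, shown for double precision: y := alpha*x + y, with one scalar alpha shared by all elements and the result overwriting y. This is a reference loop written for illustration, not MKL's implementation; it ignores the stride (incx/incy) arguments of the real routine, and the name `ref_daxpy` is hypothetical.

```c
#include <stddef.h>

/* Reference semantics of daxpy with unit strides: y[i] = alpha * x[i] + y[i].
   alpha is a single scalar, and y is both an input and the output,
   so there is no room for a per-element multiplier or a separate addend b. */
void ref_daxpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```

The operation asked about needs a per-element multiplier a[i] and a separate addend b[i], which is why ?axpy does not fit.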

McCalpinJohn (accepted solution)
Honored Contributor III

The Intel C and Fortran compilers will generate FMA instructions from ordinary source-code loops for which the FMA operation is appropriate, provided that:

  • the target instruction set includes FMA (AVX2 or newer; note that the default target is SSE2, which does not support FMA), and
  • the optimization level is high enough (at least -O1, preferably -O2), and
  • you have not disabled FMA with another compiler flag (-no-fma, or some settings of -fp-model).
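A minimal example of the kind of "ordinary source code loop" meant here: one multiply feeding one add per element, with no intrinsics. The build lines in the comment are assumptions about typical usage rather than part of the answer: `-xCORE-AVX2` is one way to select an AVX2 target with the classic Intel compiler, and `-no-fma` is the flag mentioned above; for GCC, an FMA-capable target such as `-march=haswell` is needed. The function name `axpb` is hypothetical, and the `restrict` qualifiers (promising that the arrays do not overlap) are there to make vectorization easier.

```c
#include <stddef.h>

/*
 * r[i] = a[i] * x[i] + b[i]
 *
 * Whether this becomes vfmadd... instructions is decided by the compiler,
 * not the source. Assumed example builds:
 *   icc -O2 -xCORE-AVX2 -c axpb.c           (AVX2 target: FMA allowed)
 *   icc -O2 -xCORE-AVX2 -no-fma -c axpb.c   (FMA suppressed)
 *   gcc -O2 -march=haswell -c axpb.c        (GCC, FMA-capable target)
 */
void axpb(size_t n, const double *restrict a, const double *restrict x,
          const double *restrict b, double *restrict r)
{
    for (size_t i = 0; i < n; ++i)
        r[i] = a[i] * x[i] + b[i];
}
```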

In a few cases involving reductions, the compiler will choose not to use FMA because it estimates that splitting the operation (doing the multiplication earlier and the addition later) gives a shorter critical path.

Konovalov__Pavel
Beginner
Thank you, John! Is there any indication in the compiler output that a loop uses FMA :), as there is with the vectorization report?