Solved: How to use fused multiply–add with MKL?

Konovalov__Pavel · 12-12-2017

I want to do a basic a*x + b operation, where a, x, and b are vectors (or matrices), using the processor's FMA capabilities. I think that if I use v?Mul followed by v?Add, I will get two separate operations. How can I use FMA with MKL and the Intel compiler? Must I use FMA intrinsics?

McCalpinJohn · 12-12-2017

The C and Fortran compilers will generate FMA instructions from ordinary source code loops for which the FMA operation is appropriate, provided that

the target instruction set includes FMA (AVX2 or newer -- note that the default is SSE2, which does not support FMA), and
the optimization level is high enough (at least -O1, but preferably -O2), and
you have not prohibited FMA with a different compiler flag (-no-fma or some settings of the -fp-model flag).

There are a few cases involving reduction operations where the compiler will choose not to use FMA operations because it estimates that there will be a shorter critical path by splitting the operation (doing the multiplication earlier and the addition later).
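
As a rough illustration of the kind of "ordinary source code loop" meant here, the sketch below computes y[i] = a[i]*x[i] + b[i] element-wise. The function name fma_loop and the use of restrict are assumptions made for the example, not anything taken from MKL. Whether the compiler actually fuses the multiply and add depends on the target and flags discussed above.

```c
#include <stddef.h>

/* Hypothetical helper: element-wise y[i] = a[i] * x[i] + b[i].
 * Written as a plain loop with no intrinsics. When the target
 * instruction set includes FMA and contraction is not disabled,
 * the compiler may turn the multiply and add into one fused
 * instruction. "restrict" tells the compiler the arrays do not
 * overlap, which usually helps vectorization. */
void fma_loop(size_t n,
              const double *restrict a,
              const double *restrict x,
              const double *restrict b,
              double *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a[i] * x[i] + b[i];
    }
}
```

With the Intel compiler, something like icc -O2 -xCORE-AVX2 -S should be enough to try this (with GCC, -O2 -mfma plays a similar role). One way to check the result is to look in the generated assembly for instructions whose names start with vfmadd.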

TimP · 12-12-2017

Does ?axpy work for you?

Konovalov__Pavel · 12-12-2017

No, Tim. In ?axpy, a is a scalar, not a vector, and there is no vector b.

Konovalov__Pavel · 12-12-2017

Thank you, John! Is there any indication in the compiler output that a loop has been "FMAed" :) , like there is in the vectorisation report?