How to use fused multiply–add with MKL?

I want to do a basic a*x + b operation, where a, x, and b are vectors (or matrices), using the processor's FMA capabilities. I think that if I use v?Mul followed by v?Add, I will get two separate operations. How can I use FMA with MKL and the Intel compiler? Must I use FMA intrinsics?

Accepted Solutions

The C and Fortran compilers will generate FMA instructions from ordinary source code loops for which the FMA operation is appropriate, provided that

  • the target instruction set includes FMA (AVX2 or newer -- note that the default is SSE2, which does not support FMA), and
  • the optimization level is high enough (at least -O1, but preferably -O2), and
  • you have not prohibited FMA with a different compiler flag (-no-fma or some options to the -fp-model flag).
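
As a minimal sketch of such an "ordinary source code loop", the element-wise operation from the question could be written as below. The function name `vec_muladd` is made up for illustration, and the build line in the comment uses the Intel compiler's `-xCORE-AVX2` option only as one example of selecting an FMA-capable target. Whether a fused instruction is actually emitted remains the compiler's decision.

```c
#include <stddef.h>

/* Hypothetical helper: element-wise y[i] = a[i] * x[i] + b[i].
 * Example build (assumption, adjust to your toolchain):
 *   icc -O2 -xCORE-AVX2 -c vec_muladd.c
 * The restrict qualifiers promise that the arrays do not overlap,
 * which makes it easier for the compiler to vectorize the loop. */
void vec_muladd(size_t n,
                const double *restrict a,
                const double *restrict x,
                const double *restrict b,
                double *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        /* One multiply feeding one add: a candidate for a single FMA. */
        y[i] = a[i] * x[i] + b[i];
    }
}
```

No intrinsics are needed in this form; the same source also builds for targets without FMA, where it simply becomes a separate multiply and add.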

There are a few cases involving reduction operations where the compiler will choose not to use FMA operations because it estimates that there will be a shorter critical path by splitting the operation (doing the multiplication earlier and the addition later).
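
A dot product is a typical example of the kind of reduction loop meant here. The sketch below is only an illustration (the name `dot` is hypothetical); it shows where the dependency chain sits, not what any particular compiler version will generate.

```c
#include <stddef.h>

/* Hypothetical reduction: returns the sum of a[i] * x[i].
 * Every addition needs the previous value of sum, so the additions form
 * one serial chain. If the update is fused, the multiply is part of that
 * chain; if it is split, the products do not depend on sum and can be
 * computed ahead of time, leaving only the add on the critical path. */
double dot(size_t n, const double *a, const double *x)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += a[i] * x[i];
    }
    return sum;
}
```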

"Dr. Bandwidth"

4 Replies
Does ?axpy work for you?

No, Tim. In ?axpy, a is a scalar, not a vector, and there is no vector b.
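
For comparison, here is a plain C reference loop with the semantics of ?axpy, y := a*x + y. This is a sketch and not the MKL routine itself: `ref_axpy` is a made-up name, and the stride arguments of the real BLAS interface are left out for brevity.

```c
#include <stddef.h>

/* Hypothetical reference loop for the ?axpy operation: y := alpha * x + y.
 * alpha is a single scalar applied to every element, and the result
 * overwrites y, so there is no separate per-element multiplier vector
 * and no separate addend vector b. */
void ref_axpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```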

Thank you, John! Is there any indication in the compiler output that a loop has been "FMA-ed" :)? Something like the vectorization report?