
FMA not used

Hi,

I was surprised by the following behavior of the Intel compiler (17.0.2 20170213 on Linux) when using -xCORE-AVX2. The following code generates FMA instructions:

double norm(double* x, int n) {
  double ans = 0.0;
  for (int i = 0; i < n; ++i) {
    ans += x[i] * x[i];
  }
  return ans;
}

but the following code does not:

float norm(float* x, int n) {
  float ans = 0.0f;
  for (int i = 0; i < n; ++i) {
    ans += x[i] * x[i];
  }
  return ans;
}

Is there a reason for this, or is it a missed optimization from the compiler?

Best regards,

Francois

1 Reply

When icc does use FMA for a float dot product, it riffles (unrolls the loop into several interleaved partial sums) by a factor more than large enough to cover the extra latency of FMA when the operands are present in L1. icc will choose not to use FMA if its cost evaluation shows that FMA may be slower. That evaluation may be influenced by the assumed trip count, which you can adjust with a pragma. Your choice of a 32- or 64-bit target may also influence it.
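To make the idea of riffling concrete, here is a hand-written sketch of the float loop with several independent partial sums. The function name norm_riffled and the factor of 4 are only illustrative assumptions; this is not the code icc generates, and the factor it actually picks may differ.

```c
/* Sum of squares using four independent partial sums ("riffling").
   Each accumulator is its own dependency chain, so one multiply-add
   (or FMA, if the compiler contracts it) does not have to wait for
   the result of the previous one.
   The factor 4 is an illustrative assumption, not icc's choice. */
float norm_riffled(const float *x, int n) {
  float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
  int i = 0;

  /* Main loop: four elements per iteration, one per accumulator. */
  for (; i + 3 < n; i += 4) {
    s0 += x[i]     * x[i];
    s1 += x[i + 1] * x[i + 1];
    s2 += x[i + 2] * x[i + 2];
    s3 += x[i + 3] * x[i + 3];
  }

  /* Combine the partial sums, then handle the leftover elements. */
  float ans = (s0 + s1) + (s2 + s3);
  for (; i < n; ++i) {
    ans += x[i] * x[i];
  }
  return ans;
}
```

Note that this changes the order of the additions, so the rounded result can differ slightly from the original single-accumulator loop. That reordering is why a compiler only does it automatically when its floating-point model allows reassociation.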

As a matter of interest (at least to me), gcc would need the -mno-fma -ffast-math options to show its best AVX2 performance here, as it does no riffling. The FMA version may run 60% longer, in accordance with the documented latency.
