Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

FMA not used

velvia

Hi,

I was surprised to spot the following behavior of the Intel compiler (17.0.2 20170213 on Linux) when using -xCORE-AVX2. The following code generates FMA instructions

double norm(double* x, int n) {
  double ans = 0.0;
  for (int i = 0; i < n; ++i) {
    ans += x[i] * x[i];
  }
  return ans;
}

but the following code does not

float norm(float* x, int n) {
  float ans = 0.0f;
  for (int i = 0; i < n; ++i) {
    ans += x[i] * x[i];
  }
  return ans;
}

Is there a reason for this, or is it a missed optimization from the compiler?

Best regards,

Francois

TimP

When icc does use FMA for a float dot product, it riffles (unrolls the loop with several independent partial sums) by a factor more than large enough to cover the extra latency of FMA when the operands are already in L1. icc will choose not to use FMA if its cost evaluation shows that FMA may be slower. That evaluation may be influenced by the assumed trip count, which you can adjust with a pragma. Your choice of a 32- or 64-bit target may also influence it.
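To make the two ideas above concrete, here is a rough sketch written by hand rather than taken from compiler output. The function names norm_riffled4 and norm_hinted are made up for illustration, the factor of 4 is arbitrary (the compiler may pick a different one), and the trip count of 100000 is only a placeholder. The pragma is the Intel-specific loop_count hint; other compilers will normally ignore it, possibly with a warning.

```cpp
// Hypothetical helper, not from the thread: sum of squares with four
// independent partial sums, so consecutive multiply-adds do not have to
// wait on each other's result.
float norm_riffled4(const float* x, int n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i = 0;
    // Main loop: four elements per iteration, one per partial sum.
    for (; i + 3 < n; i += 4) {
        s0 += x[i]     * x[i];
        s1 += x[i + 1] * x[i + 1];
        s2 += x[i + 2] * x[i + 2];
        s3 += x[i + 3] * x[i + 3];
    }
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i) {
        s0 += x[i] * x[i];
    }
    // Combine the partial sums at the end.
    return (s0 + s1) + (s2 + s3);
}

// Same loop as in the question, with an Intel-specific trip-count hint.
// The value 100000 is an assumed placeholder, not a recommendation.
float norm_hinted(const float* x, int n) {
    float ans = 0.0f;
#pragma loop_count avg(100000)
    for (int i = 0; i < n; ++i) {
        ans += x[i] * x[i];
    }
    return ans;
}
```

Note that the riffled version adds the terms in a different order from the simple loop, so for general float data the result can differ in the last bits. That is why a compiler only does this transformation by itself when the floating-point model allows reassociation.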

As a matter of interest (at least to me), gcc would need the -mno-fma -ffast-math options to show its best AVX2 performance here, as it does no riffling. With FMA, the loop may run 60% longer, in accordance with the documented latency.
