Hi,
I was surprised to notice the following behavior of the Intel compiler (17.0.2 20170213 on Linux) with -xCORE-AVX2. The following code generates FMA instructions:
double norm(double* x, int n) { double ans = 0.0; for (int i = 0; i < n; ++i) { ans += x[i] * x[i]; } return ans; }
but the following code does not:
float norm(float* x, int n) { float ans = 0.0f; for (int i = 0; i < n; ++i) { ans += x[i] * x[i]; } return ans; }
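For reference, the generated instructions can be checked with a compile line along these lines (the file name is just for illustration):

icc -xCORE-AVX2 -O2 -S norm.c
grep vfmadd norm.s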
Is there a reason for this, or is it a missed optimization from the compiler?
Best regards,
Francois
When icc does use FMA for a float dot product, it riffles (unrolls into multiple independent partial sums) by a factor large enough to cover the extra latency of FMA when the operands are resident in L1. icc will choose not to use FMA if its cost evaluation indicates FMA may be slower. This decision may be influenced by the assumed trip count, which you can adjust with a pragma. Your choice of a 32- or 64-bit target may also influence the choice.
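For what it's worth, a minimal sketch of that trip-count hint, assuming icc's loop_count pragma (the count values are placeholders, not tuned):

float norm(float* x, int n) {
    float ans = 0.0f;
    /* Hint to the vectorizer's cost model that the loop usually runs many
       iterations, so FMA latency can be assumed to be amortized. */
    #pragma loop_count min(64), avg(1024), max(1000000)
    for (int i = 0; i < n; ++i) {
        ans += x[i] * x[i];
    }
    return ans;
}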
As a matter of interest (at least to me), gcc needs the -mno-fma -ffast-math options to show its best AVX2 performance here, as it does no riffling. The FMA version may run about 60% longer, in line with the documented latency.
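To illustrate what the riffling buys, here is a hand-written sketch of the float reduction with four independent partial sums, so successive FMAs are not all chained through one accumulator; the factor of 4 is an assumption for illustration, not what icc actually picks:

float norm_riffled(const float* x, int n) {
    /* Four independent accumulators break the single dependency chain,
       so FMA latency can be overlapped across iterations. */
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += x[i]     * x[i];
        s1 += x[i + 1] * x[i + 1];
        s2 += x[i + 2] * x[i + 2];
        s3 += x[i + 3] * x[i + 3];
    }
    float ans = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)   /* remainder */
        ans += x[i] * x[i];
    return ans;
}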
