Macros indicating FP model

nemequ · ‎11-09-2020

GCC and clang have macros to indicate compliance with IEC 60559 / IEEE 754. If -ffinite-math-only is specified, __FINITE_MATH_ONLY__ is defined as 1. If -ffast-math is specified (which includes -ffinite-math-only), __FAST_MATH__ and __FINITE_MATH_ONLY__ are both defined to 1.

ICC is not conformant by default, but can be made so with -fp-model=precise. However, there is no difference in the predefined macros (compare `icc -dM -E - </dev/null` with `icc -fp-model=precise -dM -E - </dev/null`), so AFAICT there is no way to detect the model at compile time other than defining macros on the command line (i.e., build system integration).

This ruins an important optimization opportunity for SIMDe (and probably lots of other code). For example, on GCC and clang we can implement the vmaxq_f32 function from NEON using just _mm_max_ps when finite-math-only is enabled. On ICC, however, even though fast math is the default, we can't automatically determine whether -fp-model=precise was passed, so we always have to use a combination of _mm_cmpunord_ps, _mm_max_ps, _mm_andnot_ps, _mm_or_ps, _mm_and_ps, and _mm_set1_ps. Obviously that's substantially slower, and of course it's not just that one function.
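To make the cost concrete, here is a rough sketch of the two code paths, assuming an x86 target with SSE. The helper name `max_f32x4` is made up for illustration and this is not necessarily how SIMDe implements it; it just combines the intrinsics listed above in one plausible way.

```c
#include <math.h>       /* NAN */
#include <xmmintrin.h>  /* SSE intrinsics; x86 only */

/* Hypothetical helper emulating NEON's vmaxq_f32 on SSE. */
static inline __m128 max_f32x4(__m128 a, __m128 b) {
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
  /* The compiler was told NaNs never occur, so one instruction is enough. */
  return _mm_max_ps(a, b);
#else
  /* vmaxq_f32 yields NaN if either input lane is NaN, while _mm_max_ps
     returns the second operand in that case, so patch the affected lanes. */
  __m128 nan_mask = _mm_cmpunord_ps(a, b);  /* all ones where a or b is NaN */
  __m128 max = _mm_max_ps(a, b);
  return _mm_or_ps(_mm_andnot_ps(nan_mask, max),
                   _mm_and_ps(nan_mask, _mm_set1_ps(NAN)));
#endif
}
```

On ICC the first branch is never taken, because `__FINITE_MATH_ONLY__` is not set regardless of the FP model in use.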

In order to take advantage of code which already detects fast math in GCC/clang, I'd suggest defining the following macros when appropriate:

__FINITE_MATH_ONLY__
__NO_MATH_ERRNO__
__FAST_MATH__
__SUPPORT_SNAN__
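For reference, this is roughly what existing detection code looks like on GCC and clang. It is a sketch only: the `MYLIB_*` names and `mylib_fp_flags` are hypothetical. Note that GCC defines `__FINITE_MATH_ONLY__` unconditionally (as 0 or 1), so its value has to be tested, not just whether it is defined.

```c
/* All MYLIB_* names are hypothetical, for illustration only. */
#if defined(__FAST_MATH__)
#  define MYLIB_FAST_MATH 1
#else
#  define MYLIB_FAST_MATH 0
#endif

/* Defined as 0 or 1 on GCC, so check the value, not just defined(). */
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
#  define MYLIB_FINITE_MATH 1
#else
#  define MYLIB_FINITE_MATH 0
#endif

#if defined(__NO_MATH_ERRNO__)
#  define MYLIB_NO_MATH_ERRNO 1
#else
#  define MYLIB_NO_MATH_ERRNO 0
#endif

#if defined(__SUPPORT_SNAN__)
#  define MYLIB_SUPPORT_SNAN 1
#else
#  define MYLIB_SUPPORT_SNAN 0
#endif

/* Packs the detected modes into a bitmask, mostly handy for debugging. */
static inline int mylib_fp_flags(void) {
  return (MYLIB_FAST_MATH << 0) | (MYLIB_FINITE_MATH << 1) |
         (MYLIB_NO_MATH_ERRNO << 2) | (MYLIB_SUPPORT_SNAN << 3);
}
```

If ICC defined the same macros, code like this would pick up the FP model without any changes.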

Another option might be to not define __STDC_IEC_559__ when not in an IEC 60559 / IEEE 754 conforming mode (which ICC should be doing anyway, but I don't feel like tilting at that particular windmill right now).

I would also be okay with Intel-specific macros which match the semantics of the FP model ICC uses, but obviously that would require code to actually handle that instead of reusing existing code which handles fast math on GCC and clang. I'm willing to add that to my code (well, at least to the open-source stuff where I try to support ICC), but I suspect a lot of people wouldn't bother.

AbhishekD_Intel · ‎11-11-2020

Hi,

Thanks for reaching out to us.

We are forwarding this issue to the SME.

Warm Regards,

Abhishek

Viet_H_Intel · ‎11-11-2020

I see these macros were missing with icc. If you have oneAPI HPC Toolkit, then icx defines these macros.

$ icx -ffast-math -E -dM t.c |grep FAST_MATH

#define __FAST_MATH__ 1

$ icx -ffast-math -E -dM t.c |grep FINITE_MATH_ONLY

#define __FINITE_MATH_ONLY__ 0

$ icx -ffinite-math-only -E -dM t.c |grep FINITE_MATH_ONLY

#define __FINITE_MATH_ONLY__ 0

Thanks,

nemequ · ‎11-11-2020

Yes, icx fixes a ton of problems I have with icc, including this one. If I could just switch to icx I'd be much happier. TBH, supporting icc (and icl) has been a fairly significant pain point, and so far icx hasn't been a problem at all.

Unfortunately I don't have much say in which compiler is used; this is for an open-source library, and people are going to use whatever they want (even MSVC!). Actually, it's worse than that; it's an open-source library designed largely to make porting code easier, so a substantial portion of our users are just slapping my code on top of something they wrote many years ago and are even less likely to switch compilers.

Basically, I have to support icc as well as I can for a while… probably for as long as Intel continues to support it. I already have a work-around in place in my code: people can just define a macro to indicate that they want fast math.
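A work-around of that kind might look something like the following sketch. The macro names here (`MYLIB_USER_FAST_MATH`, `MYLIB_ASSUME_NO_NANS`) are hypothetical stand-ins, not the names the library actually uses; the idea is simply that a user-supplied define takes the place of the compiler-provided one.

```c
/* Hypothetical opt-in: users building with icc would pass something like
   -DMYLIB_USER_FAST_MATH on the command line to request the fast paths. */
#if defined(MYLIB_USER_FAST_MATH) || \
    (defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__)
#  define MYLIB_ASSUME_NO_NANS 1
#else
#  define MYLIB_ASSUME_NO_NANS 0
#endif

/* Runtime-visible view of the decision, e.g. for logging or tests. */
static inline int mylib_assume_no_nans(void) {
  return MYLIB_ASSUME_NO_NANS;
}
```

The downside is the one described above: it only helps users who know the macro exists and remember to define it.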

I just figured this would be an easy win for Intel; it's a small change which could potentially result in significant performance improvements without the user having to do anything. It would also reduce the delta between icc and icx, making the transition slightly easier for some people. The percentage of projects which have to switch behavior based on this is relatively low, but based on a quick GitHub search they definitely exist, and they're probably mostly projects which are performance-sensitive, otherwise why bother checking the macros at all?

nemequ · ‎11-11-2020

I just realized my reply could be read as snarky; that's really not how I meant it so if you interpreted it that way I'm sorry. Using icx instead is a completely valid suggestion, it just doesn't really help me because it's not something I can control, and that's not something you could be expected to know before the suggestion.

Viet_H_Intel · ‎11-12-2020

I did report this issue to our compiler Developer and just wanted to see if you can use icx instead.

Thanks,

Viet_H_Intel · ‎05-20-2021

Since you have a workaround, our Developer doesn't have a plan to fix it. Hence, we are going to close this thread as "wont fix".