
tjahns · 05-30-2016

Hello, I've found that -ftz only really takes effect when an optimization level of 1 or above is also chosen. Can this be worked around somehow? That is, for code that produces denormal numbers when built without -ftz, can the corresponding MXCSR flags be set so that 0.0 gets computed instead, even in non-optimized code? I see this behaviour with compiler versions from 2012 up to at least 16.0.2 on systems with SandyBridge and newer Xeons, but haven't investigated 32-bit mode or older CPUs/compilers. Regards, Thomas Jahns

TimP · 05-30-2016

Maybe by using the SIMD intrinsics described at https://software.intel.com/en-us/node/513376, if your compiler is recent enough to use SSE instructions at -O0. Evidently there is no such facility under -mia32.
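A minimal sketch of that intrinsic approach, assuming an x86-64 target where float arithmetic goes through SSE even at -O0, so MXCSR governs it (x87 code, as under -mia32, is not affected). The helper names `enable_ftz_daz`, `disable_ftz_daz` and `halve_flt_min` are hypothetical; the `_MM_SET_FLUSH_ZERO_MODE` and `_MM_SET_DENORMALS_ZERO_MODE` macros come from `xmmintrin.h` and `pmmintrin.h`. Halving `FLT_MIN` yields a denormal with the flags off and 0.0 with FTZ on:

```c
#include <float.h>      /* FLT_MIN */
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE (SSE) */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE (SSE3) */

/* Hypothetical helper: set FTZ (denormal results become 0) and
   DAZ (denormal inputs are read as 0) in MXCSR for this thread.
   Only SSE/AVX arithmetic is affected, not x87 code. */
void enable_ftz_daz(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

/* Hypothetical helper: restore IEEE gradual underflow. */
void disable_ftz_daz(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
}

/* Halve the smallest normal float. The volatile read forces the
   multiply to happen at run time, so the result depends on MXCSR:
   a denormal (FLT_MIN / 2) with FTZ off, 0.0f with FTZ on. */
float halve_flt_min(void)
{
    volatile float x = FLT_MIN;
    return x * 0.5f;
}
```

Calling `enable_ftz_daz()` at the top of `main()` should then give flush-to-zero behaviour independent of the -O level, because it no longer relies on the compiler inserting the MXCSR setup.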

Changes in the default setting shouldn't be occurring without notification, but SandyBridge and newer CPUs were changed so that underflow in add/subtract is not expensive. The original reason for setting ftz for performance was supposed to be eliminated.

If your main() is compiled with gcc or msvc++, normal compile options would not give you ftz except by using the intrinsic, so the expected icc behavior is non-portable. As those compilers improved support for vectorization, the hardware needed to change to fix the performance issue.
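Since the FTZ/DAZ state at startup depends on which compiler built main() and with which options, one portable way to see what a given build actually got is to read MXCSR directly. A sketch assuming SSE is available; the masks are the MXCSR bit positions documented by Intel (bit 15 = FTZ, bit 6 = DAZ), and the function names are hypothetical:

```c
#include <stdio.h>
#include <xmmintrin.h>  /* _mm_getcsr */

/* MXCSR bit masks (Intel SDM, MXCSR control/status register). */
#define MXCSR_DAZ 0x0040u  /* bit 6: denormals-are-zero */
#define MXCSR_FTZ 0x8000u  /* bit 15: flush-to-zero */

/* Return nonzero if FTZ is currently set for this thread. */
int ftz_is_on(void) { return (_mm_getcsr() & MXCSR_FTZ) != 0; }

/* Return nonzero if DAZ is currently set for this thread. */
int daz_is_on(void) { return (_mm_getcsr() & MXCSR_DAZ) != 0; }

/* Print the raw MXCSR value and the two denormal-handling bits. */
void report_fp_mode(const char *where)
{
    unsigned int csr = _mm_getcsr();
    printf("%s: MXCSR=0x%04X FTZ=%s DAZ=%s\n", where, csr,
           (csr & MXCSR_FTZ) ? "on" : "off",
           (csr & MXCSR_DAZ) ? "on" : "off");
}
```

Calling `report_fp_mode("start of main")` as the first statement of main() in builds made with icc -O0, icc -O2 and gcc would show whether the default mode differs between them.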

The article https://software.intel.com/en-us/node/513376 apparently forgot to mention the effect of -fp-model settings on default ftz setting.

tjahns · 05-30-2016

Tim P. wrote:

The article https://software.intel.com/en-us/node/513376 apparently forgot to mention the effect of -fp-model settings on default ftz setting.

So is the compiler also changing -fp-model when I use -O0, or what are you implying here? My post was mostly meant as a heads-up that the documentation of -ftz is insufficient, because it fails to mention that -ftz only takes effect when optimization is enabled.

TimP · 05-30-2016

I don't know whether that is a good way of putting it, but fp-model doesn't have normal effects at -O0. What I meant is that certain settings, such as -fp-model precise, imply -no-ftz, although that hasn't been consistent. That's another view of your assertion that -ftz isn't fully documented in any one place, and my assertion that modern CPUs would allow for it to be set by default in fewer contexts (provided, of course, that we are given sufficient warning of changes).

I've also had to test whether -ftz interacts with -[no]prec-div and -[no]prec-sqrt, since I've seen unexpected settings and results there, and such an interaction is a possible cause of NaN results.

jimdempseyatthecove · 06-01-2016

Try compiling your main with full optimization and debug options (main will initialize the floating-point mode), and compile everything else for debugging (-O0). If necessary, make a new main stub that calls your (renamed) main.
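The stub idea above relies on the optimized icc main() inserting the MXCSR setup. A variant that does not depend on the optimization level at all is to have the stub set the flags explicitly with the intrinsics and then call the renamed main. This is a swapped-in technique, not the one described above, and `run_with_ftz`, `entry_fn` and `ftz_probe_main` are hypothetical names:

```c
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE, _mm_getcsr */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE */

/* Signature of the program's original main(), after renaming it. */
typedef int (*entry_fn)(int argc, char **argv);

/* Set FTZ and DAZ explicitly, then run the rest of the program.
   Because the mode is set by intrinsic calls rather than by a
   compiler option, it takes effect even when every file, this one
   included, is built at -O0 for debugging. */
int run_with_ftz(entry_fn entry, int argc, char **argv)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    return entry(argc, argv);
}

/* Hypothetical stand-in for a renamed main(): returns 1 if FTZ
   (MXCSR bit 15) is active when it starts running. */
int ftz_probe_main(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    return (_mm_getcsr() & 0x8000u) != 0;
}
```

With the original main() renamed to, say, `app_main`, the new stub would be just `int main(int argc, char **argv) { return run_with_ftz(app_main, argc, argv); }`.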

Jim Dempsey

Debugging numeric codes with -ftz