Solved: Fp model linux vs. windows

RS · 03-02-2009

I have some code compiled with /fp:extended on Windows and -fp-model extended on Linux. The program loads data from a long double array in memory and does some calculations on it. Looking at the generated assembly, it seems that any extended-precision (80-bit) data loaded from memory is rounded to double precision (64-bit) on Windows. This does not seem to be the case on Linux. Does anyone know if this is the expected behavior? Is there any way to keep the compiler from rounding the input to double precision on Windows?

TimP · 03-09-2009

I've found that the following command-line options are required for long double to be supported in 64-bit precision mode in ICL for Windows (both 32- and 64-bit):
icl /fp:source /Qlong-double /Qpc80

Also required is something like:

#ifdef __INTEL_COMPILER
#include <mathimf.h> // needed to support Windows /Qlong-double
#ifndef __STDC__
#define __STDC__ 1 // only partial STDC support is available
#endif
#else
#include <math.h>
#endif

<mathimf.h> is needed when using ICL 80-bit long double, even if the application didn't require <math.h>. Of course, the mathimf requirement is in violation of __STDC__, as is the breakage of printf(). However, if the application uses __STDC__ to check for the presence of long double support, you will need to set it.

/Qlong-double breaks printf() functions entirely on 64-bit Windows, for all float data types, so you would need to segregate long double into functions which don't require printf(), and set /Qlong-double only for those functions. On 32-bit Windows, printf() should work for values which don't require range or precision beyond double.
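
One way to arrange the segregation described above, as a rough sketch: keep the long double work in a helper that takes and returns plain double, so the caller can print the result without ever passing a long double to printf(). The function name sum_in_long_double is made up for illustration, and the idea that it lives in its own source file built with /Qlong-double is my assumption about how you would organize the build.

```c
#include <stddef.h>

/* Hypothetical helper, assumed to sit in a source file built with
   /Qlong-double. It accumulates in long double but exposes only double,
   so no long double value ever reaches printf(). */
double sum_in_long_double(const double *values, size_t count)
{
    long double total = 0.0L;
    size_t i;

    for (i = 0; i < count; i++)
        total += values[i];   /* each double is widened, then added */

    return (double)total;     /* rounded to double once, at the end */
}
```

The caller, built without /Qlong-double, can then print the returned double as usual. The final conversion throws away the extra bits, so this only helps where the extended precision matters for intermediate results.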

/Qpc80 sets 64-bit precision mode, without which there is little point in /Qlong-double. Without /Qpc80, but with /Qlong-double set, long double expressions are evaluated in 53-bit precision, with long double exponent range.
How does /Qpc80 actually work?
For 32-bit Windows, /Qpc64 (the default) sets 53-bit precision mode for your task at the beginning of main().
In 64-bit Windows, the OS sets 53-bit precision mode before starting the .exe. /Qpc80 sets your task back to 64-bit precision mode in main().
In 53-bit precision mode, the result of every x87 floating-point operation is rounded so that the 11 low-order bits of the 64-bit significand are zero.
The /Qpc options have no effect in functions other than main().
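
To check which precision mode a build actually ends up in, something like the following probe may help. It is only a sketch: effective_significand_bits and long_double_precision_is_full are names I made up, and the code relies on volatile to keep the compiler from folding the arithmetic at compile time. With round-to-nearest, it should return 64 when long double arithmetic runs in 64-bit precision mode and 53 when the x87 is in 53-bit mode (or when long double is simply double).

```c
#include <float.h>

/* Count how many significand bits long double addition actually keeps.
   Halve eps until 1 + eps rounds back to 1; the number of halvings
   needed equals the working precision in bits. */
int effective_significand_bits(void)
{
    volatile long double one = 1.0L;
    volatile long double eps = 1.0L;
    volatile long double sum;
    int bits = 0;

    do {
        eps = eps / 2.0L;     /* exact: only the exponent changes */
        sum = one + eps;      /* rounded to the current precision mode */
        bits++;
    } while (sum != one && bits < 256);   /* 256 is just a safety stop */

    return bits;
}

/* Nonzero if arithmetic keeps every bit the long double type declares. */
int long_double_precision_is_full(void)
{
    return effective_significand_bits() == LDBL_MANT_DIG;
}
```

If the first function reports 53 while LDBL_MANT_DIG is 64, the type is 80-bit but the precision mode is still 53-bit, which is the situation described above for /Qlong-double without /Qpc80.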

The reason for setting /fp:source is to be able to check the precision resulting from expressions. /fp:fast (the ICL default) doesn't follow language standards. In fact, /Qlong-double is not a good solution for accuracy problems associated with /fp:fast. Note that ICL /fp:source has an effect similar to VC /fp:fast.

From the above, you will see that Windows support for long double, even when using Intel compilers, leaves something to be desired. In any case, casts between double and long double are slow on SSE CPUs.
Windows compilers apparently don't attempt to align long double for performance, so you can expect performance variations for /Qlong-double.
icc for Linux takes 128-bit alignment as the natural boundary for long double, as that should improve performance. This differs from gcc, which prefers 96-bit alignment, so as not to suffer much in performance but also not waste space.
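
If you want to see what a given compiler actually does for storage and alignment, a small probe along these lines should work. The struct and function names are mine, and using offsetof on a padded struct is just a portable way of inferring alignment, not anything specific to icc or gcc.

```c
#include <limits.h>
#include <stddef.h>

/* A char followed by a long double: the compiler pads after 'pad' so that
   'value' lands on the alignment boundary it uses for long double. */
struct ld_probe {
    char pad;
    long double value;
};

/* Storage size of long double, in bits. */
size_t long_double_storage_bits(void)
{
    return sizeof(long double) * CHAR_BIT;
}

/* Alignment the compiler chose for long double, in bits. */
size_t long_double_alignment_bits(void)
{
    return offsetof(struct ld_probe, value) * CHAR_BIT;
}
```

Printing both values for icc and gcc on the same machine shows directly whether the two compilers agree on layout, which matters if long double data is shared between objects built with each.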


TimP · 03-02-2009

If your Windows main() is compiled with default options, the CPU will be initialized to 53-bit precision mode. Compiling main() and your long double functions with /Qlong-double is required on Windows if you want your declared long double to be processed in 64-bit precision mode. Compiling main() with /fp:extended or /Qpc80, or explicit SSE intrinsics, may also set 64-bit precision mode. The behavior of the Windows run-time libraries isn't entirely predictable in 64-bit precision mode, as Microsoft doesn't support it.
Precision mode is a somewhat separate question from the use of 32- or 64-bit operating systems; the support for 64-bit x87 precision mode is cut back in the Windows X64 compilers.
The original specification for Windows X64 prohibited use of x87 80-bit registers. When it was released, they had relented to the extent of taking care of those registers during context switch. It was intended to support "legacy" 32-bit applications which used those registers. Still, neither Microsoft nor Intel X64 compilers support x87 code generation, as far as I know. I assumed above that you meant to use 32-bit Windows compilers.

srimks · 03-03-2009

Could you check http://msdn.microsoft.com/en-us/library/a32tsf7t(VS.80).aspx

Intel C++ Compiler for x64 Windows supports long double precision and __m64. Probably, it is safe to use FP registers and MMX registers in 64-bit Windows, except in kernel mode drivers.

Note: The MMX & FP stack registers (MM0-MM7/ST0-ST7) are preserved across context switches. There is no explicit calling convention for these registers. The use of these registers is strictly prohibited in kernel mode.

~BR

RS · 03-06-2009

Thanks for your very informative answer. I am compiling for 64-bit Windows. After adding /Qlong-double in addition to /fp:extended, the compiler seems to generate x87 code almost equivalent to what icc under Linux produces. However, as far as I can tell, the internal precision is only 53 bits. I tried using _controlfp to set the precision manually, but that resulted in a runtime assertion.

It seems that extended precision is something that might not be supported in the long term and not by all compilers. In that case I might be better off not using it.

TimP · 03-06-2009

Reading between the lines, the doc says you _must_ set /Qpc80 along with /Qlong-double (or use the facility to set 64-bit precision mode). If you want long double math function support, you must replace <math.h> by <mathimf.h>. It's not clear whether any of this is supported for 64-bit mode; I'll try it when I get a chance. There will still not be long double run-time support for the other standard headers.
Microsoft never supported long double as extended precision, and continues to throw up obstacles.
There is no support for long double in the new instruction sets, so performance of long double will not keep pace.
