Solved: SSE optimizations and IEEE 754

jeff_keasler · ‎08-24-2009

Hi,

I'm using the options below as my best tradeoff between SSE performance and IEEE 754 conformance. Are there any other flags I could add that would bring me even closer to precise arithmetic without sacrificing much performance?

-g -O3 -ip -ansi-alias -restrict -msse3 -unroll-aggressive -vec-report1 -fp-model strict -fp-model source -prec-div -prec-sqrt -no-ftz

Thanks,
-Jeff

TimP · ‎08-25-2009

-prec-div, -prec-sqrt and -no-ftz (all included in both -fp-model source and -fp-model strict) are the primary IEEE 754 compliance options. -no-ftz is the only one that affects the setting of MXCSR (and then only in main()).
-fp-model source requires operations to be carried out in float or double, as specified in your source code. -fp-model strict should give more support for exception handling. -fp-model precise would be similar to strict, but with double evaluation of float expressions.
Both -fp-model options are fairly strict about associativity, in accordance with the C standard, which disables vector sum reductions and most vector math functions.

aazue · ‎08-24-2009

Hi,
Performance depends essentially on the processor used.
If you state exactly which processor(s) your machine has, someone with similar experience can probably answer.

Regarding your remark about "precise arithmetic without sacrificing much performance":
you could test with the MPFR library; it works well and performance is not decreased.

(Compiler flags have to be chosen to match the processor exactly, which is difficult when a single build has to serve every type.)
Running several tests takes a long time, but it is the better way to get a reliable result.
Best regards

jeff_keasler · ‎08-25-2009

Hi,

This has to work on a range of chips that are guaranteed to have at least SSE3, but possibly no more than SSE3.

The MPFR library looks interesting, but I need the speed of SSE hardware floating point, and I just need to be as close to IEEE 754 as possible.

I suspect there may be compiler options that make the math optimizations less aggressive and set the hardware control registers as well as possible for accuracy. I'm currently using the 11.1.046 compiler, if that helps.

Thanks,
-Jeff

jeff_keasler · ‎09-04-2009

I've just been informed by a colleague that adding -nolib-inline to my original set of compiler options produces slightly better mathematical results.

karma_kid · ‎09-17-2010

On a related note... I have a user who requires strict ANSI compliance and must use the
-fp-model strict
command-line option. This is on an AMD Opteron using the Intel version 10 compilers under Linux.

Recently, he discovered that his makefile has been using another option all along: -mp1.
Based on the man page descriptions, I assume the two options do not step on each other. In fact, I would expect -mp1 to have no effect on generated code when combined with -fp-model strict. Are there any potential compatibility problems with using both options simultaneously?

Man page for -mp1 says: Improves floating-point precision and consistency. This option disables fewer optimizations and has less impact on performance than -fltconsistency or -mp.

It may be that -mp1 is a deprecated command-line option.

TimP · ‎09-17-2010

Yes, the old -mp1 (and similar) options should be replaced by appropriate -fp-model options. You would have to investigate carefully (experimentally) the effect of using both the old and new options together, if you think you have a reason for doing so.