Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

My program is numerically unstable with SSE2 enabled

I expected that enabling SSE2 would give better numerical stability between runs. However, when I enabled SSE2 for part of my program, I got more numerical noise between runs.

Does anyone know whether this is expected behaviour with SSE2 enabled and, if so, why?

Thank you for your help,

3 Replies
I'm guessing this is more of a compiler options question than a threading question. As you haven't specified which compiler it is, the following may be more verbose than you intended:
One issue I run into with C float (Fortran default real) is that the aggressive default optimizations of the Intel compilers are more likely to bite in SSE code, because intermediate calculations are not promoted to higher precision. For C code, you may need one of the /fp options (-fp-model on Linux) that is less aggressive than the default /fp:fast. For ifort, I often use -assume protect_parens -prec-div -prec-sqrt, all of which are included in -fp-model precise.
The default optimizations reduce the range over which divide and sqrt are valid; -prec-div and -prec-sqrt require those operations to be done according to IEEE, subject to whether you have gradual underflow enabled (/fp:precise sets /Qftz-). /fp:precise does not promote Fortran default real intermediates to double, but it does promote C float intermediates to double. I hope I have not confused the issue beyond what is written in the compiler docs.
/fp:precise also removes optimizations where numerical results may depend on data alignment.
With Microsoft 32-bit C, if you set /arch:SSE2, you don't get good float performance unless you set /fp:fast (or possibly /fp:source), as the implicit float to double conversions of /fp:precise (their default) are expensive. So, good performance with SSE comes at the expense of the protection offered by extended range and precision evaluation.
gcc -ffast-math (roughly equivalent to -fp-model fast of Intel compilers) has different reliability issues between x87 and SSE code.
Thanks a bunch, Tim. Your reply has helped me play with many compiler options.
Thanks, I have the same problem. :-)