Trivial (?) differences between OpenMP and non-OpenMP code

IanH · 07-08-2008

I have a numerically intensive application that appears to benefit from parallelisation through OpenMP. At a couple of key loops I have applied the appropriate PARALLEL DO directives, and after a bit of troubleshooting everything seems to be working nicely and the resulting application is noticeably faster. Wonderful stuff.

However, I am seeing an ever so slight difference (in the fifth significant figure after a thousand-odd iterations) in the calculated results between a version compiled with OpenMP active and a straight optimised release version. This difference is present even when the OpenMP version is restricted to one thread (e.g. via the OMP_NUM_THREADS environment variable). It is quite conceivable that the slight difference is simply due to differences in the evaluation order of expressions, rounding, etc., but I need to check in case there's something else astray.

The only difference in compiler options is the /Qopenmp switch, i.e.:

/QaxP /Qopenmp /real_size:64 /fpe:0 /libs:static /threads

versus

/QaxP /real_size:64 /fpe:0 /libs:static /threads

Can the introduction of OpenMP change the order of evaluation of expressions *within* a parallelised DO loop, or do you just get exactly the same sequence of instructions as in the non-OpenMP case, only running side by side? All this is with version 10.1.024 of the compiler.

Thanks for any input,

IanH

Steven_L_Intel1 · 07-08-2008

Oh, the code can be very different. Also, when you use OpenMP, all local arrays that would be statically allocated by default are allocated on the stack instead. If you have not properly initialized the array, you can get differences.

TimP · 07-08-2008

If you have used sum reduction operations, those specifically imply a different order of additions and different roundoff from serial code, even though your code is completely correct.
While it is not likely to be the cause of the differences you observe, /QaxP specifies generation of 2 code paths with different numerical properties, chosen according to the CPU on which it is run, which seems contradictory to your desire to get nearly identical results.
A set of options such as /QxP /Qopenmp /real_size:64 /assume:protect_parens,minus0 /Qprec-div /Qprec-sqrt affords fewer opportunities for unexpected numerical differences. I'm somewhat concerned about the implication of /fpe:0 when combined with OpenMP.
The option /Qauto (implied by /Qopenmp) would place local arrays on the stack.
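
The point about sum reductions can be illustrated outside Fortran with a minimal sketch. This is not OpenMP and not the compiler's actual reduction scheme; it only mimics the idea that each thread sums its own chunk and the partial sums are combined afterwards. The data and the function names (serial_sum, chunked_sum) are hypothetical, chosen to exaggerate the rounding effect.

```python
# Hypothetical data: one large value followed by many small ones,
# chosen so that the rounding differences are easy to see.
values = [1.0e16] + [1.0] * 1000


def serial_sum(xs):
    """Add the values strictly left to right, as a serial loop would."""
    total = 0.0
    for x in xs:
        total += x
    return total


def chunked_sum(xs, num_chunks):
    """Mimic a reduction: sum contiguous chunks, then combine the partials.

    This is an assumed, simplified model of how a parallel sum reduction
    regroups the additions; a real OpenMP runtime may combine them differently.
    """
    size = (len(xs) + num_chunks - 1) // num_chunks
    partials = [serial_sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return serial_sum(partials)


if __name__ == "__main__":
    print("serial :", serial_sum(values))
    print("chunked:", chunked_sum(values, 4))
```

In the serial order every small value is absorbed by the large running total, while the chunked order lets the small values accumulate among themselves first, so the two results differ even though both orderings are mathematically equivalent. Real applications usually show a far smaller discrepancy than this contrived case.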

IanH · 07-09-2008

Thanks for the responses.

tim18:
I'm somewhat concerned about the implication of /fpe:0 when combined with OpenMP.

Could you elaborate? I selected that option simply to make things explode when a dodgy floating-point operation occurs.