Intel® Fortran Compiler

Vectorization of code leads to binary inconsistency

MetMan
Beginner
518 Views

Hello,

I vectorized the following code using the OpenMP SIMD directive, but the results after vectorization are no longer binary consistent with the results before. I'm curious which operation is causing the inconsistency.

FYI, I'm using the Intel ifort 2021 compiler.

 

[Screenshot of the code: MetMan_0-1721392226258.png]

 

12 Replies
Steve_Lionel
Honored Contributor III
466 Views

That's just a fact of life. See "Improving Numerical Reproducibility in C/C++/Fortran".

Andrew_Smith
Valued Contributor I
465 Views

Variables like QV that are used by all SIMD lanes need to be listed in a PRIVATE clause: !$OMP SIMD PRIVATE(QV, MU, ...)

MetMan
Beginner
390 Views

I only posted the code of the innermost loop. Variables like QV are declared private in the OMP PARALLEL DO directive of the outermost loop.

jimdempseyatthecove
Honored Contributor III
438 Views

@Andrew_Smith 

The private needs to be on the !$omp parallel...

@Steve_Lionel 

In your article and in the documentation, /fp:precise does not state how it addresses potential FMA optimizations, whereas /fp:consistent does state it disables FMA.

 

Note: /fp:strict states "enables precise, disables contractions,..." which would seem to indicate that /fp:precise does not disable FMA.

 

Can you comment on whether this is a documentation omission?

 

Jim Dempsey

 

Steve_Lionel
Honored Contributor III
425 Views

Jim, I have no idea. Also keep in mind that my presentation was delivered 11 years ago.

MetMan
Beginner
390 Views

@Steve_Lionel @jimdempseyatthecove  

 

I use the following compilation options:

"-heap-arrays -assume byterecl -real-size 64 -no-vec -fp-model precise -fp-speculation=safe -fpe -mp1 -qopenmp -O2 -init=zero,array"

 

Do you have any suggestions for compilation options to enhance the numerical reproducibility of vectorization?

 

Steve_Lionel
Honored Contributor III
313 Views

-no-vec disables vectorization, so vectorization is not responsible, yes? If you are parallelizing, you have no control over the order of operations, and this is most likely to cause reproducibility errors. If you run this with -qopenmp-stubs, what happens?
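To illustrate the point about order of operations, here is a minimal Python sketch (not the poster's Fortran code). The input values and the `lane_sum` helper are made up for the illustration; `lane_sum` only mimics how a vectorized or threaded reduction keeps several partial sums and combines them at the end.

```python
def sequential_sum(values):
    """Add the values strictly left to right, as a scalar loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes):
    """Hypothetical stand-in for a SIMD/threaded reduction:
    each lane accumulates every `lanes`-th element, then the
    per-lane partial sums are added together."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


# Illustrative values only: 1.0 is below the rounding granularity of 1.0e16.
values = [1.0e16, 1.0, -1.0e16, 1.0]

print(sequential_sum(values))   # left-to-right order
print(lane_sum(values, 2))      # two partial sums, combined at the end
```

Both functions add the same four numbers, yet the results differ because floating-point addition is not associative. A loop with no reduction, where each iteration writes its own output element, is not affected by this particular effect.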

MetMan
Beginner
160 Views

I use -no-vec to turn off automatic vectorization by the compiler and use OMP SIMD to control vectorization explicitly. Although the code is compiled with OpenMP, it runs with only a single thread.

jimdempseyatthecove
Honored Contributor III
306 Views

Also check if your code is using uninitialized data.

And, if there are parallel regions, determine whether you need firstprivate instead of private.

And, if there are parallel regions and a region calls any random number generator, the sequences may differ.

 

Jim Dempsey

jimdempseyatthecove
Honored Contributor III
300 Views

FWIW, the code you showed should vectorize along the k index (k:VectorSize-1). There are no conflicts on the output side and no loop-order dependencies. In this code section, the use of FMA in your non-parallel test code should be consistent with the use of FMA in your parallel production code (and likewise for how the RHS expression is evaluated).

Either deviation can cause minor differences in results.
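A small Python sketch of why FMA usage matters, under stated assumptions: Python does not fuse `a * b + c`, so the fused result is emulated here with exact rational arithmetic and a single final rounding. The helper names and the input values are hypothetical and chosen only to make the effect visible.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    """Multiply, round to double, then add and round again (two roundings)."""
    return a * b + c


def mul_add_fused(a, b, c):
    """Emulate a fused multiply-add: compute a*b+c exactly,
    then round to double once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Illustrative inputs: a*b is exactly 1 - 2**-54, which is not a double.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print(mul_add_unfused(a, b, c))  # the product is rounded to 1.0 before the add
print(mul_add_fused(a, b, c))    # the tiny term survives the single rounding
```

The two results differ only by a very small amount, but that is enough to break bit-for-bit comparisons if one build contracts the expression into an FMA and the other does not.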

If you see major differences in results, then check for uninitialized data or the issue with multi-threaded random numbers.

Note, while this loop is not directly using random numbers, one or more of its inputs may have been generated using random numbers. If your reference data was produced by a serial run and your production code is parallel, these input arrays may differ.

 

Jim Dempsey

MetMan
Beginner
154 Views

@Steve_Lionel @jimdempseyatthecove 

 

Thank you guys for your answers. You reminded me to look at the compiler options. Following the article ‘Consistency of floating-point results using the Intel compiler or why doesn't my application always give the same answer’ by Corden and Kreitzer, I added "-fp-model source -fimf-arch-consistency=true -no-fma", and the results are now binary consistent.

In fact, the -fimf-arch-consistency=true option alone is enough to make the results consistent. Looking at the code, my guess is that the inconsistency comes from the power operation in "pre =(Rd*mu*rhotheta/p0)**gamma*p0".
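As a rough illustration of that guess, the Python sketch below compares two mathematically equivalent ways of computing a power and reports how many representable doubles apart the results are. It does not reproduce the Intel math library; the point is only that different implementations of the same math function are not required to agree in the last bits. The helper names and the sample values of `x` and `gamma` are assumptions for the example, with `x` standing in for `Rd*mu*rhotheta/p0`.

```python
import math
import struct


def bits(x):
    """Return the 64-bit IEEE 754 pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def ulp_distance(x, y):
    """Count of representable doubles separating two finite values
    of the same sign (0 means bit-for-bit identical)."""
    return abs(bits(x) - bits(y))


def pow_direct(x, gamma):
    """Power computed by the language's power operator."""
    return x ** gamma


def pow_via_exp_log(x, gamma):
    """The same power computed as exp(gamma * log(x))."""
    return math.exp(gamma * math.log(x))


# Hypothetical sample values, not taken from the original code.
x, gamma = 0.8, 1.4

direct = pow_direct(x, gamma)
indirect = pow_via_exp_log(x, gamma)
print(direct, indirect, ulp_distance(direct, indirect))
```

A distance of zero means the two evaluations happen to agree for these inputs; any nonzero distance is the kind of last-bit difference that a bit-for-bit comparison would flag.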

 

jimdempseyatthecove
Honored Contributor III
24 Views

Sorry for posting a screenshot, copy and paste is broken on this forum!!!

 

[Screenshot: jimdempseyatthecove_0-1721649815996.png]

Jim
