Vectorization of code leads to binary inconsistency

MetMan · ‎07-19-2024

Hello,

I vectorized the following code using the OpenMP SIMD directive, but the results after vectorization are not guaranteed to be binary consistent with the unvectorized results. I'm curious which operation is causing the inconsistency.

FYI, I'm using the Intel ifort 2021 compiler.

Steve_Lionel · ‎07-19-2024

That's just a fact of life. See Improving Numerical Reproducibility in C/C++/Fortran

Andrew_Smith · ‎07-19-2024

Variables like QV that are used by all streams need to be listed in a PRIVATE clause. !$OMP SIMD PRIVATE(QV, MU, ...)

MetMan · ‎07-19-2024

I only posted the innermost loop code. Variables like QV are declared private in the OMP PARALLEL DO directive of the outermost loop.

jimdempseyatthecove · ‎07-19-2024

@Andrew_Smith

The private needs to be on the !$omp parallel...

@Steve_Lionel

In your article and in the documentation, /fp:precise does not state how it addresses potential FMA optimizations, whereas /fp:consistent does state it disables FMA.

Note: /fp:strict states "enables precise, disables contractions,..." which would seem to indicate that /fp:precise does not disable FMA.

Can you comment as to whether this is a documentation omission?

Jim Dempsey

Steve_Lionel · ‎07-19-2024

Jim, I have no idea. Also keep in mind that my presentation was delivered 11 years ago.

MetMan · ‎07-19-2024

@Steve_Lionel @jimdempseyatthecove

I use the following compilation options:

"-heap-arrays -assume byterecl -real-size 64 -no-vec -fp-model precise -fp-speculation=safe -fpe -mp1 -qopenmp -O2 -init=zero,array"

Do you have any suggestions for compilation options to enhance the numerical reproducibility of vectorization?

Steve_Lionel · ‎07-20-2024

-no-vec disables vectorization, so vectorization is not responsible, yes? If you are parallelizing, you have no control over the order of operations, and this is most likely to cause reproducibility errors. If you run this with -qopenmp-stubs, what happens?
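The point about order of operations can be illustrated with a minimal sketch. It is written in Python only because the loop from the original post is not available as text, and it uses a hypothetical four-element sum rather than anything from the poster's code. The function names `sequential_sum` and `lane_sum` are made up for this illustration; `lane_sum` mimics how a SIMD or multi-threaded reduction keeps several partial sums and combines them at the end.

```python
def sequential_sum(values):
    # Add the terms strictly left to right, as scalar code would.
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes):
    # Keep one partial sum per lane (element i goes to lane i % lanes),
    # then combine the partial sums. Same terms, different grouping.
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


# Hypothetical data chosen so that rounding is easy to follow:
# near 1e16 the spacing between adjacent doubles is 2, so adding 1.0 is lost.
values = [1.0e16, 1.0, -1.0e16, 1.0]

print(sequential_sum(values))  # 1.0: the first 1.0 is absorbed by 1e16
print(lane_sum(values, 2))     # 2.0: the large terms cancel before the small ones are added
```

Both results are valid roundings of the same mathematical sum; they differ only because the additions are grouped differently. A loop with no reduction, where each iteration writes its own output element, does not have this particular problem.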

MetMan · ‎07-20-2024

I use -no-vec to turn off automatic vectorization by the compiler and use OMP SIMD to explicitly control which code is vectorized. Although the program is compiled with OpenMP, it runs with only a single thread.

jimdempseyatthecove · ‎07-20-2024

Also check whether your code is using uninitialized data.

And, if there are parallel regions, determine whether you need firstprivate instead of private.

And, if the parallel regions call or use a random number generator, the sequences may differ.

Jim Dempsey

jimdempseyatthecove · ‎07-20-2024

FWIW the code you showed should vectorize along the k index (k:VectorSize-1). There is no conflict on the output side, nor any loop-order dependency. In this code section, the use of FMA in your non-parallel test code should be consistent with the use of FMA in your parallel production code (the same goes for how the RHS expression is evaluated).

Either deviation can cause minor differences in results.
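To make the FMA point concrete, here is a small sketch in Python (chosen only because the original loop was posted as a screenshot). It emulates a fused multiply-add with exact rational arithmetic, so it does not depend on the hardware or on compiler flags. The function names and the input values are hypothetical and picked so the effect is visible; they are not taken from the code in this thread.

```python
from fractions import Fraction


def mul_add_separate(a, b, c):
    # Two roundings: a*b is rounded to a double, then the sum is rounded.
    return a * b + c


def mul_add_fused(a, b, c):
    # One rounding: form a*b + c exactly, then round once to a double.
    # This is the behaviour an FMA instruction provides.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Hypothetical inputs: a*b = 1 - 2**-60 exactly, which is not representable
# as a double and rounds to 1.0 when the product is rounded on its own.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(mul_add_separate(a, b, c))  # 0.0
print(mul_add_fused(a, b, c))     # -2**-60, about -8.67e-19
```

The fused result is the more accurate one, but the two are not bit-identical, so a build that contracts multiply-add pairs and a build that does not can legitimately disagree in the last bits.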

If you see major differences in results, then check for uninitialized data or the issue with multi-threaded random numbers.

Note, while this loop is not directly using random numbers, one or more of its inputs may have been generated using random numbers. If your reference data was produced serially and your production code runs in parallel, these input arrays may differ.

Jim Dempsey

MetMan · ‎07-20-2024

@Steve_Lionel @jimdempseyatthecove

Thank you both for your answers. You reminded me to look at the compiler options. Following the article ‘Consistency of floating-point results using the Intel compiler or why doesn't my application always give the same answer’ by Corden and Kreitzer, I added "-fp-model source -fimf-arch-consistency=true -no-fma", and binary consistency is now guaranteed.

In fact, the -fimf-arch-consistency=true option alone is enough to guarantee consistency. Looking at the code, I suspect it has something to do with the power operation in "pre =(Rd*mu*rhotheta/p0)**gamma*p0".
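One plausible reading of this guess, which the thread does not confirm, is that a power with a real exponent is evaluated by a math-library routine, and two different implementations of that routine need not agree to the last bit. The Python sketch below evaluates the same expression with the built-in power operator and with the identity x**g = exp(g*log(x)). All numeric values are hypothetical placeholders, since the thread does not give the real inputs.

```python
import math

# Hypothetical inputs; the actual values in the poster's model are unknown.
Rd = 287.0
p0 = 1.0e5
mu = 1.0
rhotheta = 360.0
gamma = 1.4

x = Rd * mu * rhotheta / p0

# Same mathematical expression, two ways of computing the power.
pre_pow = x ** gamma * p0
pre_explog = math.exp(gamma * math.log(x)) * p0

# Print the exact bit patterns; they may or may not match on a given
# platform, because the two routes round at different intermediate steps.
print(pre_pow.hex())
print(pre_explog.hex())
print(pre_pow == pre_explog)
```

Either way the two values agree to many significant digits, which matches the kind of last-bit difference described in this thread rather than a large discrepancy.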

jimdempseyatthecove · ‎07-22-2024

Sorry for posting a screenshot, copy and paste is broken on this forum!!!

Jim