Hello,
I vectorized the following code using the OpenMP SIMD directive, but the results after vectorization are not guaranteed to be binary consistent. I'm curious which operation is causing the inconsistency.
FYI, I'm using the Intel ifort 2021 compiler.
Variables like QV that are used by all streams need to be listed in a PRIVATE clause: !$OMP SIMD PRIVATE(QV, MU, ...)
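As a rough illustration of the PRIVATE suggestion, here is a minimal sketch. The array names, the loop body, and the scalar `tmp` are hypothetical stand-ins, since the original loop was not posted as text.

```fortran
program simd_private_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8
  real(dp) :: a(n), b(n), tmp   ! hypothetical names, not from the original code
  integer :: k

  a = [(real(k, dp), k = 1, n)]

  ! tmp is a scalar temporary that is written and then read in every
  ! iteration, so each SIMD lane needs its own copy of it.
  !$omp simd private(tmp)
  do k = 1, n
     tmp  = 2.0_dp * a(k)
     b(k) = tmp + 1.0_dp
  end do

  print *, b
end program simd_private_sketch
```

Built with -qopenmp (or -qopenmp-simd), the directive is honored; without either option it is treated as a comment and the loop runs as ordinary scalar code.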
I only posted the innermost loop code. Variables like QV are declared private in the OMP PARALLEL DO directive of the outermost loop.
The private needs to be on the !$omp parallel...
Neither your article nor the documentation states how /fp:precise handles potential FMA optimizations, whereas /fp:consistent is documented as disabling FMA.
Note: /fp:strict states "enables precise, disables contractions,..." which would seem to indicate that /fp:precise does not disable FMA.
Can you comment as to whether this is a documentation omission?
Jim Dempsey
Jim, I have no idea. Also keep in mind that my presentation was delivered 11 years ago.
@Steve_Lionel @jimdempseyatthecove
I use the following compilation options:
"-heap-arrays -assume byterecl -real-size 64 -no-vec -fp-model precise -fp-speculation=safe -fpe -mp1 -qopenmp -O2 -init=zero,array"
Do you have any suggestions for compilation options to enhance the numerical reproducibility of vectorization?
-no-vec disables vectorization, so vectorization is not responsible, yes? If you are parallelizing, you have no control over the order of operations, and this is most likely to cause reproducibility errors. If you run this with -qopenmp-stubs, what happens?
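To illustrate the point about order of operations, here is a small sketch (the data and names are made up for the example) that adds the same values in two different orders. A parallel or vectorized reduction regroups the additions in a similar way, so the two sums may differ in the last bits.

```fortran
program sum_order_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000
  real(dp) :: x(n), fwd, rev
  integer :: i

  ! Hypothetical test data: terms of the harmonic series.
  do i = 1, n
     x(i) = 1.0_dp / real(i, dp)
  end do

  ! Sum from the largest term to the smallest.
  fwd = 0.0_dp
  do i = 1, n
     fwd = fwd + x(i)
  end do

  ! Sum the same terms from the smallest to the largest.
  rev = 0.0_dp
  do i = n, 1, -1
     rev = rev + x(i)
  end do

  print '(a, es24.16)', 'forward: ', fwd
  print '(a, es24.16)', 'reverse: ', rev
  print *, 'equal: ', fwd == rev
end program sum_order_sketch
```

Floating-point addition is not associative, so neither result is "wrong"; they are simply two different roundings of the same mathematical sum.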
I use -no-vec to turn off automatic vectorization by the compiler and use OMP SIMD to explicitly control the vectorized code. Although it is compiled with OpenMP, it runs with only a single thread.
Also check whether your code is using uninitialized data.
And, if there are parallel regions, determine whether you need firstprivate instead of private.
And, if there are parallel regions that call any random number generator, the sequences may differ.
Jim Dempsey
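A minimal sketch of the private versus firstprivate distinction, using a hypothetical variable `scale` that is set before the parallel region and only read inside it:

```fortran
program firstprivate_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: scale, y(4)   ! hypothetical names for illustration
  integer :: i

  scale = 2.5_dp

  ! With private(scale), each thread's copy would start out undefined, and
  ! reading it before assigning it would give unpredictable results.
  ! firstprivate(scale) initializes each copy from the value set above.
  !$omp parallel do firstprivate(scale)
  do i = 1, 4
     y(i) = scale * real(i, dp)
  end do
  !$omp end parallel do

  print *, y
end program firstprivate_sketch
```

If a variable is always assigned inside the region before it is read, private is sufficient; firstprivate matters only when the incoming value is used.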
FWIW, the code you showed should vectorize along the k index (k:VectorSize-1). There is no conflict on the output side, nor are there any loop-order dependencies. However, (in this code section) the use of FMA in your non-parallel test code should be consistent with the use of FMA in your parallel production code (the same goes for how the RHS expression is evaluated).
Either deviation can cause minor differences in results.
If you see major differences in results, then check for uninitialized data or the issue with multi-threaded random numbers.
Note, while this loop is not directly using random numbers, one or more of its inputs may have been generated using random numbers. Where your reference data was serialized, and your production code is parallel, these input arrays may differ.
Jim Dempsey
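To show why FMA usage matters, here is a sketch that emulates a fused multiply-add by carrying the product in quad precision. It assumes the compiler supports `real128` (ifort and gfortran do), and the input values are chosen for the example only. The emulation is not a real FMA instruction, and a compiler that contracts the "separate" path into an FMA on its own (check -no-fma / -fma) will make both lines print the same value.

```fortran
program fma_sketch
  use iso_fortran_env, only: real64, real128
  implicit none
  real(real64) :: a, b, c, prod, separate, fused

  ! Hypothetical inputs whose exact product, 1 - 2**(-54), is not
  ! representable in double precision.
  a = 1.0_real64 + 2.0_real64**(-27)
  b = 1.0_real64 - 2.0_real64**(-27)
  c = -1.0_real64

  ! Separate operations: the product is rounded to double before the add.
  prod     = a * b
  separate = prod + c

  ! Emulated fused operation: the product is held exactly in quad
  ! precision, and rounding to double happens only once, at the end.
  fused = real(real(a, real128) * real(b, real128) + real(c, real128), real64)

  print '(a, es24.16)', 'separate multiply and add: ', separate
  print '(a, es24.16)', 'emulated fused:            ', fused
end program fma_sketch
```

The difference here is in the last bit of the intermediate product, which is the kind of minor deviation described above.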
@Steve_Lionel @jimdempseyatthecove
Thank you guys for your answers. You reminded me to look at the compiler options. I added "-fp-model source -fimf-arch-consistency=true -no-fma" following the article "Consistency of floating-point results using the Intel compiler or why doesn't my application always give the same answer" by Corden and Kreitzer, and binary consistency is guaranteed.
In fact, consistency is guaranteed by using only the -fimf-arch-consistency=true option. I looked at the code and guessed that it might have something to do with the power operation in the code "pre =(Rd*mu*rhotheta/p0)**gamma*p0".
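One way to test that guess is to evaluate the same expression in an OMP SIMD loop and in a plain scalar loop and compare the results bit for bit. The sketch below does that; all numerical values and the array size are hypothetical placeholders, and only the expression itself comes from the code above.

```fortran
program pow_consistency_sketch
  use iso_fortran_env, only: real64, int64
  implicit none
  integer, parameter :: n = 16
  ! All numerical values below are hypothetical placeholders.
  real(real64), parameter :: Rd = 287.0_real64, p0 = 1.0e5_real64
  real(real64), parameter :: gamma = 1.4_real64
  real(real64) :: mu(n), rhotheta(n), pre_simd(n), pre_scalar(n)
  integer :: k, ndiff

  do k = 1, n
     mu(k)       = 1.0_real64 + 0.01_real64 * real(k, real64)
     rhotheta(k) = 300.0_real64 + real(k, real64)
  end do

  ! Explicitly vectorized version of the expression.
  !$omp simd
  do k = 1, n
     pre_simd(k) = (Rd*mu(k)*rhotheta(k)/p0)**gamma*p0
  end do

  ! Plain loop; stays scalar only if auto-vectorization is off (-no-vec).
  do k = 1, n
     pre_scalar(k) = (Rd*mu(k)*rhotheta(k)/p0)**gamma*p0
  end do

  ! Compare bit patterns rather than values.
  ndiff = 0
  do k = 1, n
     if (transfer(pre_simd(k), 0_int64) /= transfer(pre_scalar(k), 0_int64)) then
        ndiff = ndiff + 1
     end if
  end do
  print *, 'elements that differ bitwise: ', ndiff
end program pow_consistency_sketch
```

Building this once with the original options and once with -fimf-arch-consistency=true added should show whether the vector and scalar math-library paths for the power operation account for the difference; whether any elements actually differ depends on the compiler version, target, and inputs.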
Sorry for posting a screenshot, copy and paste is broken on this forum!!!
Jim