Dear MPI Support Team,
We are very grateful for the new I_MPI_CBWR control. However, it seems to have some unexpected side effects: when I_MPI_CBWR=2 is set, we can no longer change the algorithms of other collectives, such as gatherv, that should not be touched by numerical reproducibility.
To demonstrate the issue, I ran several IMB-MPI1 benchmarks with gatherv (see attached outputs):
impi-test.sh-plain is run with default settings, as the reference for the default algorithm selection.
impi-test.sh-cbwr2 is run with I_MPI_CBWR=2; as expected, the gatherv algorithm selection appears to be unchanged, since the timings do not change.
impi-test.sh-gatherv3 is run without I_MPI_CBWR but with I_MPI_ADJUST_GATHERV=3, and indeed the timings change significantly when this specific gatherv algorithm is forced.
impi-test.sh-gatherv_cbwr2 is run with I_MPI_CBWR=2 and I_MPI_ADJUST_GATHERV=3, and here the problem is clearly visible. I would expect the same behavior as with I_MPI_ADJUST_GATHERV=3 alone, but instead the performance is identical to the plain and I_MPI_CBWR=2-only runs, indicating that I_MPI_ADJUST_GATHERV=3 is ignored, although gatherv should not be affected by reproducibility concerns at all.
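For anyone who wants to reproduce the comparison, the four runs described above could be scripted roughly as follows. This is only a sketch and not the actual impi-test.sh: the rank count (`NP`), the `run_case` helper, and the `DRY_RUN` switch are hypothetical names introduced here, and it assumes `mpirun` and `IMB-MPI1` are on the PATH. By default it only prints the command lines.

```shell
#!/usr/bin/env bash
# Sketch of the four gatherv runs; NOT the original impi-test.sh.
# Assumptions: NP ranks (hypothetical default 4), mpirun and IMB-MPI1 on PATH.
# DRY_RUN=1 (default) only prints each command; DRY_RUN=0 executes it.

NP="${NP:-4}"
DRY_RUN="${DRY_RUN:-1}"

run_case() {
    # $1 = label for the run; remaining args = VAR=value settings for that run
    local label="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "[$label]" env "$@" mpirun -np "$NP" IMB-MPI1 Gatherv
    else
        # Output file names follow the labels used in this post
        env "$@" mpirun -np "$NP" IMB-MPI1 Gatherv > "impi-test.sh-$label" 2>&1
    fi
}

run_case plain                                               # default algorithm selection
run_case cbwr2         I_MPI_CBWR=2                          # reproducibility mode only
run_case gatherv3      I_MPI_ADJUST_GATHERV=3                # forced gatherv algorithm only
run_case gatherv_cbwr2 I_MPI_CBWR=2 I_MPI_ADJUST_GATHERV=3   # both: the problematic case
```

Running it with `DRY_RUN=0` should write one output file per case, so the Gatherv timings can be compared side by side.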
Is this really the expected behavior to disable algorithm selection when using I_MPI_CBWR?
please have look at
So my question is: is it possible to turn off the use of AVX-512 via environment variables in the Intel MPI 2021.6 library?
Otherwise we can't use Valgrind for memory checking. Sometimes we also use Intel Inspector, but it is often much too slow.
Thanks for posting in Intel Communities.
Could you please let us know the Intel MPI version?
And also, could you please provide us with the OS, CPU, and hardware details you are using?
Thanks & Regards,