Intel® MPI Library

Output reproducibility with version 2023.1

stefan-maxar
Novice

Hello!

 

We are noticing some output differences when recompiling and running our hybrid MPI-OpenMP application using Intel oneAPI 2023.1 base and HPC kits compared to the older compiler and runtime environments using Intel oneAPI 2022.2.0 base and HPC kits.

Specifically, we produce output in a gridded binary format and compare the SHA256 sums of the files produced (and also do "diff" file comparisons). Compiling and running the identical application with the exact same inputs produces different results between the two Intel oneAPI versions: of the 65+ files we analyzed, every file had a different SHA256 sum between Intel oneAPI 2022.2.0 and Intel oneAPI 2023.1. We ran several tests across multiple clusters to confirm the difference.

Additionally, we found that the results from Intel oneAPI 2023.1 (and 2022.2.0) are reproducible within a given version - the SHA256 sums of the output from the application run several times with the same input data and the same Intel version were identical. Interestingly, when we've made upgrades previously (e.g., from Intel oneAPI 2021.4 to 2022.2.0), the SHA256 sums of the output were identical in our comparisons.

None of our compiler options (flags, etc.) have changed, nor has our runtime environment. The only changes made were to recompile and run with Intel oneAPI 2023.1 kits rather than 2022.2.0.

Some compiler and runtime details below:

OS: AmazonLinux 2 (AWS)

Interconnect: EFA v1.21.0 (AWS) with Libfabric 1.16.1

Scheduler: SLURM v22.05.5 using Intel Hydra as the submission mechanism

Compilers: Intel C++/Fortran classic with MPI

Some pertinent environmental variables:

export I_MPI_OFI_PROVIDER="efa"
export I_MPI_OFI_LIBRARY_INTERNAL=0
export I_MPI_FABRICS=shm:ofi
export I_MPI_HYDRA_BOOTSTRAP="slurm"
export I_MPI_HYDRA_RMK="slurm"
export I_MPI_HYDRA_BRANCH_COUNT=-1
export I_MPI_PIN=1
export I_MPI_PIN_RESPECT_CPUSET=0
export I_MPI_PIN_DOMAIN="omp"
export KMP_STACKSIZE=16G
export KMP_AFFINITY="compact"
export I_MPI_EXTRA_FILESYSTEM=1
export I_MPI_EXTRA_FILESYSTEM_FORCE="lustre"
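
For context, a rough sketch of how the job is launched (the node/rank/thread counts and binary name below are placeholders, not our actual values):

#!/bin/bash
#SBATCH --nodes=4                        # placeholder
#SBATCH --ntasks-per-node=8              # placeholder
#SBATCH --cpus-per-task=12               # placeholder

# I_MPI_PIN_DOMAIN=omp sizes each rank's pinning domain from OMP_NUM_THREADS
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# Hydra picks up the Slurm allocation via I_MPI_HYDRA_BOOTSTRAP/I_MPI_HYDRA_RMK
mpirun -n ${SLURM_NTASKS} ./model.exe    # placeholder binary name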

 

Were there fundamental changes between these versions that could explain such a change in output? Mainly, we were surprised to see such a dramatic difference and are trying to understand what might have changed under the hood. Thanks!

Gregg_S_Intel
Employee

If the files contain floating-point data, I suggest this is not the way to compare them. Read the values and compare the differences against a tolerance value. Simple changes to a reduction algorithm can perturb floating-point results, because floating-point arithmetic is not associative. Both the old and new values are correct.
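
For example, assuming each gridded field can be dumped to plain text (one value per line) with whatever decoder fits your format, a minimal sketch of such a comparison (the file names and the 1e-6 tolerance are placeholders):

# compare two dumps value by value against an absolute tolerance
paste run_2022.txt run_2023.txt | awk '
  BEGIN { tol = 1e-6; bad = 0; max = 0 }
  {
    d = $1 - $2; if (d < 0) d = -d      # absolute difference
    if (d > tol) bad++
    if (d > max) max = d
  }
  END {
    printf "values over tolerance: %d, max abs diff: %g\n", bad, max
    exit (bad > 0)                      # non-zero exit if anything exceeds tol
  }'

A relative tolerance may be more appropriate than an absolute one for fields with a wide dynamic range.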

 

 

stefan-maxar
Novice

Thanks for the advice! Yes, we also did that sort of analytical comparison of the data. The results are what we'd consider substantially different between the two versions and outside our tolerance range for the Intel oneAPI 2023.1 version. Were there changes to the default reduction algorithms between the two versions worth noting?

Gregg_S_Intel
Employee

A substantial difference is also possible, depending on the numerical sensitivity of the simulation. What does the application do?

stefan-maxar
Novice

It is a weather forecasting application. Yes, we are aware of and understand the non-linearity and numerical sensitivity of our application to these sorts of changes. It was just noteworthy that previous Intel oneAPI upgrades have not resulted in such stark differences in the output; in fact, the diffs/sha256sums were identical after previous upgrades, as noted above. We've done this sort of upgrade across the Intel oneAPI suite a few times now, and this is the first time we've encountered what we'd consider a substantial difference - hence starting this thread!

Gregg_S_Intel
Employee

For reproducible results,

- export I_MPI_CBWR=1 or I_MPI_CBWR=2

- compile with -fp-model precise

- export KMP_DETERMINISTIC_REDUCTION=yes

However, each of these has an impact on performance.
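
A minimal sketch of how these could be combined in a job script (the compile line, source and binary names, and launch are placeholders; the flags and variables themselves are the ones listed above):

# MPI side: disable topology-aware collective selection (see the I_MPI_CBWR reference below)
export I_MPI_CBWR=1
# OpenMP side: deterministic reductions
export KMP_DETERMINISTIC_REDUCTION=yes

# compile side (classic compilers): value-safe floating-point semantics
mpiifort -O2 -qopenmp -fp-model precise -o model.exe model.f90   # placeholder names

mpirun -n ${SLURM_NTASKS} ./model.exe                            # placeholder launch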

 

I_MPI_CBWR

Controls reproducibility of floating-point operation results across different platforms, networks, and topologies for the same number of processes.

Syntax: I_MPI_CBWR=<arg>, where <arg> is the CBWR compatibility mode:

0 (none) - Do not use CBWR in library-wide mode. CNR-safe communicators may still be created explicitly with MPI_Comm_dup_with_info. This is the default value.

1 (weak mode) - Disable topology-aware collectives. The result of a collective operation does not depend on rank placement. This mode guarantees reproducible results across different runs on the same cluster (independent of rank placement).

2 (strict mode) - Disable topology-aware collectives and ignore CPU architecture and interconnect during algorithm selection. This mode guarantees reproducible results across different runs on different clusters (independent of rank placement, CPU architecture, and interconnect).

stefan-maxar
Novice

Thanks for this! We are using -fp-model precise but will look into the other options as well. For my awareness, did any of the defaults for the I_MPI_* variables change from Intel oneAPI 2022.X to 2023.1? I'm thinking of variables like the I_MPI_ADJUST_* series. Thanks again for the help.

AishwaryaCV_Intel
Moderator

Hi,

 

Apologies for the delayed response.

 

>>>For my awareness, did any of the defaults for the I_MPI_* variables change from Intel oneAPI 2022.X to 2023.1?

To clarify, the default values for the I_MPI_* variables did not change from Intel oneAPI 2022.X to 2023.1. By default, all of these variables are set to -1, meaning algorithm selection follows the tuning data. This default behavior has remained consistent since the beginning.

 

In the event of a change in behavior, there are two possible reasons.

  1. The algorithm implementation on the MPI side may have been modified, resulting in a difference in behavior. Unfortunately, there is no workaround for this scenario.
  2. Tuning settings may have been altered. If this is the case, it is possible to use tuning files from previous versions. This would essentially replicate the behavior of setting the I_MPI_ADJUST_* variables to their old values.
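
For illustration, a single collective can also be pinned to a fixed algorithm through the I_MPI_ADJUST_* family (the algorithm number below is only a placeholder; the available algorithm IDs are listed in the Intel MPI Developer Reference):

# force one specific MPI_Allreduce algorithm instead of tuning-based selection
export I_MPI_ADJUST_ALLREDUCE=1    # placeholder algorithm ID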

 

Could you please verify which tuning file was used in both cases by enabling I_MPI_DEBUG=6? By doing so, you will see output similar to the following:

         [0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.9.0/etc/tuning_skx_ofi_tcp-ofi-rxm_10.dat" not found

         [0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.9.0/etc/tuning_skx_ofi_tcp-ofi-rxm.dat"

 

When Intel MPI attempts to find a file that matches the specific configuration, it may make multiple attempts, so seeing the "File ... not found" message is completely normal.

The last line, "Load tuning file," indicates the exact tuning file used during the run. To use the collective settings from the 2022.X version with the 2023.1 version, you can set I_MPI_TUNING_BIN to point to the corresponding tuning file from the 2022.X installation.
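
For example (the paths and file names below are placeholders; use the exact file reported by "Load tuning file" in your own 2022.X debug output):

# 1) check which tuning file each installation loads
export I_MPI_DEBUG=6
mpirun -n ${SLURM_NTASKS} ./model.exe 2>&1 | grep -i "tuning file"

# 2) point the 2023.1 runtime at the tuning data shipped with 2022.X
export I_MPI_TUNING_BIN=/opt/intel/oneapi/mpi/<2022.x_mpi_version>/etc/<tuning_file_from_step_1>
mpirun -n ${SLURM_NTASKS} ./model.exe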

 

However, it's important to note that there is no guarantee of reproducible results outside of I_MPI_CBWR mode, so the following workarounds are suggestions:

  1. Utilize the I_MPI_CBWR mode for reproducibility, although it may not replicate the old results exactly.
  2. Use the tuning file from the 2022.X version with the 2023.X version to precisely replicate the old behavior.

 

Thanks And Regards,

Aishwarya

 

 

stefan-maxar
Novice

Thanks! I will take a look at the tuning, etc. as you suggest. 

AishwaryaCV_Intel
Moderator

Hi,


Could you please confirm whether we can close this thread? 


Thanks And Regards,

Aishwarya


AishwaryaCV_Intel
Moderator

Hi,

 

We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.

 

Thanks And Regards,

Aishwarya

 
