Numerical reproducibility for threaded MKL + MPI

JaspervdK · 05-02-2025

Hello,

I have an MPI-based Fortran code that essentially solves a set of embarrassingly parallel problems and then combines the results to solve one big problem. The first set of problems is split across MPI processes and uses the MKL routines DSYEVD and DGEMM. The bigger problem then uses PDGEMM and PDGESV.

I have noticed that my results vary from run to run (by small amounts). I would like to confirm that this is purely due to nondeterminism from parallelization and not due to a bug. In the MKL documentation I found the CNR (Conditional Numerical Reproducibility) mode, which I think should give me numerical reproducibility. However, it doesn't for my code unless I set the number of OpenMP and MKL threads to 1. In that case all results are consistent from run to run, as desired.

I do not have any explicit OpenMP directives in the code, so I assume the problem must come from the threaded MKL routines.
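For illustration, here is a minimal Python sketch (not the Fortran/MKL code discussed in this thread) of why the number of threads can change a result at all: floating-point addition is not associative, so splitting a sum into a different number of partial sums can round differently. The helper names `naive_sum` and `chunked_sum` are hypothetical, and the chunking only mimics how a threaded reduction might partition work; it says nothing about how MKL actually schedules its threads.

```python
def naive_sum(values):
    """Left-to-right summation with plain float additions."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum contiguous blocks first, then combine the partial sums.

    This mimics a reduction split across n_chunks workers.
    """
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


# Same data, different grouping, different rounding.
values = [1e16, 1.0, -1e16, 1.0]  # exact sum is 2.0

one_worker = chunked_sum(values, 1)
two_workers = chunked_sum(values, 2)

print("1 worker :", one_worker)
print("2 workers:", two_workers)
print("associative?", (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

The two groupings give different answers for the same input, and neither matches the exact sum. The example is deliberately extreme; in a real code the differences are usually confined to the last few digits.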

Is it possible that these routines remain non-deterministic even in strict CNR mode? Could this be due to MPI? I could not find documentation on using CNR together with MPI, but one of the reproducibility conditions quoted in the MKL documentation reads:
"Calls to Intel® oneAPI Math Kernel Library occur in a single executable". I am not sure whether an MPI run satisfies this condition, since many processes could be calling MKL routines at the same time.

Thanks for any help!

JaspervdK · 05-07-2025

In case anyone reads this in the future:

After more testing I seem to have found the culprit. We always set the number of threads N for MKL and OpenMP as follows:

    call OMP_SET_NUM_THREADS(N)
    call MKL_SET_NUM_THREADS(N)

What I found is that when N*Nproc, where Nproc is the number of MPI processes, exceeds the number of cores on the system, the results vary from run to run. If I ensure that N*Nproc does not exceed the number of cores, the results are fully reproducible from run to run, even without CNR mode.
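The condition above can be written down as a small check. This is a minimal Python sketch rather than part of the original Fortran code; the function names `is_oversubscribed` and `max_threads_per_rank` are hypothetical, and it assumes that all MPI processes run on a single node whose core count is known. It only encodes the empirical rule reported in this post, not a guarantee documented for MKL.

```python
def is_oversubscribed(threads_per_rank, n_ranks, n_cores):
    """True if N * Nproc exceeds the number of cores."""
    return threads_per_rank * n_ranks > n_cores


def max_threads_per_rank(n_ranks, n_cores):
    """Largest N such that N * Nproc <= n_cores (at least 1)."""
    if n_ranks < 1 or n_cores < 1:
        raise ValueError("n_ranks and n_cores must be positive")
    return max(1, n_cores // n_ranks)


# Example numbers (assumed for illustration): 16 cores, 4 MPI processes.
n_cores, n_ranks = 16, 4

for n_threads in (2, 4, 8):
    status = "oversubscribed" if is_oversubscribed(n_threads, n_ranks, n_cores) else "ok"
    print(f"N={n_threads}, Nproc={n_ranks}, cores={n_cores}: {status}")

print("largest safe N:", max_threads_per_rank(n_ranks, n_cores))
```

Note that a value obtained from something like `os.cpu_count()` counts logical CPUs, which can be larger than the number of physical cores when hyper-threading is enabled, so the core count is better taken from the job scheduler or the machine's documentation.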

Aleksandra_K · 05-07-2025

Hi,

Could you provide a simple code example that reproduces your issue?

Thanks,

Alex

Aleksandra_K · 05-13-2025

Hi,

Could you let us know if you are still interested in the issue? If so, could you prepare a reproducer to help us address it?

Regards,

Alex

JaspervdK · 05-13-2025

Hi Alex,

Apologies for the lack of response. Unfortunately, I haven't been able to reproduce the behavior in a simpler code, and I cannot share the full code here. In any case, for my purposes the problem is solved by the fix I posted before. If I do find a way to reproduce the issue in the future, I will start a new thread.

Thanks for the help,

Regards,

Jasper
