Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Numerical reproducibility for threaded MKL + MPI

JaspervdK
Hello,

I have an MPI-based Fortran code that first solves a set of embarrassingly parallel problems and then combines the results to solve one big problem. The first set of problems is split across MPI processes and uses the MKL routines DSYEVD and DGEMM. The bigger problem then uses PDGEMM and PDGESV.

I have noticed that my results vary by small amounts from run to run. I would like to confirm that this is purely due to nondeterminism from parallelization and not due to a bug. In the MKL documentation I found the CNR mode, which I think should give me numerical reproducibility. However, it does not for my code unless I set the number of OpenMP and MKL threads to 1. In that case all results are consistent from run to run, as desired.
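For readers wondering why threading alone can change the last digits: floating-point addition is not associative, so a reduction that is split differently across threads can round differently. The snippet below is only a minimal sketch of that effect in plain Python. It does not call MKL or MPI, and the helper names `ordered_sum` and `split_sum` are made up for illustration.

```python
import math


def ordered_sum(values):
    """Add the values strictly left to right, as a single thread would."""
    total = 0.0
    for v in values:
        total += v
    return total


def split_sum(values, split):
    """Add two partial sums, mimicking a reduction shared by two threads."""
    left = ordered_sum(values[:split])    # partial sum of the first part
    right = ordered_sum(values[split:])   # partial sum of the second part
    return left + right


values = [0.1, 0.2, 0.3]
serial = ordered_sum(values)      # (0.1 + 0.2) + 0.3
threaded = split_sum(values, 1)   # 0.1 + (0.2 + 0.3)

# The two results agree to within rounding but are not bitwise identical.
print(serial == threaded)
print(math.isclose(serial, threaded))
```

Differences of this size are what one would expect from a changed summation order. Much larger differences would more likely point to an actual bug.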

I do not have any explicit OpenMP directives in the code, so I assume the problem must come from the threaded MKL routines.

Is it possible that these routines remain non-deterministic even in strict CNR mode? Could MPI be the cause? I could not find any documentation on CNR with MPI, but one of the reproducibility conditions listed in the MKL documentation reads: "Calls to Intel® oneAPI Math Kernel Library occur in a single executable". I am not sure whether an MPI program satisfies this condition, since many processes may be calling MKL routines at the same time.

Thanks for any help!
JaspervdK

In case anyone is reading in the future:

After more testing I seem to have found the culprit. We always set the number of threads N for both OpenMP and MKL as follows:

call OMP_SET_NUM_THREADS(N)
call MKL_SET_NUM_THREADS(N)

What I found is that when N*Nproc, where Nproc is the number of MPI processes, is larger than the number of cores on the system, the results vary from run to run. As long as N*Nproc does not exceed the number of cores, the results are fully reproducible from run to run, even without CNR mode.
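The check itself is simple arithmetic. Here is a rough sketch in Python of how one might compute a safe N before launching a job. The function names and the example core counts are hypothetical and are not part of MKL or MPI.

```python
def max_threads_per_rank(total_cores, nproc):
    """Largest N such that N * nproc does not exceed total_cores (at least 1)."""
    if total_cores < 1 or nproc < 1:
        raise ValueError("total_cores and nproc must be positive")
    return max(1, total_cores // nproc)


def is_oversubscribed(n_threads, nproc, total_cores):
    """True when N * Nproc is larger than the number of cores."""
    return n_threads * nproc > total_cores


# Hypothetical node with 16 cores running 4 MPI processes.
print(max_threads_per_rank(16, 4))   # threads each process can use
print(is_oversubscribed(8, 4, 16))   # 8 * 4 = 32 threads on 16 cores
```

The value returned by `max_threads_per_rank` is what I would then pass as N to the two calls above.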

Aleksandra_K

Hi,

Could you provide a simple code sample that reproduces the issue?


Thanks,

Alex


Aleksandra_K

Hi,

Could you let us know whether you are still interested in this issue? If so, could you prepare a reproducer to help us investigate it?


Regards, 

Alex

