Hello,
I have an MPI-based Fortran code that solves a set of embarrassingly parallel problems and then combines the results to solve one big problem. The first set of problems is split across MPI processes and uses the MKL routines DSYEVD and DGEMM. The bigger problem then uses PDGEMM and PDGESV.
I have noticed that my results vary from run to run (by small amounts). I would like to confirm that this is purely due to nondeterminism from parallelization and not due to some bug. In the MKL documentation I found the CNR mode, which I think should give me numerical reproducibility. However, it does not for my code unless I set the number of OpenMP and MKL threads to 1. In that case all results are consistent from run to run, as desired.
I do not have any explicit OpenMP directives in the code, so I assume the problem must come from threaded MKL routines.
Is it possible that, even in strict CNR mode, these routines remain non-deterministic? Could this be due to MPI? I could not find documentation on CNR in MPI, but one of the reproducibility conditions quoted in the MKL documentation reads:
"Calls to Intel® oneAPI Math Kernel Library occur in a single executable". I am not sure if MPI respects this condition, as many processes could be calling MKL routines at the same time.
Thanks for any help!
3 Replies
In case anyone is reading in the future:
After more testing I seem to have found the culprit. We would always set the number of threads N for MKL and OpenMP as follows:
call OMP_SET_NUM_THREADS(N)
call MKL_SET_NUM_THREADS(N)
What I found is that the results vary from run to run whenever N*Nproc exceeds the number of cores on the system, where Nproc is the number of MPI processes. If I ensure that N*Nproc does not exceed the core count, the results are fully reproducible from run to run, even without CNR mode.
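For anyone who wants to enforce that limit in code, here is a minimal sketch of the arithmetic. The module name `thread_budget`, the function name `max_threads_per_process`, and the core count of 16 are all made up for illustration; the core count has to come from your own system, and this is not part of my actual code.

```fortran
module thread_budget
    implicit none
contains
    ! Largest thread count n per MPI process such that n * nproc <= ncores.
    ! Returns 1 when nproc > ncores; in that case the system is
    ! oversubscribed no matter what n is chosen.
    pure integer function max_threads_per_process(ncores, nproc) result(n)
        integer, intent(in) :: ncores   ! number of cores (assumed known)
        integer, intent(in) :: nproc    ! number of MPI processes sharing them
        n = max(1, ncores / max(1, nproc))
    end function max_threads_per_process
end module thread_budget

program demo
    use thread_budget
    implicit none
    integer :: ncores, nproc

    ncores = 16   ! hypothetical value, replace with the real core count
    nproc  = 4    ! in an MPI code this would come from MPI_COMM_SIZE

    ! Prints "threads per process: 4"
    print '(a,i0)', 'threads per process: ', max_threads_per_process(ncores, nproc)
end program demo
```

In the real code the result would be passed to both OMP_SET_NUM_THREADS and MKL_SET_NUM_THREADS instead of a fixed N. On a multi-node run, the same check presumably applies per node, using the number of processes and cores on each node.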
Hi,
Could you provide a simple code sample that reproduces your issue?
Thanks,
Alex
Hi,
Could you let us know if you are still interested in the issue? If so, could you prepare a reproducer to help us address it?
Regards,
Alex