I have come across the following problem with one-sided communication using MPI_ACCUMULATE. The versions are:
ifort (IFORT) 22.214.171.124 20190206
Intel(R) MPI Library for Linux* OS, Version 2019 Update 3 Build 20190214 (id: b645a4a54)
The attached program does a very basic calculation using one-sided communication with MPI_ACCUMULATE (and MPI_WIN_FENCE to synchronize). Compile it with
mpif90 test.f donothing.f
The program accepts a command line argument. For example,
mpiexec -np 1 ./a.out 10
simply runs the calculation ten times (on a single process).
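For reference, here is a minimal sketch of the kind of reproducer described. The window size, buffer, and accumulation pattern are my assumptions, not the attached test.f; only the overall structure (MPI_ACCUMULATE between two MPI_WIN_FENCE calls, repeated as many times as the command-line argument says) matches the description above.

```fortran
      program testacc
c     Sketch of the reproducer; sizes and data are assumptions.
      implicit none
      include 'mpif.h'
      integer ierr, rank, nprocs, win, i, n
      integer (kind=MPI_ADDRESS_KIND) winsize, disp
      double precision base(100), val
      character(len=32) arg

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

c     number of iterations from the command line
      call get_command_argument(1, arg)
      read (arg, *) n

      winsize = 100 * 8
      call MPI_WIN_CREATE(base, winsize, 8, MPI_INFO_NULL,
     &                    MPI_COMM_WORLD, win, ierr)

      base = 0.d0
      val  = 1.d0
      disp = 0
      do i = 1, n
         call MPI_WIN_FENCE(0, win, ierr)
         call MPI_ACCUMULATE(val, 1, MPI_DOUBLE_PRECISION,
     &        mod(rank+1, nprocs), disp, 1,
     &        MPI_DOUBLE_PRECISION, MPI_SUM, win, ierr)
         call MPI_WIN_FENCE(0, win, ierr)
c        prevent the compiler from optimizing across the fence
         call donothing(base)
      end do

      call MPI_WIN_FREE(win, ierr)
      call MPI_FINALIZE(ierr)
      end
```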
When I run the program, it crashes with a segmentation fault in MPI_WIN_FENCE if the argument is larger than roughly 8615. But only if a single (!) process is used; for any other number of processes, the run succeeds.
When I set FI_PROVIDER to tcp (previously unset), the behavior changes: the run hangs for arguments larger than 12, and for very large arguments the program crashes with "Fatal error in PMPI_Win_fence: Other MPI error".
(The dummy routine "donothing" is a substitute for "mpi_f_sync_reg", which does not exist in this version of Intel MPI.)
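The dummy routine is presumably just an externally compiled no-op, so that the compiler must assume the buffer passed to it may be read or modified; something along these lines (again a sketch, not the attached file):

```fortran
c     donothing.f -- separately compiled no-op; passing the window
c     buffer here keeps the compiler from caching it in registers
c     across the fence, the usual workaround when MPI_F_SYNC_REG
c     is unavailable.
      subroutine donothing(a)
      double precision a(*)
      return
      end
```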
I have tried to reproduce your issue with the given code, and the program ran smoothly even with input as large as 100000.
Could you please provide more details about your issue and the environment you are using?
Thank you very much! Could you tell me which versions you used?
The environment is
- Two Intel Xeon E5-2680 v3 Haswell CPUs per node
- 2 x 12 cores, 2.5 GHz
- Intel Hyperthreading Technology (Simultaneous Multithreading)
- AVX 2.0 ISA extension
Perhaps the problem is related to our installation or hardware instead.
First, I checked with IMPI 2019.6, which is the latest version available, and it seems to work fine.
But now I have checked with your version, IMPI 2019.3, and faced similar errors. I will raise this issue with the team concerned.
Meanwhile, could you please update your IMPI to the latest version and confirm that it works without any errors?
Could you also specify what the default libfabric provider is before you set FI_PROVIDER to tcp? You can check it by setting the environment variable I_MPI_DEBUG=5.
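For example (assuming the binary built above; the exact startup output differs between versions), the provider is printed in the MPI startup lines:

```sh
# hypothetical invocation; any small argument will do here
I_MPI_DEBUG=5 mpiexec -np 2 ./a.out 10
```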
Thank you. Indeed, with 2019.6 the program runs successfully.
So it seems to be a library bug. The libfabric information you asked for is:
 MPI startup(): libfabric version: 1.8.0a1
 MPI startup(): libfabric provider: verbs;ofi_rxm
Glad to hear that your code runs.
The issue might have been fixed in the newer version.
We are closing this thread now. Please raise a new thread if you face any further issues.