I am testing Intel MPI 4.1 with test.c (the provided test program).
Whenever I run > 2000 ranks the program executes correctly but fails to end gracefully.
mpiexec.hydra -n 2001 -genv I_MPI_FABRICS shm:ofa -f hostfile ./testc
It stalls at
Hello World: Rank 2000 running on host xxxx
##<stalls here; does not return to command prompt>
(If I use -n 2000 or less, it runs perfectly.)
I have tested 3000 ranks using OpenMPI, so it doesn't seem to be a cluster/network issue.
1. DAPL UD works with > 2000 ranks.
2. Attached is the output from I_MPI_FABRICS shm:ofa; it stalls after rank 0 receives a single message from each of ranks 1-2000.
3. The cluster has more than 2000 slots; for OpenMPI/OFA I use --map-by socket with no oversubscription, to force the ranks across all the nodes.
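On point 1: the runs that work with more than 2000 ranks use the connectionless DAPL UD path instead of shm:ofa. A sketch of that invocation, assuming the standard I_MPI_DAPL_UD knob from Intel MPI 4.1 (fabric and variable names as I understand them from the reference manual):

```shell
# Same launch as above, but over shm:dapl with UD enabled
# instead of shm:ofa; this completes cleanly at 2001 ranks.
mpiexec.hydra -n 2001 -genv I_MPI_FABRICS shm:dapl \
    -genv I_MPI_DAPL_UD enable -f hostfile ./testc
```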
I am using Mellanox OFED 2.2-1.0.1 on a mlx4 card.
The problem seems to be an MPI -> OFED interaction.