I am using Intel MPI 4.1.3 with two different process managers, MPD and Hydra, and I see very different behavior.
When I use MPD (mpiexec -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./wrf.exe), all 32 cores on each node run at 100%. When I use Hydra (mpirun -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./wrf.exe), only 26 cores run at 100% and the rest sit at about 0% CPU time, and performance drops by about a factor of two.
Can you run IMB under both MPD and Hydra and send the results? Also, please compare the performance of Hydra under 4.1 Update 3 with Version 5.0 Update 1 (the most recent).
Please run again with I_MPI_DEBUG=5. Just "IMB-MPI1 PingPong" is sufficient, no need for the other tests.
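For reference, here is a rough sketch of how that PingPong run could be scripted for both launchers. It assumes mpiexec is the MPD launcher and mpirun the Hydra one, as in the commands quoted above, that an MPD ring is already running, and that IMB-MPI1 is on your PATH. The function name run_pingpong and the log file names are just placeholders.

```shell
#!/bin/sh
# run_pingpong LAUNCHER: run "IMB-MPI1 PingPong" on 2 ranks under the
# given launcher with I_MPI_DEBUG=5 and save all output to a log file.
# Assumption: -genv is accepted by both the MPD and the Hydra launcher.
run_pingpong() {
    launcher=$1
    if ! command -v "$launcher" >/dev/null 2>&1; then
        # Launcher is not installed or not on PATH: report it and move on.
        echo "$launcher: not found, skipping"
        return 0
    fi
    # I_MPI_DEBUG=5 makes the library print startup details, including
    # the pinning it chose for each rank.
    "$launcher" -genv I_MPI_DEBUG 5 -n 2 IMB-MPI1 PingPong \
        > "pingpong_$launcher.log" 2>&1
}
```

Calling run_pingpong mpiexec and then run_pingpong mpirun should leave two logs that can be compared side by side or attached to a reply.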
Are you running under a job scheduler?
You should be able to use -hostfile for both MPD and Hydra. Best to keep this consistent.
Ok, this appears to be a pinning problem. We have improved the pinning since 4.1 Update 3, so updating to the current version could fix this. You can install it in your user folder to test without having to upgrade your entire cluster.
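If you want to confirm the pinning from the node side while the job is running, something like the following may help. It is only a sketch, assuming Linux compute nodes with /proc available and pgrep installed. The helper name show_affinity is made up for this example. It prints the list of cores the kernel allows each process to run on, so ranks that share the same cores stand out.

```shell
#!/bin/sh
# show_affinity PID...: print "PID allowed-cpu-list" for each given PID,
# read from the Cpus_allowed_list field of /proc/PID/status (Linux only).
show_affinity() {
    for pid in "$@"; do
        status="/proc/$pid/status"
        # Skip PIDs that have already exited or cannot be read.
        [ -r "$status" ] || continue
        awk -v pid="$pid" '/^Cpus_allowed_list:/ {print pid, $2}' "$status"
    done
}

# Example: run on one compute node while the job is active.
# Prints nothing if no wrf.exe process is found on this node.
show_affinity $(pgrep -x wrf.exe)
```

If the list is identical for several ranks under Hydra but distinct under MPD, that would be consistent with the 26-busy-cores symptom described above.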
If you need to stay on 4.1 Update 3, there are some steps we can try. First, try setting
If this doesn't work, try setting one of the following (the first is better, as it is less specific)