Hi,
I am using Intel MPI 4.1.3 with different process managers, MPD and Hydra, and I get very different behavior.
When I use MPD (mpiexec -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./wrf.exe), all 32 cores on each node run at 100%.
When I use Hydra (mpirun -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./wrf.exe), only 26 cores run at 100%, the rest sit at about 0% CPU time, and performance drops by about a factor of two.
SK
Can you run IMB under both MPD and Hydra and send the results? Also, please compare the performance of Hydra under 4.1 Update 3 with Version 5.0 Update 1 (the most recent).
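For example, something like the following, reusing your WRF launch options (the IMB-MPI1 path and counts here are only assumptions based on your command lines, adjust to your installation):
[plain]# under MPD
mpiexec -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./IMB-MPI1
# under Hydra
mpirun -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./IMB-MPI1[/plain]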
Hi James,
Thanks for replying.
I have run IMB with both MPD and Hydra; the results are in the attached txt files. I have also attached two pictures showing the CPU usage in both cases.
The update will take more time.
Regards,
SK
Please run again with I_MPI_DEBUG=5. Just "IMB-MPI1 PingPong" is sufficient, no need for the other tests.
Are you running under a job scheduler?
You should be able to use -hostfile for both MPD and Hydra. Best to keep this consistent.
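For example, something like this (the hostfile name is only a placeholder):
[plain]mpirun -hostfile ./hosts -perhost 32 -nolocal -n 384 -env I_MPI_DEBUG 5 ./IMB-MPI1 PingPong[/plain]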
Hi James,
>Are you running under a job scheduler? - No.
>You should be able to use -hostfile for both MPD and Hydra. Best to keep this consistent. - It doesn't work with MPD: invalid "local" arg: -hostfile. Maybe I should use -machinefile instead?
The results are attached.
Regards,
SK
Ok, this appears to be a pinning problem. We have improved the pinning since 4.1 Update 3, so updating to the current version could fix this. You can install it in your user folder to test without having to upgrade your entire cluster.
If you need to stay on 4.1 Update 3, there are some steps we can try. First, try setting:
[plain]I_MPI_PIN_MODE=lib[/plain]
If this doesn't work, try setting one of the following (the first is better, as it is less specific):
[plain]I_MPI_PIN_PROCESSOR_LIST=all
I_MPI_PIN_PROCESSOR_LIST=0-31[/plain]
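For example, to apply the first setting to your WRF run (same command line as before, just a sketch; -genv passes the variable to all ranks):
[plain]mpirun -genv I_MPI_PIN_MODE lib -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./wrf.exe[/plain]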