Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI performance/settings issue

kalen__stoi
Beginner

Hi,

I am using Intel MPI 4.1.3 with different process managers (MPD and Hydra) and get very different behavior.

When I use MPD (mpiexec -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./wrf.exe), all 32 cores run at 100% on each node.

When I use Hydra (mpirun -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./wrf.exe), only 26 cores run at 100%; the rest sit at about 0% CPU, and performance drops roughly by half.
What explains this, and how can I fix it?
 
Regards,

SK

6 Replies
James_T_Intel
Moderator

Can you run IMB under both MPD and Hydra and send the results?  Also, please compare the performance of Hydra under 4.1 Update 3 with Version 5.0 Update 1 (the most recent).
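For example, the runs could look something like this (a sketch only, reusing the launch options from your original post; the IMB-MPI1 binary ships with the Intel MPI Benchmarks and its path may differ on your system):

[plain]# under MPD
mpiexec -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./IMB-MPI1

# under Hydra
mpirun -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./IMB-MPI1[/plain]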

kalen__stoi
Beginner

Hi James,

Thanks for the reply.

I have run IMB with both MPD and Hydra; the results are in the attached txt files. I've also attached two pictures showing the CPU usage in both cases.

Doing the update will take more time.

Regards,

SK

James_T_Intel
Moderator

Please run again with I_MPI_DEBUG=5. Just "IMB-MPI1 PingPong" is sufficient; there is no need for the other tests.
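For example (a sketch; -env passes the variable the same way as in your earlier commands):

[plain]mpirun -perhost 32 -nolocal -n 384 -env I_MPI_DEBUG 5 -env I_MPI_FABRICS shm:dapl ./IMB-MPI1 PingPong[/plain]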

Are you running under a job scheduler?

You should be able to use -hostfile for both MPD and Hydra.  Best to keep this consistent.
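Something like the following should work (the hostfile name is hypothetical; list one node per line):

[plain]mpirun -hostfile ./hosts -perhost 32 -nolocal -n 384 -env I_MPI_FABRICS shm:dapl ./wrf.exe[/plain]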

kalen__stoi
Beginner

Hi James,

>Are you running under a job scheduler? - No.

>You should be able to use -hostfile for both MPD and Hydra.  Best to keep this consistent. - It doesn't work with MPD (invalid "local" arg: -hostfile). Maybe I should use -machinefile instead?

The results are attached.

Regards,

SK

 

James_T_Intel
Moderator

OK, this appears to be a pinning problem. We have improved process pinning since 4.1 Update 3, so updating to the current version could fix this. You can install it in your user folder to test it without having to upgrade your entire cluster.
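A sketch of how a user-folder test could look (the install path here is an assumption; adjust it to wherever the installer placed the new version):

[plain]# hypothetical user-local install location
source ~/intel/impi/intel64/bin/mpivars.sh
which mpirun    # should now resolve to the user-local copy[/plain]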

If you need to stay on 4.1 Update 3, there are some steps we can try.  First, try setting

[plain]I_MPI_PIN_MODE=lib[/plain]

If this doesn't work, try setting one of the following (the first is preferable, as it is less hardware-specific):

[plain]I_MPI_PIN_PROCESSOR_LIST=all
I_MPI_PIN_PROCESSOR_LIST=0-31[/plain]
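For example (a sketch; either variable can also be passed per run with -env, as in your original commands, instead of being exported):

[plain]mpirun -perhost 32 -nolocal -n 384 -env I_MPI_PIN_PROCESSOR_LIST 0-31 -env I_MPI_FABRICS shm:dapl ./wrf.exe[/plain]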

kalen__stoi
Beginner

The last one works :)

Thx!

SK
