Intel® MPI Library

mpd error

sudh
Beginner
691 Views

Hello,

I get the following error on my cluster when I submit jobs:

mpiexec_node050: cannot connect to local mpd (/tmp/mpd2.console_sudharshan); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)

I have seen this error discussed in earlier threads, but in my case it appears unpredictably. A job may run fine with one number of processors and then fail with this error when I resubmit it with a different number. I have also seen the error with the same number of processors, the same scripts, and the same code that previously ran fine, so it is not clear what conditions trigger it. Any suggestions or help would be sincerely appreciated.

2 Replies
Sangamesh_B_
Beginner
Quoting sudh
A job may run fine with one number of processors and then fail with this error when I resubmit it with a different number. I have also seen the error with the same number of processors, the same scripts, and the same code that previously ran fine, so it is not clear what conditions trigger it. Any suggestions or help would be sincerely appreciated.

Before you run the mpiexec command, does mpdtrace list all the nodes on which you want to run your job?

Are you using the -machinefile option in your mpiexec command?
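
One way to run that check from the job script is sketched below. It is only an illustration, not an official procedure: the function name missing_from_ring and the file names machines.txt and ring.txt are hypothetical, and it assumes that mpdtrace prints one host name per line and that the machine file lists one host per line, optionally followed by ":ncpus". Host names are compared literally, so short names and fully qualified names will not match each other.

```shell
# Print every host from a machine file that is absent from an MPD ring listing.
#   $1: machine file (one host per line, optional ":ncpus" suffix, "#" comments)
#   $2: file holding saved mpdtrace output (one host name per line)
missing_from_ring() {
    awk -v ringfile="$2" '
        BEGIN {
            # Load the ring hosts first; an empty or missing file means no ring.
            while ((getline line < ringfile) > 0) {
                split(line, f, " ")
                if (f[1] != "") ring[f[1]] = 1
            }
        }
        /^[ \t]*(#|$)/ { next }          # skip comments and blank lines
        {
            split($1, parts, ":")        # drop the optional ":ncpus" suffix
            if (!(parts[1] in ring)) print parts[1]
        }
    ' "$1"
}

# Hypothetical usage inside a job script, before calling mpiexec:
#   mpdtrace > ring.txt 2>/dev/null
#   missing_from_ring machines.txt ring.txt
```

If the function prints any host names, those nodes have no mpd in the ring, which would be consistent with the "cannot connect to local mpd" message when mpiexec is started on one of them.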
Dmitry_K_Intel2
Employee

Hi sudh,

Could you provide the command line you use and the library version?

Regards!

Dmitry
