Intel® MPI Library

mpdboot launches mpd via two different nodes: is this correct?

jperaltac
I run jobs using mpirun from Intel MPI 4.0, and sometimes the process goes down with the following message:

[04:34 PM]claudial@cystorm1:Ba10P6O24F2$ cat nodelist
n202
n204
n21
n22
n23
n24
n25
n26
[04:34 PM]claudial@cystorm1:Ba10P6O24F2$ cat Ba10.celldynamic-30.out
running mpdallexit on n204
LAUNCHED mpd on n204 via
RUNNING: mpd on n204
LAUNCHED mpd on n202 via n204
LAUNCHED mpd on n26 via n204
LAUNCHED mpd on n25 via n204
LAUNCHED mpd on n24 via n204
RUNNING: mpd on n202
LAUNCHED mpd on n23 via n202
LAUNCHED mpd on n22 via n202
LAUNCHED mpd on n21 via n202
mpdboot_n204 (handle_mpd_output 846): mpdboot: can not get anything from the mpd daemon; please check connection to n22

Is this a problem with an mpirun option?

Regards
Gergana_S_Intel

Hi JP,

Considering that mpirun launches the daemons successfully on nodes n204 and n202, it might be an issue with your connection to node n22. Can you verify the node is up and running (maybe via 'ping')? Also, make sure you can log into the node without being prompted for a password. For example, can you do:

ssh n22 hostname

from n204 (or any other node)?
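
To repeat that check for every host in the node list rather than one at a time, something like the following could be used. It is only a sketch, under a few assumptions: the list is a file called nodelist with one host per line (as in the question), and OpenSSH and a standard ping are installed. The function names check_node and check_all and the NODEFILE variable are made up for illustration and are not part of Intel MPI.

```shell
#!/bin/sh
# Sketch: check each host in a node list for reachability and passwordless SSH.
# check_node, check_all and NODEFILE are illustrative names, not Intel MPI tools.

check_node() {
    node=$1
    # Send a single ICMP echo request (-c 1) to see whether the host answers.
    if ! ping -c 1 "$node" >/dev/null 2>&1; then
        echo "$node: no ping reply"
        return 1
    fi
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # -n keeps ssh from swallowing the node list on stdin.
    if ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" hostname >/dev/null 2>&1; then
        echo "$node: ssh failed (down, blocked, or password required)"
        return 1
    fi
    echo "$node: ok"
}

check_all() {
    status=0
    while read -r node; do
        # Skip empty lines in the node file.
        [ -n "$node" ] || continue
        check_node "$node" || status=1
    done < "$1"
    return $status
}

# Assumption: the node file is called "nodelist" unless NODEFILE says otherwise.
NODEFILE=${NODEFILE:-nodelist}
if [ -f "$NODEFILE" ]; then
    check_all "$NODEFILE"
fi
```

Since the failure is intermittent, it may be worth running this from n204 and from n202 (the nodes that launch the other daemons in your output), and more than once. Any host not reported as "ok" is a candidate for the connection problem.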

Finally, make sure you don't have any security settings and/or firewalls preventing you from connecting to the other nodes on your cluster.

Let me know what happens or if you have questions.

Regards,
~Gergana
