I run my jobs with the mpirun from Intel MPI 4.0, and sometimes the job fails at startup with the message below:
[04:34 PM]claudial@cystorm1:Ba10P6O24F2$ cat nodelist
n202
n204
n21
n22
n23
n24
n25
n26
[04:34 PM]claudial@cystorm1:Ba10P6O24F2$ cat Ba10.celldynamic-30.out
running mpdallexit on n204
LAUNCHED mpd on n204 via
RUNNING: mpd on n204
LAUNCHED mpd on n202 via n204
LAUNCHED mpd on n26 via n204
LAUNCHED mpd on n25 via n204
LAUNCHED mpd on n24 via n204
RUNNING: mpd on n202
LAUNCHED mpd on n23 via n202
LAUNCHED mpd on n22 via n202
LAUNCHED mpd on n21 via n202
mpdboot_n204 (handle_mpd_output 846): mpdboot: can not get anything from the mpd daemon; please check connection to n22
Is this a problem with an mpirun option?
Regards
Hi JP,
Considering that mpirun launches the daemons successfully on nodes n204 and n202, it might be an issue with your connection to node n22. Can you verify the node is up and running (maybe via 'ping')? Also, make sure you can log into the node without being prompted for a password. For example, can you do:
ssh n22 hostname
from n204 (or any other node)?
Finally, make sure you don't have any security settings and/or firewalls preventing you from connecting to the other nodes on your cluster.
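If it helps, the ping and ssh checks above can be run over every host in your nodelist at once. This is only a rough sketch, not an Intel MPI tool: the function name check_nodes is made up for illustration, and it assumes a Linux ping (iputils, where -W is the reply timeout in seconds) and OpenSSH. BatchMode=yes makes ssh fail instead of prompting, so a node that still asks for a password shows up as a failure.

```shell
#!/bin/sh
# check_nodes FILE (hypothetical helper): for each host listed in FILE,
# one per line, report whether it answers ping and accepts a
# non-interactive ssh login.
check_nodes() {
    failed=0
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r node || [ -n "$node" ]; do
        # Skip blank lines.
        [ -n "$node" ] || continue
        if ! ping -c 1 -W 2 "$node" >/dev/null 2>&1; then
            echo "$node: no ping reply"
            failed=1
        # BatchMode=yes: fail rather than prompt for a password.
        # </dev/null keeps ssh from swallowing the rest of the node list.
        elif ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" hostname \
                </dev/null >/dev/null 2>&1; then
            echo "$node: ssh failed or asked for a password"
            failed=1
        else
            echo "$node: ok"
        fi
    done < "$1"
    # Returns 0 only if every node passed both checks.
    return "$failed"
}
```

After sourcing this on n204, you would call it as: check_nodes nodelist. Keep in mind that a node which blocks ICMP will be reported as not answering ping even when ssh to it works, so treat that message as a hint rather than proof.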
Let me know what happens or if you have questions.
Regards,
~Gergana