Hi, I get this error when I run the following script on a cluster:
--------------SCRIPT---------------------
#!/bin/bash
# Start mpd daemons on all compute nodes
echo "Shutting down any existing mpd daemon"
mpdallexit
echo "Starting MPI on all nodes"
mpdboot -r ssh -n 8 -f $HOME/mpd.hosts
echo "MPI was initialized on the following nodes:"
mpdtrace
-------------- ERROR --------------------
Shutting down any existing mpd daemon
mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root
probable cause: no mpd daemon on this machine
possible cause: unix socket /tmp/mpd2.console_root has been removed
mpdallexit (__init__ 1470): forked process failed; status=255
Starting MPI on all nodes
mpdboot_gdc-cluster (handle_mpd_output 883): Failed to establish a socket connection with compute-00-00:53317 : (111, 'Connection refused')
mpdboot_gdc-cluster (handle_mpd_output 900): failed to connect to mpd on compute-00-00
MPI was initialized on the following nodes:
mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root
probable cause: no mpd daemon on this machine
possible cause: unix socket /tmp/mpd2.console_root has been removed
mpdtrace (__init__ 1470): forked process failed; status=255
-----------------------------------------------
Does anyone know why this is happening? Thanks!
Could you please tell us which MPI library and version you are using?
Also, please confirm that you have set up password-less ssh connections between the nodes.
Try running: 'ssh compute-00-01 hostname'
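
To repeat that check for every node rather than just one, something like the sketch below might help. It is only an illustration: the helper names list_hosts and check_hosts are made up here, and it assumes mpd.hosts lists one host per line, optionally followed by ":ncpus", with '#' starting a comment. BatchMode=yes makes ssh fail instead of prompting for a password, so a host that still needs a password shows up as FAIL.

```shell
#!/bin/bash
# Sketch: check password-less ssh to every host listed in an mpd.hosts file.
# Assumption: one host per line, optional ":ncpus" suffix, '#' starts a comment.

# Print the bare host names from a hosts file (hypothetical helper).
list_hosts() {
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$1" | awk '{print $1}' | cut -d: -f1
}

# Try a non-interactive ssh to each host and report the result.
# Returns non-zero if at least one host could not be reached without a prompt.
check_hosts() {
    local hostfile=$1 host failed=0
    for host in $(list_hosts "$hostfile"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" hostname >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return $failed
}

# Usage (uncomment to run against your own hosts file):
# check_hosts "$HOME/mpd.hosts"
```

Any host reported as FAIL is worth testing by hand with a plain 'ssh <host> hostname' to see the actual error message.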
Regards!
Dmitry
