Intel® MPI Library

mpdboot error: Failed to establish a socket connection with node:53317 (111, 'Connection refused')

Minia_Oseguera
Beginner

Hi, I get this error when I run the following script on a cluster:

-------------- SCRIPT --------------------

#!/bin/bash

# Start mpd daemons on all compute nodes

echo "Shutting down any existing mpd daemon"

mpdallexit

echo "Starting MPI on all nodes"

mpdboot -r ssh -n 8 -f $HOME/mpd.hosts

echo "MPI was initialized on the following nodes:"

mpdtrace

-------------- ERROR --------------------

Shutting down any existing mpd daemon

mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root

probable cause: no mpd daemon on this machine

possible cause: unix socket /tmp/mpd2.console_root has been removed

mpdallexit (__init__ 1470): forked process failed; status=255

Starting MPI on all nodes

mpdboot_gdc-cluster (handle_mpd_output 883): Failed to establish a socket connection with compute-00-00:53317 : (111, 'Connection refused')

mpdboot_gdc-cluster (handle_mpd_output 900): failed to connect to mpd on compute-00-00

MPI was initialized on the following nodes:

mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root

probable cause: no mpd daemon on this machine

possible cause: unix socket /tmp/mpd2.console_root has been removed

mpdtrace (__init__ 1470): forked process failed; status=255

-----------------------------------------------

Does anyone know why this is happening? Thanks!

Dmitry_K_Intel2
Employee
Hi Minia,

Could you please tell us which MPI library and version you are using?
Please also confirm that you have set up password-less SSH connections between the nodes.
Try running: 'ssh compute-00-01 hostname'

Regards!
Dmitry
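
The check Dmitry suggests can be repeated for every node before calling mpdboot. The following is only a minimal sketch, not an Intel-provided tool. It assumes that each non-comment line of mpd.hosts starts with a host name, optionally followed by ":ncpus". The function names list_hosts and check_hosts and the SSH_CMD override are hypothetical, introduced here for illustration.

```shell
#!/bin/bash
# Sketch: verify password-less SSH to every host listed in an mpd.hosts file.
# Assumption: each non-comment line starts with a host name, optionally
# followed by ":ncpus" (for example "compute-00-00:8").

# Print the host names found in the given hosts file, one per line.
list_hosts() {
    # Strip comments, skip blank lines, keep the first field without ":ncpus".
    sed -e 's/#.*//' "$1" | awk 'NF { sub(/:.*/, "", $1); print $1 }'
}

# Try a non-interactive SSH login to each host and report OK or FAILED.
# SSH_CMD is a hypothetical hook for substituting another command when testing.
check_hosts() {
    local hosts_file=$1 host failed=0
    for host in $(list_hosts "$hosts_file"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" hostname >/dev/null 2>&1; then
            echo "OK      $host"
        else
            echo "FAILED  $host"
            failed=1
        fi
    done
    return $failed
}
```

After sourcing the file, running check_hosts "$HOME/mpd.hosts" prints one line per node. Any node reported as FAILED either still asks for a password or is unreachable over SSH, so mpdboot -r ssh cannot start a daemon on it.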