I've followed the instructions in the Intel MPI Getting Started documentation, but I'm having problems getting mpd running across my system. I've done the following:
1) Verified that no Python/mpd processes are running on the compute nodes
2) Started mpdboot from the head node: "mpdboot -d -v -n 20 -r ssh"
mpdboot fails with a connection error, and each time I run it, it fails on a different node:
debug: mpd on n14 on port 43729
mpdboot_n1 (handle_mpd_output 703): Failed to establish a socket connection with n14:43729 : (111, 'Connection refused')
mpdboot_n1 (handle_mpd_output 720): failed to connect to mpd on n14
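One way to see what mpdboot is running into is to attempt the same TCP connection by hand from the head node. Below is a minimal Python sketch; it is not part of Intel MPI, and the probe function name is hypothetical. On Linux, errno 111 is ECONNREFUSED, which usually means the host answered but nothing accepted a connection on that port, while errno 113 is EHOSTUNREACH ("No route to host"), which is commonly what an iptables REJECT rule with icmp-host-prohibited produces. A silent DROP rule tends to show up as a timeout instead.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection to host:port and classify the outcome.

    Hypothetical debugging helper; not part of Intel MPI or mpd.
    """
    try:
        # Resolves the name and performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all: often a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:  # 111 on Linux
            return "refused"
        if exc.errno == errno.EHOSTUNREACH:  # 113 on Linux
            return "unreachable"
        return f"error: {exc}"
```

For example, calling probe("n14", 43729) from the head node right after a failure checks the exact host and port printed in the debug line.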
When I ssh to n14, I do see mpd running:
n14:~ # ps -ef | grep python
root 7535 1 99 Jun09 ? 18:05:45 python /opt/intel/impi/3.1/bin/mpd.py -h icn4 -p 62021 --ifhn=172.18.1.14 --ncpus=1 --myhost=icn14 --myip=172.18.1.14 -e -d -s 20
After I clean up these mpd processes on the compute nodes and re-run mpdboot, I get a connection error on a different node.
Any ideas? By the way, I can connect to any of the nodes via SSH without problems.
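Before re-running mpdboot, it helps to confirm that no stale mpd daemons are left on a node. The following is a minimal, Linux-only Python sketch that assumes the standard /proc layout and lists processes whose command line includes mpd.py; the function name find_mpd_processes is hypothetical, and you would run it on each compute node (for example over ssh).

```python
import os
from pathlib import Path


def find_mpd_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, command line) for each process whose argv includes mpd.py.

    Hypothetical helper; Linux-only because it reads /proc/<pid>/cmdline.
    """
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited meanwhile, or is not readable
        # Arguments in cmdline are separated (and terminated) by NUL bytes.
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if any(os.path.basename(a) == "mpd.py" for a in args):
            found.append((int(entry.name), " ".join(args)))
    return sorted(found)
```

An empty list on every node is the state the first step in the original post is checking for.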
My mpdboot fails too. I get similar errors, but I believe the cause is that mpdboot tries to start mpd on a random port number, which always turns out to be blocked by the firewall. Cluster production environments are usually firewalled. Here is the error in my case:
mpdboot_blade13 (handle_mpd_output 730): Failed to establish a socket connection with blade11:36126 : (113, 'No route to host')
This kind of error could be prevented by specifying a safe port range, as MPICH2 allows. In mpich2-1.0.7, you specify the port range for mpd by setting the MPICH_PORT_RANGE variable to a range of ports that are open in iptables.
How can one achieve the same thing in the Intel MPI environment?
The usual way to set up for MPI communication is to ensure that you can ssh to each node without a password. This is needed for the initial installation of Intel MPI. After that, mpdboot should work with the -r ssh option.
Passwordless ssh works fine. The question is how to specify the port range that mpdboot should use, so that the mpd processes it starts don't bind to a port that is blocked by the firewall. Currently mpdboot uses an arbitrary random port number. Are there any environment variables I can set to fix this?
Case in point: MPICH2 currently uses MPICH_PORT_RANGE=port_number_0:port_number_N, and you then open those ports on your machines via iptables for MPICH2's mpdboot to use.
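To illustrate what a port range buys you: instead of asking the kernel for an arbitrary ephemeral port (binding to port 0), a daemon restricted to a range tries each port in turn and listens on the first free one, so only that range has to be opened in iptables. The Python sketch below assumes the low:high format of MPICH_PORT_RANGE described above. It is not mpd's actual code, the helper names are hypothetical, and it says nothing about whether Intel MPI's mpd honors that variable.

```python
import os
import socket


def parse_port_range(spec: str) -> tuple[int, int]:
    """Parse a "low:high" string in the MPICH_PORT_RANGE format."""
    low_s, high_s = spec.split(":")
    low, high = int(low_s), int(high_s)
    if not (1 <= low <= high <= 65535):
        raise ValueError(f"invalid port range: {spec!r}")
    return low, high


def listen_in_range(low: int, high: int, host: str = "") -> socket.socket:
    """Return a listening socket bound to the first free port in [low, high]."""
    for port in range(low, high + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # port in use (or not permitted); try the next one
            continue
        sock.listen()
        return sock
    raise OSError(f"no free port in range {low}-{high}")


def open_listener() -> socket.socket:
    """Use MPICH_PORT_RANGE if it is set; otherwise let the kernel pick a port."""
    spec = os.environ.get("MPICH_PORT_RANGE")
    if spec:
        return listen_in_range(*parse_port_range(spec))
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("", 0))  # port 0: the kernel chooses an ephemeral port
    sock.listen()
    return sock
```

The fallback branch is the behavior described in this thread: without a range, the listening port can be anything, and a firewall that only allows specific ports will reject it.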