Intel® MPI Library

Problems mpdboot

carlos_veralive_cl
Hello,

I am having the following problem when executing mpdboot:

$ mpdboot -n 2 -f /home/comsol/mpd.hosts -r ssh
mpdboot_cluster (handle_mpd_output 672): Failed to establish a socket connection with cl1n001:42406 : (111, 'Connection refused')
mpdboot_cluster (handle_mpd_output 689): failed to connect to mpd on cl1n001


I need MPI in order to run Comsol 3.5 in parallel.
Comsol is started in parallel as follows:
cluster comsol35/bin> ./comsol -nn 2 mpd boot -f /home/comsol/mpd.hosts


The error I get is:

mpdboot_cluster (handle_mpd_output 725): from mpd on cl1n001, invalid port info:
cl1n001: Connection refused

Information:
Operating System: SLES 10 sp2
Intel MPI version: 3.1

I really hope someone can help me.

Thank you.
TimP
Honored Contributor III
Does ssh connect to that node without a password, or does it refuse to connect? The cause can be as simple as stale entries in ~/.ssh/known_hosts or a disconnected or powered-off component.
Gergana_S_Intel
Employee
Hi Carlos,

The issue here is that mpdboot, when it tries to start the MPD daemons from the 'cluster' node, is unable to connect to the 'cl1n001' node.

As Tim mentioned, can you verify that passwordless SSH is set up on the cluster? That means you can ssh from cluster to cl1n001 without being prompted for a password. That's a requirement for the Intel MPI Library.
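
One way to test this without being fooled by an interactive prompt is to run ssh in batch mode, where it fails instead of asking for a password. The following is only a sketch: the function name check_passwordless_ssh and the SSH_BIN override are made up for illustration, and the 5-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: succeeds only if ssh can log in to the given host
# without prompting. SSH_BIN is an illustrative override, defaulting to ssh.
check_passwordless_ssh() {
    host=$1
    # BatchMode=yes makes ssh fail rather than ask for a password or passphrase.
    # ConnectTimeout=5 gives up after 5 seconds if the host does not answer.
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}
```

Run from the 'cluster' node, it could be used as: check_passwordless_ssh cl1n001 && echo "passwordless ssh OK". If it fails because of a changed host key, ssh-keygen -R cl1n001 removes the stale entry from ~/.ssh/known_hosts.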

Also, make sure that no old MPD daemons are running on the cluster. To do so, execute:

$ ps aux | grep mpd

If you see any 'mpd' python processes running under your account, kill them with kill -9 to free the port Intel MPI is trying to use (do this on both cluster and cl1n001).
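
If the grep output is noisy, a small filter can pull out just the PIDs. This is a rough sketch: mpd_pids is a hypothetical name, and it assumes that matching the string "mpd" anywhere in the command line is good enough, so the list should be reviewed before anything is killed.

```shell
# Hypothetical helper: reads "PID COMMAND..." lines on stdin and prints the
# PID of every line whose command mentions "mpd". Writing the pattern as
# [m]pd keeps this awk process itself from matching in the ps listing.
mpd_pids() {
    awk '/[m]pd/ { print $1 }'
}

# Usage: list your own processes, then keep only the mpd-related PIDs.
# ps -u "$(id -un)" -o pid= -o args= | mpd_pids
```

After checking the printed PIDs by hand, they can be passed to kill -9. The same check has to be repeated on cl1n001, for example from an ssh session on that node.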

Finally, this could be a case where Intel MPI tries to create the initial mpd logfile but can't. By default, the logfile is created in /tmp on the node. Can you verify that you have access to /tmp and can write to it, and check whether there is a file called /tmp/mpd2.logfile_?
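
A quick way to test write access is to create and remove a scratch file. This is a minimal sketch: the function name can_write_dir and the scratch file name mpd_probe are invented for illustration.

```shell
# Hypothetical helper: succeeds if a scratch file can be created in the
# given directory, and removes that file again afterwards.
can_write_dir() {
    dir=$1
    probe=$(mktemp "$dir/mpd_probe.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"
}

# Usage (run on both cluster and cl1n001):
# can_write_dir /tmp && echo "/tmp is writable"
# ls -l /tmp/mpd2.logfile_*   # glob, since the exact suffix is not given above
```

If the ls shows a logfile owned by another user or without write permission for you, that is worth reporting back here.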

Generally, I would also recommend upgrading to the latest Intel MPI Library 3.2 Update 1.

Regards,
~Gergana