Intel® MPI Library

Intel MPI 2.0 : unable to ping

martialp
Beginner
My application (using Intel MPI 2.0) works OK on a Linux workstation but fails on clusters with the following message:
"mpdboot_gtda127_0 (mpdboot 499): problem has been detected during mpd(boot) startup at 1 gtda128; output:
mpdboot_gtda128_1 (err_exit 526): mpd failed to start correctly on gtda128
reason: 1: unable to ping local mpd; invalid msg from mpd :{}:
mpdboot_gtda128_1 (err_exit 539): contents of mpd logfile in /tmp:
logfile for mpd with pid 23309
mpdboot_gtda127_0 (err_exit 526): mpd failed to start correctly on gtda127

All nodes seem to be accessible via the ping command and rsh. A cleanup of all the nodes has been done using mpdallexit. As this application failed for a remote customer, I do not have much information about his cluster.
Can you please explain what the causes of this error message can be? Thanks.
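For reference, the checks run so far look roughly like the following (node names taken from the error message above; the hosts file name is just an example):

    # reachability and remote shell access from the head node
    ping -c 1 gtda128
    rsh gtda128 hostname

    # shut down any leftover mpd ring before retrying
    mpdallexit
    mpdboot -n 2 -f mpd.hosts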
Gergana_S_Intel
Employee

Hi martialp,

The Intel MPI Library 2.0 is a very old version of the product. The first thing I would advise is to upgrade to the latest Intel MPI Library 3.2 Update 1 for Linux*. You can download the latest package from the Intel Registration Center.

Are you able to log into each node of the cluster without being prompted for a password? That's a requirement for the Intel MPI Library. Does this error always happen for node gtda128? If yes, can you verify the node is functioning correctly? Additionally, can you provide the /tmp/mpd2.logfile_ contents from gtda127 and gtda128? That might provide us with additional information on the error.
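A quick way to verify both points from the head node is something like this (assuming a hosts file named mpd.hosts with one node name per line):

    # each command should return immediately, without any password prompt
    for host in $(cat mpd.hosts); do rsh $host hostname; done

    # collect the mpd logs from the two nodes named in the error
    rsh gtda127 'cat /tmp/mpd2.logfile_*'
    rsh gtda128 'cat /tmp/mpd2.logfile_*'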

Finally, you can also try logging into gtda128 and running "ps aux | grep mpd" to make sure there are no existing mpd processes running. If there are, go ahead and "kill -9" them, and try again.
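For example, on gtda128:

    ps aux | grep '[m]pd'      # the [m] keeps grep from matching itself
    kill -9 <pid>              # repeat for every mpd PID listed above
    # or remove all of your leftover mpd processes in one go:
    pkill -9 -u $USER mpd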

Regards,
~Gergana

martialp
Beginner

Hello Gergana

Thanks a lot for your suggestions.
At last my customer found that the rsh used on these machines was the Kerberos one, not the one under /usr/bin. It works OK now.
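In case anyone else runs into this, the mismatch can be spotted and worked around roughly like this (mpd.hosts is just an example hosts file; the -r option tells mpdboot which remote shell to use):

    # check which rsh is found first in PATH
    which rsh                  # here it pointed at the Kerberos rsh

    # either move /usr/bin ahead of the Kerberos directory in PATH,
    # or point mpdboot at the plain rsh explicitly:
    mpdboot -n 2 -f mpd.hosts -r /usr/bin/rsh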

Regards
Martial
Gergana_S_Intel
Employee
Quoting - martialp
At last my customer found that the rsh used on these machines was the Kerberos one, not the one under /usr/bin. It works OK now.

Hi Martial,

I'm glad everything worked out and your customer is able to use the library successfully. As a final note, I still want to recommend upgrading to the latest version 3.2 Update 1 of the Intel MPI Library, if you or your customer is able to do it.

Let us know if you hit any further problems.

Regards,
~Gergana
