Intel® MPI Library

Unable to run IntelMPI on two different machines

martialp
Beginner
Hello

I don't know if this is the right forum, but I couldn't find a specific forum for IntelMPI.

Here is a summary of the problems my customer is currently facing:

Running the job on 4 CPUs on a single machine works fine. But when he tries to run the job on 2 CPUs on machineA and 2 CPUs on machineB, it fails on machineB. It is always the second machine that fails to run the job, whichever machine that is: reversing the machine order in the host.list file only moves the failure to the other machine. The message says: "You can't run mpdboot on machine 'name of the second machine'; version of python should be >= 2.4, current version is ' ' (empty)". The job is launched from the first machine listed in the host.list file, with the following command:

mpirun -f host.list -np $6 $IWRUN/bin/$NOMOS/csh_presti_ex -fl $input_data -output `pwd`/diagnostic -io_driver $_io_driver >> $IWETU/liste_presti 2>&1

Here $6 is the number of CPUs.

This is with IntelMPI 2.0 (which is, I know, obsolete). With IntelMPI 3.2 there is no error message, but no job starts on either machineA or machineB (even with the verbose and debug options). We asked him to set the environment variable I_MPI_DEBUG to 7 and we are waiting for the result of this test.

The problem seems to be related to a test in mpdboot.py involving the function getversionpython.
NB: The tests have been done using both rsh and ssh, with the same results.
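
An empty Python version in that message would be consistent with the remote shell returning nothing when asked for the interpreter version, although the thread does not confirm this. A minimal sketch for checking it by hand follows. The names REMOTE_SHELL, list_hosts, remote_python_version and check_all are hypothetical, the interpreter is assumed to be called "python" on every host, and host.list is assumed to hold one host per line with an optional ":ncpus" suffix and "#" comments.

```shell
#!/bin/sh
# Sketch only: print the Python version each host reports over a
# non-interactive remote shell. The host file layout (one host per line,
# optional ":ncpus" suffix, "#" comments) is an assumption.

# Remote shell command to use; set REMOTE_SHELL=rsh to test rsh instead.
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

# list_hosts FILE: print bare host names, skipping blanks and comments.
list_hosts() {
    sed -e 's/#.*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/:.*//' "$1" | grep -v '^$'
}

# remote_python_version HOST: print what "python -V" reports on HOST.
# Python 2 writes its version to stderr, hence the 2>&1 on the remote side.
remote_python_version() {
    "$REMOTE_SHELL" "$1" 'python -V 2>&1' </dev/null
}

# check_all FILE: print one line per host and flag an empty reply.
check_all() {
    for host in $(list_hosts "$1"); do
        version=$(remote_python_version "$host" 2>/dev/null) || version=""
        if [ -n "$version" ]; then
            echo "$host: $version"
        else
            echo "$host: no version reported"
        fi
    done
}
```

After sourcing the file on the launching machine, "check_all host.list" prints one line per host. A host that reports no version here, while an interactive login on the same host shows one, would point at the non-interactive shell environment (for example PATH) rather than at the Python installation itself.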

Thanks a lot for any suggestion.

Intel_Software_Netw1

I'm moving this to the correct forum. We are sorry for the delay.

Intel MPI Library is a fully supported product, so your customer can also seek help here: http://software.intel.com/en-us/articles/intel-mpi-library-support-resources/

==
Aubrey W.
Intel Software Network Support

Gergana_S_Intel
Employee

Hi Martial,

Starting with Intel MPI Library 3.2, all Python checks have been disabled at startup. I actually think this points to a possible configuration issue with the system. Because, as you point out, version 2.0 is no longer supported, I'd suggest you continue your experiments with 3.2.

Are the two nodes set up to use passwordless ssh? Meaning, can you log in from node1 to node2 (and vice versa) without being prompted for a password? If not, then the customer needs to set that up, as it's a requirement for Intel MPI. Do you get any other prompts (e.g. RSA authenticity host checking)?
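
One way to run that check without sitting through the prompts is sketched below. It relies on the OpenSSH options BatchMode and ConnectTimeout; the names SSH, can_login and report are hypothetical, and the 5-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: test whether each host accepts a login without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or
# passphrase; ConnectTimeout=5 is an arbitrary choice.

# ssh binary to use (overridable, e.g. to point at a specific build).
SSH=${SSH:-ssh}

# can_login HOST: succeeds only if a command runs on HOST with no prompt.
can_login() {
    "$SSH" -o BatchMode=yes -o ConnectTimeout=5 "$1" true \
        </dev/null >/dev/null 2>&1
}

# report HOST...: print one status line per host.
report() {
    for host in "$@"; do
        if can_login "$host"; then
            echo "$host: passwordless login OK"
        else
            echo "$host: login failed or would prompt"
        fi
    done
}
```

After sourcing the file, "report node2" on node1 covers one direction and "report node1" on node2 covers the other. A host whose key has not been accepted yet should, as far as I know, also show up as a failure here, since batch mode cannot answer the authenticity question.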

It would also be good to run "mpdboot -d -v -r ssh -n 2 -f host.list" and provide the output. This will give more verbose information on what's going on during startup.

Looking forward to hearing back from you.

Regards,
~Gergana

martialp
Beginner

Hi Gergana

Thanks a lot for your answer.
I think we are facing a strange configuration at our customer's site, because all nodes are set up to use passwordless ssh and even verbose mode gives no information, just as if mpdboot were not launched at all.
I hope we will have more information once we go to the site and check the complete installation. This thread can be closed now, as we are quite sure this is related to the client's installation.

Once again, thanks a lot.

Martial