Intel® MPI Library

mpirun failed to start when TMPDIR=. is set

linfa
Beginner
748 Views

Hi,

We found that when the environment variable TMPDIR is set to the current directory, no matter whether it is '.', './', or the full path, Intel MPI fails to run.

This happens with all Intel MPI versions (including 4.0).

[linfa@babbage testrun]$ setenv TMPDIR .
[linfa@babbage testrun]$ {/opt/intel/impi/3.2.0.011/bin64/mpirun} -n 2
mpdboot_babbage.tx.altair.com (handle_mpd_output 730): Failed to establish a socket connection with babbage:54848 : (111, 'Connection refused')
mpdboot_babbage.tx.altair.com (handle_mpd_output 747): failed to connect to mpd on babbage


Is this a bug? Is there any workaround? Thanks.

4 Replies
Gergana_S_Intel
Employee

Hey linfa,

I would actually recommend upgrading to our newest version: Intel MPI Library 4.0 Update 3. You can grab it from the Intel Registration Center. While I was able to reproduce this with the 4.0 release, I don't see this issue with the 4.0.3 release:

[user@host1:~]> export TMPDIR=.
[user@node1:~]> /opt/intel/impi/4.0.0.025/bin64/mpirun -n 2 hostname
mpdboot_node1 (handle_mpd_output 949): Failed to establish a socket connection with node1:33751 : (111, 'Connection refused')
mpdboot_node1 (handle_mpd_output 969): failed to connect to mpd on node1
[user@node1:~]> /opt/intel/impi/4.0.3/bin64/mpirun -n 2 hostname
node1
node1

We have a new default process manager in 4.0.3. I don't believe our old process manager (PM) supported shorthand path symbols such as '.'.

Give this a try and let us know how it goes.

Regards,
~Gergana

linfa
Beginner
Hi Gergana,
Thanks for your quick reply. I have several questions:
1) What is the "default process manager"? How is it related to this issue? Could you explain it a little more?
2) What should I update: the SDK for building executables, or the run-time library only?
3) It is not a problem for me to update, but it is more difficult to ask our customers to do it. So I am wondering if there is a workaround.
Thanks.
Linfa
Gergana_S_Intel
Employee

Hi Linfa,

1) A process manager is the part of the library that launches the MPI ranks, interacts with the job or batch schedulers, makes the physical connections between the nodes (e.g. via ssh), etc. It also parses your command and any environment variables you're using (like TMPDIR) to start the job.
In older versions of our library, we used the Multi-Purpose Daemons (MPDs) as the process manager. In version 4.0.3 and later, we use the Hydra process manager. Hydra has some advantages over the MPDs, as you can see here.

2) I recommend updating the full SDK; if you have a valid license, that is free and easy to do. If that's not possible, all of our 4.0.x packages are compatible with each other, so you can simply update the runtimes and be OK.

3) The only workaround would be to use the full path:

[user@node1:~]> export TMPDIR=/home/user
[user@node1:~]> /opt/intel/impi/4.0.0.025/bin64/mpirun -n 2 hostname
node1
node1
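
For customers who cannot upgrade, the same idea can be scripted in a wrapper so that a relative TMPDIR is turned into an absolute one before mpirun is called. This is only a rough sketch, assuming (as in the example above) that the old process manager accepts an absolute path. The function names abs_dir and normalize_tmpdir are made up for illustration and are not part of Intel MPI.

```shell
# Hypothetical helper: print the absolute, physical path of a directory.
abs_dir() {
    # Run cd in a subshell so the caller's working directory is unchanged.
    (cd -- "${1:-.}" 2>/dev/null && pwd -P)
}

# Hypothetical helper: rewrite TMPDIR only when it is set and relative.
normalize_tmpdir() {
    case "${TMPDIR:-}" in
        ""|/*) return 0 ;;   # unset, empty, or already absolute: leave alone
    esac
    # Resolve first; keep the old value if the directory does not exist.
    resolved=$(abs_dir "$TMPDIR") || return 1
    TMPDIR=$resolved
    export TMPDIR
}

normalize_tmpdir
# Then launch as usual, for example:
# mpirun -n 2 hostname
```

With TMPDIR=. the function replaces the value with the absolute path of the current directory; an already absolute value is left untouched.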

Is your customer just running your application? If yes, they can simply update the runtimes. Those are available as a free download from our website: www.intel.com/go/mpi.

Does that sound reasonable?

Regards,
~Gergana

linfa
Beginner
Thanks. That's what I wanted.