Hi,
We found that when the environment variable TMPDIR is set to the current directory, no matter whether it is '.', './', or the full path, Intel MPI fails to run.
This happens with all MPI versions (including 4.0).
[linfa@babbage testrun]$ setenv TMPDIR .
[linfa@babbage testrun]$ {/opt/intel/impi/3.2.0.011/bin64/mpirun} -n 2
mpdboot_babbage.tx.altair.com (handle_mpd_output 730): Failed to establish a socket connection with babbage:54848 : (111, 'Connection refused')
mpdboot_babbage.tx.altair.com (handle_mpd_output 747): failed to connect to mpd on babbage
Is this a bug? Is there a workaround? Thanks.
Hey linfa,
I would actually recommend upgrading to our newest version: Intel MPI Library 4.0 Update 3. You can grab it from the Intel Registration Center. While I was able to reproduce this with the 4.0 release, I don't see this issue with the 4.0.3 release:
[user@host1:~]> export TMPDIR=.
[user@node1:~]> /opt/intel/impi/4.0.0.025/bin64/mpirun -n 2 hostname
mpdboot_node1 (handle_mpd_output 949): Failed to establish a socket connection with node1:33751 : (111, 'Connection refused')
mpdboot_node1 (handle_mpd_output 969): failed to connect to mpd on node1
[user@node1:~]> /opt/intel/impi/4.0.3/bin64/mpirun -n 2 hostname
node1
node1
We have a new default process manager in 4.0.3. I don't believe our old process manager supported shorthand paths such as '.'.
Give this a try and let us know how it goes.
Regards,
~Gergana
Hi Linfa,
1) A process manager is the part of the library that launches the MPI ranks, interacts with the job or batch schedulers, makes the physical connections between the nodes (e.g. via ssh), etc. It also parses your command and any environment variables you're using (like TMPDIR) to start the job.
In older versions of our library, we used the Multi-Purpose Daemons (MPDs) as the process manager. In version 4.0.3 and later, we use the Hydra process manager. Hydra has some advantages over the MPDs, as you can see here.
2) I recommend updating the full SDK - if you have a valid license, that would be free and easy to do. But, if not possible, all of our 4.0.x packages are compatible with each other. So you can simply update the runtimes and be ok.
3) The only workaround would be to use the full path:
[user@node1:~]> export TMPDIR=/home/user
[user@node1:~]> /opt/intel/impi/4.0.0.025/bin64/mpirun -n 2 hostname
node1
node1
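If TMPDIR is set by a wrapper script or a user profile, the full-path workaround can be applied automatically before the launch. The following is only a minimal sketch, assuming a POSIX shell; `abs_dir` is a hypothetical helper name, not part of Intel MPI, and whether this helps with releases older than 4.0 is not confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute, symlink-resolved form of a
# directory path. Returns non-zero if the directory does not exist.
abs_dir() {
    # CDPATH is cleared so that cd does not search other directories
    # or print anything of its own.
    (CDPATH= cd -- "$1" 2>/dev/null && pwd -P)
}

# Rewrite TMPDIR only when it is set; otherwise leave the default alone.
if [ -n "${TMPDIR:-}" ]; then
    if resolved=$(abs_dir "$TMPDIR"); then
        TMPDIR=$resolved
        export TMPDIR
    else
        echo "TMPDIR does not name an existing directory: $TMPDIR" >&2
    fi
fi

# Then launch as usual, for example:
#   /opt/intel/impi/4.0.0.025/bin64/mpirun -n 2 hostname
```

With csh or tcsh (the `setenv` syntax used in the original report), the same idea would have to be written in that shell's syntax; the sketch above covers sh-compatible shells only.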
Is your customer just running your application? If yes, they can simply update the runtimes. Those are available as a free download from our website: www.intel.com/go/mpi.
Does that sound reasonable?
Regards,
~Gergana