Hi everyone,
I'm trying to run Intel MPI-3.2.1 on an SGI Altix Linux cluster under SGE-6.2. It fails with the following error:
cat output.32.Hello
/var/sge/default/spool/r1i0n12/active_jobs/32.1/pe_hostfile
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
mpdroot: cannot connect to local mpd at: /tmp/32.1.all.q/mpd2.console_root_r1i0n12
probable cause: no mpd daemon on this machine
possible cause: unix socket /tmp/32.1.all.q/mpd2.console_root_r1i0n12 has been removed
mpiexec_r1i0n12 (__init__ 1162): forked process failed; status=255
But if the job is submitted without SGE (i.e. from the command line), it works fine on the same set of nodes.
The MPI job is submitted with the mpiexec command, the mpds are already booted by root, and the user has MPD_USE_ROOT_MPD=1 in the .mpd.conf file in his home directory.
What could be the reason for failure here?
Thanks
1 Reply
Hi San,
It seems to me that SGE changes the TMPDIR environment variable inside the job (here to /tmp/32.1.all.q), so mpdroot looks for the console file there and cannot find it.
Could you set I_MPI_MPD_TMPDIR=/tmp before you create the mpd ring and give it a try? Don't forget to set this variable in the user's environment as well.
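As a rough sketch, something like the following at the top of the SGE job script would show what the job actually sees. The helper name mpd_console_path is hypothetical, and the console file naming (mpd2.console_root_<hostname>) is only inferred from the error message above, not taken from the Intel MPI documentation, so treat the path check as an assumption.

```shell
#!/bin/sh
# Sketch only: the console-file naming below is inferred from the error
# message in this thread, not from Intel MPI documentation.

# Hypothetical helper: build the path where mpdroot appears to look for
# the root mpd's console socket.
#   $1 = temporary directory, $2 = host name
mpd_console_path() {
    printf '%s/mpd2.console_root_%s\n' "$1" "$2"
}

# Inside an SGE job, TMPDIR usually points at a per-job directory
# (in this thread: /tmp/32.1.all.q).
echo "TMPDIR seen by the job: ${TMPDIR:-<unset>}"

# Point MPD back at the directory where root booted the ring.
I_MPI_MPD_TMPDIR=/tmp
export I_MPI_MPD_TMPDIR

# Assumption: the node name reported by uname matches the suffix mpd uses;
# it may differ if one side uses a fully qualified name.
console=$(mpd_console_path "$I_MPI_MPD_TMPDIR" "$(uname -n)")
if [ -S "$console" ]; then
    echo "console socket found: $console"
else
    echo "no console socket at: $console"
fi
```

If the socket is reported as found, the usual mpiexec line can follow in the same script; if not, the ring was probably booted with a different temporary directory or is not running on that node.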
Please let me know if it doesn't help.
Regards!
Dmitry
