Sangamesh_B_

mpiexec fails under SGE

Hi everyone,

I'm trying to run Intel MPI 3.2.1 on an SGI Altix Linux cluster under SGE 6.2. It fails with the following error:

cat output.32.Hello
/var/sge/default/spool/r1i0n12/active_jobs/32.1/pe_hostfile
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
r1i0n12
mpdroot: cannot connect to local mpd at: /tmp/32.1.all.q/mpd2.console_root_r1i0n12
probable cause: no mpd daemon on this machine
possible cause: unix socket /tmp/32.1.all.q/mpd2.console_root_r1i0n12 has been removed
mpiexec_r1i0n12 (__init__ 1162): forked process failed; status=255

However, if the job is submitted without SGE (i.e., directly from the command line), it works well on the same set of nodes.

The MPI job is launched with the mpiexec command. The mpd daemons have already been booted by root, and the user has MPD_USE_ROOT_MPD=1 in the .mpd.conf file in his home directory.

What could be the reason for failure here?

Thanks
Dmitry_K_Intel2
Employee

Hi San,

It seems to me that SGE changes the TMPDIR environment variable, and after that mpdroot cannot find the console file.
Could you set I_MPI_MPD_TMPDIR=/tmp before you create the mpd ring and give it a try? Don't forget to set this variable for the user as well.
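
For illustration, the user-side part of this could look roughly like the fragment below, placed in the SGE job script before the mpiexec call. This is only a sketch: it assumes the root-started mpd created its console socket directly in /tmp, and the socket name pattern (mpd2.console_root_ followed by the node name) is inferred from the error output in the question. The program name in the commented-out launch line is hypothetical.

```shell
#!/bin/sh
# Sketch of an SGE job script fragment; assumptions are noted inline.

# Show what the scheduler set. In the failing job, mpdroot looked under
# /tmp/32.1.all.q, which suggests TMPDIR pointed at a per-job directory.
echo "TMPDIR set by scheduler: ${TMPDIR:-unset}"

# Point the MPD tools at /tmp, where the root-started mpd is assumed
# to have created its console socket.
export I_MPI_MPD_TMPDIR=/tmp

# Check for the console socket before launching. The name pattern is
# inferred from the error message; the node name may differ if the
# daemon used a fully qualified host name.
console="${I_MPI_MPD_TMPDIR}/mpd2.console_root_$(uname -n)"
if [ -S "$console" ]; then
    echo "found mpd console socket: $console"
else
    echo "no mpd console socket at: $console" >&2
fi

# mpiexec -n 8 ./hello   # hypothetical program name; enable inside a real job
```

If the check reports no socket, the ring was probably started with a different temporary directory, so the same variable would also need to be set in root's environment before the mpd ring is created.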

Please let me know if it doesn't help.

Regards!
Dmitry