
running Intel MPI 4.0 with SLURM

Avi_Purkayastha
Beginner
Hi,
I have just installed Intel MPI (v4.0.0.027) on a Nehalem/InfiniBand-based cluster that uses the SLURM resource manager. All of the compilers and MPI stacks, including Intel MPI, are installed via modules. After I load the Intel MPI module, build the application, and try to run it from a SLURM batch file, the program crashes because the Intel MPI runtime does not pick up all of the SLURM environment variables. I get the message:
<---
mpiexec_rm1867: cannot connect to local mpd (/tmp/mpd2.console_apurkay); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
:
--->
It does not get the information about the compute nodes and instead tries to run on the login node, which is not where the job should run, and hence it fails.
I assume, then, that the SLURM environment variables corresponding to the mpd.hosts file were not picked up by Intel MPI? If so, what runtime parameters or environment variables do I need to pass or define in the SLURM batch script?
BTW, the default setup with OMPI/OpenFabrics and SLURM works fine.
Thanks for any help.
-- Avi

Dmitry_K_Intel2
Employee
Hi Avi,

The Intel MPI Library recognizes SLURM_JOBID, SLURM_NNODES, SLURM_NODELIST, and some other environment variables, but you need to use mpirun to start your application. Only in that case will the mpd ring be created.

You probably also need to add the '-nolocal' option, because the node on which you start your application is otherwise added to the ring automatically.
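For example, a minimal batch script along these lines (a sketch only - the module name and the ./a.out executable are just the ones used elsewhere in this thread, and the node/process counts are placeholders):
<---
#!/bin/bash
#SBATCH --time=00:10:00        # walltime
#SBATCH -N 2                   # number of nodes
#SBATCH -n 2                   # number of MPI processes
#SBATCH -o intel-mpi-%j.log

# Load the Intel MPI environment inside the batch script so mpirun
# and the mpd daemons come from the same installation.
module load intelMPI/4.0.0.027

# mpirun reads the SLURM_* variables, boots the mpd ring on the
# allocated nodes, runs the job, and tears the ring down again.
# '-nolocal' keeps the launch node itself out of the ring.
mpirun -nolocal -np 2 ./a.out
--->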

Regards!
Dmitry

Avi_Purkayastha
Beginner
Hi Dmitry,
thanks for the pointers.
I verified that SLURM_NODELIST is recognized by echoing that variable inside the batch script.
However, when I run mpirun inside the SLURM batch script with the following options:
mpirun -np 2 -nolocal ./a.out
I get the following errors:
<--
Node list is rm[1203-1204]
mpdboot_rm1203 (handle_mpd_output 905): from mpd on rm1204, invalid port info:
connect to address 10.1.4.180 port 544: Connection refused
connect to address 10.1.4.180 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
rm1204: Connection refused
-->
What am I missing? I can print out other relevant SLURM variables if that would provide additional clues.
Thanks
-- Avi

Dmitry_K_Intel2
Employee
Hi Avi,

Are you using rsh or ssh connections between the nodes?
If you are using ssh, you need to provide the '-r ssh' option.
Please make sure that the connection is passwordless and that you can log in both from rm1203 to rm1204 and vice versa.
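For example, a quick check along these lines, run from rm1203 (node names taken from the error above):
<---
# Each command should print a hostname without prompting for a
# password or passphrase.
ssh rm1204 hostname
ssh rm1204 ssh rm1203 hostname
--->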

Regards!
Dmitry

reuti_at_intel
Hi,

Passwordless (better: passphraseless) ssh needs to be set up for each user, and users might be tempted to copy these keypairs to other systems to ease their login. It's advantageous to use host-based ssh login instead:

http://gridengine.sunsource.net/howto/hostbased-ssh.html

-- Reuti

Avi_Purkayastha
Beginner
Hi,
We are using passphraseless ssh and have used it with our other MPI installs under SLURM. However, when I tried adding '-r ssh' as an mpirun option with Intel MPI, I kept getting the same error message:
<---
Node list is rr[108,129]
mpdboot_rr108 (handle_mpd_output 905): from mpd on rr129, invalid port info:
connect to address 10.1.0.129 port 544: Connection refused
connect to address 10.1.0.129 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
:
-->
I also looked at 'mpirun --help' and did not see '-r ssh' listed as an option. In fact, there were no options listed for using ssh at all.
Thanks
-- Avi

TimP
Honored Contributor III
mpirun -help
......
--rsh specifies the name of the command used to start remote mpds; it
defaults to rsh; an alternative is ssh
--shell says that the Bourne shell is your default for rsh
--verbose shows the ssh attempts as they occur; it does not provide
confirmation that the sshs were successful

mpirun -version
Intel MPI Library for Linux Version 4.0
Build 20100422 Platform Intel 64 64-bit applications

As Dmitry said, you must try stand-alone ssh in both directions among the offending nodes, from the relevant account, to guard against problems such as ~/.ssh/known_hosts containing stale information.

mpdboot (with the appropriate node list), followed by mpdtrace and mpdallexit, can be used as a one-time check for this problem without actually waiting for a chance to run the entire application.
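For example, something along these lines inside the batch allocation (a sketch only - building the host file with 'srun hostname' is one possible approach; any file listing the allocated nodes one per line will do):
<---
# Write the allocated node names to a file, one per line.
srun hostname -s | sort -u > mpd.hosts

# Boot an mpd ring on the two nodes over ssh, list its members,
# then shut it down again.
mpdboot -n 2 -f mpd.hosts -r ssh -v
mpdtrace
mpdallexit
--->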

Avi_Purkayastha
Beginner
I had set N=n=2 in the SLURM script so that one process on each node would communicate with the other. The node list reported was rm[1562-1563]. When I ran
% mpdboot -n 2 -v -r ssh
I got back:
totalnum=2 numhosts=1
there are not enough hosts on which to start all processes
mpdtrace
mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_apurkay); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
This may be why mpirun is not working: mpd is not running, given that the compute node from which the job is launched is both the launch host and a compute node at the same time.
Any suggestions for a fix?
Thanks
-- Avi

Dmitry_K_Intel2
Employee
Avi,

By default, mpdboot looks for mpd.hosts in the current directory to get information about the nodes. Mpdboot does not recognize the SLURM settings!
If you don't have an mpd.hosts file, use '-f hosts_file.txt'.
In your case it might look like: 'mpdboot -f $SLURM_NODELIST -n 2 -r ssh'

Regards!
Dmitry

Avi_Purkayastha
Beginner
Hi Dmitry,
I made the change you suggested in the script file:
% mpdboot -f $SLURM_NODELIST -n 2 -v -r ssh
% mpdtrace
Unfortunately, the result is the same as before:
:
mpdboot hostlists
totalnum=2 numhosts=1
there are not enough hosts on which to start all processes
mpdtrace
mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_apurkaya); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
So it appears the key to resolving this is getting mpdboot to recognize the mpd.hosts file (or its equivalent), which is still not happening.
Cheers
-- Avi

Dmitry_K_Intel2
Employee
Avi,

Could you print out $SLURM_NODELIST? (echo $SLURM_NODELIST)

Regards!
Dmitry

Dmitry_K_Intel2
Employee
Avi,

Just one thought:
Do you run your commands after salloc? Or do you perhaps use the sbatch or srun commands?

Could you provide details of all the commands used to start the application?

Regards!
Dmitry

Avi_Purkayastha
Beginner
I have actually been doing that; I may not have pasted it into my messages in this thread. But here is a snippet from the batch script and the latest output file.
In the batch script, I have ..
echo "Node list is" $SLURM_NODELIST
In the output file, I get..
Node list is rr[10,72]
-- Avi

Avi_Purkayastha
Beginner
Dmitry,
We use a SLURM batch script with sbatch, srun, and mpirun. We do not use salloc, although there is nothing preventing us from using it.
Here is a simple SLURM batch script that we have used. We submit the job with something like "% sbatch intel.batch", where intel.batch looks something like this:
<---
#!/bin/bash
#SBATCH --time=01:00:00 # WALLTIME
#SBATCH -N 2 # Number of nodes
#SBATCH -n 2 # Number of cores/processors
#SBATCH -o intel-mpi-%j.log
#SBATCH -p pbatch
#SBATCH --job-name intel-mpi-test # job name
echo "Node list is" $SLURM_NODELIST
### testing mpd ###
### module load intelMPI/4.0.0.027
echo "mpdboot hostlists"
mpdboot -f $SLURM_NODELIST -n 2 -v -r ssh
echo "mpdtrace"
mpdtrace
echo "mpdallexit"
mpdallexit
########
mpirun -np 2 -nolocal --ssh ./a.out
-->
Cheers
-- Avi

Dmitry_K_Intel2
Employee
Avi,

and what is the output when you run "% sbatch intel.batch"?

Regards!
Dmitry

Avi_Purkayastha
Beginner
When the job is submitted, there is a normal response with a job ID coming back:
rrlogin1<8>sbatch intel-rr.batch
Submitted batch job 3852
-- Avi

Dmitry_K_Intel2
Employee
Avi,

In your script intel-rr.batch, please remove "mpdboot -f $SLURM_NODELIST -n 2 -v -r ssh" - you cannot use 'mpdboot' here! You need to use mpirun instead.
Change your mpirun command line to:
mpirun -r ssh -nolocal -np 2 ./a.out
Check the log after "sbatch intel-rr.batch" - the node-list format ('rr[10,72]') should be parsed correctly.

Please provide the output if the problem persists.

Regards!
Dmitry

Avi_Purkayastha
Beginner
Dmitry,
I had originally tried with just mpirun, to no avail. So here is the complete run script and output:
<--- run script
#!/bin/bash
#SBATCH --time=10:00 # WALLTIME
#SBATCH -N 2 # Number of nodes
#SBATCH -n 2 # Number of cores/processors
#SBATCH --job-name intel-mpi_test # Name of job
##SBATCH -p inter # see "Queues" section for details
#SBATCH -o intel-mpi-rr.out.%j
echo "Node list is" $SLURM_NODELIST
cd /home/apurkaya/apps/OMB-3.1.1/intel-mpi/tests/2-node
mpirun -r ssh -nolocal -np 2 ./osu_bw
--->
<--- output
Node list is rr[76-77]
mpdboot_rr76 (handle_mpd_output 846): mpdboot: can not get anything from the mpd daemon; please check connection to rr77
-->

Dmitry_K_Intel2
Employee
Avi,

mpirun was able to parse the host names correctly.
Please check that you are able to log in to node rr77 from rr76 and vice versa without entering a password:
ssh rr77
(from rr77) ssh rr76

Regards!
Dmitry

Andrey_D_Intel
Employee
Avi,

Could you try the following scenario?
I assume that you have a 'hello' application; it can be any simple MPI test.

1. Create a test.sh file. For instance,
$ cat test.sh

#!/bin/bash

srun hostname -s | sort -u >mpd.hosts
source /opt/mpi-4.0.026/bin64/mpivars.sh

# Example 1
# Launch application using hydra process manager
mpiexec.hydra -f mpd.hosts -n $SLURM_NPROCS -env I_MPI_DEBUG 5 ./hello

# Example 2
# Launch application using MPD process manager
mpdboot -n $SLURM_NNODES -r ssh
mpiexec -n $SLURM_NPROCS -env I_MPI_DEBUG 5 ./hello
mpdallexit

2. Submit the job using the sbatch command. For instance,
$ sbatch -n 4 test.sh

Please let me know whether the suggestion helps.

Best regards,
Andrey
