Intel® MPI Library

mpdboot for a cluster

homng
Beginner
Dear all,
I have following lines in my job script:
# Number of cores:
#SBATCH --nodes=8 --ntasks-per-node=8
## Set up job environment
source /site/bin/jobsetup
#startmpd:
mpdboot -n 64
## Run the program:
../bin/sem3dcaf ../input/test_nproc64_sf2.psem
I am trying to run a coarray program with 64 cores (8 cores per node), but I could not correctly start mpd. I am new to the Intel Cluster Toolkit. I would be grateful for any suggestions.
Thanks.
Dmitry_K_Intel2
Employee
Hi homng,

It would be a good idea to read the Getting Started guide in the Intel MPI Library documentation.

mpdboot creates an mpd ring, and '-n' specifies how many nodes will be used. With 'mpdboot -n 12', the mpd ring will be created across 12 nodes, regardless of how many cores each node has.

So, you need to change your script:
#startmpd:
mpdboot -n 8

To successfully create an mpd ring you need to have a passwordless ssh connection between the nodes. You can check it by:
$ ssh node_one
From node_one:
$ ssh node_two
and so on.
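
If passwordless ssh is not set up yet, here is a minimal sketch of the usual approach (assuming OpenSSH and a home directory shared across the nodes, so one authorized_keys file covers them all):
# Generate a key pair; press Enter at the passphrase prompts to leave it empty:
ssh-keygen -t rsa
# Authorize the new public key for logins to any node sharing this home directory:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys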

Usually mpdboot gets the list of nodes from an mpd.hosts file located in the current directory, but you can use the '-f' option to point to a different host file.

mpdtrace shows you a list of nodes in the ring.
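
Under a scheduler such as SLURM you do not know the allocated nodes in advance, but you can generate the host file inside the job script from the allocation. A rough sketch, assuming the scontrol command and the SLURM_JOB_NODELIST and SLURM_NNODES variables are available in the job environment:
# Expand the allocated node list into one hostname per line:
scontrol show hostnames "$SLURM_JOB_NODELIST" > mpd.hosts
# Start one mpd per allocated node using that host file:
mpdboot -n "$SLURM_NNODES" -f mpd.hosts
# Check that every node joined the ring:
mpdtrace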

I have assumed that the environment is set up properly.

Regards!
Dmitry

homng
Beginner
Thank you for the suggestion. I went through the documentation but could not really figure out how to start mpd from a job script. For example, I tried the following job script:

#SBATCH --nodes=2 --ntasks-per-node=8
## Set up job environment
source /site/bin/jobsetup
#startmpd:
mpdboot -n 2
But I get the following errors:
totalnum=2 numhosts=1
there are not enough hosts on which to start all processes
A possible cause would be that the other node is not visible, but I confirmed that I can ssh to all the nodes without a password. I can run an ordinary MPI program without problems using "mpirun -n 16 test" in the job script. But of course I cannot define the mpd.hosts file, because in a big cluster I don't know in advance which nodes will be allocated to my job. I think I am missing some key point!
Your help is greatly appreciated.
Thanks
homng
Beginner
Finally, I think I solved the problem. Thanks.
Dmitry_K_Intel2
Employee
If you run your application under any job scheduler, you need to use 'mpirun -n #_of_processes program', because mpirun understands the scheduler's settings by reading the appropriate environment variables.
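
Applied to the original job, the script might look like this sketch (the executable and input paths are taken from the first post; the process count should match --nodes times --ntasks-per-node):
#SBATCH --nodes=8 --ntasks-per-node=8
## Set up job environment
source /site/bin/jobsetup
## Run the program: 8 nodes x 8 tasks per node = 64 processes
mpirun -n 64 ../bin/sem3dcaf ../input/test_nproc64_sf2.psem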

In the case of coarrays, it seems to me that the compiler creates a config file which is used by the program itself, so you don't need to create an mpd ring. I believe the people on the Fortran forum can provide more information about invoking coarray programs.
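
Just as a pointer (please check the Intel Fortran documentation; this is only a sketch, and the source file name sem3dcaf.f90 is illustrative): distributed-memory coarray programs are normally compiled with the -coarray=distributed option, and launch settings such as the number of images come from a configuration file named with -coarray-config-file, for example:
# Build a distributed-memory coarray executable (options from the ifort documentation):
ifort -coarray=distributed -coarray-config-file=cafconfig.txt -o sem3dcaf sem3dcaf.f90
# cafconfig.txt holds mpiexec-style options, e.g.:
#   -n 64 ./sem3dcaf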

Regards!
Dmitry