Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

How to correctly use mpitune to optimize MPI on our cluster?

Guillaume_De_Nayer
Hi,

we have a little cluster with:
- a master (2*4 cores)
- 8 nodes (2*6 cores per node)
- infiniband (DDR connectX)
- we are using the Intel Cluster Toolkit; I_MPI_FABRICS is set to "shm:ofa".

I'm trying to optimize the use of the MPI library with "mpitune", and I have a few questions...

- Does "mpitune" work under a batch system like Torque without special tricks? At the moment I have to define a --ppn-range to limit mpitune. If I'm starting a job with "nodes=1:ppn=4", I have to limit mpitune with "--ppn-range 1:4". Is there a better way?

- If I run "mpitune" on the full cluster, mpitune generates a lot of files (for example mpiexec_shm-ofa_nn_1_np_2_ppn_2.conf), one for each configuration. What should I do with these files? Can I just copy them into the etc directory of Intel MPI so that "mpiexec -tune" will use the correct file? For example, if I'm running a job with "nodes=2:ppn=4", will mpiexec select the right tuning conf file automatically?

Thx a lot!
Best regards,
Guillaume


Dmitry_K_Intel2
Employee
Hi Guillaume,

Which Intel MPI version do you use?

If your application requires only 4 cores, then the "--ppn-range 1:4" option is correct.

Yes, you can move all the conf files into the etc directory. In that case, when running 'mpiexec -tune' you don't need to provide the full path to the config file - it will be selected automatically.
Alternatively, you can use the exact name of the config file. When starting mpiexec with -ppn 4, you just need to choose the conf file with ppn_4 in its name (together with the -tune option).
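[Editor's sketch of the two usages described above. The conf file name is built from the nn/np/ppn naming scheme mentioned earlier in the thread; the application name, process counts and $I_MPI_ROOT path are illustrative assumptions, not from the posts.]

```shell
# mpitune's conf file names encode the fabric, number of nodes (nn),
# total processes (np) and processes per node (ppn).
# For a nodes=2:ppn=4 job the matching file would be named like this
# (illustrative values):
FABRIC=shm-ofa; NN=2; PPN=4; NP=$((NN * PPN))
CONF="mpiexec_${FABRIC}_nn_${NN}_np_${NP}_ppn_${PPN}.conf"
echo "$CONF"

# Option 1: copy the conf files into Intel MPI's etc directory, after which a
# plain "mpiexec -tune" selects the matching one automatically (not run here):
#   cp mpiexec_*.conf "$I_MPI_ROOT/etc/"
#   mpiexec -tune -ppn 4 -n 8 ./my_app
#
# Option 2: pass the exact file to -tune:
#   mpiexec -tune "./$CONF" -ppn 4 -n 8 ./my_app
```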

Regards!
Dmitry

Guillaume_De_Nayer
Hi,

I'm using the Intel Cluster Toolkit version 4.0.0.020 with Intel MPI version 4.0.0.028.

The problem with mpitune, started under Torque, is this:
if I'm using this batch file:
#!/bin/bash
#PBS -N mpi_tuning_2x4
#PBS -j oe
#PBS -l nodes=2:ppn=4
export TERM=xterm
NUMPROCS=`wc -l < $PBS_NODEFILE`
cat $PBS_NODEFILE > $PBS_O_WORKDIR/pbs_machines
echo $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
mpitune -hf ./pbs_machines

mpitune starts on the 2 nodes using all the available cores on these 2 nodes (so 2x12 and not 2x4). So my question is: is there an option (other than --ppn-range) that makes mpitune start only on the cores that I requested in the batch file?

Naturally I could modify my batch file for each configuration with a different value of --ppn-range, but I'm lazy :)


OK, so mpitune and mpiexec are "intelligent" - great news! So "mpirun -tune" will work too, right?

Best regards!
Guillaume
Dmitry_K_Intel2
Employee
Guillaume,

Unfortunately, mpitune doesn't understand the scheduler's settings. Providing just '-hf ./pbs_machines', mpitune will run tuning for 1 and 2 nodes and for all available PPNs (from 1 to 12 in your case).
If you want to run mpitune for only 2 nodes and ppn=4, you need to use the following options:

mpitune -hf ./pbs_machines -fl shm:ofa -dl -pr 4 -hr 2

where:
fl - fabric list; as I understood, you need only one fabric in your case
dl - device list; it should be empty in your case
pr - number of processes
hr - number of nodes (hosts range)
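[Editor's sketch: since mpitune cannot read the scheduler's settings, one way to avoid hard-coding the ranges for every job is to derive them from the nodefile that Torque writes, as in the batch script above. The nodefile contents below are a fabricated stand-in for $PBS_NODEFILE; the command is printed rather than executed.]

```shell
# Build a sample nodefile the way torque would for nodes=2:ppn=4
# (one line per granted slot; a real job would use a copy of $PBS_NODEFILE).
printf 'node1\nnode1\nnode1\nnode1\nnode2\nnode2\nnode2\nnode2\n' > pbs_machines

NP=$(wc -l < pbs_machines)             # total slots granted: 8
NODES=$(sort -u pbs_machines | wc -l)  # distinct hosts: 2
PPN=$((NP / NODES))                    # slots per host: 4

# Print the tuning command instead of running it here:
echo mpitune -hf ./pbs_machines -fl shm:ofa -dl -pr "$PPN" -hr "$NODES"
```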

mpirun just passes the -tune option through to mpiexec, so mpirun will work fine as well.

BTW: you can update the Intel MPI Library to version 4.0.1 - the library itself is a bit faster and mpitune is a bit better.

Regards!
Dmitry
Guillaume_De_Nayer
Great! Thx a lot Dmitry! mpitune is "burning" our cluster now!

Have a nice day,
Guillaume
Dmitry_K_Intel2
Employee
Guillaume,

Actually, it was not clear why you need to run a task on 2 nodes with 4 cores each rather than on 1 node with 8 cores...

Maybe it's not clear from the documentation, but there are 2 modes of tuning: cluster-specific and application-specific.
Cluster-specific tuning tries to find optimal settings for your cluster depending on the workload, and it creates different *.conf files for different numbers of nodes and PPNs.
Application-specific tuning is worth running if you want optimal settings for one application only, with a specific configuration. In this case the *.conf files should not be moved into the etc directory.
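[Editor's sketch of what an application-specific run could look like. The --application and --output-file option names, the application name and the process counts are assumptions from my reading of the tuner's documentation, not from this thread - check `mpitune --help` on your version before relying on them.]

```shell
# Application-specific tuning: tune for one exact command line and write the
# result to a named conf file, kept next to the app (NOT in the etc directory).
APP_CMD='mpiexec -n 8 -ppn 4 ./my_app'   # hypothetical application launch line
TUNE_CMD="mpitune --application \"$APP_CMD\" --output-file ./my_app.conf"
echo "$TUNE_CMD"

# Later runs then point -tune at that file explicitly (not run here):
#   mpiexec -tune ./my_app.conf -n 8 -ppn 4 ./my_app
```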

Best wishes,
Dmitry