Intel® MPI Library

mpitune

jon
Beginner
For our 128 node cluster, is it necessary to run mpitune for 1 node, 2 nodes, ... , 127 nodes, 128 nodes; or is it sufficient to run it once for all 128 nodes?
Gergana_S_Intel
Employee
Hi jon,

Running once on all 128 nodes will be sufficient. The mpitune utility uses the Intel MPI Benchmarks (IMB) to determine the best cluster settings. By default, IMB itself steps through a series of process counts, doubling each time (2, 4, 8, ...), up to the full 128 processes.

For example, running the Bcast benchmarks over 8 nodes would yield:

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------


then

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------


then

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 8
#----------------------------------------------------------------


Each of those runs is repeated over a range of message sizes.
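
To make the progression concrete, here is a minimal Python sketch of how the process counts step up. It is not IMB code; the function name imb_process_counts is made up for illustration, and it assumes the default behaviour described in the IMB documentation: start at the -npmin value (2 by default), double until the total would be reached, then finish with the full process count.

```python
def imb_process_counts(total_procs, npmin=2):
    """Return the process counts an IMB run is assumed to step through.

    Assumption: counts start at npmin and double while they stay below
    total_procs; the full total_procs is always run last.
    """
    if total_procs < 1 or npmin < 1:
        raise ValueError("process counts must be positive")
    counts = []
    n = min(npmin, total_procs)
    while n < total_procs:
        counts.append(n)
        n *= 2
    counts.append(total_procs)
    return counts


if __name__ == "__main__":
    print(imb_process_counts(8))    # [2, 4, 8]
    print(imb_process_counts(128))  # [2, 4, 8, 16, 32, 64, 128]
```

For the 8-process Bcast example above this gives 2, 4, 8, which matches the three output headers shown.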

Hope this helps.

Regards,
~Gergana

jon
Beginner
Gergana,

I also had the impression that mpitune ran the benchmark suite for process counts of 1 to the number specified with -np, but it appears that the default rules.xml file runs the benchmarks only for the maximum number of processes. The relevant lines in rules.xml are all of the form

cmd_line = "IMB-MPI1 -npmin %procs% Sendrecv"

and when I run mpitune with logging enabled (with --logs), the log files indicate that the multinode tests are run only with the maximum number of processes.
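
A small Python sketch of why that command line would produce only a single process count. It assumes that %procs% is replaced with the process count given to mpitune, and that IMB starts at the -npmin value and doubles up to the total; the helper names expand_cmd and counts_for are hypothetical and not part of mpitune or IMB.

```python
# Template as quoted from rules.xml above.
CMD_TEMPLATE = "IMB-MPI1 -npmin %procs% Sendrecv"


def expand_cmd(template, procs):
    """Substitute %procs% (assumed to be the process count given to mpitune)."""
    return template.replace("%procs%", str(procs))


def counts_for(cmd, total_procs):
    """Process counts implied by the -npmin value in cmd (assumed default: 2)."""
    args = cmd.split()
    npmin = int(args[args.index("-npmin") + 1]) if "-npmin" in args else 2
    counts, n = [], min(npmin, total_procs)
    while n < total_procs:
        counts.append(n)
        n *= 2
    return counts + [total_procs]


if __name__ == "__main__":
    cmd = expand_cmd(CMD_TEMPLATE, 128)
    print(cmd)                                   # IMB-MPI1 -npmin 128 Sendrecv
    print(counts_for(cmd, 128))                  # [128]
    print(counts_for("IMB-MPI1 Sendrecv", 128))  # [2, 4, 8, 16, 32, 64, 128]
```

With -npmin equal to the total, the doubling loop never runs, so only the maximum process count is benchmarked, which is consistent with what the logs show.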

Even so, it's not clear to me whether or not mpitune needs to be run with every possible process count, or if the values generated running with the maximum number of processes are good enough for other process counts.

Jon