Hello, I am new to the IMB benchmark and want to make sure that the configured number of processes (np) is allocated evenly across the nodes and their cores.
Currently the IMB benchmark reports throughput (Mbytes/sec) for each fixed message size over a fixed number of repetitions, for example:
$ mpirun -np 64 -machinefile hosts_infin ./IMB-MPI1 -map 32x2 Sendrecv
#-----------------------------------------------------------------------------
    #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]  Mbytes/sec
         0         1000        0.76        0.76        0.76        0.00
         1         1000        0.85        0.85        0.85        2.35
         2         1000        0.79        0.79        0.79        5.06
         4         1000        0.80        0.80        0.80       10.00
         8         1000        0.78        0.78        0.78       20.45
        16         1000        0.79        0.80        0.80       40.16
        32         1000        0.79        0.79        0.79       80.61
        64         1000        0.79        0.79        0.79      162.59
       128         1000        0.82        0.82        0.82      311.41
       256         1000        0.91        0.91        0.91      565.42
       512         1000        0.95        0.95        0.95     1082.13
      1024         1000        0.99        0.99        0.99     2076.87
      2048         1000        1.27        1.27        1.27     3229.91
      4096         1000        1.71        1.71        1.71     4802.87
      8192         1000        2.49        2.50        2.50     6565.97
     16384         1000        4.01        4.01        4.01     8167.28
     32768         1000        7.08        7.09        7.08     9249.23
     65536          640       22.89       22.89       22.89     5725.50
    131072          320       37.45       37.45       37.45     6999.22
    262144          160       65.74       65.76       65.75     7972.53
    524288           80      120.10      120.15      120.12     8727.37
   1048576           40      228.63      228.73      228.68     9168.57
   2097152           20      445.38      445.69      445.53     9410.86
   4194304           10      903.77      905.97      904.87     9259.29
#-----------------------------------------------------------------------------
However, this does not guarantee that the processes are evenly distributed across the nodes when I configure a different number of processes.
Is it possible to pin processes to specific cores with the IMB benchmark, or do I need to use an alternative benchmark for this?
Running the benchmark with the environment variable I_MPI_DEBUG set to 4 or higher will show you the actual pinning of processes to (groups of) cores on the nodes, as in the example below.
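For instance, reusing the command line from your question (hosts_infin and the 32x2 mapping are just the values from your example), the debug level can be passed via -genv:

$ mpirun -genv I_MPI_DEBUG=4 -np 64 -machinefile hosts_infin ./IMB-MPI1 -map 32x2 Sendrecv

The startup output should then include a pinning table listing each rank, the node it runs on, and the CPU set it is bound to, which lets you verify the distribution directly.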
The distribution between nodes can be changed with flags like -ppn / -perhost, or by specifying the exact number of processes per node in the machinefile, as sketched below.
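A minimal sketch (the hostnames node1/node2 are hypothetical): to place exactly 32 ranks on each of two nodes, either pass -ppn on the command line

$ mpirun -np 64 -ppn 32 -machinefile hosts_infin ./IMB-MPI1 Sendrecv

or list the per-node counts directly in the machinefile hosts_infin:

node1:32
node2:32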
On each node the default pinning can be influenced by the I_MPI_PIN_DOMAIN variable (best for hybrid MPI+threads codes) or the I_MPI_PIN_PROCESSOR_LIST variable.
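As an illustration only (the values below assume 32 cores per node and are an example, not a recommendation; the two variables are alternatives):

$ export I_MPI_PIN_PROCESSOR_LIST=0-31   # bind one rank per core on cores 0..31
$ export I_MPI_PIN_DOMAIN=omp            # or: domain size taken from OMP_NUM_THREADS, for hybrid MPI+OpenMP runs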
All of this is described in sections "2.3. Hydra Process Manager Command" and "3.2. Process Pinning" of the Intel® MPI Developer Reference, available at https://software.intel.com/en-us/articles/intel-mpi-library-documentation/.