Hello, I am new to the IMB benchmark and want to make sure that the configured number of processes (np) is allocated evenly across the nodes and their cores.
Currently the IMB benchmark reports throughput (Mbytes/sec) for each fixed message size over a fixed number of repetitions, for example:
$ mpirun -np 64 -machinefile hosts_infin ./IMB-MPI1 -map 32x2 Sendrecv
#-----------------------------------------------------------------------------
    #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]  Mbytes/sec
         0         1000        0.76        0.76        0.76        0.00
         1         1000        0.85        0.85        0.85        2.35
         2         1000        0.79        0.79        0.79        5.06
         4         1000        0.80        0.80        0.80       10.00
         8         1000        0.78        0.78        0.78       20.45
        16         1000        0.79        0.80        0.80       40.16
        32         1000        0.79        0.79        0.79       80.61
        64         1000        0.79        0.79        0.79      162.59
       128         1000        0.82        0.82        0.82      311.41
       256         1000        0.91        0.91        0.91      565.42
       512         1000        0.95        0.95        0.95     1082.13
      1024         1000        0.99        0.99        0.99     2076.87
      2048         1000        1.27        1.27        1.27     3229.91
      4096         1000        1.71        1.71        1.71     4802.87
      8192         1000        2.49        2.50        2.50     6565.97
     16384         1000        4.01        4.01        4.01     8167.28
     32768         1000        7.08        7.09        7.08     9249.23
     65536          640       22.89       22.89       22.89     5725.50
    131072          320       37.45       37.45       37.45     6999.22
    262144          160       65.74       65.76       65.75     7972.53
    524288           80      120.10      120.15      120.12     8727.37
   1048576           40      228.63      228.73      228.68     9168.57
   2097152           20      445.38      445.69      445.53     9410.86
   4194304           10      903.77      905.97      904.87     9259.29
#-----------------------------------------------------------------------------
However, this does not guarantee that the processes are evenly distributed across the nodes when I configure a different number of processes.
Is it possible to pin processes to specific cores with the IMB benchmark, or do I need to use an alternative benchmark for this?
Running the benchmark with the environment variable I_MPI_DEBUG set to 4 or higher will show you the actual pinning of processes to (groups of) cores on the nodes, as in the example below.
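For instance, reusing the command line from your question (hosts_infin and the 32x2 mapping are just the values from your example), the debug level can be passed via -genv:

$ mpirun -genv I_MPI_DEBUG=4 -np 64 -machinefile hosts_infin ./IMB-MPI1 -map 32x2 Sendrecv

The startup output should then include a pinning table listing each rank, the node it runs on, and the CPU set it is bound to, which lets you verify the distribution directly.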
The distribution between nodes can be changed with flags like -ppn / -perhost, or by specifying the exact number of processes per node in the machinefile, as sketched below.
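A minimal sketch (the hostnames node1/node2 are hypothetical): to place exactly 32 ranks on each of two nodes, either pass -ppn on the command line

$ mpirun -np 64 -ppn 32 -machinefile hosts_infin ./IMB-MPI1 Sendrecv

or list the per-node counts directly in the machinefile hosts_infin:

node1:32
node2:32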
On each node the default pinning can be influenced by the I_MPI_PIN_DOMAIN variable (best for hybrid MPI+threads codes) or the I_MPI_PIN_PROCESSOR_LIST variable.
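As an illustration only (the values below assume 32 cores per node and are an example, not a recommendation; the two variables are alternatives):

$ export I_MPI_PIN_PROCESSOR_LIST=0-31   # bind one rank per core on cores 0..31
$ export I_MPI_PIN_DOMAIN=omp            # or: domain size taken from OMP_NUM_THREADS, for hybrid MPI+OpenMP runs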
All of this is described in sections "2.3. Hydra Process Manager Command" and "3.2. Process Pinning" of the Intel® MPI Developer Reference, available at https://software.intel.com/en-us/articles/intel-mpi-library-documentation/.