ying__qi

IMB-MPI1 PingPong test fails at 4M message size

Hi All, 

I have two Dell R815 servers, each with 4 AMD Opteron 6380 CPUs (16 cores each), connected directly by two InfiniBand cards. I am having trouble running the IMB-MPI1 test, even on a single node:

mpirun -n 2 -genv I_MPI_DEBUG=3 -genv I_MPI_FABRICS=ofi  /opt/intel/impi/2019.5.281/intel64/bin/IMB-MPI1

The run aborted with the following error:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         2.23         0.00
            1         1000         2.24         0.45
            2         1000         2.25         0.89
            4         1000         2.26         1.77
            8         1000         2.24         3.57
           16         1000         2.25         7.12
           32         1000         2.27        14.08
           64         1000         2.43        26.33
          128         1000         2.55        50.26
          256         1000         3.60        71.08
          512         1000         4.12       124.40
         1024         1000         5.04       203.00
         2048         1000         6.89       297.38
         4096         1000        10.56       387.76
         8192         1000        13.98       585.83
        16384         1000        22.74       720.65
        32768         1000        30.12      1087.81
        65536          640        46.17      1419.45
       131072          320        76.43      1714.87
       262144          160       334.23       784.32
       524288           80       511.22      1025.57
      1048576           40       850.76      1232.51
      2097152           20      1518.37      1381.19
Abort(941742351) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Send: Other MPI error, error stack:
PMPI_Send(155)............: MPI_Send(buf=0x3a100f0, count=4194304, MPI_BYTE, dest=1, tag=1, comm=0x84000003) failed
MPID_Send(572)............:
MPIDI_send_unsafe(203)....:
MPIDI_OFI_send_normal(414):
(unknown)(): Other MPI error

However, it runs fine with shm:

mpirun -n 2 -genv I_MPI_DEBUG=3 -genv I_MPI_FABRICS=shm  /opt/intel/impi/2019.5.281/intel64/bin/IMB-MPI1

Running with 2 CPUs on two different nodes also fails at the 4M message size.
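
As a sanity check on the PingPong table above, here is a minimal Python sketch. It assumes the Mbytes/sec column is simply #bytes divided by t[usec] (with 1 Mbyte taken as 10^6 bytes), which matches the printed rows to within rounding. The function names are made up for illustration and are not part of IMB. It only locates where the bandwidth drops between consecutive message sizes; it does not explain the failure at 4194304 bytes.

```python
# Rows copied from the PingPong output above:
# (message size in bytes, time in microseconds, reported Mbytes/sec)
ROWS = [
    (65536, 46.17, 1419.45),
    (131072, 76.43, 1714.87),
    (262144, 334.23, 784.32),
    (524288, 511.22, 1025.57),
    (1048576, 850.76, 1232.51),
    (2097152, 1518.37, 1381.19),
]


def bandwidth_mbytes_per_sec(nbytes, t_usec):
    """Bytes per microsecond is numerically equal to 10^6 bytes per second."""
    return nbytes / t_usec


def largest_drop(rows):
    """Return (size_before, size_after) for the steepest bandwidth drop
    between consecutive rows, or None if bandwidth never decreases."""
    worst = None
    worst_delta = 0.0
    for (n0, t0, _), (n1, t1, _) in zip(rows, rows[1:]):
        delta = bandwidth_mbytes_per_sec(n0, t0) - bandwidth_mbytes_per_sec(n1, t1)
        if delta > worst_delta:
            worst_delta = delta
            worst = (n0, n1)
    return worst


if __name__ == "__main__":
    for nbytes, t_usec, reported in ROWS:
        computed = bandwidth_mbytes_per_sec(nbytes, t_usec)
        print(f"{nbytes:>8} bytes: computed {computed:8.2f}, reported {reported:8.2f}")
    print("Largest bandwidth drop between sizes:", largest_drop(ROWS))
```

On the rows listed, the only decrease is between 131072 and 262144 bytes, where the time per message jumps from 76.43 to 334.23 usec.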

I have been struggling with this for a few days now without success. Any suggestions on where to look or what to try?

Thanks!

Qi
