Simone_Tinti
Beginner
37 Views

MPI issue on KNL x200 with Allgather function

Hello,

I've run into a problem on KNL x200 series nodes with an Omni-Path interface when using more than two servers (with up to two nodes everything works). I first noticed it with HPL, then reproduced the same error with the Intel MPI Benchmarks. The error occurs during Allgather, and only with message sizes above 2 MB.

I use Intel MPI as included in Parallel Studio 2017 Update 2, and I've tried different runtime options (e.g. explicitly selecting the tmi fabric):

mpirun  -ppn 1  -np 4 -hosts wn51,wn52,wn53,wn54  /opt/intel/imb/2017.1.174/src/IMB-MPI1
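For anyone trying to narrow this down, here is a minimal sketch of how the variants of that command could be generated. It assumes the `I_MPI_FABRICS` and `I_MPI_DEBUG` environment variables of Intel MPI and the `-msglog` option of IMB; the helper name `build_imb_command` and the specific values (`shm:tmi`, debug level 5, sizes 2^20 to 2^22) are illustrative assumptions, not the exact options used in the runs above.

```python
import shlex

def build_imb_command(hosts, fabrics=None, debug_level=None,
                      benchmark="Allgather", msglog=None,
                      imb_path="/opt/intel/imb/2017.1.174/src/IMB-MPI1"):
    """Return the mpirun argv for one IMB-MPI1 run with one rank per host."""
    argv = ["mpirun"]
    if fabrics is not None:
        # Explicit fabric selection, e.g. "shm:tmi" (assumed value).
        argv += ["-genv", "I_MPI_FABRICS", fabrics]
    if debug_level is not None:
        # Higher levels make Intel MPI print more startup information.
        argv += ["-genv", "I_MPI_DEBUG", str(debug_level)]
    argv += ["-ppn", "1", "-np", str(len(hosts)), "-hosts", ",".join(hosts)]
    argv.append(imb_path)
    if msglog is not None:
        lo, hi = msglog
        # Restrict message sizes to 2**lo .. 2**hi bytes.
        argv += ["-msglog", f"{lo}:{hi}"]
    if benchmark:
        # Run only the named benchmark instead of the full suite.
        argv.append(benchmark)
    return argv

hosts = ["wn51", "wn52", "wn53", "wn54"]
cmd = build_imb_command(hosts, fabrics="shm:tmi", debug_level=5, msglog=(20, 22))
print(shlex.join(cmd))
```

Running only Allgather over the largest sizes should shorten each attempt, and the debug output may help confirm which fabric was actually picked.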

 

 

#----------------------------------------------------------------
# Benchmarking Allgather
# #processes = 4
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.67         0.76         0.72
            1         1000        11.37        11.96        11.66
            2         1000        11.15        11.52        11.34
            4         1000        11.38        11.58        11.48
            8         1000        13.69        13.98        13.88
           16         1000        14.86        15.57        15.15
           32         1000        14.94        15.35        15.18
           64         1000        15.00        15.24        15.12
          128         1000        15.66        15.89        15.79
          256         1000        15.54        15.95        15.73
          512         1000        17.34        17.63        17.46
         1024         1000        18.14        18.32        18.24
         2048         1000        19.20        19.45        19.34
         4096         1000        24.10        24.57        24.38
         8192         1000        30.93        31.43        31.18
        16384         1000        43.16        43.84        43.41
        32768         1000        63.46        63.96        63.79
        65536          640       116.96       119.40       118.01
       131072          320       199.62       204.70       202.43
       262144          160       209.99       215.50       212.73
       524288           80       590.63       609.70       599.36
      1048576           40       714.76       735.01       725.28
      2097152           20      1549.57      1599.07      1573.02
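To put the sizes in context, a small sketch of the buffer arithmetic follows. It assumes that the `#bytes` column is the per-rank send size, that each rank receives that amount from every rank, and that the next size IMB tries after 2097152 is double that (its default power-of-two progression). The failing size itself is not printed in the log, so 4194304 is an assumption.

```python
def allgather_bytes(msg_bytes, nprocs):
    """Bytes each rank sends and receives in one Allgather call."""
    return {"send": msg_bytes, "recv": msg_bytes * nprocs}

last_ok = 2097152        # last size reported in the table above
next_size = last_ok * 2  # assumed next size (power-of-two progression)
nprocs = 4               # "#processes = 4" in the benchmark header

for size in (last_ok, next_size):
    b = allgather_bytes(size, nprocs)
    print(f"{size:>8} bytes/rank -> receive buffer {b['recv'] // 2**20} MiB per rank")
```

Under these assumptions the run that fails would be gathering 16 MiB per rank, twice the largest case that completed.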

 

IMB-MPI1:176946 terminated with signal 11 at PC=7f7f4e776325 SP=7fff0709d780.  Backtrace:

 

IMB-MPI1:175134 terminated with signal 11 at PC=7eff6f736325 SP=7ffd95d43a50.  Backtrace:

 

IMB-MPI1:175943 terminated with signal 11 at PC=7f1ed8f57325 SP=7ffc7c998980.  Backtrace:

/lib64/libpsm2.so.2(+0x2d325)[0x7f7f4e776325]

/lib64/libpsm2.so.2(+0x2d7d8)[0x7f7f4e7767d8]

/lib64/libpsm2.so.2(+0x33b2d)[0x7f7f4e77cb2d]

/lib64/libpsm2.so.2(+0x3473e)[0x7f7f4e77d73e]

/lib64/libpsm2.so.2(+0x21431)[0x7f7f4e76a431]

/lib64/libpsm2.so.2(+0x1e112)[0x7f7f4e767112]

/lib64/libpsm2.so.2(+0x1d587)[0x7f7f4e766587]

/lib64/libpsm2.so.2(psm2_mq_ipeek+0x92)[0x7f7f4e761cb2]

/lib64/libfabric.so.1(+0x7b3e1)[0x7f7f4f52c3e1]

/lib64/libfabric.so.1(+0x7bc07)[0x7f7f4f52cc07]

/lib64/libpsm2.so.2(+0x2d325)[0x7eff6f736325]

/lib64/libpsm2.so.2(+0x2d7d8)[0x7eff6f7367d8]

/lib64/libpsm2.so.2(+0x33b2d)[0x7eff6f73cb2d]

/lib64/libpsm2.so.2(psm2_mq_irecv2+0x179)[0x7eff6f721209]

 

IMB-MPI1:175171 terminated with signal 11 at PC=7f57bba71325 SP=7ffca2c18fd0.  Backtrace:

/lib64/libpsm2.so.2(+0x2d325)[0x7f1ed8f57325]

/lib64/libpsm2.so.2(+0x2d7d8)[0x7f1ed8f577d8]

/lib64/libpsm2.so.2(+0x33b2d)[0x7f1ed8f5db2d]

/lib64/libpsm2.so.2(+0x3473e)[0x7f1ed8f5e73e]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x43b6b6)[0x7f7f50cb36b6]

/lib64/libfabric.so.1(+0x7e89c)[0x7eff704ef89c]

/lib64/libpsm2.so.2(+0x2d325)[0x7f57bba71325]

/lib64/libpsm2.so.2(+0x2d7d8)[0x7f57bba717d8]

/lib64/libpsm2.so.2(+0x33b2d)[0x7f57bba77b2d]

/lib64/libpsm2.so.2(psm2_mq_irecv2+0x179)[0x7f57bba5c209]

/lib64/libpsm2.so.2(+0x21431)[0x7f1ed8f4b431]

/lib64/libpsm2.so.2(+0x1e112)[0x7f1ed8f48112]

/lib64/libpsm2.so.2(+0x1d587)[0x7f1ed8f47587]

/lib64/libpsm2.so.2(psm2_mq_ipeek+0x92)[0x7f1ed8f42cb2]

/lib64/libfabric.so.1(+0x7b3e1)[0x7f1ed9d0d3e1]

/lib64/libfabric.so.1(+0x7bc07)[0x7f1ed9d0dc07]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x43b6b6)[0x7f1edb4946b6]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x3cf230)[0x7f7f50c47230]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x45241e)[0x7eff71c8a41e]

/lib64/libfabric.so.1(+0x7e89c)[0x7f57bc82a89c]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x3cf230)[0x7f1edb428230]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(PMPIDI_CH3I_Progress+0x88f)[0x7f1edb1a4f7f]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(PMPIDI_CH3I_Progress+0x88f)[0x7f7f509c3f7f]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x3ab780)[0x7eff71be3780]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x45241e)[0x7f57bdfc541e]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x580672)[0x7f1edb5d9672]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x580672)[0x7f7f50df8672]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x2e3db9)[0x7eff71b1bdb9]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0xe0bf9)[0x7eff71918bf9]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x3ab780)[0x7f57bdf1e780]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x2e3f50)[0x7f1edb33cf50]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0xe0c64)[0x7f1edb139c64]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x2e3f50)[0x7f7f50b5bf50]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(MPI_Allgather+0xc9e)[0x7eff7191a58e]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40e66c]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x407f88]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x2e3db9)[0x7f57bde56db9]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(MPI_Allgather+0xc9e)[0x7f1edb13b58e]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40e66c]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x407f88]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40253e]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0xe0c64)[0x7f7f50958c64]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40253e]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0xe0bf9)[0x7f57bdc53bf9]

/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f1eda179b15]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x401fa9]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(MPI_Allgather+0xc9e)[0x7f7f5095a58e]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40e66c]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x407f88]

/lib64/libc.so.6(__libc_start_main+0xf5)[0x7eff70958b15]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x401fa9]

/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(MPI_Allgather+0xc9e)[0x7f57bdc5558e]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40e66c]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x407f88]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40253e]

/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f57bcc93b15]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x401fa9]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40253e]

/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f7f4f998b15]

/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x401fa9]
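Since the four ranks print their backtraces interleaved, the output is hard to read by eye. A minimal sketch of one way to summarise it: strip the per-process load addresses and count how often each (library, symbol or offset) pair appears. The regular expression and the helper name `common_frames` are my own assumptions about the frame format shown above; the sample lines are copied from the output.

```python
import re
from collections import Counter

# Matches frames such as "/lib64/libpsm2.so.2(+0x2d325)[0x7f7f4e776325]"
# and "/opt/.../IMB-MPI1[0x40e66c]" (no symbol information in parentheses).
FRAME = re.compile(
    r"^(?P<path>/\S+?)(?:\((?P<loc>[^)]*)\))?\[0x(?P<addr>[0-9a-f]+)\]$"
)

def common_frames(lines):
    """Count (library, location) pairs, ignoring per-process load addresses."""
    counts = Counter()
    for line in lines:
        m = FRAME.match(line.strip())
        if not m:
            continue  # skip "terminated with signal 11" headers and blanks
        lib = m.group("path").rsplit("/", 1)[-1]
        # Fall back to the absolute address when no symbol/offset is printed.
        loc = m.group("loc") or "0x" + m.group("addr")
        counts[(lib, loc)] += 1
    return counts

sample = [
    "IMB-MPI1:176946 terminated with signal 11 at PC=7f7f4e776325 SP=7fff0709d780.  Backtrace:",
    "/lib64/libpsm2.so.2(+0x2d325)[0x7f7f4e776325]",
    "/lib64/libpsm2.so.2(+0x2d325)[0x7eff6f736325]",
    "/lib64/libpsm2.so.2(+0x2d325)[0x7f1ed8f57325]",
    "/lib64/libpsm2.so.2(+0x2d325)[0x7f57bba71325]",
    "/lib64/libpsm2.so.2(psm2_mq_ipeek+0x92)[0x7f7f4e761cb2]",
    "/lib64/libpsm2.so.2(psm2_mq_irecv2+0x179)[0x7eff6f721209]",
]
counts = common_frames(sample)
for (lib, loc), n in counts.most_common():
    print(f"{n}x {lib} {loc}")
```

On these sample lines, the frame at `libpsm2.so.2(+0x2d325)` shows up once per rank, which matches the four reported PC values and suggests that all ranks fault at the same place inside libpsm2.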

 

Do you have any suggestion?

Many thanks for your support,

Simone

 

1 Reply
Simone_Tinti
Beginner
37 Views

Hello,

Please disregard this issue; it disappeared after we re-deployed the OS.

Best regards,

Simone
