Hello,
I've run into a problem on KNL x200-series nodes with the Omni-Path interface when using more than two servers (with up to two nodes everything works). I first noticed it with HPL, then got the same error with the Intel MPI Benchmarks. The error happens only during Allgather, and only with message sizes above 2 MB.
I use Intel MPI as included in Parallel Studio 2017 Update 2. I've tried different runtime options (e.g. explicitly selecting the tmi fabric):
mpirun -ppn 1 -np 4 -hosts wn51,wn52,wn53,wn54 /opt/intel/imb/2017.1.174/src/IMB-MPI1
#----------------------------------------------------------------
# Benchmarking Allgather
# #processes = 4
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.67 0.76 0.72
1 1000 11.37 11.96 11.66
2 1000 11.15 11.52 11.34
4 1000 11.38 11.58 11.48
8 1000 13.69 13.98 13.88
16 1000 14.86 15.57 15.15
32 1000 14.94 15.35 15.18
64 1000 15.00 15.24 15.12
128 1000 15.66 15.89 15.79
256 1000 15.54 15.95 15.73
512 1000 17.34 17.63 17.46
1024 1000 18.14 18.32 18.24
2048 1000 19.20 19.45 19.34
4096 1000 24.10 24.57 24.38
8192 1000 30.93 31.43 31.18
16384 1000 43.16 43.84 43.41
32768 1000 63.46 63.96 63.79
65536 640 116.96 119.40 118.01
131072 320 199.62 204.70 202.43
262144 160 209.99 215.50 212.73
524288 80 590.63 609.70 599.36
1048576 40 714.76 735.01 725.28
2097152 20 1549.57 1599.07 1573.02
IMB-MPI1:176946 terminated with signal 11 at PC=7f7f4e776325 SP=7fff0709d780. Backtrace:
IMB-MPI1:175134 terminated with signal 11 at PC=7eff6f736325 SP=7ffd95d43a50. Backtrace:
IMB-MPI1:175943 terminated with signal 11 at PC=7f1ed8f57325 SP=7ffc7c998980. Backtrace:
/lib64/libpsm2.so.2(+0x2d325)[0x7f7f4e776325]
/lib64/libpsm2.so.2(+0x2d7d8)[0x7f7f4e7767d8]
/lib64/libpsm2.so.2(+0x33b2d)[0x7f7f4e77cb2d]
/lib64/libpsm2.so.2(+0x3473e)[0x7f7f4e77d73e]
/lib64/libpsm2.so.2(+0x21431)[0x7f7f4e76a431]
/lib64/libpsm2.so.2(+0x1e112)[0x7f7f4e767112]
/lib64/libpsm2.so.2(+0x1d587)[0x7f7f4e766587]
/lib64/libpsm2.so.2(psm2_mq_ipeek+0x92)[0x7f7f4e761cb2]
/lib64/libfabric.so.1(+0x7b3e1)[0x7f7f4f52c3e1]
/lib64/libfabric.so.1(+0x7bc07)[0x7f7f4f52cc07]
/lib64/libpsm2.so.2(+0x2d325)[0x7eff6f736325]
/lib64/libpsm2.so.2(+0x2d7d8)[0x7eff6f7367d8]
/lib64/libpsm2.so.2(+0x33b2d)[0x7eff6f73cb2d]
/lib64/libpsm2.so.2(psm2_mq_irecv2+0x179)[0x7eff6f721209]
IMB-MPI1:175171 terminated with signal 11 at PC=7f57bba71325 SP=7ffca2c18fd0. Backtrace:
/lib64/libpsm2.so.2(+0x2d325)[0x7f1ed8f57325]
/lib64/libpsm2.so.2(+0x2d7d8)[0x7f1ed8f577d8]
/lib64/libpsm2.so.2(+0x33b2d)[0x7f1ed8f5db2d]
/lib64/libpsm2.so.2(+0x3473e)[0x7f1ed8f5e73e]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x43b6b6)[0x7f7f50cb36b6]
/lib64/libfabric.so.1(+0x7e89c)[0x7eff704ef89c]
/lib64/libpsm2.so.2(+0x2d325)[0x7f57bba71325]
/lib64/libpsm2.so.2(+0x2d7d8)[0x7f57bba717d8]
/lib64/libpsm2.so.2(+0x33b2d)[0x7f57bba77b2d]
/lib64/libpsm2.so.2(psm2_mq_irecv2+0x179)[0x7f57bba5c209]
/lib64/libpsm2.so.2(+0x21431)[0x7f1ed8f4b431]
/lib64/libpsm2.so.2(+0x1e112)[0x7f1ed8f48112]
/lib64/libpsm2.so.2(+0x1d587)[0x7f1ed8f47587]
/lib64/libpsm2.so.2(psm2_mq_ipeek+0x92)[0x7f1ed8f42cb2]
/lib64/libfabric.so.1(+0x7b3e1)[0x7f1ed9d0d3e1]
/lib64/libfabric.so.1(+0x7bc07)[0x7f1ed9d0dc07]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x43b6b6)[0x7f1edb4946b6]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x3cf230)[0x7f7f50c47230]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x45241e)[0x7eff71c8a41e]
/lib64/libfabric.so.1(+0x7e89c)[0x7f57bc82a89c]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x3cf230)[0x7f1edb428230]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(PMPIDI_CH3I_Progress+0x88f)[0x7f1edb1a4f7f]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(PMPIDI_CH3I_Progress+0x88f)[0x7f7f509c3f7f]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x3ab780)[0x7eff71be3780]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x45241e)[0x7f57bdfc541e]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x580672)[0x7f1edb5d9672]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x580672)[0x7f7f50df8672]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x2e3db9)[0x7eff71b1bdb9]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0xe0bf9)[0x7eff71918bf9]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x3ab780)[0x7f57bdf1e780]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x2e3f50)[0x7f1edb33cf50]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0xe0c64)[0x7f1edb139c64]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x2e3f50)[0x7f7f50b5bf50]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(MPI_Allgather+0xc9e)[0x7eff7191a58e]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40e66c]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x407f88]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0x2e3db9)[0x7f57bde56db9]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(MPI_Allgather+0xc9e)[0x7f1edb13b58e]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40e66c]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x407f88]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40253e]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0xe0c64)[0x7f7f50958c64]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40253e]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(+0xe0bf9)[0x7f57bdc53bf9]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f1eda179b15]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x401fa9]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(MPI_Allgather+0xc9e)[0x7f7f5095a58e]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40e66c]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x407f88]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7eff70958b15]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x401fa9]
/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/lib/libmpi.so.12(MPI_Allgather+0xc9e)[0x7f57bdc5558e]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40e66c]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x407f88]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40253e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f57bcc93b15]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x401fa9]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x40253e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f7f4f998b15]
/opt/intel/imb/2017.1.174/src/IMB-MPI1[0x401fa9]
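To narrow this down, the run can be restricted to Allgather at the sizes around the crash. The following is only a sketch: it reuses the hosts and IMB path from the command above, and assumes the IMB `-msglog` option and the Intel MPI `I_MPI_DEBUG` variable behave as documented for the 2017 releases. It prints the command as a dry run; the last line has to be uncommented to actually launch it.

```shell
# Hosts and IMB path are taken from the original run.
HOSTS=wn51,wn52,wn53,wn54
IMB=/opt/intel/imb/2017.1.174/src/IMB-MPI1

# "Allgather" selects a single benchmark.
# -msglog 20:22 restricts message sizes to 2^20..2^22 bytes (1 MiB to 4 MiB),
# i.e. the last sizes that completed plus the first one missing from the output.
# I_MPI_DEBUG=5 makes Intel MPI report the fabric/provider it selected at startup.
CMD="mpirun -ppn 1 -np 4 -hosts $HOSTS -genv I_MPI_DEBUG 5 $IMB Allgather -msglog 20:22"

# Dry run: show the command that would be executed.
echo "$CMD"

# Uncomment to run on the cluster:
# $CMD
```

The debug output should show which fabric is really in use, which helps because the backtrace goes through libfabric and libpsm2 rather than a tmi library.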
Do you have any suggestions?
Many thanks for your support,
Simone
Hello,
Please disregard this issue; it disappeared after we redeployed the OS.
Best regards,
Simone