Intel® MPI Library

Segfault in DAPL with Mellanox OFED 2.1

Ben3

Hi,

The Intel MPI library has been crashing since we updated to the latest Mellanox OFED, 2.1. For example, the test program supplied with Intel MPI (test/test.f90) crashes with a segfault. I compiled it using

mpif90 -debug all /apps/intel-mpi/4.1.1.036/test/test.f90 -o test.x

and managed to get a backtrace from the crash using idbc:

#0  0x00007fcb9418f078 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#1  0x00007fcb94190bf7 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#2  0x00007fcb94191543 in MPID_nem_dapl_rc_init_20 () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#3  0x00007fcb941de883 in MPID_nem_dapl_init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#4  0x00007fcb94276fc6 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#5  0x00007fcb9427547c in MPID_nem_init_ckpt () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#6  0x00007fcb94276ca7 in MPID_nem_init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#7  0x00007fcb94128070 in MPIDI_CH3_Init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#8  0x00007fcb94265bad in MPID_Init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#9  0x00007fcb9423c38f in MPIR_Init_thread () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#10 0x00007fcb94230258 in PMPI_Init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#11 0x00007fcb946f331f in pmpi_init__ () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpigf.so.4
#12 0x0000000000403005 in main () at /apps/intel-mpi/4.1.1.036/test/test.f90:28
#13 0x0000000000402fbc in main ()

We are running CentOS 6.5.
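
Frames #2 and #3 of the backtrace are in the MPID_nem_dapl_* initialization routines, so one way to check whether the crash is confined to the DAPL path is to rerun the same binary over a different fabric with debug output enabled. A minimal sketch, assuming two ranks and the test.x binary built above; the `run_fabric` helper and the `MPIRUN` override are hypothetical names used only for illustration, while `I_MPI_FABRICS` and `I_MPI_DEBUG` are Intel MPI environment variables:

```shell
# Hypothetical helper: run a program under a chosen Intel MPI fabric.
# MPIRUN may be overridden with another launcher or a stub for a dry run.
run_fabric() {
    fabric="$1"
    shift
    # I_MPI_DEBUG=5 asks the library to report which fabric it selected.
    I_MPI_DEBUG=5 I_MPI_FABRICS="$fabric" "${MPIRUN:-mpirun}" -n 2 "$@"
}

# Usage (not executed here):
#   run_fabric shm:dapl ./test.x   # exercises the DAPL path seen in the backtrace
#   run_fabric shm:tcp  ./test.x   # bypasses DAPL
```

If the shm:tcp run completes while the shm:dapl run still segfaults, that would point at the DAPL provider stack rather than at the test program.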

Cheers,

Ben

James_T_Intel
Moderator

Hi Ben,

Thank you for this report. I have submitted this to our developers for further investigation.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

James_T_Intel
Moderator

Hi Ben,

Please try using Mellanox* OFED 2.1-1.0.6.
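
To check whether a node already runs that release, the installed version can be compared against 2.1-1.0.6. A minimal sketch, assuming that `ofed_info -s` prints a string of the form `MLNX_OFED_LINUX-2.1-1.0.6:` (the exact format may differ between releases) and that GNU `sort -V` is available; `ofed_at_least` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if version string $1 is >= required version $2.
# Strips an assumed "MLNX_OFED_LINUX-" prefix and trailing ":" before comparing.
ofed_at_least() {
    have=$(printf '%s\n' "$1" | sed -e 's/^MLNX_OFED_LINUX-//' -e 's/:$//')
    want="$2"
    # sort -V orders version strings; the required version must sort first
    # (or be equal) for the installed one to be new enough.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Usage (not executed here):
#   ofed_at_least "$(ofed_info -s)" 2.1-1.0.6 && echo "OFED is new enough"
```

Running the check on every compute node, not only the login node, helps catch a partially upgraded cluster.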
