Intel® MPI Library

Segfault in DAPL with Mellanox OFED 2.1

Ben3
Beginner

Hi,

The Intel MPI Library has been crashing since we updated to the latest Mellanox OFED 2.1. For example, the test program supplied with Intel MPI (test/test.f90) crashes with a segfault. I compiled it using

mpif90 -debug all /apps/intel-mpi/4.1.1.036/test/test.f90 -o test.x

and managed to get a backtrace from the crash using idbc:

#0  0x00007fcb9418f078 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#1  0x00007fcb94190bf7 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#2  0x00007fcb94191543 in MPID_nem_dapl_rc_init_20 () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#3  0x00007fcb941de883 in MPID_nem_dapl_init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#4  0x00007fcb94276fc6 in ?? () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#5  0x00007fcb9427547c in MPID_nem_init_ckpt () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#6  0x00007fcb94276ca7 in MPID_nem_init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#7  0x00007fcb94128070 in MPIDI_CH3_Init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#8  0x00007fcb94265bad in MPID_Init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#9  0x00007fcb9423c38f in MPIR_Init_thread () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#10 0x00007fcb94230258 in PMPI_Init () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpi.so.4
#11 0x00007fcb946f331f in pmpi_init__ () from /apps/intel-mpi/4.1.1.036/intel64/lib/libmpigf.so.4
#12 0x0000000000403005 in main () at /apps/intel-mpi/4.1.1.036/test/test.f90:28
#13 0x0000000000402fbc in main ()

We are running CentOS 6.5.

Cheers,

Ben

2 Replies
James_T_Intel
Moderator

Hi Ben,

Thank you for this report. I have submitted this to our developers for further investigation.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

James_T_Intel
Moderator

Hi Ben,

Please try using Mellanox* OFED 2.1-1.0.6.
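
To confirm which Mellanox OFED release a node is running before and after the update, something like the following may help. It is only a sketch, not part of the original reply: it assumes the ofed_info utility that ships with Mellanox OFED is on the PATH, and that "ofed_info -s" prints a string of the form MLNX_OFED_LINUX-<version>: (check the output on your own system). The ofed_version helper is a hypothetical name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: strip the assumed "MLNX_OFED_LINUX-" prefix and the
# trailing ":" from a version string such as "MLNX_OFED_LINUX-2.1-1.0.6:".
ofed_version() {
    printf '%s\n' "$1" | sed -e 's/^MLNX_OFED_LINUX-//' -e 's/:$//'
}

# Report the installed release if ofed_info is available on this node.
if command -v ofed_info >/dev/null 2>&1; then
    installed=$(ofed_version "$(ofed_info -s)")
    echo "Installed Mellanox OFED: $installed"
else
    echo "ofed_info not found; cannot determine the OFED version"
fi
```

If the reported version is older than 2.1-1.0.6, the node has not yet picked up the suggested release.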
