Intel MPI 2018.4 error

Hi 

    Is there any way of diagnosing what might be causing the following error? 

PANIC in ../../src/mpid/ch3/channels/nemesis/netmod/ofa/cm/dapl/common/dapl_evd_cq_async_error_callb.c:71:dapl_evd_cq_async_error_callback
NULL == context
 

 Intel MPI 2018.4 run using release_mt version of libmpi.so

 I_MPI_FABRICS=shm:ofa

 Running with MPI_THREAD_MULTIPLE on CentOS 7.2 with mlx_5 hardware

Thanks

 Jamil

 


Hi Jamil,

Could you please try I_MPI_FABRICS=shm:dapl with the same scenario?

BR,

Dmitry
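
To compare the two fabrics side by side, the same job can be launched once per I_MPI_FABRICS value. The following is a minimal sketch, not something posted in the thread: the `fabric_env` helper, the rank count, and `./app` are hypothetical placeholders, and the I_MPI_DEBUG level of 5 is an assumption that should make the library report which fabric it selected at startup.

```python
import os


def fabric_env(fabric, debug_level=5, base=None):
    """Return a copy of the environment with Intel MPI fabric and debug settings.

    fabric_env is a hypothetical helper, not part of Intel MPI.
    """
    env = dict(os.environ if base is None else base)
    env["I_MPI_FABRICS"] = fabric           # e.g. "shm:ofa" or "shm:dapl"
    env["I_MPI_DEBUG"] = str(debug_level)   # assumed level; raises startup verbosity
    return env


# Hypothetical launch line; the rank count and ./app are placeholders.
command = ["mpirun", "-n", "4", "./app"]

for fabric in ("shm:ofa", "shm:dapl"):
    env = fabric_env(fabric)
    print(f"I_MPI_FABRICS={env['I_MPI_FABRICS']} I_MPI_DEBUG={env['I_MPI_DEBUG']}",
          " ".join(command))
    # To actually run the job:
    #   import subprocess
    #   subprocess.run(command, env=env, check=False)
```

Keeping everything except the fabric identical between the two runs makes it easier to attribute a failure to the transport rather than to the application.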


Hi Dmitry

  It works as expected, with no errors, when using dapl.
 

Jamil


Hi Dmitry

   I have a case that fails with I_MPI_FABRICS=shm:dapl.

Here is the error:

prod-0026:UCM:2bfae:c5ca9700: 18942905 us(18942905 us!!!): dapl async_event CQ (0x43d68f0) ERR 0
prod-0026:UCM:2bfae:c5ca9700: 18942927 us(22 us):  -- dapl_evd_cq_async_error_callback (0x42ec630, 0x4329010, 0x7fa2c5ca8d30, 0x43d68f0)
prod-0026:UCM:2bfae:c5ca9700: 18942944 us(17 us): dapl async_event QP (0x42abda0) Event 1

Could this be caused by an OFED bug? The system is running Mellanox OFED 3.2.

Jamil
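
When there are many such lines across ranks, it can help to split them into fields and see which handles recur. The sketch below parses the three lines quoted above. The field names (`provider`, `pid`, `tid`) are guesses at what the colon-separated prefixes mean, not documented DAPL semantics, and `parse_dapl_line` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed layout: host:provider:pid:tid: <time> us(<delta> us): message
LINE_RE = re.compile(
    r"^(?P<host>[^:]+):(?P<provider>[^:]+):(?P<pid>[0-9a-f]+):(?P<tid>[0-9a-f]+): "
    r"(?P<t_us>\d+) us\((?P<delta_us>\d+) us!*\):\s*(?P<msg>.*)$"
)
HANDLE_RE = re.compile(r"0x[0-9a-f]+")


def parse_dapl_line(line):
    """Split one log line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["t_us"] = int(rec["t_us"])
    rec["delta_us"] = int(rec["delta_us"])
    rec["handles"] = HANDLE_RE.findall(rec["msg"])  # hex values in the message
    return rec


LOG = """
prod-0026:UCM:2bfae:c5ca9700: 18942905 us(18942905 us!!!): dapl async_event CQ (0x43d68f0) ERR 0
prod-0026:UCM:2bfae:c5ca9700: 18942927 us(22 us):  -- dapl_evd_cq_async_error_callback (0x42ec630, 0x4329010, 0x7fa2c5ca8d30, 0x43d68f0)
prod-0026:UCM:2bfae:c5ca9700: 18942944 us(17 us): dapl async_event QP (0x42abda0) Event 1
"""

records = [r for r in map(parse_dapl_line, LOG.splitlines()) if r]

# Handles that show up in more than one line tie events to the same object.
counts = Counter(h for r in records for h in set(r["handles"]))
shared = sorted(h for h, n in counts.items() if n > 1)

for r in records:
    print(r["host"], r["t_us"], r["delta_us"], r["msg"])
print("handles seen in more than one line:", shared)
```

In the excerpt above, the CQ handle from the first line reappears as the last argument of the callback in the second line, which suggests both lines refer to the same completion queue.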


Hi Dmitry

   Switching to Intel MPI 2019 Update 2 and using the release version of libmpi seems to work. The release_mt version of libmpi causes a deadlock.
I will submit a separate post with the stack trace, as it looks like a bug in Intel MPI.

 Thanks

 Jamil

 
