Intel® MPI Library

IntelMPI DAPL Question

drMikeT

Dear MPI team,

I started receiving the messages below from a node after I restarted a slow-running MPI job.

I can tell that they originate from Intel MPI. Do you have any suggestions as to what may be triggering them?

gl0396:SCM:4a7f:aaae7d40: 18 us(18 us):  open_hca: device mlx4_0 not found
gl0396:SCM:4a7f:aaae7d40: 16 us(16 us):  open_hca: device mlx4_0 not found
gl0397:UCM:493a:aaae7d40: 48102 us(48102 us):  create_ah: ERR Invalid argument
[359:gl0397][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
gl0397:UCM:493a:aaae7d40: 48130 us(28 us): UCM connect: snd ERR -> cm_lid 0 cm_qpn ac1009c0 r_psp 4a7f p_sz=24
[356:gl0394][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
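
To see at a glance which hosts, devices, and DAPL providers are involved, messages like the ones above can be tallied with a small script. This is only a minimal sketch: the function name `summarize_dapl_log` and the regular expressions are illustrative assumptions inferred from the six sample lines, not anything provided by Intel MPI or DAPL, and they may not match other message variants.

```python
import re
from collections import Counter

# All patterns below are assumptions derived from the sample lines in this post.
DEVICE_RE = re.compile(r"open_hca: device (\S+) not found")
PROVIDER_RE = re.compile(
    r"error\(0x[0-9a-fA-F]+\): (\S+): could not connect DAPL endpoints: (\w+)"
)
# Host appears either as "[rank:host]" or as a leading "host:" prefix.
HOST_RE = re.compile(r"^(?:\[\d+:(\w+)\]|(\w+):)")


def summarize_dapl_log(lines):
    """Count 'device not found' and 'could not connect' messages per host."""
    missing = Counter()  # keyed by (host, device)
    failed = Counter()   # keyed by (host, provider, DAT error name)
    for line in lines:
        host_match = HOST_RE.match(line)
        if not host_match:
            continue  # skip lines that do not start with a recognizable host
        host = host_match.group(1) or host_match.group(2)
        device = DEVICE_RE.search(line)
        if device:
            missing[(host, device.group(1))] += 1
        provider = PROVIDER_RE.search(line)
        if provider:
            failed[(host, provider.group(1), provider.group(2))] += 1
    return missing, failed


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < job.log  (the file name is hypothetical)
    missing, failed = summarize_dapl_log(sys.stdin)
    for key, count in sorted(missing.items()):
        print("device not found:", key, count)
    for key, count in sorted(failed.items()):
        print("connect failed:", key, count)
```

Run against the lines above, this groups the `open_hca` messages under the device name `mlx4_0` and the connection failures under the provider name `ofa-v2-mlx5_0-1u`. Whether that difference in device names is related to the failures is not something the log alone establishes.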

Thank you!

Michael
