Dear MPI team,
I started receiving the messages below from a node after I restarted a slow-running MPI job.
I can tell they originate from Intel MPI. Do you have any suggestions as to what may be triggering them?
gl0396:SCM:4a7f:aaae7d40: 18 us(18 us): open_hca: device mlx4_0 not found
gl0396:SCM:4a7f:aaae7d40: 16 us(16 us): open_hca: device mlx4_0 not found
gl0397:UCM:493a:aaae7d40: 48102 us(48102 us): create_ah: ERR Invalid argument
[359:gl0397][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
gl0397:UCM:493a:aaae7d40: 48130 us(28 us): UCM connect: snd ERR -> cm_lid 0 cm_qpn ac1009c0 r_psp 4a7f p_sz=24
[356:gl0394][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
Thank you!
Michael