
[6] Assertion failed in file ../../segment.c

Victor_V_
Beginner
1,355 Views

Hi, 

 

We have compiled our parallel code using the latest Intel software stack. We make heavy use of passive-target RMA one-sided PUT/GET operations together with derived datatypes. We are now experiencing a problem where our application sometimes fails with either a segmentation fault or the following error message:

 

 

[6] Assertion failed in file ../../segment.c at line 669: cur_elmp->curcount >= 0

[6] internal ABORT - process 6

 

Intel Inspector reports a problem inside the Intel MPI library:


libmpi_dbg.so.4!MPID_Segment_blkidx_m2m - segment_packunpack.c:313
libmpi_dbg.so.4!MPID_Segment_manipulate - segment.c:552
libmpi_dbg.so.4!MPID_Segment_unpack - segment_packunpack.c:88
libmpi_dbg.so.4!MPIDI_CH3U_Receive_data_found - ch3u_handle_recv_pkt.c:190
libmpi_dbg.so.4!MPIDI_CH3_PktHandler_GetResp - ch3u_rma_sync.c:3691
libmpi_dbg.so.4!MPID_nem_handle_pkt - ch3_progress.c:1477
libmpi_dbg.so.4!MPIDI_CH3I_Progress - ch3_progress.c:498
libmpi_dbg.so.4!MPIDI_Win_unlock - ch3u_rma_sync.c:1959
libmpi_dbg.so.4!PMPI_Win_unlock - win_unlock.c:119

 

Does this mean that something is wrong with the derived datatypes? If so, how can I debug the problem? The problem never appears with OpenMPI.

 

The software stack used:

Intel C/Fortran compilers v15.0.0.090

Intel MPI Library v5.0.1.035

 

Any help will be greatly appreciated!

 

Best,

Victor.

 

2 Replies
James_T_Intel
Moderator

Can you provide a reproducer code?

Victor_V_
Beginner

Dear James,

I am trying to narrow down the code. However, right now I am facing another problem with derived datatypes. Please find the reproducer code enclosed. Just compile it and run it with the following parameters:

mpicc  mpi_tvec2_rma.c -o mpi_tvec2_rma
mpirun -np 8 ./mpi_tvec2_rma 128 40000

With Intel MPI v4.1.3.048 (Intel C compiler v15.0.0), it crashes with the following error message:

Assertion failed in file src/mpid/ch3/src/ch3u_handle_send_req.c at line 61: win_ptr->at_completion_counter >= 0
internal ABORT - process 0

The MPICH developers said that this problem has probably been fixed in the development version of MPICH3; I will check that. However, when I switch to Intel MPI v5.0.1.035, the failure changes and becomes more interesting:

Fatal error in MPI_Win_lock: Other MPI error, error stack:
MPI_Win_lock(165)......................: MPI_Win_lock(lock_type=234, rank=1, assert=0, win=0xa0000000) failed
MPIDI_Win_lock(2702)...................: 
MPIDI_CH3I_Acquire_local_lock(3615)....:  Detected an error while in progress wait for RMA messages
MPIDI_CH3I_Progress(504)...............: 
MPID_nem_handle_pkt(1368)..............: 
MPIDI_CH3_PktHandler_EagerSend(748)....: failure occurred while posting a receive for message data (MPIDI_CH3_PKT_EAGER_SEND)
MPIDI_CH3U_Receive_data_unexpected(253): Out of memory (unable to allocate -1703399408 bytes)

This looks to me like an integer overflow somewhere inside Intel MPI, given the negative allocation size. Could you please have a look at it?

 

With best regards,

Victor.
