Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Internal MPI error as user but not as root

Luis_Diego_C_
Beginner

I'm trying to run a Fortran code, but when I run it I get this message:

Fatal error in PMPI_Waitall: Other MPI error, error stack:
PMPI_Waitall(405)...............: MPI_Waitall(count=5, req_array=0xb0b8c8, status_array=0xb14e68) failed
MPIR_Waitall_impl(221)..........: fail failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error!  cannot read from remote process

 

If I run it as root, it works.

I tried updating my Parallel Studio XE from 2017 to 2017.1, but it didn't make any difference.

Any news/help with this issue?

James_T_Intel
Moderator

Check "ulimit -a" for both root and a normal user. These should be the same.
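
For example, a minimal sketch of the comparison (the "sh -c" form is needed because ulimit is a shell builtin, so it has to run inside a shell):

ulimit -a                  # limits for the current (normal) user
sudo sh -c 'ulimit -a'     # limits for root, for comparison

If the two listings differ (for instance in "max locked memory"), the per-user limits can typically be aligned in /etc/security/limits.conf.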

Victor_G_1
Beginner

Hi

I have got the same error:

Fatal error in PMPI_Gatherv: Other MPI error, error stack:
PMPI_Gatherv(1001)..............: MPI_Gatherv failed(sbuf=0x256d2c0, scount=9210, MPI_DOUBLE, rbuf=0x258c560, rcnts=0x2559f00, displs=0x2559f20, MPI_DOUBLE, root=0, MPI_COMM_WORLD) failed
MPIR_Gatherv_impl(545)..........: fail failed
I_MPIR_Gatherv_intra(611).......: fail failed
MPIR_Gatherv(422)...............: fail failed
MPIC_Irecv(857).................: fail failed
MPID_Irecv(160).................: fail failed
MPID_nem_lmt_RndvRecv(208)......: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(302): fail failed
dcp_recv(165)...................: Internal MPI error!  Cannot read from remote process

Does Intel have an MPI version with a fix for this?

Victor

McCalpinJohn
Honored Contributor III

It is important to check the "ulimit" values from inside the context of the MPI job -- if you log in to the remote node separately, a different set of limits may apply.

You can easily do this by launching a script instead of the MPI executable and having the script echo the hostname and then execute "ulimit -a".
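
Something along these lines, as a sketch (the script name "check_limits.sh" and the application binary "./my_app" are placeholders):

#!/bin/bash
# check_limits.sh: report which host this rank runs on and which
# limits apply there, then hand control to the real MPI executable
echo "== $(hostname) =="
ulimit -a
exec ./my_app "$@"

Launch it with, for example, "mpirun -n 4 ./check_limits.sh" so every rank prints its own host and limits before the application starts.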

Victor_G_1
Beginner

Actually, I've used another workaround: I set "export I_MPI_SHM_LMT=shm", and the issue has been resolved.
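
In case it helps others, a minimal sketch of the workaround (the rank count and the binary name "./my_app" are illustrative):

# use the plain shared-memory LMT path instead of the dcp (direct copy)
# path that appears in the error stacks above
export I_MPI_SHM_LMT=shm
mpirun -n 16 ./my_app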

But it would be nice to just update MPI. Is the issue fixed, or will it be fixed in the near future?

Our group is going to register our software locally (target date is October), and any workaround we use has to be properly documented :(

Victor
