Luis_Diego_C_
Beginner
277 Views

Internal MPI error on user but not as root

I'm trying to run a Fortran code, but when I run it I get this message:

Fatal error in PMPI_Waitall: Other MPI error, error stack:
PMPI_Waitall(405)...............: MPI_Waitall(count=5, req_array=0xb0b8c8, status_array=0xb14e68) failed
MPIR_Waitall_impl(221)..........: fail failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error!  cannot read from remote process

 

If I run it as root, it works.

I tried updating my Parallel Studio XE from 2017 to 2017.1, but it didn't make any difference.

Any news/help with this issue?

5 Replies
James_T_Intel
Moderator

Check "ulimit -a" for both root and a normal user.  These should be the same.
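A quick way to make that comparison, sketched as a shell snippet (the account name "someuser" and the /tmp paths are placeholders):

```shell
# Dump the limits seen by the current (root) shell and by a normal
# user, then diff the two. "someuser" is a placeholder account name.
ulimit -a > /tmp/limits_root.txt
su - someuser -c 'ulimit -a' > /tmp/limits_user.txt

# Any line printed here (e.g. "max locked memory") is a candidate
# cause for the MPI job failing only for the non-root user.
diff /tmp/limits_root.txt /tmp/limits_user.txt
```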

Victor_G_1
Beginner

Hi

I get the same error:

Fatal error in PMPI_Gatherv: Other MPI error, error stack:
PMPI_Gatherv(1001)..............: MPI_Gatherv failed(sbuf=0x256d2c0, scount=9210, MPI_DOUBLE, rbuf=0x258c560, rcnts=0x2559f00, displs=0x2559f20, MPI_DOUBLE, root=0, MPI_COMM_WORLD) failed
MPIR_Gatherv_impl(545)..........: fail failed
I_MPIR_Gatherv_intra(611).......: fail failed
MPIR_Gatherv(422)...............: fail failed
MPIC_Irecv(857).................: fail failed
MPID_Irecv(160).................: fail failed
MPID_nem_lmt_RndvRecv(208)......: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(302): fail failed
dcp_recv(165)...................: Internal MPI error!  Cannot read from remote process

Does Intel have an MPI version with a fix for this?

Victor

McCalpinJohn
Black Belt

It is important to check "ulimit" from inside the context of the MPI job -- if you log in to the remote node interactively, a different set of limits may apply.

You can easily do this by launching a script instead of the MPI executable and having the script echo the hostname and then execute "ulimit -a".
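As a sketch, such a wrapper script might look like the following (the script name is made up, and the PMI_RANK variable is an assumption -- it is set by some Intel MPI process managers, but the exact rank variable may differ between versions):

```shell
#!/bin/bash
# check_limits.sh -- launch this with mpirun in place of the real
# executable so each rank reports the limits it actually inherits
# on the node where it runs.
echo "=== $(hostname) (rank ${PMI_RANK:-?}) ==="
ulimit -a
```

Launched as, e.g., "mpirun -n 4 ./check_limits.sh", it prints one block of limits per rank, which can then be compared against what root sees.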

Victor_G_1
Beginner

Actually, I used a different workaround: setting "export I_MPI_SHM_LMT=shm" resolved the issue.
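For reference, the same setting can be applied per launch instead of in the shell profile; "-genv" is a standard Intel MPI mpirun option for propagating an environment variable to all ranks (the executable name below is a placeholder):

```shell
# Two equivalent ways to apply the workaround for a single run.

# Option 1: export in the launching shell before mpirun.
export I_MPI_SHM_LMT=shm
mpirun -n 16 ./my_app

# Option 2: pass it through mpirun so it reaches every rank.
mpirun -genv I_MPI_SHM_LMT shm -n 16 ./my_app
```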

But it would be nice to just update MPI. Has the issue been fixed, or will it be fixed in the near future?

Our group is going to register our software locally (target date is October), and any workaround we use must be properly described :(

Victor
