Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Fatal Error using MPI in Linux

guccione__pietro

Hi,

I'm using an Ubuntu Linux virtual machine (Linux-VirtualBox 4.4.0-101-generic #124-Ubuntu SMP Fri Nov 10 18:29:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux) with 8 GB of RAM.

For a process in MATLAB, the software requires the Intel MPI runtime package v4.1.3.045 or later. I installed version 2018.1.163 instead, although I'm not sure whether the 2018 version number satisfies that requirement.

When processing with 8 cores, the software failed with the following error:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x7f566d59c040, count=9942500, MPI_FLOAT, src=3, tag=5, MPI_COMM_WORLD, status=0x7ffc43a72b60) failed
PMPIDI_CH3I_Progress(658).......: fail failed
MPID_nem_handle_pkt(1450).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(302): fail failed
dcp_recv(165)...................: Internal MPI error!  Cannot read from remote process
 Two workarounds have been identified for this issue:
 1) Enable ptrace for non-root users with:
    echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
 2) Or, use:
    I_MPI_SHM_LMT=shm


When I reduce the number of cores to 4, the process hangs for more than 3 hours, and I'm not sure whether it is still working.

What could be the problem?

Thank you,

Pietro

Arige__Krishna

As the error message suggests, you can set the following environment variable before starting the job:

I_MPI_SHM_LMT=shm

You can read more details about it at 

https://software.intel.com/en-us/mpi-developer-reference-linux-shared-memory-control

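LMT stands for the large message transfer mechanism for shared memory. Setting it to shm makes Intel MPI copy large intra-node messages through a shared-memory buffer instead of reading them directly from the remote process's memory, which is the step that fails in dcp_recv ("Cannot read from remote process").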
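A minimal shell sketch of this first workaround, assuming the MPI processes are started from the same shell (for example, by launching MATLAB from it) so that they inherit the variable. The program name my_app is a hypothetical placeholder, not something from this thread.

```shell
#!/bin/sh
# Select the shared-memory copy ("shm") for large intra-node messages
# instead of the direct-copy path that fails in dcp_recv.
export I_MPI_SHM_LMT=shm

# Show the value that child processes started from this shell will inherit.
echo "I_MPI_SHM_LMT=${I_MPI_SHM_LMT}"

# Hypothetical launch lines (my_app is a placeholder):
#   mpirun -n 8 ./my_app
# or pass the variable only to this run with the -genv option of mpirun:
#   mpirun -genv I_MPI_SHM_LMT shm -n 8 ./my_app
```

The variable has to be visible to the MPI ranks themselves; exporting it in a different terminal has no effect on a job started elsewhere.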


Alternatively, you can change the ptrace_scope value from 1 to 0 with:

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

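However, this may not be advisable, because it relaxes a kernel security restriction: with ptrace_scope set to 0, a process can attach to any other process running under the same user ID.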

You can read more about the ptrace_scope values at:

https://www.kernel.org/doc/Documentation/security/Yama.txt
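Before choosing between the two workarounds, it may help to check which ptrace_scope value is currently active. The sketch below is only an illustration: check_ptrace_scope is a hypothetical helper name, and the commented sysctl lines assume Ubuntu's default value of 1, which the reply above also mentions.

```shell
#!/bin/sh
# Report the Yama ptrace_scope setting referred to in the Intel MPI error.
# The optional first argument overrides the file path (useful for testing).
check_ptrace_scope() {
    scope_file=${1:-/proc/sys/kernel/yama/ptrace_scope}
    if [ ! -r "$scope_file" ]; then
        echo "ptrace_scope: not available (Yama may be disabled)"
        return 0
    fi
    scope=$(cat "$scope_file")
    echo "ptrace_scope: $scope"
    if [ "$scope" != "0" ]; then
        # With a restricted scope, sibling MPI ranks may be unable to read
        # each other's memory, so the direct-copy transfer can fail.
        echo "ptrace is restricted; I_MPI_SHM_LMT=shm avoids changing it"
    fi
}

check_ptrace_scope

# If you do lower it, the change lasts only until reboot; restore it when done:
#   sudo sysctl -w kernel.yama.ptrace_scope=0
#   sudo sysctl -w kernel.yama.ptrace_scope=1
```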
