When I use mpirun, I get the following error. I need your help.
Fatal error in PMPI_Send
PMPI_Send(177) : MPI_Send(buf=xxxxxx, count=1,MPI_INT,dest=0,tag=1,MPI_COMM_WORLD) failed
MPID_Send(256)
MPIDI_OFI_send_lightweight(52)
MPIDI_OFI_send_handler(704): OFI tagged inject failed (ofi_impl.h:704: MPIDI_OFI_send_handler: No such file or directory)
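For reference, the failing call in the trace is a send of a single MPI_INT to rank 0 with tag 1. A minimal sketch of a program with that call shape (just an illustration, not the actual program I am running) would be:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Rank 0 receives one int from every other rank. */
        int value, i;
        for (i = 1; i < size; i++) {
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 0 received %d\n", value);
        }
    } else {
        /* This send has the same shape as the one in the trace above:
           count=1, MPI_INT, dest=0, tag=1, MPI_COMM_WORLD. */
        int value = rank;
        MPI_Send(&value, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}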
Hi,
From the error you have posted, we are not sure where MPI_Send() failed.
Can you share the source code, or any code that reproduces the issue, so that we can debug it on our side?
If sharing code isn't possible, you can check the correctness of your code using ITAC (Intel Trace Analyzer and Collector):
Source itacvars.sh: source <install_dir>/2019.x.xx/bin/itacvars.sh
and then run your MPI program with the -check_mpi flag:
mpirun -np <> -check_mpi ./program
For more information, please check: https://software.intel.com/content/www/us/en/develop/documentation/itc-user-and-reference-guide/top/user-guide/correctness-checking/correctness-checking-of-mpi-applications.html
Post the logs after running with ITAC and setting I_MPI_DEBUG=10
export I_MPI_DEBUG=10
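For example, the whole sequence might look like this (assuming ITAC 2019 is installed under /opt/intel/itac; adjust the install path, version directory, and process count for your system):
source /opt/intel/itac/2019.x.xx/bin/itacvars.sh
export I_MPI_DEBUG=10
mpirun -np 2 -check_mpi ./program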
Regards
Prasanth
Hi,
Thanks for your response.
I am using Intel MPI 2019.0.117 and the out-of-the-box source code in /opt/intel/impi/2019.0.117/test/test.c:
# pwd
/opt/intel/impi/2019.0.117/test
# mpiicc -cc=gcc test.c -o testc
......
# pwd
/opt/intel/impi/2019.0.117/intel64/bin
# export I_MPI_DEBUG=10
# mpirun -np 80 -ppn 1 -hosts master.localdomain,node.localdomain ../../test/testc
[0] MPI startup(): libfabric version 1.6.1a1-impi
[0] MPI startup(): libfabric version provider: sockets
It waited for a long time with no response, so I pressed Ctrl+C.
Then I exited and ssh'd back to root@master.localdomain (it connected without a login prompt).
# cd /opt/intel/impi/2019.0.117/intel64/bin
# mpirun -np 80 -ppn 1 -hosts master.localdomain,node.localdomain ../../test/testc
helloworld: rank 0 of 80 running on master.localdomain
helloworld: rank 1 of 80 running on node.localdomain
.....
helloworld: rank 79 of 80 running on node.localdomain
Now it works.
Sometimes I need to exit, ssh in again, and then run it; then it is OK.
Maybe it is an environment issue; I need to track it down.
Hi,
Based on the documentation on Intel MPI errors, we think this error might be due to an interconnect and provider mismatch.
Please refer to this for more details: https://software.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-windows/top/troubleshooting/error-message-fatal-error.html
Do both of your nodes (master.localdomain and node.localdomain) have the same type of hardware interconnect?
Could you please provide us with all the providers available on your nodes? You can get them by running fi_info.
Also, please share the full I_MPI_DEBUG log with us.
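For example (the host names are taken from your earlier runs, and the log file name is arbitrary):
fi_info -l
export I_MPI_DEBUG=10
mpirun -np 2 -ppn 1 -hosts master.localdomain,node.localdomain ../../test/testc 2>&1 | tee impi_debug.log
fi_info -l prints just the provider names; plain fi_info prints the full details for each provider.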
Regards
Prasanth
Hi,
It looks like the problem is with the configuration of the NIC in the master.localdomain node. Could you please check that configuration and see whether everything is all right?
Also, along with the I_MPI_DEBUG logs I asked for earlier, could you provide logs after setting FI_LOG_LEVEL=DEBUG?
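For example, a run like this should capture both sets of logs (again, the log file name is arbitrary):
export I_MPI_DEBUG=10
export FI_LOG_LEVEL=debug
mpirun -np 2 -ppn 1 -hosts master.localdomain,node.localdomain ../../test/testc 2>&1 | tee fi_debug.log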
Regards
Prasanth
Hi,
We are closing this thread assuming your problem is resolved.
If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Regards
Prasanth
