Hello,
I was running an MPI job on multiple nodes with Intel MPI 2021.1.1, and the jobs aborted with the following error:
[1690463626.483072] [n148:434957:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434958 length=42432) returned -1: No such process
[1690463626.521152] [n148:434969:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434968 length=42432) returned -1: No such process
[1690463626.522181] [n148:434970:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434969 length=42432) returned -1: No such process
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 434942 RUNNING AT n148
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 434942 RUNNING AT n148
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
...
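The three UCX errors above come from UCX's CMA (cross-memory attach) transport (`cma_ep.c`), which copies intra-node data with the Linux `process_vm_readv` system call. `No such process` is the standard text for `ESRCH`, which that call returns when the target PID does not exist. So each line means a rank on n148 tried to read from a neighbouring process that had already exited, which is consistent with the `KILLED BY SIGNAL: 9` banner; the log itself does not show what sent the signal. As a minimal sketch, assuming the output is saved as plain text with one message per line in exactly the format quoted above, the hypothetical helper `missing_peers` collects the vanished peer PIDs per host:

```python
import re
from collections import defaultdict

# Matches UCX CMA error lines in the format quoted above (assumed format), e.g.
# [1690463626.483072] [n148:434957:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434958 length=42432) returned -1: No such process
UCX_CMA_ERROR = re.compile(
    r"\[(?P<ts>[\d.]+)\]\s+\[(?P<host>[^:\]]+):(?P<reader>\d+):\d+\]\s+"
    r"cma_ep\.c:\d+\s+UCX\s+ERROR\s+process_vm_readv\(pid=(?P<peer>\d+)\s+"
    r"length=(?P<length>\d+)\)\s+returned\s+-1:\s+(?P<reason>.+)$"
)


def missing_peers(log_lines):
    """Map each host to the sorted peer PIDs that process_vm_readv could not find (ESRCH)."""
    peers = defaultdict(set)
    for line in log_lines:
        m = UCX_CMA_ERROR.search(line.strip())
        # Only "No such process" means the peer PID had already disappeared.
        if m and m.group("reason") == "No such process":
            peers[m.group("host")].add(int(m.group("peer")))
    return {host: sorted(pids) for host, pids in peers.items()}


if __name__ == "__main__":
    log = [
        "[1690463626.483072] [n148:434957:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434958 length=42432) returned -1: No such process",
        "[1690463626.521152] [n148:434969:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434968 length=42432) returned -1: No such process",
        "= KILLED BY SIGNAL: 9 (Killed)",
    ]
    print(missing_peers(log))  # {'n148': [434958, 434968]}
```

Those PIDs could then be compared against the node's kernel log or the batch scheduler's job records to find out what sent SIGKILL (for example an out-of-memory kill or an enforced job limit); that is a debugging suggestion, not a cause confirmed in this thread.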
Under what conditions does this error occur?
It is difficult to provide detailed information such as the execution script, but I hope to get some clues for resolving this error.
The MPI job was running on 16 nodes, and the same job was running on other nodes at the same time.
The OS, kernel, OFED, and UCX versions are below:
OS: CentOS 8.4
kernel: 4.18.0-305.25.1.el8_4.x86_64
OFED: MLNX_OFED_LINUX-4.9-4.0.8.0
UCX: 1.8.0
Thanks,
1kan
Hi,
Thanks for posting in the Intel forums.
Could you please let us know whether you are facing a similar issue with the latest version of Intel oneAPI 2023.2?
Could you please also try a supported OS version? For more details, please refer to the link below.
Please provide the complete debug log with I_MPI_DEBUG=10 set, along with the command line you have been using.
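One way to capture both items requested above in a single step is to set `I_MPI_DEBUG` only in the job's environment and log the exact command line next to the output. The sketch below assumes Intel MPI's `mpirun` with its `-n`, `-ppn`, and `-hosts` options; the helper names (`debug_run`, `run_and_log`), the application `./my_app`, the process counts, and the host `n149` are hypothetical placeholders, not taken from the thread:

```python
import os
import shlex
import subprocess


def debug_run(app_cmd, nprocs, ppn, hosts, debug_level=10):
    """Return (argv, env) for an Intel MPI run with I_MPI_DEBUG output enabled."""
    env = dict(os.environ)
    env["I_MPI_DEBUG"] = str(debug_level)  # higher levels print more startup/fabric detail
    argv = ["mpirun", "-n", str(nprocs), "-ppn", str(ppn), "-hosts", ",".join(hosts)]
    argv += list(app_cmd)
    return argv, env


def run_and_log(argv, env, log_path):
    """Run the job, recording the command line and merged stdout/stderr in one file."""
    with open(log_path, "w") as log:
        log.write("# command: " + shlex.join(argv) + "\n")
        log.flush()  # make sure the header precedes the child's output
        return subprocess.run(argv, env=env, stdout=log, stderr=subprocess.STDOUT).returncode


if __name__ == "__main__":
    argv, env = debug_run(["./my_app"], nprocs=8, ppn=4, hosts=["n148", "n149"])
    print(shlex.join(argv))  # mpirun -n 8 -ppn 4 -hosts n148,n149 ./my_app
    # run_and_log(argv, env, "impi_debug.log")  # uncomment on a system with Intel MPI
```

Writing stdout and stderr to the same file keeps the debug lines and any UCX errors in their original order, which makes the log easier to read when it is attached to a post.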
Thanks & Regards
Shivani
Hi Shivani,
Thank you for your reply.
Intel oneAPI 2023.2 is not installed on the system I am using.
I just want to run MPI jobs using Intel oneAPI 2021.1.1.
If you know anything about what causes this error, please let me know.
Thanks,
1kan
Hi,
Please provide the complete debug log with I_MPI_DEBUG=10 set, along with the command line you have been using.
Could you please provide us with a sample reproducer and the steps to reproduce the issue at our end?
Could you also please let us know whether you are able to run your application on a single node, and the Intel MPI Benchmarks on multiple nodes? This will help us investigate the issue at our end.
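The two checks requested above separate problems in the application itself from problems in the inter-node setup: if the application runs cleanly on one node and the benchmarks also run across nodes, the multi-node application run becomes the focus. The sketch below only builds the two command lines; it assumes Intel MPI's `mpirun` options `-n`, `-ppn`, and `-hosts` and the `IMB-MPI1` benchmark binary with benchmark names given as arguments. The helper name `isolation_runs`, the application `./my_app`, the host `n149`, and the choice of `PingPong` and `Allreduce` are hypothetical examples:

```python
import shlex


def isolation_runs(app_cmd, hosts, ppn):
    """Build a single-node application run and a multi-node IMB-MPI1 run."""
    # Application confined to the first host: no inter-node traffic at all.
    single_node = ["mpirun", "-n", str(ppn), "-ppn", str(ppn), "-hosts", hosts[0]]
    single_node += list(app_cmd)
    # Benchmarks across every host with the same processes-per-node layout.
    multi_node_imb = [
        "mpirun", "-n", str(ppn * len(hosts)), "-ppn", str(ppn),
        "-hosts", ",".join(hosts), "IMB-MPI1", "PingPong", "Allreduce",
    ]
    return {"single_node_app": single_node, "multi_node_imb": multi_node_imb}


if __name__ == "__main__":
    for name, argv in isolation_runs(["./my_app"], ["n148", "n149"], ppn=4).items():
        print(f"{name}: {shlex.join(argv)}")
```

Keeping the same processes-per-node value in both runs means the intra-node (shared-memory) traffic pattern is comparable between them.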
Thanks & Regards
Shivani
Hi,
As we have not heard back from you, could you please respond to my previous post?
Thanks & Regards
Shivani
Hi Shivani,
Sorry for the late reply.
As you suggested, I will consider using Intel oneAPI 2023.2 to run MPI jobs.
Thanks,
1kan
Hi,
Could you please let us know whether you are facing a similar issue with the latest version of Intel oneAPI 2023.2?
Thanks & Regards
Shivani
Hi,
As we have not heard back from you, could you please respond to my previous post?
Thanks & Regards
Shivani
Hi,
We have not heard back from you. This thread will no longer be monitored by Intel. If you need any further assistance, please post a new question.
Thanks & Regards
Shivani