Hello,
I was running an MPI job on multiple nodes with Intel MPI 2021.1.1, and the job aborted with the following error:
[1690463626.483072] [n148:434957:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434958 length=42432) returned -1: No such process
[1690463626.521152] [n148:434969:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434968 length=42432) returned -1: No such process
[1690463626.522181] [n148:434970:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434969 length=42432) returned -1: No such process
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 434942 RUNNING AT n148
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 434942 RUNNING AT n148
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
...
Under what conditions does this error occur?
It is difficult to provide detailed information such as the execution script, but I hope to get some clues for resolving this error.
The MPI job was running on 16 nodes, and the same job was running on other nodes at the same time.
Information on the OS, kernel, and UCX versions is below:
OS: CentOS 8.4
kernel: 4.18.0-305.25.1.el8_4.x86_64
OFED: MLNX_OFED_LINUX-4.9-4.0.8.0
UCX: 1.8.0
Thanks,
1kan
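To make the UCX messages quoted above easier to triage, the following is a minimal Python sketch that extracts the host, the reporting process, and the target PID from each `process_vm_readv` error line. The regular expression is inferred only from the three log lines in this post, and the names `UCX_CMA_ERROR` and `parse_cma_errors` are hypothetical helpers, not part of UCX or Intel MPI.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official UCX log format specification).
UCX_CMA_ERROR = re.compile(
    r"\[(?P<ts>\d+\.\d+)\]\s+"
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+):\d+\]\s+"
    r"cma_ep\.c:\d+\s+UCX\s+ERROR\s+"
    r"process_vm_readv\(pid=(?P<target>\d+) length=(?P<length>\d+)\)\s+"
    r"returned (?P<ret>-?\d+): (?P<reason>.+)"
)


def parse_cma_errors(log_text):
    """Return one dict per UCX CMA process_vm_readv error line found."""
    records = []
    for line in log_text.splitlines():
        match = UCX_CMA_ERROR.search(line)
        if match is None:
            continue  # skip lines that are not CMA read errors
        records.append({
            "timestamp": float(match["ts"]),
            "host": match["host"],
            "reader_pid": int(match["pid"]),   # process that logged the error
            "target_pid": int(match["target"]),  # process it tried to read from
            "length": int(match["length"]),
            "reason": match["reason"].strip(),
        })
    return records


# Sample input: the three error lines copied from the post above.
SAMPLE_LOG = """\
[1690463626.483072] [n148:434957:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434958 length=42432) returned -1: No such process
[1690463626.521152] [n148:434969:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434968 length=42432) returned -1: No such process
[1690463626.522181] [n148:434970:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434969 length=42432) returned -1: No such process
"""

records = parse_cma_errors(SAMPLE_LOG)
errors_per_host = Counter(r["host"] for r in records)

for r in records:
    print(f'{r["host"]}: pid {r["reader_pid"]} could not read pid {r["target_pid"]} ({r["reason"]})')
```

Grouping the records by host or by target PID may help show whether the failures cluster on one node or around processes that had already exited when the read was attempted.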
Hi,
Thanks for posting in the Intel forums.
Could you please let us know whether you are facing a similar issue with the latest version of Intel oneAPI 2023.2?
Could you please try with a supported OS version? For more details, please refer to the link below.
Please provide the complete debug log, obtained by setting I_MPI_DEBUG=10, and also the command line you have been using.
Thanks & Regards
Shivani
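As a rough illustration of the request above, the sketch below builds a launch command with `I_MPI_DEBUG=10` set in the environment and writes the combined output to a log file. The application path `./my_app`, the process count, the log file name, and the helper names `build_debug_run` and `run_and_capture` are all assumptions for illustration; the actual command line used for the failing job is not given in this thread.

```python
import os
import shutil
import subprocess


def build_debug_run(app_cmd, nprocs, debug_level=10, base_env=None):
    """Return (cmd, env) for an mpirun launch with I_MPI_DEBUG set."""
    env = dict(os.environ if base_env is None else base_env)
    env["I_MPI_DEBUG"] = str(debug_level)  # debug level requested in the reply
    cmd = ["mpirun", "-n", str(nprocs), *app_cmd]
    return cmd, env


def run_and_capture(cmd, env, log_path):
    """Run the command and save stdout and stderr to log_path.

    Returns the exit code, or None if the launcher is not on PATH.
    """
    if shutil.which(cmd[0], path=env.get("PATH")) is None:
        return None
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


# Hypothetical application and process count, for illustration only.
cmd, env = build_debug_run(["./my_app"], nprocs=4)
print(" ".join(cmd))
# To actually launch and collect the log (requires Intel MPI in the environment):
# exit_code = run_and_capture(cmd, env, "impi_debug.log")
```

The resulting log file, together with the printed command line, is the kind of information that can be attached to a reply.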
Hi Shivani,
Thank you for your reply.
Intel oneAPI 2023.2 is not installed on the system I am using.
I just want to run MPI jobs using Intel oneAPI 2021.1.1.
If you know anything about what causes this error, please let me know.
Thanks,
1kan
Hi,
Please provide the complete debug log, obtained by setting I_MPI_DEBUG=10, and also the command line you have been using.
Could you please provide us with a sample reproducer and the steps to reproduce the issue at our end?
Could you also please let us know whether you are able to run your application on a single node, and the Intel MPI Benchmarks on multiple nodes? This will help us investigate the issue at our end.
Thanks & Regards
Shivani
Hi,
As we have not heard back from you, could you please respond to my previous post?
Thanks & Regards
Shivani
Hi Shivani,
Sorry for the late reply.
As you suggested, I will consider using Intel oneAPI 2023.2 to run MPI jobs.
Thanks,
1kan
Hi,
Could you please let us know whether you are facing a similar issue with the latest version of Intel oneAPI 2023.2?
Thanks & Regards
Shivani
Hi,
As we have not heard back from you, could you please respond to my previous post?
Thanks & Regards
Shivani
Hi,
We have not heard back from you. This thread will no longer be monitored by Intel. If you need any further assistance, please post a new question.
Thanks & Regards
Shivani