Intel® MPI Library

UCX ERROR with Intel MPI 2021.1.1

1kan
Beginner

Hello,

I was running an MPI job on multiple nodes with Intel MPI 2021.1.1, and the job aborted with the following error:

 

[1690463626.483072] [n148:434957:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434958 length=42432) returned -1: No such process
[1690463626.521152] [n148:434969:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434968 length=42432) returned -1: No such process
[1690463626.522181] [n148:434970:0] cma_ep.c:62 UCX ERROR process_vm_readv(pid=434969 length=42432) returned -1: No such process

 

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 434942 RUNNING AT n148
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 434942 RUNNING AT n148
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
...
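The banner reports how rank 0 ended, not an MPI error code: it was terminated by signal 9 (SIGKILL), which cannot be caught, so the application has no chance to print its own error. Common senders on Linux include the kernel OOM killer, a batch scheduler enforcing limits, or a manual `kill -9`. The sketch below is generic POSIX C, not Intel MPI launcher code, and the function name `observe_kill_signal` is hypothetical. It shows how a parent process learns that a child was killed by signal 9.

```c
#define _GNU_SOURCE            /* for strsignal() in callers */
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that is terminated by SIGKILL (standing
 * in for a rank killed by, e.g., the OOM killer) and return the signal
 * number the parent observes, or -1 if fork/waitpid failed or the child
 * exited normally. */
int observe_kill_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: SIGKILL cannot be caught, blocked, or ignored, so the
         * process ends here without running any cleanup or error handlers. */
        raise(SIGKILL);
        _exit(0);              /* not reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* WIFSIGNALED: the child was terminated by a signal.
     * WTERMSIG: which signal it was (SIGKILL is 9 on Linux). */
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return -1;
}
```

On Linux with glibc, `observe_kill_signal()` returns 9, and `strsignal(9)` yields the same "Killed" text that appears in the banner.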

 

Under what conditions does this error occur?
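For context on this question: `cma_ep.c` is part of UCX's CMA (cross-memory attach) transport. That transport copies data between ranks on the same node with the Linux `process_vm_readv()` system call. "No such process" is the `strerror()` text for `ESRCH`, which that call returns when the target PID no longer exists. This suggests the peer ranks on n148 (PIDs 434958, 434968, 434969) had already terminated when their neighbours tried to read from them. That fits the SIGKILL in the banner, so the UCX messages are likely a symptom of ranks being killed rather than the root cause. The sketch below is plain Linux C, not UCX's implementation. The helpers `cma_read` and `read_from_exited_peer` are hypothetical, and the 42432-byte length is copied from the log only for illustration.

```c
#define _GNU_SOURCE            /* process_vm_readv() is a GNU/Linux extension */
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Bytes to copy; 42432 is the length from the log, used only for illustration. */
#define READ_LEN 42432

/* Hypothetical helper: copy len bytes starting at remote_addr in process pid
 * into local with a single process_vm_readv() call (Linux cross-memory
 * attach). Returns bytes read, or -1 with errno set, like the syscall. */
ssize_t cma_read(pid_t pid, void *local, void *remote_addr, size_t len)
{
    struct iovec liov = { .iov_base = local, .iov_len = len };
    struct iovec riov = { .iov_base = remote_addr, .iov_len = len };
    return process_vm_readv(pid, &liov, 1, &riov, 1, 0);
}

/* Hypothetical demo: the "peer rank" exits and is reaped before we read from
 * it. Returns the errno from the failed read, 0 if the read unexpectedly
 * succeeded, or -1 if fork/waitpid failed. */
int read_from_exited_peer(void)
{
    static char buf[READ_LEN];

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);              /* child: the peer disappears immediately */

    /* Reap the child; its PID now names no process (barring rare PID reuse). */
    if (waitpid(pid, NULL, 0) != pid)
        return -1;

    /* The forked child had buf at the same virtual address, so this is the
     * address a live peer would have exposed. */
    if (cma_read(pid, buf, buf, READ_LEN) == -1)
        return errno;          /* expected ESRCH: "No such process" */
    return 0;
}
```

On a typical Linux system `read_from_exited_peer()` returns `ESRCH`, and `strerror(ESRCH)` gives the same "No such process" text shown in the UCX error lines.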

It is difficult for me to share details such as the job script, but I hope to get some clues for resolving this error.

The MPI job was running on 16 nodes, and the same job was running on other nodes at the same time.

The OS, kernel, OFED, and UCX versions are below:
  OS: CentOS 8.4
  kernel: 4.18.0-305.25.1.el8_4.x86_64
  OFED: MLNX_OFED_LINUX-4.9-4.0.8.0
  UCX: 1.8.0

 

Thanks,

1kan

ShivaniK_Intel
Moderator

Hi,


Thanks for posting in the Intel forums.


Could you please let us know whether you are facing a similar issue with the latest version of Intel oneAPI 2023.2?


Could you please try with a supported OS version? For more details, please refer to the link below.


https://www.intel.com/content/www/us/en/developer/articles/system-requirements/mpi-library-system-requirements.html


Please provide the complete debug log with I_MPI_DEBUG=10 set, along with the command line you have been using.


Thanks & Regards

Shivani


1kan
Beginner

Hi Shivani,

 

Thank you for your reply.

Intel oneAPI 2023.2 is not installed on the system I am using.
I just want to run MPI jobs using Intel oneAPI 2021.1.1.
If you know anything about what causes this error, please let me know.

 

Thanks,

1kan

ShivaniK_Intel
Moderator

Hi,


Please provide the complete debug log with I_MPI_DEBUG=10 set, along with the command line you have been using.


Could you please provide a sample reproducer and the steps to reproduce the issue on our end?


Could you also please let us know whether you are able to run your application on a single node, and whether the Intel MPI Benchmarks run successfully across multiple nodes? This will help us investigate the issue on our end.


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,


As we have not heard back from you, could you please respond to my previous post?


Thanks & Regards

Shivani


1kan
Beginner

Hi Shivani,

 

Sorry for the late reply.

As you suggested, I will consider using Intel oneAPI 2023.2 to run MPI jobs.

 

Thanks,

1kan

ShivaniK_Intel
Moderator

Hi,


Could you please let us know whether you are facing a similar issue with the latest version of Intel oneAPI 2023.2?


Thanks & Regards

Shivani



ShivaniK_Intel
Moderator

Hi,


As we have not heard back from you, could you please respond to my previous post?


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,

We have not heard back from you. This thread will no longer be monitored by Intel. If you need any further assistance, please post a new question.


Thanks & Regards

Shivani

