Hello,
I would like to ask about the maximum number of MPI processes supported by Intel MPI 2019.0.5.
Specifically, we were conducting massively parallel tests on a Knights Landing cluster. The total core count was 4096 x 64 ≈ 260K, at which point the test hung without updating its output file for an extended period of time. We suspect this may be due to an intrinsic limitation of Intel MPI.
Clarification is much appreciated.
Regards.
Tags: Cluster Computing, General Support, Intel® Cluster Ready, Message Passing Interface (MPI), Parallel Computing
Hi,
Can you provide details of the application you are running?
Can you tell us how long the Intel MPI job ran before giving this error?
In some cases, the job scheduler has a time limit after which it kills the job. When do you get the errors: immediately after you issue the launch command, or after a delay of, say, an hour?
You can go through this link for more details regarding this error:
Can you tell us which application you were using for this test?
Please also check whether any resource limits are set on your system.
Thanks
Hi,
By default, Intel MPI does not limit the number of processes you can launch.
Please refer to the Intel MPI Developer Reference, section 2.3.1, for details.
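As an illustration only (not code from this thread), a minimal sanity-check program can confirm how many ranks the launcher actually started at scale; the compile/launch lines in the comment use placeholder host file names and rank counts:

```c
/* check_ranks.c - minimal sanity check for large-scale launches.
 * Example build and launch (placeholder hostfile and counts):
 *   mpiicc check_ranks.c -o check_ranks
 *   mpirun -f hostfile -n 32768 -ppn 64 ./check_ranks
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Only rank 0 prints, to avoid flooding output at ~32K ranks. */
    if (rank == 0)
        printf("MPI_COMM_WORLD size: %d\n", size);

    MPI_Finalize();
    return 0;
}
```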
Could you please provide more details about the error you are getting so that we can help you?
Regards
Prasanth
Hi.
Thanks for the clarification. The following error messages were observed at 512 nodes, i.e. approximately 32K processes.
At the moment, we cannot push beyond this limit.
[proxy:0:0@node2948] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:362): write error (Broken pipe)
[mpiexec@node2948] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:362): write error (Bad file descriptor)
[mpiexec@node2948] wait_proxies_to_terminate (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:528): downstream from host node4001 exited with status 255
Hi,
Could you please let us know whether your issue has been resolved?
If not, please let us know so that we can help you further.
Regards
Prasanth
Hi Prasanth,
I apologize for the lack of response on my part.
Since the forum transition, I have been waiting for comments from Intel staff as well. The last notification email I received was dated June 25th, as you can see in the attachments. As a result, I completely missed your replies until now.
The link provided by Goutham helped us narrow the issue down to the out-of-memory (OOM) killer, since VASP 5 crashes during the initialization stage. No time limit was imposed on the scheduler side.
We are monitoring the job. If no further issues arise, I will accept Goutham's answer as the solution and close the thread.
Regards.
Hi Viet,
Could you please let us know whether your issue has been resolved?
If not, please let us know the error you are getting during initialization so that we can help you.
Regards
Prasanth
I've accepted the solution.
The problem was solved by reducing the memory used by the computation, as suggested by Intel staff.
Before:
256 nodes: OK
256 ~ 4096 nodes: bad file descriptor
After reducing memory:
1024 nodes: OK
1024 ~ 4096 nodes: bad file descriptor
For now, we use the 1024-node performance as our reference. It is difficult to construct an input that can run at the 4096-node limit without exhausting the KNL's limited memory; yet if the input is not large enough, some ranks receive no data, which leads to an internal error. Also, we only have a small window each month for testing, during maintenance.
In any case, we greatly appreciate your insights and follow-through.
As I understand it, the more MPI ranks there are, the more per-node memory the MPI library uses, probably increasing linearly. Before closing the issue, could you kindly comment on this point?
Regards.
Hi Viet,
Yes, as the number of MPI processes increases, the library takes up more memory, which inherently limits the total number of processes you can launch. However, the increase is not linear.
Since you are running into memory issues, we suggest using a hybrid MPI/OpenMP programming model to reduce memory usage.
You can launch one rank per node and then spawn threads within that process using OpenMP. This will reduce the memory footprint of the MPI library; see the sketch below.
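As an illustration only (not code from this thread), a minimal hybrid MPI/OpenMP sketch might look like the following; the launch line in the comment assumes 64 cores per KNL node, and the host file name and rank count are placeholders:

```c
/* hybrid_hello.c - minimal MPI + OpenMP hybrid sketch.
 * Example build and launch, one rank per node with 64 OpenMP threads
 * (placeholder values):
 *   mpiicc -qopenmp hybrid_hello.c -o hybrid_hello
 *   OMP_NUM_THREADS=64 mpirun -f hostfile -n 1024 -ppn 1 ./hybrid_hello
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* MPI_THREAD_FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Each rank spawns OpenMP threads that share the rank's memory,
     * so the MPI library's per-process footprint is paid once per node
     * instead of once per core. */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        if (tid == 0 && rank == 0)
            printf("%d ranks x %d threads per rank\n", nranks, nthreads);
    }

    MPI_Finalize();
    return 0;
}
```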
Also, you can check the Multiple Endpoints support (https://software.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top/additional-supported-features/multiple-endpoints-support.html) if you are planning to use the hybrid programming model.
Regards
Prasanth
We will take your suggestion regarding the hybrid MPI/OpenMP approach into consideration.
Thanks.
Hi Viet,
Thanks for the confirmation. This issue has been resolved and we will no longer respond to this thread. Please start a new thread if you require additional assistance from Intel. Any further interaction in this thread will be considered community only.
Regards
Prasanth