Intel® MPI Library

IMPI oversubscribing CPUs to ranks

Tim_Pook
Beginner
1,509 Views

 

Context:

Running a job via the PBS Pro batch scheduler on a compute node with 128 cores.

When requesting 64 cores for the job, only 32 cores are used (found via htop).

When requesting 128 cores for the same job, it uses all 128 cores.

No hyperthreading.

64 core job:

Pins CPU 0 to ranks 0 and 32, CPU 1 to ranks 1 and 33, etc. Thus, CPUs 32-63 are ignored.
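
For anyone trying to picture the symptom, here is a minimal sketch of how that mapping can arise. It assumes a simple wrap-around placement (rank modulo the number of usable CPUs), which is only an illustrative model and not Intel MPI's actual pinning algorithm. The names `pin_ranks` and `ranks_per_cpu` are hypothetical helpers made up for this example.

```python
def pin_ranks(num_ranks, allowed_cpus):
    """Assign each rank one CPU, wrapping around when ranks outnumber CPUs.

    Illustrative model only (assumption): real MPI launchers use more
    elaborate, topology-aware placement.
    """
    return {rank: allowed_cpus[rank % len(allowed_cpus)]
            for rank in range(num_ranks)}


def ranks_per_cpu(mapping):
    """Invert a rank -> CPU mapping into CPU -> list of ranks."""
    shared = {}
    for rank, cpu in mapping.items():
        shared.setdefault(cpu, []).append(rank)
    return shared


if __name__ == "__main__":
    # 64 ranks requested, but only CPUs 0-31 are visible to the launcher.
    mapping = pin_ranks(64, list(range(32)))
    for cpu, ranks in sorted(ranks_per_cpu(mapping).items())[:2]:
        print(f"cpu {cpu}: ranks {ranks}")
```

With 64 ranks and only 32 usable CPUs, this model puts ranks 0 and 32 on CPU 0 and ranks 1 and 33 on CPU 1, matching the pattern described above. With 64 usable CPUs every rank gets its own core.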

[Image: 64 cores - MPI debug]

These are the other enabled environment variables:

[Image: Other envvars]

 

So far I've worked around this with I_MPI_HYDRA_TOPOLIB=ipl, but this also causes other issues when running jobs over InfiniBand, so it isn't ideal. Also, the pinning behaviour isn't desirable, as shown in the screenshot below.

Screenshot 2021-09-15 at 12.51.33 PM.png

 

Any advice on how to enforce proper process pinning would be very helpful.

 

1 Solution
Tim_Pook
Beginner
1,483 Views

Managed to fix this problem.

I thought it was an issue with MPI, since it only occurred after we updated from Intel MPI 2018 to Intel MPI 2019, but it was actually caused by PBS / cgroups: the scheduler was forcing the job to use only 32 cores.
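
A quick way to spot this kind of scheduler-imposed limit from inside a job is to compare the number of CPUs the process is allowed to run on with the number of cores requested. The following is a minimal sketch, assuming a Linux compute node. `os.sched_getaffinity` and the kernel's `0-31,64` style CPU list format are standard, but the helper names and the hard-coded `requested = 64` are hypothetical and only for illustration.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as '0-31' or '0-3,8,10-11'."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def allowed_cpus():
    """Return the CPUs this process may run on (affinity mask on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity: all online CPUs.
    return list(range(os.cpu_count() or 1))


def check_allocation(requested, allowed):
    """Describe whether the allowed CPUs cover the requested core count."""
    if len(allowed) < requested:
        return (f"only {len(allowed)} CPUs allowed for {requested} ranks: "
                "ranks will share cores")
    return f"{len(allowed)} CPUs allowed for {requested} ranks: ok"


if __name__ == "__main__":
    requested = 64  # hypothetical: the core count asked of the scheduler
    print(check_allocation(requested, allowed_cpus()))
```

Run inside the batch job (not on the login node), a result smaller than the requested core count suggests that the limit comes from the scheduler or cgroup configuration rather than from the MPI library. `parse_cpu_list` handles the same list format that appears in the `Cpus_allowed_list` field of `/proc/self/status`.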

 

 

2 Replies
ShivaniK_Intel
Moderator
1,471 Views

Hi,

 

Thanks for reaching out to us.

 

Glad to know that your issue is resolved. Thanks for sharing the solution with us. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.

 

Thanks & Regards

Shivani

 
