Tim_Pook
Beginner
79 Views

IMPI oversubscribing CPUs to ranks


 

Context:

Running job via PBS Pro batch scheduler on compute node with 128 cores.

When requesting 64 cores for the job, only 32 cores are used (found via htop).

When requesting 128 cores for the same job, it uses all 128 cores.

No hyperthreading.

64 core job:

Pins CPU 0 to ranks 0 and 32, CPU 1 to ranks 1 and 33, etc. Thus, CPUs 32-63 are ignored.
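To make the symptom concrete, here is a minimal Python sketch that reports which CPUs host more than one rank and which allocated CPUs host none. The rank-to-CPU map is hypothetical: it is reconstructed from the description above (rank r on CPU r % 32), not read from the actual debug output, and the helper name `pinning_report` is made up for illustration.

```python
from collections import defaultdict


def pinning_report(rank_to_cpu, allocated_cpus):
    """Return (shared, idle).

    shared: dict mapping each CPU that hosts more than one rank to its ranks.
    idle:   sorted list of allocated CPUs that host no rank at all.
    """
    ranks_on_cpu = defaultdict(list)
    for rank, cpu in sorted(rank_to_cpu.items()):
        ranks_on_cpu[cpu].append(rank)
    shared = {cpu: ranks for cpu, ranks in ranks_on_cpu.items() if len(ranks) > 1}
    idle = sorted(set(allocated_cpus) - set(ranks_on_cpu))
    return shared, idle


# Hypothetical map built from the description: 64 ranks, rank r pinned to CPU r % 32.
rank_to_cpu = {rank: rank % 32 for rank in range(64)}
shared, idle = pinning_report(rank_to_cpu, range(64))

print("CPU 0 hosts ranks", shared[0])   # ranks 0 and 32 share CPU 0
print("CPU 1 hosts ranks", shared[1])   # ranks 1 and 33 share CPU 1
print("idle CPUs:", idle[0], "to", idle[-1])
```

In a real run the map would have to be filled in by hand or by a parser from whatever pinning information the MPI launcher prints; that parsing is not shown here.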

[Image: 64 cores - MPI debug]

These are the other enabled environment variables:

[Image: Other envvars]

 

So far I've worked around this with I_MPI_HYDRA_TOPOLIB=ipl, but this causes other issues when running jobs over InfiniBand, so it isn't ideal. Also, the resulting pinning behaviour isn't desirable, as shown in the screenshot below.

Screenshot 2021-09-15 at 12.51.33 PM.png

 

Any advice on how to enforce proper process pinning would be very helpful.

 

1 Solution
Tim_Pook
Beginner
53 Views

Managed to fix this problem.

 

I thought it was an issue with MPI, since it only occurred after we updated from Intel MPI 2018 to Intel MPI 2019, but it was actually caused by PBS / cgroups: the scheduler was forcing the job to use only 32 cores.
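For anyone hitting something similar, a quick way to check whether the scheduler or a cgroup is restricting the job is to compare the CPUs the process is allowed to run on with the number requested. The Python sketch below is only an illustration under assumptions: it relies on Linux (`os.sched_getaffinity` and the `Cpus_allowed_list` format from `/proc/self/status`, e.g. `0-31`), and the helper names and the requested count of 64 are hypothetical.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0-3,8,10-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus():
    """CPUs this process may run on, or None where the call is unavailable."""
    if hasattr(os, "sched_getaffinity"):  # Linux only
        return set(os.sched_getaffinity(0))
    return None


def check_allocation(requested, allowed):
    """Return a short message comparing requested cores with usable CPUs."""
    if len(allowed) < requested:
        return f"only {len(allowed)} of {requested} requested CPUs are usable"
    return "ok"


# Example with the numbers from this thread: 64 cores requested, CPUs 0-31 allowed.
print(check_allocation(64, parse_cpu_list("0-31")))

# Inside a real job, compare against the live affinity mask instead.
live = allowed_cpus()
if live is not None:
    print("this process may use", len(live), "CPUs")
```

Run from inside the batch job (not on the login node), a result smaller than the requested core count would point at the scheduler or cgroup configuration rather than at the MPI library.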

 

 


2 Replies

ShivaniK_Intel
Moderator
41 Views

Hi,

 

Thanks for reaching out to us.

 

Glad to know that your issue is resolved. Thanks for sharing the solution with us. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.

 

Thanks & Regards

Shivani

 
