Context:
Running job via PBS Pro batch scheduler on compute node with 128 cores.
When requesting 64 cores for the job, only 32 cores are used (found via htop).
When requesting 128 cores for the same job, it uses all 128 cores.
No hyperthreading.
64 core job:
Ranks 0 and 32 are both pinned to CPU 0, ranks 1 and 33 to CPU 1, and so on. Thus, CPUs 32-63 are ignored.
These are the other enabled environment variables:
So far I've fixed this with I_MPI_HYDRA_TOPOLIB=ipl, but this also causes other issues when trying to run jobs over InfiniBand, so it isn't ideal. Also, the pinning behaviour isn't desirable, as shown in the screenshot below.
Any advice on how to enforce proper process pinning would be very helpful.
Managed to fix this problem.
I thought it was an issue with MPI, since it only occurred after we updated from Intel MPI 2018 to Intel MPI 2019, but it was actually caused by PBS / cgroups: the scheduler was forcing the job to use only 32 cores.
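For anyone hitting the same symptom, here is a rough sketch of how one might check from inside the job which CPUs the scheduler actually allows. It assumes Linux and Python 3. The helper names (`parse_cpu_list`, `wrap_ranks`, `allowed_cpus_of_this_process`) are made up for illustration. The round-robin mapping is not Intel MPI's actual pinning algorithm; it only shows that 64 ranks placed on a 32-CPU allowed set double up the way described above.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0-3,8,10-11" into a sorted list of ints.

    This is the format used by e.g. the Cpus_allowed_list field in
    /proc/self/status.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def wrap_ranks(n_ranks, allowed_cpus):
    """Illustrative round-robin mapping of ranks onto the allowed CPUs.

    Assumption for illustration only: this is NOT Intel MPI's pinning
    logic. It just shows why ranks share CPUs when fewer CPUs are
    allowed than ranks are launched.
    """
    return {rank: allowed_cpus[rank % len(allowed_cpus)] for rank in range(n_ranks)}


def allowed_cpus_of_this_process():
    """CPUs this process may run on, as a sorted list (Linux only)."""
    return sorted(os.sched_getaffinity(0))


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        cpus = allowed_cpus_of_this_process()
        print(f"{len(cpus)} CPUs allowed for this process: {cpus}")
        # Compare len(cpus) with the number of cores requested from the scheduler.
```

Running this inside the batch job (not on the login node) and comparing the count with the number of cores requested should show whether the restriction comes from the scheduler / cgroup rather than from MPI itself.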
Hi,
Thanks for reaching out to us.
Glad to know that your issue is resolved. Thanks for sharing the solution with us. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Thanks & Regards
Shivani