
Integration of IntelMPI with Torque/Maui for resource tracking and job management

HPC-TAMU
Hello, we have installed the entire Intel software stack (compilers, performance libraries/MKL, MPI, etc.) on a production
Linux HPC cluster at our university.

We are using a Torque/Maui scheduler for batch jobs.

We are wondering how well IntelMPI integrates with this batch scheduler. We are encountering the
following issues, and we do not know whether IntelMPI supports them, as they are not addressed in the
documentation.

1) Launching MPI tasks through the PBS scheduler's own launch method, rather than through MPDs started via ssh on
remote nodes.

2) MPI process tracking by the scheduler: does mpirun pass this information to the scheduler so that, when we suspend or kill an IntelMPI job, the scheduler can track all the processes involved and suspend or kill them?

3) Memory use enforcement: one important issue related to 2) above is the scheduler's tracking of the total memory consumed on a node by the tasks of an MPI job running there. Maui cannot track total memory usage in order to enforce limits, because IntelMPI does not tell it which processes to track for memory (and other resource) consumption.

Can IntelMPI provide this information to Torque/Maui so that we can use the native scheduler mechanisms to track memory usage and let the scheduler enforce memory limits, for example by killing a job that violates them?


4) Documentation for integrating IntelMPI with a batch scheduler is non-existent, at least in the reference manual. Where can I find this information?
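
To make point 3) concrete, here is a minimal sketch of the per-node accounting we would like the scheduler to be able to perform once it knows which processes belong to a job. The function name, the tuple layout, and all the numbers are hypothetical and purely illustrative; nothing here is a Torque, Maui, or IntelMPI interface.

```python
# Hypothetical illustration of per-node memory enforcement for one MPI job.
# None of these names come from Torque, Maui, or IntelMPI.

def node_memory_violations(tasks, limit_kb):
    """Return {node: total_kb} for nodes where the job's tasks exceed limit_kb.

    tasks: iterable of (node, pid, rss_kb) tuples, one per MPI process.
    limit_kb: per-node memory limit for the job, in kilobytes.
    """
    totals = {}
    for node, _pid, rss_kb in tasks:
        # Sum the memory of every process of the job on the same node.
        totals[node] = totals.get(node, 0) + rss_kb
    # Keep only the nodes whose summed usage is over the limit.
    return {node: kb for node, kb in totals.items() if kb > limit_kb}


if __name__ == "__main__":
    # Made-up sample data: two tasks on node01, one on node02.
    job_tasks = [
        ("node01", 4101, 900_000),
        ("node01", 4102, 1_300_000),
        ("node02", 7730, 800_000),
    ]
    print(node_memory_violations(job_tasks, limit_kb=2_000_000))
```

The sum is only meaningful if every remote MPI process appears in the task list, which is exactly the information we do not see being passed to the scheduler today.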

Thank you very much.

Michael
HPC-TAMU
Can someone let me know which batch schedulers cooperate well with the Intel MPI stack, so that the scheduler can track the resource usage of MPI jobs and suspend/resume them properly?

Is there any other documentation on this issue outside the IMPI reference manual or the getting started guides?

Thanks .....

Michael -- TAMU