Integration of IntelMPI with Torque/Maui for resource tracking and job management
Hello, we have installed the entire Intel software stack (compilers, performance libraries/MKL, MPI, etc.) on a production Linux HPC cluster at our University.
We are using a Torque/Maui scheduler for batch jobs.
We are wondering how well IntelMPI integrates with this batch scheduler. We are encountering the following issues, and we do not know whether IntelMPI supports them, as they are not addressed in the documentation.
1) Launching MPI tasks: can we launch MPI tasks through the PBS scheduler's own launch method and avoid starting the MPDs via ssh on remote nodes?
2) MPI process tracking by the scheduler: does mpirun pass process information to the scheduler so that when we, say, suspend or kill an IntelMPI job, the scheduler can track all the involved processes and suspend or kill them?
3) Memory use enforcement: one important issue related to 2) above is scheduler tracking of the total amount of memory consumed on a node by the tasks of an MPI job running there. Maui cannot track the total memory usage in order to enforce memory limits, because IntelMPI does not tell it which processes to track for memory (and other resource) consumption.
Can IntelMPI provide this information to Torque/Maui so that we can use the native scheduler mechanisms to track memory usage and let the scheduler enforce the memory limits by, say, killing a job that violates them?
4) Documentation: information on integrating IntelMPI with a batch scheduler appears to be missing, at least from the reference manual. Where can we find it?
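To make the accounting problem in 3) concrete, here is a minimal Python sketch of what we mean. It is only an illustration under our own assumptions: the function names, the PIDs, and the limit are hypothetical, and this is not how Torque, Maui, or IntelMPI are implemented. The point is that a per-job memory total can only include the processes the scheduler knows belong to the job, so any MPI rank it was never told about is silently left out.

```python
from typing import Dict, Iterable, Optional


def read_rss_kb(pid: int) -> Optional[int]:
    """Return the resident set size (kB) of a process from /proc on Linux.

    Returns None if the process is gone, unreadable, or has no VmRSS line.
    """
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        return None
    return None


def job_memory_kb(rss_by_pid: Dict[int, int], job_pids: Iterable[int]) -> int:
    """Sum memory over the PIDs the scheduler has associated with the job.

    Processes on the node that are not in job_pids are not counted,
    even if they really belong to the same MPI job.
    """
    return sum(rss_by_pid.get(pid, 0) for pid in job_pids)


def exceeds_limit(rss_by_pid: Dict[int, int],
                  job_pids: Iterable[int],
                  limit_kb: int) -> bool:
    """True if the tracked processes together exceed the job's memory limit."""
    return job_memory_kb(rss_by_pid, job_pids) > limit_kb


if __name__ == "__main__":
    # Hypothetical snapshot of one node: PID -> resident memory in kB.
    snapshot = {101: 2048, 102: 1024, 999: 4096}

    # The scheduler only knows about PIDs 101 and 102; PID 999 stands for
    # an MPI rank started outside the scheduler's view (e.g. via ssh).
    known_pids = [101, 102]

    print(job_memory_kb(snapshot, known_pids))        # tracked total
    print(exceeds_limit(snapshot, known_pids, 4000))  # limit check on tracked PIDs
```

In this made-up snapshot the job really uses more than the limit once PID 999 is included, yet the check on the tracked PIDs alone does not flag it. That is the gap we would like IntelMPI and Torque/Maui to close.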