I have a cluster of 8-socket quad-core systems running Red Hat 5.2. Whenever I run multiple MPI jobs on a single node, all the jobs end up running on the same processors. For example, if I submit four 8-way jobs to a single box, they all end up on CPUs 0 to 7, leaving CPUs 8 to 31 idle.
I then tried all sorts of I_MPI_PIN_PROCESSOR_LIST combinations but short of explicitly listing out the processors at each run, they all end up still hanging on to CPUs 0-7. Browsing through the mpiexec script, I realise that it is doing a taskset on each run.
As my jobs are all submitted through a scheduler (PBS in this case) I cannot possibly know at job submission time which CPUs are not used. So is there a simple way to tell mpiexec to set the taskset affinity correctly at each run so that it will choose only the idle processors?
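One way to confirm the overlap described above is to print the CPU affinity of each running rank. The following is only a minimal sketch, assuming a Linux node with Python available; `format_cpu_set` and `affinity_of` are hypothetical helper names, not part of Intel MPI or PBS.

```python
import os


def format_cpu_set(cpus):
    """Collapse CPU ids into a compact range string such as '0-2,5,7-8'."""
    ids = sorted(set(cpus))
    if not ids:
        return ""
    parts = []
    start = prev = ids[0]
    for cpu in ids[1:]:
        if cpu == prev + 1:
            # Still inside a consecutive run; extend it.
            prev = cpu
            continue
        # The run ended; record it and start a new one.
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = cpu
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def affinity_of(pid):
    """Return the CPUs a process may run on, as a range string (Linux only)."""
    return format_cpu_set(os.sched_getaffinity(pid))
```

Calling `affinity_of(pid)` for the ranks of two different jobs and getting the same range for both would match the symptom in this post.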
Could you please look at my post http://software.intel.com/en-us/forums/topic/365457
Gergana Slavova (Intel) wrote:
Certainly, disabling process pinning altogether (by setting I_MPI_PIN=off) is a viable option.
Another workaround we recommend is to let Intel MPI Library define processor domains for your system but let the OS take over in pinning to available "free" cores. To do so, you need to simply set I_MPI_PIN_DOMAIN=auto. You can either do so for all jobs on the node, or for each subsequent job (job 1 will still be pinned to cores 0-7).
What's really going on behind the scenes is that, since domains are defined as #cores/#procs, we're setting the #cores here to be equal to the #procs (so you have 1 core per domain).
Note that you can only use this if you have Intel MPI Library 3.1 Build 038 or newer.
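The two workarounds above can be sketched as a small launcher that sets the relevant environment variable before calling `mpiexec`. This is a rough illustration only: `pinning_env` and `launch` are hypothetical helper names, and the program path passed to `launch` is a placeholder. In a PBS job script the same effect would usually come from exporting the variable before the `mpiexec` line.

```python
import os
import subprocess


def pinning_env(mode, base_env=None):
    """Return a copy of the environment with one pinning workaround applied.

    mode "off":  set I_MPI_PIN=off (disable process pinning altogether)
    mode "auto": set I_MPI_PIN_DOMAIN=auto (library defines the domains,
                 the OS places processes on free cores)
    """
    env = dict(os.environ if base_env is None else base_env)
    if mode == "off":
        env["I_MPI_PIN"] = "off"
    elif mode == "auto":
        env["I_MPI_PIN_DOMAIN"] = "auto"
    else:
        raise ValueError("mode must be 'off' or 'auto'")
    return env


def launch(nprocs, program, mode="auto"):
    """Run `mpiexec -n <nprocs> <program>` with the chosen pinning setting."""
    cmd = ["mpiexec", "-n", str(nprocs), program]
    return subprocess.call(cmd, env=pinning_env(mode))
```

For example, `launch(8, "./my_app", mode="auto")` would start an 8-way job with `I_MPI_PIN_DOMAIN=auto`, where `./my_app` stands in for the real binary.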
I hope this helps. Let me know if this improves the situation.