I have a single-node cluster with 4 sockets and 32 cores in total. The system runs Red Hat 6.3 and Intel MPI 4 Update 3, and I use Slurm to start MPI jobs. Whenever I run multiple MPI jobs on the node, they all end up on the same processors, and each job uses all of the cores in the node. For example, I started the first MPI job through Slurm with 8 cores and noticed that the first MPI task ran on CPUs 0-3, the second task on CPUs 4-7, and so on, with the last task on CPUs 28-31; each MPI task used 4 cores instead of 1. I then started a second job with 8 cores and saw the same behavior, running on the same 32 CPUs as the first job.
Is there a way to tell mpirun, when launched through Slurm, to set the task affinity correctly for each run so that it only uses the processors that Slurm reports as idle?
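For reference, the placement can be checked at run time; the sketch below assumes Intel MPI's debug output reports the pinning, and the application name and process ID are placeholders:

```bash
# Print the rank-to-CPU pinning that Intel MPI chooses at startup (./app is a placeholder)
I_MPI_DEBUG=4 mpirun -np 8 ./app

# For an already-running rank, show its current CPU affinity (<pid> is that rank's process ID)
taskset -cp <pid>
```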
Thanks.
As far as I know, you must set I_MPI_PIN_DOMAIN=off for this to work at all. If you can demonstrate the value of receiving CPU assignments from Slurm, you might file a feature request. I don't think splitting it down to the core level for separate jobs is likely to work well; maybe splitting down to the socket level could be useful. You could make a case that clusters whose nodes have 4 or more CPUs would be more valuable with such a feature.
If your request turns out to be outside the mainstream, you may have to script it yourself, using KMP_AFFINITY or its OpenMP 4.0 equivalent to assign cores to each job.
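A rough sketch of that kind of per-job scripting, using taskset (which the question already mentions) to carve out a core range for each job while the library's own pinning is disabled; for OpenMP-threaded ranks, KMP_AFFINITY or the OpenMP 4.0 OMP_PLACES/OMP_PROC_BIND variables would then control thread placement inside that range. The core ranges, rank counts, and ./app are placeholders, and the resulting placement should be verified (for example with I_MPI_DEBUG):

```bash
# Job 1: keep all 8 ranks inside cores 0-7, with Intel MPI's own pinning disabled
I_MPI_PIN_DOMAIN=off taskset -c 0-7 mpirun -np 8 ./app

# Job 2 (submitted separately): the same thing on cores 8-15
I_MPI_PIN_DOMAIN=off taskset -c 8-15 mpirun -np 8 ./app
```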
Hi David,
If you are using cpuset, the current version of the Intel® MPI Library does not support it. The next release will, so if that is the case, just sit tight for a bit longer.
If not, let me know and we'll work from there.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Thanks James,
I am not using cpuset. I assumed that Slurm would do the work.
Hi David,
I misunderstood your original question, so let's change approach. We do not currently check resource utilization from the job manager. Internally, we typically reserve an entire node for a single job when running, as two different MPI jobs do not communicate with each other.
At present, the only way to do this is manually. You'll need to get a list of available cores from SLURM*. Is your application single or multi-threaded?
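Getting that list from inside a job script could look roughly like the sketch below, assuming scontrol's detailed job output is available on the node (field names can vary between Slurm versions):

```bash
# Show which CPU IDs Slurm has allocated to this job on each node
scontrol show job -d "$SLURM_JOB_ID" | grep -o 'CPU_IDs=[0-9,-]*'
```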
If single-threaded, you'll set I_MPI_PIN_PROCESSOR_LIST to match the available (and desired) cores, with one rank going to each core. This defines a single core for each rank to use.
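For example, if Slurm reported that cores 8 through 15 were free for this job, the launch might look like the following sketch (the core numbers and ./app are placeholders):

```bash
# Pin 8 single-threaded ranks one-to-one onto cores 8-15
export I_MPI_PIN_PROCESSOR_LIST=8,9,10,11,12,13,14,15
mpirun -np 8 ./app
```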
If multi-threaded, then you'll set I_MPI_PIN_DOMAIN instead. This will set a group of cores available for each rank, and you'll use KMP_AFFINITY to control the thread placement within that domain.
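Continuing the same hypothetical allocation (cores 8-15 free), a 2-rank job with 4 OpenMP threads per rank might be sketched as:

```bash
# Give each rank a 4-core domain carved out of cores 8-15 via explicit hex masks:
# 0F00 = cores 8-11, F000 = cores 12-15
export I_MPI_PIN_DOMAIN="[0F00,F000]"
# Pack each rank's OpenMP threads inside its own domain
export KMP_AFFINITY=compact
export OMP_NUM_THREADS=4
mpirun -np 2 ./app
```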
There are quite a few syntax options for each of these variables, so please check the Reference Manual for full details.
As Tim said, if you're interested, I can file a feature request for this capability.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools