Hi,
I think I have a problem with process pinning with an older version of Intel MPI (4.0.1). The version cannot be changed because it is bundled with the user's application (Accelrys Material Studio), and there are tons of scripts surrounding it. The code works when started interactively, but when run under the Torque batch system, the following messages appear:
[6] MPI startup(): set domain {10,11} fails on node XXX.local
[5] MPI startup(): set domain {9} fails on node XXX.local
[7] MPI startup(): set domain {10,11} fails on node XXX.local
[4] MPI startup(): set domain {9} fails on node XXX.local
The code then fails when it is run across nodes, or runs slowly within a single node. I can run the same code with the same data interactively across nodes, and I don't see the "set domain" messages.
Our site uses Torque cpusets. So I suspect the difference between running interactively or from a batch script is the cpusets and pinning of the processes.
First question: am I correct? What do these "set domain fails" messages really mean? Torque gives the list of CPU cores allocated for the job in its cpuset: /dev/cpuset/torque/JOB_ID/cpus will contain something like "8-11" or "0-7". I have tried to pass it to Intel MPI as follows:
range=`cat /dev/cpuset/torque/$PBS_JOBID/cpus`
export I_MPI_PIN=enable
export I_MPI_PIN_PROCS=$range
[... RunMatServer.sh starts ...]
It seems to pin the processes to something, and I don't get the "set domain" messages anymore. Second question: is this the right way to interface Torque cpusets with Intel MPI jobs?
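For reference, a more defensive form of the snippet above could be sketched as follows. It only sets the pinning variables when the cpuset file is actually readable, and it prints the core list the kernel allows for the current shell so that interactive and batch runs can be compared. The helper name expand_cpu_list is hypothetical, and whether Intel MPI 4.0.1 treats the expanded list any differently from the "8-11" range form is an assumption that has not been verified here.

```shell
#!/bin/sh
# Sketch only: expand_cpu_list is a hypothetical helper, not part of Torque or Intel MPI.
# It expands a cpuset-style list such as "0-3,8-11" into "0,1,2,3,8,9,10,11".
expand_cpu_list() {
    out=""
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case $part in
            *-*) start=${part%-*}; end=${part#*-} ;;
            *)   start=$part; end=$part ;;
        esac
        i=$start
        while [ "$i" -le "$end" ]; do
            out="${out:+$out,}$i"
            i=$((i + 1))
        done
    done
    IFS=$old_ifs
    printf '%s\n' "$out"
}

# Path layout taken from the post; PBS_JOBID is set by Torque inside a job.
cpuset_file="/dev/cpuset/torque/${PBS_JOBID:-}/cpus"
if [ -r "$cpuset_file" ]; then
    range=$(cat "$cpuset_file")
    export I_MPI_PIN=enable
    export I_MPI_PIN_PROCS="$(expand_cpu_list "$range")"
    echo "cpuset cores: $range -> I_MPI_PIN_PROCS=$I_MPI_PIN_PROCS"
else
    # Outside a Torque job (e.g. an interactive run) leave pinning untouched.
    echo "no readable cpuset file at $cpuset_file; pinning variables not set" >&2
fi

# Linux only: show which cores the kernel allows for this shell.
grep Cpus_allowed_list /proc/self/status 2>/dev/null || true
```

Running this both interactively and from a batch script, and comparing the Cpus_allowed_list lines, should show whether the cpuset is what differs between the two cases.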
--
Grigory Shamov
University of Manitoba / Westgrid