I found out that "mpirun" is allocating different parallel jobs on the same CPUs of a particular node, with a big loss of efficiency.
For example, job1 is submitted to run on 4 cores and it allocates the first 4 CPUs on node n001 (node n001 has 16 cores); a second job2 is submitted to run on 4 cores (mpirun -n 4 exe) and, in principle, it should run on the next 4 free CPUs. However, that is not what happens. The two jobs share the same 4 CPUs, each running at 50% efficiency for the whole run.
I compiled OpenMPI and tested it; I do not have this problem with OpenMPI.
Has anyone encountered this problem before?
Is there a simple solution for that?
Any help is highly welcome.
Juarez L. F. Da Silva
Of course, the usual way to keep MPI jobs separate on a cluster is to use a job scheduler, such as PBS, Torque, or SGE.
tim18 is right - the best way to control workload is to use a job scheduler. Intel's mpirun has built-in support for PBS Pro. If you need to use SGE, please read this document.
If you want to control CPU allocation on your own, you can use the I_MPI_PIN_DOMAIN environment variable - please see the Reference Manual for a detailed description.
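For illustration only, here is a sketch of pinning two concurrent 4-rank jobs to disjoint cores by hand. It assumes Intel MPI's I_MPI_PIN_PROCESSOR_LIST variable and placeholder executables exe1/exe2; check the Reference Manual for the exact syntax in your version.

```shell
# Hypothetical sketch: run two 4-rank Intel MPI jobs side by side
# on one 16-core node, each pinned to its own set of cores.
# Variable syntax may differ between Intel MPI versions.

# Job 1: ranks pinned to cores 0-3
I_MPI_PIN_PROCESSOR_LIST=0-3 mpirun -n 4 ./exe1 &

# Job 2: ranks pinned to cores 4-7
I_MPI_PIN_PROCESSOR_LIST=4-7 mpirun -n 4 ./exe2 &

# Wait for both background jobs to finish
wait
```

This kind of explicit pinning avoids any overlap, but it does not scale well once a scheduler is placing jobs for you.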
Generally speaking, mpirun and mpiexec do not know about CPU usage.
I am using the Torque PBS queue system to manage all jobs on the cluster; however, it does not solve the problem.
I have several nodes with 16 cores and I would like to run 4 parallel jobs per node, using 4 cores each. At the moment, using Intel's MPI mpirun and submitting through the Torque PBS system, all 4 parallel jobs submitted to the same node share the same 4 CPUs, i.e., 25% for each parallel job, while the other 12 CPUs stay idle. This happens whether I submit through PBS or locally; it also happens when launching locally on the head node with mpirun. I understand that somehow the configuration is set to run ONE parallel job per node, and I would like to change it to allow at least 4 parallel jobs per node.
I will check all your suggestions.
Juarez L. F. Da Silva
There was a thread earlier within this forum that dealt with a similar issue. Here are all the details.
Basically, you can also try setting the I_MPI_PIN_DOMAIN environment variable to auto before running each Intel MPI job. For example:
$ export I_MPI_PIN_DOMAIN=auto
$ mpirun ...
or, alternatively, pass it on the mpirun command line:
$ mpirun -r ssh -genv I_MPI_PIN_DOMAIN auto ...
Let us know how it goes.