I run the program with the following command:
mpiexec -wdir z:\directional -mapall -hosts 10 n01 5 n02 5 n03 5 n04 5 n05 5 n06 5 n07 5 n08 5 n09 5 n10 5 test
The cluster has 10 nodes, each with 24 logical cores (2× Intel(R) Xeon(R) CPU X5675). The program 'test' has OpenMP-based parallel sections, but a considerable part of it is not parallelized. The problem is that 'test' only uses 4 cores per process in the parallel sections (total CPU usage is only 80%). I noticed that when I set I_MPI_PIN_DOMAIN=omp, every 'test' process uses all 24 cores. I have tested 'test' on a single node with
mpiexec -wdir z:\directional -mapall -n 5 test
and there it runs the way I want (total CPU usage is 100% in the parallel sections).
The problem now is that the first command fails after I set I_MPI_PIN_DOMAIN=omp:
Fatal error in MPI_Init: Other MPI error, error stack:
MPID_Init(195).......................: channel initialization failed
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available.
What should I do to let the program use 100% CPU on every node?
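One thing I am not sure about is whether the environment variable is even reaching the remote nodes when I set it only in my local environment. A variant I could try (untested; this assumes Intel MPI's -genv option, which propagates an environment variable to all ranks on all hosts):

```shell
REM Hypothetical variant of the original 10-node command: pass the pinning
REM and thread-count settings through mpiexec itself, so the ranks on every
REM node see them instead of only the launch node.
mpiexec -wdir z:\directional -mapall ^
    -genv I_MPI_PIN_DOMAIN omp ^
    -genv OMP_NUM_THREADS 24 ^
    -hosts 10 n01 5 n02 5 n03 5 n04 5 n05 5 n06 5 n07 5 n08 5 n09 5 n10 5 test
```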
Thanks for your quick reply. I have already set OMP_NUM_THREADS=24 on every node, so the parallel part of the program can use 100% of the CPU resources.
I have tested on a single node with
mpiexec -n 5 test
both with and without I_MPI_PIN_DOMAIN=omp set. Without it, total CPU usage is only 80%; with it, 100%. The program also runs a little faster with I_MPI_PIN_DOMAIN=omp set.
The problem now is that mpiexec no longer works on the 10-node cluster after I set I_MPI_PIN_DOMAIN=omp.