Dear Intel HPC community,
I’m trying to optimize the scalability of my hybrid Fortran MPI/OpenMP code, which is based on PETSc and MUMPS and runs on an SGI ICE-X cluster, and I’m seeking advice and good-practice rules for optimal process binding/affinity.
The code executes in a pure MPI fashion until MUMPS is called, at which point hybrid MPI/multithreading is exploited (calls to multithreaded BLAS routines).
The cluster’s CPU architecture is heterogeneous: compute nodes can be Westmere, Sandy Bridge, Haswell, or Broadwell.
I’m compiling the code with the appropriate ‘-x’ architecture options and -qopenmp, using the Intel Fortran compiler 17.0.6 and the Intel MPI Library 2018.1.163, and linking against the multithreaded Intel MKL 2017.6.
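For reference, my build looks roughly like the lines below (the source file name and the -xCORE-AVX2 option are placeholders, since the actual ‘-x’ flag varies per target node type; PETSc/MUMPS link flags are omitted for brevity; the MKL part is the standard threaded LP64 link line for ifort):

    mpiifort -O2 -xCORE-AVX2 -qopenmp -c solver.f90
    mpiifort -qopenmp solver.o -o solver \
        -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread \
        -lmkl_core -liomp5 -lpthread -lm -ldl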
PBSPro is used to launch jobs on the cluster; in the job script I can choose select, ncpus, mpiprocs, and ompthreads.
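To make the question concrete, here is the kind of job script I am experimenting with; the chunk counts, walltime, and pinning variables (I_MPI_PIN_DOMAIN, OMP_NUM_THREADS, MKL_NUM_THREADS) are illustrative guesses on my part, not settings I know to be optimal:

    #!/bin/bash
    #PBS -l select=4:ncpus=24:mpiprocs=12:ompthreads=2
    #PBS -l walltime=01:00:00

    cd $PBS_O_WORKDIR

    # One pinning domain per MPI rank, sized to OMP_NUM_THREADS,
    # so the threads spawned during the MUMPS/BLAS phase stay
    # inside their rank's domain.
    export I_MPI_PIN_DOMAIN=omp
    export OMP_NUM_THREADS=2
    export MKL_NUM_THREADS=2

    mpirun -np 48 ./solver

In particular, I’d like to know whether this style of domain-based pinning is the right approach for a code that is pure MPI most of the time and only multithreaded inside MUMPS.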
I did a lot of research and reading about affinity, but since most presentations and web pages I found on hybrid MPI/OpenMP setups are neither generic nor clear enough for my case, I’d rather ask here.
I appreciate your support and am happy to provide additional information.
Thank you in advance!