I'm using a tool (dplace) that binds processes/threads to specific processor cores to improve my code performance.
When using an MPI library, I have to skip the "shepherd process". However, I cannot find any information on "shepherd processes" for Intel MPI.
Is only one shepherd process created for the entire pool of MPI processes, or is one shepherd process created for each MPI process?
Thanks in advance!
I guess both of you are discussing a Linux implementation. One of those "shepherd" threads is started by the pthreads library and so is common to libgomp and libiomp5, or to any use of pthreads. Another is started by libiomp5 (and would happen when using it with gcc as well as icc).
In rare circumstances, "shepherd" threads might keep one logical processor of a core busy when HT is enabled, and worker threads pinned according to OMP_PLACES=cores can still run on the other logical processor. In any case, you would want to avoid disturbing the affinity topology you set for the worker threads on account of those shepherd threads.
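To make the "skip the shepherd" idea concrete, here is a minimal sketch in Python of how a skip mask keeps helper threads out of the pinning sequence. It does not call dplace or any MPI/OpenMP runtime. The function name place_threads, the creation-order indexing, and the assumption that bit N of the mask refers to the (N+1)th thread created are all illustrative, so check your tool's man page for the exact convention.

```python
def place_threads(num_threads, cpus, skip_mask=0):
    """Map each thread (by creation order) to a CPU, or to None if skipped.

    Hypothetical model: bit i of skip_mask set means "do not pin thread i".
    Skipped threads do not consume a CPU slot, so the worker threads keep
    the same CPU sequence they would have had without any shepherd thread.
    """
    if not cpus:
        raise ValueError("cpus must contain at least one CPU id")
    placement = {}
    next_slot = 0  # index into cpus of the next slot to hand out
    for i in range(num_threads):
        if (skip_mask >> i) & 1:
            placement[i] = None  # shepherd/helper thread: left unpinned
        else:
            placement[i] = cpus[next_slot % len(cpus)]  # wrap if oversubscribed
            next_slot += 1
    return placement


if __name__ == "__main__":
    # Assumption for illustration only: the second thread created (index 1)
    # is the shepherd, so the mask is 0b10.
    for thread, cpu in place_threads(4, [0, 1, 2], skip_mask=0b10).items():
        print(thread, "unpinned" if cpu is None else f"cpu {cpu}")
```

Which creation index the shepherd thread actually has depends on the runtime and its version, so the mask has to be determined for your own setup (for example by inspecting the threads of a running job) rather than copied from here.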
Some interesting prior references:
As Intel MPI and, more recently, Open MPI have adopted their own hybrid pinning schemes, it may not be surprising that references to external schemes are fairly old.