I am trying to pin processes to the cores allocated by SGE, which supports processor affinity (6.2u5 and later). Normal MPI programs work correctly with I_MPI_PIN_PROCESSOR_LIST. However, hybrid (MPI + OpenMP) programs do not work with I_MPI_PIN_DOMAIN.
That probably stems from a misunderstanding of the I_MPI_PIN_DOMAIN logic.
I_MPI_PIN_DOMAIN doesn't limit the number of processors to the ones you set in a mask. It creates domains! In your case (-genv I_MPI_PIN_DOMAIN ) 2 domains will be created: the first will contain only one processor (the 1st one), and the second will contain all the other processors. The problem here is that the domain containing the 0th processor will be used first. That is why you see this behaviour.
It is much better to use a domain size rather than an exact mask. For example, suppose each node has 2 processors with 4 cores each (8 cores total). With I_MPI_PIN_DOMAIN=4 you create 2 domains of size 4, and the Intel MPI Library will build these domains automatically so that processes are allocated as close together as possible within a domain. Even better, use I_MPI_PIN_DOMAIN=socket.
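As a sketch of such a hybrid launch (the binary name ./hybrid_app and the rank/thread counts are placeholders for your own job, not something from the original post):

```shell
# Run 2 MPI ranks per node, one per 4-core domain;
# each rank spawns 4 OpenMP threads pinned inside its own domain.
export OMP_NUM_THREADS=4
mpirun -np 2 -genv I_MPI_PIN_DOMAIN 4 ./hybrid_app

# Equivalent, letting the library derive the domain from the topology:
mpirun -np 2 -genv I_MPI_PIN_DOMAIN socket ./hybrid_app

# Set I_MPI_DEBUG to 4 or higher to print the pinning map and verify it:
mpirun -np 2 -genv I_MPI_PIN_DOMAIN socket -genv I_MPI_DEBUG 4 ./hybrid_app
```

The I_MPI_DEBUG output is a quick way to confirm that ranks actually landed in the domains you expected.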
You can also create domains so that processes share cache, e.g. I_MPI_PIN_DOMAIN=cache2. Each MPI process and its OpenMP threads will then share one domain.
You might also want to try the I_MPI_PIN_PROCESSOR_LIST environment variable.
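For a pure MPI run, a minimal sketch might look like this (the logical core IDs and the binary name ./mpi_app are assumptions, to be replaced with the cores SGE actually granted):

```shell
# Pin 4 MPI ranks one-to-one onto logical processors 0, 2, 4, 6.
mpirun -np 4 -genv I_MPI_PIN_PROCESSOR_LIST 0,2,4,6 ./mpi_app
```

Note that I_MPI_PIN_PROCESSOR_LIST pins each rank to a single logical processor, so for hybrid codes its OpenMP threads would all be confined to that one core; that is why I_MPI_PIN_DOMAIN is the right knob for MPI + OpenMP.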