The Intel MPI Library lets you control the spin count used when polling the fabrics on a cluster through the I_MPI_SPIN_COUNT environment variable. Its argument is the number of times the polling loop spins before releasing the processor. Generally, a smaller value will release the processor more frequently.
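To make the idea of a spin count concrete, here is a minimal conceptual sketch of a spin-then-yield polling loop. It is not Intel MPI's actual implementation; the function names `poll_with_spin_count` and `make_ready_after` are made up for illustration, and `time.sleep(0)` merely stands in for "release the processor".

```python
import time


def make_ready_after(n_checks):
    """Return a fake readiness check that reports True from the n-th call on."""
    calls = {"count": 0}

    def is_ready():
        calls["count"] += 1
        return calls["count"] >= n_checks

    return is_ready


def poll_with_spin_count(is_ready, spin_count, max_rounds=1000):
    """Poll is_ready() up to spin_count times per round, then yield the CPU.

    Returns (ready, yields): whether the condition was seen, and how many
    times the loop gave up the processor before that happened.
    """
    yields = 0
    for _ in range(max_rounds):
        # Busy-poll: check the condition spin_count times back to back.
        for _ in range(spin_count):
            if is_ready():
                return True, yields
        # Nothing arrived during this round, so let other work run.
        time.sleep(0)
        yields += 1
    return False, yields
```

With a condition that becomes true on the 500th check, a spin count of 1 gives up the processor after every single check, while a spin count of 250 gives it up only once per 250 checks. That is the trade-off described above: a smaller value releases the processor more often, at the cost of more yields before the message is noticed.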
The default settings are looping 1 time for the sockets/shared memory devices and 250 times when using RDMA-based devices. You can try changing the value and see if it makes a difference for your application. More information is in the Intel MPI Library Reference Manual in the
I played around with I_MPI_SPIN_COUNT already and it doesn't seem to make a difference. I've also run my application under MPICH2 and MVAPICH2 without this problem.
Please try the latest Intel MPI Library 3.1 with the I_MPI_WAIT_MODE environment variable set to enable. It should work for the following devices: sock, shm, or ssm. Let us know whether the suggestion helps.
Best regards, Andrey
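For anyone who wants to try the two variables mentioned in this thread from a script, here is a small sketch that builds the environment for a launch. The helper name `intel_mpi_env` is hypothetical; only the variable names I_MPI_SPIN_COUNT and I_MPI_WAIT_MODE and the value `enable` come from the posts above. Check the Reference Manual for the values your version accepts.

```python
import os


def intel_mpi_env(spin_count=None, wait_mode=False, base_env=None):
    """Return a copy of the environment with the Intel MPI tuning variables set.

    spin_count: value for I_MPI_SPIN_COUNT, or None to leave it unset.
    wait_mode:  if True, set I_MPI_WAIT_MODE to "enable".
    base_env:   mapping to start from (defaults to the current environment).
    """
    env = dict(os.environ if base_env is None else base_env)
    if spin_count is not None:
        if spin_count < 1:
            raise ValueError("spin_count must be a positive integer")
        # Environment variables are always strings.
        env["I_MPI_SPIN_COUNT"] = str(spin_count)
    if wait_mode:
        env["I_MPI_WAIT_MODE"] = "enable"
    return env


# Example launch (the command line and application name are placeholders):
#   import subprocess
#   subprocess.run(["mpiexec", "-n", "4", "./my_app"],
#                  env=intel_mpi_env(wait_mode=True))
```

Building the environment in one place makes it easy to run the same application with several settings and compare the results.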