Have you tried using the tuning capability? It is described in the Intel MPI Library Reference Manual (Chapter 4 in Windows* or Chapter 3 in Linux*). We provide an automatic tuning utility that tests different values for the tuning parameters, based on results from either the Intel MPI Benchmarks or a different, user-specified application.
This should give you a starting point for improving your application's performance. If you can provide more details about the application (resource and communication usage in particular) and the systems in question, I can try to give some additional information.
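As a rough sketch of what application-specific tuning can look like, the script below only prints the two commands involved: one to tune against your program and one to run with the resulting settings. It assumes an older (4.x-era) release in which the utility is invoked as mpitune; the program name ./myprog, the rank count of 32, and the output file ./myprog.conf are placeholders of mine, and the option names and quoting may differ in your version, so please check the Reference Manual or mpitune --help before running anything.

```shell
#!/bin/sh
# Placeholder values (assumptions, not from your setup): edit before use.
APP=./myprog          # hypothetical application binary
NP=32                 # hypothetical number of MPI ranks
CONF=./myprog.conf    # hypothetical file for the tuned settings

# Print the application-specific tuning command. The escaped quotes follow
# the form used in older reference manuals; verify it for your release.
tune_cmd() {
    printf '%s\n' "mpitune --application \\\"mpiexec -n $NP $APP\\\" -of $CONF"
}

# Print the command that runs the application with the tuned settings.
run_cmd() {
    printf '%s\n' "mpiexec -tune $CONF -n $NP $APP"
}

# Dry run only: nothing is executed, the commands are just shown.
tune_cmd
run_cmd
```

Tuning can take a long time because the application is run repeatedly with different settings, so it is usually worth starting with a short, representative workload.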
Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools