change other MPI environment variables, particularly any that would tune MPI for the MIC system architecture?
As a side question, has anyone written a Tuning and Tweaking guide for IMPI for Phi? For example, what I_MPI variables could one use to help tune an app targeting 480 ranks across 8 Phis?
Are the MPI processes single threaded?
If yes, then you should realize that 480 ranks over 8 Phis is 60 ranks per card, which works out to one thread per core (assuming 60 cores per card and that you somehow restrict each core to one process).
With Xeon Phi, a second hardware thread running within a core is almost free. Therefore, consider using 960 ranks over 8 Phis (also try 1440).
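To make the arithmetic behind those rank counts concrete, here is a minimal sketch. It assumes 60 usable cores per card (the figure implied by "480 ranks over 8 Phis is one per core") and an even spread of ranks across cards. The function name `ranks_layout` is made up for illustration and is not part of any MPI tool.

```python
# Assumption: 60 usable cores per Xeon Phi card, as implied by
# "480 ranks over 8 Phis = one process per core".
CORES_PER_CARD = 60


def ranks_layout(total_ranks, cards=8, cores_per_card=CORES_PER_CARD):
    """Return (ranks per card, ranks per core) for an even spread."""
    ranks_per_card, leftover = divmod(total_ranks, cards)
    if leftover:
        raise ValueError("ranks do not divide evenly across cards")
    return ranks_per_card, ranks_per_card / cores_per_card


# The three rank counts mentioned in this thread.
for total in (480, 960, 1440):
    per_card, per_core = ranks_layout(total)
    print(f"{total} ranks: {per_card} per card, {per_core:g} per core")
```

With single-threaded ranks, the "per core" figure is also the number of hardware threads in use on each core.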
While I haven't tried this myself, you might experiment with I_MPI_PIN_DOMAIN=core or I_MPI_PIN_DOMAIN=480:scatter.
I am not sure how this applies when you have multiple MICs (that is, whether each MIC also counts as a separate node).
What you asked for (480 ranks) is one process per core across 8 MICs.
Tim Prince may be able to answer this better.
Good point. On Phi you need two or more threads per core to get peak performance out of the core. If the MPI processes are single threaded, the performance on Phi may be disappointing.