I am currently using the MPI distributed graph topologies and I allow rank reordering (https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node195.htm#Node195).
However, in some small tests, I noticed that Intel MPI (2019) does not reorder my ranks.
Since supporting reordering increases the complexity of the code, I would like to be sure that it will be useful in at least some cases.
Does Intel MPI reorder the ranks for MPI topologies? If yes, what are the requirements (machine files, etc.)?
Thank you very much for your help!
I am Thomas' colleague and we ran together into this unexpected behavior of the MPI_Dist_graph_create routine. This might come from the fact that we misunderstand the specifications of the routine: we actually did not find that much documentation on it.
I provide a reproducer in the attachment. Here is the rationale of the example:
An allocation of 4 CPUs is requested on a cluster, spread evenly across 2 nodes. As expected, the ordering obtained from MPI_COMM_WORLD is such that ranks 0 and 1 are located on the first node, and ranks 2 and 3 on the second node. Now, we want to specify that ranks 0 and 2 will communicate heavily with each other, and likewise ranks 1 and 3. An optimised ordering should therefore place each of these pairs on the same node, so as to benefit from the faster intra-node communications. So, we use MPI_Dist_graph_create_adjacent to create a graph with two edges: one between 0 and 2, and one between 1 and 3. We would expect ranks to be permuted in the new communicator, e.g. with ranks 0 and 2 on the first node.
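To make the expectation concrete, here is a small sketch in plain C (no MPI calls) that counts how many of the two heavy edges cross a node boundary under each placement. The struct, the function name and the hand-written placement tables are illustrative assumptions, not part of the attached reproducer.

```c
#include <stddef.h>

/* One undirected communication edge between two ranks. */
struct edge { int a; int b; };

/*
 * Count the edges whose endpoints sit on different nodes.
 * node_of[r] is the node hosting rank r. This table is written by hand
 * for the example; it is an assumption, not something MPI returns.
 */
static int count_inter_node_edges(const struct edge *edges, size_t n_edges,
                                  const int *node_of)
{
    int crossing = 0;
    for (size_t i = 0; i < n_edges; ++i) {
        if (node_of[edges[i].a] != node_of[edges[i].b]) {
            ++crossing;
        }
    }
    return crossing;
}

/* The two heavy edges from the example: 0<->2 and 1<->3. */
static const struct edge heavy_edges[2] = { {0, 2}, {1, 3} };

/* Placement seen in MPI_COMM_WORLD: ranks 0,1 on node 0, ranks 2,3 on node 1. */
static const int default_placement[4] = { 0, 0, 1, 1 };

/* Placement we hoped for after reordering: ranks 0,2 on node 0, ranks 1,3 on node 1. */
static const int grouped_placement[4] = { 0, 1, 0, 1 };
```

With the default placement both heavy edges cross nodes, whereas with the grouped placement none do, which is the gain we were hoping the reordering would deliver.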
In practice, in all the tests we have run on various clusters and with various graphs, we never managed to obtain rank reordering from MPI_Dist_graph_create_adjacent, even though the argument 'reorder' was set to 1. We always got the same ordering as in MPI_COMM_WORLD. Also, we did not find which values can be used for 'info', or what effect they have on the optimization.
Additionally, is there any extra information to provide during the installation or the initialisation of IMPI, so that it is aware of the network architecture of the cluster and can produce sensible optimizations?
Thank you in advance for your support,