Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Is Intel MPI torus aware?

rcummins64
Beginner

I am working on a cluster with a 3D torus InfiniBand topology. I am trying to scale a job to more than 300 nodes, and my jobs fail with what appears to be congestion on the fabric. While searching for the cause of the congestion, I found that the application apparently needs to query the subnet manager (SM) for the service levels (SLs) to properly manage fabric traffic. From what I can tell, my version of Intel MPI, 4.0.2.003, is not "torus aware" and does not actually query the SM for the SLs to properly route traffic. Can someone either confirm or refute my findings? If you can refute them, please tell me how to tell mpiexec to query the SM for SLs at runtime.

3 Replies
James_T_Intel
Moderator
Hi Robert,

Unfortunately, the Intel MPI Library currently does not support network topology awareness. However, some of the collective operations can use topology aware algorithms, and this capability might help you. The full list of algorithms, and details of how to select them, are in section 3.5.1 of the Intel MPI Library for Linux* OS Reference Manual. As an example, you can set I_MPI_ADJUST_BCAST=4 to use the topology aware binomial algorithm for all message sizes of MPI_Bcast.
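As a minimal sketch, the environment variable can be exported before launching the job; the node count and `./my_app` below are placeholders for your own launch line:

```shell
# Select the topology aware binomial algorithm (value 4) for MPI_Bcast
# across all message sizes, then launch as usual.
export I_MPI_ADJUST_BCAST=4
mpiexec -n 512 ./my_app
```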

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
rcummins64
Beginner
Thanks for the prompt reply. One final question: is it safe to assume that even with I_MPI_ADJUST_BCAST=4 set, this is not likely to scale to 1000 nodes? I could see it getting me over the next "hump," but to three times the node count where I fall over now?
James_T_Intel
Moderator
Hi Robert,

The only answer I have for that is to try it and see. I don't have the details of the collective algorithms used, so I can't comment on their scalability. There are other topology aware algorithms (MPI_Bcast has three); my recommendation is to try them on your system with your application and use the combination that works best for you.
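A hypothetical benchmarking sketch of that suggestion, assuming the algorithm values follow the Reference Manual's numbering (4, 5, and 6 are commonly the topology aware variants for MPI_Bcast, but verify against section 3.5.1 for your version); `./my_app` and the node count are placeholders:

```shell
# Run the application once per candidate MPI_Bcast algorithm and
# compare the resulting timings to pick the best setting.
for alg in 4 5 6; do
    echo "Timing run with I_MPI_ADJUST_BCAST=$alg"
    I_MPI_ADJUST_BCAST=$alg mpiexec -n 512 ./my_app
done
```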

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools