Intel® MPI Library

Looking for equivalent options from Cray MPI to Intel MPI

crtierney42
Cray MPI (based on mpich2) provides ways to change several internal defaults.
My user relies on these settings to get a code running on an XT. I know there
are issues with the code, but if equivalent options exist in Intel MPI, they would get us
going faster. I have looked through the Intel MPI 4.0 Beta manual and didn't
find any equivalents, but I am asking in case I missed an option
or there are undocumented options that could help:

1) MPICH_MAX_SHORT_MSG_SIZE - Determines when to use Eager vs. Rendezvous protocol,
does this just sound like I_MPI_RDMA_EAGER_THRESHOLD?

2) MPICH_PTL_UNEX_EVENTS - Define the total number of unexpected events allowed

3) MPICH_UNEX_BUFFER_SIZE - Set the buffer size for the unexpected events

4) MPICH_ENV_DISPLAY - Display all settings used by the MPI

Options 2 and 3 are the most important, as I believe the code path that hangs
when run with 2000+ cores is sending too many unexpected events.

Thanks,
Craig

Gergana_S_Intel
Employee

Hi Craig,

Quoting - crtierney42
1) MPICH_MAX_SHORT_MSG_SIZE - Determines when to use Eager vs. Rendezvous protocol,
does this just sound like I_MPI_RDMA_EAGER_THRESHOLD?

Indeed, you're exactly right. I_MPI_EAGER_THRESHOLD (without the RDMA in the name) sets the cutoff value between using the eager or rendezvous protocols for all devices. The default is ~260KB - any message shorter than or equal to that will use eager, any larger message will use rendezvous.
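As a rough illustration of the cutoff rule described above, here is a minimal Python sketch. It is a toy model, not Intel MPI code: the function name `choose_protocol` is hypothetical, and the value 262144 bytes is only an assumed stand-in for the "~260KB" default mentioned in the reply.

```python
# Toy model of the eager/rendezvous cutoff described in the reply.
# ASSUMPTION: 262144 bytes stands in for the "~260KB" default; check your
# library's documentation for the real value.
ASSUMED_DEFAULT_EAGER_THRESHOLD = 262144


def choose_protocol(message_bytes, threshold=ASSUMED_DEFAULT_EAGER_THRESHOLD):
    """Return 'eager' for messages at or below the threshold, else 'rendezvous'."""
    if message_bytes < 0:
        raise ValueError("message size must be non-negative")
    return "eager" if message_bytes <= threshold else "rendezvous"


if __name__ == "__main__":
    # A message exactly at the threshold still goes eager; one byte more does not.
    for size in (1024, 262144, 262145):
        print(size, choose_protocol(size))
```

Raising the threshold moves more messages onto the eager path, which also means more data may need to be buffered on the receiving side when no matching receive has been posted yet.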

Quoting - crtierney42
2) MPICH_PTL_UNEX_EVENTS - Define the total number of unexpected events allowed
3) MPICH_UNEX_BUFFER_SIZE - Set the buffer size for the unexpected events

Take a look at the description of the I_MPI_DAPL_CONN_EVD_SIZE env variable, which defines the size of the event queue. The default value is [2*(#procs) + 32], but you can go ahead and try increasing it. Judging from the description of MPICH_PTL_UNEX_EVENTS, this seemed to be the closest match.
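To get a feel for the numbers, the default formula quoted above can be evaluated for a few job sizes. This is only arithmetic on the formula from the reply; the helper name `default_conn_evd_size` is hypothetical.

```python
def default_conn_evd_size(nprocs):
    """Evaluate the default quoted in the reply: 2 * (#procs) + 32."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return 2 * nprocs + 32


if __name__ == "__main__":
    # 2000 is the core count at which the original poster sees the hang.
    for nprocs in (64, 512, 2000):
        print(nprocs, default_conn_evd_size(nprocs))
```

For a 2000-process job the formula gives 4032, so any manually chosen value would need to be above that to count as an increase.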

Alternatively, when you say "unexpected events", it makes me think you have some issue scaling out using OFED - is that correct? In this case, simply updating to the latest DAPL drivers should help. What OFED and/or DAPL versions do you have installed?

If you've upgraded to OFED 1.4.1, it contains the new Socket CM (scm) provider instead of the existing cma one (e.g. OpenIB-cma). The new one handles scalability a lot better so you can give that a try. Again, this is just speculation on my part, since I'm not sure what errors you're really getting.

Quoting - crtierney42
4) MPICH_ENV_DISPLAY - Display all settings used by the MPI

Set I_MPI_DEBUG=1001 - this is the highest value possible for the library. At the startup of the job, Intel MPI Library will print out all env variables it's using.
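One way to keep such settings together is to build the job environment in a small script. The sketch below only assembles a command line and an environment and does not launch anything. The helper `build_launch`, the executable name `./my_app`, and the `mpirun -n` invocation are assumptions for illustration; the variable names are the ones mentioned in this thread.

```python
import os


def build_launch(nprocs, executable, debug_level=1001, extra_env=None):
    """Return (command, environment) for a launch; nothing is executed here."""
    env = dict(os.environ)
    # I_MPI_DEBUG=1001 is the value suggested in the reply for printing settings.
    env["I_MPI_DEBUG"] = str(debug_level)
    # Environment values must be strings, so convert any overrides.
    for name, value in (extra_env or {}).items():
        env[name] = str(value)
    # ASSUMPTION: the job is started with "mpirun -n <nprocs> <executable>".
    command = ["mpirun", "-n", str(nprocs), executable]
    return command, env


if __name__ == "__main__":
    # "./my_app" and the threshold value are placeholders, not recommendations.
    command, env = build_launch(
        16, "./my_app", extra_env={"I_MPI_EAGER_THRESHOLD": 262144}
    )
    print(" ".join(command))
    for name in sorted(n for n in env if n.startswith("I_MPI_")):
        print(f"{name}={env[name]}")
```

On a machine with Intel MPI installed, the returned pair could be passed to `subprocess.run(command, env=env)`.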

I hope this helps. Let us know how it goes or if you have further questions (or if I misunderstood any of your questions).

Regards,
~Gergana

crtierney42

Sorry for the delay in response. This information should help out.

As for #3, it isn't a scalability issue. The MPI code does not post its receives before the sends, and I have been told by the experts that this causes MPI to use the unexpected buffers to store the messages. If there are too many, then things go bad. The user solved the problem on the Cray by increasing that buffer to 126MB and the number of events (#2) to 81920.
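The behavior described above can be pictured with a toy model in plain Python. This is not how any real MPI library is implemented: the class `ToyReceiver`, its limits, and the exception are hypothetical, and a real library may hang or abort rather than raise an error. The model only shows why messages that arrive before their receive is posted consume a bounded buffer, while pre-posted receives do not.

```python
class UnexpectedQueueOverflow(RuntimeError):
    """Raised by the toy model when the unexpected-message limits are exceeded."""


class ToyReceiver:
    def __init__(self, max_events, max_bytes):
        self.max_events = max_events   # cap on buffered unexpected messages
        self.max_bytes = max_bytes     # cap on total buffered payload
        self.posted = []               # tags of receives posted but not yet matched
        self.unexpected = []           # (tag, size) of messages that arrived early
        self.unexpected_bytes = 0

    def post_recv(self, tag):
        # A late receive first drains a matching message from the unexpected queue.
        for i, (t, size) in enumerate(self.unexpected):
            if t == tag:
                del self.unexpected[i]
                self.unexpected_bytes -= size
                return "matched-unexpected"
        self.posted.append(tag)
        return "posted"

    def arrive(self, tag, size):
        # A message with a pre-posted receive is delivered without buffering.
        if tag in self.posted:
            self.posted.remove(tag)
            return "matched-posted"
        if (len(self.unexpected) + 1 > self.max_events
                or self.unexpected_bytes + size > self.max_bytes):
            raise UnexpectedQueueOverflow(f"no room for tag {tag} ({size} bytes)")
        self.unexpected.append((tag, size))
        self.unexpected_bytes += size
        return "buffered"


if __name__ == "__main__":
    late = ToyReceiver(max_events=4, max_bytes=4096)
    try:
        for tag in range(5):           # sends arrive before any receive is posted
            late.arrive(tag, 512)
    except UnexpectedQueueOverflow as err:
        print("overflow:", err)

    early = ToyReceiver(max_events=4, max_bytes=4096)
    for tag in range(5):               # receives are posted first
        early.post_recv(tag)
    print([early.arrive(tag, 512) for tag in range(5)])
```

In this picture, raising the buffer size and the event count (as the user did on the Cray) raises the two limits, whereas posting receives before the matching sends avoids the unexpected queue altogether.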

Really, the code is broken, but these settings have also solved other problems in the code. I will pass on the information for testing.

Craig
