Since moving to more recent Intel MPI versions that use the mlx provider (FI_PROVIDER=mlx) and the shm:ofi fabric, we have noticed our code crashing for larger problems that used to run without issue. We tracked the problem down to the fact that the maximum MPI tag value with the mlx provider is much smaller than with the dapl fabric (Intel MPI 2018 and earlier) or with the sockets provider and shm:ofi fabric (Intel MPI 2019 and later).
A test case and its output are attached for Intel MPI 2018 through 2021. For 2018 with the dapl fabric, the maximum tag value is 2147483647. For Intel MPI 2019 and later with the sockets provider, it is 1073741823. For Intel MPI 2019 and later with the mlx provider, it drops all the way to 1048575.
We realize that the maximum tag value is only guaranteed to be at least 32767, but this is a drastic drop, and Intel has suggested in other threads that we use the mlx provider. We understand that the mlx provider may be maintained outside the Intel MPI library, but is there any way to increase the maximum tag value when using it?
Thanks for providing the sample code and script. Yes, we have also observed that the maximum tag value for the mlx provider has been reduced.
However, regarding a way to increase the maximum tag value, we will check with the internal team and let you know whether it is possible. We will get back to you soon. Thank you for your patience.
I am escalating this thread to the internal team. They will look into the issue and get back to you soon.
The maximum tag value was changed from Intel MPI 2018 to 2019 for implementation reasons. I am sorry, but there is no option to increase it.
You have already noticed that the MPI 3.1 specification only guarantees a tag upper bound of at least 32767 (see page 27 of https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf). The actual value must be queried with MPI_Comm_get_attr; not doing so creates a portability risk. If it is difficult for an application to stay within the tag range by reusing values, the suggested solution is to use more MPI communicators. For example, in a hybrid MPI/OpenMP code with message exchange between threads (MPI_THREAD_MULTIPLE), a communicator per thread pair could be used, with up to MPI_TAG_UB distinct tags per communicator.