Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Intel MPI library vectorization

gutrera
Beginner
412 Views

I have been analyzing MPI parallel programs by reading several hardware counters through the PAPI library on an Intel(R) Xeon(R) Platinum processor. In particular, I was interested in the vector instructions in my code. However, when I checked the vector-instruction counter (PAPI_VEC_INS) for just the MPI calls (MPI_Send, MPI_Recv, MPI_Allreduce, ...), I found that the counts were 0 (the MPI library version is 2023.2.0).

I expected those operations to use vector instructions, either for memory copies (MPI_Recv, MPI_Send, ...) or for reduction operations (MPI_Allreduce).

I repeated the experiments on a local installation, with the same results. During installation, I was not given any option related to enabling vectorization.

In addition, I tried the same runs while attempting to force the shared-memory transport (just in case) by setting the environment variable to either I_MPI_SHM=auto or I_MPI_SHM=bdw_avx2, with the same result.

Is there anything else I should configure to enable vector instructions inside the Intel MPI library? Or is it simply that the PAPI counters cannot capture events inside the library for some reason?

I also found this paper from 2022:

https://www.netlib.org/utk/people/JackDongarra/PAPERS/using-long-vector-pc-2022.pdf

which made me think that, at least in that version, the reductions were not using vector instructions. Do the latest versions of the Intel MPI library still not use vector instructions? This really surprised me, but I guess there is an explanation for it.

Thanks

3 Replies
TobiasK
Moderator
354 Views

@gutrera we have multiple code paths inside the library, and the code path is determined at runtime, not during installation. If you can share some benchmarks that show Open MPI is faster, we will look into it.

gutrera
Beginner
351 Views

Many thanks @TobiasK for your response. I guess I didn't explain myself clearly: I am trying to figure out whether the Intel MPI library (in any code path) uses vector instructions. The question comes up because, after profiling the MPI calls (using the PAPI library) of several benchmarks (code using MPI_Allreduce, MPI_Isend, MPI_Send, MPI_Recv), the number of vector instructions appears to be zero.

So I would like to confirm: is there any code path where the Intel MPI library uses vector instructions when running on Intel Xeon processors, with Intel MPI version 2023.2.0, while performing reductions and send/receive operations? If the answer is yes, then I would like to know why my code is not taking advantage of that, or whether the PAPI counters might be unable to capture hardware events inside the Intel MPI library.

Thanks!

TobiasK
Moderator
269 Views

I am not aware of any way we could hide those instructions from PAPI counters. However, I also cannot share which algorithms use which instructions. You cannot influence which instructions are used. But again, you should not be too concerned about such low-level implementation details if the performance is fine.

Best
Tobias
