I have been analyzing MPI parallel programs, reading several hardware counters through the PAPI library on an Intel(R) Xeon(R) Platinum processor. In particular, I was interested in the vector instructions in my code. However, when checking the vector-instruction counter (PAPI_VEC_INS) for just the MPI calls (MPI_Send, MPI_Recv, MPI_Allreduce, ...), I found that the values were 0 (the MPI library version is 2023.2.0).
I expected those operations to contain vector instructions, either for memory copies (MPI_Recv, MPI_Send, ...) or for reduction operations (MPI_Allreduce).
I repeated the experiments on a local installation with the same results. During the installation, I was not offered any option related to enabling vectorization.
In addition, I tried the same run while trying to force the use of shared memory (just in case) by setting the environment variable I_MPI_SHM=auto / I_MPI_SHM=bdw_avx2, with the same result.
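A minimal sketch of how such a comparison could be scripted, running the same benchmark once per I_MPI_SHM value. It assumes `mpirun` is on the PATH and that the benchmark command is passed on the command line. The helper names (`env_with_shm`, `run_all`), the script name and the process count are illustrative and not taken from the original experiments.

```python
import os
import shutil
import subprocess
import sys

# The two I_MPI_SHM values mentioned in the post.
SHM_SETTINGS = ["auto", "bdw_avx2"]


def env_with_shm(value, base=None):
    """Return a copy of the environment with I_MPI_SHM set to `value`."""
    env = dict(os.environ if base is None else base)
    env["I_MPI_SHM"] = value
    return env


def run_all(benchmark_cmd, nprocs=2):
    """Run the benchmark once per setting; return {setting: exit code}."""
    launcher = shutil.which("mpirun")
    if launcher is None:
        raise RuntimeError("mpirun not found on PATH")
    results = {}
    for value in SHM_SETTINGS:
        proc = subprocess.run(
            [launcher, "-n", str(nprocs), *benchmark_cmd],
            env=env_with_shm(value),
        )
        results[value] = proc.returncode
    return results


if __name__ == "__main__":
    # Hypothetical usage: python run_shm.py ./my_benchmark
    if len(sys.argv) > 1:
        print(run_all(sys.argv[1:]))
```

Setting the variable per run, rather than exporting it once in the shell, makes it easier to see which setting each set of counter values belongs to.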
Is there anything else I should configure to enable vector instructions inside the Intel MPI library? Or is it just that the PAPI counters cannot capture the values inside the library for some reason?
I also found this paper from 2022:
https://www.netlib.org/utk/people/JackDongarra/PAPERS/using-long-vector-pc-2022.pdf
which made me think that, at least in that version, the reductions were not using vector instructions. Do the latest versions of the Intel MPI library still not use vector instructions? This really surprised me, but I guess there is an explanation for it.
Thanks
@gutrera we have multiple code paths inside the library, and the code path is determined at runtime, not during installation. If you can share some benchmarks that show OpenMPI is faster, we will look into it.
Many thanks @TobiasK for your response. I guess I did not explain myself clearly: I am trying to figure out whether the code of the Intel MPI library (in any code path) uses vector instructions. The question arises because, after profiling the MPI calls of several benchmarks with the PAPI library (code using MPI_Allreduce, MPI_Isend, MPI_Send, MPI_Recv), the number of vector instructions appears to be ZERO.
So I would like to confirm: is there any code path where the Intel MPI library uses vector instructions when running on Intel Xeon processors, with Intel MPI version 2023.2.0, while performing reductions and send/receive operations? If the answer is yes, then I would like to know why my code is not taking advantage of that. Or is it that PAPI could not capture the hardware counters inside the Intel MPI library?
Thanks!
I am not aware of any way we could hide those instructions from PAPI counters. However, I also cannot share which algorithms use which instructions. You cannot influence which instructions are used. But again, you should not be too concerned about such low-level implementation details if the performance is fine.
Best
Tobias